@VXShare
For the models I built, what I needed was anything that can be disassembled.
However, I genuinely don't understand the copyright issue. Most of the "clean" applications are available directly from their vendors. I also don't understand why known-good samples can't be shared in bulk with reputable research institutes. What I do know is that AI-based models usually require a balanced dataset to learn meaningful features; otherwise the model will be quite limited. Respected researchers (like Sophos and Booz Allen Hamilton, etc.) have also described this ground-truth benign dataset problem in their papers, yet the only solutions are building good one-to-one relationships with security companies (impossible for individuals) or having a lot of money (lol). It is exactly the same issue as "should we share malware?", which haunted us for years. That one was solved by brave individuals, but now we have the ground-truth benign problem.
I have built many novel models that easily pass 95% accuracy with a low FPR, yet these models are limited to very small subsets and cannot be generalized or verified in real life. I'm about to throw them in the trash and switch to anomaly-based models because of this dataset problem, or else quit researching this topic completely.