Well, I finally did it. I just released my test dataset for AI evaluation. It's a simulated company represented by 60,000 documents; the README in the image explains it all. If you are interested, it's at https://codeberg.org/Lorenz_Systems/Company_Sim.git

#EUAIAct #DigitalSovereignty #SovereignCloud #FOSS #FLOSS #Codeberg #Forgejo #OpenSource #DataGovernance #Auditability #ForensicAI #EUTech #PrivacyByDesign #InformationRetrieval #KnowledgeManagement #DeterministicAI #EUPL

WARNING!!! I have identified a serious flaw in the data generation process.

Invoices and other routine documents are being generated as boilerplate only, lacking the "real" data. I am working on a fix right now and should be able to replace the defective code and the 30,000-document prompt list fairly soon.

My most sincere apologies for this. Unfortunately, with such a large dataset, the problem only became apparent when I used it to benchmark our Knowledge Management tools this week.

It's fixed and available again.