Recently I had the idea to mirror all the uploads (releases and screenshot images) on the #Luanti ContentDB onto the #InternetArchive, putting them all into an item that gets incrementally updated. It's quite a big dataset (72 GiB currently), so having it archived there and accessible on a per-file basis would be nice.

I began uploading to it yesterday and let it run for the day. A significant chunk of the files got uploaded, but when I checked back today the item had been automatically deleted for containing malware.

...guess we're restarting it.

To be clear, the detections were false positives for the Windows binary of a websocket proxy in arclib (https://content.luanti.org/packages/Warr1024/arclib/). It's not a case of actual malware having been uploaded onto ContentDB.

I understand that IA needs to have automated processes to check what users are uploading, but having an entire item nuked because 15 files in a big dataset of otherwise harmless files got four detections is mildly annoying. Delete the individual files, or at least send a notification message instead of silently deleting the whole item!

Thankfully I know to check for this by now when uploading stuff to archive.org. You can still go into the item's history (archive.org/history/[item-name]) and get a list of VirusTotal URLs that were associated with the detected files. Isolate the files in question, and then upload the rest into a new item, hoping nothing new gets detected.
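
For anyone wanting to script the isolating step, here's a rough Python sketch. It assumes you've copied the text of the history page into a string yourself, and that each VirusTotal URL in it contains the file's SHA-256 as 64 hex characters (I haven't verified that this holds for every entry). The function names are made up for illustration.

```python
import hashlib
import re
from pathlib import Path

# Assumption: each VirusTotal link embeds the file's SHA-256 as 64 hex characters.
VT_SHA256 = re.compile(r"virustotal\.com/\S*?([0-9a-fA-F]{64})")


def flagged_hashes(history_text: str) -> set[str]:
    """Collect the SHA-256 digests found in VirusTotal URLs within the pasted text."""
    return {digest.lower() for digest in VT_SHA256.findall(history_text)}


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large release zips don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_uploads(root: Path, flagged: set[str]) -> tuple[list[Path], list[Path]]:
    """Return (clean, detected) lists of files under root, sorted by path."""
    clean, detected = [], []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        (detected if sha256_of(path) in flagged else clean).append(path)
    return clean, detected
```

Everything in the `clean` list is what goes into the new item; the `detected` list is what you hold back.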

I've run into the same issue when trying to archive old Android games. Some middleware used for ad-based reward systems and similar things (think Tapjoy and the like) may get detected as adware or PUP by some vendors. That's honestly a correct assessment, but once a certain threshold of detections is exceeded, you can no longer upload that specific version of an Android game to IA.

(Also worth noting that it's likely some of those old versions of middleware are entirely defanged by now due to servers being shut down...)

Of course, if you're uploading 300 versions of the same app and two are detected, that's not much of a loss in terms of preservation. It's just once again the headache of isolating the detected files and reuploading everything.

@ROllerozxa Yeah. The ContentDB outage the other day was definitely a reminder.

@Catwoman69y2k Yeah that was the motivation for trying to do this!

Just dumping the full uploads dataset may not be very useful since they have non-descriptive random filenames, but if someone were to host a mirror of the ContentDB API with just the package metadata, they could then redirect to archive.org for the release zips. Maybe slow, but would work in case of emergency.
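
The redirect part could be as small as this Python sketch. It assumes the files sit at the root of the archive.org item under their original upload filenames, and that release URLs in the metadata end in that filename. The item identifier here is a placeholder, not the real one.

```python
from posixpath import basename
from urllib.parse import quote, urlsplit

# Hypothetical item identifier, for illustration only.
IA_ITEM = "luanti-contentdb-uploads"


def archive_url(upload_url: str, item: str = IA_ITEM) -> str:
    """Map a ContentDB upload URL to the matching archive.org download URL."""
    # Keep only the last path segment, i.e. the random upload filename.
    filename = basename(urlsplit(upload_url).path)
    if not filename:
        raise ValueError(f"no filename in {upload_url!r}")
    return f"https://archive.org/download/{quote(item)}/{quote(filename)}"
```

A mirror would serve the package metadata itself and answer release download requests with an HTTP redirect to whatever `archive_url` returns.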

@ROllerozxa I will say that I'm glad that Luanti (and any of the games on there) isn't attached to the kind of authentication system that Microsoft uses. I was able to play all the VoxeLibre I wanted to during the ContentDB outage.