| Github | https://github.com/angea |
| Github | https://github.com/corkami |
| Pronouns | he/him |
Oh dear, the entire https://www.lyonlabs.org site is offline *and* excluded from archive.org.
It's a massive archive of vintage and modern GEOS and C64 material, a lot of it seemingly not found elsewhere.
To check if a file starts with MZ or GIF, just use file/libmagic.
You don't need AI or Magika for that.
TrID has a lot of heuristics, but also a lot of false positives.
Magika is useful in different ways, across binary and source types, and is quite fast. But it's not useful against weird or adversarial files.
Magika is a fast file type identifier that covers many file types, binary formats or source texts.
It's not made to detect adversarial attacks.
It's useful for different things that classic binary scanning can't do at this speed.
Magika was trained on all the file types with enough available samples.
Weird files are out of scope of Magika. It just wasn't trained on them.
It's trivial to inject some data in a file and keep it functional (w/ my tool Mitra, for example).
So take a JPG, inject a lot of JavaScript data, and... guess what?
Check it out: https://github.com/corkami/mitra
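To make the injection idea concrete, here is a minimal sketch of the simplest case: appending extra bytes after the end of a JPEG. This is not how Mitra works (Mitra handles many formats and more involved layouts); it only assumes that viewers stop reading at the JPEG end-of-image marker and ignore what follows, which is common but not guaranteed. The function name `append_after_eoi` is hypothetical.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def append_after_eoi(jpeg: bytes, payload: bytes) -> bytes:
    """Return the JPEG bytes with payload appended after the end-of-image marker.

    Assumption: the decoder stops at EOI and ignores trailing bytes.
    """
    if not jpeg.startswith(JPEG_SOI):
        raise ValueError("data does not start with a JPEG SOI marker")
    if not jpeg.endswith(JPEG_EOI):
        raise ValueError("data does not end with a JPEG EOI marker")
    return jpeg + payload


# Placeholder bytes standing in for a real image, for illustration only.
fake_jpeg = JPEG_SOI + b"\x00" * 4 + JPEG_EOI
mixed = append_after_eoi(fake_jpeg, b"console.log('hello');" * 100)
print(mixed[:2] == JPEG_SOI, len(mixed) - len(fake_jpeg))
```

The result still starts with the JPEG signature, yet most of its bytes are now JavaScript text.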
Of course, it's possible to create weird files that will fool Magika and other tools.
Polymocks, polyglots...
I made quite a few - check my CCC talk last year:
https://speakerdeck.com/ange/fearsome-file-formats-18374bc4-b3f2-429f-862e-2177ab4d7aae
So file contents are used to determine the file type.
To check if the file starts with '\x7FELF', 'MZ' or 'GIF', you don't need AI.
But some file formats don't start with a clear 'magic' signature at offset zero.
And what if you also want to tell apart C++, Rust and HTML?
No magic for source files.
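The offset-zero check described above can be sketched in a few lines. This is only an illustration with the three signatures named in the post, not a replacement for file/libmagic; the table `SIGNATURES` and the function `sniff_magic` are hypothetical names.

```python
from typing import Optional

# Signatures from the post; illustrative, far from exhaustive.
SIGNATURES = {
    b"\x7fELF": "ELF",
    b"MZ": "MZ executable",
    b"GIF": "GIF",
}


def sniff_magic(data: bytes) -> Optional[str]:
    """Return a format name if data starts with a known signature, else None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


print(sniff_magic(b"GIF89a"))
print(sniff_magic(b"#include <iostream>\nint main() {}\n"))
```

The second call finds nothing, since source text carries no signature at offset zero.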
To identify file types, the worst way is the file extension:
the extension is stored in the filesystem entry, not in the file content.
It can be lost, modified, variable...
Almost all file formats are known under several file extensions:
.JPG/.JPEG, .ZIP/.APK/.DOCX, .EXE/.DLL, .ELF/.SO ...
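As a small sketch of why extensions are a weak signal, the groups listed above can be written as a lookup table. The family labels ("JPEG", "ZIP", "PE", "ELF") and the function `family_for` are my own naming for illustration, and the table only covers the extensions mentioned in the post.

```python
import os
from typing import Optional

# Extension groups from the post; the family labels are illustrative.
EXTENSION_FAMILIES = {
    "JPEG": {".jpg", ".jpeg"},
    "ZIP": {".zip", ".apk", ".docx"},
    "PE": {".exe", ".dll"},
    "ELF": {".elf", ".so"},
}


def family_for(filename: str) -> Optional[str]:
    """Guess a format family from the file name alone (unreliable by design)."""
    ext = os.path.splitext(filename)[1].lower()
    for family, extensions in EXTENSION_FAMILIES.items():
        if ext in extensions:
            return family
    return None


for name in ("photo.JPG", "app.apk", "report.docx", "libc.so.6", "README"):
    print(name, family_for(name))
```

A versioned name like `libc.so.6` or an extension-less file like `README` already defeats the lookup, even though the file contents have not changed.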