I'm interested in opinions... but I think I know what I should do.

I produce files that have a 64-bit ID generated by an STM32's RNG. This seems to do a reasonable job at being random (no collisions yet, in ~5k files), but I don't fully trust it, and 64-bit isn't that big. [it's likely that future hardware will have CSRNG, but the file IDs will probably remain 64-bit].
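For a rough sense of scale, the birthday approximation gives the chance of at least one collision among n random IDs. This is a minimal sketch that assumes the RNG output is perfectly uniform, which is exactly the property in doubt here, so treat the numbers as a best case. The function name and the 122-bit comparison (the random bits in a version 4 UUID) are illustrative choices, not part of the original setup.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance that at least two of n_ids uniform random IDs collide."""
    space = 2.0 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))


if __name__ == "__main__":
    for n in (5_000, 1_000_000, 1_000_000_000):
        p64 = collision_probability(n, 64)
        p122 = collision_probability(n, 122)  # random bits in a UUIDv4
        print(f"{n:>13,} IDs: 64-bit {p64:.3e}, 122-bit {p122:.3e}")
```

A biased or poorly seeded generator would make collisions more likely than this estimate suggests.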

When handling this data, it's sometimes split across multiple files, which share that 64-bit ID... this and some other heuristics (like timestamps) allow you to confidently associate these parts into a whole.
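The association step could look something like the sketch below. Everything here is hypothetical: the `Part` record, its field names, and the one-hour gap threshold are assumptions for illustration, and the real heuristics may well differ. It groups parts by the shared 64-bit ID, then splits a group whenever two consecutive timestamps are too far apart, so a reused ID from a much later recording would not be merged silently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    path: str
    file_id: int      # the shared 64-bit ID
    timestamp: float  # seconds since epoch (assumed field)


def group_parts(parts, max_gap_s=3600.0):
    """Group parts by ID, splitting where the time gap exceeds max_gap_s."""
    by_id = defaultdict(list)
    for part in parts:
        by_id[part.file_id].append(part)

    groups = []
    for same_id in by_id.values():
        same_id.sort(key=lambda p: p.timestamp)
        current = [same_id[0]]
        for part in same_id[1:]:
            if part.timestamp - current[-1].timestamp > max_gap_s:
                # Same ID but far apart in time: treat as a separate whole.
                groups.append(current)
                current = [part]
            else:
                current.append(part)
        groups.append(current)
    return groups
```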

For some years, the files have been uploaded, processed, poked and prodded - but they reside in a filesystem structure, and that was it. Accessible by me with utilities, but not accessible to others.

Until now, the files have been "uniquely" identified and referred to by that 64-bit ID. And the uniqueness has persisted.

Now that we're building a Web UI for better accessibility, the details are being brought into a database, which has a basic unique 'id' column.

How do I refer to these files going forward?... by the hopefully-unique 64-bit ID, or by the actually-unique database ID?

I like the 64-bit ID, because it's the "source of truth", and it's familiar... but I'm not confident (enough) that they'll remain unique over time.

I like the database ID, because it's guaranteed to be unique within the system... but I don't necessarily want to depend on lookups via the database.

I've considered adding a small metadata file that identifies the file's database ID and its other parts, so lookup can remain a "filesystem only" activity with either ID.

I've also considered using a standard 128-bit UUID that is generated and stored on the filesystem and in the database.

I don't know why this decision is being so problematic for me. 🫣

Use the 64-bit hopefully-unique ID

20%

Use the database's actually-unique ID

20%

Use a 128-bit UUID

50%

Something else

10%

Poll ended at Mar 14 at 4:18pm.

Michael Ossmann Mar 13

@attie What are the consequences of a collision? If it happens once or twice per decade, would it cause major problems?

Attie Grande Mar 13

@mossmann You'd potentially reference the wrong file, or a file could "go missing" ... depending on implementation, it might be anywhere from "silent" through "confused" up to potentially "not prevent injury" (or worse, and difficult to judge when this becomes "cause" rather than "not prevent")...

Michael Ossmann

@attie Sounds like that is sufficient justification to make a change away from the "hopefully unique" ID. I like the content hash suggestion or a UUID.

Attie Grande Mar 13

@mossmann Agreed!

I'm thinking I'll go for UUID - generated on insertion into the database, guaranteed unique by a database constraint, and then also stored next to the file in the filesystem, which provides consistency if the database content needs to be regenerated (i.e. read from file if it exists, or generate if it doesn't).

People are also somewhat familiar with UUIDs, whereas a "reasonable" hash (e.g. SHA256) is less familiar and larger.
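A minimal sketch of that plan, under stated assumptions: SQLite stands in for whatever database is actually used, the `.uuid` sidecar suffix, table layout, and function names are all made up for illustration, and one UUID is assigned per file on disk. The 64-bit hardware ID is stored as hex text because SQLite integers are signed 64-bit, and it is deliberately not marked unique since parts share it.

```python
import sqlite3
import uuid
from pathlib import Path


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS files (
               id    INTEGER PRIMARY KEY,
               uuid  TEXT NOT NULL UNIQUE,  -- uniqueness enforced here
               hw_id TEXT NOT NULL,         -- 64-bit ID as 16 hex digits
               path  TEXT NOT NULL
           )"""
    )
    return db


def sidecar_for(data_file: Path) -> Path:
    # Hypothetical convention: "capture.bin" -> "capture.bin.uuid"
    return data_file.with_name(data_file.name + ".uuid")


def load_or_create_uuid(data_file: Path) -> uuid.UUID:
    """Read the UUID stored next to the file, or generate and store one."""
    sidecar = sidecar_for(data_file)
    if sidecar.exists():
        return uuid.UUID(sidecar.read_text().strip())
    new_id = uuid.uuid4()
    sidecar.write_text(f"{new_id}\n")
    return new_id


def register(db, data_file: Path, hw_id: int) -> uuid.UUID:
    """Insert the file; raises sqlite3.IntegrityError on a duplicate UUID."""
    file_uuid = load_or_create_uuid(data_file)
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO files (uuid, hw_id, path) VALUES (?, ?, ?)",
            (str(file_uuid), f"{hw_id:016x}", str(data_file)),
        )
    return file_uuid
```

Rebuilding the database from the filesystem then yields the same UUIDs, because `load_or_create_uuid` prefers the sidecar over generating a new value.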