Something I’ve been debating writing about for a week now:

Reddit’s Native Image Hosting: Architected Without Security.

Reddit introduced native image hosting 7 years and 7 days ago.

Example used in the announcement: https://i.redd.it/lasm5nl33o4x.png

These images have few to no access controls, and what controls exist apply only under specific circumstances.

Reddit Native Images are publicly available to anyone who knows the image name & extension, by default.

This makes it trivial for someone in a private subreddit or private chat to hand off image URLs to a third party, who can retrieve the images at their leisure, whether or not they are logged in to the site and whether or not they have access to the subreddit or chat hosting the image.

Until sometime in 2021, Reddit Native Hosted Images associated with posts taken down pursuant to DMCA takedown requests, Personally Identifiable Information Rule Violations, Non-Consensual Intimate Media Rule Violations, and other such evils were still retrievable by anyone with the image URL, and many still are.

From 2021 onward, any such image associated with a taken-down post was also taken offline; images on previously actioned posts were not.

The naming convention for the images is a 12- or 13-digit BASE36-encoded number, in LSB / little-endian order, where the 7 or 8 most significant digits (edit: rightmost) are confirmed to be serialised, incrementally.

The 5 least significant digits may also variously be part of the serially incrementing segment and/or may be checksum bits for the filename, file extension, or other housekeeping.
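To make the naming convention concrete, here is a minimal Python sketch that splits an image name into the two segments described above and reads the serial segment as a number. It makes no network requests. The function names, the fixed 5-digit split and the reverse-then-parse reading of "little-endian" are my own illustrative assumptions drawn from the description, not anything Reddit documents.

```python
import string

# BASE36 alphabet as used in the image names: digits plus lowercase letters.
BASE36_DIGITS = set(string.digits + string.ascii_lowercase)


def split_image_name(filename: str, low_digits: int = 5) -> tuple[str, str]:
    """Split a name into (low-order digits, serial segment).

    Assumption: the leftmost `low_digits` characters are the least
    significant digits and the rest is the serial segment.
    """
    stem = filename.rsplit(".", 1)[0].lower()  # drop the file extension
    if not 12 <= len(stem) <= 13 or not set(stem) <= BASE36_DIGITS:
        raise ValueError(f"not a 12 or 13 digit BASE36 name: {filename!r}")
    return stem[:low_digits], stem[low_digits:]


def little_endian_value(digits: str) -> int:
    """Read a BASE36 string whose least significant digit comes first."""
    # int() expects the most significant digit first, so reverse the string.
    return int(digits[::-1], 36)


low, serial = split_image_name("lasm5nl33o4x.png")
print(low, serial, little_endian_value(serial))
```

For the announcement example this gives `lasm5` as the low-order digits and `nl33o4x` as the 7-digit serial segment. Under this reading, two images uploaded close together in time would be expected to differ mainly at the left end of the serial segment.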

Because a BASE36 digit needs at most 6 bits to represent, this reduces the non-serially-deterministic bitspace of the filename to (I believe) between 7.7k and 15k possibilities, a space trivial to brute force with distributed requests.

That makes it possible for a dedicated attacker to explore, with relative ease, the images uploaded to Reddit in a known arbitrary time frame, by characterizing the names of suitably chosen public-facing images and requesting (fuskering) the interval between them.
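As a sanity check on the size of that space, here is a short arithmetic sketch in Python. It computes only the upper bound, under the assumption (mine, for illustration) that all five low-order digits are free and none is fixed by a serial or checksum role. The `keyspace` helper is hypothetical.

```python
import math


def keyspace(free_digits: int, base: int = 36) -> int:
    """Number of distinct values that `free_digits` unconstrained digits can take."""
    return base ** free_digits


# Assumption: all 5 low-order digits are unconstrained (worst case for an attacker).
upper_bound = keyspace(5)            # 36**5 = 60,466,176
bits_per_digit = math.log2(36)       # about 5.17 bits of information per digit
total_bits = 5 * bits_per_digit      # about 25.85 bits

print(f"{upper_bound:,} candidates, {total_bits:.2f} bits")
```

The 7.7k to 15k estimate sits far below this bound, so it holds only to the extent that the low-order digits really are constrained by the serial or checksum roles suggested above. How many of those digits are actually free is the number that decides how practical enumeration is.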

This design flaw may be one reason behind Reddit’s recent choice to drastically throttle document requests from not-logged-in viewers: to counter and prevent systematic plundering of images posted to non-public-facing subreddits and chats.

Executive Summary:

None of the images you have uploaded to Reddit are actually private; while Reddit’s naming convention for images makes it infeasible to blindly stumble upon an image URL, the limitations of that convention may make it feasible for a determined attacker with appropriate distributed sessions to attack and exfiltrate images uploaded to the site in a suitably limited timeframe.

Your images cease being publicly retrievable when you delete the associated post/gallery.

Reddit should undertake a process of sequestering all images stored on their servers which are associated with posts removed by moderators or pursuant to DMCA takedown or Sitewide Rules Violations, regardless of when the images were uploaded to the site, to counter exploitation of those images and preserve user privacy and safety.

Reddit should also verify that images uploaded to deleted chats are no longer publicly retrievable.

@PennyOaken Reddit do something that helps its users? Lol.
@solipsistnation Ironically, this write-up grew out of a request from someone who wanted a viable null hypothesis / alternate hypothesis for why Reddit has begun charging third parties $12,000 per 50M API requests, as a way of exploring which market segments they’re targeting with that pricing.