If you happen to be looking for bad redactions in a large set of data files today for some reason, there's an open source tool for that.

https://github.com/freelawproject/x-ray

GitHub - freelawproject/x-ray: A tool to detect whether a PDF has a bad redaction

@evacide interesting, however there could perhaps be edge cases it misses such as non-rectangular regions

another good tool to use would be something that just extracts all the text in a PDF document, which you can then grep against

@solonovamax @evacide ‘grep’, ‘cat’, and ‘awk’ the heroes of finding needles in haystacks.

@joelh @solonovamax pdf generally does not represent text as text. As one person said, if the name for the glyph matches what the glyph depicts, that's purely a coincidence. But there are tools for extracting text from pdfs: mutool draw -F text black_armed_joy.pdf
Then you can grep that. If you try to grep the pdf directly, you'll be disappointed.
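A minimal sketch of that extract-then-grep workflow, assuming `mutool` is installed and on the PATH and that it writes the extracted text to standard output when no output file is given. The helper names `extract_text` and `grep_lines` are made up for illustration.

```python
import re
import subprocess


def extract_text(pdf_path: str) -> str:
    """Run `mutool draw -F text` on a PDF and return whatever it prints."""
    result = subprocess.run(
        ["mutool", "draw", "-F", "text", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise if mutool exits with an error
    )
    return result.stdout


def grep_lines(text: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines matching the regex,
    ignoring case, roughly like `grep -in`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]
```

Something like `grep_lines(extract_text("black_armed_joy.pdf"), "some name")` would then list the matching lines. It only finds text that the extractor can recover, so text stored as images stays invisible to it.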
@evacide given the sky-high levels of competence among Trump loyalists...

@floe @evacide I did wonder if they would have cocked this up. Either leaving text under black boxes, or using not 100% opaque black on images.
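The "text under black boxes" case can be illustrated with plain geometry. This is a rough sketch, not how x-ray itself is implemented: it assumes you already have bounding boxes for the words and for the black rectangles on a page (from whatever PDF library you use), and the `Box` type and `find_covered_text` helper are hypothetical. It does not cover the semi-transparent image case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        # True when the rectangles overlap with non-zero area;
        # boxes that only touch at an edge do not count.
        return (
            self.x0 < other.x1 and other.x0 < self.x1
            and self.y0 < other.y1 and other.y0 < self.y1
        )


def find_covered_text(words, black_boxes):
    """Given (text, Box) pairs and a list of black rectangles, return the
    text of every word that overlaps at least one rectangle."""
    return [
        text
        for text, box in words
        if any(box.intersects(black) for black in black_boxes)
    ]
```

If this returns anything, the page still carries extractable text under a redaction bar, which is the failure being described here.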

... I hear they have indeed cocked it up ...

@evacide PDF is a truly horrible format inside, an absolute nightmare to edit. That is why even simple redactions are so difficult. Also why it breaks screen readers.

Each page is actually a little program in a language related to but not the same as PostScript, containing instructions to draw the page. Those instructions come in an arbitrary order decided by the program that generated the PDF, which bears no connection to the reading order or layout of the document.
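To make the ordering point concrete, here is a toy sketch. The content stream below is a made-up, uncompressed example that draws the second line of the page before the first. Real streams are usually compressed, position text with matrices, and store strings as font-specific glyph codes, so the regex here is an illustration only and not a PDF parser.

```python
import re

# Hypothetical content stream: BT/ET bracket a text object, Tf sets the font,
# Td moves to (x, y), and Tj shows a string. The lower line is drawn first.
CONTENT_STREAM = """
BT /F1 12 Tf 72 700 Td (second line) Tj ET
BT /F1 12 Tf 72 720 Td (first line) Tj ET
"""

# Matches only the simple "x y Td (string) Tj" pattern used above.
SHOW_TEXT = re.compile(r"(\d+) (\d+) Td \(([^)]*)\) Tj")


def text_in_stream_order(stream):
    """Strings in the order the drawing instructions appear."""
    return [match.group(3) for match in SHOW_TEXT.finditer(stream)]


def text_in_reading_order(stream):
    """Strings sorted top to bottom, then left to right."""
    items = [
        (int(match.group(2)), int(match.group(1)), match.group(3))
        for match in SHOW_TEXT.finditer(stream)
    ]
    # In default PDF coordinates y grows upward, so larger y is higher up.
    items.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, _, text in items]
```

Here the stream order gives the lines backwards, and only sorting by position recovers the order a reader sees. Text extractors have to do a much more involved version of this guesswork.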

@Qybat @evacide

But, it's such a flexible and extensible format !
/s

@mtnrbq65 @evacide True! It's been extended eight times now, including one which was a complete redesign of the format that still maintained backwards compatibility. That's why it has so many duplicated structures within, why it has two different metadata mechanisms, and yet the basic datatypes are still based on a format once common in 1990s Mac OS computers.

@evacide I always open the PDF files in Inkscape. Often you can just move the black bar around. Somehow I couldn't get that to work in GIMP. No clue why Inkscape keeps the layers.

Such a useful tool.

On a separate note, I hope nobody uses it in a way that lets a legal team come after them for attempting to gain access to classified information or for exposing victims.
Stay safe!

@evacide I'm glad my donations to the EFF are funding stuff like this. Didn't know their chief of cybersecurity was this cool though.