Censor, a new document redaction tool, is here!

It lets you draw black rectangles on PDF documents and permanently remove the text and images beneath them. Find it on @Codeberg: https://codeberg.org/censor/Censor, get it from @flathub: https://flathub.org/apps/page.codeberg.censor.Censor, or translate it on Codeberg Translate: https://translate.codeberg.org/engage/censor!

It is a free and open-source graphical user interface (GUI) for #Linux and the #GNOME desktop, and uses the #MuPDF library with its #python bindings from the #PyMuPDF module.

#censorship #redaction #PDF #Codeberg #Flatpak #Flathub

Censor

PDF Document Redaction for the GNOME Desktop

Codeberg.org

“A historic moment for Censor”

#Censor – the PDF redaction tool for the @gnome desktop – now comes with a new edit history. It lets you undo and redo redactions using the right-click context menu or keyboard shortcuts. Also, a bug that prevented repeated saving to the same file path has been fixed.

Get the new version from @flathub: https://flathub.org/apps/page.codeberg.censor.Censor, and find it on @Codeberg: https://codeberg.org/censor/Censor/releases/tag/v0.4.0

You may now speak Chinese, Dutch, English, Estonian, Finnish, French, German, Italian, and Vietnamese with Censor (thanks a lot to the translators!). If your language is missing from this list, I invite you to contribute at Codeberg Translate: https://translate.codeberg.org/engage/censor

#censorship #redaction #PDF #Codeberg #Flatpak #Flathub #GNOME #python #MuPDF #PyMuPDF #Linux

Install Censor on Linux | Flathub

Redact PDF documents

“Better safe than sorry”

For release 0.5.0 of #Censor, a lot of work went into improving the security of PDF redaction.

PDF documents are tricky, and irrevocably removing elements from them is even more so. With this release, before a redacted document is saved, garbage is now properly collected and the document is sanitized, which means that metadata, page thumbnails, etc. are removed.

Also, vector graphics are now removed with a stricter option when they overlap with redaction rectangles. On top of that, I added redaction of PDF annotations.

The user interface was refreshed, with undo and redo buttons in the toolbar and an improved document-saving experience. A crosshair cursor now indicates when you are drawing rectangles.

Thanks to the translators, you may now also speak Czech with Censor!

Get it from @flathub: https://flathub.org/apps/page.codeberg.censor.Censor, or contribute on @Codeberg: https://codeberg.org/censor/Censor

#censorship #redaction #PDF #Codeberg #Flatpak #Flathub #GNOME #python #MuPDF #PyMuPDF #linux

That said and celebrated ;), there are things that #Censor does not yet redact well.

By default, the upstream library #MuPDF (with its #Python bindings in #PyMuPDF) supports redaction of text, vector graphics, and images only. Testing on a variety of PDF files (thanks to #pypdf, #qpdf, #ghostscript, and their issue reporters, as well as @pdfarranger for their hint) let me discover that some vector graphics are not properly redacted, and an upstream issue has been reported for that.

Also, form fields (widgets), signatures and links may be incompletely redacted.

You can find an updated list of “What is redacted? What not?” here: https://codeberg.org/censor/Censor/issues/120

#pdf #redaction #security

meta: What is redacted? What not?

> **Warning** > The following description is **not** valid for Censor until version 0.4.0. I recommend to update to [version 0.5.0](https://codeberg.org/censor/Censor/releases/tag/v0.5.0) for secure redaction. ## Elements under redaction rectangles - [x] Text: - characters are removed when ...

Codeberg.org

“Secure #redaction by design and through extensive #testing”

#Censor 0.6.0 comes with many more #security improvements, motivated by extensive testing on more than 1,000 #PDF document samples. You may now also securely redact links, form fields, and widgets. In rare cases, when partial image redaction fails, the more secure full image removal is used instead.

Even more importantly, Censor now warns you when unsuccessful redaction is detected during post-processing. This reduces the impact of known issues with insecure redaction.
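
One way such a post-processing check can work is to re-extract the content after redaction and look for anything that still overlaps a redaction rectangle. The sketch below is a simplified, library-free illustration of that idea, not Censor's implementation: `Box`, `intersects`, and `leftover_words` are hypothetical names, and the sample data is made up. The word tuples use the `(x0, y0, x1, y1, text)` layout of the first five fields returned by PyMuPDF's `get_text("words")`.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def intersects(a: Box, b: Box) -> bool:
    """Return True if the two boxes share an area larger than zero."""
    return min(a.x1, b.x1) > max(a.x0, b.x0) and min(a.y1, b.y1) > max(a.y0, b.y0)


def leftover_words(words, redactions):
    """Return the text of every word that still overlaps a redaction box."""
    return [
        text
        for (*coords, text) in words
        if any(intersects(Box(*coords), box) for box in redactions)
    ]


redactions = [Box(60, 180, 320, 220)]

# Words extracted from the saved document (made-up data; the second entry
# simulates a word that survived redaction).
words_after = [
    (72, 90, 110, 104, "public"),
    (72, 188, 105, 203, "secret"),
]

leaks = leftover_words(words_after, redactions)
if leaks:
    print("Warning: redaction may be incomplete:", leaks)
```

The same pattern extends to images, drawings, links, and widgets, as long as each element type can be extracted together with its bounding box.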

Polish is the 11th language you may speak with Censor. Thanks to its translators (among them, @mondstern)!

Thanks a lot also to #pypdf, #qpdf, #pikepdf, #Ghostscript, #MuPDF, #PyMuPDF, and #poppler contributors for the great resource of PDF document samples!

Find it at @flathub: https://flathub.org/apps/page.codeberg.censor.Censor and @Codeberg: https://codeberg.org/censor/Censor

#Censorship #Codeberg #Flathub #GNOME #Linux #Python

“Zoom in, zoom out, redact your points!”

#Censor v0.7.0 ships improved zooming with pinch-to-zoom gesture on touchpads and touchscreens and various security bug fixes. Find it at @flathub: https://flathub.org/apps/page.codeberg.censor.Censor and @Codeberg: https://codeberg.org/censor/Censor

You may now also safely redact #PDF documents with cropped, scaled, or rotated pages. The upstream #PyMuPDF library has some issues handling these documents. Thus, I implemented manual transformation and drawing of the redaction rectangles, and verified proper redaction with a specially created sample document, submitted to the sample collection of #pypdf: https://github.com/py-pdf/sample-files/pull/36
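
To give an idea of what transforming a rectangle involves, here is a small, library-free sketch for the rotation case only. It is an assumption-laden illustration rather than Censor's code: `to_unrotated` is a hypothetical helper, coordinates have their origin at the top left with y pointing down, the rotation is the clockwise display rotation in degrees, and cropping and scaling are ignored. PyMuPDF also exposes `Page.derotation_matrix` for this purpose.

```python
def to_unrotated(rect, rotation, width, height):
    """Map a rectangle drawn on the rotated (displayed) page back to the
    coordinates of the unrotated page.

    rect:          (x0, y0, x1, y1) in displayed coordinates
    rotation:      clockwise display rotation, a multiple of 90 degrees
    width, height: size of the unrotated page
    """
    x0, y0, x1, y1 = rect
    rotation %= 360
    if rotation == 0:
        corners = [(x0, y0), (x1, y1)]
    elif rotation == 90:
        corners = [(y0, height - x0), (y1, height - x1)]
    elif rotation == 180:
        corners = [(width - x0, height - y0), (width - x1, height - y1)]
    elif rotation == 270:
        corners = [(width - y0, x0), (width - y1, x1)]
    else:
        raise ValueError("PDF page rotation must be a multiple of 90 degrees")
    # Normalize so that (x0, y0) is the top-left and (x1, y1) the bottom-right.
    xs = sorted(point[0] for point in corners)
    ys = sorted(point[1] for point in corners)
    return (xs[0], ys[0], xs[1], ys[1])


# A 600 x 800 page displayed with 90 degrees rotation appears as 800 x 600.
print(to_unrotated((10, 20, 110, 50), 90, 600, 800))  # (20, 690, 50, 790)
```

Note how the 100 x 30 rectangle drawn on screen becomes a 30 x 100 rectangle on the unrotated page.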

Additionally, point- or line-like elements are now properly redacted. Sanitization during post-processing now keeps entries in form fields (note: this changes previous behavior).

Thanks to the translators, Censor is now available in 13 languages including Croatian and Russian.

#censorship #redaction #Codeberg #Flatpak #Flathub #GNOME #linux #python #MuPDF

Good news for the #ArchLinux users among you: #Censor is now available in the #Arch User Repository (#AUR): https://aur.archlinux.org/packages/censor Your feedback is welcome!

Censor has been packaged for #NixOS since version 0.3.0 already: https://search.nixos.org/packages?channel=unstable&show=censor Thanks to @pi_crew for maintaining it!

#pdf #redaction #linux #packaging #PKGBUILD #maintenance

AUR (en) - censor

@mahlzahn
Once again I am begging developers to give us just ONE LINE telling us what your software actually does when announcing a new release.
#TellUsWhatYourSoftwareDoes
@pi_crew
@stib @pi_crew Haha, OK, I'll try to remember that ;)
@mahlzahn Awaiting Censor in GNOME circle :D
@M23SNEZHOK Thanks, and also for your translation contribution! I opened an issue over there ;) https://gitlab.gnome.org/Teams/Circle/-/issues/264 Maybe with GNOME 51?
New app: Censor (#264) · Issues · Teams / Circle · GitLab

App information App name: Censor Code repository page:

GitLab

@mahlzahn

Thank you so much for mentioning me ❤️

@flathub @Codeberg

@mahlzahn @Codeberg @flathub Does this have something to do with recent events?

@mahlzahn @Codeberg @flathub

Recommended by US executive 😜

@mahlzahn @Codeberg @flathub

Why not start with black pages and let users select the regions that show the document?
Would be much more convenient for Americans.

@m_berberich
Not only for them!

@fragdenstaat and @okfde have made a nice best-of with their limited art edition
https://000000.limited/

This year's “mask edition” and the amazing “philosophy edition” from 2021 are still available!

Redacted Art

#000000 is the limited art edition on freedom of information by FragDenStaat.

#000000
@mahlzahn Should've called it DOJ or FBI ;)

@mahlzahn
Has anyone tried that and tested a bit with different PDFs and techniques to store (and possibly retrieve) data?
From what I've learned about PDFs so far I think it will be quite difficult to implement censoring that works reliably on each and every PDF.

Does it rasterize PDFs by default or will it actually find and remove the elements to be censored?

@mahlzahn I just realized that you are the author, not just a third party posting about it, sorry ;)

So yeah, I'd be curious to learn more on how you implemented the censoring and maybe the tests you have done so far.

(For example, I'm thinking about stuff like form data which could be stored elsewhere, different layers, text in OCRed PDFs, real text, text converted to vectors, text in rasterized images, alpha masks... Not an expert here, but I have some ideas of what could possibly go wrong.)

@pdfarranger I think that your concerns are totally valid, and it's unfortunately far beyond my expertise to judge how well the redaction is done. It uses the MuPDF library, which does the magic. It is probably the only open-source library currently available for Linux that can redact PDF documents. Poppler can't do that yet (https://gitlab.freedesktop.org/poppler/poppler/-/issues/1186).

While searching for possibilities, I found out that redaction annotations have been part of the PDF specification since version 1.7: https://pdfa.org/why-the-need-to-redact-implies-using-pdf/

On the other hand, Artifex, the company behind MuPDF, seems to promote rasterization for high-security redactions  https://pdfa.org/presentation/high-security-pdf-redactions/

Redact (sanitize / censor / remove text) feature (#1186) · Issues · poppler / poppler · GitLab

Hi! This is the corresponding upstream/library feature request for what I suggested downstream in Evince: In business/organizational settings,...

GitLab

@mahlzahn I didn't know such a feature existed in an open source library! That is of course much better than implementing it just for one GUI tool.

But I am still curious about how well and how thoroughly MuPDF performs that task. Wish I had more time to poke at it myself...

@pdfarranger Me too, and experimenting with the feature already let me discover an edge case where vector graphics below redaction rectangles are not properly redacted (but this needs to be confirmed with the latest versions of MuPDF and PyMuPDF). https://codeberg.org/censor/Censor/issues/5
bug: redaction of vector graphics incomplete

## Description Paths of vector graphics below selected boxes may not be fully removed. ## Implementation Censor uses the default parameters for [applying the redaction](https://pymupdf.readthedocs.io/en/latest/page.html#Page.apply_redactions): `graphics=PDF_REDACT_LINE_ART_REMOVE_IF_TOUC...

Codeberg.org
@pdfarranger An update on this issue: I applied the wrong parameter (because of a mismatch between the upstream documentation and the source). It will be fixed in the upcoming 0.5.0 release. https://codeberg.org/censor/Censor/pulls/115 I am also experimenting with a less restrictive redaction of vector graphics, but I'd probably let the user enable this option. https://codeberg.org/censor/Censor/pulls/114 I'd appreciate your thoughts on that.
fix: redact vector graphics

Apply `pymupdf.PDF_REDACT_LINE_ART_REMOVE_IF_TOUCHED` during redaction application. It wrongly was not applied because of a [documentation issue](https://github.com/pymupdf/PyMuPDF/issues/4924). Closes #5

Codeberg.org
@pdfarranger Also, do you have a pointer to a set of complicated, diverse #PDF example files? My current test file is just very basic …
This one by pypdf already looks promising (at least for the diversity): https://github.com/py-pdf/sample-files. And https://github.com/pikepdf/pikepdf/tree/main/tests/resources by pikepdf, too!
GitHub - py-pdf/sample-files: Files which can be used to test PDF readers

Files which can be used to test PDF readers. Contribute to py-pdf/sample-files development by creating an account on GitHub.

GitHub

@mahlzahn I'm not involved in depth here either, but qpdf came to mind, see the two subdirectories for PDF files: https://github.com/qpdf/qpdf/tree/main/qpdf/qtest

I know that the author also has a "private collection" of buggy or interesting files sent by users, which cannot be published due to privacy concerns.
Here is an example of a rare issue which only popped up once in PDF Arranger and qpdf: https://github.com/qpdf/qpdf/issues/672

You could also look at the poppler or mupdf libraries, they are likely to have test files as well.

qpdf/qpdf/qtest at main · qpdf/qpdf

qpdf: A content-preserving PDF document transformer - qpdf/qpdf

GitHub

@mahlzahn I don't have in-depth knowledge of those topics, I'm afraid, so I can only give the general recommendation to err on the safe side as far as possible and keep defaults safe as well. Also, I'd try to keep complexity low; I remember that there have been issues of revealing seemingly redacted information due to otherwise harmless bugs. Example: https://www.malwarebytes.com/blog/news/2023/03/google-pixel-cropped-or-edited-images-can-be-recovered

The more I search the more I find:
https://arxiv.org/pdf/2206.02285v2

Google Pixel: Cropped or edited images can be recovered

A vulnerability in the Markup tool that comes pre-installed on Pixel phones allows anyone with access to the edited image to view parts of the original.

Malwarebytes
@mahlzahn Also I'm pretty sure that there was an information leak with embedded fonts where only the characters used in that font were embedded in the PDF but obviously not removed when redacting. Can't find any of this online at the moment unfortunately.