
Just print it to a PDF printer.
This feels like it should be a browser plugin that automatically anonymizes anything you download.
I feel like this will cause quality degradation, like repeatedly re-compressing a JPEG. Relevant xkcd: Digital Data
I feel like it would be negligible degradation for this purpose. Still might not anonymize whoever shares it, though; it could be watermarked with something like a Machine Identification Code (en.m.wikipedia.org/…/Machine_Identification_Code) without being noticeable to the naked eye

You can ask ChatGPT to spit out the LaTeX code
That’s not how PDF works at all.
See my reply to another comment
You’re still wrong. The only place where it could cause quality loss is if embedded bitmap images are re-compressed with lower quality settings (which you can adjust). PDF is a vector format, i.e. a mathematical description of what is to be rendered on screen. It was explicitly designed to be scalable, transmittable and rendered on a wide variety of devices without quality loss.

No point discussing this if neither of us is going to prove it one way or the other.

Bitmaps are actually a key part of what I was thinking about, so you agree with me there, it seems. There’s also the issue of using the wrong paper size. IIRC Windows usually defaults to Letter for printing even in places where A4 is the only common size and no one has heard of Letter, and most people don’t realise their prints are cropped/resized. This would still apply when printing to PDF.

My point is that all these things can be controlled in the settings of your PDF printer driver. So it’s not entirely straightforward but definitely doable.
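For what it’s worth, the same “re-render everything into a fresh PDF” idea can be done without a print driver at all. Here’s a minimal sketch using Ghostscript’s pdfwrite device, assuming gs is installed; the file names and chosen settings are illustrative, not what anyone in this thread actually ran:

```python
# Re-render a PDF into a brand-new PDF with Ghostscript, pinning paper size
# and keeping embedded images at high quality. Assumes `gs` is on PATH.
import subprocess

subprocess.run(
    [
        "gs",
        "-sDEVICE=pdfwrite",        # write a new PDF instead of printing
        "-dPDFSETTINGS=/prepress",  # highest built-in image-quality preset
        "-sPAPERSIZE=a4",           # pin the page size explicitly
        "-dFIXEDMEDIA",             # don't let the input override the size
        "-dPDFFitPage",             # scale content to the fixed page
        "-o", "rewritten.pdf",      # hypothetical output name
        "original.pdf",             # hypothetical input name
    ],
    check=True,
)
```

Same caveats as the printer-driver route: get the paper size and image settings wrong and you can still crop or recompress things.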
Why would it cause degradation? You’re not recompressing anything, you’re taking the visible content and writing it to a new PDF file.

You’re pushing it through one system that converts a PDF file into printer instructions, and then through another system that converts printer instructions into a PDF file. Each step probably has to make adjustments with the data it’s pushing through.

Without looking deeply into the systems involved, I have to assume it’s not a lossless process.

You should maybe look a bit more into it. How do you think commercial printers or even hobbyists maintain fidelity in their images? Most images pass through multiple programs during the printing process and still maintain the quality. It’s not just copy/paste.

They maintain high quality, but it’s not lossless.

As a trivial example, if you use the wrong paper size (like Letter instead of A4) then it might crop parts of the page or add borders or resize everything. Again I’ll admit, in 99% of cases it doesn’t matter, but it might matter if, say, an embedded picture was meant to be exactly to scale.

Lossless is the default for print output.
My friend, I worked in commercial printing for 2 decades. You’re still making assumptions that are wrong. There are ways to transfer files that are lossless and even ways to improve and upscale artwork. Why do you care so much about this?
“There are ways” ≠ this is what happens by default when done by the average user
Magnum PI over here hittin em up with the facts.

Those printer instructions are called PostScript and they’re the basis of PDF.

You are thinking that the printing process will rasterize the PDF and then essentially OCR/vector map it back. It’s (usually) not that complicated.

Unless of course you print everything and then scan it again, like this guy probably does.

I don’t understand the “that’s not how PDFs work” criticism.

Removing data from the original file is the whole point of the exercise! Of course unique tokens can be hidden in plain sight in images, letter spacing, etc. If we want to make sure to remove that, we need to degrade the quality of the PDF so that this information is lost in said lossy conversion.

Yea, academics need to just shut the publication system down. The more they keep pandering to it the more they look like fools.

I feel like most of academia on the research side would be happy to see it collapse, but the current system is too deeply tied to the money for any quick change

I worked in academia for almost a decade and never met a researcher who wouldn’t openly support sci-hub (well, some warned their students that it was illegal to type these specific search terms and click on the wrong link that downloads the pdf for free)

One lecturer actually had notes on their slides for the differences between the latest edition of the course book and the one before it, since the latest one wasn’t available for free anywhere but they wanted to use a couple of chapters from the new book (they scanned and distributed the relevant parts themselves)
So you’re saying the problem is capitalism…

Yep. But that is all a part of the problem. If academics can’t organise themselves enough to have some influence over something which is basically owned and run by them already (they write the papers, then review the papers, then are the ones reading and citing the papers and caring the most about their quality and popularity) … then they can’t be trusted to ensure the quality of their practice and institutions going forward, especially under the ever increasing encroachment of capitalistic forces.

Modern day academics are damn well lucky that they inherited a system and culture that developed some old aristocratic ideals into a set of conventions and practices!

Tbh they already do everything they can. If you ever need a paper, e-mail the author and they’ll most likely send you the “last version” before publication that they still hold the rights to distribute

It’s a chicken/egg or “you first” problem.

You’ve invested years in your work. You probably have loans. Your income is pitiful. And this is the structural thing that gets you out. Now someone says “hey, take a risk, don’t do it and break the system.”

Well…you first 🤷‍♂️

There are a couple things we can do:

  • decline to review for the big journals. Why give them free labor? Do academic service in other ways.
  • if you’re organizing a workshop or conference, put the papers online for free. If you’re just participating and not organizing, then suggest they put the papers online for free. Here’s an example: aclanthology.org. If that’s too time-consuming, use arxiv.org instead.

Fully agree, but I can tell you about point 1 that there are enough gullible scientists in the world who see nothing wrong with the current system.

They will gladly pick up free review work when Nature comes knocking, since it’s “such an honour” for such a reputable paper.

Such a reputable paper that’s no doubt accepted dozens of ChatGPT papers by now. Wow, how prestigious!
Something else we can do: regulate. Like every other corrupt industry in the history of this country, we need the force of law to fix it–and for pretty much all the same reasons. People worked at Triangle Shirtwaist because they had to, not because they thought it was a great place to work.
More like the only way to stay afloat, not just move up. Good luck getting grants without papers in these scum-of-the-Earth publishers’ journals
100%. People need to stop thinking big changes can be made “by individuals”; this kind of stuff needs regulation and state alternatives, or it’s impossible to break as an average worker.
Exactly. Asking some grad student to take on these ancient, corrupt publishing systems is ridiculous
Applied for a grant last month; now to finalize the grant you need to publish things in open access format. (EU country; there’s a push for all publicly funded research to be open access, with it being a requirement from year ??? on, not sure when, but soon.) There’s some special funding set aside just for open access fees, which is still rotten because these leeches still stand to profit. Then, if you miss that, there’s an agreement where my uni pays a selection of publishers to let in a certain number of articles per year open access, which is basically the same thing but with a different source of funding (not from the grant, but straight from the ministry)
Funding agencies have huge power here; demanding that research be published in OA journals is perhaps a good start (with limits on $ spent publishing, perhaps).

This is probably the avenue to shut this down. If funding is contingent on making the publication freely available to download, and that comes from a major government funding source, then this whole scam could die essentially overnight.

That would need to somehow get enough political support to pass muster in the first place and pass the inevitable legal challenge that follows, too. So, really, this is just another example of regulatory capture ruining everything.

I hear you, but this leaves a massive gaping hole that would very quickly be filled by predatory journals

The better solution would be journals created and maintained by universities or other institutions with national (or international, like from the EU) funding

I’m sympathetic, but to a limit.

There are a lot of academics out there with a good amount of clout who are relatively safe. I don’t think I’ve heard anything remotely worthwhile on these topics from any researcher with clout, publicly at least. Even privately (I used to be in academia), my feeling was most don’t even know how to think and talk about it, in large part because I don’t think they think and talk about it at all.

And that’s because most academics are frankly shit at thinking and engaging on collective and systemic issues. Many just do not want to, and instead want to embrace the whole “I live and work in an ideal ivory tower disconnected from society because what I do is bigger than society” thing. Many get their dopamine kicks from the publication system and don’t think about how that’s not a good thing. Seriously, they don’t deserve as much sympathy as you might think … academia can be a surprisingly childish place. That the publication system came to be at all is proof of that, frankly: they were all duped by someone feeding them ego-dopamine hits. It’s honestly kinda sad.

I’m sympathetic but to a limit

That’s all I’m saying 🤷‍♂️

As someone who’s not too familiar with the bureaucracy of academia I have to ask: Can’t the authors just upload all their studies to ResearchGate or some other website if they want? I know that they often share it privately with others when they request a paper, so can they post it publicly too?

Publishing comes with IP laws and copyright. For example, open access articles should be easy to upload without concern. “Private” articles being republished somewhere without license is “piracy”, and ResearchGate did get in trouble for it: “Publishers settle copyright infringement lawsuit with ResearchGate” (www.chemistryworld.com/news/…/4018095.article). It’s complicated.

Pre-prints are a different story.

That can easily be fixed at the source: as the author of the paper, you can just license it to be open if you want.
You’re risking copyright nastygrams, but people still do it, and even upload preprints and full articles to sci-hub, because fuck that, and it’s maybe free citations
The problems are wider than that. Besides, relying on “individuals just doing the right thing and going a little further to do so” is, IMO, a trap. Fix the system instead. The little thing everyone can do is think about the system and realise it needs fixing.
Imagine there must be a payoff for them? Wider distribution?
Nope, you just can’t get a job unless you suck it up and publish in these journals, because they’re already famous. And established profs use their cosy relationships with editors to gatekeep and stifle competition for their funding :(

I kind of assume this with any digital media. Games, music, ebooks, stock videos, whatever - embedding a tiny unique ID is very easy and can allow publishers to track down leakers/pirates.

Honestly, even though as a consumer I don’t like it, I don’t mind it that much. Doesn’t seem right to take the extreme position of “publishers should not be allowed to have ANY way of finding out who is leaking things”. There needs to be a balance.

Online phone-home DRM is a huge fuck no, but a benign little piece of metadata that doesn’t interact with anything and can’t be used to spy on me? Whatever, I can accept it.

Definitely better than some of the DRM-riddled proprietary eBook formats.
Plus, if you have two people with legit access, you can pretty easily figure out what’s going on and defeat it.
It would be pretty trivial for a script to automatically detect and delete tags like this, I would think. Diff two versions of the file and swap all diff characters to any non-display character.
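A minimal sketch of that idea, assuming you have two copies of the same paper downloaded through different accounts (the file names are hypothetical). It only reports where the bytes differ rather than overwriting them, since blindly swapping bytes in a PDF would break its cross-reference offsets; you’d re-render the file afterwards anyway:

```python
# Compare two downloads of the same paper byte by byte and list the offsets
# where they differ -- likely locations of a per-download fingerprint.
from itertools import zip_longest

def diff_offsets(path_a: str, path_b: str) -> list[int]:
    """Return byte offsets at which the two files differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    return [i for i, (x, y) in enumerate(zip_longest(a, b)) if x != y]

if __name__ == "__main__":
    offsets = diff_offsets("copy_account1.pdf", "copy_account2.pdf")
    print(f"{len(offsets)} differing byte(s), e.g. at offsets:")
    print(offsets[:50])
```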
I object because my public funds were used to pay for most of these papers. Publishers shouldn’t behave as if they own it.

That’s true. I was actually thinking/talking about this practice in general, not specifically with regards to Elsevier.

I definitely agree that scientific journals as they are today are unacceptable.

It can be used to spy on any decent scientist who shares papers that their institution has access to with a friend whose institution doesn’t. Much fun. As a reminder, publishers don’t pay reviewers, don’t pay for additional research, editing is typically minimal, and research is funded publicly, so what they own is the social capital of owning a big journal

It can be used to spy on any decent scientist who shares papers that their institution has access to with a friend whose institution doesn’t.

By “spy” I mean things like: know how many times I’ve read the PDF, when I’ve opened it, which parts of it I’ve read most, what program I used to open the PDF, how many copies of the PDF I’ve made, how many people I’ve emailed it to, etc. etc. etc.

This technique can do none of that. The only thing it can do is: if someone uploads the PDF to a mass sharing network, and an employee of the publisher downloads it from that mass sharing network and compares this metadata with the internal database, then they can see which of their users uploaded it and when they originally downloaded the PDF. It tells them nothing about how it got there. Maybe the original user shared it with 20 of their colleagues (a legitimate use of a downloaded PDF), and one of those colleagues uploaded that file to the mass sharing site without telling the original downloader. It doesn’t prove one way or the other. It’s an extremely small amount of information that’s only useful for catching systemic uploaders, e.g. a single user who has uploaded hundreds or thousands of PDFs that they downloaded from the publisher using the same account.
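To make that concrete, here’s a toy sketch of the only lookup this kind of fingerprinting supports (all names here are hypothetical, not any publisher’s actual system): the publisher learns which download a leaked file came from, and nothing about how it travelled afterwards.

```python
# Toy model of per-download fingerprinting: mint a unique ID per download,
# remember who downloaded it, and later map a recovered ID back to that user.
import uuid

download_log: dict[str, str] = {}   # fingerprint id -> account that downloaded

def issue_copy(account: str) -> str:
    """Mint a unique fingerprint for one download and record who received it."""
    fid = uuid.uuid4().hex
    download_log[fid] = account
    # ...the publisher would embed `fid` invisibly in the served PDF here...
    return fid

def trace_leak(fid_found_in_leaked_pdf: str) -> str | None:
    """Given a fingerprint recovered from a shared copy, name the downloader."""
    return download_log.get(fid_found_in_leaked_pdf)

fid = issue_copy("user_1234")
print(trace_leak(fid))   # -> "user_1234"; says nothing about later sharing
```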

And a savvy user can always strip that metadata out.
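For example, a minimal sketch of stripping the obvious metadata containers, assuming the pikepdf library is available (file names are illustrative). Note this wouldn’t touch anything hidden in the page content itself, like image noise or letter spacing:

```python
# Remove the XMP metadata stream and the classic document-info dictionary
# from a downloaded PDF, then save a cleaned copy.
import pikepdf

def strip_metadata(src: str, dst: str) -> None:
    with pikepdf.open(src) as pdf:
        # Drop the XMP metadata stream if present.
        if "/Metadata" in pdf.Root:
            del pdf.Root["/Metadata"]
        # Clear the document-info dictionary (/Title, /Author, custom keys...).
        for key in list(pdf.docinfo.keys()):
            del pdf.docinfo[key]
        pdf.save(dst)

strip_metadata("downloaded.pdf", "cleaned.pdf")
```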

As a reminder, …

All true, and fucked up, but it’s not related to what I was talking about. I was talking about the general use of this technique.

Doesn’t seem right to take the extreme position of “publishers should not be allowed to have ANY way of finding out who is leaking things”. There needs to be a balance.

Nah, fuck that; that’s both the opposite of an extreme position and exactly the one we should take!

Copyright itself is a privilege and only exists in the first place “to promote the progress of science and the useful arts.” Any entity that doesn’t respect that purpose doesn’t deserve to benefit from it at all.