
Just print it to a PDF printer.
This feels like it should be a browser plugin that automatically anonymizes anything you download.
I feel like this will cause quality degradation, like repeatedly re-compressing a JPEG. Relevant xkcd: Digital Data
I feel like it would be negligible degradation for this purpose. Still might not anonymize whoever shares it, though; it could be watermarked with something like a Machine Identification Code (en.m.wikipedia.org/…/Machine_Identification_Code) without being noticeable to the naked eye

You can ask ChatGPT to spit out the LaTeX code
That’s not how PDF works at all.
See my reply to another comment
You’re still wrong. The only place where it could cause quality loss is if embedded bitmap images are re-compressed with lower quality settings (which you can adjust). PDF is a vector format, i.e. a mathematical description of what is to be rendered on screen. It was explicitly designed to be scalable, transmittable and rendered on a wide variety of devices without quality loss.

No point discussing this if neither of us is going to prove it one way or the other.

Bitmaps are actually a key part of what I was thinking about, so you agree with me there, it seems. There’s also the issue of using the wrong paper size. IIRC Windows usually defaults to Letter for printing even in places where A4 is the only common size and no one has heard of Letter, and most people don’t realise their prints are cropped/resized. This would still apply when printing to PDF.

My point is that all these things can be controlled in the settings of your PDF printer driver. So it’s not entirely straightforward but definitely doable.
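For what it’s worth, the same “re-render everything into a fresh PDF” idea can be done without a print driver at all. Here’s a minimal sketch using Ghostscript’s pdfwrite device, assuming gs is installed; the file names and chosen settings are illustrative, not what anyone in this thread actually ran:

```python
# Re-render a PDF into a brand-new PDF with Ghostscript, pinning paper size
# and keeping embedded images at high quality. Assumes `gs` is on PATH.
import subprocess

subprocess.run(
    [
        "gs",
        "-sDEVICE=pdfwrite",        # write a new PDF instead of printing
        "-dPDFSETTINGS=/prepress",  # highest built-in image-quality preset
        "-sPAPERSIZE=a4",           # pin the page size explicitly
        "-dFIXEDMEDIA",             # don't let the input override the size
        "-dPDFFitPage",             # scale content to the fixed page
        "-o", "rewritten.pdf",      # hypothetical output name
        "original.pdf",             # hypothetical input name
    ],
    check=True,
)
```

Same caveats as the printer-driver route: get the paper size and image settings wrong and you can still crop or recompress things.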
Why would it cause degradation? You’re not recompressing anything, you’re taking the visible content and writing it to a new PDF file.

You’re pushing it through one system that converts a PDF file into printer instructions, and then through another system that converts printer instructions into a PDF file. Each step probably has to make adjustments with the data it’s pushing through.

Without looking deeply into the systems involved, I have to assume it’s not a lossless process.

You should maybe look a bit more into it. How do you think commercial printers or even hobbyists maintain fidelity in their images? Most images pass through multiple programs during the printing process and still maintain the quality. It’s not just copy/paste.

They maintain high quality, but it’s not lossless.

As a trivial example, if you use the wrong paper size (like Letter instead of A4) then it might crop parts of the page or add borders or resize everything. Again I’ll admit, in 99% of cases it doesn’t matter, but it might matter if, say, an embedded picture was meant to be exactly to scale.

Lossless is the default for print output.
My friend, I worked in commercial printing for 2 decades. You’re still making assumptions that are wrong. There are ways to transfer files that are lossless and even ways to improve and upscale artwork. Why do you care so much about this?
“There are ways” ≠ this is what happens by default when done by the average user
Magnum PI over here hittin em up with the facts.

Those printer instructions are called PostScript and they’re the basis of PDF.

You are thinking that the printing process will rasterize the PDF and then essentially OCR/vector map it back. It’s (usually) not that complicated.

Unless of course you print everything and then scan it again, like this guy probably does.

I don’t understand the “that’s not how PDFs work” criticism.

Removing data from the original file is the whole point of the exercise! Of course unique tokens can be hidden in plain sight in images, letter spacing, etc. If we want to make sure to remove that, we need to degrade the quality of the PDF so that this information is lost in said lossy conversion.

Yea, academics need to just shut the publication system down. The more they keep pandering to it the more they look like fools.

I feel like most of academia on the research side would be happy to see it collapse, but the current system is too deeply tied to the money for any quick change

I worked in academia for almost a decade and never met a researcher who wouldn’t openly support sci-hub (well, some warned their students that it was illegal to type these specific search terms and click on the wrong link that downloads the pdf for free)

One lecturer actually had notes on their slides for the differences between the latest edition of the course book and the one before it, since the latest one wasn’t available for free anywhere but they wanted to use a couple of chapters from the new book (they scanned and distributed the relevant parts themselves)
So you’re saying the problem is capitalism…

Yep. But that is all a part of the problem. If academics can’t organise themselves enough to have some influence over something which is basically owned and run by them already (they write the papers, then review the papers, then are the ones reading and citing the papers and caring the most about their quality and popularity) … then they can’t be trusted to ensure the quality of their practice and institutions going forward, especially under the ever increasing encroachment of capitalistic forces.

Modern day academics are damn well lucky that they inherited a system and culture that developed some old aristocratic ideals into a set of conventions and practices!

Tbh they already do everything they can. If you ever need a paper, e-mail the author and they’ll most likely send you the “last version” before publication that they still hold the rights to distribute

It’s a chicken/egg or “you first” problem.

You’ve invested years in your work. You probably have loans. Your income is pitiful. And this is the structural thing that gets you out. Now someone says “hey, take a risk, don’t do it and break the system.”

Well…you first 🤷‍♂️

There are a couple things we can do:

  • decline to review for the big journals. Why give them free labor? Do academic service in other ways.
  • if you’re organizing a workshop or conference, put the papers online for free. If you’re just participating and not organizing, then suggest they put the papers online for free. Here’s an example: aclanthology.org. If that’s too time-consuming, use arxiv.org instead.

Fully agree, but I can tell you about point 1 that there are enough gullible scientists in the world who see nothing wrong with the current system.

They will gladly pick up free review work when Nature comes knocking, since it’s “such an honour” for such a reputable paper.

Such a reputable paper that’s no doubt accepted dozens of ChatGPT papers by now. Wow, how prestigious!
Something else we can do: regulate. Like every other corrupt industry in the history of this country, we need the force of law to fix it–and for pretty much all the same reasons. People worked at Triangle Shirtwaist because they had to, not because they thought it was a great place to work.
More like the only way to stay afloat, not just move up. Good luck getting grants without papers in these scum-of-the-Earth publishers’ journals
100%. People need to stop thinking big changes can be made “by individuals”; this kind of stuff needs regulation and state alternatives, or it’s impossible to break as an average worker.
Exactly. Asking some grad student to take on these ancient, corrupt publishing systems is ridiculous
Applied for a grant last month; now to finalize the grant you need to publish things in open access format. (EU country; there’s a push for all publicly funded research to be open access, with it being a requirement from year ??? on, not sure when, but soon.) There’s some special funding set aside just for open access fees, which is still rotten because these leeches still stand to profit. Then, if you miss that, there’s an agreement where my uni pays a selection of publishers to let in a certain number of articles per year open access, which is basically the same thing but with a different source of funding (not from the grant, but straight from the ministry)
Funding agencies have huge power here; demanding that research be published in OA journals is perhaps a good start (with limits on $ spent publishing, perhaps).

This is probably the avenue to shut this down. If funding is contingent on making the publication freely available to download, and that comes from a major government funding source, then this whole scam could die essentially overnight.

That would need to somehow get enough political support to pass muster in the first place and pass the inevitable legal challenge that follows, too. So, really, this is just another example of regulatory capture ruining everything.

I hear you, but this leaves a massive gaping hole that would very quickly be filled by predatory journals

The better solution would be journals created and maintained by universities or other institutions with national (or international, like from the EU) funding

I’m sympathetic, but to a limit.

There are a lot of academics out there with a good amount of clout who are relatively safe. I don’t think I’ve heard anything remotely worthwhile on these topics from any researcher with clout, publicly at least. Even privately (I used to be in academia), my feeling was most don’t even know how to think and talk about it, in large part because I don’t think they think and talk about it at all.

And that’s because most academics are frankly shit at thinking and engaging on collective and systemic issues. Many just do not want to, and instead want to embrace the whole “I live and work in an ideal ivory tower disconnected from society because what I do is bigger than society” thing. Many get their dopamine kicks from the publication system and don’t think about how that’s not a good thing. Seriously, they don’t deserve as much sympathy as you might think … academia can be a surprisingly childish place. That the publication system came to be at all is proof of that, frankly: they were all duped by someone feeding them ego-dopamine hits. It’s honestly kinda sad.

I’m sympathetic but to a limit

That’s all I’m saying 🤷‍♂️

As someone who’s not too familiar with the bureaucracy of academia I have to ask: Can’t the authors just upload all their studies to ResearchGate or some other website if they want? I know that they often share it privately with others when they request a paper, so can they post it publicly too?

Publishing comes with IP laws and copyright. For example, open access articles should be easy to upload without concern. “Private” articles being republished somewhere without license is “piracy”, and ResearchGate did get in trouble for it: “Publishers settle copyright infringement lawsuit with ResearchGate” (www.chemistryworld.com/news/…/4018095.article). It’s complicated.

Pre-prints are a different story.

That can easily be fixed at the source: as the author of the paper, you can just license it to be open if you want.
You’re risking copyright nastygrams, but people still do it, and even upload preprints and full articles to sci-hub, because fuck that, and it’s maybe free citations
The problems are wider than that. Besides, relying on “individuals just doing the right thing and going a little further to do so” is, IMO, a trap. Fix the system instead. The little thing everyone can do is think about the system and realise it needs fixing.
Imagine there must be a payoff for them? Wider distribution?
Nope, you just can’t get a job unless you suck it up and publish in these journals, because they’re already famous. And established profs use their cosy relationships with editors to gatekeep and stifle competition for their funding :(

I kind of assume this with any digital media. Games, music, ebooks, stock videos, whatever - embedding a tiny unique ID is very easy and can allow publishers to track down leakers/pirates.

Honestly, even though as a consumer I don’t like it, I don’t mind it that much. Doesn’t seem right to take the extreme position of “publishers should not be allowed to have ANY way of finding out who is leaking things”. There needs to be a balance.

Online phone-home DRM is a huge fuck no, but a benign little piece of metadata that doesn’t interact with anything and can’t be used to spy on me? Whatever, I can accept it.

Definitely better than some of the DRM-riddled proprietary eBook formats.
Plus, if you have two people with legit access, you can pretty easily figure out what’s going on and defeat it.
It would be pretty trivial for a script to automatically detect and delete tags like this, I would think. Diff two versions of the file and swap all diff characters to any non-display character.
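A minimal sketch of that idea, assuming you have two copies of the same paper downloaded through different accounts (the file names are hypothetical). It only reports where the bytes differ rather than overwriting them, since blindly swapping bytes in a PDF would break its cross-reference offsets; you’d re-render the file afterwards anyway:

```python
# Compare two downloads of the same paper byte by byte and list the offsets
# where they differ -- likely locations of a per-download fingerprint.
from itertools import zip_longest

def diff_offsets(path_a: str, path_b: str) -> list[int]:
    """Return byte offsets at which the two files differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    return [i for i, (x, y) in enumerate(zip_longest(a, b)) if x != y]

if __name__ == "__main__":
    offsets = diff_offsets("copy_account1.pdf", "copy_account2.pdf")
    print(f"{len(offsets)} differing byte(s), e.g. at offsets:")
    print(offsets[:50])
```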
I object because my public funds were used to pay for most of these papers. Publishers shouldn’t behave as if they own it.

That’s true. I was actually thinking/talking about this practice in general, not specifically with regards to Elsevier.

I definitely agree that scientific journals as they are today are unacceptable.

It can be used to spy on any decent scientist who shares papers that their institution has access to with a friend whose institution doesn’t. Much fun. As a reminder, publishers don’t pay reviewers, don’t pay for additional research, editing is typically minimal, and research is funded publicly, so what they own is the social capital of owning a big journal

It can be used to spy on any decent scientist who shares papers that their institution has access to with a friend whose institution doesn’t.

By “spy” I mean things like: know how many times I’ve read the PDF, when I’ve opened it, which parts of it I’ve read most, what program I used to open the PDF, how many copies of the PDF I’ve made, how many people I’ve emailed it to, etc. etc. etc.

This technique can do none of that. The only thing it can do is: if someone uploads the PDF to a mass sharing network, and an employee of the publisher downloads it from that mass sharing network and compares this metadata with the internal database, then they can see which of their users uploaded it and when they originally downloaded the PDF. It tells them nothing about how it got there. Maybe the original user shared it with 20 of their colleagues (a legitimate use of a downloaded PDF), and one of those colleagues uploaded that file to the mass sharing site without telling the original downloader. It doesn’t prove one way or the other. It’s an extremely small amount of information that’s only useful for catching systemic uploaders, e.g. a single user who has uploaded hundreds or thousands of PDFs that they downloaded from the publisher using the same account.
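To make that concrete, here’s a toy sketch of the only lookup this kind of fingerprinting supports (all names here are hypothetical, not any publisher’s actual system): the publisher learns which download a leaked file came from, and nothing about how it travelled afterwards.

```python
# Toy model of per-download fingerprinting: mint a unique ID per download,
# remember who downloaded it, and later map a recovered ID back to that user.
import uuid

download_log: dict[str, str] = {}   # fingerprint id -> account that downloaded

def issue_copy(account: str) -> str:
    """Mint a unique fingerprint for one download and record who received it."""
    fid = uuid.uuid4().hex
    download_log[fid] = account
    # ...the publisher would embed `fid` invisibly in the served PDF here...
    return fid

def trace_leak(fid_found_in_leaked_pdf: str) -> str | None:
    """Given a fingerprint recovered from a shared copy, name the downloader."""
    return download_log.get(fid_found_in_leaked_pdf)

fid = issue_copy("user_1234")
print(trace_leak(fid))   # -> "user_1234"; says nothing about later sharing
```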

And a savvy user can always strip that metadata out.
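For example, a minimal sketch of stripping the obvious metadata containers, assuming the pikepdf library is available (file names are illustrative). Note this wouldn’t touch anything hidden in the page content itself, like image noise or letter spacing:

```python
# Remove the XMP metadata stream and the classic document-info dictionary
# from a downloaded PDF, then save a cleaned copy.
import pikepdf

def strip_metadata(src: str, dst: str) -> None:
    with pikepdf.open(src) as pdf:
        # Drop the XMP metadata stream if present.
        if "/Metadata" in pdf.Root:
            del pdf.Root["/Metadata"]
        # Clear the document-info dictionary (/Title, /Author, custom keys...).
        for key in list(pdf.docinfo.keys()):
            del pdf.docinfo[key]
        pdf.save(dst)

strip_metadata("downloaded.pdf", "cleaned.pdf")
```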

As a reminder, …

All true, and fucked up, but it’s not related to what I was talking about. I was talking about the general use of this technique.

Doesn’t seem right to take the extreme position of “publishers should not be allowed to have ANY way of finding out who is leaking things”. There needs to be a balance.

Nah, fuck that; that’s both the opposite of an extreme position and exactly the one we should take!

Copyright itself is a privilege and only exists in the first place “to promote the progress of science and the useful arts.” Any entity that doesn’t respect that purpose doesn’t deserve to benefit from it at all.