Embedded compiler developer. I like debugging stories and computing history.
Posts are removed after a month. Nothing lasts forever.
And cats.
| Homepage | https://wozniak.ca |
| Journal | https://wozniak.ca/journal/ |
| [email protected] |
"If we stopped surveillance advertising, most modern internet companies would collapse"
Good! They're laying off humans at a record pace in order to juice their stock prices and further facilitate this bubble. Take 'em all out.
So, fuck 300 years of "naturalists" who believed nature is always red in tooth and claw or whatever, because I just saw a crow tell a blue jay to get a peanut for him and the jay did just that.
I'm out tossing peanuts to the jays, and there are some new-ish crows around who aren't quite comfortable with me, though one did stay on the roof a second or two longer than usual. But one crow clearly isn't ready for that.
The jays are a lot more brave, and I've sometimes worried they're hogging all the peanuts and the crows might stop coming. But here is the sequence of events:
1) I hear a crow cawing from the roof above me, very close.
2) I toss a couple of peanuts and land them at the edge of the roof without them rolling off.
3) Jay comes, as usual, and grabs the peanut, but flies to the fence across the parking lot and leaves the peanut there, where the crow sweeps down from on high to get hir nut.
100% this was intentional cooperation. The jay didn't drop the peanut by accident: after watching them for 6 months I know how they handle things with their beaks and errors do happen but rarely, and unless placed just right on the top of the fence, a peanut would bounce or roll off if dropped.
Anyway, that was kinda neat to see. Well done, all involved.
RE: https://infosec.exchange/@SecureOwl/116404712213309413
This thread is WILD. One more proof that data protection on the larger 'net is an absolute joke.
Why I refuse to use Machine Translation
In the last few years, there has been a lot of talk about how artificial intelligence (actually: commercial chatbots and LLMs) will be transforming our way of working – how it will make some jobs more efficient, and others obsolete. There are also concerns that such systems do not live up to the hype – though this has not stopped CEOs and their consultants from pushing them into the workplace, in the hopes of drastically reducing their work force and labor costs even though these systems cannot substitute for their workers’ process knowledge.
I translate old German folk tales into English, and translation work is already heavily automated these days due to the sheer amount of material that needs to be translated. Thus, it is unsurprising that many people have asked me whether I use machine translation for my work – usually with the assumption that this would save me time.
In this essay, I am going to tell you why I won’t use AI systems for my translation work. I could talk about the ethical concerns – how the work of others is used to train LLM systems without compensation while charging for their output, or how they consume massive amounts of electricity and other resources while our planet and its ecosystems are already on the precipice, or how they are used to build up the mother of all investment bubbles.
I could also add some personal grievances. For instance, in my day job as a bid manager, I also have to price server systems for our customers, and when I recently noticed that a simple 16 GB DDR5 RAM module had a purchase price of €1,600, I realized that something is going very wrong indeed. Furthermore, anonymous bot networks are constantly scraping my websites for LLM training data, forcing me to upgrade my website hosting plan twice last fall to keep outages at a tolerable level.
But since others have elaborated on the ethical concerns in much more detail than I ever could, I won’t be talking about these further. Instead, I will be discussing the practical reasons why machine translation does not fit into my working processes when translating German folk tales.
Reading the Fraktur Typeset
The first challenge for machine translation is parsing the source material. For copyright reasons, I exclusively use public domain works – German folk tale collections which were largely published in the 19th century. And the vast majority of these works were not printed with the modern Antiqua letters, but the old German Fraktur typeset. Here is a reasonably “clean” example of a story I have translated (the source page is here):
Usually, texts that are converted into a new language by machine translation are already in a machine-readable format – but these old digital scans are not. Thus, before I could use machine translations for these texts, I would need to convert them into a machine-readable format. While OCR (“Optical Character Recognition”) tools exist that can handle Fraktur typesets, the output would require additional effort for proofreading, especially since the input data is highly variable in its quality.
Thus, in contrast to the original premise, machine translation would actually increase my workload even before I got to the actual translation step.
Translating Old Words and Phrases
LLM systems are largely trained on the most commonly available modern texts (such as Reddit posts). 19th century German folk tales are not “modern texts”. They are rife with old words and phrases that were only used in some small geographical area and are no longer in modern use. Would a standard machine translation system (i.e., one trained on Reddit) come up with a decent translation for “Bindelbaum” – to pick just one example that stuck in my mind? Especially considering that the old texts that could provide some context were not in a machine-readable format, and thus of limited use for training the LLMs?
Perhaps it could, and perhaps it couldn’t. However, “maybe this is an accurate translation” is not good enough for my purposes, and indeed, it is not sufficient for any professional translator. If I provide a translation for certain old words and phrases, I need to be as sure as possible that this translation is accurate – and if I am uncertain, I need to explain that to my readers as well.
Thus, I would have to double-check every machine-translated text I work with against my own research – which, again, would not save me any time. And if I am doing all the research anyway, I might as well skip the machine translation and do it all by myself in the first place.
Providing Context
But truth be told, the actual translation is the easiest part of my work. German folk tales were told in a specific time and a specific cultural context. The original audience for these tales (mostly 19th century German peasants) was deeply familiar with this context.
A modern audience will usually not be familiar with this context. Many aspects of these folk tales are hard to grasp even for modern Germans – so what chance does an international audience have?
This is why one of my most important tasks as a translator is to explain this context. This is why my books have many hundreds of footnotes, and explanatory commentary following each tale. While I am not primarily writing my books as scientific treatises, I have spent enough years in academia that I have views on providing inaccurate information. Sure, mistakes can and will happen. But allowing errors to proliferate in my manuscripts because I was outsourcing the most critical aspects of my research to LLM systems would be a gross violation of ethical standards (not that this seems to stop a lot of LLM users…).
So I will do my research the proper way. And with each paragraph I translate, I contemplate its hidden meanings and context, and how to convey it to my readers. But if I don’t do the first step of the work myself – that is, translating and thinking about every single sentence – then I have already lost my first opportunity to truly understand the story.
Preserving Unique Voices
German folk tales were told by tens of thousands of people, each of whom had their own unique way of telling their stories. And later on, they were collected by hundreds of folklore researchers, each of whom had their own unique editorial approach. That adds up to a lot of unique voices.
However, LLMs are well known to generate texts that trend towards the average. They have been trained on vast archives of human-written texts, and their task is to create texts that are “most likely” to fit the prompt – the common denominator, if you will. Worse, it will be the common denominator of Reddit users and the like. The only LLM system that might come even close to capturing the unique voices of the original texts would be one that has been trained exclusively on their translations – including my translations.
While I want people to be entertained by my translations, these tales are also part of my country’s cultural heritage. Not even trying to capture the unique voices of these long-ago storytellers and instead replacing them with the generic output of LLMs feels hugely disrespectful.
They deserve better, and my audience deserves better as well.
#LLM #MachineTranslation #Translation
There's a lot of legislation going around "for the children" right now that will have the impact of making us all less safe and free, making computers more difficult to use, and generally making life worse for everyone.
Lots of folks are talking about this as if it is the only intended outcome of these laws, but it isn't.
Part of these laws are also about children.
About controlling what kids can read, who they can talk to, what they can watch, and how they can interact with one another, making it harder for kids to use digital resources to learn about themselves and the world while at the same time making it easier for abusive and controlling parents to abuse and control their kids.
I found it - and, of course, it's not by a woman at all, and not related to a book. My brain is definitely *not* a computer.
https://aeon.co/essays/your-brain-does-not-process-information-and-it-is-not-a-computer
i guess fundamentally i'm not convinced that whatever amount of useful functionality has emerged from generative AI (and i think the jury's still out on whether that amount is non-zero) couldn't have been achieved in more cost-effective, equitable and sustainable ways by taking the money and resources we've shoveled into gen AI and investing them elsewhere
("sustainable" in the sense of environmentally sustainable but also "can we keep this in a working state" sustainable)
Reminder: if you support us, please sign and share if you can! ✍🏻
> Open Letter: Stop the Uncritical Adoption of AI Technologies in Academia
We're coming up to a year since I sent this letter to my executive board and I will resend it on or close to the year mark with even more international and national support against AI.
Thank you!