Would you be interested in uploading books to ProleWiki if we opened library editing without an account?

https://lemmygrad.ml/post/10942166


Just gauging interest for now. It’s a question we’ve thrown around internally before but never settled. As a reminder, ‘anons’ (people without an account) can edit wiki pages like on Wikipedia, but their edits go into a moderation queue that the stewardship checks and approves or rejects accordingly; they don’t instantly appear. On the one hand, more books. On the other hand, there is a small process to it (adding the infobox and proper categorization), and the big part imo is that we have to trust that people will upload the book exactly as it is in the source, which I’m not sure we can even verify (and therefore won’t be able to). It’s easy to add a vandalizing sentence in the middle of a chapter. So, would you be interested in uploading books to PW if it was available?

Suggestion for a weekly essay discussion thread.

https://lemmygrad.ml/post/10806217

ProleWiki gets a shout-out on ProlesPod

https://lemmygrad.ml/post/10398236

ProleWiki RAG MCP vs WSWS' (trots) Socialism AI

https://lemmygrad.ml/post/10103797


How many keywords can you stuff in a title, right? I’m posting this in the prolewiki community because we’ll be discussing ProleWiki’s own in-development RAG for LLMs, but first: you probably saw that the WSWS, i.e. the trots, published ‘Socialism AI’. In their press release [https://www.wsws.org/en/articles/2025/12/12/gpid-d12.html], they basically congratulate themselves about how cool this is for the workers’ movement and socialism, great victory this and great victory that, blahblahblah. You know how trots are.

Their system is usable through ai.wsws.org [http://ai.wsws.org] or something iirc. It’s a web interface, so yes, it’s cool that it comes as a package you can run from any device without fiddling, but there are also a lot of problems with it, especially coming from self-proclaimed communists. Though with how much of a joke trots are to everyone, I feel like I’m not really throwing oil on the fire with this post lol.

We looked into how their system works, because they give absolutely zero indication of the technical implementation, and found several copyright notices in the Terms of Service. They say that the output from their AI belongs to them, for example. Courts in the US have found that LLM output is public domain, but sure, I guess; not really my area of expertise. We’ll get into it.

## Understanding what WSWS did

* WSWS did not train a model from the ground up.
* WSWS did not fine-tune an existing open-source model.
* WSWS is not running and hosting their own model.

What WSWS does (and you can find this out from just using browser tools, i.e. F12 on their homepage) is use the ChatGPT and DeepSeek APIs. Their pipeline looks like this (as far as we can ascertain from simple browser tools):

You send your prompt -> they add their own instructions to it -> LLM fetches WSWS blog articles to answer your prompt -> LLM reads blog articles -> LLM answers your prompt with the WSWS blog articles as sources.
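To make the pipeline concrete, here is a minimal sketch of it in Python. Everything in it is a stand-in: the instructions, the two ‘articles’, and the keyword-overlap retrieval are hypothetical placeholders for whatever the real service does through the OpenAI/DeepSeek APIs (which they haven’t documented).

```python
# Minimal sketch of a retrieval-augmented pipeline. All content here is
# made up for illustration; the real system's instructions and retrieval
# method are not public.

SYSTEM_INSTRUCTIONS = "Answer using only the provided articles as sources."

ARTICLES = {
    "october-revolution": "The October Revolution took place in 1917.",
    "paris-commune": "The Paris Commune of 1871 was the first workers state.",
}

STOPWORDS = {"the", "was", "when", "a", "of", "in", "what"}

def retrieve(query: str, corpus: dict) -> list:
    """Naive retrieval: return articles sharing a content word with the query."""
    words = set(query.lower().replace("?", "").split()) - STOPWORDS
    return [text for text in corpus.values()
            if words & set(text.lower().rstrip(".").split())]

def build_prompt(query: str, corpus: dict) -> str:
    """Middle steps of the pipeline: prepend instructions and retrieved context."""
    context = "\n\n".join(retrieve(query, corpus))
    return f"{SYSTEM_INSTRUCTIONS}\n\nContext:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("When was the October Revolution?", ARTICLES)
```

The final prompt (instructions + retrieved articles + question) is what gets sent to the hosted LLM, which is also why your question never stays on your machine.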
This is what we call RAG, or Retrieval-Augmented Generation. The technique is legit, I’m not disputing that; it’s just that the way they did it is both inefficient and concerning.

## The problems I have with that way of doing things

We’ll get into the technical problems when I detail what the ProleWiki MCP will look like.

Firstly, there is no privacy when using this service. Mind you, I did not create an account (too much hassle if I want to retain my privacy on it), but you have to understand your prompt and the LLM output transit through OpenAI and DeepSeek. It goes straight to the feds with OAI.

Secondly, they sell paid tiers, starting at $5 per month for 150 messages, which is… absolutely nothing.

Thirdly, everything is closed off. It’s very closed-source and obfuscated: they did not release any documentation on how this works or how you could run it yourself.

Selling paid tiers is not a problem in itself, at least for me personally. You have to break even, and they do pay for API access to OpenAI and DeepSeek (though DeepSeek is very cheap). The problem I have is that they should at least offer an open-source implementation for people who know how to use it, or at the very least make the RAG files available. This is not the case.

I’m also a proponent of paying it forward. Yes, this costs them money, but they could find a way to break even in ways that don’t consist of just selling another SaaS (software-as-a-service). Let people pay it forward for others or something. Accept that you will lose some money on running this and cover it with dues, or with people in the party who have money and don’t mind maintaining the service. Accept donations. There are lots of ways to do this that are not so commercial, i.e. “if you can’t pay you must vacate the premises”.

## The technical implementation: ProleWiki MCP vs. Socialism AI

A few months ago we started working with a dev who was making the Marxists Internet Archive available for RAG use.
This project evolved, and they are now making a ProleWiki MCP with the pages we sent them. It’ll still be RAG, but more efficient.

So first, let’s look at how the Socialism AI RAG works. If you remember the pipeline:

You send your prompt -> they add their own instructions to it -> LLM fetches WSWS blog articles to answer your prompt (<-- we are here) -> LLM reads blog articles -> LLM answers your prompt with the WSWS blog articles as sources.

The problem we’ve found is what kind of data exactly the LLM gets access to. Imagine it like a bin the LLM can sift through to make an answer. If you provide it with the link to the page, it parses that as HTML code, with all its tags, headers, script calls, etc. Imagine me giving you a page full of HTML code and asking you “can you answer when Lenin was born from this info?” You can, but it’s gonna take a while, and a lot of it is simply unnecessary. And you only have this one page to make an answer from. If Lenin’s DOB is not neatly written on it, you have to do extra thinking to put it together. This is the context window: the LLM simply won’t look through 250k WSWS articles, it has to pick and choose which articles are more likely to help answer the question.

Therefore we can optimize this bin. Instead of giving you full pages to pick from, we can give you individual lines. In our RAG for ProleWiki, what our dev did was extract every line from our pages on the principle of 1 line = 1 idea. Then it puts these ideas together in a matrix and sorts them by semantic closeness. What this means is that as the LLM, you don’t get a full page on the October Revolution or Lenin [https://en.prolewiki.org/wiki/Vladimir_Lenin] to answer a question with. You can see our page on Lenin is quite lengthy, and if you asked a question whose answer is not on this page (for example, you can see the self-exile section is empty), an LLM that pulled the whole page to look at before answering might not answer your question as best it could.
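The ‘1 line = 1 idea’ ranking can be sketched like this. A real system would use a learned embedding model to build the semantic matrix; here a plain bag-of-words vector and cosine similarity stand in so the ranking logic is visible, and the corpus lines and page titles are made up.

```python
# Toy line-level retrieval: rank individual lines, not whole pages, by
# semantic closeness to the query. Real embeddings are swapped out for
# simple term counts.
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': bag-of-words term counts."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Lines pulled from different pages all go into one shared 'bin'.
CORPUS = [
    ("Vladimir Lenin", "Lenin was born on 22 April 1870."),
    ("Vladimir Lenin", "Lenin led the Bolshevik party."),
    ("USSR", "During his self-exile Lenin lived in Switzerland."),
]

def top_lines(query, k=2):
    """Return the k closest (page, line) pairs to the query."""
    q = embed(query)
    return sorted(CORPUS, key=lambda item: cosine(q, embed(item[1])),
                  reverse=True)[:k]
```

Note that a question about Lenin’s self-exile pulls the relevant line from the USSR page, not just whatever happens to be on Lenin’s own page — that’s the whole point of ranking lines instead of pages.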
With the semantic matrix, instead of picking from pages, it picks from lines to make a coherent answer. Instead of looking at just Lenin’s page and filling its entire context window with it, it looks at semantic information relating to Lenin’s self-exile on ProleWiki - or on other sources you add to the corpus, the ‘bin’ - and then makes an answer from this. This means if we have information about Lenin’s self-exile on, say, the USSR page (because why not!), it will pull exactly that thread from that page. And this is much more powerful than what the WSWS did, and why they offer such measly usage rates: they are filling up the context window and sending noise tokens, because they’re giving an entire <!DOCTYPE HTML><head><meta-name>… HTML page instead of just the relevant content.

## But where does the MCP come in?

MCPs are kinda new and were made for AI to work with. I wouldn’t be the best person to explain them, but basically an MCP lets an LLM look at some data (website, files, etc.) and work with that data in some way. They’re mostly used in agentic work: tools are exposed to the LLM, such as “view file” or “edit file”, so it can perform these operations itself instead of having you do them and then confirm. So if you have an agent (such as crush [https://github.com/charmbracelet/crush], our favorite here on lemmygrad), an LLM can and will view and edit the files you tell it to. Those are two example tools.

With an MCP, you give the LLM access to data it can read, and you can also give it its own tools. You could make a tool called “prolewiki-fetch”. When the LLM decides to use this tool, it communicates with the ProleWiki MCP you have installed locally, saying “okay, let’s use the prolewiki-fetch tool to look at data from prolewiki to answer this question”. Then the MCP does its magic and sends the information back to the LLM. And not only that, but as we said, you can also run this locally.
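Here’s a toy sketch of what exposing a “prolewiki-fetch” tool could look like. The tool name comes from the idea above; the schema, corpus, and dispatch function are illustrative only, not the actual MCP wire protocol or any real SDK.

```python
# Hypothetical tool exposure: a server advertises a tool description, and
# dispatches calls the LLM decides to make. Not the real MCP protocol.

# Tool description an MCP-style server would advertise to the LLM client.
PROLEWIKI_FETCH = {
    "name": "prolewiki-fetch",
    "description": "Fetch ProleWiki content relevant to a query.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

# Stand-in for the local corpus the MCP would serve.
PAGES = {"Vladimir Lenin": "Lenin was a Russian revolutionary and theorist."}

def handle_tool_call(name, arguments):
    """Dispatch a tool call from the LLM and return its result."""
    if name != PROLEWIKI_FETCH["name"]:
        return {"error": f"unknown tool: {name}"}
    query = arguments["query"].lower()
    hits = {title: text for title, text in PAGES.items()
            if query in title.lower() or query in text.lower()}
    return {"results": hits}
```

The key design point is that the LLM only ever sees the advertised tool description and the returned results; the fetching itself happens locally on your machine.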
We are still figuring out how we’ll package all of this, but most likely we’ll make the source files available so that anyone can build any RAG or make their own cloud web interface if they want. Likewise for the MCP: it will be downloadable with our source files, so that you can just add it to your agent interface and start using it to query the LLM and get answers with prolewiki content.

Communism is not in a position of strength currently, so I don’t see any reason we should be trying to hide and obfuscate any of our content. On the contrary, proletarian education demands it be accessible without discrimination. Unlike trots, we trust the people to make the right decisions collectively - if someone wants to use ProleWiki content to train a model and paywall that, let them. There will be 10 more who won’t.

In fact, speaking of models, our dev is also working on something there… but I was asked not to say too much about it as it’s very experimental 🤐

Neobrutalist prolewiki idk

https://lemmygrad.ml/post/9982113

Lmao, the NazBols are livid! Keep doing the good work, all the editors and moderators.

https://lemmygrad.ml/post/9957238

Is there a pdf or epub version of The CIA's Shining Path?

https://lemmygrad.ml/post/9943225


Like the title says. I’d like to be able to read it offline.

Notice to francophones: Join us on ProleWiki!

https://lemmygrad.ml/post/9797629


Hello everyone! A little message in French to tell you that we’ve translated ProleWiki into French from the English instance, and we’re now looking for editors! There’s still quite a bit of work left to finish integrating the new pages, and I’ve put together a little guide explaining where you can help us with your contributions: https://fr.prolewiki.org/wiki/Essai:Comment_aider_sur_ProleWiki_(français) [https://fr.prolewiki.org/wiki/Essai:Comment_aider_sur_ProleWiki_(fran%C3%A7ais)] (which I will certainly keep filling in and try to simplify). Don’t hesitate to share this widely; we’re really looking to bring the instance to life and make it self-sufficient. And I hope to see you on ProleWiki!

EN ProleWiki has been translated and uploaded to French ProleWiki

https://lemmygrad.ml/post/9775149


Obviously this is still in what could be considered “late beta”, but the pipeline was a huge success. https://fr.prolewiki.org/ [https://fr.prolewiki.org/]

The translation quality is honestly very good; we picked the right model and prompt for this. This got us, I would say, 75-80% of the way there. The remaining percentage points are busywork that you won’t escape, or at least I don’t know how to automate it… Think of it this way: ProleWiki EN has 5 years of organic content written over time, with links and page redirects being made along the way. We are starting from 0.

So, currently, most pages have redlinks (here’s a benchmark one: https://fr.prolewiki.org/wiki/Corée [https://fr.prolewiki.org/wiki/Cor%C3%A9e]) because the redirects are not created. The pages exist; it’s just that the links should go to, say, “Kim Il Sung” instead of “Kim Il-Sung”. Normally you’d create a redirect like Wikipedia does, i.e. Kim Il-Sung takes you to Kim Il Sung. But we don’t have that history, so we have to create the redirects ourselves. We could have exported them from the English instance, but I decided against it because it would probably have been a bigger headache. Same for the templates; we’re going to run them through Deepseek as needed.

Aside from that, we focused on getting the triad of homepages (Home/Library/Essays) cleaned up and ready to go. Here’s the essays page for example: https://fr.prolewiki.org/wiki/ProleWiki:Essays [https://fr.prolewiki.org/wiki/ProleWiki:Essays]

I’m hopeful that with this out of the way we will get new editors, and even anonymous editors interested in participating (tomorrow I think I will open up anonymous editing on the French instance to every namespace). It’ll take some time to finish cleaning everything up, and tbh even the English instance isn’t completely pristine. I saw some pages that I didn’t even know existed and that were clearly test pages from 2020 lol. Obviously fixing these red links is not going to happen overnight; we’re in for the long haul. But we got 80% of the way in a week.
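For anyone curious what creating those redirects amounts to, here’s a small sketch. `#REDIRECT [[Target]]` is standard MediaWiki syntax; the alias list is hypothetical, and the actual upload step (via the MediaWiki API) is left out.

```python
# Sketch of bulk redirect creation for a wiki. The aliases below are
# illustrative examples, not the real list.
ALIASES = {
    "Kim Il-Sung": "Kim Il Sung",  # hyphenation variant
    "RPDC": "République populaire démocratique de Corée",  # made-up example
}

def redirect_wikitext(target):
    """Wikitext body of a redirect page pointing at `target`."""
    return f"#REDIRECT [[{target}]]"

# One redirect page per alias, ready to be uploaded.
REDIRECT_PAGES = {alias: redirect_wikitext(target)
                  for alias, target in ALIASES.items()}
```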
I learned some practices in regards to this pipeline, things I would do differently. Tbh we were getting kinda antsy to get this up and running, but if we were to redo this for other languages, I would do some things a bit differently to save on the headache. The pipeline was: download all PW pages through the API -> run them through an LLM to translate from EN to FR -> use a regex script to clean up translation artifacts -> upload to the website. Simple enough in theory, but not so small in practice, especially the regex to clean up the translation artifacts.
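The regex cleanup stage could look something like this. The artifact patterns below are hypothetical examples of things LLMs tend to wrap around a translation (preambles, stray code fences); the actual script’s rules weren’t published, so treat this as a sketch of the idea.

```python
# Sketch of a cleanup pass over LLM translation output. The rules are
# illustrative, not the real ProleWiki script.
import re

CLEANUP_RULES = [
    (r"^(Here is the translation:|Voici la traduction :)\s*", ""),  # LLM preamble
    (r"^```(wikitext)?\n", ""),   # opening code fence around the output
    (r"\n```\s*$", ""),           # closing code fence
    (r"[ \t]+$", ""),             # trailing whitespace on each line
]

def clean_translation(text):
    """Apply each cleanup rule in order; MULTILINE makes ^/$ line-aware."""
    for pattern, repl in CLEANUP_RULES:
        text = re.sub(pattern, repl, text, flags=re.MULTILINE)
    return text.strip()
```

Rules like these are order-sensitive (the preamble has to go before the fences, the whitespace pass runs last), which is part of why this stage ends up being the fiddly bit of the pipeline.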

One more update -> on prolewiki website you can now highlight any text on desktop and press ctrl+k to perform a search for it

https://lemmygrad.ml/post/9689980


Only on English for now because we need to duplicate the code on all instances 😩 Just a small addition, but I think it just makes sense and will probably help a lot of people out. Also interested if someone has ideas on how to advertise this to our readers, because I have no idea where to put this info. We also have a reading mode if you press the 0 key on desktop; no idea how to tell people about that one either (but I want to put it in the menu instead tbh)