7 unnecessary #Assumptions about #Life in the #Universe : Medium
#AI’s #Memorization #Crisis : Misc
Why Finding #Motivation Is Often Such a #Struggle : Misc
Latest #KnowledgeLinks
AI's Memorization Crisis - The Atlantic
https://www.theatlantic.com/technology/2026/01/ai-memorization-research/685552/
> Large language models don’t “learn”—they copy. And that could change everything for the tech industry.
Aharon Azulay (@AharonAzulay)
The author says this matches his own observations, pointing out that these systems can recall even the numerical details of obscure arXiv papers. From a research and verification perspective, the comment hints at the models' memorization (recall) capability and their behavior regarding data provenance.
A quotation from Bill Watterson
CALVIN: As you can see, I have memorized this utterly useless piece of information long enough to pass a test question. I now intend to forget it forever. You’ve taught me nothing except how to cynically manipulate the system. Congratulations.
Bill Watterson (b. 1958), American cartoonist
Calvin and Hobbes (1994-01-27)
More about this quote: wist.info/watterson-bill/81087…
#quote #quotes #quotation #qotd #billwatterson #calvinandhobbes #cramming #cynicism #education #learning #lesson #memorization #rotememorization #school #teaching #test
This dataset contains all the results (including reconstructed texts, similarity scores, etc.) of the reconstruction of DTF texts. The work is presented at the 4th Annual Conference of Computational Literary Studies, Krakow 2025. This dataset is also available in this GitHub repository. This work was created in the context of the German National Research Data Infrastructure (NFDI) e.V. NFDI is financed by the Federal Republic of Germany and the 16 federal states, and the consortium Text+ is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 460033370. The authors would like to thank them for the funding and support, and also thank all institutions and actors committed to the association and its goals.
In our own work, we researched memorization in language models for code and ways of making them regurgitate training data:
> From the training data that was identified to be potentially extractable we were able to extract 47% from a CodeGen-Mono-16B code completion model.
> We also observe that models memorise more as their parameter count grows, and that their pre-training data are also vulnerable to attack.
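To make the extraction setup concrete, here is a minimal sketch of a prefix-prompting extraction probe, assuming a Hugging Face causal language model. The checkpoint name, token counts, and helper function are illustrative assumptions, not the exact procedure used in the paper:

```python
# Hypothetical extraction probe: prompt the model with the prefix of a suspected
# training sample and check whether greedy decoding reproduces the suffix verbatim.
from transformers import AutoModelForCausalLM, AutoTokenizer

# The model family named in the quote; the exact checkpoint is an assumption.
MODEL_NAME = "Salesforce/codegen-16B-mono"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def is_regurgitated(sample: str, prefix_len: int = 64, suffix_len: int = 64) -> bool:
    """True if the model reproduces the sample's suffix token-for-token from its prefix."""
    ids = tokenizer(sample, return_tensors="pt").input_ids[0]
    prefix = ids[:prefix_len]
    true_suffix = ids[prefix_len:prefix_len + suffix_len]
    if len(true_suffix) == 0:
        return False
    out = model.generate(
        prefix.unsqueeze(0),
        max_new_tokens=len(true_suffix),
        do_sample=False,  # greedy decoding: memorized continuations tend to surface deterministically
    )
    gen = out[0][prefix_len:]
    # Generation may stop early (e.g. at EOS); that counts as not fully regurgitated.
    return len(gen) >= len(true_suffix) and bool((gen[:len(true_suffix)] == true_suffix).all())
```

A real experiment would run a probe like this over many candidate snippets and aggregate the hit rate; how the study flags samples as "potentially extractable" and scores partial matches is not reproduced here.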
Judgment in GEMA v. OpenAI:
> According to the court, both the memorization in the language models and the reproduction of the song lyrics in the chatbot's outputs constitute infringements of copyright exploitation rights
https://www.justiz.bayern.de/gerichte-und-behoerden/landgericht/muenchen-1/presse/2025/11.php