NanoGPT Slowrun: Language Modeling with Limited Data, Infinite Compute

https://qlabs.sh/slowrun

#HackerNews #NanoGPT #Slowrun #LanguageModeling #LimitedData #InfiniteCompute

NanoGPT Slowrun - Q

Python Trending (@pythontrending)

dLLM (dllm): an announcement of a project/tool called Simple Diffusion Language Modeling. It appears to be a simple implementation or research reference that applies diffusion-based techniques to language modeling, and reads as an open project intended for diffusion-based LLM experimentation and research.

https://x.com/pythontrending/status/2029150890003722658

#diffusion #llm #languagemodeling #opensource

Python Trending 🇺🇦 (@pythontrending) on X

dllm - dLLM: Simple Diffusion Language Modeling https://t.co/C1OuWPVlg2

X (formerly Twitter)
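
A minimal sketch of the masked-denoising idea behind simple diffusion language models; the function name, the [MASK]-corruption schedule, and the model interface here are illustrative assumptions, not dLLM's actual code:

```python
# Illustrative sketch of absorbing-state ("masked") discrete diffusion for text.
# All names are hypothetical; this is not dLLM's actual implementation.
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical id of the [MASK] token

def diffusion_lm_loss(model, tokens):
    """One training step: corrupt tokens by masking, train the model to denoise.

    model(tokens) -> logits of shape (batch, seq_len, vocab_size)
    tokens        -> LongTensor of shape (batch, seq_len)
    """
    batch, seq_len = tokens.shape
    # Sample a noise level t ~ U(0, 1) per sequence; mask each token w.p. t.
    # (In practice you would guard against a batch with nothing masked.)
    t = torch.rand(batch, 1)
    mask = torch.rand(batch, seq_len) < t
    noised = tokens.masked_fill(mask, MASK_ID)

    logits = model(noised)
    # Denoising objective: predict the original token at masked positions only.
    loss = F.cross_entropy(
        logits[mask],   # (num_masked, vocab_size)
        tokens[mask],   # (num_masked,)
    )
    return loss
```

Generation then runs the corruption in reverse: start from an all-mask sequence and iteratively unmask the model's most confident predictions.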

Day 9/21: Multi-head attention unlocks multi-dimensional processing for language models. Instead of a single attention mechanism, the model runs several "heads" in parallel, each specializing in syntax, semantics, lexical relationships, or sentiment. For example, given the sentence "Sarah likes the Paris museum," the heads simultaneously analyze its structure, relations, and context. This keeps computation efficient while deepening semantic understanding. #AI #MachineLearning #NgônNgữTựNhiên #LanguageModeling

https://www.reddit.com/r/LocalLLaMA/comments/1pon3oz/day_9_21_day
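
A minimal PyTorch sketch of the mechanism that post describes: one attention computation per head over a split of the feature dimension, followed by concatenation and an output projection (weight shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def multi_head_attention(x, wq, wk, wv, wo, n_heads):
    """x: (batch, seq, d_model); wq/wk/wv/wo: (d_model, d_model) weights."""
    b, s, d = x.shape
    hd = d // n_heads  # per-head dimension

    # Project once, then split the feature dimension into n_heads parallel heads.
    def split(w):
        return (x @ w).view(b, s, n_heads, hd).transpose(1, 2)  # (b, heads, s, hd)

    q, k, v = split(wq), split(wk), split(wv)

    # Each head computes its own scaled dot-product attention pattern,
    # so different heads can track different kinds of relationships.
    scores = q @ k.transpose(-2, -1) / hd**0.5   # (b, heads, s, s)
    out = F.softmax(scores, dim=-1) @ v          # (b, heads, s, hd)

    # Concatenate head outputs and mix them with a final projection.
    out = out.transpose(1, 2).reshape(b, s, d)
    return out @ wo
```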

🚀 Drummer's Cydonia and Magidonia 24B v4.2.0 have been released! Both build on the Magistral 2509 base, with a clear boost in creativity over v4.1. The weights are shared on Hugging Face. Give them a try. #AI #LanguageModeling #HuggingFace #Cydonia

https://www.reddit.com/r/LocalLLaMA/comments/1oa29de/drummers_cydonia_and_magidonia_24b_v420/

Tokenization for language modeling: BPE vs. Unigram Language Modeling (2020)

https://ndingwall.github.io/blog/tokenization

#HackerNews #Tokenization #LanguageModeling #BPE #Unigram #NLP

Tokenization for language modeling: Byte Pair Encoding vs Unigram Language Modeling

Tokenizers used by the best-performing language models (BERT, GPT-2, etc.) poorly reflect the morphology of English text. I had hoped to use some quarantine time to design one that more closely aligns to relationships between wordforms. But Kaj Bostrom and Greg Durrett beat me to it, and so this blog post materialized instead. I add some additional motivation, evaluate both methods against ‘gold standard’ tokenizations, and speculate about what might come next.

Nick Dingwall
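
As background for the post above, a toy sketch of the BPE half of the comparison: count adjacent symbol pairs and greedily merge the most frequent one (a simplification of the production algorithms the post evaluates):

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent symbol pair.

    words: list of strings (a toy corpus); returns learned merges and corpus.
    """
    corpus = [list(w) for w in words]  # start from character-level symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        # Apply the merge everywhere in the corpus.
        new_corpus = []
        for symbols in corpus:
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus.append(out)
        corpus = new_corpus
    return merges, corpus

merges, corpus = bpe_merges(["lower", "lowest", "newer", "wider"], 5)
print(merges)  # e.g. [('w', 'e'), ('e', 'r'), ...]
```

Unigram LM tokenization works in the opposite direction: it starts from a large candidate vocabulary and prunes the subwords that contribute least to corpus likelihood, which is the approach the post finds aligns better with English morphology.
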
Andrey Markov & Claude Shannon Counted Letters to Build the First Language-Generation Models

Shannon's model produced: “OCRO HLI RGWR NMIELWIS”

#Shannon #Markov #NLP #AIhistory #LanguageModeling

https://spectrum.ieee.org/andrey-markov-and-claude-shannon-built-the-first-language-generation-models

IEEE Spectrum
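
The letter-counting approach the article describes is easy to reproduce: estimate character bigram frequencies from a corpus and sample a chain from them. A minimal sketch, with a placeholder corpus:

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each character, the frequency of the character that follows it."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, length):
    """Markov chain: draw each next character from the counts after the last one."""
    out = [start]
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        chars, freqs = zip(*nxt.items())
        out.append(random.choices(chars, weights=freqs)[0])
    return "".join(out)

counts = train_bigram("the quick brown fox jumps over the lazy dog " * 50)
print(generate(counts, "t", 40))  # Shannon-style second-order approximation
```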

🎯 #OuteTTS introduces a novel approach to text-to-speech synthesis using pure #languagemodeling
🔧 Built on #LLaMa architecture with just 350M parameters, featuring:

Zero-shot #voicecloning capability
Integration with #WavTokenizer (75 tokens/sec)
Local deployment via #llamacpp
#GGUF format compatibility

🔍 Technical Implementation:

Audio tokenization process
CTC forced alignment
Structured prompt system
Temperature-adjustable outputs

⚠️ Current Limitations:

Limited vocabulary range
String-only input support
Best performance with shorter sentences
Variable temperature sensitivity

https://github.com/edwko/OuteTTS
https://huggingface.co/OuteAI/OuteTTS-0.1-350M

GitHub - edwko/OuteTTS: Interface for OuteTTS models.

Interface for OuteTTS models. Contribute to edwko/OuteTTS development by creating an account on GitHub.

GitHub
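
The core idea, generating speech as a token stream with a causal LM and decoding it with a neural codec, can be sketched as follows. Every identifier below is a hypothetical placeholder, not OuteTTS's actual API:

```python
# Conceptual sketch of TTS-as-language-modeling; every identifier here is a
# hypothetical placeholder, not the real OuteTTS interface.

def synthesize(lm, codec, text, speaker_tokens=None, temperature=0.7):
    """Generate speech by sampling audio-codec tokens from a causal LM.

    lm    : a causal language model over a joint text+audio token vocabulary
    codec : a neural audio codec (e.g. a WavTokenizer-style model at ~75 tok/s)
    """
    # Structured prompt: text tokens, optional speaker-reference audio tokens
    # (zero-shot cloning), then a marker telling the LM to emit audio tokens.
    prompt = lm.encode_text(text)
    if speaker_tokens is not None:
        prompt = speaker_tokens + prompt
    prompt = prompt + [lm.AUDIO_START]

    # Autoregressively sample audio tokens; temperature trades stability for
    # expressiveness, matching the "temperature-adjustable outputs" above.
    audio_tokens = lm.sample(prompt, temperature=temperature,
                             stop_token=lm.AUDIO_END)

    # The codec turns the discrete token stream back into a waveform.
    return codec.decode(audio_tokens)
```
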
New #languagemodeling #nlp #ai #paper, led by Angelica Chen! We break the steepest MLM training loss drop into *2* phase changes: first in internal grammatical structure, then external capabilities. Big implications for emergence, simplicity bias, and interpretability! https://arxiv.org/abs/2309.07311
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain insights into model behavior may only be accessible by observing the trajectory of the training process. We present a case study of syntax acquisition in masked language models (MLMs) that demonstrates how analyzing the evolution of interpretable artifacts throughout training deepens our understanding of emergent behavior. In particular, we study Syntactic Attention Structure (SAS), a naturally emerging property of MLMs wherein specific Transformer heads tend to focus on specific syntactic relations. We identify a brief window in pretraining when models abruptly acquire SAS, concurrent with a steep drop in loss. This breakthrough precipitates the subsequent acquisition of linguistic capabilities. We then examine the causal role of SAS by manipulating SAS during training, and demonstrate that SAS is necessary for the development of grammatical capabilities. We further find that SAS competes with other beneficial traits during training, and that briefly suppressing SAS improves model quality. These findings offer an interpretation of a real-world example of both simplicity bias and breakthrough training dynamics.

arXiv.org
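
A rough sketch of how an SAS-style measurement could be computed, checking how often each head's strongest attention lands on a dependency neighbor; this is a simplified stand-in for the paper's actual metric, and all names are assumptions:

```python
import torch

def sas_score(attn, dep_edges):
    """Fraction of positions where a head's argmax attention hits a dependency neighbor.

    attn      : (n_heads, seq, seq) attention weights for one sentence
    dep_edges : set of (i, j) token-index pairs connected by a dependency arc
    """
    neighbors = dep_edges | {(j, i) for i, j in dep_edges}  # treat arcs as undirected
    n_heads, seq, _ = attn.shape
    scores = []
    for h in range(n_heads):
        top = attn[h].argmax(dim=-1)  # most-attended position for each token
        hits = sum((i, top[i].item()) in neighbors for i in range(seq))
        scores.append(hits / seq)
    return scores  # per-head: high values suggest syntax-tracking heads
```
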
Join our upcoming #DevDaysHyd session, "The Power of Effective Prompt Engineering: Enhance Your Interactions with Gen-AI," on August 26th.

🗓️ Date: 26th August, 2023
🕒 Time: 10am - 1pm
👥 Mode: In-person
🏢 Venue: ZapCom Group Inc, Dallas Center, Rai Durg, Hyderabad.
📍 Location: https://goo.gl/maps/3D6wnghoN3cowcy17

Find more details and register at swecha.org/devdays

Meet our speaker: Vishal Jaishankar is a Software Engineer at Microsoft working on distributed systems programming and software supply-chain management. He loves learning new technologies and applying them in his work.

Kindly note: Laptops are allowed at the venue, and we encourage you to bring your laptops for hands-on activities.

#PromptEngineering #LanguageModeling #AICommunication #NLPInsights #TextGeneration #CodeGeneration #PromptOptimization #ContextualAI

GPT-4 API by OpenAI Now Available – Analytics India Magazine #GPT4API

Hashtags: #chatGPT #AIAdvancements #LanguageModeling

Summary: OpenAI, the leading artificial intelligence research lab, has announced the general availability of its GPT-4 API. GPT-4, or Generative Pre-trained Transformer 4, is the latest version of OpenAI's language model, trained on a vast amount of internet text to generate human-like responses. The GPT-4 API allows developers to…

https://webappia.com/gpt-4-api-by-openai-now-available-analytics-india-magazine-gpt4api/

GPT-4 API by OpenAI Now Available – Analytics India Magazine #GPT4API

OpenAI announces general availability for the GPT-4 API and plans to remove older models in the Completions API by next year.

Webappia
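
For context, a minimal call against the GPT-4 API as it looked at the time of this announcement, using the pre-1.0 openai Python SDK's ChatCompletion interface (the prompt is illustrative):

```python
# Minimal GPT-4 API call with the 2023-era openai Python package (pre-1.0 SDK).
import openai

openai.api_key = "sk-..."  # your API key

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what an API is in one sentence."},
    ],
    temperature=0.7,
)

print(response["choices"][0]["message"]["content"])
```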