55 Followers
114 Following
11 Posts

Modeling Linguistic Variation for Inclusive NLP

ML PhD advised by @diyiyang at Georgia Tech/Stanford

Alum hackNY '17 & NYU Abu Dhabi '19
Burqueño
he/him

Personal Site: https://williamheld.com/
Essential article asking "Can #LLM transform #computational #socialscience": https://calebziems.com/assets/pdf/preprints/css_chatgpt.pdf
(by @caleb_ziems with @Held, @omar, @diyiyang). Key takeaway below: zero-shot models like ChatGPT don't currently outperform fine-tuned FLAN.

Can Large Language Models (#ChatGPT) transform Computational Social Science?

Our recent work (with @Held, @omar, @diyiyang) shows how they might (in partnership w/ experts).

We evaluate on 24 #CSS tasks + draw a roadmap 🚗🗺️ to guide #LLM-augmented social science 🚀

Paper: https://calebziems.com/assets/pdf/preprints/css_chatgpt.pdf

🧵 thread

Want to finetune FLAN-T5, but don't have access to a massive GPU? I got it working for my research with RTX 2080s!

Here's a gist demoing how easy model-parallel training and inference are with HuggingFace `.parallelize()`: https://gist.github.com/Helw150/f7ec01a4cdca13686b2508c37fe3a9b1

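For anyone who doesn't want to click through: a minimal sketch of the device-map pattern `.parallelize()` expects. The model name and the even two-GPU split are illustrative assumptions, and note that newer transformers releases deprecate `.parallelize()` in favor of `device_map="auto"` via accelerate.

```python
# Sketch: split a T5 model's transformer blocks evenly across GPUs,
# then hand the map to HuggingFace's (legacy) `.parallelize()` API.
# Model name and GPU count below are illustrative assumptions.

def make_device_map(num_layers: int, num_gpus: int) -> dict:
    """Map transformer block indices to GPU ids as evenly as possible."""
    per_gpu, extra = divmod(num_layers, num_gpus)
    device_map, layer = {}, 0
    for gpu in range(num_gpus):
        count = per_gpu + (1 if gpu < extra else 0)
        device_map[gpu] = list(range(layer, layer + count))
        layer += count
    return device_map

# flan-t5-xl has 24 encoder blocks; split them across two RTX 2080s
# (blocks 0-11 on GPU 0, blocks 12-23 on GPU 1):
device_map = make_device_map(24, 2)

# With transformers installed and two GPUs visible, usage looks like:
#   from transformers import T5ForConditionalGeneration
#   model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl")
#   model.parallelize(device_map)   # shards blocks across cuda:0 / cuda:1
#   ...train / generate as usual...
#   model.deparallelize()           # move everything back to CPU
```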
United is making me use facial recognition to board my flight and I don't like that one bit

Hi 🐘.

If you are looking for a winter break project, here is the full collection of ML/coding puzzles.

* https://github.com/srush/tensor-puzzles
* https://github.com/srush/gpu-puzzles
* https://github.com/srush/autodiff-puzzles
* https://github.com/srush/raspy


Interesting distillation work, but I did a spit take when I saw T5-XXL (11 Billion parameters) called "small". I can't even run that on my academic server 😂 #NLProc

http://arxiv.org/abs/2212.08410

Teaching Small Language Models to Reason

Chain of thought prompting successfully improves the reasoning capabilities of large language models, achieving state of the art results on a range of datasets. However, these reasoning capabilities only appear to emerge in models with a size of over 100 billion parameters. In this paper, we explore the transfer of such reasoning capabilities to models with less than 100 billion parameters via knowledge distillation. Specifically, we finetune a student model on the chain of thought outputs generated by a larger teacher model. Our experiments show that the proposed method improves task performance across arithmetic, commonsense and symbolic reasoning datasets. For example, the accuracy of T5 XXL on GSM8K improves from 8.11% to 21.99% when finetuned on PaLM-540B generated chains of thought.

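The distillation recipe the abstract describes is: finetune a student on the teacher's chain-of-thought outputs. A minimal sketch of assembling such training pairs (the record fields and prompt formatting are my assumptions, not the paper's exact recipe):

```python
# Sketch: turn teacher-generated chain-of-thought rationales into
# (input, target) pairs for finetuning a smaller student model.
# Field names and formatting are illustrative assumptions.

def make_distillation_pair(question: str, rationale: str, answer: str) -> tuple:
    """Student input is the bare question; the target is the teacher's
    step-by-step rationale followed by the final answer."""
    source = f"Q: {question}\nA:"
    target = f"{rationale} The answer is {answer}."
    return source, target

# Hypothetical teacher (e.g. PaLM-540B) outputs collected beforehand:
teacher_outputs = [
    {"question": "If you have 3 apples and buy 2 more, how many do you have?",
     "rationale": "Start with 3 apples. Buying 2 more gives 3 + 2 = 5.",
     "answer": "5"},
]

pairs = [make_distillation_pair(**ex) for ex in teacher_outputs]
# Each (source, target) pair can then be fed to a standard seq2seq
# trainer for the student (e.g. T5 XXL in the paper's experiments).
```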
Chain of Thought reasoning prompts—like "Let's think step by step"—make large language models more performant. Including, it turns out, at spewing out toxic and biased content. In our preprint, we evaluate zero-shot CoT on harmful questions & stereotypes: https://arxiv.org/abs/2212.08061

On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning

Generating a Chain of Thought (CoT) has been shown to consistently improve large language model (LLM) performance on a wide range of NLP tasks. However, prior work has mainly focused on logical reasoning tasks (e.g. arithmetic, commonsense QA); it remains unclear whether improvements hold for more diverse types of reasoning, especially in socially situated contexts. Concretely, we perform a controlled evaluation of zero-shot CoT across two socially sensitive domains: harmful questions and stereotype benchmarks. We find that zero-shot CoT reasoning in sensitive domains significantly increases a model's likelihood to produce harmful or undesirable output, with trends holding across different prompt formats and model variants. Furthermore, we show that harmful CoTs increase with model size, but decrease with improved instruction following. Our work suggests that zero-shot CoT should be used with caution on socially important tasks, especially when marginalized groups or sensitive topics are involved.

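For context, zero-shot CoT is usually run as a two-stage prompt: first elicit a rationale with a trigger phrase, then append that rationale and ask for the final answer. A minimal sketch (the trigger strings are common choices from the CoT literature, not necessarily the exact ones evaluated in the preprint):

```python
# Sketch of two-stage zero-shot chain-of-thought prompting:
# stage 1 elicits a rationale, stage 2 extracts a final answer.
# Exact trigger strings here are assumptions.

COT_TRIGGER = "Let's think step by step."

def reasoning_prompt(question: str) -> str:
    """Stage 1: ask the model to reason before answering."""
    return f"Q: {question}\nA: {COT_TRIGGER}"

def answer_prompt(question: str, rationale: str) -> str:
    """Stage 2: append the model's rationale, then ask for the answer."""
    return f"{reasoning_prompt(question)} {rationale}\nTherefore, the answer is"

# Usage: send reasoning_prompt(q) to the model, capture its rationale,
# then send answer_prompt(q, rationale) to get the final answer.
p = reasoning_prompt("Is 17 a prime number?")
```

The preprint's finding is that inserting the trigger in stage 1 is exactly what raises the rate of harmful output on socially sensitive inputs.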