Can Large Language Models (#ChatGPT) transform Computational Social Science?

Our recent work (with @Held, @omar, @diyiyang) shows how they might (in partnership w/ experts).

We evaluate on 24 #CSS tasks + draw a roadmap πŸš—πŸ—ΊοΈ to guide #LLM-augmented social science πŸš€

Paper: https://calebziems.com/assets/pdf/preprints/css_chatgpt.pdf

🧡 thread

1️⃣ Can #LLMs augment human annotation to increase quality + save time?

βœ… Yes! #LLMs have fair agreement w/ humans on 12/17 tasks (0.2 < kappa < 0.7).

LLMs can join humans in a #MajorityVote to reliably label text w/ 7%-50% less human effort (so invest savings in #experts!)
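The human+LLM majority-vote setup above can be sketched in a few lines — here with a hypothetical toxicity label set and made-up annotations (the paper's own tasks and data are assumed away), plus a from-scratch Cohen's kappa to illustrate the agreement statistic the thread cites:

```python
from collections import Counter

def majority_vote(labels):
    """Most common label among the annotators (humans + the LLM)."""
    return Counter(labels).most_common(1)[0][0]

def cohen_kappa(a, b):
    """Cohen's kappa between two annotators' label lists."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    pe = sum((a.count(l) / n) * (b.count(l) / n)          # chance agreement
             for l in set(a) | set(b))
    return (po - pe) / (1 - pe)

# Hypothetical annotations: two humans + one LLM per document
annotations = [
    ["hate",    "hate",    "hate"],
    ["neutral", "neutral", "neutral"],
    ["hate",    "hate",    "neutral"],
    ["neutral", "neutral", "neutral"],
]
final = [majority_vote(row) for row in annotations]
# final == ["hate", "neutral", "hate", "neutral"]

human = [row[0] for row in annotations]   # first human's labels
llm   = [row[2] for row in annotations]   # the LLM's labels
print(cohen_kappa(human, llm))            # 0.5 -- inside the 0.2 < kappa < 0.7 band
```

Swapping the LLM in for one human seat in the vote is what yields the reduced human effort; the savings depend on how many annotators the task budgeted for.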

2️⃣ Can #LLMs replace human annotation?

❌ not for expert taxonomies (#ImplicitHate) or parsing tasks (#ArgumentExtraction) [<40% acc.]

❔ maybe where objective ground truth (#misinfo) or common definitions (#emotions) exist [>70% acc.]

(but human-in-the-loop is recommended)

3️⃣ Can LLMs help humans code unstructured text w/ open-ended generations?

βœ… yes, humans prefer #ChatGPT explanations just as often as gold references

4️⃣ Can LLMs replace human inductive analysis?

❌ no, LLMs don't outperform humans; experts should instead curate #LLM outputs

5️⃣ Are #LLMs better at some scientific fields than others?

❌ we don't see any systematic bias against any field

πŸ€” instead, performance varies more by the complexity of the input --- document-level analysis is the most challenging!

6️⃣ How should I decide which model to use?

Keep the following in mind:

πŸ“ˆ performance scales w/ model size
🎡 #FLAN lets you tune w/ your own labels
πŸ’² #ChatGPT is often cheapest
πŸ’¬ #ChatGPT is best for generation
πŸ’» code-instructed #GPT3 excels at parsing

7️⃣ How can I get the most out of my model?

We recommend these best practices…

πŸ”  enumerate options with multiple-choice
↩️ separate options with new lines
⚠️ give instructions and repeat constraints *after* the context
πŸ€– ask for machine-parseable JSON

8️⃣ What's missing?

πŸ† reliable auto-metrics for CSS performance
🌱 model grounding
πŸ“° real-time analysis of late-breaking events
↔️ causal explanations

9️⃣ What's next?

πŸ’‘ a new #CSS paradigm that blurs supervised / unsupervised methods

πŸš€ a unified #LLM approach to human-#AI hypotheses formation, data coding, and hypothesis testing via text analysis