Dolly 2.0 is a really big deal: https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm

"The first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use"

My notes so far on trying to run it: https://til.simonwillison.net/llms/dolly-2

One of the most exciting things about Dolly 2.0 is the fine-tuning instruction set, which was hand-built by more than 5,000 Databricks employees and released under a CC BY-SA 3.0 license.

Here's that training set in Datasette Lite: https://lite.datasette.io/?json=https://github.com/databrickslabs/dolly/blob/master/data/databricks-dolly-15k.jsonl#/data/databricks-dolly-15k?_facet=category
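
If you'd rather poke at the file locally, here is a minimal Python sketch of roughly what that `_facet=category` view does: count records per category in a JSONL file. The sample rows are made up for illustration (they only mirror the instruction / context / response / category shape of the real file), and `facet_by_category` is a hypothetical helper name, not part of Datasette or the Dolly repo.

```python
import json
from collections import Counter

# Made-up sample rows in the same JSONL shape as databricks-dolly-15k:
# one JSON object per line with instruction, context, response, category.
SAMPLE_JSONL = """\
{"instruction": "Name three primary colors.", "context": "", "response": "Red, yellow and blue.", "category": "open_qa"}
{"instruction": "Suggest names for a pet sheep.", "context": "", "response": "Dolly, Woolly, Baabara.", "category": "brainstorming"}
{"instruction": "What is the capital of France?", "context": "", "response": "Paris.", "category": "open_qa"}
"""


def facet_by_category(jsonl_text):
    """Return (category, count) pairs, most common first."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        counts[record.get("category", "(missing)")] += 1
    return counts.most_common()


if __name__ == "__main__":
    for category, count in facet_by_category(SAMPLE_JSONL):
        print(f"{category}\t{count}")
```

To run it against the real data, read the downloaded `databricks-dolly-15k.jsonl` file into a string and pass that to `facet_by_category` instead of the sample.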

@simon @film_girl Typo in the training set. Is that column the model's response? I wonder where it pulled the misspelled name from.
@ironicsans @film_girl no, none of the data in there is generated by a model - it was all manually entered by Databricks staff members