Ben Schmidt

VP Information Design, Nomic, building interfaces to latent spaces with data visualization; onetime history & digital humanities professor.
Homepage: https://benschmidt.org
GitHub: https://github.com/bmschmidt
The log that dare not speak its name.

Today Nomic released version 3.0 of GPT4All, the easiest and best way to run LLMs on your own computer without the cloud. Fully FOSS. If you haven't ever run a private local LLM -- or even if it's been 6 months -- try some of the llama3 or mistral derivatives, and chat locally with your own documents. It's remarkable how good open-source local quantized models have gotten, even as commercial models have been stuck at GPT-4 level for a year.

https://www.nomic.ai/gpt4all

GPT4All - Private & Local AI Chatbot

Run open-source AI models locally on your device. GPT4All delivers private, high-performance AI with no cloud required—your data stays on your machine.

Nomic
Come work at Nomic! I'm hiring a front end/Web engineer to build the next generation of data interfaces for curating, exploring, and model-building from text and image data. Apply here or pass it on: https://jobs.ashbyhq.com/nomic.ai/42e7a74f-9a26-42fb-9b48-5af6b0698045
Front End/Web Engineer

As a Front End/Web Engineer, you will use the next generation of browser technologies to create a collaborative and data-intensive environment for interacting with and editing massive unstructured datasets.

Does anyone know why and when Google Maps switched from using a globe at the farthest-out zoom levels back to Web Mercator? (I would guess the answer is A/B testing revealed people got confused and they turned it off quietly, but maybe there's a story somewhere?)
I hate unnecessary precision in all things, so I made this notebook showing exactly how you can get substantial compression benefits on floating point Arrow data sent to/from the web browser by zeroing out the least significant bits in the mantissa. https://observablehq.com/@nomic/poor-mans-bfloat16
Poor man's BFloat16

Unnecessary precision is a bad thing. BFloat16 is a half-precision floating point format used primarily in deep learning applications that, unlike standard IEEE floating points, keeps the 8-bit exponent of a single-precision floating point number and truncates the mantissa at 7 bits. Passing float16 values directly to JavaScript can be a pain sometimes because they don't serialize/deserialize that easily. Sometimes it's easiest to just pass around floats. But also, in ja

Observable
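For anyone who wants the gist without opening the notebook, here is a minimal sketch of the bit-zeroing idea in Python. It is not the notebook's code: the function names `truncate_mantissa` and `compressed_size` are made up for illustration, the standard-library `struct` module stands in for Arrow buffers, and `zlib` stands in for whatever compression the transport layer applies. It keeps the sign, the 8-bit exponent, and the top 7 mantissa bits of each float32, then compares compressed sizes.

```python
import random
import struct
import zlib


def truncate_mantissa(values, keep_bits=7):
    """Zero all but the top `keep_bits` of each float32's 23 mantissa bits.

    Hypothetical helper for illustration; keep_bits=7 mimics bfloat16 precision
    while the values stay ordinary 32-bit floats.
    """
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    drop = 23 - keep_bits
    mask = (0xFFFFFFFF >> drop) << drop  # e.g. 0xFFFF0000 when keep_bits == 7
    n = len(values)
    # Reinterpret the float32 bytes as unsigned 32-bit integers.
    ints = struct.unpack(f"<{n}I", struct.pack(f"<{n}f", *values))
    masked = struct.pack(f"<{n}I", *(i & mask for i in ints))
    return list(struct.unpack(f"<{n}f", masked))


def compressed_size(values):
    """Size in bytes of the float32 buffer after zlib compression."""
    return len(zlib.compress(struct.pack(f"<{len(values)}f", *values)))


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
truncated = truncate_mantissa(data)

print("original :", compressed_size(data), "bytes compressed")
print("truncated:", compressed_size(truncated), "bytes compressed")
```

The truncated values are still valid float32s, so nothing downstream has to know about a special format. The cost is precision: each value keeps roughly two to three significant decimal digits.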
We're stocking the office library at Nomic… five books per person. Got four down (image)… what's your pick for the last slot at an information cartography firm?

I'll be doing a roundtable about vectors and embeddings with Leland McInnes (creator of UMAP), Laurens van der Maaten (creator of t-SNE), and Andriy Mulyar (CTO at Nomic) tomorrow. Register on Zoom.

15 Nov 2023 10:30-11:30 AM EST Registration Link:
https://us06web.zoom.us/webinar/register/4916994827498/WN_Qs8dX9E9QtyzGdCoHJQO4A#/registration

Welcome! You are invited to join a webinar: Vectors & Embeddings Roundtable. After registering, you will receive a confirmation email about joining the webinar.

Large language models (LLMs) have received a great deal of attention over the past year, since the dramatic release and meteoric rise of ChatGPT. Many other models and tools have been developed and refined in the intervening time. LLMs have their flaws but are incredibly powerful—among other things, they can answer questions, write stories, compose poetry and songs, and craft code. We’re still in the early stages of learning what these tools can do for us. But—how do these models work? How is text processed, represented, stored, and produced by LLM-based systems? The deep neural networks that LLMs are based on are only able to work with numbers—lots and lots of numbers. How do we take text and convert it to a form that a deep learning model can engage with? And then, how do we store that converted data in a manner that is efficient and useful to access? This is where vectors and embeddings come in. Vectors are ordered groupings of numbers, where each number has a certain meaning—and, since they’re made up of numbers, vectors are a data structure that ML models can interface with. The question is then how to convert text into vectors in a consistent way, and that’s where embeddings enter the picture: they’re methods for converting text into vectors and back, and thus they represent a “translation layer” between what we humans can understand, and what an LLM can understand. In this event, we have a great roundtable panel lined up for you. Andriy Mulyar and Ben Schmidt of Nomic AI, Laurens van der Maaten of Meta AI, and Leland McInnes of the Tutte Institute for Mathematics and Computing are here to shed some light on embeddings and vectors, and the challenges involved with storing, searching, and visualizing vector datasets. Vectors and embeddings are a key topic underlying the magic of generative AI, and we hope you leave this event with a much better understanding of them and how to work with them!

Zoom
@Dorialexander releases a Mistral fine tune that lets you talk to the 18th century. https://huggingface.co/spaces/Pclanglais/MonadGPT It thinks there are seven planets (including the sun and moon) and has some good tips for organizing a 1730s-style party…
MonadGPT - a Hugging Face Space by Pclanglais


Introduced my kid to DALL-E 3, and his immediate request was the USS Monitor. Kind of amazing how spectacularly it fails at this task--from the text description it understands it's a warship with a rotating turret, but it insists on creating some battleship-sized monstrosity and not the tiny thing it actually was. Lesson: historical images are far more common in the experience of people than of models, and that gives them greater context for imagining possibilities.
This guide to data visualization from the Publications Office of the European Union is full of wisdom, both in content and organization. https://data.europa.eu/apps/data-visualisation-guide/
According to @JanWillemTulp the author is @maarten, which definitely tracks.
(h/t @marianeerens.bsky.social)
Data Visualisation Guide