Mastodawn

Adrian Cockcroft

#SW2con started out with an excellent keynote from the #RedMonk analyst team on the intersection of developers and AI: how we got here and what the issues are. Some papers to read are linked from the QR codes. /cc @CSLee @rstephensme

#SW2con second keynote is DataStax CTO Jonathan Ellis talking about vector databases. Very deep dive into the tech. Great to see Jonathan again… he spent the last year building open source JVector https://github.com/jbellis/jvector

GitHub - jbellis/jvector: JVector: the most advanced embedded vector search engine

#SW2con next keynote is a talk on GitHub Copilot, how it works, the development and tuning process behind getting it to work well, by Mario Rodriguez - Senior VP of Product at Microsoft.

#SW2con Emily Johnson from IBM - first time speaker - doing a great job talking about observability with Instana and optimization with Turbonomic. I was an advisor to Instana when they started, and it’s good to see IBM developing and supporting the product after they acquired the team.

#SW2con next up I’m in the Code-assists track hearing from Aso Kukic of Sourcegraph about their Cody tool. He started by defining levels of code AI assistance: human-initiated, AI-initiated, and AI-led.

#SW2con Dennis Pilarinos, CEO of Unblocked, talking about security and trust for AI apps. Applying these patterns to the new AI apps.

#SW2con good talk on what to think about when considering AI trust and security issues.

#SW2con IBM Fellow Trent Grey-Donald talking in detail about how code assistants work. I was chatting to Trent last night and discovered we have a bunch of friends in common. The hallway track here is very good, I usually meet some new interesting people…

#SW2con Intent matters a lot, the flow of how it works is shown… the IBM example was turning COBOL into Java, unlike the earlier GitHub examples in Python and Cody examples in JavaScript. They can all handle most languages, but for a specific use case and language one product or another is likely to be better tuned.

#SW2con Code Assistants - the road ahead. Measure user feedback, pick the right context, combine LLMs with established techniques, go beyond coding.

#SW2con attending afternoon sessions related to RAG. First up: Building Something Real with Retrieval Augmented Generation (RAG) - Jon Bratseth, CEO, Vespa.ai

#SW2con more advice on RAG from Jon Bratseth - good level of detail on how they work and what’s important.

#SW2con Joe Shockman co-founder of Grounded AI up next. Goal of making today’s workloads more efficient and avoiding the pitfalls we’ve been seeing.

#SW2con A Recipe for Fine-tuning with Direct Preference Optimization (DPO) - Jesse Kipp, Director of Engineering, Cloudflare - good reference to a recent paper http://arxiv.org/abs/2405.00732

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Low Rank Adaptation (LoRA) has emerged as one of the most widely adopted methods for Parameter Efficient Fine-Tuning (PEFT) of Large Language Models (LLMs). LoRA reduces the number of trainable parameters and memory usage while achieving comparable performance to full fine-tuning. We aim to assess the viability of training and serving LLMs fine-tuned with LoRA in real-world applications. First, we measure the quality of LLMs fine-tuned with quantized low rank adapters across 10 base models and 31 tasks for a total of 310 models. We find that 4-bit LoRA fine-tuned models outperform base models by 34 points and GPT-4 by 10 points on average. Second, we investigate the most effective base models for fine-tuning and assess the correlative and predictive capacities of task complexity heuristics in forecasting the outcomes of fine-tuning. Finally, we evaluate the latency and concurrency capabilities of LoRAX, an open-source Multi-LoRA inference server that facilitates the deployment of multiple LoRA fine-tuned models on a single GPU using shared base model weights and dynamic adapter loading. LoRAX powers LoRA Land, a web application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA A100 GPU with 80GB memory. LoRA Land highlights the quality and cost-effectiveness of employing multiple specialized LLMs over a single, general-purpose LLM.

#SW2con when to do fine tuning vs RAG. Useful talk by Jesse Kipp on the differences and techniques.

#SW2con I took a break to work on my talk for tomorrow (I’m the keynote before lunch) and now we are back for the final set of day 1 keynotes with: KAITO: Building an Open Source Platform for AI - Lachlan Evenson, Principal PDM Manager, Microsoft Azure. Good advice on how open models work, nice demo of Kubernetes deployment and some next steps.

#SW2con Refining RAG Performance - Chris Maddock, Head of Product Marketing & Solutions Architecture, Unstructured.io. Open source and commercial versions.

#SW2con Final keynote of the day with Paul Kedrosky - a fun talk looking at what’s going on with AI as more capabilities emerge.

#SW2con Day 2 kicks off with Sriram Subramanian of CloudDon talking about building responsible AI systems. Good discussion of the differences between alternative ethical frameworks and how each LLM interprets them.

#SW2con interview between Heather Joslyn of The New Stack and Paige Bailey of Google - discussing some of the new Google AI announcements and other recent news.

#SW2con Rob Zuber CTO of CircleCI talks about how to detect AI hallucinations.

#SW2con Marc Austin of Hedgehog talking about why AI needs a new network. Discussing Ethernet-based solutions prior to the move to the high-speed Ultra Ethernet Consortium vs. using InfiniBand.

joy larkin 🌺✨ May 14, 2024

@adrianco “That event has a lot of AI-focused content for developers and other technical practitioners” says the AI marketer.

Is the crowd mostly outside of SF?

@joy The event is in Colorado, there are people from all over the place. Hawaii, West coast, a bunch of locals, Texas, East coast and I think some from Europe.

joy larkin 🌺✨ May 15, 2024

@adrianco Yeah, the data point I'm looking for is whether or not AI topics are a focus at technical conferences (non-data science) outside of the usual hubs, e.g. SF/NYC/Paris, etc.

@joy This conference is a deliberate outlier. It’s always been whatever is the latest stuff and it’s always been held here because it’s a long way from anywhere and that acts as a filter to make people engage more.

Jonathan Yu May 14, 2024

@adrianco Are these being recorded? 👀

@jawnsy No, you have to be here. It’s deliberately an in-person oriented event. You can probably find other similar public talks by the authors.

Jonathan Yu May 14, 2024

@adrianco Well, I'm glad you're there to share a few toots about it. Thanks, Adrian!

@jawnsy You’re welcome, there are other people tooting with #SW2con as well (and people who are seeing too much of it and don’t care can mute that tag).

Steve Loughran May 14, 2024

@adrianco people who say unit tests are “the boring stuff” aren’t writing sophisticated enough tests. I can see some exploration of the configuration-space of a simple class, but using mocking to inject network failures into a live connection with a remote store is the kind of problem which is as hard as the production code…

#generativeAI #ai

The Cody talk was a bit high-level; I didn’t learn as much as I expected about how they work and what’s different between various copilot approaches.

Carol Lee May 14, 2024

@adrianco Oh I love this! ccing my co-authors @grimalkina & @KFosterMarks on here as well!