Peter Bull

@pbull

Co-founder DrivenData. Celebrating a decade of data for good.

ML challenges | https://www.drivendata.org/
Data projects | https://drivendata.co/
Open source | https://github.com/pjbull

🎉 Excited to launch this challenge! 🎉 Over a year of data collection, curation, and annotation that we undertook to produce a first-of-its-kind dataset.

Help us build speech models that understand kids ages 2-5. This gap bottlenecks literacy assessments, language acquisition testing, speech pathology screenings, and any kind of tool that interacts with early learners' speech (which is a lot, since they are not writing yet!). $120k in prizes and huge impact!

https://kidsasr.drivendata.org/

Great set of events for #SeattleAIWeek this week! Definitely join some if you are in town, and let me know if you want to catch up: https://luma.com/Seattle-AI-Week-2025

🚀 New release: cloudpathlib v0.23.0

🥧 Now with Python 3.14 (π) support!
📁 New copy & move methods mean you can reduce usage of shutil 🎉

Check out the full release and docs here:
👉 https://cloudpathlib.drivendata.org/stable/

Super interesting work on a newly proposed columnar data file format called F3, which embeds a Wasm binary to decode the data 🤯 (obviating the need for third-party library support). Favorable comparisons to existing formats on compression, throughput, and random reads.

https://db.cs.cmu.edu/papers/2025/zeng-sigmod2025.pdf

Very cool to see Wikimedia embracing LLM tools and launching a hybrid similarity search API and open source embeddings for Wikipedia! Also supports Q&A style queries.
https://www.wikidata.org/wiki/Wikidata:Embedding_Project

Interesting to see empirical research coming out on LLMs as education aids. In this study, active use of LLMs helped CS students debug compiler errors, but once LLM access was removed there was no lasting learning benefit from having had it...

https://learninganalytics.upenn.edu/ryanbaker/ICCE2025_paper_28.pdf

Great opportunity to work on AI in conservation and biodiversity with Roland Kays! In-person in NC, check it out now since it is only open for a week:
https://www.governmentjobs.com/careers/%7B0%7Dnorthcarolina/jobs/newprint/5021239

We just shipped two major features for cloudpathlib ✨📦✨! First, HTTP support: treat a URL like any other path in Python code (open, read_text, join). Second, compatibility with the open and os Python built-ins, for a seamless transition of legacy code and third-party library support.

https://cloudpathlib.drivendata.org

Exemplary FAQ post from the authors of "Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task" https://www.brainonllm.com/faq

I'd love to see more authors who are explicit about what NOT to claim based on a study, including which wording is not appropriate for lay audiences.

Thought I would spot-check an application someone was posting about 100% vibecoding. Can you spot the issue?

Kudos to the LLM, this is verbatim from the FastAPI docs. Sometimes verbatim from the docs is not what you want for your application, though...