180 Followers
102 Following
92 Posts
Incoming Asst Prof at Université de Montréal (Jan 2024). Postdoc @harvardhci; PhD @CornellInfoSci. Programming and culture, notational programming, AI and HCI, critical computing. Building ChainForge (chainforge.ai). Prev: intern, AI/ML research.
ChainForge: https://chainforge.ai/docs/
Personal Site: https://ianarawjo.com
@ianarawjo @eaganj @aurelien @andresmh @landay yeah, i think it's pretty much indisputable that UIST is better than CHI overall
I highly recommend making this leap into open development that @ianarawjo advocated for and @wattenberg endorsed. If you have the ability to make and deploy a tool that's genuinely useful, do it. You can still publish. You can even enable *other CHI papers* in the same cycle!
We'll feature these papers with little explainers in the coming days, starting w/ now-former postdoc and newly minted Prof. @ianarawjo's amazing #ChainForge! Here's our conditionally accepted pre-print: https://arxiv.org/abs/2309.09128 One of the things I love about this paper is how...
ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.

arXiv.org
Accepted (conditionally)!! #CHI2024
Booked for PLATEAU 2024! See some of you at UC Berkeley on Feb 19-20 to chat programming + HCI! 🥳 https://2024.plateau-workshop.org
PLATEAU Workshop

Bringing together Programming Languages and Human-Computer Interaction

Voilà -- it's a start! https://hci.iro.umontreal.ca
Montréal HCI Group

I’m incredibly resentful right now about the apparent fact that you have to be involved with LLM nonsense in some way if you want to further your career right now and I can’t wait for this to blow over. I’m so bitter about it.
The rush to run a user study or need-finding interview in today’s HCI derives from an under-appreciation of theory.

Is there a software license which is as permissive as MIT License but explicitly forbids using the code to train any AI?

#foss #ai #chatgpt #github

I wrote all of this for a reporter. I want to share it here. I don't have the energy for alt text, sorry, and the copy-paste mechanism on Twitter DMs on my phone is broken. Maybe I'll manage on my computer later. This is part one of two, really spilling my soul in ink on my feelings about this past month and a half. I don't know how much will make it into the article; probably not much.