how to make programming terrible for everyone | jneens web site

how to make programming terrible for everyone

you can tell i have adhd by the 17 footnotes
@jneen god i love footnotes
@demize they're so good. terry pratchett used to have like 5 levels of nested footnotes spanning 3 pages and i aspire to that level of whimsy
@jneen I just thought you were being thorough...
@arclight that's how i feel too lol

@jneen My most recent project was a Python implementation of an aerosol scrubbing model that is implemented as part of a larger R code. The R code is not documented to the level we need, so the Python implementation was mainly an excuse to chase down references for everything left undocumented or unattributed in the R application. The original documentation was about 50 pages for the entire application; my documentation was about 200 pages for just one model (though strip out the example plots and source code and there's still at least 50 pages of design info and technical basis). In this case the documentation was far more valuable than the code because it cited chapter and verse where every equation, correlation, and piece of data came from. It's good having a pedantic technical reviewer who holds you accountable.

So yeah, I appreciate the work that went into your post and the background detail. :)

Aside: Is a commercial chatbot even capable of providing references for its work? My understanding is that all the attribution is laundered away when the LLM is constructed; all it can produce is obsequious hearsay...

@arclight i'm glad you had the documentation for that project, sounds like a life-saver. human communication >>>>
@arclight to your other question, part of the reason there's ambiguity on this is that LLMs can *claim* to provide references and introspect about its output, but that introspection and those references are still just... output

@jneen We've seen, from the legal cases where lawyers have been caught out citing fabricated precedents and from journal papers with fabricated references, that the system will simply stick tokens together to meet its optimization threshold. So even if attribution hadn't been intentionally bleached away, nothing the chatbot emitted could be trusted unless some trustworthy deterministic (non-LLM) system could verify that the citations actually exist and assess their relevance. *Everything* is a fabrication.

What concerns me is not the LLM part of the chatbot - that's just a pile of linear algebra - it's the cobbled-together UI that responds like an obsequious servile intern, Stepford ELIZA on Prozac. That part of the system is built on 30+ years of dark pattern research to keep people spending tokens. Right, wrong, as long as users keep spending, the system is operating as designed. The only acceptance test is that line goes up.
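(a minimal sketch of what that deterministic citation check might look like, assuming a DOI-based workflow — the regex and the `KNOWN_DOIS` index are illustrative stand-ins, not anyone's real system; an actual checker would query a registry like Crossref rather than a local set:)

```python
import re

# Sketch of a deterministic "does this citation exist?" check:
# pull a DOI out of the citation string, then look it up in a
# trusted index. KNOWN_DOIS is a placeholder for a real registry
# lookup (e.g. Crossref); everything here is illustrative.

DOI_PATTERN = re.compile(r"\b(10\.\d{4,9}/[^\s,;]+)")

KNOWN_DOIS = {
    "10.1000/example.doi",  # placeholder entry for illustration
}

def citation_checks_out(citation: str) -> bool:
    """True only if the citation contains a DOI present in the index."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        return False  # no DOI at all: nothing verifiable, so reject
    return match.group(1) in KNOWN_DOIS
```

the point being that this path is boring, deterministic, and auditable: a citation either resolves in the index or it doesn't, with no token-smashing in between.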

@arclight exactly! i've seen someone ask a chatbot "would you hallucinate if i asked you X" and it's just... not how it works.
@jneen @arclight a point I like to emphasize in these discussions is that if you gave me the same kind of money currently being lit on fire, I could build you a system that would take a natural-language query about code you want to write and then come up with a short list of open-source projects (along with their licenses) that have code that does that thing, along with a pointer to that exact snippet of code. Some results wouldn't be great, of course, and there would be an art to querying it effectively, like any search engine... But it would be a much better programming assistant than code LLMs.
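(for what it's worth, a toy sketch of the shape of that system — the corpus, project names, and scoring here are all made up for illustration; a real version would index actual repositories and use proper ranking, but the contract is the same: query in, attributed and license-tagged snippets out:)

```python
# Toy license-aware code search: score each indexed snippet by how
# many query terms it shares with the query, and return the best
# matches with their project and license attached. The hand-written
# CORPUS stands in for a real index of open-source repositories.

CORPUS = [
    {"project": "leftpad-rs", "license": "MIT",
     "snippet": "fn left_pad(s: &str, width: usize) -> String { ... }",
     "terms": {"pad", "string", "left", "width"}},
    {"project": "csvkit-demo", "license": "MIT",
     "snippet": "def parse_csv(path): ...",
     "terms": {"csv", "parse", "file", "rows"}},
    {"project": "tinyhttp", "license": "Apache-2.0",
     "snippet": "fn serve(addr: &str) { ... }",
     "terms": {"http", "server", "serve", "socket"}},
]

def search(query: str, top_n: int = 2):
    """Return (project, license, snippet) tuples ranked by term overlap."""
    q = set(query.lower().split())
    scored = []
    for doc in CORPUS:
        score = len(q & doc["terms"])
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: -pair[0])
    return [(d["project"], d["license"], d["snippet"])
            for _, d in scored[:top_n]]
```

every result carries its provenance and license, so nothing is laundered — which is exactly what the LLM pipeline throws away.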
@arclight @jneen — this! Keeping engagement, keeping one hoping that the next response will be the right one. Feels like gambling. A recent JA Westenberg piece goes well with this: https://mastodon.social/@Daojoan/116219554271259845.
@jneen I get extra twitchy about chatbot use because my job is software QA on nuclear safety analysis code (here's a decent technical basis for an earlier related code https://www.osti.gov/biblio/10200672/) We have enough problems with coarse models and missing or uncertain data, we don't need a machine confidently fabricating nonsense. I'm not going in front of a regulator to explain our answers are bullshit because someone trusted a chatbot to fill in the blanks. Health & safety of the people comes first, then environmental protection, then protection of equipment. It's simply unethical to use these systems in any part of the safety analysis or design or licensing process. There's too much at stake.
An assessment of the potential for in-vessel fission product scrubbing following a core damage event in IFR (Technical Report) | OSTI.GOV

A model has been developed to analyze fission product scrubbing in sodium pools. The modeling approach is to apply classical theories of aerosol scrubbing, developed for the case of isolated bubbles rising through water, to the decontamination of gases produced as a result of a postulated core damage event in the liquid metal-cooled IFR. The modeling considers aerosol capture by Brownian diffusion, inertial deposition, and gravitational sedimentation. In addition, the effect of sodium vapor condensation on aerosol scrubbing is treated using both approximate and detailed transient models derived from the literature. The modeling currently does not address thermophoresis or diffusiophoresis scrubbing mechanisms, and is also limited to the scrubbing of discrete aerosol particulate; i.e., the decontamination of volatile gaseous fission products through vapor-phase condensation is not addressed in this study. The model is applied to IFR through a set of parametric calculations focused on determining key modeling uncertainties and sensitivities. Although the design of IFR is not firmly established, representative parameters for the calculations were selected based on the design of the Large Pool Plant (LPP). The results of the parametric calculations regarding aerosol scrubbing in sodium for conditions relevant to the LPP during a fuel pin failure incident are summarized as follows. The overall decontamination (DF) for the reference case (8.2 m pool depth, 770 K pool temperature, 2.4 cm initial bubble diameter, 0.1 μm aerosol particle diameter, 1573 K initial gas phase temperature, and 72.9 mole % initial sodium vapor fraction) is predicted to be 36. The overall DF may fall as low as 15 for aerosol particle diameters in the range 0.2-0.3 μm. For particle diameters of <0.06 μm or >1 μm, the overall DF is predicted to be >100. Factors which strongly influence the overall DF include the inlet sodium vapor fraction, inlet gas bubble diameter, and aerosol particle diameter. The sodium pool depth also plays a significant role in determining the overall DF, but the inlet gas phase temperature has a negligible effect on the DF.
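(a hedged aside for readers outside the field: "decontamination factor" here is the standard ratio of aerosol entering the pool to aerosol escaping it — this definition is standard terminology, not something quoted from the report itself:)

```latex
\mathrm{DF} = \frac{m_{\text{in}}}{m_{\text{out}}},
\qquad
\text{fraction retained} = 1 - \frac{1}{\mathrm{DF}}
```

so the reference-case DF of 36 corresponds to roughly 97% of the aerosol being retained in the pool, DF = 15 to about 93%, and DF > 100 to better than 99%.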

@jneen er is the body of the blog post supposed to be ~7MB of HTML and take ~28s to load?
@packetcat oops i left the inlined images in there
@packetcat there we go. should be about 43kb now
@jneen yep, loads very quickly for me now!
@jneen amazing read. Thanks!
@jneen this is spot on and great writing!
@whack thank you so much! i had a lot of help, it was a whole mess for a while lmao
@jneen grandpa mentioned
@nex3 oh my god really?
@jneen yeah there are only two Weizenbaum branches that still exist and I'm that one

@jneen I was in the middle of reading this when a friend texted me saying, “read this!”

It was absolutely worth reading. Thank you for writing it.

@jneen this is brilliant. I loved this from the ending:

"""
I think the ultimate fate of AI programming won’t be too far from that of The Last One. When a programming tool is unreliable, completely resists mental-modeling, and is incapbable of consistently rejecting invalid input, I think it’s reasonable to say it’s not fit for purpose, and is certainly not the future of programming. We simply cannot develop mental models of AI through traditional means. But we have to remember that just because we don’t understand it doesn’t mean it’s hiding secret insight or power.
"""

Have you read of the "Sim City effect" (disclaimer: Noah Wardrip-Fruin was one of my graduate advisors)? It's a nice discussion that extends the "Eliza effect" in several dimensions and seems to match up with a lot of what you're saying here.

@[email protected]

Yet we can build a mental model of #AI: it's simply a corrupted decompression of a lossy compressed archive of our competitors' work.

Those who build the archive give our competitors access to our work in exchange for giving us access to theirs. All for the largest fee they can get at any given time.

@[email protected]

@giacomo @jneen the size of the archive makes this unusable as a predictive model of the LLM's behavior though, at least for the kinds of predictions necessary to support use for building programs.

To be useful as a tool in the way that a programming language is, humans would need to be able to build a mental model of its operation that could fit in our heads and allow us to predict what it would do. I can understand the math behind an LLM fairly well and can even make *some* kinds of predictions like "it will tend to fail in these ways." I might even be able to leverage this to exert some level of control over the system if all I want to make it do is fail in a particular way or email me the user's github tokens or something. But to control it precisely in the ways necessary to effectively program with it is harder-to-impossible. It might *seem* to be giving me what I want because (especially for simple programming tasks) it sometimes or even often gives me what I ask for. But that's an illusion of control as it's not dependable and when it fails I've got little recourse other than to abandon it for a different tool.

@[email protected]

Of course, such a mental model won't let you predict a #CodingAgent's output, if only because you know a certain amount of random input is included in its computation to give an illusion of creativity and counter plagiarism accusations. That's why I included the word "corrupted."

On the other hand, it gives you useful insight into what you might exfiltrate from a competitor's work, and how to set up the context to get specific parts of their proprietary software, such as secret algorithms, test suites, or even secret keys (in particular short ones like ed25519) that the agent got access to.

@[email protected]

@Jetengineweasel The top half of the essay linked from what I'm replying to -- up till "Evaluating AI as a programming language" -- is relevant to what we were talking about earlier today. "predictability" and "discoverability" especially are important for _any_ program, not just languages

( @jneen sorry for the tangent <3 )

@zwol @Jetengineweasel i think there may be part of this thread I can't see, but yes, it's 100% rehashing of a lot of basic UX stuff
@jneen i was referring back to an off-masto conversation, sorry for the confusion
@jneen regarding the ELIZA effect, I think it points at a more basic cognitive error than just "having no reference point to fall back on". so much of our own understanding of how we think is grounded in language that it's hard to believe that something that can (convincingly mimic) talk *can't* think.

@jneen but i went through a master's program in cognitive science and one of the things the intro classes really hammered on is: a great deal of the work the human brain does isn't based on language at all. or vision, which is the other thing we're really *aware* of relying on.

and -- not coincidentally -- that work is the work we struggle to write algorithms *of any kind* to handle.

@zwol that's incredibly interesting - do you have any reading on that topic you'd recommend?
@jneen Hmm, it's been a long time, but I'd suggest you start with "Cognition in the Wild" (E. Hutchins) and "The Way We Think" (Fauconnier and Turner). Also anything you can find on nonhuman animal cognition
@jneen great post. minor nitpick, I think it's "capable"? anyway I often find myself wishing I could "write computer programs using natural language" but like, I don't want to start speaking INFORM
@amsomniac ha! inform and applescript are my usual cautionary tales. inform is actually very interesting as an art project itself, though - the fact that it exists is a marvel, and the things people have made in it as well
@amsomniac i'm not sure what you mean by "capable" here, did I make a typo somewhere?
@jneen you know, I thought I saw one but I just looked for it and couldn't find it. sorry!
@amsomniac ha it's all good, nw
@jneen "incapbable" that's it!
@amsomniac oh my god lmao that's a really funny one.
@jneen anyway minor typos like these are a charming attribute of human writing
@amsomniac it took me an embarrassingly long time to see it even quoted here in your post lmao

@jneen fantastic piece, thank you! puts into very readable words some deep concerns I found difficult to express.

👏👏👏

@Slash909uk thank you so much, means a lot :]