how to make programming terrible for everyone | jneens web site


you can tell i have adhd by the 17 footnotes
@jneen I just thought you were being thorough...
@arclight that's how i feel too lol

@jneen My most recent project was a Python implementation of an aerosol scrubbing model that is part of a larger R application. The R code is not documented to the level we need, so the Python implementation was mainly an excuse to chase down references for everything left undocumented or unattributed in the R application.

The original documentation was about 50 pages for the entire application; my documentation was about 200 pages for just one model (strip out the example plots and source code and there's still at least 50 pages of design info and technical basis). In this case the documentation was far more valuable than the code because it cited chapter and verse for where every equation, correlation, and piece of data came from. It's good having a pedantic technical reviewer who holds you accountable.

So yeah, I appreciate the work that went into your post and the background detail. :)

Aside: Is a commercial chatbot even capable of providing references for its work? My understanding is that all the attribution is laundered away when the LLM is constructed; all it can produce is obsequious hearsay...

@arclight i'm glad you had the documentation for that project, sounds like a life-saver. human communication >>>>
@arclight to your other question, part of the reason there's ambiguity here is that LLMs can *claim* to provide references and to introspect about their output, but that introspection and those references are still just... more output

@jneen We've seen from the legal cases where lawyers were caught out citing fabricated precedents, and from journal papers with fabricated references, that the system will simply stick tokens together to meet its optimization threshold. So even if attribution hadn't been intentionally bleached away, nothing the chatbot emitted could be trusted unless some trustworthy deterministic (non-LLM) system could verify that the citations exist and assess their relevance. *Everything* is a fabrication.
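As a rough sketch of the kind of deterministic check @arclight describes: extract an identifier from a citation string and look it up in a trusted index, returning a verdict rather than a guess. Everything here is invented for illustration; a real system would query an actual bibliographic source (Crossref, a court-records database) instead of an in-memory dict.

```python
import re

# Stand-in for a trusted bibliographic index; in a real system this
# would be a lookup against an authoritative external source.
KNOWN_WORKS = {
    "10.1000/xyz123": "An Example Paper That Actually Exists",
}

# Loose pattern for a DOI ("10.<registrant>/<suffix>").
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")

def verify_citation(text: str) -> dict:
    """Check whether a citation string contains a DOI that resolves
    in the trusted index. Returns a verdict, never a guess."""
    match = DOI_PATTERN.search(text)
    if not match:
        return {"status": "no-doi", "doi": None}
    doi = match.group(0)
    title = KNOWN_WORKS.get(doi)
    if title is None:
        return {"status": "not-found", "doi": doi}
    return {"status": "verified", "doi": doi, "title": title}

print(verify_citation("See 10.1000/xyz123 for details."))   # verified
print(verify_citation("Smith v. Jones, 10.9999/fabricated"))  # not-found
```

The point is the architecture, not the regex: the verdict comes from a deterministic lookup, so a fabricated reference fails closed instead of being waved through.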

What concerns me is not the LLM part of the chatbot - that's just a pile of linear algebra - it's the cobbled-together UI that responds like an obsequious, servile intern: Stepford ELIZA on Prozac. That part of the system is built on 30+ years of dark-pattern research to keep people spending tokens. Right or wrong, as long as users keep spending, the system is operating as designed. The only acceptance test is that the line goes up.

@arclight exactly! i've seen someone ask a chatbot "would you hallucinate if i asked you X" and it's just... not how it works.
@jneen @arclight a point I like to emphasize in these discussions is that if you gave me the same kind of money currently being lit on fire, I could build you a system that takes a natural-language query about code you want to write and comes up with a short list of open-source projects (along with their licenses) that have code that does that thing, plus a pointer to the exact snippet. It would of course sometimes return results that weren't great, and there would be an art to querying it effectively, like any search engine... but it would be a much better programming assistant than code LLMs.
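A toy version of that kind of snippet search could be as simple as keyword scoring over an indexed corpus. Everything below (the projects, snippets, keywords, and scoring) is made up for illustration; a real system would index actual repositories and use real search-engine ranking.

```python
# Toy natural-language code search over a hand-built index.
# Each entry records where a snippet lives and under what license,
# so results always come with attribution.
SNIPPET_INDEX = [
    {
        "project": "example-strutils",   # hypothetical project
        "license": "MIT",
        "location": "src/slug.py:12",
        "keywords": {"slugify", "string", "url", "normalize"},
        "code": 're.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")',
    },
    {
        "project": "example-retry",      # hypothetical project
        "license": "Apache-2.0",
        "location": "retry/backoff.py:40",
        "keywords": {"retry", "backoff", "exponential", "http"},
        "code": "delay = base * 2 ** attempt + random.random()",
    },
]

def search(query: str, limit: int = 3) -> list:
    """Rank snippets by how many query words hit their keyword sets;
    drop anything with no overlap at all."""
    words = set(query.lower().split())
    scored = [(len(words & s["keywords"]), s) for s in SNIPPET_INDEX]
    ranked = [s for score, s in sorted(scored, key=lambda p: -p[0]) if score > 0]
    return ranked[:limit]

for hit in search("turn a string into a url slug"):
    print(hit["project"], hit["license"], hit["location"])
```

The interesting property is exactly the one the thread is about: every result is a pointer to real, licensed code, so there is nothing for the system to fabricate; at worst it returns nothing, or a poorly ranked match.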