I just want computer-assisted, approximate pattern recognition.

I would like to be able to say, show me in the text I've written what my tics are, what parts of this code I'm writing look similar to other code already written, where it looks like a common error, an old bug, like a frequently-used command I can alias, a habit I might want to break, a cliche to avoid.

I don't want the computer to create more almost-work repetition, I want it to help me see the work I've done clearly.
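
A toy sketch of the "show me my tics" idea, nothing more than counting repeated n-word phrases in your own writing (all names hypothetical):

```python
from collections import Counter

def find_tics(text, n=3, min_count=3):
    """Count every n-word phrase in `text` and return the ones that
    repeat at least `min_count` times -- a rough proxy for writing tics.
    Real approximate matching would need fuzzier comparison than this."""
    words = text.lower().split()
    grams = Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return [(g, c) for g, c in grams.most_common() if c >= min_count]
```

The same shape of tool, pointed at tokenized source instead of prose, is the "what code looks similar to other code" half of the wish.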

"look the computer can generate more code faster" the world absolutely does not need or want more code, nothing needs more code for the sake of code, we need utility, functionality and empathy, an encoded understanding of the problem being solved and the humans around it. Code is the price we pay for that encoded understanding. What you've created is an entropy spigot pointed at the proxy metric graph you’re stuck using because your management doesn't understand anything.
@mhoye So true. I’ve always tried to argue for less code, so there’s less to maintain. Productivity can mean deleting code.
@benjamingeer @mhoye the only good diff is a red diff!

@grimmware @benjamingeer You joke, but one of the biggest differences between junior engineers and senior engineers is precisely how they feel about that.

Juniors who brag about how much code they're writing just do not impress senior engineers who talk about how much redundancy they can prune out or simplify in a codebase.

@benjamingeer @mhoye I think a lot of people who aren't technical think that more code = better/more capable, but really it's like more code = more shit that can break

"The engineer knows their work is finished not when there is nothing left to be added, but when there is nothing left to be taken away."

@sidereal @benjamingeer @mhoye I feel like we stopped hearing the term "feature bloat" a few years ago, as if complaints about it were silenced.
@foolishowl @sidereal @benjamingeer So... I don't actually think "feature bloat" is real; I think it's vanity to believe that just because you don't want or need something that nobody does or should, and I think one of the absolute worst, most regressive and pernicious cultural habits of free software culture is the insistence that This Current Exact Amount Of Computer and What I Know Today is The Correct Amount and that None More Computing Or Learning Is Necessary.
@foolishowl @sidereal @benjamingeer That said I also think that the human-factors/usability crime scene that is Actually It's Gnu Linux Or As I've Taken To Calling It Gnuplus Linux has made it a _structural_ challenge to add utility to tools in a way that doesn't immediately create cognitive burdens for operators, but that the software industry's relentless insistence that every year or two an "update" is actually Surprise, You Get A Whole-Ass Different Product To Relearn is a genuine nightmare.
@foolishowl @sidereal @benjamingeer And... this all drags back to almost everything I see supposed AI doing well. Boilerplate, filling out yaml, people talking about how productive they are because it solves problems that are kinda unrelated to a problem anyone cares about, they're just tedium that hasn't been detediumized yet. A dowsing rod for usability problems. And that's not nothing, but it's a tool that's creating an acre of YAML because nobody with opinions has made a decision tree.
@mhoye @sidereal @benjamingeer While I can see the validity of complaints about FOSS reactionaries, in my daily work I keep seeing apps get updated with more features and literally incomprehensible user interfaces. And it's something I hear people complain about a lot.
@foolishowl @sidereal @benjamingeer Yeah, that's my argument - it's not the new features that are at issue, it's the forced relearn-everything-else we're so cavalier about.
@mhoye In a new job, I got handed a chunk of code once, “see what you can do with this”. After a year, it did more, was between 2x and 100x faster, peak memory usage had gone from 10s of GB to 100s of MB, and the LOC went from ~12000 to ~9000. Clearly my productivity was negative! For over a year! 😂
@UweHalfHand @mhoye I recall a post from long ago about a software developer who did this kind of factoring, eliminating lots of duplication and obsolete code, who was then fired because his company was maintaining the code under a contract that paid in part by lines of code produced. The code might have been as bloated as it was because of that incentive.
@not2b @mhoye That’s horrible! But I can see how that incentive would produce such an outcome. In my case, that was fortunately not so.

@mhoye saw this initially in isolation via boost and interpreted the "code" part of "generate more code faster" as "machine code", and i was like...

"but... but... faster codegen gives you more edit/compile iterations per hour, which means you get to experiment more, which helps you to explore more of the solution space, which..."

then i scrolled up and realized you meant "gen-a.i.-vomited source code", in which case: yes; agreed.

@mhoye

> "What you've created is an entropy spigot pointed at the proxy metric graph we're using because your management doesn't understand anything."

I want this translated into Latin and tattooed on my bum.

The flourish with which I shall despatch the opposition will make me the darling of the debating club.

@doboprobodyne @mhoye

"Id quod creavisti est puteus entropiae ad graphium mensurae substitutae directus, quem adhibemus quia administratio tua nihil intellegit."

@greve @doboprobodyne @mhoye lmao Of *course* someone here would actually provide the translation!
@mhoye One of my biggest gripes with modern software is that it's so much easier to just keep slapping more layers and libraries and middleware in there (and machines are fast enough to mask much of the absurd amount of cruft that must execute constantly) than actually making things simple and clean and small and stable and understandable. Accelerating that process of accretion of layers is completely the opposite direction I want to go in.

@mhoye

The student asked the master, "how should one develop code?"

The master replied "the novice always creates more code; the middling maintains code; the skillful eliminates code."

The student then asked "but what does the sage do?"

The master asked "with what?"

At that moment, the student was enlightened.

@mhoye This is the fundamental problem

Every previous advance in computing was an abstraction. You delegate some thinking and testing, and get to be responsible for less code.

LLMs increase the volume of code that you personally are responsible for. And, at present, it’s easier for them to help you generate stuff than help you understand what you’ve done.

There is a trap here but the tool providers must be aware of it. I’m not aware of any theoretical reason why they can’t improve.

@neilk @mhoye "the tool providers must be aware of it" Must they? I have seen no evidence of that yet. 🤷

"... not aware of any theoretical reason why they can't improve" This strikes me as true but unhelpful? Like, LLM code generation seems genuinely surprising to 2020-me; I personally am not optimistic that it will turn out to be genuinely useful for programmers, but that is a live debate.

But regardless, it isn't even *tackling* the "help you understand the codebase" problem.

@esnyder @mhoye I have worked for large corporations and also been on the outside making disruptive / open source tech

100% of the time, many if not most of the people at the big corp are fully aware of the problems with their product. 100%.

Unlike the rest of the world, the big corp has a product pipeline, complex internal politics, existing revenue streams, message discipline, paid PR, a sympathetic business press, and an investor story to maintain.

@esnyder @mhoye Consider what would happen if OpenAI declared tomorrow that they’re not really sure how to manage code gen and many customers are struggling with adoption

Also consider the case where they are racing like mad to solve this, and think they have an answer, but it isn’t ready yet. They not only can’t announce it, they can’t even let on they know it’s a problem

https://en.m.wikipedia.org/wiki/Osborne_effect


@mhoye funnily enough, I found that "AI" works best as autocompletion, at times where your code could absolutely be simplified in some way. In a way, if AI autocompletion works well, it's likely that you have an opportunity to simplify your stuff.
@mhoye I spent the majority of my later career _unwriting_ code I had written earlier. I would frequently joke to other engineers that "I'd like to get into negative total code, but if I can at least get back to zero I can stop"
@mhoye I've heard the same sentiment expressed about scientific research. More research papers generated does not equal more scientific progress.
@mhoye Fundamental misunderstanding of efficiency
@mhoye "Code is a liability, not an asset."
@mhoye Exactly. And that means LESS code. If AI creates more code, that means IT creates more bureaucracy.

@mhoye

I'm wondering, from anyone in the industry: what percentage of their time do programmers/coders spend generating new code for new problems, versus fixing, improving, implementing, and testing?

@mhoye

This is an ancient complaint. IBM, near the beginning of coding, wondered how to assess productivity and decided to do so with "k-locs": thousands of lines of code. The people around the birth of Microsoft were properly scornful of this measure.

But it keeps coming back.

@mhoye 🤷Actually, I've been experimenting, and that's not exactly true.

Describing to an AI assistant what you want usually takes longer than writing the code.

It might result in better thought out code because you have to verbalize the requirements, and the idiot intern will almost certainly take every shortcut that you did not specify explicitly in your description.

But for stuff where I don't have to learn the language & environment at the same time as coding, I'm surprisingly faster solo.

Now starting to use packages I don't know and that don't have a good, proper documentation? Suddenly the AI ("having read the whole Internet") might have an idea (in the sense of a statistical probability) how to use the package, while I might have to read source code or experiment how to use it properly.
@mhoye They're the same people who, if they become software engineers, believe that since the problem was code, the answer must be more code.
@mhoye the last time some dumbass went and decided to evaluate his engineers by how many lines of code they had submitted we all mocked him, and for good reason
@mhoye oh gosh I think a lot about this too. So many problem spaces where we want personalization in a good way, an idiographic look at *our own activity* that allows us to reflect and identify areas for change, and so many places where that desire is instead being met with an approach of trying to align an individual's patterns to some mythical population average, with a core assumption that we want uniformity and homogeneity, which is not at all the same motivator and need
@mhoye so, too, the frustration of being met with a doctor telling you about population averages when you want a polygenic risk score; we have so much information cut OUT of these seemingly "data rich" approaches

@grimalkina We talked the other day about deliberate and structured practice, and I wonder basically all the time now what I could learn about myself if I could actually see my efforts the way the machine sees them, from a wider view than I can muster, the patterns I can't see because I'm too close or too far away from them, how I could share what I learned with other people also trying to learn.

Computers could be so good for us.

@mhoye this is poignant for me too because with my life experience and coming to coding entirely self-taught, I frequently want to be able to access "how someone else would think about this" in a depthful and systemic way, deeper than I can get from cobbling together snippets in forums with context that's far afield from what I'm doing. The personalized tutor idea is and has always been a really incredible idea.
@mhoye and people always leap to thinking "you want assistance to replace your work," but I actually would want this kind of thing to expand my work, I have no doubt I would expose myself to more examples of other types of problem-solving, more experimentation, if I just had a few more bridges to get over what I don't know I don't know
@mhoye Reminded of when I was pitching our startup and VCs would say, "you're trying to build 'in the moment' documentation for developers that doesn't interrupt their train of thought and allows them to create more intent and decision context alongside the code they're writing....for why??? for other developers?? They have stack overflow for that"
@grimalkina @mhoye That paragraph is an arrow through my heart, thank you.
@grimalkina This thing where being honest with the current systems means you're either coerced towards some statistical mean Becaus Helth or identified as an exploitable outlier whale is just so demoralizing, the actual opposite of what would benefit us. I'm not here to choose between being Average Guy, Money Spigot or Non-Factor, I want to grow into whatever's new about me.
@grimalkina And... look, in an environment where trust and even intimacy are possible, to be able to share not what someone understands about themselves but in some discursive way how they came to understand is maybe revolutionary, just in terms of humans learning to understand humans. There's so much possible there that isn't even an accessible hypothetical if we're trying to scrape fractions of pennies out of individual interactions so made-up needles on made-up dials wiggle towards made-up numbers.

@mhoye I think there are many opportunities like this to make really clever improvements to existing tools with AI, especially around search. The problem is that you need to really understand the models and take them apart and put them back together again in very specific configurations.

That takes a lot more skill than just calling the ChatGPT API with "write this code for me".

@pbloem I don't think it means trusting current models at all. Large models trained on anonymously obtained data are just all garbage when you're trying to tailor something to your own personal self. You might as well be trying to create a romantic four course meal for your date out of the bag of hotdogs you found on the curb on the way home.

@mhoye I think I disagree.

Untransparent training data is bad ethically, but these models are representation learners as well as generators. The text embeddings they provide are incredibly powerful, well-disentangled and pretty objective.

You could easily use them to do lookups for the things you described without ever sampling any text.

@mhoye A model like Comma (mostly) solves the data issue, and isn't too big.

https://huggingface.co/common-pile


@mhoye I just want something that can spot and fix when I type ; in text and really meant '.
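
That one is almost tractable with a naive heuristic, a semicolon wedged between two word characters was probably a slip of the finger reaching for the apostrophe (a smarter version would check the result against a dictionary of contractions):

```python
import re

# A semicolon with word characters on both sides, e.g. "don;t".
_MID_WORD_SEMI = re.compile(r"\b(\w+);(\w+)\b")

def fix_semicolons(text):
    """Replace mid-word semicolons with apostrophes ("don;t" -> "don't")
    while leaving real semicolons, followed by a space, alone."""
    return _MID_WORD_SEMI.sub(r"\1'\2", text)
```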

@mhoye *lots* of what people describe AI-code-assistance as helping them with, I would have approached with code slushpiles/skeletons/personal comments/libraries. Efficiency, but increasing my understanding, maybe even my team's understanding or everyone's, if it got to "library".

My take is that that got outsourced to Stack Overflow just long enough to be forgotten.

@mhoye started researching something similar to that a couple of years ago

gave up due to the reception ("oh cool this will enable more AI-based code to-" ok we're done, not continuing this project)