Mastodawn

Kent Pitman Mar 4

At the end of @screwlisp's show, in the discussion of @cdegroot's book, @ramin_hal9001 was talking about continuations. I wanted to make a random point that isn't often made about Lisp that I think is important.

I often do binary partitions of languages (like the static/dynamic split, but more exotic), and one of them is whether they are leading or following, let's say. there are some aspects in which scheme is a follower, not a leader, in the sense that it tends to eschew some things that Common Lisp does for a variety of reasons, but one of them is "we don't know how to compile this well". There is a preference for a formal semantics that is very tight and that everything is well-understood. It is perhaps fortunate that Scheme came along after garbage collection was well-worked and did not seem to fear that it would be a problem, but I would say that Lisp had already basically dealt led on garbage collection.

The basic issue is this: Should a language incorporate things that maybe are not really well-understood but just because people need to do them and on an assumption that they might as well standardize the 'gesture' (to use the CLIM terminology) or 'notation' (to use the more familiar) for saying you want to do that thing.

Scheme did not like Lisp macros, for example, and only adopted macros when hygienic macros were worked out. Lisp, on the other hand, started with the idea that macros were just necessary and worried about the details of making them sound later.

Scheme people (and I'm generalizing to make a point here, with apologies for casting an entire group with a broad brush that is probably unfair) think Common Lisp macros more unhygienic than they actually are because they don't give enough credit to things like he package system, which Scheme does not have, and which protects CL users a lot more than they give credit for in avoiding collisions. They also don't fairly understand the degree to which Lisp2 protects from the most common scenarios that would happen all the time in Scheme if there were a symbol-based macro system. So CL isn't really as much at risk these days, but it was a bigger issue before packages, and the point is that Lisp decided it would figure out how to tighten later, but that it was too important to leave out, where Scheme held back design until it knew.

But, and this is where I wanted to get to, Scheme led on continuations. That's a hard problem and while it's possible, it's still difficult. I don't quite remember if the original language feature had fully worked through all the tail call situations in the way that ultimately it did. But it was brave to say that full continuations could be made adequately efficient.

And the Lisp community in general, and here I will include Scheme in that, though on other days I think these communities sufficiently different that I would not, have collectively been much more brave and leading than many languages, which only grudgingly allow functionality that they know how to compile.

In the early days of Lisp, the choice to do dynamic memory management was very brave. It took a long time to make GC's efficient, and generational GC was what finally I think made people believe this could be done well in large address spaces. (In small address spaces, it was possible because touching all the memory to do a GC did not introduce thrashing if data was "paged out". And in modern hardware, memory is cheap, so the size is not always a per se issue.

But there was an intermediate time in which lots of memory was addressable but not fully realized as RAM, only virtualized, and GC was a mess in that space.

The Lisp Machines had 3 different unrelated but co-resident and mutually usable garbage collection strategies that could be separately enabled, 2 of them using hardware support (typed pointers) and one of them requiring that computation cease for a while because the virtual machine would be temporarily inconsistent for the last-ditch thing that particular GC could do to save the day when otherwise things were going to fail badly.

For a while, dynamic memory management would not be used in real time applications, but ultimately the bet Lisp had made on it proved that it could be done, and it drove the doing of it in a way that holding back would not have.

My (possibly faulty) understanding is that the Java GC was made to work by at least some displaced Lisp GC experts, for example. But certainly the choice to make Java be garbage collected probably derives from the Lispers on its design team feeling it was by then a solved problem.

This aspect of languages' designs, whether they lead or follow, whether they are brave or timid, is not often talked about. But i wanted to give the idea some air. It's cool to have languages that can use existing tech well, but cooler I personally think to see designers consciously driving the creation of such tech.

Generational GC changes the way you program and it's not *just* that it's efficient.

We used MIT-Scheme (which, by the early 90s was showing its age). We did all manner of weird optimizing to use memory efficiently. Lots of set! to re-use structure where possible. Or (map! f list) -- same as (map...) but with set-car! to modify in-place -- because it made a HUGE difference not recreating all of those cons cells => bumps memory use => next GC round is that much sooner (and then everything STOPS, because Mark & Sweep). Also stupid (fluid-let ...) tricks to save space in closures.

We were writing Scheme as if it were C because that was how you got speed in that particular world.

1/3

And then Bruce Duba joined the group (had just come from Indiana).

"Guys, you're doing this ALL WRONG",

"Yeah, we know already. It's ugly, impure, and sucks. But it's faster, unfortunately",

"No, you need a better Scheme; you should try Chez".

...and, to be sure, just that much *was* a significant improvement. Chez was much more actively maintained, had a better repertoire of optimizations, etc...

... but the real eye-opener was what happened when we ripped out all of the set! and fluid-let code. That's when we got the multiple-orders-of-magnitude speed improvement.

2/3

See, setq/set! is a total disaster for generational GC. It bashes old-space cells to point to new-space; the premise of generational GC being that this mostly shouldn't happen. The super-often new-generation-only pass is now doing a whole lot of old-space traversal because of all of those cells added to the root set by the set! calls, ... which then loses most of the benefit of generational GC.

(fluid-let and dynamic-wind also became way LESS cheap, mainly due to missing multiple optimization opportunities)

In short, with generational GC, straightforward side-effect-free code wins. It took a while for me to recalibrate my intuitions re what sorts of things were fast/cheap vs not.

3/3

There were other weirdnesses as well.

Even if GC saves you the horror of referencing freed storage, or freeing stuff twice, you still have to worry about memory leaks and moreover, dropping references as fast as you can matters

With copying GC, leaks are useless shit that has to be copied -- yes it eventually ends up in an old generation but until then it's getting copied -- and copying is where generational GC is doing work, and it's stuff unnecessarily surviving to the medium term that hurts you the most (generational GC *relies* on stuff becoming garbage as quickly as possible)

And so, tracking down leaks and finding places to put in weak pointers started mattering more...

4/3

screwlisp Mar 4

@wrog
Did you see the garbage collection handbook's note on performance depending on having about five times as much memory as was technically needed? @dougmerritt
@kentpitman @cdegroot @ramin_hal9001

@screwlisp @kentpitman @cdegroot @ramin_hal9001 @dougmerritt

5? maybe for mark&sweep

but I can't see how more than 2 would ever be necessary for a copying GC. Once you have enough space to copy everything *to* (on the off-chance that absolutely everything actually *needs* to be copied), you're basically done...

... and if you're following the usual pattern where 90% of what you create becomes garbage almost immediately, you can get by with far less.

@screwlisp @kentpitman @cdegroot @dougmerritt

Ramin Honary Mar 4

@wrog Haskell was first invented in 1990 or 91ish, and at that time they had already started to ask questions like, “what if we just ban set! entirely,” abolish mutable variables, make everything lazily evaluated by default. If you have been programming in C/C++ for a while, that abolishing mutable variables would lead to a performance increase seems very counter-intuitive.

But for all the reasons you mentioned about not forcing a search for updated pointers in old-generation GC heaps, and also the fact that this forces the programmer to write their source code such that it is essentially already in the Static-Single-Assignment (SSA) form, which is nowadays an optimization pass that most compilers do prior to register allocation, this allowed for more aggressive optimization to be used and results in more efficient code.

@ramin_hal9001 @screwlisp @wrog @dougmerritt @cdegroot

Kent Pitman Mar 5

The LispM did a nice thing (at some tremendous cost in hardware, I guess, but useful in the early days) by having various kinds of forwarding pointers for this. At least you knew you were going to incur overhead, though, and pricing it properly at least said there was a premium for not side-effecting and tended to cause people to not do it. And the copying GC could fix the problem eventually, so you didn't pay the price forever, though you did pay for having such specific hardware or for cycles in systems trying to emulate that which couldn't hide the overhead cost. I tend to prefer the pricing model over the prohibition model, but I see both sides of that.

If my memory is correct (so yduJ or wrog please fix me if I goof this): MOO, as a language, is in an interesting space in that actual objects are mutable but list structure is not. This observes that it's very unlikely that you allocated an actual object (what CL would call standard class, but the uses are different in MOO because all of those objects are persistent and less likely to be allocated casually, so less likely to be garbage the GC would want to be involved in anyway).

I always say "good" or "bad" is true in a context. It's not true that side effect is good or bad in the abstract, it's a property of how it engages the ecology of other operations and processes.

And, Ramin, the abolishing of mutable variables has other intangible expressional costs, so it's not a simple no-brainer. But yes, if people are locked into a mindset that says such changes couldn't improve performance, they'd be surprised. Ultimately, I prefer to design languages around how people want to express things, and I like occasionally doing mutation even if it's not common, so I like languages that allow it and don't mind if there's a bit of a penalty for it or if one says "don't do this a lot because it's not aesthetic or not efficient or whatever".

To make a really crude analogy, one has free speech in a society not to say the ordinary things one needs to say. Those things are favored speech regardless because people want a society where they can do ordinary things. Free speech is everything about preserving the right to say things that are not popular. So it is not accidental that there are controversies about it. But it's still nice to have it in those situations where you're outside of norms for reasonable reasons. :)

DougMerritt (log😅 = 💧log😄)Mar 5

@kentpitman
> Ultimately, I prefer to design languages around how people want to express things, and I like occasionally doing mutation even if it's not common, so I like languages that allow it and don't mind if there's a bit of a penalty for it or if one says "don't do this a lot because it's not aesthetic or not efficient or whatever".

Me too -- although I remain open to possibilities. Usually such want me to switch paradigms, though, not just add to my toolbox.

@ramin_hal9001 @screwlisp @wrog @cdegroot

@dougmerritt @screwlisp @wrog @cdegroot

Ramin Honary Mar 5

“the abolishing of mutable variables has other intangible expressional costs, so it’s not a simple no-brainer.”

@kentpitman I prefer the term “constraint” to “expressional cost,” because constraints are the difference between a haiku and a long-form essay. For example, I am very curious what the code for the machine learning algorithm that trains an LLM would look like expressed as an APL program. I don’t know, but I get the sense it would be a very beautiful two or three lines of code, as opposed to the same algorithm expressed in C++ which would probably be a hundred or a thousand lines of code.

Not that I disagree with you, on the contrary, that is why I was convinced to switch to Scheme as a more expressive language than Haskell. I like the idea of starting with Scheme as the untyped lambda calculus, and then using it to define more rigorous forms of expression, working your way up to languages like ML or Haskell, as macro systems of Scheme.

Kent Pitman Mar 5

@ramin_hal9001

I'm not 100% positive I understand your use of constraint here, but I think it is more substantive than that. If you want to use the metaphor you've chosen, a haiku reaches close to theoretical minimum of what can be compressed into a statement, while a long-form essay does not. This metaphor is not perfect, though, and will lead astray if looked at too closely, causing an excess focus on differential size, which is not actually the key issue to me.
I won't do it here, but as I've alluded to more than once I think on the LispyGopher show, I believe that it is possible to rigorously assign cost to the loss of expression between languages.

That is, that a transformation of expressional form is not, claims of Turing equivalence notwithstanding, cost-free both in terms of efficiency and in terms of expressional equivalence of the language. It has implications (positive or negative) any time you make such changes.

Put another way, I no longer believe in Turing Equivalence as a practical truth, even if it has theoretical basis.

And I am pretty sure the substantive loss can be expressed rigorously, if someone cared to do it, but because I'm not a formalist, I'm lazy about sketching how to do that in writing, though I think I did so verbally in one of those episodes.

It's in my queue to write about. For now I'll just rest on bold claims. :) Hey, it got Fermat quite a ways, right?

But also, I had a conversation with ChatGPT recently where I convinced it of my position and it says I should write it up... for whatever that's worth. :)

cc @screwlisp @wrog @dougmerritt @cdegroot

DougMerritt (log😅 = 💧log😄)Mar 5

@kentpitman
> That is, that a transformation of expressional form is not, claims of Turing equivalence notwithstanding, cost-free both in terms of efficiency and in terms of expressional equivalence of the language. It has implications (positive or negative) any time you make such changes.

I hope everyone here is already clear that "expressiveness" is something that comes along on *top* of a language's Turing equivalence.

Indeed Turing Machines (and pure typed and untyped lambda calculus and SKI combinatory calculus and so on) are all *dreadful* in terms of expressiveness.

And for that matter, expressiveness can be on top of Turing incomplete languages. Like chess notation; people argue that the algebraic notation is more expressive than the old descriptive notation. (People used to argue in the other direction)

@ramin_hal9001 @screwlisp @wrog @cdegroot