@screwlisp is having some site connectivity problems so asked me to remind everyone that we'll be on the anonradio forum at the top of the hour (a bit less than ten minutes hence) for those who like that kind of thing:

https://anonradio.net:8443/anonradio

He'll also be monitoring LambdaMOO at "telnet lambda.moo.mud.org 8888" for those who do that kind of thing. There are also Emacs clients you should get if you're REALLY using telnet.

Topics for today, I'm told, may include the climate, the war, the oil price hikes, some rambles I've recently posted on CLIM, and the book by @cdegroot called The Genius of Lisp, which we'll also revisit next week.

cc @ramin_hal9001

#LispyGopher #Gopher #Lisp #CommonLisp

At the end of @screwlisp's show, in the discussion of @cdegroot's book, @ramin_hal9001 was talking about continuations. I wanted to make a random point that isn't often made about Lisp that I think is important.

I often do binary partitions of languages (like the static/dynamic split, but more exotic), and one of them is whether they are leading or following, let's say. There are some aspects in which Scheme is a follower, not a leader, in the sense that it tends to eschew some things that Common Lisp does for a variety of reasons, but one of them is "we don't know how to compile this well". There is a preference for a formal semantics that is very tight and in which everything is well-understood. It is perhaps fortunate that Scheme came along after garbage collection was well-worked and did not seem to fear that it would be a problem, but I would say that Lisp had already basically led on garbage collection.

The basic issue is this: Should a language incorporate things that are perhaps not really well-understood, simply because people need to do them, on the assumption that it might as well standardize the 'gesture' (to use the CLIM terminology) or 'notation' (to use the more familiar) for saying you want to do that thing?

Scheme did not like Lisp macros, for example, and only adopted macros when hygienic macros were worked out. Lisp, on the other hand, started with the idea that macros were just necessary and worried about the details of making them sound later.

Scheme people (and I'm generalizing to make a point here, with apologies for casting an entire group with a broad brush that is probably unfair) think Common Lisp macros more unhygienic than they actually are because they don't give enough credit to things like the package system, which Scheme does not have, and which protects CL users a lot more than they give credit for in avoiding collisions. They also don't fairly understand the degree to which Lisp2 protects from the most common scenarios that would happen all the time in Scheme if there were a symbol-based macro system. So CL isn't really as much at risk these days, but it was a bigger issue before packages, and the point is that Lisp decided it would figure out how to tighten later, but that it was too important to leave out, where Scheme held back design until it knew.

But, and this is where I wanted to get to, Scheme led on continuations. That's a hard problem and while it's possible, it's still difficult. I don't quite remember if the original language feature had fully worked through all the tail call situations in the way that ultimately it did. But it was brave to say that full continuations could be made adequately efficient.

And the Lisp community in general, and here I will include Scheme in that, though on other days I think these communities sufficiently different that I would not, have collectively been much more brave and leading than many languages, which only grudgingly allow functionality that they know how to compile.

In the early days of Lisp, the choice to do dynamic memory management was very brave. It took a long time to make GCs efficient, and generational GC was what finally I think made people believe this could be done well in large address spaces. (In small address spaces, it was possible because touching all the memory to do a GC did not introduce thrashing if data was "paged out". And in modern hardware, memory is cheap, so the size is not always a per se issue.)

But there was an intermediate time in which lots of memory was addressable but not fully realized as RAM, only virtualized, and GC was a mess in that space.

The Lisp Machines had 3 different unrelated but co-resident and mutually usable garbage collection strategies that could be separately enabled, 2 of them using hardware support (typed pointers) and one of them requiring that computation cease for a while because the virtual machine would be temporarily inconsistent for the last-ditch thing that particular GC could do to save the day when otherwise things were going to fail badly.

For a while, dynamic memory management would not be used in real time applications, but ultimately the bet Lisp had made on it proved that it could be done, and it drove the doing of it in a way that holding back would not have.

My (possibly faulty) understanding is that the Java GC was made to work by at least some displaced Lisp GC experts, for example. But certainly the choice to make Java be garbage collected probably derives from the Lispers on its design team feeling it was by then a solved problem.

This aspect of languages' designs, whether they lead or follow, whether they are brave or timid, is not often talked about. But I wanted to give the idea some air. It's cool to have languages that can use existing tech well, but cooler I personally think to see designers consciously driving the creation of such tech.

@kentpitman @screwlisp @cdegroot @ramin_hal9001

Generational GC changes the way you program and it's not *just* that it's efficient.

We used MIT-Scheme (which, by the early 90s was showing its age). We did all manner of weird optimizing to use memory efficiently. Lots of set! to re-use structure where possible. Or (map! f list) -- same as (map...) but with set-car! to modify in-place -- because it made a HUGE difference not recreating all of those cons cells => bumps memory use => next GC round is that much sooner (and then everything STOPS, because Mark & Sweep). Also stupid (fluid-let ...) tricks to save space in closures.
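To make the (map ...) versus (map! ...) trade-off concrete, here is a rough Python sketch. The `Cons` class, the allocation counter, and the function names are made up for illustration; this is not how MIT Scheme implements either operation, just a model of "allocate a fresh cell per element" versus "overwrite the car in place".

```python
class Cons:
    """A minimal cons cell: `car` holds the value, `cdr` the rest of the list."""
    allocated = 0  # counts every cell ever created (illustrative only)

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr
        Cons.allocated += 1


def from_list(items):
    """Build a cons list from a Python list."""
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def to_list(cell):
    """Flatten a cons list back into a Python list."""
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


def map_fresh(f, cell):
    """Like (map f list): allocates one new cell per element."""
    if cell is None:
        return None
    return Cons(f(cell.car), map_fresh(f, cell.cdr))


def map_in_place(f, cell):
    """Like the (map! f list) described above: overwrites each car, allocates nothing."""
    head = cell
    while cell is not None:
        cell.car = f(cell.car)  # the set-car! step
        cell = cell.cdr
    return head
```

With a three-element list, `map_fresh` bumps `Cons.allocated` by three and leaves the input alone, while `map_in_place` allocates nothing and returns the same (now modified) list.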

We were writing Scheme as if it were C because that was how you got speed in that particular world.

1/3

@kentpitman @screwlisp @cdegroot @ramin_hal9001

And then Bruce Duba joined the group (had just come from Indiana).

"Guys, you're doing this ALL WRONG",

"Yeah, we know already. It's ugly, impure, and sucks. But it's faster, unfortunately",

"No, you need a better Scheme; you should try Chez".

...and, to be sure, just that much *was* a significant improvement. Chez was much more actively maintained, had a better repertoire of optimizations, etc...

... but the real eye-opener was what happened when we ripped out all of the set! and fluid-let code. That's when we got the multiple-orders-of-magnitude speed improvement.

2/3

@kentpitman @screwlisp @cdegroot @ramin_hal9001

See, setq/set! is a total disaster for generational GC. It bashes old-space cells to point to new-space; the premise of generational GC being that this mostly shouldn't happen. The super-often new-generation-only pass is now doing a whole lot of old-space traversal because of all of those cells added to the root set by the set! calls, ... which then loses most of the benefit of generational GC.
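A toy model of why this hurts, as a Python sketch. Everything here (`Cell`, `ToyHeap`, `set_ref`, `minor_gc`) is hypothetical and far simpler than any real collector, which would typically use something like card marking or a hardware-assisted barrier. It only shows the bookkeeping: every store of a new-space pointer into an old-space cell adds that cell to the set the next new-generation pass must visit.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.ref = None    # a single outgoing pointer, for simplicity
        self.old = False   # False = new-space, True = old-space


class ToyHeap:
    def __init__(self):
        self.young = []        # cells allocated since the last minor GC
        self.remembered = []   # old cells mutated to point into new-space

    def alloc(self, name, ref=None):
        """Allocate in new-space. An initializing store needs no barrier."""
        cell = Cell(name)
        cell.ref = ref
        self.young.append(cell)
        return cell

    def set_ref(self, cell, target):
        """The set!/set-car! analogue, with a write barrier."""
        if cell.old and target is not None and not target.old:
            if cell not in self.remembered:
                self.remembered.append(cell)
        cell.ref = target

    def minor_gc(self, roots):
        """Collect new-space only. Returns (survivors, old cells scanned)."""
        scanned_old = len(self.remembered)
        pending = [r for r in roots if not r.old]
        for old_cell in self.remembered:
            if old_cell.ref is not None and not old_cell.ref.old:
                pending.append(old_cell.ref)
        live = set()
        while pending:
            cell = pending.pop()
            if id(cell) in live:
                continue
            live.add(id(cell))
            if cell.ref is not None and not cell.ref.old:
                pending.append(cell.ref)
        survivors = [c for c in self.young if id(c) in live]
        for c in survivors:
            c.old = True   # promote survivors
        self.young = []
        self.remembered = []
        return len(survivors), scanned_old
```

In this model, building 100 fresh cells that point *at* old cells costs the next minor pass nothing extra, whereas 100 `set_ref` calls that bash old cells to point at fresh ones force it to visit 100 old-space cells and keep 100 temporaries alive.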

(fluid-let and dynamic-wind also became way LESS cheap, mainly due to missing multiple optimization opportunities)

In short, with generational GC, straightforward side-effect-free code wins. It took a while for me to recalibrate my intuitions re what sorts of things were fast/cheap vs not.

3/3

@kentpitman @screwlisp @cdegroot @ramin_hal9001

There were other weirdnesses as well.

Even if GC saves you the horror of referencing freed storage, or freeing stuff twice, you still have to worry about memory leaks and moreover, dropping references as fast as you can matters

With copying GC, leaks are useless shit that has to be copied -- yes it eventually ends up in an old generation but until then it's getting copied -- and copying is where generational GC is doing work, and it's stuff unnecessarily surviving to the medium term that hurts you the most (generational GC *relies* on stuff becoming garbage as quickly as possible)

And so, tracking down leaks and finding places to put in weak pointers started mattering more...

4/3

@wrog
Did you see the garbage collection handbook's note on performance depending on having about five times as much memory as was technically needed? @dougmerritt
@kentpitman @cdegroot @ramin_hal9001

@screwlisp @kentpitman @cdegroot @ramin_hal9001 @dougmerritt

5? maybe for mark&sweep

but I can't see how more than 2 would ever be necessary for a copying GC. Once you have enough space to copy everything *to* (on the off-chance that absolutely everything actually *needs* to be copied), you're basically done...

... and if you're following the usual pattern where 90% of what you create becomes garbage almost immediately, you can get by with far less.
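For anyone who wants to see the "2x is enough" argument in code, here is a minimal Cheney-style copy in Python. The heap representation (a list of `(payload, refs)` pairs indexed by position) and the name `copy_collect` are assumptions made for this sketch, not any real collector's layout. The point is only that to-space can never end up bigger than from-space, and is much smaller when most objects are garbage.

```python
def copy_collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    from_space: list of (payload, [indices of referenced objects]) pairs.
    roots: indices into from_space.
    Returns (to_space, new_roots), with all indices rewritten for to-space.
    """
    forward = {}    # from-space index -> to-space index (forwarding pointers)
    to_space = []

    def copy(i):
        if i not in forward:
            forward[i] = len(to_space)
            payload, refs = from_space[i]
            # refs are still from-space indices until the scan reaches this object
            to_space.append([payload, list(refs)])
        return forward[i]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):   # the Cheney scan pointer chases the allocation pointer
        obj = to_space[scan]
        obj[1] = [copy(r) for r in obj[1]]
        scan += 1
    return to_space, new_roots
```

Each live object is copied exactly once (the forwarding table handles cycles and sharing), so the worst case is a to-space the same size as from-space.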

@wrog Haskell was first invented in 1990 or 91ish, and at that time they had already started to ask questions like, “what if we just ban set! entirely,” abolish mutable variables, make everything lazily evaluated by default. If you have been programming in C/C++ for a while, that abolishing mutable variables would lead to a performance increase seems very counter-intuitive.

But for all the reasons you mentioned about not forcing a search for updated pointers in old-generation GC heaps, and also because it forces the programmer to write source code that is essentially already in Static Single Assignment (SSA) form, which nowadays is something most compilers construct in a pass prior to register allocation, this allowed more aggressive optimization and resulted in more efficient code.
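As a rough illustration of what "already in SSA form" means, here is a small Python sketch that renames straight-line assignments so every name is assigned exactly once. The function `to_ssa` and the `(target, operands)` statement format are invented for this example, and it deliberately ignores branches and loops, which in a real compiler need phi functions.

```python
def to_ssa(statements):
    """Rename straight-line assignments so each name is assigned exactly once.

    statements: list of (target, [operand names]) pairs, in program order.
    Names that are never assigned (inputs) are left unrenamed.
    """
    version = {}   # variable name -> number of assignments seen so far
    out = []
    for target, operands in statements:
        # Each use refers to the most recent version of the variable.
        renamed = [
            "%s%d" % (name, version[name]) if name in version else name
            for name in operands
        ]
        version[target] = version.get(target, 0) + 1
        out.append(("%s%d" % (target, version[target]), renamed))
    return out


# x = a + b; x = x + c; y = x   (operators omitted, only names matter here)
program = [("x", ["a", "b"]), ("x", ["x", "c"]), ("y", ["x"])]
ssa = to_ssa(program)
```

Here the two assignments to `x` become `x1` and `x2`, and `y1` reads `x2`. Code written without mutation looks like the renamed version from the start, which is the sense in which the programmer has done this part of the work by hand.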

@screwlisp @kentpitman @cdegroot @dougmerritt

@ramin_hal9001 @screwlisp @wrog @dougmerritt @cdegroot

The LispM did a nice thing (at some tremendous cost in hardware, I guess, but useful in the early days) by having various kinds of forwarding pointers for this. At least you knew you were going to incur overhead, though, and pricing it properly at least said there was a premium for not side-effecting and tended to cause people to not do it. And the copying GC could fix the problem eventually, so you didn't pay the price forever, though you did pay for having such specific hardware or for cycles in systems trying to emulate that which couldn't hide the overhead cost. I tend to prefer the pricing model over the prohibition model, but I see both sides of that.

If my memory is correct (so yduJ or wrog please fix me if I goof this): MOO, as a language, is in an interesting space in that actual objects are mutable but list structure is not. This observes that it's very unlikely that you allocated an actual object (what CL would call standard class, but the uses are different in MOO because all of those objects are persistent and less likely to be allocated casually, so less likely to be garbage the GC would want to be involved in anyway).

I always say "good" or "bad" is true in a context. It's not true that side effect is good or bad in the abstract, it's a property of how it engages the ecology of other operations and processes.

And, Ramin, the abolishing of mutable variables has other intangible expressional costs, so it's not a simple no-brainer. But yes, if people are locked into a mindset that says such changes couldn't improve performance, they'd be surprised. Ultimately, I prefer to design languages around how people want to express things, and I like occasionally doing mutation even if it's not common, so I like languages that allow it and don't mind if there's a bit of a penalty for it or if one says "don't do this a lot because it's not aesthetic or not efficient or whatever".

To make a really crude analogy, one has free speech in a society not to say the ordinary things one needs to say. Those things are favored speech regardless because people want a society where they can do ordinary things. Free speech is everything about preserving the right to say things that are not popular. So it is not accidental that there are controversies about it. But it's still nice to have it in those situations where you're outside of norms for reasonable reasons. :)

@kentpitman
> Ultimately, I prefer to design languages around how people want to express things, and I like occasionally doing mutation even if it's not common, so I like languages that allow it and don't mind if there's a bit of a penalty for it or if one says "don't do this a lot because it's not aesthetic or not efficient or whatever".

Me too -- although I remain open to possibilities. Usually such want me to switch paradigms, though, not just add to my toolbox.

@ramin_hal9001 @screwlisp @wrog @cdegroot

“the abolishing of mutable variables has other intangible expressional costs, so it’s not a simple no-brainer.”

@kentpitman I prefer the term “constraint” to “expressional cost,” because constraints are the difference between a haiku and a long-form essay. For example, I am very curious what the code for the machine learning algorithm that trains an LLM would look like expressed as an APL program. I don’t know, but I get the sense it would be a very beautiful two or three lines of code, as opposed to the same algorithm expressed in C++ which would probably be a hundred or a thousand lines of code.

Not that I disagree with you, on the contrary, that is why I was convinced to switch to Scheme as a more expressive language than Haskell. I like the idea of starting with Scheme as the untyped lambda calculus, and then using it to define more rigorous forms of expression, working your way up to languages like ML or Haskell, as macro systems of Scheme.

@dougmerritt @screwlisp @wrog @cdegroot

@ramin_hal9001

I'm not 100% positive I understand your use of constraint here, but I think it is more substantive than that. If you want to use the metaphor you've chosen, a haiku reaches close to theoretical minimum of what can be compressed into a statement, while a long-form essay does not. This metaphor is not perfect, though, and will lead astray if looked at too closely, causing an excess focus on differential size, which is not actually the key issue to me.
I won't do it here, but as I've alluded to more than once I think on the LispyGopher show, I believe that it is possible to rigorously assign cost to the loss of expression between languages.

That is, that a transformation of expressional form is not, claims of Turing equivalence notwithstanding, cost-free both in terms of efficiency and in terms of expressional equivalence of the language. It has implications (positive or negative) any time you make such changes.

Put another way, I no longer believe in Turing Equivalence as a practical truth, even if it has theoretical basis.

And I am pretty sure the substantive loss can be expressed rigorously, if someone cared to do it, but because I'm not a formalist, I'm lazy about sketching how to do that in writing, though I think I did so verbally in one of those episodes.

It's in my queue to write about. For now I'll just rest on bold claims. :) Hey, it got Fermat quite a ways, right?

But also, I had a conversation with ChatGPT recently where I convinced it of my position and it says I should write it up... for whatever that's worth. :)

cc @screwlisp @wrog @dougmerritt @cdegroot

@kentpitman
> That is, that a transformation of expressional form is not, claims of Turing equivalence notwithstanding, cost-free both in terms of efficiency and in terms of expressional equivalence of the language. It has implications (positive or negative) any time you make such changes.

I hope everyone here is already clear that "expressiveness" is something that comes along on *top* of a language's Turing equivalence.

Indeed Turing Machines (and pure typed and untyped lambda calculus and SKI combinatory calculus and so on) are all *dreadful* in terms of expressiveness.

And for that matter, expressiveness can be on top of Turing incomplete languages. Like chess notation; people argue that the algebraic notation is more expressive than the old descriptive notation. (People used to argue in the other direction)

@ramin_hal9001 @screwlisp @wrog @cdegroot

@dougmerritt @kentpitman @ramin_hal9001 @screwlisp @cdegroot

[..it's possible I'm missing the point, but I'm going to launch anyway...]

I believe trying to define/formalize "expressiveness" is roughly as doomed as trying to define/formalize "intelligence". w.r.t. the latter, there's been nearly a century of bashing on this since Church and Turing and we're still no further along than "we know it when we see it"

(and I STILL think that was Turing's intended point in proposing his Test, i.e., if you can fool a human into thinking it's intelligent, you're done; that this is the only real test we've ever had is a testament to how ill-defined the concept is...)

1/11

@dougmerritt @kentpitman @ramin_hal9001 @screwlisp @cdegroot

The point of Turing equivalence is that even though we have different forms for expressing algorithms and there are apparently vast differences in comprehensibility, they all inter-translate, so any differences in what can ultimately be achieved by the various forms of expression are an illusion. We have, thus far, only one notion of computability.

(which is not to say there can't be others out there, but nobody's found them yet)

2/11

@dougmerritt @kentpitman @ramin_hal9001 @screwlisp @cdegroot

I believe expressiveness is a cognition issue, i.e., having to do with how the human brain works and how we learn. If you train yourself to recognize certain kinds of patterns, then certain kinds of problems become easier to solve.
... and right there I've just summarized every mathematics, science, and programming curriculum on the planet.

What's "easy" depends on the patterns you've learned. The more patterns you know, the more problems you can solve. Every time you can express a set of patterns as sub-patterns of one big super-pattern small enough to keep in your head, that's a win.

I'm not actually sure there's anything more to "intelligence" than this.

3/11

@dougmerritt @kentpitman @ramin_hal9001 @screwlisp @cdegroot

I still remember trying to teach my dad about recursion.

He was a research chemist. At some point he needed to do some hairy statistical computations that were a bit too much for the programmable calculators he had in his lab. Warner-Lambert research had just gotten some IBM mainframe -- this was early 1970s, and so he decided to learn FORTRAN -- and he became one of their local power-users.

Roughly in the same time-frame, 11-year-old me found a DEC-10 manual one of my brothers had brought home from college. It did languages.

Part 1 was FORTRAN.
Part 2 was Basic.

But it was the last section of the book that was the acid trip.

Part 3 was about Algol.

4/11

@dougmerritt @kentpitman @ramin_hal9001 @screwlisp @cdegroot

This was post-Algol-68, but evidently the DEC folks were not happy with Algol-68 (I found out later *nobody* was happy with Algol-68), so ... various footnotes about where they deviated from the spec; not that I had any reason to care at that point.

I encountered the recursive definition of factorial and I was like,

"That can't possibly work."

(the FORTRAN and Basic manuals were super clear about how each subprogram has its dedicated storage; calling one while it was still active is every bit an error like dividing by zero. You're just doing it wrong...)

5/11

@dougmerritt @kentpitman @ramin_hal9001 @screwlisp @cdegroot

Then there was the section on call-by-name (the default parameter passing convention for Algol)

... including a half page on Jensen's Device, that, I should note, was presented COMPLETELY UN-IRONICALLY because this was still 1972,

as in, "Here's this neat trick that you'll want to know about."
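For readers who have not met Jensen's Device: under call-by-name the argument expression is re-evaluated every time the callee uses it, so passing both a loop variable and an expression mentioning that variable lets one summation routine sum anything. Python has no call-by-name, so this sketch fakes it with a setter and a zero-argument lambda; the names `jensen_sum`, `env`, and `set_i` are made up for the example.

```python
def jensen_sum(set_i, lo, hi, term):
    """Sum term() for i = lo..hi.

    `set_i` assigns the caller's loop variable and `term` is a thunk that is
    re-evaluated on every iteration, imitating Algol's call-by-name.
    """
    total = 0
    for k in range(lo, hi + 1):
        set_i(k)
        total += term()
    return total


env = {"i": 0}        # the caller's variable i, boxed so the callee can assign it
a = [1, 2, 3, 4]


def set_i(value):
    env["i"] = value


# sum of i*i for i = 1..4
squares = jensen_sum(set_i, 1, 4, lambda: env["i"] * env["i"])
# sum of 2*a[i] for i = 0..3
doubled = jensen_sum(set_i, 0, 3, lambda: 2 * a[env["i"]])
```

The same `jensen_sum` computes 30 for the squares and 20 for the doubled array, purely because `term` is evaluated afresh after each assignment to `i`.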

And my reaction was, "WTFF, why???"

and also, "That can't possibly work, either."

Not having any actual computers to play with yet, that was that for a while.

Some years later, I got to college and had my first actual programming course...

6/11

@dougmerritt @kentpitman @ramin_hal9001 @screwlisp @cdegroot

... in Pascal. And there I finally learned about and was able to get used to using recursion.

Although I'd say I didn't *really* get it until the following semester taking the assembler course and learning about *stacks*.

It was like recursion was sufficiently weird that I didn't really want to trust it until/unless I had a sense of what was actually happening under the hood,

And THEN it was cool.

7/11

@dougmerritt @kentpitman @ramin_hal9001 @screwlisp @cdegroot

To the point where, the following summer as an intern, I was needing to write a tree walk, and I wrote it in FORTRAN -- because that's what was available at AT&T Basking Ridge (long story) -- using fake recursion (local vars get dimensions as arrays, every call/return becomes a computed goto, you get the idea…) because I wanted to see if this *could* actually be done in FORTRAN, and it could, and it worked, and there was much rejoicing; I think my supervisor (who, to be fair, was not really a programmer) blue-screened on that one.
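A reconstruction of the general idea in Python, not the original FORTRAN: the tree lives in parallel arrays, the "local variable" of each active call is saved in a stack array, and a `label` variable plays the role of the computed goto. The array layout (1-based, with 0 meaning "no child") and all names are assumptions made for this sketch.

```python
def inorder_fake_recursion(left, right, value, root):
    """In-order tree walk with no recursive calls.

    left, right, value: parallel arrays indexed by node number (index 0 unused);
    a child of 0 means "no child". Returns the values in in-order sequence.
    """
    max_depth = len(value)
    node_stack = [0] * max_depth     # saved local `node` for each active call
    resume_stack = [0] * max_depth   # label to resume at in the caller
    sp = 0
    out = []
    node = root
    label = 0
    while True:
        if label == 0:                # entry point of walk(node)
            if node == 0:
                label = 3             # empty subtree: return immediately
                continue
            node_stack[sp] = node     # "call" walk(left[node]), resume at label 1
            resume_stack[sp] = 1
            sp += 1
            node = left[node]
        elif label == 1:              # back from the left subtree
            out.append(value[node])
            node_stack[sp] = node     # "call" walk(right[node]), resume at label 2
            resume_stack[sp] = 2
            sp += 1
            node = right[node]
            label = 0
        elif label == 2:              # back from the right subtree
            label = 3
        else:                         # label 3: return to the caller
            if sp == 0:
                return out
            sp -= 1
            node = node_stack[sp]
            label = resume_stack[sp]
```

For a three-node tree with node 1 as root, node 2 on its left and node 3 on its right, the walk visits 2, 1, 3 without a single nested call.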

And *then* I tried to explain it all to my dad...

8/11

@dougmerritt @kentpitman @ramin_hal9001 @screwlisp @cdegroot

You may say that untyped lambda calculus and SKI combinatory calculus (and so on) are all *dreadful* in terms of expressiveness, and I will probably agree,

... but it also seems to me that Barendregt got pretty good at it.

I'm also guessing TECO wouldn't have existed without there being people who managed to wrap their brains around it and found it to be expressive and concise. I myself never got there (also never really tried TBH),

... but at the same time, it's *still* the case that if I need to write a one-liner to do something, chances are, I'll be doing it in Perl, and I've heard people complain about *that* language being essentially write-only line-noise.

10/11

@wrog
> I'm also guessing TECO wouldn't have existed without there being people who managed to wrap their brains around it and found it to be expressive and concise. I myself never got there (also never really tried TBH),

I'm one of those people, BTW. My proof is that I wrote a closed-loop stick figure ASCII animation juggling three balls.

As with any complex TECO thing, the resulting code was write-only -- and that was always the problem with even mildly powerful TECO macros.

Perl at its worst can be described as write-only line noise, yes, but in my experience is *STILL* better than TECO!

I am indeed fortunate to be able to stick with Emacs and Vi.

@kentpitman @ramin_hal9001 @screwlisp @cdegroot

@dougmerritt @wrog @ramin_hal9001 @screwlisp @cdegroot

TECO was a necessary innovation under word-addressed memory. With 36 bits per word, you couldn't afford that much space for an instruction. 5 7-bit bytes (with a bit left over) in one word was a lot more compact than an assembly instruction. With only 256 KW (kilowords) total addressable in 18 bits, you had to get all the power packed in you could. And we didn't have WYSIWYG yet, and most computer people couldn't type. So it would make a lot more sense to you if you were doing hunt and peck with almost no visibility into what you're changing. Typing -3cifoo$$ to mean go back three characters and insert foo and show me what the few characters around my cursor look like was extremely natural in context. That it became a programming language was a natural extension of that so that you didn't have to keep typing the same things over and over again.

@dougmerritt @wrog @ramin_hal9001 @screwlisp @cdegroot

In effect, a Q register, what passed for storage in TECO, was something you can name in one bite. So 1,2mA meaning call what's in A with args 1 and 2 was a high-level language function call with two arguments that fit into a single machine word. Even the PDP-10 pushj instruction, which was pretty sophisticated as a way of calling a function, couldn't pass arguments with that degree of compactness.
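The arithmetic is easy to check: five 7-bit characters take 35 bits, which leaves one bit spare in a 36-bit word, so a five-character command such as 1,2mA fits in a single word. Below is a small Python sketch of that packing. The helper names are made up, and the layout (characters left-justified, low-order bit unused) is my assumption about the conventional arrangement rather than something stated above.

```python
def pack_word(text):
    """Pack up to five 7-bit ASCII characters into one 36-bit word.

    Characters are left-justified and padded with NULs; the low-order bit is
    left unused (an assumed layout, for illustration).
    """
    if len(text) > 5:
        raise ValueError("at most five characters fit in one word")
    word = 0
    for ch in text.ljust(5, "\0"):
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not a 7-bit character: %r" % ch)
        word = (word << 7) | code
    return word << 1   # 35 bits of characters plus 1 spare bit


def unpack_word(word):
    """Inverse of pack_word: recover the text, dropping NUL padding."""
    word >>= 1         # discard the spare bit
    chars = []
    for _ in range(5):
        chars.append(chr(word & 0x7F))
        word >>= 7
    return "".join(reversed(chars)).rstrip("\0")


call = pack_word("1,2mA")   # the whole two-argument call, in one word
```

`call` is below 2**36, and `unpack_word(call)` gives back the original five characters.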

@kentpitman @dougmerritt @wrog @ramin_hal9001 @screwlisp @cdegroot

Yes, right. To all that. One minor point is that the PDP-6/10 had a byte-addressing instruction that was pretty weird (overkill in flexibility, like every PDP-6/10 instruction). So that data packing wasn't all that unreasonable.

I showed up to the TECO world in Jan. 1973 with a gofer programming gig in the Macsyma group. The Datapoint terminals were already there, so I missed the pre-(almost)WYSIWYG days.

@djl
Lucky you; I went through teletypes, and then glass terminals lacking cursor control, before finally being in an environment with cursor control terminals capable of WYSIWYG -- and at that, it was pretty random back then who had heard the pro-WYSIWYG arguments and who had not, so...

@kentpitman @wrog @ramin_hal9001 @screwlisp @cdegroot

@dougmerritt @djl @wrog @ramin_hal9001 @screwlisp @cdegroot

For those looking on who might not know these terms, teletypes had paper feeding through and mostly did only output that was left-to-right and then fed that line and then did not back up ever to a previous line. They were also loud and clunky, mostly, and had keyboards that had keys you had to press way down in order to get them to take.

Glass terminals were displays that could only do output to the bottom line of the screen, kind of like a paper terminal but without the paper. Once it scrolled up, you couldn't generally scroll back down. But that's why it might sound like it would have cursor control but did not yet.

@kentpitman
Yes, and to clarify your final two sentences, the *display* scrolled up with each additional line emitted -- the *cursor* could never scroll up.

In my environment at Berkeley, these were Lear Siegler ADM 3 terminals. The slightly later ADM 3a terminals finally allowed the cursor to be moved around at will (although they didn't have any fancier abilities, unlike still later devices).

Thanks for thinking to explain what I did not.

@djl @wrog @ramin_hal9001 @screwlisp @cdegroot

@kentpitman @dougmerritt @wrog @ramin_hal9001 @screwlisp @cdegroot

The datapoint terminals were _almost_ wysiwyg: they didn't have a cursor, so the TECO of the time inserted "/\" in the text displayed, and you could insert text there, delete the next character and the like.

But TECO allowed you to change the "/\" to whatever you liked, so if you left your terminal, someone would change that to "/\Foo is loser" and Foo wouldn't be able to delete that text from Foo's file...

@dougmerritt @kentpitman @wrog @ramin_hal9001 @screwlisp @cdegroot

Yes. I missed the teletype round. Sort of. Father was site engineer for one of the early LINC 8 installations, and later a PDP-7 installation, and they had teletypes. They.Were.Horrid.

Peter Belmont (later Ada developer) tried to persuade me to do programming, but I was busy doing other things. The IBM card punches had really sweet keyboards, though.

@dougmerritt @kentpitman @wrog @ramin_hal9001 @screwlisp @cdegroot

I've been through 17 or so environments, and I was always able to find an editor that could be persuaded to act the way I wanted: CCA, NEC, AT&T and even Word for MS-DOS.

Hilariously, Word for Windows defeated me. There was no way to persuade it to act as a civilized text editor, so I acquired the source code to WordPad and implemented my usual TECO macros in C++, and used that for 20 years or so.

@djl
Hey, you want what you want.

Also: spoken like a true hacker. "I will bend the universe (of computing) to my will!"

@kentpitman @wrog @ramin_hal9001 @screwlisp @cdegroot