Mastodawn

Irenes (many)23h ago

we gave in to the urge to start writing a text editor https://code.irenes.space/ivy

(it doesn't edit anything yet)

ivy - Warm, friendly modal text editor for the terminal.

Irenes (many)23h ago

this is our first time using the Rust smol library, which seems quite nice. pleasingly, there isn't some big war between the authors of different rust async runtimes; rather, roughly the same group of authors wrote first tokio, then async-std, then most recently smol. this last one refactors the whole thing into a bunch of tiny, loosely-coupled libraries; smol itself is just a shorthand to import a few of those libraries at once. so that's pretty neat.

Irenes (many)23h ago

thus far we have two direct dependencies and 35 transitive ones, which we're pretty pleased with, that seems nice and small to us

Irenes (many)19h ago

we have partially-implemented versions of the hjkl commands pushed, now. it's time to add an abstraction we've been really looking forward to, a helper that handles movement commands...

so in case you're keeping track of how long a project of ours can exist before we feel the need to use async closures, the answer is about three hours, 40 minutes

Irenes (many)16h ago

neat. we found and fixed a bug in our function that iterates through the file and keeps track of byte offsets to each line. it wasn't properly handling empty lines.

... see, finding a bug like that feels like progress to us because it demonstrates that the abstraction is doing the things we think it is, and when it fails it was just a minor tweak needed
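
For readers following along, here is a minimal sketch of the kind of line-offset scan described above. It is not ivy's actual code: the name `line_starts` and the convention that a trailing newline does not open a new line are assumptions made for illustration. It shows how empty lines still get their own start offset.

```rust
/// Returns the byte offset at which each line begins.
/// Assumptions (not from the thread): lines are terminated by b'\n',
/// and a newline that is the very last byte does not start another line.
fn line_starts(buf: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    if buf.is_empty() {
        return starts;
    }
    // The first line always starts at offset 0.
    starts.push(0);
    for (i, &b) in buf.iter().enumerate() {
        // A newline ends the current line. The next line starts right
        // after it, even if that next line is empty (i.e. the very next
        // byte is another newline).
        if b == b'\n' && i + 1 < buf.len() {
            starts.push(i + 1);
        }
    }
    starts
}
```

With the input `a\n\nb\n`, the second line is empty and still receives its own entry, which is the case that is easy to get wrong.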

Irenes (many)16h ago

like it makes us more confident of the approach than we were before

(we are way over-read on text editor implementation strategies, like in our body's early 20s our system read dozens of papers about it, so it's not like we really need more confidence, but hey)

Irenes (many)16h ago

(we're going to eventually use a buffer gap, but right this moment it's just a single consecutive buffer)
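
As a rough illustration of the technique mentioned here (usually called a gap buffer), the sketch below keeps the text in one allocation with a movable free region at the edit position. All names (`GapBuffer`, `move_gap`, `grow`) are hypothetical, and the growth policy is an arbitrary choice; nothing here is taken from ivy.

```rust
/// Text is stored as: [bytes before gap][free gap][bytes after gap].
struct GapBuffer {
    data: Vec<u8>,
    gap_start: usize, // index of the first free byte
    gap_end: usize,   // one past the last free byte
}

impl GapBuffer {
    /// Starts with an empty gap at the front; the first insert grows it.
    fn new(text: &[u8]) -> Self {
        GapBuffer { data: text.to_vec(), gap_start: 0, gap_end: 0 }
    }

    /// Number of text bytes, not counting the gap.
    fn len(&self) -> usize {
        self.data.len() - (self.gap_end - self.gap_start)
    }

    /// Moves the gap so that it begins at logical position `pos`.
    fn move_gap(&mut self, pos: usize) {
        assert!(pos <= self.len());
        if pos < self.gap_start {
            // Slide the bytes in [pos, gap_start) to the far side of the gap.
            let n = self.gap_start - pos;
            self.data.copy_within(pos..self.gap_start, self.gap_end - n);
            self.gap_start -= n;
            self.gap_end -= n;
        } else if pos > self.gap_start {
            // Slide the bytes just after the gap to just before it.
            let n = pos - self.gap_start;
            self.data.copy_within(self.gap_end..self.gap_end + n, self.gap_start);
            self.gap_start += n;
            self.gap_end += n;
        }
    }

    /// Enlarges the gap in place by pushing the tail further right.
    fn grow(&mut self) {
        let extra = self.data.len().max(16);
        let tail_len = self.data.len() - self.gap_end;
        self.data.resize(self.data.len() + extra, 0);
        let new_gap_end = self.data.len() - tail_len;
        self.data.copy_within(self.gap_end..self.gap_end + tail_len, new_gap_end);
        self.gap_end = new_gap_end;
    }

    /// Inserts one byte at logical position `pos`.
    fn insert(&mut self, pos: usize, byte: u8) {
        if self.gap_start == self.gap_end {
            self.grow();
        }
        self.move_gap(pos);
        self.data[self.gap_start] = byte;
        self.gap_start += 1;
    }

    /// Deletes the byte at logical position `pos` by widening the gap over it.
    fn delete(&mut self, pos: usize) {
        assert!(pos < self.len());
        self.move_gap(pos);
        self.gap_end += 1;
    }

    /// Copies out the text without the gap.
    fn to_vec(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.len());
        out.extend_from_slice(&self.data[..self.gap_start]);
        out.extend_from_slice(&self.data[self.gap_end..]);
        out
    }
}
```

Repeated edits at the same spot only touch the bytes next to the gap; the cost of `move_gap` is paid when the cursor jumps somewhere else.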

Irenes (many)15h ago

we definitely want to eventually support files larger than can fit in memory (so, like, in the hundreds of gigs)

not soon, but eventually

though it may be easier to get that into the architecture early on, rather than retrofitting it.. hm. well, we'll chew on that

Irenes (many)15h ago

technology has advanced since the last time we seriously tried writing an editor, and we clearly do not need to support files larger than 2^64 bytes, so we won't try to :)

Irenes (many)15h ago

(for you young ones: pointers used to be 32-bit!!! in fact, they used to be smaller, but by the time we were learning languages that use pointers, they were 32-bit)

Irenes (many)15h ago

(these days, unless you're on a microcontroller, you can take whatever size a pointer is on the platform and safely assume you will never need to describe a size or offset of anything that won't fit in it. that was not always true.)

Irenes (many)13h ago

not gonna lie, the continued progress seen at https://code.irenes.space/ivy/log/ feels really good

when we were young, jumping into a new project for a day or a week used to be really easy. we can do it for work just fine, but in recent years we've really struggled to channel intrinsic motivation for this sort of thing long enough to actually get anywhere

Irenes (many)13h ago

... which is fine; our habits are kind of time-oblivious, in the sense that we have a lot of dissociative memory stuff going on so we manage our tasks in ways that make forward progress regardless. there are projects we've finished in bursts of a couple hours every few months

but it's really nice to be properly deep in something

Irenes (many)11h ago

yay it can scroll through a file now

still doesn't do any actual editing, but it's starting to look quite solid as far as the viewing goes

Irenes (many)11h ago

we paid really close attention to what gets redrawn when. some of you may remember that conversation the other week about how terminal programs used to be good with screen readers because there was a natural efficiency need to only redraw things under active change, and then everyone stopped paying attention to that.

when our thing is more mature we're for sure planning to test how it feels out loud.

Irenes (many)6h ago

this code's looking and feeling a lot cleaner than the last time we tried to do byte-level terminal stuff that was in a project from a couple years ago that's still nominally ongoing but has been kinda stalled.

Irenes (many)6h ago

there was supposed to be a period in there, ah well :)

Irenes (many)6h ago

we may try to do the really fiddly thing, decoding terminal control sequences from a stream of input that is itself decoded characters having various encodings.

last time we stalled out on that, but we think smol's facility for Streams as the async equivalent to Iterators may be just what we need for it

Irenes (many)5h ago

it still feels absurd not being able to use BufReader (in any of its many versions, from many implementors)

notionally it solves a problem we have, but in point of fact it does not do that because POSIX stream semantics aren't just a list of bytes, the bytes have behavior over time and sometimes that matters

Irenes (many)5h ago

this is a pretty common nuance for language libraries to not handle well, it's just frustrating because BufReader is an abstraction that's gotten a lot of attention from a lot of people and yet it just does not let you do things like "read all the bytes that are present without blocking"

Irenes (many)5h ago

you can achieve that goal but to do it, you need to buffer those bytes on your own, externally
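
A minimal sketch of what "buffer those bytes on your own" can look like, using only the standard `Read` trait. It assumes the underlying descriptor has already been put into non-blocking mode elsewhere, so that a read with nothing available fails with `WouldBlock`; the function name `drain_available` and the 4096-byte chunk size are made up for illustration.

```rust
use std::io::{self, ErrorKind, Read};

/// Appends every byte that is available right now to `pending`,
/// without waiting for more to arrive.
/// Assumption: `src` is non-blocking, so "nothing to read at the moment"
/// is reported as ErrorKind::WouldBlock rather than by blocking.
/// Returns Ok(true) if end-of-file was reached, Ok(false) otherwise.
fn drain_available<R: Read>(src: &mut R, pending: &mut Vec<u8>) -> io::Result<bool> {
    let mut chunk = [0u8; 4096];
    loop {
        match src.read(&mut chunk) {
            // A zero-length read means the other end closed the stream.
            Ok(0) => return Ok(true),
            // Keep whatever arrived in our own external buffer.
            Ok(n) => pending.extend_from_slice(&chunk[..n]),
            // Nothing more is available right now; stop without blocking.
            Err(e) if e.kind() == ErrorKind::WouldBlock => return Ok(false),
            // Interrupted by a signal; simply retry.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

The caller owns `pending`, so a parser can consume from the front of it at its own pace and leave a partial sequence in place until more bytes show up.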

Irenes (many)5h ago

anyway we figured out the whole BufReader mess years ago for other stuff, and it's fine, we know not to go down that rabbit hole again and the stuff we've been doing instead has been working

@ireneista either a state machine or a parser that outputs linked list entries to be consumed by something else is what we would try. curious what y'all end up doing

Irenes (many)3h ago

@rho yeah... well, you ideally want to avoid allocating all those entries explicitly because it's an inner loop, and the time spent doing that will dominate the overall cost of the entire program

Irenes (many)3h ago

@rho but de facto, yes, we're going to have some form of data structure that communicates between the layers, and we're going to pretend to ourselves that maybe Rust can do very aggressive inlining and avoid allocating it most of the time (it probably can't, but finding out for sure is months of work, so it makes a plausible avenue for willful self-deception)
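
To make the state-machine idea from this exchange concrete, here is a deliberately tiny sketch that emits small `Copy` values rather than heap-allocated entries. It is not what ivy does. The names (`Decoder`, `Event`, `feed`) are hypothetical, and it only understands `ESC [ <digits> <final byte>` with a single numeric parameter. Real terminals send much more than that, and telling a lone Escape keypress apart from the start of a sequence needs a timeout that is not modeled here.

```rust
/// What the decoder hands to the next layer. Plain Copy data, no allocation.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    /// An ordinary input byte.
    Byte(u8),
    /// ESC followed by a byte other than '['.
    Escaped(u8),
    /// ESC [ <digits> <final byte>; `param` is 0 when no digits were sent.
    Csi { param: u16, final_byte: u8 },
}

#[derive(Clone, Copy)]
enum State {
    Ground,
    Escape,
    Csi { param: u16 },
}

struct Decoder {
    state: State,
}

impl Decoder {
    fn new() -> Self {
        Decoder { state: State::Ground }
    }

    /// Feeds one byte; returns an event once one is complete.
    fn feed(&mut self, b: u8) -> Option<Event> {
        match self.state {
            State::Ground => {
                if b == 0x1b {
                    self.state = State::Escape;
                    None
                } else {
                    Some(Event::Byte(b))
                }
            }
            State::Escape => {
                if b == b'[' {
                    self.state = State::Csi { param: 0 };
                    None
                } else {
                    self.state = State::Ground;
                    Some(Event::Escaped(b))
                }
            }
            State::Csi { param } => {
                if b.is_ascii_digit() {
                    // Accumulate the decimal parameter, saturating on overflow.
                    let digit = (b - b'0') as u16;
                    let param = param.saturating_mul(10).saturating_add(digit);
                    self.state = State::Csi { param };
                    None
                } else {
                    // Simplification: any non-digit byte ends the sequence.
                    self.state = State::Ground;
                    Some(Event::Csi { param, final_byte: b })
                }
            }
        }
    }
}
```

Feeding the bytes of `ESC [ A` (the usual cursor-up sequence) one at a time yields nothing for the first two bytes and then a single `Event::Csi { param: 0, final_byte: b'A' }`.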

@ireneista The things I find myself frequently wanting are "un-read these bytes" (maybe that wouldn't be necessary if I learned how to write async code instead of using a select()-loop) and "wait until the next byte or there's a timeout" (also maybe less necessary with an async API?).

The latter feels like a general-purpose thing to do? I have vague memories of writing double-click/long-press detection and wishing for "send an event after a synthetic timeout" instead of needing to manually schedule/unschedule a timer (and worrying about edge cases).

Irenes (many)5h ago

@snowfox no, you definitely do still need at least one byte of un-reading for a lot of things, even with async heh. you're not imagining that. we convinced ourselves of it the hard way.

Irenes (many)5h ago

@snowfox adding timeouts to stuff is one thing that most Rust async libraries are good at, so good news on that front

@ireneista I was thinking "async means I can keep the un-consumed bytes in a local variable instead of needing to use an instance variable" though I guess un-reading may be easier.

But if you can only un-read a single byte (ungetc doesn't guarantee any more than that) that also means you have to read a byte at a time, which seems inefficient?

I think what I really want is a way to read with MSG_PEEK (does that work on pipes?) and then tell the kernel how many bytes were consumed. That'd allow tricks like parsing an HTTP CONNECT, only consuming the "header" bytes, and passing the socket to another process, without necessitating reading 2-4 bytes at a time to make sure you don't overshoot the end-of-header.

Irenes (many)1h ago

@snowfox oh, yes you totally can use locals for that, we were just trying to avoid it for reasons that probably don't apply to you

don't worry about ungetc unless you're working in C, it's not a kernel facility, libc just has a byte of RAM in your address space that it manages, you can do the same thing yourself
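
A sketch of "doing the same thing yourself": a wrapper that holds one byte of pushback in its own memory, much as described for libc above. The type and method names (`Pushback`, `next_byte`, `unread`) are invented for this example, and reading a single byte at a time from `inner` is a simplification rather than a recommendation.

```rust
use std::io::{self, Read};

/// Wraps a reader and remembers at most one un-read byte.
struct Pushback<R> {
    inner: R,
    held: Option<u8>,
}

impl<R: Read> Pushback<R> {
    fn new(inner: R) -> Self {
        Pushback { inner, held: None }
    }

    /// Returns the next byte, or None at end-of-file.
    /// A previously un-read byte is returned before touching `inner`.
    fn next_byte(&mut self) -> io::Result<Option<u8>> {
        if let Some(b) = self.held.take() {
            return Ok(Some(b));
        }
        let mut one = [0u8; 1];
        loop {
            match self.inner.read(&mut one) {
                Ok(0) => return Ok(None),
                Ok(_) => return Ok(Some(one[0])),
                // Interrupted by a signal; retry the read.
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
    }

    /// Puts one byte back so the next call to `next_byte` returns it.
    /// Panics if a byte is already being held.
    fn unread(&mut self, b: u8) {
        assert!(self.held.is_none(), "only one byte of pushback is supported");
        self.held = Some(b);
    }
}
```

Holding more than one byte would only require swapping the `Option<u8>` for a small stack or queue; the one-byte limit mirrors the guarantee mentioned for ungetc.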

unsure about MSG_PEEK. sounds slick if it works.

ferunando 11h ago

@ireneista i will come back to ask about how this is going later this year. i'm setting up myself a mostly-text setup-and-targets and will definitely want to hear more about screen-reader and terminals.

Irenes (many)11h ago

@gureito yeah definitely do circle back!

the esoteric programmer 9h ago

@ireneista the "terminal programs were good with screenreaders" bit is only halfway true. The only terminal apps which were really good with screenreaders were designed in such a way that the new stuff got appended to the end of the buffer. A new menu option is selected? append to end. Something about a progress bar changed? clear and write the value again, or, you guessed it, append to end. The reason why tty apps aren't very accessible with screenreaders now is that they use unicode block characters for drawing more intricate shapes. Also, even if they only redraw what changed, they treat the whole terminal like a screen where they can put a character anywhere, which makes the screenreader often read the whole thing. Console/tty specific screenreaders like the speakup kernel module and fenrir or tdsr have sophisticated heuristics to somewhat deal with this, but it's quite difficult to make a complicated TTY app accessible to screenreaders

Irenes (many)9h ago

@esoteric_programmer that definitely makes sense. we specifically heard it worked okay in the context of parser-based interactive fiction, where the entire game is pretty much a transcript that only gets appended to, so that would fit with what you're saying.

the esoteric programmer 9h ago

@ireneista yeah, look at irssi for example, a lot of the interactions there are based on append to end. For a menu based tool, the thing that generates a kernel configuration file from menu options is accessible, but that's because of the same strategy. For something esoteric AF that doesn't work out of the box, archinstall is a good showcase, but here comes a heuristic of espeakup: speakup+ctrl+8 triggers a mode called highlight tracking, which manages to make sense of the stuff somewhat. I haven't looked into the espeakup code yet to see how that actually works, and considering that tty mode is going away soon enough... Oh well, sad

Irenes (many)9h ago

@esoteric_programmer hmmm, yes we see

@ireneista love the description 😻

Irenes (many)11h ago

@mkhl thanks! 💜

SnowFox 12h ago

@ireneista That was also true for a while in the 32-bit days! Hard disks were below 2 GB, and on some platforms max file size was 2 GB, and even after that wasn't true, surely a *text* file will never have a reason to be over 2 GB, right?

I'm not sure if any text editors did "just mmap() the whole file" though (I assume classic Mac OS can't support it at all if you don't enable virtual memory; I'm not sure about Windows 98).

Irenes (many)12h ago

@snowfox we'd be shocked if anyone took the mmap()-only strategy in those days, yeah. a megabyte was a lot of RAM.

Philippa Cowderoy 15h ago

@ireneista My first compiler had a bunch of different memory models you could choose because the target was 8086/286...

...having 16-bit code and data pointers residing in separate segments had its uses

Graham Sutherland / Polynomial 15h ago

@ireneista but what if you want to edit a hundred exabytes of json? :P

Irenes (many)15h ago

@gsuberland thus far, the largest dataset we've ever had to work with was only 160 PiB, so fingers crossed that never comes up :D

amy tech (bones)14h ago

@ireneista unsolicited, but you might also be interested in piece tables instead of buffer gap. Lots of interesting tricks you can do with them including baking undo directly into the buffer representation. Also allows you to mmap the initial load, and edits don't require loading anything new off the disk.

pho.spookygirl.boo/source/media-thing/browse/default/packages/text/src/PieceTable.ts;87712c3f45c29b96e6e6c0fdf15855f62a47f805?as=source&blame=off - a typescript version. I've got a paper lying around that describes them too.

PieceTable.ts · media-thing
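
For comparison with the linked TypeScript version, here is a minimal piece-table sketch in Rust. It is an independent illustration, not a port of that file: the names are hypothetical, only insertion is shown, and undo is left out. The original bytes are never modified (which is what makes mmap-ing the initial load possible), and inserted text only ever gets appended to a second buffer.

```rust
#[derive(Clone, Copy)]
enum Source {
    Original,
    Added,
}

/// A span of bytes in one of the two backing buffers.
#[derive(Clone, Copy)]
struct Piece {
    source: Source,
    start: usize,
    len: usize,
}

struct PieceTable {
    original: Vec<u8>, // read-only after construction
    added: Vec<u8>,    // append-only
    pieces: Vec<Piece>,
}

impl PieceTable {
    fn new(original: Vec<u8>) -> Self {
        let pieces = if original.is_empty() {
            Vec::new()
        } else {
            vec![Piece { source: Source::Original, start: 0, len: original.len() }]
        };
        PieceTable { original, added: Vec::new(), pieces }
    }

    /// Inserts `text` at logical byte position `pos`.
    fn insert(&mut self, pos: usize, text: &[u8]) {
        if text.is_empty() {
            return;
        }
        let new_piece = Piece { source: Source::Added, start: self.added.len(), len: text.len() };
        self.added.extend_from_slice(text);

        let mut offset = 0;
        for i in 0..self.pieces.len() {
            let p = self.pieces[i];
            if pos <= offset + p.len {
                // Split piece `i` around the insertion point, skipping empty halves.
                let head = pos - offset;
                let mut repl = Vec::with_capacity(3);
                if head > 0 {
                    repl.push(Piece { len: head, ..p });
                }
                repl.push(new_piece);
                if head < p.len {
                    repl.push(Piece { start: p.start + head, len: p.len - head, ..p });
                }
                let _replaced: Vec<Piece> = self.pieces.splice(i..=i, repl).collect();
                return;
            }
            offset += p.len;
        }
        // Only reached when the table is empty or `pos` is out of range.
        assert_eq!(pos, offset, "insert position past end of text");
        self.pieces.push(new_piece);
    }

    /// Materializes the current text by walking the pieces in order.
    fn to_vec(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for p in &self.pieces {
            let buf = match p.source {
                Source::Original => &self.original,
                Source::Added => &self.added,
            };
            out.extend_from_slice(&buf[p.start..p.start + p.len]);
        }
        out
    }
}
```

Because edits only rearrange the `pieces` list, keeping older copies of that list around is one way the undo trick mentioned above could be built on top.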

Irenes (many)14h ago

@amy ah yeah it's worth considering, thanks for that

@ireneista how will you represent files in-memory? I think strings usually assume some valid encoding, and text files are crazy.

(In a former life I was the guy ppl came to with “we don’t know how to read this file, pls fix” and I’d find out parts of it were in some old Russian encoding and convert the lot to utf8)

Irenes (many)14h ago

@rudi oh, bytes, as far as that goes. we've been writing our own encoding-handling code because we care about not losing valid bytes during error recovery, and passing through invalid bytes unmodified, and stuff like that.

but that's not even the hard part, the hard part is that it's an editor and vectors are overly confining for that use.
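
One way to get the "keep valid bytes, pass invalid bytes through unmodified" behavior with nothing but the standard library is to lean on the error information from `std::str::from_utf8`. The sketch below is not ivy's decoder (which is described as hand-written); `Chunk` and `decode_lossless` are hypothetical names, and it handles UTF-8 only.

```rust
use std::str;

/// A run of input that is either valid UTF-8 text or raw bytes that are not.
#[derive(Debug, PartialEq)]
enum Chunk<'a> {
    Valid(&'a str),
    Invalid(&'a [u8]),
}

/// Splits `bytes` into valid and invalid runs without dropping or
/// rewriting anything: concatenating the chunks gives back the input.
fn decode_lossless(mut bytes: &[u8]) -> Vec<Chunk<'_>> {
    let mut out = Vec::new();
    while !bytes.is_empty() {
        match str::from_utf8(bytes) {
            Ok(s) => {
                out.push(Chunk::Valid(s));
                break;
            }
            Err(e) => {
                // Everything before valid_up_to() is known-good UTF-8.
                let (good, rest) = bytes.split_at(e.valid_up_to());
                if !good.is_empty() {
                    // Re-checking a validated prefix cannot fail.
                    out.push(Chunk::Valid(str::from_utf8(good).unwrap()));
                }
                // error_len() is None when the input ends in the middle of
                // a multi-byte sequence; treat the remainder as invalid.
                let bad_len = e.error_len().unwrap_or(rest.len());
                let (bad, after) = rest.split_at(bad_len);
                out.push(Chunk::Invalid(bad));
                bytes = after;
            }
        }
    }
    out
}
```

In an editor, the truncated-sequence case would more likely mean "wait for more bytes" than "invalid"; this sketch treats the input as complete.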

Irenes (many)14h ago

@rudi wasn't it an amazing fun prank on all of us in the future how the various ISO-Latin encodings use the same codepoint for all the various currency symbols?

@ireneista it just makes me appreciate more what we have now. Unicode is messy because history but utf-8 is a marvel

Irenes (many)9h ago

@rudi yeah utf-8 is really good. it's great everyone got that together properly before collectively forgetting what kinds of things matter at the byte level, heh.

Irenes (many)9h ago

@rudi apologies for the negativity. you're right, it's better to be appreciative.

Tim Ward ⭐🇪🇺🔶 #FBPE 15h ago

@ireneista Arrrghh.

I once had to work on a system which thought it could keep byte offsets to each line. It got into something of a mess with things like invalid character encodings. Where the line couldn't be parsed from bytes to characters, but you still needed to know where the end of line was so that you could process the *next* line, which *probably* wasn't mangled in the same way.

Irenes (many)15h ago

@TimWardCam yeah we are very much trying to do the right thing for invalid character encodings. we're a little nervous we'll inadvertently do something Unicode-specific and make things harder for ourselves, since we're starting with just UTF-8 and broken UTF-8, but we do know our way around encodings so hopefully we'll manage to avoid that.

Tim Ward ⭐🇪🇺🔶 #FBPE 15h ago

@ireneista 😀 👍

I think the problem we had might have been to do with incompatible error handling between two of the libraries we were using - the input was supposed to be CSV which was another layer of libraries and complications (we weren't going to attempt to write our own bytes -> characters parsing).

Irenes (many)15h ago

@TimWardCam ah yep we can definitely see how that would make it significantly harder

we are doing our own bytes-to-characters stuff; we kind of feel like it wouldn't be a robust editor otherwise. that's more work, of course, but at least we get to be precise about how the weird cases are handled.

snowyfox 19h ago

.