I know how to model a state machine with a flow chart, it make sense.

But I’m designing an API to control websockets and manage a web websocket service via HTTP similar to pushpin. So I have at least 4 state machines. But then there’s the events and commands that get sent around that change state and I’m at a complete loss of how to model that.

I can muddle through in my head but, I want to prove it out somehow. Anyone have any experience modeling state in a distributed system?

I have at least 4 types of actors.
- API Gateway and it's connection state
- The lambda’s it executes when connections have activity (open, message, disconnect)
- Your application getting a http request to notify you of events and get commands
- and the commands you send to the control api anytime

The last three could have any number of them running at a time

And then I got quirks of api gateway which suck for application developers (one reason I'm not using it directly and building this service)

You have an open callback, and you can reject the connection or accept it, by the return of the callback. So you can't actually do any work (eg, sending messages) in this callback and you have to do the work after it returns. But how do you know if it's returned?

So you have to queue the work and poll to see if the connection exists and then do the work. Its possible the connection has been closed too, so you need to poll for a timeout before giving up and throwing away the work.

So the open callback needs an application response to know if it should accept the connection and now I'm in this weird state where without a queue I can't do reasonable things like on the application side;

```
accecpt()
subscribeToChannel(foo)
sendMessageToChannel(foo, 'new connection!')
```
And have the new connection actually get the message

I can do silly things like always accept the connection and queue the callback to the application, that isn't horrible tbh but breaks access control and now needs locking or queueing if any messages come in before the the now non-blocking open event happens.

I don't like it.

Or I can force the API to not do any operations but accept the connection on open events (maybe set metadata), but in this distributed system world, I can't actually guarantee that a call to the control api would happen after the connection would happen after the connection was accepted.

```
accept() // return back to the open callback
subscribeToChannel() // new event to the control api
sendmessageToChannel() // I hope api gateway knows I accepted it
```

Latencies inside and outside of datacenters would mean it should probably work. Maybe I can paper over it with retries? or a polling job? 🤔
I think marking a connection in my db as "recently opened" and giving it a few seconds grace to stop 404ing in api gateway might be enough here. Will report back.
And of course I now have TLAᐩ open in a tab http://lamport.azurewebsites.net/tla/tla.html
The TLA+ Home Page

I convinced myself I knew how to fix the immediate problem while ignoring my initial query which was to better understand the entire problem. So back to TLA+

This is so retro but also somehow on the cutting edge. I'm reading help docs in a framed layout with linked mp4s with pdf transcripts and zip files! It's .. helpful!?

Lol Professor Lamport just scolded me for not following his instruction to hit "how to play" in the web player, I notice that's not in the transcript 😂

Link here if you're interested https://lamport.azurewebsites.net/video/intro.html

This is kind of refreshing from marketing landing pages. I guess this is comp sci academia at it's finest?

It's the "or some other method" that gets me here though. I also loled at "I don’t know exactly what else TLA+ is good for."

Besides #tlaplus it seems Lamport invented Paxos!? I feel like an idiot not realizing who he was sooner. https://en.wikipedia.org/wiki/Paxos_(computer_science)
Paxos (computer science) - Wikipedia

And I'm being told I should stop the video and read his Time Clock Systems paper http://lamport.azurewebsites.net/pubs/time-clocks.pdf

I think this video is a masterclass in a good straightforward video that makes clear points and communicates value. I'm enjoying the hell out of this class. (It's a class, no other way to do describe it)

I might as well drop this video here, but you should go to the lectures page for full context https://lamport.azurewebsites.net/video/videos.html

TLA+ Video Course

He wears a lot of hats

So I solved lecture 4 which is a program to solve the Die Hard 3 Jug riddle in TLA+

I stuck big != 4 (which is written "big # 4" or "big /= 4" because math) as an invariant and it failed on a path that got us to 4 gallons!

This is really neat!

and look a pretty pdf!

I did a lesson this morning before work. This escalated quickly. The amount of math concepts I never learned is astounding.

Programming looks like math and uses a lot of the same words but they mean all sorts of different things.

I knew "discrete math" had a lot more too it when I took it in school and oh boy it does. I think what I'm learning falls under that.

I don't know actual notation so bear with me.

In programming you have a `Map<key,value>` (Lamport kept calling it a set or array but my model of that doesn't fit) and in math you have a `function(key)=value` which sort of makes sense, but you treat the function like an array, and the key space is called a domain, and the domain doesn't have to be materialized, like the key could be "0 - infinity" or any range you can conceptualize.

@reconbot sounds like a job for cloudformation and/or step functions (provisioning part) but maybe I misunderstood

@brianleroux I could use the step functions for the polling but I might do it in lambda since it's gonna be rare I actually hit this issue and can mark the specific connection id as "recently opened" and have my command processing retry 404s for a few seconds.

Anyway that's why I want to explore modeling and proofs =p

I'm using a CDK stack and an npm library - really early not for use yet
https://github.com/reconbot/slip-sockets

GitHub - reconbot/SLIP-Sockets: A serverless GRIP/Pushpin like WebSocket server for AWS API Gateway

A serverless GRIP/Pushpin like WebSocket server for AWS API Gateway - GitHub - reconbot/SLIP-Sockets: A serverless GRIP/Pushpin like WebSocket server for AWS API Gateway

GitHub
@reconbot you mean the websocket connection? better to always accept and then close with an error code when the callback returns.
@jed But it could send messages before the close hits, I'd need to track state about it and ignore or queue messages sent too soon?
@reconbot by it do you mean the client on the other side of the websocket? yeah, api gateway sucks for this. if you control the client ideally you’d wait until a “ready” event is sent or something, or close with error if messages are sent.
@jed it sure does =)
@reconbot cloudflare durable objects are a breath of fresh air here, fwiw. it's Just JavaScript™️.