Leandro Nov 14, 2022

Now on its own it isn't very interesting, it's just a small AST for data structures.

The magic is that this is really an ✨intermediate representation✨

A shared interface really, between the ser and de parts of serde.
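
To make the idea concrete, here is a minimal sketch of what such an IR could look like. The `data` type and its constructor names are assumptions for illustration, not serde.ml's actual definitions.

```ocaml
(* Hypothetical IR: a small AST describing OCaml data.
   Constructor names are illustrative, not taken from serde.ml. *)
type data =
  | Int of int
  | Float of float
  | Bool of bool
  | String of string
  | Tuple of data list
  | Record of (string * data) list
  | Variant of string * data list (* constructor name + arguments *)

(* A user value expressed in the IR. *)
let example : data =
  Record [ ("name", String "Ripley"); ("clearance", Int 3) ]
```

Both halves only need to agree on this one type: serializers produce it, format backends consume it.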

Leandro Nov 14, 2022

So for the serialization use-case on one side, we have a tiny @@deriving that generates a function (like shown in the picture) that turns your data into this IR.
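
A hand-written version of such a generated function might look like the sketch below. The IR type, the `role` and `user` types, and the `serialize_*` names are all assumptions made for illustration.

```ocaml
(* Hypothetical IR, defined here so the sketch stands alone. *)
type data =
  | Int of int
  | String of string
  | Record of (string * data) list
  | Variant of string * data list

type role = Admin | Guest of string
type user = { name : string; clearance : int; role : role }

(* What a deriver could plausibly generate for [role] and [user]. *)
let serialize_role = function
  | Admin -> Variant ("Admin", [])
  | Guest who -> Variant ("Guest", [ String who ])

let serialize_user u =
  Record
    [
      ("name", String u.name);
      ("clearance", Int u.clearance);
      ("role", serialize_role u.role);
    ]
```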

Leandro Nov 14, 2022

And on the other side we have modules that consume this IR and spit out other formats like JSON, s-expressions, TOML, KDL, XML, what have you.

I haven't sketched any of those enough yet to show a screenshot.

Leandro Nov 14, 2022

fml why did i choose metaquot. this is going to take a little bit. bear with me.

Leandro Nov 14, 2022

yeah, no. these meta-quotation libraries are unusable to me. I'll need tons more documentation and examples on how to do common things. I'm getting too spoiled by the amazing efforts in Rust to keep libraries well documented.

Going back to ast_builder.

Leandro Nov 15, 2022

ocaml i am so angry at you right now

Leandro Nov 15, 2022

seriously, at some point this is just smashing the keyboard until the types magically fit

Leandro Nov 15, 2022

so i got a deriver for serializers into the serde.ml language somewhat working, and this is already making it so much easier to test stuff -- viva la #metaprogramming!

#ocaml #rust #serde

Leandro Nov 15, 2022

(yes, i know this is outputting cursed ocaml right now, bear with me)

Leandro Nov 15, 2022

fork yes! it's valid #ocaml now. i think.

it also derives:

* newtypes
* tuples
* variants (unit, tuple, and record ones)

and knows how to follow modules/etc

ok, will add support for records next

Leandro Nov 15, 2022

ofc you'd never see any of this code, it's all generated for you, what you see is this:

Leandro Nov 15, 2022

oh no, what did i break now

Leandro Nov 15, 2022

friggin' brew. we're back on track. I wrote a sexpr serializer to iron out the serializer implementer interface, and it feels good!

(minus all the explicit first-class module signatures type escaping)

Leandro Nov 15, 2022

the important bit is that this is what you normally write as a user:

type t = { ... } [@@deriving serializer]

and then you can call

t |> serialize_t |> Serde_sexpr.to_string

wanna go to json? just change to Serde_json.to_string

maybe someone will finally implement a deriving(debug) with this!
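
The one-line swap between formats can be sketched with two toy backends over a shared IR. Everything here is an assumption for illustration: `Sexpr_backend` and `Json_backend` stand in for `Serde_sexpr` and `Serde_json`, the IR is a cut-down hypothetical one, and `serialize_t` is written by hand instead of derived.

```ocaml
(* Hypothetical, cut-down IR. *)
type data =
  | Int of int
  | String of string
  | Record of (string * data) list

module Sexpr_backend = struct
  let rec to_string = function
    | Int i -> string_of_int i
    | String s -> Printf.sprintf "%S" s
    | Record fields ->
        fields
        |> List.map (fun (k, v) -> Printf.sprintf "(%s %s)" k (to_string v))
        |> String.concat " "
        |> Printf.sprintf "(%s)"
end

module Json_backend = struct
  let rec to_string = function
    | Int i -> string_of_int i
    (* %S applies OCaml escaping, which is only close to JSON escaping;
       good enough for plain ASCII in this sketch. *)
    | String s -> Printf.sprintf "%S" s
    | Record fields ->
        fields
        |> List.map (fun (k, v) -> Printf.sprintf "%S:%s" k (to_string v))
        |> String.concat ","
        |> Printf.sprintf "{%s}"
end

type t = { name : string; clearance : int }

(* Hand-written stand-in for the derived serializer. *)
let serialize_t v =
  Record [ ("name", String v.name); ("clearance", Int v.clearance) ]

let ripley = { name = "Ripley"; clearance = 3 }
let as_sexpr = ripley |> serialize_t |> Sexpr_backend.to_string
let as_json = ripley |> serialize_t |> Json_backend.to_string
```

Only the last pipeline stage changes between the two outputs; the serializer for `t` is untouched.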

Leandro Nov 15, 2022

but for authors of serializers, there's a few things that always get in the way:

1. error handling (esp. over collections)

2. mapping over the serde.ml data definition language

3. parametrizing over the kind of output you want to have (usually a string, but could be a buffer/channel that goes straight to stdout/disk/network)
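
A rough sketch of how those three concerns could be handled together: a functor over an output sink, with `result`-based errors that stop a collection at its first failure. The `Sink` signature, the `Make` functor, and the `Unsupported` constructor are hypothetical names for this example.

```ocaml
(* Hypothetical IR with one case the backend refuses to serialize. *)
type data = Int of int | String of string | List of data list | Unsupported

(* Concern 3: abstract over where the output goes. *)
module type Sink = sig
  type t
  val create : unit -> t
  val write : t -> string -> unit
  val contents : t -> string
end

module Buffer_sink : Sink = struct
  type t = Buffer.t
  let create () = Buffer.create 64
  let write = Buffer.add_string
  let contents = Buffer.contents
end

module Make (S : Sink) = struct
  (* Concern 2: map over the IR. Concern 1: propagate errors. *)
  let rec write out = function
    | Int i -> S.write out (string_of_int i); Ok ()
    | String s -> S.write out (Printf.sprintf "%S" s); Ok ()
    | Unsupported -> Error "unsupported value"
    | List items ->
        S.write out "(";
        (* Walk the collection, stopping at the first error. *)
        let rec go first = function
          | [] -> S.write out ")"; Ok ()
          | item :: rest ->
              if not first then S.write out " ";
              (match write out item with
               | Ok () -> go false rest
               | Error _ as e -> e)
        in
        go true items

  let to_string d =
    let out = S.create () in
    Result.map (fun () -> S.contents out) (write out d)
end

module Ser = Make (Buffer_sink)
```

Swapping `Buffer_sink` for a sink that writes to a channel would leave `write` unchanged.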

Leandro Nov 15, 2022

all of those are provided by the Ser.Mapper interface, which is created at runtime based on the definition of the serializer you are using.

See this cursed module: https://github.com/ostera/serde.ml/blob/main/serde/ser.ml#L163-L185

Leandro Nov 15, 2022

IT'S ALIVE! 🧟🧙‍♂️✨

https://github.com/ostera/serde.ml

#ocaml #rust #serde #foss #buildinpublic

GitHub - ostera/serde.ml: Serialization framework for OCaml

Leandro Nov 15, 2022

uuuuugh yojson why don't you have a clean clear exported ast WHY

folks, we need more _type-only_ packages. like we have for http type defs.

in any case, here we go: one line change and we got serialization to json.

sweet.

Leandro Nov 15, 2022

i heard you also liked xml

Leandro Nov 15, 2022

the beauty of having an abstract language for describing OCaml data is that concrete data formats become fully pluggable.

One serializer, Many formats.

Just from yaml, to json, to protobuf in one line.

Leandro Nov 15, 2022

Ok, time for food. When I'm back we'll go for the Deserialize path which will be harder methinks.

Let me know what other data formats you'd like to see serialized, and go check out the code here: https://github.com/ostera/serde.ml

Leandro Nov 15, 2022

Okay back from food and a @anmonteiro retweeted so this is how I know I need to get back to it.

What does it take to deserialize? Let’s see how #rust #serde does it.

Leandro Nov 15, 2022

in short, we need:

1. a deserializer, that turns bytes into serde.ml data

2. a visitor/fold, to turn serde.ml data into our custom type

3. a deriver, to generate visitors from types

Leandro Nov 15, 2022

so i think i'll start by making an interface for deserializers.

the idea here is that you tell serde how to deserialize these specific things from your data format, and then a visitor drives the work, only really deserializing what's needed

Leandro Nov 15, 2022

something like this -- you get a visitor passed in, a final value type, and you get to choose per-data-format what's the best way to consume the input

#ocaml #rust #serde

Leandro Nov 15, 2022

actually screw this, we'll start by manually expanding what a deserializer would look like.

this will give me a much better idea of what the visitor/reader/deser interface needs to be.

Leandro Nov 15, 2022

(also gives me an excuse to read even more serde code muahah)

Leandro Nov 15, 2022

oof, long hours #ocaml-ing here...I think I've got another ~2 hours in me before I get sleepy again. Thank you accidental nap!

The deserializer and visitor interfaces are coming along nicely! 👌🏼✨

Leandro Nov 15, 2022

If you were manually implementing a deserializer, this is how you'd use it to deserialize an identifier when you're coming from some data into a record with fields Name, Role, and Clearance.

Thankfully there will be a [@@deriving deserializer] too 🤩

#ocaml #serde

Leandro Nov 16, 2022

okay making some coffee then we're back. last night we got here:

* create a visitor for a small variant
* create a visitor for tags
* thread these into `Serde_sexpr` transparently

Leandro Nov 16, 2022

still a ton to figure out, but at least the structure is starting to fall into place!

now i'll brew the coffee and thread this Variant_visitor into an implementation of the Variant_access_intf so that we actually use it.

code here: https://github.com/ostera/serde.ml

brb

#ocaml #rust #buildinpublic #devtools #serde

current status

guess what this module does

Leandro Nov 16, 2022

tried for a minute to build a reader that abstracts over the source of the data it reads (eg. string, file, buffer, what have you) and omg no

Leandro Nov 16, 2022

the sheer amount of #TypeLevel fuckery I have to do to get these modules to play along nicely and errors to surface tidily is worrying.

makes me really miss #rust traits for

In short: I’ve got a Reader interface, a Format Deserializer, and a Visitor interface.

Together they create a Deserializer for a specific format and specific datatype over a specific source of data.

Leandro Nov 16, 2022

The Reader offers a basic low level next/peek/skip API over some data. So you can build a Reader for Bytes, String, In_channel, or whatever data you chunk bytes out of. Nom nom nom.

When you build a Format Deserializer (like something to parse JSON), you need a Reader. But you don’t want to read everything! That would be too inefficient. You want to only read the things you need to build up a specific datatype.

That’s where the Visitor comes in. You pass one to the Format Deserializer.
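
A minimal sketch of the next/peek/skip idea over a string source. The `Reader` signature, `String_reader`, and `read_int` are assumptions for this example and may not match the interfaces in serde.ml.

```ocaml
(* Hypothetical Reader signature: a low-level cursor over some source. *)
module type Reader = sig
  type t
  val peek : t -> char option (* look at the next char without consuming it *)
  val next : t -> char option (* consume and return the next char *)
  val skip : t -> unit        (* consume one char and ignore it *)
end

module String_reader : sig
  include Reader
  val of_string : string -> t
end = struct
  type t = { src : string; mutable pos : int }

  let of_string src = { src; pos = 0 }
  let peek r = if r.pos < String.length r.src then Some r.src.[r.pos] else None

  let next r =
    let c = peek r in
    if c <> None then r.pos <- r.pos + 1;
    c

  let skip r = ignore (next r)
end

(* A format deserializer consumes only what it needs: here, one integer.
   Whatever follows the digits stays unread in the source. *)
let read_int r =
  let buf = Buffer.create 8 in
  let rec loop () =
    match String_reader.peek r with
    | Some ('0' .. '9' as c) ->
        Buffer.add_char buf c;
        String_reader.skip r;
        loop ()
    | _ -> ()
  in
  loop ();
  int_of_string_opt (Buffer.contents buf)
```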

Leandro Nov 16, 2022

The Visitor itself decides what to do with specific pieces of data. Like what happens if we find an Int, or a Tuple, or a Variant Record.

This is specifically built for your datatype, so it knows about the fields your record needs, and the different constructors of your variant, so it can guide the deserializer forward.

It does this by receiving, in specific cases, tiny modules that offer further deserializing options. Like Variant_access (yes we do #JiraffeCase in #ocaml).

Leandro Nov 16, 2022

So when your visitor's “visit_variant” function is called, you can peek at the variant tag name with Visitor_access.tag, and then depending on what kind of variant you know you have, you can call:

Visitor_access.unit_variant

Visitor_access.tuple_variant ~len

Visitor_access.record_variant ~fields

Each of these takes different arguments, and then drives the deserializer to read exactly what you’re expecting, and nothing more.
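
A toy version of that flow, as a sketch. The access object is modelled as a record of closures instead of a first-class module to keep it short, and the "format" is just a pre-split tag plus raw arguments. The names `variant_access`, `access_of`, and `shape` are hypothetical.

```ocaml
(* Hypothetical stand-in for the tiny access module handed to the visitor. *)
type variant_access = {
  tag : unit -> string;
  unit_variant : unit -> (unit, string) result;
  tuple_variant : len:int -> (string list, string) result;
}

(* Toy format deserializer: the input is already split into tag + raw args. *)
let access_of (tag, args) =
  {
    tag = (fun () -> tag);
    unit_variant =
      (fun () -> if args = [] then Ok () else Error "expected no arguments");
    tuple_variant =
      (fun ~len ->
        if List.length args = len then Ok args
        else Error "wrong number of arguments");
  }

type shape = Empty | Label of string

(* The visitor peeks at the tag, then asks for exactly the shape it expects. *)
let visit_variant access =
  match access.tag () with
  | "Empty" -> Result.map (fun () -> Empty) (access.unit_variant ())
  | "Label" -> (
      match access.tuple_variant ~len:1 with
      | Ok [ s ] -> Ok (Label s)
      | Ok _ -> Error "unexpected arity"
      | Error e -> Error e)
  | other -> Error ("unknown variant: " ^ other)
```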

Leandro Nov 16, 2022

Operation-wise this can be extremely efficient (as shown by #rust #serde, though I've yet to benchmark this on #ocaml and my implementation is very naive anyway), since we only do the minimum amount of work we need.

And yes this means that hand writing a Visitor is quite a bit of work.

Fortunately we can derive them from your types! Viva la #metaprogramming!

Leandro Nov 16, 2022

Anyway, I’m getting close to having a manual visitor running for a subset of S-expressions (variants).

What I’m afraid of is that there is no clear uniform interface for futures or readers (buffered or otherwise, sync or async) in the ecosystem, which makes it impossible to reuse parsers in other libraries (yojson, sexplib, tyxml, etc) out of the box.

Some say this is a feature. But in this case it looks like I’m going to have to borrow a lot of code to make these #parsers fit into #serde.ml

Leandro

okay, #opam sucks terribly -- i am constantly accidentally installing wrong versions of things. You wanna install a new tool and it downgrades a bunch of stuff. Not cool!

BUT #dune test --watch is so far an excellent testing experience for compiled languages.

makes #ocaml feel light and dynamic and i love it

Leandro Nov 17, 2022

fml the variant access visitors don't have enough information to know _which_ variant you want

and when they go through the sequence deserialization, the sequence visitor also has no clue what you want

i have got to be missing something there

Leandro Nov 17, 2022

nevermind, got it!

Problem was that I was recursing over the same Visitor, whereas I really need to generate custom visitors per variant, so I can just tell the framework "please visit this tuple variant using Field_Tuple1_visitor" instead.

Then that visitor knows exactly how to deserialize that specific variant.

```ocaml
| Tuple1 of string
```

This is a PITA to write manually, but we'll codegen all this stuff.
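
The one-visitor-per-variant idea could be sketched like this. The `Seq_visitor` signature, the visitor module names, and `visit_tuple_variant` are assumptions; the real framework works over a reader and not a list of raw strings.

```ocaml
(* Hypothetical visitor signature for the arguments of one tuple variant. *)
module type Seq_visitor = sig
  type value
  val visit_seq : string list -> (value, string) result
end

type t = Tuple1 of string | Tuple2 of string * string

(* One dedicated visitor per constructor, so each knows its own shape. *)
module Tuple1_visitor = struct
  type value = t
  let visit_seq = function
    | [ a ] -> Ok (Tuple1 a)
    | _ -> Error "Tuple1 expects 1 argument"
end

module Tuple2_visitor = struct
  type value = t
  let visit_seq = function
    | [ a; b ] -> Ok (Tuple2 (a, b))
    | _ -> Error "Tuple2 expects 2 arguments"
end

let tuple1_visitor = (module Tuple1_visitor : Seq_visitor with type value = t)
let tuple2_visitor = (module Tuple2_visitor : Seq_visitor with type value = t)

(* Stand-in for the framework: it runs whichever visitor it was handed. *)
let visit_tuple_variant (type a)
    (module V : Seq_visitor with type value = a) args : (a, string) result =
  V.visit_seq args

(* The variant-level visitor picks the right per-variant visitor by tag. *)
let deserialize (tag, args) =
  match tag with
  | "Tuple1" -> visit_tuple_variant tuple1_visitor args
  | "Tuple2" -> visit_tuple_variant tuple2_visitor args
  | other -> Error ("unknown variant: " ^ other)
```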

Leandro Nov 17, 2022

alright we're on! 🚀 hand-rolled deserializer can read empty variants and variants with tuples.

Just need to figure out how to get Sexplib to always print quotes around strings...because why wouldn't you. I'm sure there's a good reason to save those 2 bytes, but it is definitely getting in my way right now.

Next up: variant records!

Leandro Nov 17, 2022

aaand #serde variant record support is here too #ocaml can be so nice 🐪

a little taste of transcoding too, since we can read from X and use serde to go into Y, y'all automatically get:

sexpr -> json
sexpr -> xml
json -> xml
json -> sexpr
etc

the more formats we write, the more we can do!
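
As a rough illustration of why transcoding comes for free, here is a sketch with two toy formats that share one data type. It goes through a fully materialized value, which is a simplification and not how the visitor-driven design reads data. The `Format` signature, `Csv`, `Sexpr`, and `transcode` are hypothetical, and only flat lists of integers are handled.

```ocaml
type data = Int of int | List of data list

module type Format = sig
  val of_string : string -> (data, string) result
  val to_string : data -> string
end

(* Parse separator-delimited integers into a flat list. *)
let ints_of ~sep s =
  let parts = String.split_on_char sep s |> List.filter (fun p -> p <> "") in
  let rec go acc = function
    | [] -> Ok (List (List.rev acc))
    | p :: rest -> (
        match int_of_string_opt p with
        | Some i -> go (Int i :: acc) rest
        | None -> Error ("not an int: " ^ p))
  in
  go [] parts

let rec render ~sep = function
  | Int i -> string_of_int i
  | List items -> String.concat sep (List.map (render ~sep) items)

(* Toy format 1: "1,2,3" *)
module Csv : Format = struct
  let of_string = ints_of ~sep:','
  let to_string = render ~sep:","
end

(* Toy format 2: "(1 2 3)" *)
module Sexpr : Format = struct
  let of_string s =
    let n = String.length s in
    if n >= 2 && s.[0] = '(' && s.[n - 1] = ')' then
      ints_of ~sep:' ' (String.sub s 1 (n - 2))
    else Error "expected parentheses"

  let to_string d = "(" ^ render ~sep:" " d ^ ")"
end

(* Any reader composed with any writer gives a transcoder. *)
let transcode (module From : Format) (module Into : Format) s =
  Result.map Into.to_string (From.of_string s)
```

With N formats implemented this way, `transcode` covers all N × N pairs without any pair-specific code.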

okay now let's try to do some codegen

Leandro Nov 17, 2022

ast_builder we meet again

Leandro Nov 18, 2022

slowly getting there! using #ocaml #ppx can be tedious, but at least you know that the ASTs you build aren't complete gibberish.

Here we are generating some helper values (like a string version of the type name and the variants) and our first visitor for tags.

Leandro Nov 18, 2022

oh just thought of a great use case for serde.ml -- serde-js!

And say goodbye to hoping your `external` is type-safe 😹

(yes i know of decco and bs-json)

Leandro Nov 18, 2022

aaaand we've got the first derived #serde #deserializer! 🤩

Viva la #metaprogramming!

#ocaml #rust #ppx

Leandro Nov 18, 2022

alright, enough for tonight. here's the repo with an ok readme: https://github.com/ostera/serde.ml

let me know what you think folks! thoughts, reviews, contributions welcome.

Would be fun to see more data formats (avro, cbor, yaml, toml, messagepack, postcard, etc), and also would love to get a small benchmarking thing going.

Leandro Nov 19, 2022

added support for #records and abstract types (aliases, tuples), and now working on applied types and generics:

type 'x pair = 'x * 'x
type v2f = float pair
[@@deriving deserializer]

#ocaml #serde is coming along!

Leandro Nov 19, 2022

One issue with this approach is that we don't have type-driven resolution of methods (a-la #rustlang traits, #haskell typeclasses, or modular implicits), so automatic generics would require some user input: what serializer should we use for these types?

It is still type-safe, since we'll just fail to deserialize values if you give us the wrong deserializer, but it isn't nearly as ✨magical✨

modular implicits pls land 😭
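
What that "user input" could look like in practice, as a sketch: the deserializer for the type parameter is passed explicitly. The `'a de` type and the `de_*` functions are hypothetical, and the toy input format is two comma-separated values.

```ocaml
type 'x pair = 'x * 'x
type v2f = float pair

(* In this sketch a deserializer is just a function from raw text to a value. *)
type 'a de = string -> ('a, string) result

let de_float : float de =
 fun s ->
  match float_of_string_opt s with
  | Some f -> Ok f
  | None -> Error ("not a float: " ^ s)

(* No traits or modular implicits, so the element deserializer is an argument. *)
let de_pair (de_x : 'x de) : 'x pair de =
 fun s ->
  match String.split_on_char ',' s with
  | [ a; b ] -> (
      match (de_x a, de_x b) with
      | Ok a, Ok b -> Ok (a, b)
      | Error e, _ | _, Error e -> Error e)
  | _ -> Error "expected two comma-separated values"

(* The user (or an annotation on the type) has to say which one to plug in. *)
let de_v2f : v2f de = de_pair de_float
```

Passing the wrong element deserializer either fails to type-check or fails at runtime with an `Error`, which matches the "still type-safe, less magical" trade-off described above.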

Leandro Nov 19, 2022

okay, putting a pause here. will instead work on the #serde #json deserialization, let's see if we can reuse the #yojson streaming parser!

Leandro Nov 19, 2022

so I dropped the first-class module approach for the Reader interface and went with a GADT.

this looks alright, and not that much #typelevel magic going on, just an #existentialtype.
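
A sketch of what a GADT-packed reader with an existential source type could look like. The `ops` record, the `Reader` constructor, and the two constructors functions are assumptions for illustration, not the actual definition in the repo.

```ocaml
(* Operations over some source type 'src. *)
type 'src ops = {
  peek : 'src -> char option;
  advance : 'src -> unit;
}

(* The GADT hides 'src: callers see one [reader] type whatever the source is. *)
type reader = Reader : 'src * 'src ops -> reader

let peek (Reader (src, ops)) = ops.peek src

let next (Reader (src, ops)) =
  let c = ops.peek src in
  ops.advance src;
  c

(* Source 1: a string with a mutable cursor. *)
let of_string (s : string) : reader =
  Reader
    ( ref 0,
      {
        peek = (fun p -> if !p < String.length s then Some s.[!p] else None);
        advance = (fun p -> incr p);
      } )

(* Source 2: a list of chars, consumed from the front. *)
let of_char_list (l : char list) : reader =
  Reader
    ( ref l,
      {
        peek = (fun r -> match !r with c :: _ -> Some c | [] -> None);
        advance = (fun r -> match !r with _ :: rest -> r := rest | [] -> ());
      } )

(* Code written against [reader] works for both sources. *)
let read_all r =
  let rec go acc =
    match next r with Some c -> go (c :: acc) | None -> List.rev acc
  in
  go []
```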

Leandro Nov 19, 2022

this actually started with @c_cube recommending to reuse the Yojson #json #parser, which is using a generated lexer, so it keeps its own sort of state and wouldn't be possible to reuse with an interface like this.

And then I just threaded a custom state throughout a Format Deserializer, which means anyone can pick whatever way of reading content they want.

Which yes means #yojson gets its own lexer_state type threaded.

Now if only Yojson gave me a clear peek/drop/next API, that'd be dope.

Leandro Nov 19, 2022

another small improvement -- we've got type-aliased module types, so your signatures now can look like normal types:

type state.
state Deserializer.t ->
state Visitor.t->
result

Instead of this MONSTROSITY:

type value.
state ->
(module Deserializer with type state = state) ->
(module Visitor.Intf with type value = value) ->
result

(don't @ me)
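
The trick behind the shorter signatures is a type alias for a first-class module type. A minimal sketch, with a hypothetical one-method `Visitor.Intf` and a made-up `run` function:

```ocaml
module Visitor = struct
  module type Intf = sig
    type value
    val visit_int : int -> (value, string) result
  end

  (* The alias: a packed module type that reads like an ordinary type. *)
  type 'value t = (module Intf with type value = 'value)
end

(* The signature mentions [a Visitor.t], not the full package type. *)
let run (type a) (v : a Visitor.t) (input : string) : (a, string) result =
  let module V = (val v) in
  match int_of_string_opt input with
  | Some i -> V.visit_int i
  | None -> Error "not an int"

module Doubler = struct
  type value = int
  let visit_int i = Ok (i * 2)
end

let doubler : int Visitor.t = (module Doubler)
```

The alias changes nothing at runtime; it only keeps the `with type ... = ...` constraint in one place.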

Leandro Nov 20, 2022

this is what a derived map visitor (for constructing records and arbitrarily sized map-like collections) will look like.

This allows us to parse JSON objects into records by looking _exactly_ at the keys we care about, and reading those _directly_ into the final values rather than building up large generic JSON data that we shove into types later.
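
A hand-rolled sketch of that idea: the visitor pulls one key/value entry at a time, keeps only the keys it cares about, and builds the record directly. The `user` type, `visit_map`, and `entries_of_list` (a stand-in for a streaming parser) are hypothetical.

```ocaml
type user = { name : string; clearance : int }

(* Stand-in for a streaming parser: yields one raw key/value pair per call. *)
let entries_of_list l =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> None
    | e :: tl ->
        rest := tl;
        Some e

(* The map visitor: no generic object is ever built. *)
let visit_map (next_entry : unit -> (string * string) option) :
    (user, string) result =
  let name = ref None and clearance = ref None in
  let rec loop () =
    match next_entry () with
    | None -> ()
    | Some ("name", v) -> name := Some v; loop ()
    | Some ("clearance", v) -> clearance := int_of_string_opt v; loop ()
    | Some (_, _) -> loop () (* unknown key: skipped, never stored *)
  in
  loop ();
  match (!name, !clearance) with
  | Some name, Some clearance -> Ok { name; clearance }
  | None, _ -> Error "missing field: name"
  | _, None -> Error "missing or invalid field: clearance"
```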

Leandro Nov 20, 2022

Having multiple visitors derived automatically also means that we can read data in the same format (eg. JSON), but in a different representation.

For example, by default I was working with _sequences_ since I already had the Sequence_access module working. This meant that Records were first possible to construct from a _packed_ representation that doesn't have any keys (eg. just an array of values).

This can save tons of space over the wire and make your app more #responsive and boost #performance!

Leandro Nov 20, 2022

aaand we have derived map visitors for #ocaml #serde, so we're one step closer to deserializing json

Leandro Nov 20, 2022

yes this is the expanded code from a single `[@@deriving deserializer]` call, but let me do variants now.

variants are good.

Leandro Nov 20, 2022

aaand the derived deserializer for #ocaml records works.

viva la #metaprogramming! 👏

Chas Emerick Nov 17, 2022

@leostera Strongly suggest using a lockfile, or at least set version minimums in your primary opam file; doing either eliminates this kind of problem

Leandro Nov 17, 2022

@cemerick do lockfiles automatically make a switch workspace specific? 🤔

For ex. how would opam know to follow a lockfile when I just go `opam install ocamlformat` in a new shell/other dir?

Chas Emerick Nov 20, 2022

@leostera (sorry, missed this reply)

but no, lockfiles are all about _removing_ workspace-specific difficulties (the particular state of your current switch and installed packages are why doing a new one-off install is triggering undesirable downgrades, etc)

opam will always use a lockfile if present as long as you include `--locked` in any relevant commands (`install`, `switch create`, etc). IIRC, the plan is to make `--locked` the default in some future release