Alright, since I'm taking the day off warp, let's do some #ocaml instead.

We'll build a #rust favorite: serde

So first things first, we'll want to define some intermediate representation, the serde.ml language.

Then we can make something that reads/writes to that from some specific format (json, toml, etc).

Then we can make something that transforms serde.ml into typed ml data.

Currently reading how the serde derived serialization works.

Okay, so this is what I've got so far.

Here's the serde.ml data model, more or less fitting the basic types you can build up (or that come with) #ocaml:
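
A rough sketch of what a data model like this could look like. The constructor names here are made up for illustration and are not necessarily what's in the repo:

```ocaml
(* Hypothetical intermediate representation for OCaml data.
   All constructor names are illustrative assumptions. *)
type data =
  | Unit
  | Bool of bool
  | Int of int
  | Float of float
  | String of string
  | Tuple of data list
  | Record of { name : string; fields : (string * data) list }
  | Variant of { ty : string; tag : string; payload : payload }

and payload =
  | No_payload (* unit variant *)
  | Tuple_payload of data list (* tuple variant *)
  | Record_payload of (string * data) list (* record variant *)

(* A record variant named Captain, belonging to a type named rank,
   expressed in the IR. *)
let example =
  Variant
    {
      ty = "rank";
      tag = "Captain";
      payload = Record_payload [ ("name", String "Picard"); ("ships", Int 1) ];
    }
```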

Now on its own it isn't very interesting, it's just a small AST for data structures.

The magic is that this is really an ✨intermediate representation✨

A shared interface really, between the ser and de parts of serde.

So for the serialization use-case on one side, we have a tiny @@deriving that generates a function (like shown in the picture) that turns your data into this IR.

And on the other side we have modules that consume this IR and spit out other formats like JSON, s-expressions, TOML, KDL, XML, what have you.

I haven't sketched any of those yet enough to show a screenshot.

fml why did i choose metaquot. this is going to take a little bit. bear with me.

yeah, no. these meta-quotation libraries are unusable to me. I'll need tons more documentation and examples on how to do common things. I'm getting too spoiled by the amazing efforts in Rust to keep libraries well documented.

Going back to ast_builder.

ocaml i am so angry at you right now
seriously, at some point this is just smashing the keyboard until the types magically fit

so i got a deriver for serializers into the serde.ml language somewhat working, and this is already making it so much easier to test stuff -- viva la #metaprogramming!

#ocaml #rust #serde

(yes, i know this is outputting cursed ocaml right now, bear with me)

fork yes! it's valid #ocaml now. i think.

it also derives:

* newtypes
* tuples
* variants (unit, tuple, and record ones)

and knows how to follow modules/etc

ok, will add support for records next

ofc you'd never see any of this code, it's all generated for you, what you see is this:
oh no, what did i break now

friggin' brew. we're back on track. I wrote a sexpr serializer to iron out the serializer implementer interface, and it feels good!

(minus all the explicit first-class module signatures type escaping)

the important bit is that this is what you normally write as a user:

type t = { ... } [@@deriving serializer]

and then you can call

t |> serialize_t |> Serde_sexpr.to_string

wanna go to json? just change to Serde_json.to_string
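
To make the pipeline concrete without the ppx, here's a self-contained toy version. `Toy_sexpr`, `Toy_json`, the cut-down `data` type and the hand-written `serialize_user` are all stand-ins I made up for this sketch; the real `Serde_sexpr` / `Serde_json` modules and the derived code will differ:

```ocaml
(* A cut-down IR; names are assumptions for illustration only. *)
type data =
  | Int of int
  | String of string
  | Record of string * (string * data) list

(* Two toy format modules consuming the same IR. *)
module Toy_sexpr = struct
  let rec to_string = function
    | Int i -> string_of_int i
    | String s -> Printf.sprintf "%S" s
    | Record (name, fields) ->
        let field (k, v) = Printf.sprintf "(%s %s)" k (to_string v) in
        Printf.sprintf "(%s %s)" name
          (String.concat " " (List.map field fields))
end

module Toy_json = struct
  (* %S applies OCaml string escaping, which is not exactly JSON
     escaping; good enough for a toy. *)
  let rec to_string = function
    | Int i -> string_of_int i
    | String s -> Printf.sprintf "%S" s
    | Record (_, fields) ->
        let field (k, v) = Printf.sprintf "%S:%s" k (to_string v) in
        "{" ^ String.concat "," (List.map field fields) ^ "}"
end

type user = { name : string; clearance : int }

(* Roughly what a derived serializer might produce, written by hand. *)
let serialize_user (u : user) : data =
  Record ("user", [ ("name", String u.name); ("clearance", Int u.clearance) ])

let u = { name = "ripley"; clearance = 3 }

(* Same serializer, two formats: only the last step changes. *)
let as_sexpr = u |> serialize_user |> Toy_sexpr.to_string
let as_json = u |> serialize_user |> Toy_json.to_string
```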

maybe someone will finally implement a deriving(debug) with this!

but for authors of serializers, there are a few things that always get in the way:

1. error handling (esp. over collections)

2. mapping over the serde.ml data definition language

3. parametrizing over the kind of output you want to have (usually a string, but could be a buffer/channel that goes straight to stdout/disk/network)

all of those are provided by the Ser.Mapper interface, which is created at runtime based on the definition of the serializer you are using.
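
This is not the real Ser.Mapper, just a small sketch of the three concerns above under my own assumptions: `map_result` handles errors over collections, `write` maps over a tiny made-up `data` type, and the `Output` signature abstracts where the bytes go. Every name here is hypothetical:

```ocaml
(* Hypothetical names throughout; this is not the real Ser.Mapper. *)
type data = Int of int | Float of float | List of data list

(* 3. Where the serialized output ends up. *)
module type Output = sig
  type t
  val write : t -> string -> unit
end

module Buffer_output : Output with type t = Buffer.t = struct
  type t = Buffer.t
  let write = Buffer.add_string
end

(* 1. Map a fallible function over a list, stopping at the first error. *)
let rec map_result f = function
  | [] -> Ok []
  | x :: rest -> (
      match f x with
      | Error e -> Error e
      | Ok y -> (
          match map_result f rest with
          | Error e -> Error e
          | Ok ys -> Ok (y :: ys)))

(* 2. Walk the data definition, writing to whatever output was chosen. *)
module Make_serializer (O : Output) = struct
  let rec write out = function
    | Int i ->
        O.write out (string_of_int i);
        Ok ()
    | Float f when Float.is_nan f -> Error "cannot serialize NaN"
    | Float f ->
        O.write out (string_of_float f);
        Ok ()
    | List items ->
        O.write out "(list";
        let item d =
          O.write out " ";
          write out d
        in
        let result = map_result item items in
        O.write out ")";
        Result.map (fun (_ : unit list) -> ()) result
end

module To_buffer = Make_serializer (Buffer_output)

let to_string d =
  let buf = Buffer.create 64 in
  match To_buffer.write buf d with
  | Ok () -> Ok (Buffer.contents buf)
  | Error msg -> Error msg
```

Swapping `Buffer_output` for a module that writes to a channel would leave `Make_serializer` untouched, which is the point of parametrizing over the output.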

See this cursed module: https://github.com/ostera/serde.ml/blob/main/serde/ser.ml#L163-L185


uuuuugh yojson why don't you have a clean clear exported ast WHY

folks, we need more _type-only_ packages. like we have for http type defs.

in any case, here we go: one line change and we got serialization to json.

sweet.

i heard you also liked xml

the beauty of having an abstract language for describing OCaml data, is that concrete data formats become fully pluggable.

One serializer, Many formats.

Just from yaml, to json, to protobuf in one line.

Ok, time for food. When I'm back we'll go for the Deserialize path which will be harder methinks.

Let me know what other data formats you'd like to see serialized, and go check out the code here: https://github.com/ostera/serde.ml


Okay back from food and a @anmonteiro retweeted so this is how I know I need to get back to it.

What does it take to deserialize? Let’s see how #rust #serde does it.

in short, we need:

1. a deserializer, that turns bytes into serde.ml data

2. a visitor/fold, to turn serde.ml data into our custom type

3. a deriver, to generate visitors from types

so i think i'll start by making an interface for deserializers.

the idea here is that you tell serde how to deserialize these specific things from your data format, and then a visitor drives the work, only really deserializing what's needed

something like this -- you get a visitor passed in, a final value type, and you get to choose per-data-format what's the best way to consume the input
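
A minimal sketch of that shape, assuming a record-of-functions visitor and a toy `Plain_text` format whose input is just a string. None of these names come from serde.ml:

```ocaml
(* A visitor says how to build a 'value from primitive pieces.
   Hypothetical type, for illustration only. *)
type 'value visitor = {
  visit_int : int -> ('value, string) result;
  visit_string : string -> ('value, string) result;
}

(* What each data format has to provide: how to pull a specific kind of
   thing out of its input and hand it to the visitor. *)
module type Deserializer = sig
  type input
  val deserialize_int : input -> 'value visitor -> ('value, string) result
  val deserialize_string : input -> 'value visitor -> ('value, string) result
end

(* Toy format: the whole input is one plain-text value. *)
module Plain_text : Deserializer with type input = string = struct
  type input = string

  let deserialize_int input v =
    match int_of_string_opt (String.trim input) with
    | Some i -> v.visit_int i
    | None -> Error ("not an int: " ^ input)

  let deserialize_string input v = v.visit_string input
end

type clearance = Clearance of int

(* The visitor is specific to the final value type, not to the format. *)
let clearance_visitor =
  {
    visit_int = (fun i -> Ok (Clearance i));
    visit_string = (fun s -> Error ("expected an int, got string " ^ s));
  }

let parsed = Plain_text.deserialize_int " 3 " clearance_visitor
```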

#ocaml #rust #serde

actually screw this, we'll start by manually expanding what a deserializer would look like.

this will give me a much better idea of what the visitor/reader/deser interface needs to be.

(also gives me an excuse to read even more serde code muahah)

oof, long hours #ocaml-ing here...I think I've got another ~2 hours in me before I get sleepy again. Thank you accidental nap!

The deserializer and visitor interfaces are coming along nicely! 👌🏼✨

the reason to do all this threading of modules is that:

* we want the visitor to drive the deserialization
* but it shouldn't know about the deserializer
* and the deserializer should be forced to use the visitor

If you were manually implementing a deserializer, this is how you'd use it to deserialize an identifier when you're coming from some data into a record with fields Name, Role, and Clearance.
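
Something in this spirit, as a hand-rolled sketch with first-class modules. `Visitor`, `Toy_de` and `Field_visitor` are names I'm assuming for the example; the real interfaces in the repo are more involved:

```ocaml
(* Hypothetical visitor signature: it only knows about values,
   nothing about formats or deserializers. *)
module type Visitor = sig
  type value
  val visit_string : string -> (value, string) result
  val visit_int : int -> (value, string) result
end

(* A toy deserializer: the only way it can produce a value is by
   handing what it read to the visitor it was given. *)
module Toy_de = struct
  let deserialize_identifier (type a)
      (module V : Visitor with type value = a) (input : string) :
      (a, string) result =
    V.visit_string (String.trim input)
end

(* The fields of the record we are deserializing into. *)
type field = Name | Role | Clearance

module Field_visitor = struct
  type value = field

  let visit_string = function
    | "name" -> Ok Name
    | "role" -> Ok Role
    | "clearance" -> Ok Clearance
    | other -> Error ("unknown field: " ^ other)

  let visit_int i = Error ("expected a field name, got int " ^ string_of_int i)
end

let field = Toy_de.deserialize_identifier (module Field_visitor) " role "
```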

Thankfully there will be a [@@deriving deserializer] too 🤩

#ocaml #serde

okay making some coffee then we're back. last night we got here:

* create a visitor for a small variant
* create a visitor for tags
* thread these into `Serde_sexpr` transparently

still a ton to figure out, but at least the structure is starting to fall into place!

now i'll brew the coffee and thread this Variant_visitor into an implementation of the Variant_access_intf so that we actually use it.

code here: https://github.com/ostera/serde.ml

brb

#ocaml #rust #buildinpublic #devtools #serde

current status
guess what this module does
tried for a minute to build a reader that abstracts over the source of the data it reads (eg. string, file, buffer, what have you) and omg no

the sheer amount of #TypeLevel fuckery I have to do to get these modules to play along nicely and errors to surface tidily is worrying.

makes me really miss #rust traits for

In short. I’ve got a Reader interface, a Format Deserializer, and a Visitor interface.

Together they create a Deserializer for a specific format and specific datatype over a specific source of data.

The Reader offers a basic low level next/peek/skip API over some data. So you can build a Reader for Bytes, String, In_channel, or whatever data you chunk bytes out of. Nom nom nom.
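
As a sketch of what such a next/peek/skip interface could look like (my own simplified signature working on chars, not the one in the repo), with a String-backed reader and a tiny lexer written only against the `Reader` signature:

```ocaml
(* Hypothetical low-level reader interface. *)
module type Reader = sig
  type t
  val peek : t -> char option (* look at the next char without consuming *)
  val next : t -> char option (* consume and return the next char *)
  val skip : t -> unit (* consume the next char, if any *)
end

module String_reader : sig
  include Reader
  val of_string : string -> t
end = struct
  type t = { src : string; mutable pos : int }

  let of_string src = { src; pos = 0 }
  let peek r = if r.pos < String.length r.src then Some r.src.[r.pos] else None
  let skip r = if r.pos < String.length r.src then r.pos <- r.pos + 1

  let next r =
    let c = peek r in
    skip r;
    c
end

(* Format-level code only sees the Reader signature, so it works over
   any source that implements it. *)
module Lexer (R : Reader) = struct
  let rec skip_spaces r =
    match R.peek r with
    | Some ' ' ->
        R.skip r;
        skip_spaces r
    | _ -> ()

  let read_word r =
    skip_spaces r;
    let buf = Buffer.create 16 in
    let rec loop () =
      match R.peek r with
      | Some c when c <> ' ' ->
          Buffer.add_char buf c;
          R.skip r;
          loop ()
      | _ -> Buffer.contents buf
    in
    loop ()
end

module L = Lexer (String_reader)

let r = String_reader.of_string "  name ripley"
let first = L.read_word r
let second = L.read_word r
```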

When you build a Format Deserializer (like something to parse JSON), you need a Reader. But you don’t want to read everything! That would be too inefficient. You want to only read the things you need to build up a specific datatype.

That’s where the Visitor comes in. You pass one to the Format Deserializer.

The Visitor itself decides what to do with specific pieces of data. Like what happens if we find an Int, or a Tuple, or a Variant Record.

This is specifically built for your datatype, so it knows about the fields your record needs, and the different constructors of your variant, so it can guide the deserializer forward.

It does this by receiving, in specific cases, tiny modules that offer further deserializing options. Like Variant_access (yes we do #JiraffeCase in #ocaml).

So when your visitor's “visit_variant” function is called, you can peek at the variant tag name with Visitor_access.tag, and then depending on what kind of variant you know you have, you can call:

Visitor_access.unit_variant

Visitor_access.tuple_variant ~len

Visitor_access.record_variant ~fields

Each of these takes different arguments, and then drives the deserializer to read exactly what you’re expecting, and nothing more.
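
Here's a hand-written sketch of a visitor driving such an access module. The signature below is my guess at the general shape (a tag plus unit/tuple/record accessors), the `rank` type is invented, and `access_of` is a canned stand-in for a real format deserializer:

```ocaml
type data = Int of int | String of string

(* Hypothetical shape of the access module handed to visit_variant. *)
module type Variant_access = sig
  val tag : unit -> (string, string) result
  val unit_variant : unit -> (unit, string) result
  val tuple_variant : len:int -> (data list, string) result
  val record_variant :
    fields:string list -> ((string * data) list, string) result
end

type rank =
  | Cadet
  | Officer of string * int
  | Captain of { name : string; ship : string }

(* The visitor knows the constructors of rank, so it asks the access
   module for exactly the payload each tag needs. *)
let visit_variant (module A : Variant_access) : (rank, string) result =
  match A.tag () with
  | Error e -> Error e
  | Ok "Cadet" -> Result.map (fun () -> Cadet) (A.unit_variant ())
  | Ok "Officer" -> (
      match A.tuple_variant ~len:2 with
      | Ok [ String name; Int years ] -> Ok (Officer (name, years))
      | Ok _ -> Error "Officer: unexpected payload"
      | Error e -> Error e)
  | Ok "Captain" -> (
      match A.record_variant ~fields:[ "name"; "ship" ] with
      | Ok [ ("name", String name); ("ship", String ship) ] ->
          Ok (Captain { name; ship })
      | Ok _ -> Error "Captain: unexpected payload"
      | Error e -> Error e)
  | Ok other -> Error ("unknown tag: " ^ other)

(* A canned access module standing in for a real format deserializer:
   it serves an already-parsed tag and payload. *)
let access_of ~tag:t (payload : data list) : (module Variant_access) =
  (module struct
    let tag () = Ok t

    let unit_variant () =
      if payload = [] then Ok () else Error "expected no payload"

    let tuple_variant ~len =
      if List.length payload = len then Ok payload
      else Error "wrong tuple length"

    let record_variant ~fields =
      if List.length payload = List.length fields then
        Ok (List.combine fields payload)
      else Error "wrong field count"
  end : Variant_access)

let officer = visit_variant (access_of ~tag:"Officer" [ String "ripley"; Int 3 ])
```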

Operation-wise this can be extremely efficient (as shown by #rust #serde, but I have yet to benchmark this on #ocaml and my implementation is very naive anyway) since we only really do the minimum amount of work we need.

And yes this means that hand writing a Visitor is quite a bit of work.

Fortunately we can derive them from your types! Viva la #metaprogramming!

@leostera forgive me my ignorance (never used serde) but it looks a lot like atdgen. Or am I mistaken?

@andreypopp hello there 👋🏼 good to see you here.

I haven’t used atdgen in anger really, so I don’t know the internals, etc, but this is for generating serializers/deserializers *from* OCaml types. It’s not necessarily built for interoperability across languages. Maybe I misunderstand what atdgen is for tho!

@leostera got it, I actually wish atdgen would be a ppx instead of separate syntax
@leostera @andreypopp what is it serialising to?
@ulrikstrid @andreypopp atdgen or serde.ml ?
@ulrikstrid @andreypopp serde.ml lets you plug in the format serializer/deserializer later, so using [@@deriving(serializer, deserializer)] just gives you the ser/de functions to interact with the specific formats (serde_sexpr, serde_json, serde_xml, etc)
serde.ml/data.ml at main · ostera/serde.ml
