(yes, i know this is outputting cursed ocaml right now, bear with me)

fork yes! it's valid #ocaml now. i think.

it also derives:

* newtypes
* tuples
* variants (unit, tuple, and record ones)

and knows how to follow modules/etc

ok, will add support for records next

ofc you'd never see any of this code, it's all generated for you. what you see is this:
oh no, what did i break now

friggin' brew. we're back on track. I wrote a sexpr serializer to iron out the serializer implementor interface, and it feels good!

(minus all the explicit first-class module signatures needed to keep types from escaping)

the important bit is that this is what you normally write as a user:

```ocaml
type t = { ... } [@@deriving serializer]
```

and then you can call

```ocaml
t |> serialize_t |> Serde_sexpr.to_string
```

wanna go to json? just change it to `Serde_json.to_string`
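to make that pipeline concrete, here's a toy, hand-written sketch of roughly what a derived `serialize_t` and two format modules could look like. `data`, `Toy_sexpr` and `Toy_json` are made up for illustration and are not serde.ml's actual types or API; the point is only that one `serialize_t` output feeds either format.

```ocaml
(* Toy data model, hypothetical: NOT serde.ml's real types. *)
type data =
  | Int of int
  | Str of string
  | Record of (string * data) list (* (field name, field value) pairs *)

(* Roughly what [@@deriving serializer] might generate for a record,
   hand-written here for illustration. *)
type t = { name : string; age : int }

let serialize_t (v : t) : data =
  Record [ ("name", Str v.name); ("age", Int v.age) ]

(* Two hypothetical formats sharing the same [to_string] shape.
   %S uses OCaml string escaping: fine for this toy, not spec-compliant JSON. *)
module Toy_sexpr = struct
  let rec to_string = function
    | Int i -> string_of_int i
    | Str s -> Printf.sprintf "%S" s
    | Record fields ->
        fields
        |> List.map (fun (k, v) -> Printf.sprintf "(%s %s)" k (to_string v))
        |> String.concat " "
        |> Printf.sprintf "(%s)"
end

module Toy_json = struct
  let rec to_string = function
    | Int i -> string_of_int i
    | Str s -> Printf.sprintf "%S" s
    | Record fields ->
        fields
        |> List.map (fun (k, v) -> Printf.sprintf "%S:%s" k (to_string v))
        |> String.concat ","
        |> Printf.sprintf "{%s}"
end

let () =
  let alice = { name = "alice"; age = 30 } in
  (* prints ((name "alice") (age 30)) *)
  print_endline (alice |> serialize_t |> Toy_sexpr.to_string);
  (* prints {"name":"alice","age":30} *)
  print_endline (alice |> serialize_t |> Toy_json.to_string)
```

only the last step of the pipe changes between formats, which is the "one line change".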

maybe someone will finally implement a deriving(debug) with this!

but for authors of serializers, there are a few things that always get in the way:

1. error handling (esp. over collections)

2. mapping over the serde.ml data definition language

3. parametrizing over the kind of output you want to have (usually a string, but could be a buffer/channel that goes straight to stdout/disk/network)

all of those are provided by the Ser.Mapper interface, which is created at runtime based on the definition of the serializer you are using.
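a rough sketch of concerns 1 and 3 in plain OCaml. `map_result`, `OUTPUT` and `Int_list_ser` are hypothetical names, not what `Ser.Mapper` actually exposes; this only shows the shape of the problem: short-circuiting errors over a collection, and writing a serializer once against any output sink.

```ocaml
(* Hypothetical sketch, NOT the real Ser.Mapper. *)

(* (1) Error handling over collections: map, stopping at the first error. *)
let rec map_result (f : 'a -> ('b, 'e) result) (xs : 'a list) : ('b list, 'e) result =
  match xs with
  | [] -> Ok []
  | x :: rest -> (
      match f x with
      | Error e -> Error e
      | Ok y -> Result.map (fun ys -> y :: ys) (map_result f rest))

(* (3) Parametrize over the output: anything we can write strings into. *)
module type OUTPUT = sig
  type t
  val write : t -> string -> unit
end

module Buffer_output : OUTPUT with type t = Buffer.t = struct
  type t = Buffer.t
  let write = Buffer.add_string
end

module Channel_output : OUTPUT with type t = out_channel = struct
  type t = out_channel
  let write = output_string
end

(* A serializer for int lists, written once against any OUTPUT.
   [check] is a made-up per-element step that can fail, just to show an
   error surfacing; nothing is written unless every element succeeds. *)
module Int_list_ser (O : OUTPUT) = struct
  let check x =
    if x < 0 then Error (Printf.sprintf "negative: %d" x)
    else Ok (string_of_int x)

  let serialize (out : O.t) (xs : int list) : (unit, string) result =
    match map_result check xs with
    | Error e -> Error e
    | Ok parts ->
        O.write out ("(" ^ String.concat " " parts ^ ")");
        Ok ()
end
```

`Int_list_ser (Channel_output)` can write straight to `stdout` or a file channel, while `Int_list_ser (Buffer_output)` builds a string in memory, with the same serializer code.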

See this cursed module: https://github.com/ostera/serde.ml/blob/main/serde/ser.ml#L163-L185


uuuuugh yojson why don't you have a clean clear exported ast WHY

folks, we need more _type-only_ packages. like we have for http type defs.

in any case, here we go: one line change and we got serialization to json.

sweet.

i heard you also liked xml

the beauty of having an abstract language for describing OCaml data is that concrete data formats become fully pluggable.

One serializer, Many formats.

Go from yaml, to json, to protobuf in one line.

Ok, time for food. When I'm back we'll go for the Deserialize path which will be harder methinks.

Let me know what other data formats you'd like to see serialized, and go check out the code here: https://github.com/ostera/serde.ml


Okay, back from food, and @anmonteiro retweeted, so this is how I know I need to get back to it.

What does it take to deserialize? Let’s see how #rust #serde does it.

in short, we need:

1. a deserializer, that turns bytes into serde.ml data

2. a visitor/fold, to turn serde.ml data into our custom type

3. a deriver, to generate visitors from types

so i think i'll start by making an interface for deserializers.

the idea here is that you tell serde how to deserialize these specific things from your data format, and then a visitor drives the work, only really deserializing what's needed

something like this -- you get a visitor passed in, a final value type, and you get to choose per-data-format what's the best way to consume the input
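a minimal sketch of that split, assuming pre-tokenized input so it stays short. `VISITOR`, `deserialize_any` and `Int_visitor` are hypothetical, loosely modeled on how rust's serde separates deserializers from visitors; they are not serde.ml's real interfaces.

```ocaml
(* Hypothetical shapes, NOT serde.ml's real interfaces. *)
module type VISITOR = sig
  type value
  val visit_int : int -> (value, string) result
  val visit_string : string -> (value, string) result
end

(* Toy input: pre-tokenized data. A real format deserializer reads bytes. *)
type token = Tint of int | Tstring of string

(* The format deserializer decides how to consume the input;
   the visitor decides what each piece turns into. *)
let deserialize_any (type v) (input : token list ref)
    (module V : VISITOR with type value = v) : (v, string) result =
  match !input with
  | [] -> Error "unexpected end of input"
  | tok :: rest -> (
      input := rest;
      match tok with
      | Tint i -> V.visit_int i
      | Tstring s -> V.visit_string s)

(* A visitor that only accepts ints. *)
module Int_visitor = struct
  type value = int
  let visit_int i = Ok i
  let visit_string s = Error ("expected int, got string " ^ s)
end
```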

#ocaml #rust #serde

actually screw this, we'll start by manually expanding what a deserializer would look like.

this will give me a much better idea of what the visitor/reader/deser interface needs to be.

(also gives me an excuse to read even more serde code muahah)

oof, long hours #ocaml-ing here...I think I've got another ~2 hours in me before I get sleepy again. Thank you accidental nap!

The deserializer and visitor interfaces are coming along nicely! 👌🏼✨

If you were manually implementing a deserializer, this is how you'd use it to deserialize an identifier when you're coming from some data into a record with fields Name, Role, and Clearance.
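roughly what that identifier step boils down to, as a hypothetical sketch: a visitor that maps a key (or a position) onto a field tag, so the record visitor knows which field comes next. The lowercase key names and the 0/1/2 positions are assumptions about the input data, not taken from serde.ml.

```ocaml
(* Hypothetical field-identifier visitor for a record with
   fields name, role and clearance. *)
type field = Name | Role | Clearance

(* Keyed input, e.g. a map key "role" (lowercase keys are an assumption). *)
let visit_string (s : string) : (field, string) result =
  match s with
  | "name" -> Ok Name
  | "role" -> Ok Role
  | "clearance" -> Ok Clearance
  | other -> Error ("unknown field: " ^ other)

(* Positional input, e.g. the index of a value in a packed sequence. *)
let visit_int (i : int) : (field, string) result =
  match i with
  | 0 -> Ok Name
  | 1 -> Ok Role
  | 2 -> Ok Clearance
  | n -> Error (Printf.sprintf "field index out of range: %d" n)
```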

Thankfully there will be a [@@deriving deserializer] too 🤩

#ocaml #serde

okay making some coffee then we're back. last night we got here:

* create a visitor for a small variant
* create a visitor for tags
* thread these into `Serde_sexpr` transparently

still a ton to figure out, but at least the structure is starting to fall into place!

now i'll brew the coffee and thread this Variant_visitor into an implementation of the Variant_access_intf so that we actually use it.

code here: https://github.com/ostera/serde.ml

brb

#ocaml #rust #buildinpublic #devtools #serde

current status
guess what this module does
tried for a minute to build a reader that abstracts over the source of the data it reads (eg. string, file, buffer, what have you) and omg no

the sheer amount of #TypeLevel fuckery I have to do to get these modules to play along nicely and errors to surface tidily is worrying.

makes me really miss #rust traits for

In short: I’ve got a Reader interface, a Format Deserializer, and a Visitor interface.

Together they create a Deserializer for a specific format and specific datatype over a specific source of data.

The Reader offers a basic low level next/peek/skip API over some data. So you can build a Reader for Bytes, String, In_channel, or whatever data you chunk bytes out of. Nom nom nom.
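a minimal sketch of such a Reader over a `string`. `String_reader`, `peek`, `next` and `skip_while` are illustrative names, not serde.ml's actual Reader API; a `Bytes` or `In_channel` reader would expose the same few operations over a different source.

```ocaml
(* Hypothetical Reader over a string; names are illustrative only. *)
module String_reader = struct
  type t = { src : string; mutable pos : int }

  let of_string src = { src; pos = 0 }

  (* Look at the next char without consuming it. *)
  let peek r =
    if r.pos < String.length r.src then Some (r.src.[r.pos]) else None

  (* Consume and return the next char. *)
  let next r =
    match peek r with
    | Some c ->
        r.pos <- r.pos + 1;
        Some c
    | None -> None

  (* Drop chars while [f] holds, e.g. whitespace. *)
  let rec skip_while f r =
    match peek r with
    | Some c when f c ->
        r.pos <- r.pos + 1;
        skip_while f r
    | _ -> ()
end
```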

When you build a Format Deserializer (like something to parse JSON), you need a Reader. But you don’t want to read everything! That would be too inefficient. You want to only read the things you need to build up a specific datatype.

That’s where the Visitor comes in. You pass one to the Format Deserializer.

The Visitor itself decides what to do with specific pieces of data. Like what happens if we find an Int, or a Tuple, or a Variant Record.

This is specifically built for your datatype, so it knows about the fields your record needs, and the different constructors of your variant, so it can guide the deserializer forward.

It does this by receiving, in specific cases, tiny modules that offer further deserializing options. Like Variant_access (yes we do #JiraffeCase in #ocaml).

So when your visitor's “visit_variant” function is called, you can peek at the variant tag name with Visitor_access.tag, and then depending on what kind of variant you know you have, you can call:

Visitor_access.unit_variant

Visitor_access.tuple_variant ~len

Visitor_access.record_variant ~fields

Each of these takes different arguments, and then drives the deserializer to read exactly what you’re expecting, and nothing more.
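a toy version of that flow, with the variant access modeled as a plain record of functions. The operation names (`tag`, `unit_variant`, `tuple_variant ~len`) follow the thread, but the types, `shape` and `access_of` are invented for illustration; `record_variant ~fields` would follow the same pattern.

```ocaml
(* Toy stand-in for a variant access; signatures are invented. *)
type data = Str of string | Int of int

type variant_access = {
  tag : unit -> string;
  unit_variant : unit -> (unit, string) result;
  tuple_variant : len:int -> (data list, string) result;
}

type shape = Point | Circle of int

(* The visitor peeks at the tag, then asks for exactly the payload it expects. *)
let visit_variant (va : variant_access) : (shape, string) result =
  match va.tag () with
  | "Point" -> Result.map (fun () -> Point) (va.unit_variant ())
  | "Circle" -> (
      match va.tuple_variant ~len:1 with
      | Ok [ Int r ] -> Ok (Circle r)
      | Ok _ -> Error "Circle: expected one int"
      | Error e -> Error e)
  | other -> Error ("unknown constructor: " ^ other)

(* A trivial access over an already-parsed (tag, args) pair. *)
let access_of (tag, args) =
  {
    tag = (fun () -> tag);
    unit_variant =
      (fun () -> if args = [] then Ok () else Error "expected no args");
    tuple_variant =
      (fun ~len ->
        if List.length args = len then Ok args else Error "wrong arity");
  }
```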

Operation wise this can be extremely efficient (as shown by #rust #serde, but I’m yet to benchmark this on #ocaml and my implementation is very naive anyway) since we only really do the minimum amount of work we need.

And yes this means that hand writing a Visitor is quite a bit of work.

Fortunately we can derive them from your types! Viva la #metaprogramming!

Anyway, I’m getting close to having a manual visitor running for a subset of S-expressions (variants).

What I’m afraid of is that there is no clear uniform interface for futures or readers (buffered or otherwise, sync or async) in the ecosystem, which makes it impossible to reuse parsers in other libraries (yojson, sexplib, tyxml, etc) out of the box.

Some say this is a feature. But in this case it looks like I’m going to have to borrow a lot of code to make these #parsers fit into #serde.ml

okay, #opam sucks terribly -- i am constantly accidentally installing wrong versions of things. You wanna install a new tool and it downgrades a bunch of stuff. Not cool!

BUT #dune test --watch is so far an excellent testing experience for compiled languages.

makes #ocaml feel light and dynamic and i love it

fml the variant access visitors don't have enough information to know _which_ variant you want

and when they go through the sequence deserialization, the sequence visitor also has no clue what you want

i have got to be missing something there

nevermind, got it!

Problem was that I was recursing over the same Visitor, whereas I really need to generate custom visitors per variant, so I can just tell the framework: "please visit this tuple variant using Field_Tuple1_visitor" instead.

Then that visitor knows exactly how to deserialize that specific variant.

```ocaml
| Tuple1 of string
```

This is a PITA to write manually, but we'll codegen all this stuff.
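a hypothetical sketch of that fix: one dedicated visitor per constructor payload, and a variant visitor that only dispatches on the tag. `Unit0`, `SEQ_VISITOR` and the string-list payload are made up to keep it small; only `Tuple1` and the `Field_Tuple1_visitor` name come from the thread.

```ocaml
(* Hypothetical: Unit0 and the string-list payload are made up. *)
type t = Unit0 | Tuple1 of string

module type SEQ_VISITOR = sig
  type value
  val visit_seq : string list -> (value, string) result
end

(* Knows exactly one thing: how to build the payload of [Tuple1]. *)
module Field_Tuple1_visitor : SEQ_VISITOR with type value = t = struct
  type value = t
  let visit_seq = function
    | [ s ] -> Ok (Tuple1 s)
    | items ->
        Error
          (Printf.sprintf "Tuple1: expected 1 element, got %d" (List.length items))
end

(* The variant visitor only dispatches on the tag. *)
let visit_variant (tag : string) (payload : string list) : (t, string) result =
  match tag with
  | "Unit0" -> if payload = [] then Ok Unit0 else Error "Unit0: unexpected payload"
  | "Tuple1" -> Field_Tuple1_visitor.visit_seq payload
  | other -> Error ("unknown tag: " ^ other)
```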

alright we're on! 🚀 hand-rolled deserializer can read empty variants and variants with tuples.

Just need to figure out how to get Sexplib to always print quotes around strings...because why wouldn't you. I'm sure there's a good reason to save those 2 bytes, but it is definitely getting in my way right now.

Next up: variant records!

aaand #serde variant record support is here too #ocaml can be so nice 🐪

a little taste of transcoding too, since we can read from X and use serde to go into Y, y'all automatically get:

sexpr -> json
sexpr -> xml
json -> xml
json -> sexpr
etc

the more formats we write, the more we can do!
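the transcoding idea in miniature, as a hypothetical sketch: a reader for format X and a writer for format Y only need to agree on a shared data model. Here a tiny s-expression parser (ints and lists only) feeds a JSON writer; `data`, `parse_sexpr` and `to_json` are toys, not serde.ml's real modules.

```ocaml
(* Toy shared data model; hypothetical, not serde.ml's. *)
type data = Int of int | Group of data list

(* Format X reader: a tiny s-expression parser for ints and lists. *)
let parse_sexpr (s : string) : data =
  let pos = ref 0 in
  let peek () = if !pos < String.length s then Some s.[!pos] else None in
  let rec skip_ws () =
    match peek () with
    | Some ' ' -> incr pos; skip_ws ()
    | _ -> ()
  in
  let rec parse () =
    skip_ws ();
    match peek () with
    | Some '(' ->
        incr pos;
        let rec items acc =
          skip_ws ();
          match peek () with
          | Some ')' -> incr pos; Group (List.rev acc)
          | None -> failwith "unclosed list"
          | Some _ -> items (parse () :: acc)
        in
        items []
    | Some ('0' .. '9' | '-') ->
        let start = !pos in
        incr pos;
        while (match peek () with Some ('0' .. '9') -> true | _ -> false) do
          incr pos
        done;
        Int (int_of_string (String.sub s start (!pos - start)))
    | _ -> failwith "unexpected input"
  in
  parse ()

(* Format Y writer: JSON arrays and numbers. *)
let rec to_json = function
  | Int i -> string_of_int i
  | Group xs -> "[" ^ String.concat "," (List.map to_json xs) ^ "]"
```

so `to_json (parse_sexpr "(1 (2 -3) ())")` gives `[1,[2,-3],[]]`; adding a third format only means one more reader or writer against the same `data`.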

okay now lets try to do some codegen

ast_builder we meet again

slowly getting there! using #ocaml #ppx can be tedious, but at least you know that the ASTs you build aren't complete gibberish.

Here we are generating some helper values (like a string version of the type name and the variants) and our first visitor for tags.

oh just thought of a great use case for serde.ml -- serde-js!

And say goodbye to hoping your `external` is type-safe 😹

(yes i know of decco and bs-json)

aaaand we've got the first derived #serde #deserializer! 🤩

Viva la #metaprogramming!

#ocaml #rust #ppx

alright, enough for tonight. here's the repo with an ok readme: https://github.com/ostera/serde.ml

let me know what you think folks! thoughts, reviews, contributions welcome.

Would be fun to see more data formats (avro, cbor, yaml, toml, messagepack, postcard, etc), and also would love to get a small benchmarking thing going.

added support for #records and abstract types (aliases, tuples), and now working on applied types and generics:

```ocaml
type 'x pair = 'x * 'x
type v2f = float pair
[@@deriving deserializer]
```

#ocaml #serde is coming along!

One issue with this approach is that we don't have type-driven resolution of methods (a-la #rustlang traits, #haskell typeclasses, or modular implicits), so automatic generics would require some user input: what serializer should we use for these types?

It is still type-safe, since we'll just fail to deserialize values if you give us the wrong deserializer, but it isn't nearly as ✨magical✨
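what that user input looks like, as a sketch with a made-up `data` type and `'a de` alias (not serde.ml's API): the deserializer for `'x pair` takes the deserializer for `'x` as an explicit argument, which is the job traits, typeclasses, or modular implicits would otherwise do for you.

```ocaml
(* Hypothetical data model and deserializer type, for illustration. *)
type data = Float of float | Str of string | Tuple of data list
type 'a de = data -> ('a, string) result

type 'x pair = 'x * 'x
type v2f = float pair

let deserialize_float : float de = function
  | Float f -> Ok f
  | _ -> Error "expected float"

(* Generic over 'x, so the caller must hand us a deserializer for 'x. *)
let deserialize_pair (de_x : 'x de) : 'x pair de = function
  | Tuple [ a; b ] -> (
      match (de_x a, de_x b) with
      | Ok a, Ok b -> Ok (a, b)
      | Error e, _ | _, Error e -> Error e)
  | _ -> Error "expected a 2-element tuple"

(* The user picks the "instance" for the type argument by hand. *)
let deserialize_v2f : v2f de = deserialize_pair deserialize_float
```

pass the wrong deserializer and you get an `Error` at runtime rather than a bogus value, which is the "still type-safe, just not magical" trade-off.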

modular implicits pls land 😭

okay, putting a pause here. will instead work on the #serde #json deserialization, let's see if we can reuse the #yojson streaming parser!

so I dropped the first-class module approach for the Reader interface and went with a GADT.

this looks alright, and not that much #typelevel magic going on, just an #existentialtype.
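a sketch of the GADT approach: the constructor packs some state of type `'s` together with functions over it, and `'s` is existential, so different sources share one `reader` type. `reader`, `of_string`, `of_list` and `read_all` are hypothetical names, not serde.ml's actual code.

```ocaml
(* A Reader whose state type is hidden behind a GADT (an existential). *)
type reader =
  | Reader : {
      state : 's;
      peek : 's -> char option;
      next : 's -> char option;
    }
      -> reader

(* Backed by a string and a cursor. *)
let of_string (src : string) : reader =
  Reader
    {
      state = (src, ref 0);
      peek = (fun (s, pos) -> if !pos < String.length s then Some s.[!pos] else None);
      next =
        (fun (s, pos) ->
          if !pos < String.length s then (
            let c = s.[!pos] in
            incr pos;
            Some c)
          else None);
    }

(* Backed by a mutable list of chars: a different state type, same [reader]. *)
let of_list (cs : char list) : reader =
  Reader
    {
      state = ref cs;
      peek = (fun st -> match !st with c :: _ -> Some c | [] -> None);
      next =
        (fun st ->
          match !st with
          | c :: rest ->
              st := rest;
              Some c
          | [] -> None);
    }

(* Consumers never see the state type; they only call the packed functions. *)
let read_all (rd : reader) : string =
  match rd with
  | Reader r ->
      let buf = Buffer.create 16 in
      let rec loop () =
        match r.next r.state with
        | Some c ->
            Buffer.add_char buf c;
            loop ()
        | None -> Buffer.contents buf
      in
      loop ()
```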

this actually started with @c_cube recommending I reuse the Yojson #json #parser, which uses a generated lexer, so it keeps its own sort of state and couldn't be reused with an interface like this.

And then I just threaded a custom state throughout a Format Deserializer, which means anyone can pick whatever way of reading content they want.

Which yes means #yojson gets its own lexer_state type threaded.

Now if only Yojson gave me a clear peek/drop/next API, that'd be dope.

another small improvement -- we've got type-aliased module types, so your signatures now can look like normal types:

```ocaml
type state.
state Deserializer.t ->
state Visitor.t ->
result
```

Instead of this MONSTROSITY:

```ocaml
type value.
state ->
(module Deserializer with type state = state) ->
(module Visitor.Intf with type value = value) ->
result
```

(don't @ me)
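how the aliasing trick works, as a minimal sketch with a hypothetical `DESERIALIZER` signature and `List_de` implementation (not serde.ml's): alias the first-class module type once, and functions can take `'state deserializer` instead of spelling out the whole `(module ... with type ...)` package.

```ocaml
(* Hypothetical signature; the point is the alias, not the contents. *)
module type DESERIALIZER = sig
  type state
  val read_int : state -> (int, string) result
end

(* Alias the first-class module type so it reads like a normal type. *)
type 'state deserializer = (module DESERIALIZER with type state = 'state)

let read_two_ints (type s) (de : s deserializer) (st : s) : (int * int, string) result =
  let module D = (val de : DESERIALIZER with type state = s) in
  match D.read_int st with
  | Error e -> Error e
  | Ok a -> (
      match D.read_int st with
      | Error e -> Error e
      | Ok b -> Ok (a, b))

(* A toy implementation over a mutable int list. *)
module List_de = struct
  type state = int list ref
  let read_int st =
    match !st with
    | x :: rest ->
        st := rest;
        Ok x
    | [] -> Error "eof"
end
```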

this is what a derived map visitor (for constructing records and arbitrarily sized map-like collections) will look like.

This allows us to parse JSON objects into records by looking _exactly_ at the keys we care about, and reading those _directly_ into the final values rather than building up large generic JSON data that we shove into types later.
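a hand-written sketch of what a map visitor for a record boils down to: walk the key/value entries once, keep the fields we know, skip the rest, and build the record at the end. `user`, `value` and `visit_map` are hypothetical, not the code serde.ml derives.

```ocaml
(* Hypothetical record and map visitor; not serde.ml's generated code. *)
type user = { name : string; age : int }
type value = Str of string | Int of int

(* Visit (key, value) entries one at a time, keeping only known fields.
   Unknown keys are skipped instead of being built into a generic tree. *)
let visit_map (entries : (string * value) Seq.t) : (user, string) result =
  let name = ref None and age = ref None in
  let rec loop entries =
    match entries () with
    | Seq.Nil -> Ok ()
    | Seq.Cons ((key, v), rest) -> (
        match (key, v) with
        | "name", Str s -> name := Some s; loop rest
        | "age", Int i -> age := Some i; loop rest
        | ("name" | "age"), _ -> Error ("wrong type for field " ^ key)
        | _ -> loop rest)
  in
  match loop entries with
  | Error e -> Error e
  | Ok () -> (
      match (!name, !age) with
      | Some name, Some age -> Ok { name; age }
      | None, _ -> Error "missing field name"
      | _, None -> Error "missing field age")
```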

Having multiple visitors derived automatically also means that we can read data in the same format (eg. JSON), but different representation.

For example, by default I was working with _sequences_ since I already had the Sequence_access module working. This meant that Records could at first only be constructed from a _packed_ representation that doesn't have any keys (eg. just an array of values).
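the packed flavor of the same idea, as a hypothetical sketch (`user`, `value` and `visit_seq` are made up): in a keyless sequence, position decides the field, so the visitor just pattern-matches on order.

```ocaml
(* Same hypothetical record, built from a packed, keyless sequence
   where position decides the field: [name; age]. *)
type user = { name : string; age : int }
type value = Str of string | Int of int

let visit_seq (items : value list) : (user, string) result =
  match items with
  | [ Str name; Int age ] -> Ok { name; age }
  | _ -> Error "expected [name; age]"
```

for this toy record, `{"name":"ada","age":30}` is 23 bytes of JSON while the packed `["ada",30]` is 10; the catch is that both sides must agree on field order.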

This can save tons of space over the wire and make your app more #responsive and boost #performance!

aaand we have derived map visitors for #ocaml #serde, so we're one step closer to deserializing json

yes this is the expanded code from a single `[@@deriving deserializer]` call, but let me do variants now.

variants are good.

aaand the derived deserializer for #ocaml records works.

viva la #metaprogramming! 👏

OK! now we call #contributors -- if you're into #ocaml (or #rust) and would like to help out working on #serde, feel free to reach out!

Fix shit, break it, open a PR, hit me up here or on the OCaml Discord. Happy to chat and help 🙏

This has been super fun, and I'd love to keep working on it, but I also know myself and I'm more likely to do this if there's other people that want to use it.

If you liked this thread, boost it! ♻️

Happy #ocaml coding! 🐫

https://github.com/ostera/serde.ml
