Now on its own it isn't very interesting, it's just a small AST for data structures.
The magic is that this is really an ✨intermediate representation✨
A shared interface really, between the ser and de parts of serde.
And on the other side we have modules that consume this IR and spit out other formats like JSON, s-expressions, TOML, KDL, XML, what have you.
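To make the IR idea concrete, here is a minimal sketch of what such a small AST and one consumer of it could look like. The constructor names, the `shape` type and the `to_sexpr` printer are all made up for illustration; serde.ml's real definitions may look quite different.

```ocaml
(* Hypothetical IR: constructor names are made up for this sketch and are
   not serde.ml's actual definitions. *)
type data =
  | Int of int
  | String of string
  | Tuple of data list
  | Record of (string * data) list
  | Unit_variant of string
  | Tuple_variant of string * data list
  | Record_variant of string * (string * data) list

(* An ordinary OCaml type and how its values are expressed in the IR. *)
type shape = Empty | Circle of int | Rect of { w : int; h : int }

let shape_to_data = function
  | Empty -> Unit_variant "Empty"
  | Circle r -> Tuple_variant ("Circle", [ Int r ])
  | Rect { w; h } -> Record_variant ("Rect", [ ("w", Int w); ("h", Int h) ])

(* One consumer of the IR: an S-expression printer. *)
let rec to_sexpr = function
  | Int i -> string_of_int i
  | String s -> Printf.sprintf "%S" s
  | Tuple parts -> list_ (List.map to_sexpr parts)
  | Record fields -> list_ (List.map field fields)
  | Unit_variant name -> name
  | Tuple_variant (name, args) -> list_ (name :: List.map to_sexpr args)
  | Record_variant (name, fields) -> list_ (name :: List.map field fields)

and field (key, value) = list_ [ key; to_sexpr value ]
and list_ items = "(" ^ String.concat " " items ^ ")"
```

The point of the split is that `shape_to_data` knows nothing about S-expressions and `to_sexpr` knows nothing about `shape`.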
I haven't sketched any of those out enough yet to show a screenshot.
yeah, no. these meta-quotation libraries are unusable to me. I'll need tons more documentation and examples on how to do common things. I'm getting too spoiled by the amazing efforts in Rust to keep libraries well documented.
Going back to ast_builder.
so i got a deriver for serializers into the serde.ml language somewhat working, and this is already making it so much easier to test stuff -- viva la #metaprogramming!
fork yes! it's valid #ocaml now. i think.
it also derives:
* newtypes
* tuples
* variants (unit, tuple, and record ones)
and knows how to follow modules/etc
ok, will add support for records next
friggin' brew. we're back on track. I wrote a sexpr serializer to iron out the serializer implementator interface, and it feels good!
(minus all the explicit first-class module signatures type escaping)
the important bit is that this is what you normally write as a user:
type t = { ... } [@@deriving serializer]
and then you can call
t |> serialize_t |> Serde_sexpr.to_string
wanna go to json? just change to Serde_json.to_string
maybe someone will finally implement a deriving(debug) with this!
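A rough sketch of that pipeline, with everything hand-written. `Sexpr_format` and `Json_format` are stand-ins for Serde_sexpr and Serde_json, the `data` type is a cut-down hypothetical IR, and `serialize_t` is what I'd expect the deriver to generate, not its actual output.

```ocaml
(* Hypothetical, simplified IR; serde.ml's real definitions may differ. *)
type data = Int of int | String of string | Record of (string * data) list

(* What a format needs to offer for the pipeline to work. *)
module type Format = sig
  val to_string : data -> string
end

module Sexpr_format : Format = struct
  let rec to_string = function
    | Int i -> string_of_int i
    | String s -> Printf.sprintf "%S" s
    | Record fields ->
        let field (k, v) = Printf.sprintf "(%s %s)" k (to_string v) in
        "(" ^ String.concat " " (List.map field fields) ^ ")"
end

module Json_format : Format = struct
  let rec to_string = function
    | Int i -> string_of_int i
    (* OCaml-style escaping; close enough to JSON for plain ASCII strings. *)
    | String s -> Printf.sprintf "%S" s
    | Record fields ->
        let field (k, v) = Printf.sprintf "%S:%s" k (to_string v) in
        "{" ^ String.concat "," (List.map field fields) ^ "}"
end

(* The user's type, plus a hand-written stand-in for the derived serializer. *)
type t = { name : string; age : int }

let serialize_t (v : t) : data =
  Record [ ("name", String v.name); ("age", Int v.age) ]

(* Same value, same serializer, two formats: only the last step changes. *)
let as_sexpr = { name = "ada"; age = 36 } |> serialize_t |> Sexpr_format.to_string
let as_json = { name = "ada"; age = 36 } |> serialize_t |> Json_format.to_string
```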
but for authors of serializers, there's a few things that always get in the way:
1. error handling (esp. over collections)
2. mapping over the serde.ml data definition language
3. parametrizing over the kind of output you want to have (usually a string, but could be a buffer/channel that goes straight to stdout/disk/network)
all of those are provided by the Ser.Mapper interface, which is created at runtime based on the definition of the serializer you are using.
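This is not the real Ser.Mapper (see the link for that), just a small sketch of two of the three pain points: short-circuiting errors over a collection, and writing to "some output" without caring whether it is a buffer or a channel. All names here (`map_result`, `Output`, `Int_list_writer`) are hypothetical.

```ocaml
(* Hypothetical sketch, not the real Ser.Mapper interface. *)

(* Error handling over collections: stop at the first failure. *)
let rec map_result (f : 'a -> ('b, 'e) result) (xs : 'a list) :
    ('b list, 'e) result =
  match xs with
  | [] -> Ok []
  | x :: rest -> (
      match f x with
      | Error e -> Error e
      | Ok y -> (
          match map_result f rest with
          | Error e -> Error e
          | Ok ys -> Ok (y :: ys)))

(* Parametrizing over the output: a serializer only ever sees [write]. *)
module type Output = sig
  type t
  val write : t -> string -> unit
end

module Buffer_output : Output with type t = Buffer.t = struct
  type t = Buffer.t
  let write = Buffer.add_string
end

module Channel_output : Output with type t = out_channel = struct
  type t = out_channel
  let write = output_string
end

(* A toy serializer for int lists that rejects negatives, for any output. *)
module Int_list_writer (O : Output) = struct
  let write_all (out : O.t) (xs : int list) : (unit, string) result =
    let check x =
      if x < 0 then Error (Printf.sprintf "negative value: %d" x)
      else Ok (string_of_int x)
    in
    match map_result check xs with
    | Error e -> Error e
    | Ok parts -> Ok (O.write out ("(" ^ String.concat " " parts ^ ")"))
end

module To_buffer = Int_list_writer (Buffer_output)
```

Nothing is written until every element has been checked, so a failed serialization leaves the output untouched.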
See this cursed module: https://github.com/ostera/serde.ml/blob/main/serde/ser.ml#L163-L185
uuuuugh yojson why don't you have a clean clear exported ast WHY
folks, we need more _type-only_ packages. like we have for http type defs.
in any case, here we go: one line change and we got serialization to json.
sweet.
the beauty of having an abstract language for describing OCaml data, is that concrete data formats become fully pluggable.
One serializer, Many formats.
Just from yaml, to json, to protobuf in one line.
Ok, time for food. When I'm back we'll go for the Deserialize path which will be harder methinks.
Let me know what other data formats you'd like to see serialized, and go check out the code here: https://github.com/ostera/serde.ml
Okay back from food and a @anmonteiro retweeted so this is how I know I need to get back to it.
What does it take to deserialize? Let’s see how #rust #serde does it.
in short, we need:
1. a deserializer, that turns bytes into serde.ml data
2. a visitor/fold, to turn serde.ml data into our custom type
3. a deriver, to generate visitors from types
so i think i'll start by making an interface for deserializers.
the idea here is that you tell serde how to deserialize these specific things from your data format, and then a visitor drives the work, only really deserializing what's needed
actually screw this, we'll start by manually expanding what a deserializer would look like.
this will give me a much better idea of what the visitor/reader/deser interface needs to be.
oof, long hours #ocaml-ing here...I think I've got another ~2 hours in me before I get sleepy again. Thank you accidental nap!
The deserializer and visitor interfaces are coming along nicely! 👌🏼✨
okay making some coffee then we're back. last night we got here:
* create a visitor for a small variant
* create a visitor for tags
* thread these into `Serde_sexpr` transparently
still a ton to figure out, but at least the structure is starting to fall into place!
now i'll brew the coffee and thread this Variant_visitor into an implementation of the Variant_access_intf so that we actually use it.
code here: https://github.com/ostera/serde.ml
brb
the sheer amount of #TypeLevel fuckery I have to do to get these modules to play along nicely and errors to surface tidily is worrying.
makes me really miss #rust traits for
In short. I’ve got a Reader interface, a Format Deserializer, and a Visitor interface.
Together they create a Deserializer for a specific format and specific datatype over a specific source of data.
The Reader offers a basic low level next/peek/skip API over some data. So you can build a Reader for Bytes, String, In_channel, or whatever data you chunk bytes out of. Nom nom nom.
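A minimal sketch of such a Reader over a String. The operation names next/peek/skip are from above; the exact types and the `String_reader` module are my assumptions here, not the library's signatures.

```ocaml
(* Hypothetical signature: the operation names come from the text,
   the exact types are assumptions. *)
module type Reader = sig
  type t
  val peek : t -> char option (* look at the next char without consuming it *)
  val next : t -> char option (* consume and return the next char *)
  val skip : t -> unit (* consume the next char and ignore it *)
end

module String_reader : sig
  include Reader
  val of_string : string -> t
end = struct
  type t = { src : string; mutable pos : int }

  let of_string src = { src; pos = 0 }

  let peek r = if r.pos < String.length r.src then Some r.src.[r.pos] else None

  let next r =
    let c = peek r in
    (* Only advance when there was something to read. *)
    if c <> None then r.pos <- r.pos + 1;
    c

  let skip r = ignore (next r)
end
```

A Reader for Bytes or In_channel would implement the same three operations over a different `t`.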
When you build a Format Deserializer (like something to parse JSON), you need a Reader. But you don’t want to read everything! That would be too inefficient. You want to only read the things you need to build up a specific datatype.
That’s where the Visitor comes in. You pass one to the Format Deserializer.
The Visitor itself decides what to do with specific pieces of data. Like what happens if we find an Int, or a Tuple, or a Variant Record.
This is specifically built for your datatype, so it knows about the fields your record needs, and the different constructors of your variant, so it can guide the deserializer forward.
It does this by receiving, in specific cases, tiny modules that offer further deserializing options. Like Variant_access (yes we do #JiraffeCase in #ocaml).
So when your visitor's “visit_variant” function is called, you can peek at the variant tag name with Variant_access.tag, and then depending on what kind of variant you know you have, you can call:
Variant_access.unit_variant
Variant_access.tuple_variant ~len
Variant_access.record_variant ~fields
Each of these takes different arguments, and then drives the deserializer to read exactly what you’re expecting, and nothing more.
Operation-wise this can be extremely efficient (as shown by #rust #serde, but I have yet to benchmark this on #ocaml and my implementation is very naive anyway) since we only really do the minimum amount of work we need.
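Here is a toy version of that handshake. The access is modelled as a record of callbacks instead of a module, the input is a pre-tokenized list of strings instead of a real format, and `tag` consumes the tag rather than peeking at it. All of that is simplification for the sketch, not how serde.ml does it.

```ocaml
(* The type we want to deserialize into. *)
type t = Unit | Pair of int * string | Point of { x : int; y : int }

(* Hypothetical stand-in for Variant_access: callbacks that the format
   deserializer hands to the visitor. *)
type variant_access = {
  tag : unit -> string;
  unit_variant : unit -> unit;
  tuple_variant : len:int -> string list;
  record_variant : fields:string list -> (string * string) list;
}

(* A toy deserializer over pre-tokenized input. It only pulls tokens when
   the visitor asks for them. *)
let access_of_tokens (tokens : string list) : variant_access =
  let rest = ref tokens in
  let take () =
    match !rest with
    | [] -> failwith "unexpected end of input"
    | tok :: tl ->
        rest := tl;
        tok
  in
  let rec take_n n =
    if n = 0 then []
    else
      let x = take () in
      x :: take_n (n - 1)
  in
  let rec take_pairs fields n =
    if n = 0 then []
    else
      let key = take () in
      if not (List.mem key fields) then failwith ("unknown field " ^ key);
      let value = take () in
      (key, value) :: take_pairs fields (n - 1)
  in
  {
    tag = take;
    unit_variant = (fun () -> ());
    tuple_variant = (fun ~len -> take_n len);
    record_variant = (fun ~fields -> take_pairs fields (List.length fields));
  }

(* The visitor knows the shape of [t], so it tells the deserializer
   exactly how much to read for each constructor. *)
let visit_variant (access : variant_access) : t =
  match access.tag () with
  | "Unit" ->
      access.unit_variant ();
      Unit
  | "Pair" -> (
      match access.tuple_variant ~len:2 with
      | [ a; b ] -> Pair (int_of_string a, b)
      | _ -> failwith "Pair expects 2 elements")
  | "Point" ->
      let kv = access.record_variant ~fields:[ "x"; "y" ] in
      Point
        { x = int_of_string (List.assoc "x" kv);
          y = int_of_string (List.assoc "y" kv) }
  | other -> failwith ("unknown variant " ^ other)
```

So `visit_variant (access_of_tokens [ "Pair"; "1"; "hi" ])` reads the tag and exactly two more tokens.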
And yes this means that hand writing a Visitor is quite a bit of work.
Fortunately we can derive them from your types! Viva la #metaprogramming!
Anyway, I’m getting close to having a manual visitor running for a subset of S-expressions (variants).
What I’m afraid of is that there is no clear uniform interface for futures or readers (buffered or otherwise, sync or async) in the ecosystem, which makes it impossible to reuse parsers in other libraries (yojson, sexplib, tyxml, etc) out of the box.
Some say this is a feature. But in this case it looks like I’m going to have to borrow a lot of code to make these #parsers fit into #serde.ml
okay, #opam sucks terribly -- i am constantly accidentally installing wrong versions of things. You wanna install a new tool and it downgrades a bunch of stuff. Not cool!
BUT #dune test --watch is so far an excellent testing experience for compiled languages.
makes #ocaml feel light and dynamic and i love it
fml the variant access visitors don't have enough information to know _which_ variant you want
and when they go through the sequence deserialization, the sequence visitor also has no clue what you want
i have got to be missing something there
nevermind, got it!
Problem was that I was recursing over the same Visitor, whereas I really need to generate custom visitors per variant, so I can just tell the framework: "please visit this tuple variant using Field_Tuple1_visitor" instead.
Then that visitor knows exactly how to deserialize that specific variant.
```ocaml
| Tuple1 of string
```
This is a PITA to write manually, but we'll codegen all this stuff.
alright we're on! 🚀 hand-rolled deserializer can read empty variants and variants with tuples.
Just need to figure out how to get Sexplib to always print quotes around strings...because why wouldn't you. I'm sure there's a good reason to save those 2 bytes, but it is definitely getting in my way right now.
Next up: variant records!
aaand #serde variant record support is here too #ocaml can be so nice 🐪
a little taste of transcoding too, since we can read from X and use serde to go into Y, y'all automatically get:
sexpr -> json
sexpr -> xml
json -> xml
json -> sexpr
etc
the more formats we write, the more we can do!
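A very rough sketch of the sexpr -> json direction. Note the shortcut: this goes through a tiny generic tree (`Atom`/`Seq`, both hypothetical) rather than driving a serializer straight from a visitor, and the parser only understands bare atoms and parens.

```ocaml
(* Hypothetical generic IR, much smaller than serde.ml's. *)
type data = Atom of string | Seq of data list

(* Reader side: a tiny S-expression parser (bare atoms and parens only). *)
let parse_sexpr (src : string) : data =
  let n = String.length src in
  let pos = ref 0 in
  let rec skip_ws () =
    if !pos < n && src.[!pos] = ' ' then (
      incr pos;
      skip_ws ())
  in
  let rec value () =
    skip_ws ();
    if !pos >= n then failwith "unexpected end of input"
    else if src.[!pos] = '(' then (
      incr pos;
      Seq (items ()))
    else atom ()
  and items () =
    skip_ws ();
    if !pos >= n then failwith "missing closing paren"
    else if src.[!pos] = ')' then (
      incr pos;
      [])
    else
      let first = value () in
      first :: items ()
  and atom () =
    let start = !pos in
    while !pos < n && not (List.mem src.[!pos] [ ' '; '('; ')' ]) do
      incr pos
    done;
    Atom (String.sub src start (!pos - start))
  in
  value ()

(* Writer side: atoms that look like ints become numbers, the rest strings. *)
let rec to_json = function
  | Atom a -> (
      match int_of_string_opt a with
      | Some i -> string_of_int i
      | None -> Printf.sprintf "%S" a)
  | Seq items -> "[" ^ String.concat "," (List.map to_json items) ^ "]"

let transcode (src : string) : string = src |> parse_sexpr |> to_json
```

Every new reader or writer plugs into the same middle, which is why the number of conversions grows faster than the number of formats.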
okay now let's try to do some codegen
oh just thought of a great use case for serde.ml -- serde-js!
And say goodbye to hoping your `external` is type-safe 😹
(yes i know of decco and bs-json)
aaaand we've got the first derived #serde #deserializer! 🤩
Viva la #metaprogramming!
alright, enough for tonight. here's the repo with an ok readme: https://github.com/ostera/serde.ml
let me know what you think folks! thoughts, reviews, contributions welcome.
Would be fun to see more data formats (avro, cbor, yaml, toml, messagepack, postcard, etc), and also would love to get a small benchmarking thing going.
One issue with this approach is that we don't have type-driven resolution of methods (a-la #rustlang traits, #haskell typeclasses, or modular implicits), so automatic generics would require some user input: what serializer should we use for these types?
It is still type-safe, since we'll just fail to deserialize values if you give us the wrong deserializer, but it isn't nearly as ✨magical✨
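In practice "some user input" could look like this: every type parameter becomes an extra function argument. The names and the tiny `data` type below are hypothetical, just to show the shape of it on the serializer side.

```ocaml
(* Hypothetical IR, trimmed to what this example needs. *)
type data = Int of int | String of string | Seq of data list

let serialize_int (i : int) : data = Int i

(* A generic container cannot look up the element serializer from the type
   alone, so the caller passes it in. *)
let serialize_list (serialize_elt : 'a -> data) (xs : 'a list) : data =
  Seq (List.map serialize_elt xs)

type 'a tagged = { tag : string; value : 'a }

(* What a derived serializer for a parametrized type might look like:
   one extra argument per type parameter. *)
let serialize_tagged (serialize_a : 'a -> data) (t : 'a tagged) : data =
  Seq [ String t.tag; serialize_a t.value ]

(* With traits or modular implicits the two inner arguments would be
   resolved from the type [int list tagged]; here we spell them out. *)
let example =
  serialize_tagged (serialize_list serialize_int)
    { tag = "ids"; value = [ 1; 2 ] }
```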
modular implicits pls land 😭
so I dropped the first-class module approach for the Reader interface and went with a GADT.
this looks alright, and not that much #typelevel magic going on, just an #existentialtype.
this actually started with @c_cube recommending to reuse the Yojson #json #parser, which is using a generated lexer, so it keeps its own sort of state and wouldn't be possible to reuse with an interface like this.
And then I just threaded a custom state throughout a Format Deserializer, which means anyone can pick whatever way of reading content they want.
Which yes means #yojson gets its own lexer_state type threaded.
Now if only Yojson gave me a clear peek/drop/next API, that'd be dope.
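The GADT-with-an-existential trick, in miniature. This is my own simplified sketch (the `reader` type and its fields are hypothetical), but it shows how each reader can carry whatever state it likes while callers stay oblivious to it.

```ocaml
(* Hypothetical sketch: a reader packs its own state type together with the
   operations on it, so callers never need to name that type. *)
type reader =
  | Reader : {
      state : 'state;
      peek : 'state -> char option;
      next : 'state -> char option;
    }
      -> reader

(* State is an index into the string. *)
let of_string (src : string) : reader =
  let peek st = if !st < String.length src then Some src.[!st] else None in
  let next st =
    let c = peek st in
    if c <> None then incr st;
    c
  in
  Reader { state = ref 0; peek; next }

(* State is the list of remaining chars: a completely different type. *)
let of_list (chars : char list) : reader =
  let peek st = match !st with [] -> None | c :: _ -> Some c in
  let next st =
    match !st with
    | [] -> None
    | c :: rest ->
        st := rest;
        Some c
  in
  Reader { state = ref chars; peek; next }

(* Works for any reader, whatever its hidden state is. *)
let read_all (Reader { state; next; _ }) : string =
  let buf = Buffer.create 16 in
  let rec loop () =
    match next state with
    | None -> Buffer.contents buf
    | Some c ->
        Buffer.add_char buf c;
        loop ()
  in
  loop ()
```

A lexer-backed reader would put its own lexer state in `state` the same way.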
another small improvement -- we've got type-aliased module types, so your signatures now can look like normal types:
type state.
state Deserializer.t ->
state Visitor.t ->
result
Instead of this MONSTROSITY:
type value.
state ->
(module Deserializer with type state = state) ->
(module Visitor.Intf with type value = value) ->
result
(don't @ me)
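The alias trick itself is small. A sketch with made-up module contents (`next_int`, `visit_int`, `List_de`, `To_string` are all hypothetical), where the `'a t` aliases hide the `(module ... with type ...)` plumbing in signatures:

```ocaml
module Deserializer = struct
  module type Intf = sig
    type state
    val next_int : state -> int option
  end

  (* The alias: a first-class module that reads like a normal type. *)
  type 'state t = (module Intf with type state = 'state)
end

module Visitor = struct
  module type Intf = sig
    type value
    val visit_int : int -> value
  end

  type 'value t = (module Intf with type value = 'value)
end

(* The signature now reads like ordinary types. *)
let deserialize (type state value) (st : state) (de : state Deserializer.t)
    (visitor : value Visitor.t) : value option =
  let (module De) = de in
  let (module V) = visitor in
  Option.map V.visit_int (De.next_int st)

(* Toy implementations to exercise it. *)
module List_de = struct
  type state = int list ref

  let next_int st =
    match !st with
    | [] -> None
    | x :: rest ->
        st := rest;
        Some x
end

module To_string = struct
  type value = string
  let visit_int = string_of_int
end

let list_de : int list ref Deserializer.t = (module List_de)
let to_string : string Visitor.t = (module To_string)
let result = deserialize (ref [ 7 ]) list_de to_string
```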
this is what a derived map visitor (for constructing records and arbitrarily sized map-like collections) will look like.
This allows us to parse JSON objects into records by looking _exactly_ at the keys we care about, and reading those _directly_ into the final values rather than building up large generic JSON data that we shove into types later.
Having multiple visitors derived automatically also means that we can read data in the same format (eg. JSON), but different representation.
For example, by default I was working with _sequences_ since I already had the Sequence_access module working. This meant that records could first be constructed from a _packed_ representation that doesn't have any keys (eg. just an array of values).
This can save tons of space over the wire and make your app more #responsive and boost #performance!
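To illustrate the two representations for one record, here is a hand-written sketch. The `token` type stands in for whatever a real JSON reader would hand over, and `visit_map` / `visit_seq` are hypothetical names for what the deriver would generate.

```ocaml
type person = { name : string; age : int }

(* Hypothetical, pre-tokenized input standing in for a real JSON reader. *)
type token = Key of string | Str of string | Num of int

(* Keyed representation: look only at the keys we care about and read them
   straight into the slots of the final record. *)
let visit_map (tokens : token list) : (person, string) result =
  let name = ref None and age = ref None in
  let rec loop = function
    | [] -> Ok ()
    | Key "name" :: Str s :: rest ->
        name := Some s;
        loop rest
    | Key "age" :: Num n :: rest ->
        age := Some n;
        loop rest
    | Key ("name" | "age") :: _ -> Error "wrong type for known field"
    | Key _ :: _ :: rest -> loop rest (* unknown key: skip its value *)
    | _ -> Error "malformed input"
  in
  match loop tokens with
  | Error e -> Error e
  | Ok () -> (
      match (!name, !age) with
      | Some name, Some age -> Ok { name; age }
      | None, _ -> Error "missing field name"
      | _, None -> Error "missing field age")

(* Packed representation: no keys at all, position decides the field. *)
let visit_seq (tokens : token list) : (person, string) result =
  match tokens with
  | [ Str name; Num age ] -> Ok { name; age }
  | _ -> Error "expected [name, age]"
```

Both produce the same `person`; the packed form just ships fewer bytes and relies on field order.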
yes this is the expanded code from a single `[@@deriving deserializer]` call, but let me do variants now.
variants are good.
aaand the derived deserializer for #ocaml records works.
viva la #metaprogramming! 👏
@cemerick do lockfiles automatically make a switch workspace specific? 🤔
For ex. how would opam know to follow a lockfile when I just go `opam install ocamlformat` in a new shell/other dir?
@leostera (sorry, missed this reply)
but no, lockfiles are all about _removing_ workspace-specific difficulties (the particular state of your current switch and installed packages are why doing a new one-off install is triggering undesirable downgrades, etc)
opam will always use a lockfile if present as long as you include `--locked` in any relevant commands (`install`, `switch create`, etc). IIRC, the plan is to make `--locked` the default in some future release