Designing like a senior: universal binary serialization

Hello, my name is Dmitry Karlovsky and I am... well, it doesn't matter who I am. What matters is what I say and how I argue it. Those who know me need no introduction. Those who don't have a great chance to approach the question with a clear mind. That is crucial if we want to design something really well rather than the usual way. So what is this VaryPack?

https://habr.com/ru/articles/975020/

#VaryPack #MsgPack #CBOR #BSON


JSON? JSONB? BSON? CBOR? MsgPack? Ah, VaryPack!

VaryPack is a new, simple, flexible, fast and compact binary serialization format for arbitrary data. What's all the buzz about?

https://habr.com/ru/articles/966270/

#VaryPack #MsgPack #CBOR #JSON #JSONB #BSON


I have finally released the VaryPack spec: a new, simple, flexible, fast and compact binary serialization format for arbitrary data. The TS library is $mol_vary in MAM and mol_vary on NPM. It's only 600...


I remember an article about a company that kept hitting AWS quotas. Their JSON payload fit within the limits on its own, but it was put into a string field inside another JSON object used for communication between servers. As a result, all quotes and backslashes were escaped as \" and \\, which increased the payload size.
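A minimal sketch of that effect, using only Python's standard `json` module. The payload, the `body` field name and the `nested_sizes` helper are made up for illustration; they are not taken from the article mentioned above.

```python
import json

def nested_sizes(payload, layers):
    """Return the encoded length after wrapping the payload in `layers` string fields."""
    text = json.dumps(payload)
    sizes = [len(text)]
    for _ in range(layers):
        # Embed the previous JSON text as a string value: every quote and
        # backslash in it has to be escaped again.
        text = json.dumps({"body": text})
        sizes.append(len(text))
    return sizes

# Hypothetical payload containing both quotes and backslashes.
payload = {"message": 'He said "hi"', "path": "C:\\temp\\file.txt"}
sizes = nested_sizes(payload, 2)
print(sizes)  # the length grows with every layer of nesting
```

Each layer adds more than the fixed cost of the wrapper object, because the escapes from the previous layer are themselves escaped.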

String encoding is also the reason why, for example, serde (a Rust de-/serialization library) can give you Cow<str> if you don't want extra allocations: when there are no character escapes, a reference to the original input can be passed, but when \"->" replacements are needed, that part of the input has to be copied anyway.
I'm not saying this is bad; that is how text formats work in general, not only JSON. But if you need to put arbitrary data into objects, think twice: a binary format like MessagePack, BSON or even a custom ProtoBuf schema will probably be much more efficient for your task.

Also, text formats are basically not suitable for streaming, and loading a big object into RAM is a very bad idea. If it's an array, you can separate the objects with newlines instead of using JSON's [ ]. In other cases, search for a SAX-like library (or something like "stream json") for your programming language.
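The newline-separated variant can be sketched like this in Python. The `iter_records` helper and the sample records are assumptions for illustration; in real use, the `StringIO` object would be replaced by an open file or socket.

```python
import io
import json

def iter_records(stream):
    """Yield one decoded object per non-empty line, without loading the whole stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Stand-in for a large file: three records, one per line, with a blank line in between.
stream = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
ids = [record["id"] for record in iter_records(stream)]
print(ids)  # [1, 2, 3]
```

Only one line is held in memory at a time, so the total size of the input no longer matters.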

I won't give a specific example here, but I'm sure some developers do this: encode a file with base64 to send it inside a JSON request. Please remember that base64 inflates the payload by roughly 1.33x [^1], so you should either send the file in a separate HTTP request or use the multipart form data type. Oh, or encode your objects with a binary format. The last two options are OK when you're working with small files and insist on doing everything in one request; otherwise upload the data in separate requests in parallel.

[^1] formula for base64 string length is:
4 * ceil(original_length / 3)
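A quick check of this formula against Python's standard `base64` module (padded output, no line breaks). The `b64_length` helper name is made up for this sketch.

```python
import base64

def b64_length(original_length):
    """Length of padded base64 output: 4 * ceil(original_length / 3), in integer math."""
    return 4 * ((original_length + 2) // 3)

# Compare the formula with the real encoder for a range of input sizes.
checked = all(
    len(base64.b64encode(bytes(n))) == b64_length(n)
    for n in range(0, 50)
)
print(checked)           # True
print(b64_length(1000))  # 1336, i.e. about 1.34x the original 1000 bytes
```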

Another example of "how definitely NOT to do it" is Piped (a privacy frontend for YouTube). On some API endpoints it provides a nextpage object containing session info used to request the next page of a channel, a playlist, search results or comments, and the problem is that it's a JSON object put inside a string, as explained above: "nextpage":"{\"url\":\"https…
Even funnier, there is a body field inside this nextpage object that contains another JSON object, encoded in base64, so there are 3 layers of text format encoding.
And when a client requests the next page, the object is sent in GET querystring parameters, so it gets urlencoded (percent-encoded), resulting in 4 layers!! Idk why browsers don't reject such long, ugly URLs.
Everything before the querystring is excusable if the internal YT API itself requires this format for a context/session object. Invidious doesn't care about context at all and sends a clean request, if I got it right.

And the most stupid JSON use case is JWT, I think. It encodes an already-plaintext format with base64 (which is intended for converting binary data to ASCII text; the same as in Piped, but we forgave that one). Moreover, it does this to 2 objects, and a token with such a big overhead is then stored in cookies.

By the way, want a JSON config in your software? Take a look at Hjson, which is much more convenient to write by hand.

#json #msgpack #bson
#web #performance #optimization
#advice

MessagePack: It's like JSON. but fast and small.

An interesting thing about #msgpack: its *type system* doesn't differentiate between unsigned and signed integers, but the *format* does, and I don't understand why.

The format makes space for 1-byte, 2-byte, 4-byte and 8-byte signed integers, *and* 1-byte, 2-byte, 4-byte and 8-byte unsigned integers, even though those options mostly overlap. To support the same range of integers, you'd just need the 4 signed integers and one 8-byte unsigned integer format. You could then add 5/6/7B integers!
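A rough sketch comparing the two schemes by encoded size, written in plain Python without a MessagePack library. It assumes an encoder that always picks the smallest format (the spec allows this but does not require it); `msgpack_int_format` follows the integer formats listed in the MessagePack spec, while `signed_only_size` is a hypothetical model of the reduced scheme proposed above (fixints, four signed formats, one 8-byte unsigned format).

```python
def msgpack_int_format(n):
    """Smallest MessagePack integer format for n, as (format name, total bytes)."""
    if 0 <= n <= 0x7F:
        return ("positive fixint", 1)
    if -32 <= n < 0:
        return ("negative fixint", 1)
    if n > 0:
        for bits in (8, 16, 32, 64):
            if n < (1 << bits):
                return (f"uint {bits}", 1 + bits // 8)
    else:
        for bits in (8, 16, 32, 64):
            if n >= -(1 << (bits - 1)):
                return (f"int {bits}", 1 + bits // 8)
    raise OverflowError("outside the MessagePack integer range")

def signed_only_size(n):
    """Bytes needed in the hypothetical reduced scheme described in the post."""
    if -32 <= n <= 0x7F:
        return 1
    for bits in (8, 16, 32, 64):
        if -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
            return 1 + bits // 8
    if 0 <= n < (1 << 64):
        return 9  # the single 8-byte unsigned format
    raise OverflowError("outside the MessagePack integer range")

for value in (100, 200, 40000, 3000000000, -200):
    print(value, msgpack_int_format(value), signed_only_size(value))
```

Both schemes cover the same range. They differ for values in the upper half of the 1-, 2- and 4-byte unsigned ranges (200, 40000 and 3000000000 here), which take one size class more without the unsigned formats.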

Reading the MessagePack spec: am I correct in assuming that a map can have any key type as well as any value type, and mix them in the same map? #Programming #msgpack

Emails over #msgpack and #zmq would certainly be something 😂

#zeromq

@naia #xml and #json are not "databases", they are just serialization formats, so you can save data from your object model to files, transfer it between applications and so on.
What do you essentially need? Saving data to disk to read it back later? Or do you need real database features, like indexes and referential integrity?
If you need to save data somewhere and read it back later, have a look at binary serialization methods: Protocol Buffers (#protobuf), MessagePack (#msgpack). They are extremely fast and compact compared to JSON and XML, but are not human-readable.
In case you need database features but do not want to deal with all the enterprise stuff like MS SQL and Oracle, take #Sqlite. It is a compact, file-based (serverless) relational database that can be embedded into your application. That's why it is popular in mobile applications. If you are looking for something more multiuser and multithreaded, #PostgreSQL should be a good choice. It is a cool database that can be used in projects of every size: a small pet project, an MVP to start your startup in a garage, or an enterprise application serving hundreds of RPS 24/7/365. It is a bit more complex than SQLite, but it is still easy to start with, has great docs and a big community.

Today I went off on one tangent after another and found so many new technologies that I don't know where to start

#nats #wasmcloud #wasm3 #msgpack #GrafanaTempo

Made a small amount of progress with my C# game engine. About to start implementing networking with #lidgren and #msgpack. Any thoughts?
#csharp #gamedev

Related: I just found out that the #Rust package for #Parquet does not support writing.

This means my idea of a simple #Rust application to convert our #MsgPack files to #Parquet will not be possible (for now).