Improved support in JsonSchema.Net.DataGeneration!

- better regex
- better conditionals
- error reporting

Read about it in my latest blog post!

https://blog.json-everything.net/posts/datagen-improvements/

#dotnet #jsonschema

Better Schema-Compatible Data Generation

JsonSchema.Net.DataGeneration has received some significant upgrades. In this post, I'll go over what's changed and how you can use this package to enhance your schema development workflow.

New and improved!

First, let's cover the small stuff. I added a bunch of tests that identified a few bugs, and added support for propertyNames. There is also new support for the allOf/if/then pattern Jason Desrosiers came up with to implement the OpenAPI discriminator keyword. (You can see this pattern in action in Jason's excellent post on the JSON Schema blog.)

Regex improvements

In previous versions, generation of strings that matched regular expressions was performed by the Fare library. While great, that library lacks some features that are important for this kind of generation. When building strings that satisfy JSON Schema requirements, different branches of a schema can impose different requirements on the same instance. This means that to get Fare to work right, the library had to create composite expressions, and those composite expressions often weren't supported by Fare. This led me to drop Fare and implement my own regular expression support that can handle the unique requirements I needed.

While I have been impressed with the latest state of AI coding, I still don't fully trust it. That said, I will admit that a large part of this new regular expression support was AI-generated. It is also heavily tested, though, so I'm confident that it works for this application. I'm not sure of its limits; if you find them, please open an issue.

The new implementation incorporates other keywords, like minLength, into the regular expression requirements, and even supports anti-requirements, like a pattern keyword inside a not keyword.

Error reporting

I think this is the coolest addition to the library. When data generation fails, it now tells you why!
The generation result's error message now describes the error that occurred, and there are new properties that tell you where the problem occurred:

- Location gives you where in the instance the generation failed.
- SchemaLocations gives you where in the schema the error occurred.

Generally, a failure to generate data is the result of either a conflict in the schema

{
  "allOf": [
    { "type": "string" },
    { "type": "number" }
  ]
}

or a feature that just isn't supported. The nice thing is that they're all reported now.

Why use data generation?

While there are likely many use cases for data generation, the most helpful application in my mind is testing your schemas. Being able to see what kinds of data your schemas allow enables you to find gaps that could let invalid data into your systems.

A very real failure mode

Say you're building a user registration endpoint. You write a JSON Schema for the request body, wire it up with JsonSchema.Api to support automatic request validation, and ship it. The schema looks like this:

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "age": { "type": "integer" }
  },
  "required": ["name", "email", "age"]
}

A client hits the endpoint and passes this:

{
  "name": "",
  "email": "x",
  "age": -5847,
  "password": "hunter2",
  "admin": true
}

Schema validation passes, and the request comes through into your controller. But that payload has an empty name, an invalid email, a nonsensical age, and extra properties that your endpoint never asked for. If any of that data gets trusted downstream, you now have a production issue caused by a "valid" request. The schema is doing what it was told. The problem is that it doesn't yet express what you meant.
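To make the gap concrete, here is a minimal, language-neutral sketch in Python (JsonSchema.Net is a .NET library; this hand-rolled check simply mirrors what the loose schema above asks for, nothing more):

```python
# Hand-rolled check that mirrors the loose registration schema above.
# It only verifies presence and primitive types, exactly as the schema does.

def validates_loose(payload):
    """Return True if payload satisfies the loose registration schema."""
    if not isinstance(payload, dict):
        return False
    # required: name, email, age
    for key in ("name", "email", "age"):
        if key not in payload:
            return False
    # property types; extra properties are allowed by default
    return (isinstance(payload["name"], str)
            and isinstance(payload["email"], str)
            and isinstance(payload["age"], int))

bad_request = {
    "name": "",
    "email": "x",
    "age": -5847,
    "password": "hunter2",
    "admin": True,
}

print(validates_loose(bad_request))  # True: the junk payload is "valid"
```

The check passes because the schema never said anything about empty strings, email formats, value ranges, or extra properties.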
So you go back and tighten things up:

{
  "type": "object",
  "properties": {
    "name": { "type": "string", "minLength": 1 },
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 0, "maximum": 150 }
  },
  "required": ["name", "email", "age"],
  "additionalProperties": false
}

Now that same request gets rejected immediately.

This is where generation helps. Instead of trying to invent every weird edge case yourself, you generate samples that are valid for your schema and inspect them. If the samples include data your API can't safely handle, the schema needs more constraints.

The new error reporting helps here, too. If you've created conflicting constraints (for example, in an allOf) and generation can't produce data, it tells you where and why it failed, helping you identify and resolve the problem.

Wrapping up

Most of these updates came from real use: writing schemas, finding edge cases, adding tests, and fixing what those tests exposed. If you're already using this package, updating should give you better output and much better diagnostics when something goes wrong. If you haven't used it yet, this release is a solid place to start.

If you aren't generating revenue but you like the work I put out and would still like to support the project, please consider becoming a sponsor!

json-everything

Hey #dotnet #jsonschema folks! Given that JsonSchema.Net.Generation now supports source generation, does anyone have a real use case for runtime reflective generation?

I'd love to hear your thoughts.

https://github.com/json-everything/json-everything/issues/1016

@dvh shows that he has converted all the schemas from the OpenAPI Specs of the APIs on apis.developer.overheid.nl into JSON Schemas. This gives a nice look into existing schemas from various organizations.

#oas #jsonschema

@dvh also shows that these JSON Schemas can be converted directly into models in any programming language. You can also use them for type checking in TypeScript.

#jsonschema

Our product owner @dvh is giving a presentation on a new product: the schema register. A registry of reusable JSON Schemas that can be used in OpenAPI Specs, among other things.

#jsonschema

Does it still make sense to maintain deterministic code generators? I mean for #OpenAPI. Since #JSONSchema translates so badly to classes in any programming language, generating with an #LLM might do a better job of guessing what the author of the OpenAPI document or the generator library had in mind.

Then again, it's a bit of a coin flip :/

#programming #python #Lapidary
I want to give a huge shout out to Juan Cruz Viotti and the #jsonschema tool he released as OSS, https://github.com/sourcemeta/jsonschema .
I recently updated our governance pipeline to use jsonschema to produce better analysis of the examples in our #OpenAPI documents. We created a lint-json-schema-examples tool which maps across all schemas in an OpenAPI doc and uses jsonschema to validate each example. While Spectral does some basic validation of examples, the diagnostics from jsonschema are far superior.
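The mapping step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual lint-json-schema-examples tool: a toy type check stands in for the real validation (the tool shells out to the jsonschema CLI), and the document structure shown is a minimal assumption.

```python
# Rough sketch of mapping across an OpenAPI doc's schemas and checking each
# schema's example against it. A toy type check stands in for a real
# validator; the real tool delegates to the sourcemeta jsonschema CLI.

TYPE_CHECKS = {
    "object": dict, "array": list, "string": str,
    "integer": int, "number": (int, float), "boolean": bool,
}

def lint_examples(openapi_doc):
    """Return diagnostics for examples that fail their schema's type."""
    problems = []
    schemas = openapi_doc.get("components", {}).get("schemas", {})
    for name, schema in schemas.items():
        if "example" not in schema:
            continue  # nothing to lint for this schema
        expected = TYPE_CHECKS.get(schema.get("type"))
        if expected and not isinstance(schema["example"], expected):
            problems.append(f"{name}: example does not match type {schema['type']}")
    return problems

doc = {"components": {"schemas": {
    "User": {"type": "object", "example": {"id": 1}},
    "Age":  {"type": "integer", "example": "forty"},
}}}

print(lint_examples(doc))  # ["Age: example does not match type integer"]
```

A real validator reports far richer diagnostics (keyword locations, nested failures), which is exactly why the jsonschema CLI's output beats a basic type check.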

A notation for an SQL preprocessor, and more

The article presents a compact mathematical notation for an SQL preprocessor, designed to build complex conditional expressions from JSON configurations. The notation allows set and interval operations to be written concisely: combined operators ( >=[18,65] ), arrow symbols for intervals ( >> for BETWEEN, >< for NOT BETWEEN), and logical negation via a minus sign. The goal is an intuitive, consistent, and extensible query language. Practical applications include SQL code generation in preprocessors, DSLs for query builders, and compact filters in JSON APIs. The article examines the notation's strengths and potential problems, compares it with analogues (Quist, SQL++, PRQL), and identifies what makes the approach unique. The author invites discussion and offers collaboration.
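As a rough illustration of the idea, here is a toy Python translation of just the two arrow operators named in the summary; the full grammar belongs to the article, and the function and column names here are made up for the example:

```python
import re

# Toy translation of the two interval operators from the summary:
#   >>[a,b]  ->  BETWEEN a AND b
#   ><[a,b]  ->  NOT BETWEEN a AND b
# (the article's notation is richer; this covers only these two forms)

INTERVAL_RE = re.compile(r"^(>>|><)\[(-?\d+),(-?\d+)\]$")

def interval_to_sql(column, expr):
    """Render one interval expression as an SQL predicate for `column`."""
    m = INTERVAL_RE.match(expr)
    if not m:
        raise ValueError(f"unrecognized expression: {expr}")
    op, lo, hi = m.groups()
    keyword = "BETWEEN" if op == ">>" else "NOT BETWEEN"
    return f"{column} {keyword} {lo} AND {hi}"

print(interval_to_sql("age", ">>[18,65]"))  # age BETWEEN 18 AND 65
print(interval_to_sql("age", "><[18,65]"))  # age NOT BETWEEN 18 AND 65
```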

https://habr.com/ru/articles/1005772/

#sql #json #jsonschema #json_schema_validator #json_path

One Open-source Project Daily

Generate types and converters from JSON, Schema, and GraphQL

https://github.com/glideapps/quicktype

#1ospd #opensource #cplusplus #csharp #elm #golang #graphql #java #json #jsonschema #kotlin #objectivec #rust #swift #typescript

JSON Schema defines "anyOf" to mean matching one _or more_ of the subschemas. (This seems to be favored over "oneOf", which would require the implementation to check every subschema to make sure exactly one matched, not two or more.)

But this seems to permit an ambiguity where different implementations could interpret the same message as different types. Is this ever a problem in practice, or has everybody de facto adopted "the first match in the list" semantics?
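To make the ambiguity concrete, here is a small Python sketch with toy subschemas (not taken from any particular implementation; a real validator checks full subschemas, not just required keys):

```python
# Two anyOf branches and one instance that satisfies both. An implementation
# that assigns a "type" to the message based on which branch matched has to
# pick one, and different implementations could pick differently.

branches = [
    {"required": ["id"]},    # branch 0: instance must have an "id"
    {"required": ["name"]},  # branch 1: instance must have a "name"
]

def matching_branches(instance):
    """Return the indices of all branches the instance satisfies."""
    return [i for i, branch in enumerate(branches)
            if all(key in instance for key in branch["required"])]

print(matching_branches({"id": 1, "name": "x"}))  # [0, 1]: both match
print(matching_branches({"id": 1}))               # [0]: only the first
```

Validation itself is unambiguous (the instance is simply valid); the ambiguity only appears when tooling maps the matched branch to a concrete type.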

#JsonSchema