Using JSON-Schema for data exchange

Several years ago XML was a quite popular document format – mostly due to its schema validation possibilities and clearly defined structure. Many libraries made working with such data documents possible (not really nice or a pleasure…) and humans could read them if need be. XML as a text format is programming language agnostic and processable in practically all useful programming environments.

Working with XML always was more of a pain for me. Fortunately, since then a lot of time passed and alternatives like JSON, YAML and TOML arised. All of them have their strengths and weaknesses and can fill similar roles as XML.

In general they have 2 things in common compared to XML:

  1. superiour readability
  2. lacking validation compared to XML schema

Nowadays, JSON is very widespread due to the popularity of JavaScript and perhaps the most used data exchange format across the internet. Despite having some syntactic quirks like its strictness about commas and forbidding of comments it is imho quite a good format. It is concise, human-readable, flexible and relatively simple. Many languages treat it like nested dictionaries so understanding and working with JSON is easy.

The main drawback is missing documentation and validation.

Enter JSON schema

JSON schema is a specification with accompanying libraries to fix the major issues about JSON. You define a schema of your data documents in JSON and put it in separate files. This adds the missing features to your JSON data documents I complained about: documentation and the possibility of automatic validation.

How does a simple JSON schema file look like? Let us have a look:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://cars.softwareschneiderei.com/car.schema.json",
  "title": "Car",
  "description": "Describing some important properties of a car",
  "type": "object",
  "properties": {
    "manufacturer": {
      "description": "The company producing the car.",
      "type": "string"
    },
    "model": {
      "description": "The name of the car model.",
      "type": "string"
    },
    "engineType": {
      "description": "One of the available engine types.",
      "enum": [
        "gasoline",
        "diesel",
        "hybrid",
        "electric"
      ]
    },
    "availableColors": {
      "description": "The colors the car is available in. Some colors may increase the price.",
      "type": "array",
      "items": {
        "type": "string"
      },
      "minItems": 1,
      "uniqueItems": true
    },
    "price": {
      "description": "The price tag in € including VAT. The price is optional.",
      "type": "number"
    }
  },
  "required": [
    "manufacturer",
    "model",
    "engine",
    "availableColors"
  ]
}

A valid data document could look like below:

{
  "manufacturer": "Porsche",
  "model": "911 Turbo",
  "engineType": "gasoline",
  "availableColors": [ "black", "blue", "red", "yellow", "white" ],
  "price": 150000
}

I think we can easily see how the documentation helps to understand the data, for example regarding the price property. The Java code to validate the data may look similar to this:

public void processCarData(String carFileName) { 
  var schemaDefinition = Json.decodeValue(readFile("car.schema.json"));
  JsonSchema schema = JsonSchema.of((JsonObject) schemaDefinition);
  var schemaValidator = Validator.create(schema, new JsonSchemaOptions()
      .setDraft(Draft.DRAFT202012)
      .setBaseUri("https://cars.softwareschneiderei.com"));
  var car = Json.decodeValue(readFile(carFileName));

  if (!schemaValidator.validate(car).getValid()) {
    throw new IllegalArgumentException("The format of the car data is invalid.");
  }
  // Data valid, work with it...
}

Conclusion

As depicted above JSON schema fills some important gaps when using JSON as an data exchange format. It offers helpful tools to document data structure and to validate data against a definition written in JSON itself.

That way we can safely work with the data, document the structure and still maintain the other good properties of JSON like interoperability, human-readability and data effienciency.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.