Understanding JSON Schema

JSON Schema is a vocabulary that allows you to annotate and validate JSON documents. It is a powerful tool for validating the structure of JSON data.

If you’ve ever used XML Schema, you probably already know what a schema is, but if all that sounds unfamiliar, you are in the right place. To define what JSON Schema is, we should first look at what JSON is.

JSON stands for “JavaScript Object Notation”, a simple data interchange format. It began as a notation for the World Wide Web. Since JavaScript exists in most web browsers, and JSON is based on JavaScript, it’s very easy to support there. However, it has proven useful and simple enough that it is now used in many other contexts that don’t involve the web at all.

For more detail read JSON.

Since JSON Schema is itself JSON, it’s not always easy to tell whether something is a JSON Schema or just an arbitrary chunk of JSON. The $schema keyword is used to declare that something is a JSON Schema. It’s generally good practice to include it, though it is not required.

{ "$schema": "http://json-schema.org/schema#" }
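As a rough illustration, a program could look for this keyword before treating a document as a schema. The sketch below uses only Python's standard json module. The helper name declares_schema is made up for this example, and it only checks that a top-level $schema key is present, nothing more.

```python
import json

def declares_schema(text):
    """Return True if the JSON text is an object with a top-level "$schema" key."""
    document = json.loads(text)
    return isinstance(document, dict) and "$schema" in document

print(declares_schema('{ "$schema": "http://json-schema.org/schema#" }'))  # True
print(declares_schema('{ "type": "string" }'))                             # False
```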

Let’s try to understand the following JSON Schema fragment, which declares a property called name.

"name": {
  "type": "string",
  "description": "person name"
}

As you can see, this declares a JSON property “name” of type string, which will hold the person’s name.

So far we have understood why we need JSON Schema. Now let’s dive deeper into it.

Type-specific keyword:

The type keyword is fundamental to JSON Schema. It specifies the data type for a JSON property.

Let’s look at each type in more detail, along with keywords such as length, regular expressions, and format.

String

The string type is used for textual data. It can also contain Unicode characters.

{ "type": "string" }

The following are all valid strings:

"This is a string"
"Déjà vu"
""
"63"

Length: The length of a string can be constrained using the minLength and maxLength keywords. For both keywords, the value must be a non-negative integer.

{
  "type": "string",
  "minLength": 2,
  "maxLength": 3
}
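To make the effect of these two keywords concrete, here is a small Python sketch that mimics them by hand. It is not a real validator, and the function name check_length is invented for this example. Length is counted in characters, not bytes.

```python
def check_length(value, min_length=0, max_length=None):
    """Mimic minLength/maxLength: only strings are constrained by these keywords."""
    if not isinstance(value, str):
        return True  # the length keywords ignore non-string values
    if len(value) < min_length:
        return False
    return max_length is None or len(value) <= max_length

# Same limits as the schema above: minLength 2, maxLength 3
for candidate in ["A", "AB", "ABC", "ABCD"]:
    print(candidate, check_length(candidate, min_length=2, max_length=3))
# A False
# AB True
# ABC True
# ABCD False
```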

Regular Expressions: The pattern keyword is used to restrict a string to a particular regular expression.
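Here is a minimal sketch of how pattern behaves, using Python's re module as a stand-in. JSON Schema patterns follow the JavaScript regular expression dialect, which is close to Python's but not identical, so treat this only as an approximation. The phone-number pattern and the name matches_pattern are assumptions made for the example. Note that a pattern is not anchored automatically, which is why the sketch uses re.search and why the pattern carries its own ^ and $.

```python
import re

# Hypothetical pattern: an optional (area code) followed by NNN-NNNN
PHONE_PATTERN = r"^(\([0-9]{3}\))?[0-9]{3}-[0-9]{4}$"

def matches_pattern(value, pattern):
    """Return True if the pattern matches anywhere in the string."""
    return re.search(pattern, value) is not None

print(matches_pattern("555-1212", PHONE_PATTERN))                # True
print(matches_pattern("(888)555-1212", PHONE_PATTERN))           # True
print(matches_pattern("(888)555-1212 ext. 532", PHONE_PATTERN))  # False
print(matches_pattern("(800)FLOWERS", PHONE_PATTERN))            # False
```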

Format: The format keyword allows for basic semantic validation on string values that are commonly used. This allows values to be constrained beyond what the other tools in JSON Schema, including Regular Expressions, can do.

{
  "format": "date-time"
}

Numeric Types

There are two numeric types in JSON Schema: integer and number. They share the same validation keywords.

integer: The integer type is used for integral numbers.

{ "type": "integer" }

Examples of valid integers:

45
-1

number: The number type is used for any numeric type, either integers or floating-point numbers.

{ "type": "number" }

The following are all valid numbers:

42
-1

A simple floating-point number:

5.0

Exponential notation also works:

2.99792458e8

Note: Numbers given as strings are rejected. For example, “42” is a string, so it fails validation against a numeric type.

multipleOf: Numbers can be restricted to a multiple of a given number, using the multipleOf keyword. It may be set to any positive number.

{
    "type"       : "number",
    "multipleOf" : 10
}

The following sample values are valid. A value such as 23 is rejected because it is not a multiple of 10.

0
10
20
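The multipleOf rule boils down to a remainder check. The Python sketch below shows the idea for whole numbers only. The name is_multiple_of is made up here, and a real validator has to be more careful with floating-point factors such as 0.01.

```python
def is_multiple_of(value, factor):
    """Return True if value divides evenly by factor (integers only in this sketch)."""
    return value % factor == 0

for candidate in [0, 10, 20, 23]:
    print(candidate, is_multiple_of(candidate, 10))
# 0 True
# 10 True
# 20 True
# 23 False
```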

Range: Ranges of numbers are specified using a combination of the minimum and maximum keywords. Both bounds are inclusive.

{
  "type": "integer",
  "maximum": 10000,
  "minimum": 1
}
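As a quick sketch of what this range means in practice, the Python helper below applies the same limits by hand. The name in_range is invented for this example. Both bounds are inclusive, so 1 and 10000 themselves pass.

```python
def in_range(value, minimum=None, maximum=None):
    """Mimic the minimum/maximum keywords; both bounds are inclusive."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True

for candidate in [0, 1, 5000, 10000, 10001]:
    print(candidate, in_range(candidate, minimum=1, maximum=10000))
# 0 False
# 1 True
# 5000 True
# 10000 True
# 10001 False
```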

Object

Objects are the mapping type in JSON: they map keys to values. In JSON, the keys must always be strings. Each key-value pair is conventionally referred to as a property of the object.

{ "type": "object" }

// Sample valid JSON objects

// Example 1
{
  "key1": "value 1",
  "key2": "value 2"
}

// Example 2
{
  "name": "user 1",
  "email": "user1@email.com",
  "age": 19
}

// Example 3
{
  "name": "user 2",
  "email": "user2@email.com",
  "age": 29
}

// Sample invalid JSON objects

// Example 1: keys must be strings
{
  0.01: "cm",
  1: "m",
  1000: "km"
}

// Example 2: an array is not an object
["An", "array", "not", "an", "object"]

Properties: The properties (key-value pairs) on an object are defined using the properties keyword. The value of properties is an object, where each key is the name of a property and each value is a JSON schema used to validate that property.

For example, let’s say we want to define a JSON schema for an address made up of a street number, street name, and street type:

{
  "type": "object",
  "properties": {
    "street_no":   { "type": "number" },
    "street_name": { "type": "string" },
    "street_type": { "type": "string",
                     "enum": ["Street", "Avenue", "Boulevard"]
                   }
  }
}

Now if we provide the following JSON, it validates successfully.

{ "street_no": 1600, "street_name": "street name 1", "street_type": "Avenue" }

However, the following JSON fails validation.

{ "street_no": "1600", "street_name": "street name 1", "street_type": "Avenue" }

Here street_no is given as a string, which the schema rejects because it expects a number (written without quotes).
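The difference between the two documents is easy to see once they are parsed. The Python sketch below loads both with the standard json module and checks how street_no comes out. The helper is_json_number is made up for this example. It excludes Python's bool on purpose, because Python treats True and False as integers while JSON does not.

```python
import json

def is_json_number(value):
    """True for JSON numbers; Python's bool is an int subclass, so exclude it."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

good = json.loads('{ "street_no": 1600, "street_name": "street name 1", "street_type": "Avenue" }')
bad = json.loads('{ "street_no": "1600", "street_name": "street name 1", "street_type": "Avenue" }')

print(is_json_number(good["street_no"]))  # True
print(is_json_number(bad["street_no"]))   # False, it was parsed as a string
```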

In addition, leaving out properties is valid unless they are marked as required.

Required Properties: By default, the properties defined by the properties keyword are not required. However, one can provide a list of required properties using the required keyword. The schema below has four properties, of which only two are marked as required.

{
  "type": "object",
  "properties": {
    "name":      { "type": "string" },
    "email":     { "type": "string" },
    "address":   { "type": "string" },
    "phone":     { "type": "string" }
  },
  "required": ["name", "email"]
}

Let’s see some valid JSON samples for this schema.

// 1
{
  "name": "tonny",
  "email": "tonny@someemail.com"
}

// 2
{
  "name": "tonny",
  "email": "tonny@someemail.com",
  "address": "tonny address, city, state, country",
  "phone": "XXXXXXXXXXXX"
}

But a JSON document that is missing name, email, or both will be rejected by the schema defined above. For example, the sample below is invalid because email is missing.

{
  "name": "tonny",
  "address": "tonny address, city, state, country",
  "phone": "XXXXXXXXXXXX"
}

The required keyword takes an array of zero or more strings. Each of these strings must be unique.
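The required check itself is simple: every listed name must exist as a key on the object. Here is a minimal Python sketch of that idea. The function name missing_required is invented for this example, and it reports which required names are absent rather than just returning true or false.

```python
def missing_required(instance, required):
    """Return the required property names that are absent from the object."""
    return [name for name in required if name not in instance]

required = ["name", "email"]

complete = {"name": "tonny", "email": "tonny@someemail.com"}
incomplete = {"name": "tonny", "phone": "XXXXXXXXXXXX"}

print(missing_required(complete, required))    # []
print(missing_required(incomplete, required))  # ['email']
```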

Size: The number of properties on an object can be restricted using the minProperties and maxProperties keywords. Each of these must be a non-negative integer. Refer to the below sample JSON Schema.

{
  "type": "object",
  "minProperties": 2,
  "maxProperties": 3
}

Array

Arrays: Arrays are used for ordered elements. In JSON, each element in an array may be of a different type.

{ "type": "array" }

// Sample valid JSON array
[1, 2, 3, 4, 5]

[3, "different", { "types" : "of values" }]

// Sample invalid JSON array

{"Not": "an array"}

Items: By default, the elements of the array may be anything at all. However, it’s often useful to validate the items of the array against some schema as well. This is done using the items, additionalItems, and contains keywords.

There are two ways in which arrays are generally used in JSON:

List validation: a sequence of arbitrary length where each item matches the same schema.

Tuple validation: a sequence of fixed length where each item may have a different schema. In this usage, the index (or location) of each item is meaningful as to how the value is interpreted.

List validation: List validation is useful for arrays of arbitrary length where each item matches the same schema. For this kind of array, set the items keyword to a single schema that will be used to validate all of the items in the array. When items is a single schema, the additionalItems keyword is meaningless and should not be used.

Consider the following JSON schema, where we define that each item in an array is a number:

{
  "type": "array",
  "items": {
    "type": "number"
  }
}

The array [1, 2, 3, 4, 5] validates successfully against the above schema. However, [1, 2, "3", 4, 5] fails validation because one of its elements is a string.

Note: The empty array ([]) is always valid.
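List validation can be pictured as running the same check on every element. The Python sketch below does this by hand for the number schema above. The names is_json_number and all_items_are_numbers are made up for this example. An empty list passes because there is no item that could fail.

```python
def is_json_number(value):
    """True for JSON numbers; Python's bool is an int subclass, so exclude it."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def all_items_are_numbers(array):
    """Mimic "items": {"type": "number"} by checking every element."""
    return all(is_json_number(item) for item in array)

print(all_items_are_numbers([1, 2, 3, 4, 5]))    # True
print(all_items_are_numbers([1, 2, "3", 4, 5]))  # False
print(all_items_are_numbers([]))                 # True
```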

Length: The length of the array can be specified using the minItems and maxItems keywords. The value of each keyword must be a non-negative integer. These keywords work whether doing list validation or tuple validation.

{
  "type": "array",
  "minItems": 2,
  "maxItems": 3
}

According to the above JSON schema, let’s see which JSON arrays are valid and which are invalid.

[] invalid, as it is an empty array
[1] invalid, as it has only one element
[1, 2] valid
[1, 2, 3] valid
[1, 2, 3, 4] invalid, as it has more than three elements
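The same results can be reproduced with a few lines of Python. This is only a hand-written imitation of minItems and maxItems, and the function name check_item_count is an assumption for this example.

```python
def check_item_count(array, min_items=0, max_items=None):
    """Mimic minItems/maxItems by comparing the array length to both limits."""
    if len(array) < min_items:
        return False
    return max_items is None or len(array) <= max_items

# Same limits as the schema above: minItems 2, maxItems 3
for candidate in [[], [1], [1, 2], [1, 2, 3], [1, 2, 3, 4]]:
    print(candidate, check_item_count(candidate, min_items=2, max_items=3))
# [] False
# [1] False
# [1, 2] True
# [1, 2, 3] True
# [1, 2, 3, 4] False
```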

Uniqueness: A schema can require that every item in an array is unique. Simply set the uniqueItems keyword to true.

{
  "type": "array",
  "uniqueItems": true
}

[1, 2, 3, 4, 5] valid
[1, 2, 3, 3, 4] invalid
[] valid, the empty array always passes
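One way to picture the uniqueness check is to turn each item into a canonical string and look for repeats. The Python sketch below does that with json.dumps and sort_keys=True, so that objects with the same keys in a different order count as duplicates. The name has_unique_items is made up, and the sketch simplifies one detail: it treats 1 and 1.0 as different values, which a real validator may not.

```python
import json

def has_unique_items(array):
    """Mimic uniqueItems by comparing a canonical JSON string of each item."""
    seen = set()
    for item in array:
        key = json.dumps(item, sort_keys=True)  # key order no longer matters
        if key in seen:
            return False
        seen.add(key)
    return True

print(has_unique_items([1, 2, 3, 4, 5]))  # True
print(has_unique_items([1, 2, 3, 3, 4]))  # False
print(has_unique_items([]))               # True
print(has_unique_items([{"a": 1, "b": 2}, {"b": 2, "a": 1}]))  # False
```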

boolean: The boolean type matches only two special values: true and false. Note that values that evaluate to true or false, such as 1 and 0, are not accepted by the schema.

{ "type": "boolean" }

true, valid
false, valid
"true", invalid
0, invalid
1, invalid
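The list above can be checked by parsing each value and testing its type. In the Python sketch below, json.loads turns JSON true and false into Python's True and False, and an isinstance check against bool accepts nothing else. The helper name is_json_boolean is invented for this example.

```python
import json

def is_json_boolean(value):
    """Only the parsed values of JSON true and false are instances of bool."""
    return isinstance(value, bool)

for text in ["true", "false", '"true"', "0", "1"]:
    print(text, is_json_boolean(json.loads(text)))
# true True
# false True
# "true" False
# 0 False
# 1 False
```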

null: The null type is generally used to represent a missing value. But when a schema specifies a type of null, it has only one acceptable value: null.

{ "type": "null" }

null, valid
false, invalid
0, invalid
"", invalid

I have also created a JSON Schema example using Java. The complete code is available on GitHub: https://github.com/Kuldeep-Rana/JSON_WITH_SCHEMA.git.

Reference:

json-schema.org

Happy Learning !!
