What is Schematics?

Schematics is a Python library that assists in the design, conversion and validation of data structures. It removes the need to manually build payloads and reshape them to fit the application’s context. To start, install it with pip:

pip install schematics

In this showcase, we are going to manipulate a complex, nested data schema using Schematics. The data arrives in JSON format, as shown below:

{
    "_id": "641505586f0763093fe5de82",
    "index": 0,
    "guid": "3e591c0d-5812-4c1e-9062-91d67db36326",
    "isActive": true,
    "balance": "$3,657.06",
    "picture": "https://picsum.photos/200",
    "age": 22,
    "eyeColor": "brown",
    "name": "Whitehead Navarro",
    "gender": "male",
    "company": "ROUGHIES",
    "email": "whiteheadnavarro@roughies.com",
    "phone": "(843) 429-2875",
    "address": "934 River Street, Sparkill, New Mexico, 4866",
    "about": "Mollit enim aute sint enim ut eiusmod dolore dolore veniam. Esse ad consequat pariatur cupidatat qui deserunt proident minim irure. Proident labore minim ex voluptate ea ut excepteur duis ad minim quis incididunt labore. Laborum sit aliqua aliqua et ad qui qui quis ullamco. Voluptate voluptate consectetur nostrud amet enim. In Lorem voluptate fugiat duis. Ut proident ipsum minim do fugiat sunt laboris voluptate tempor aliqua aliquip deserunt sit.\r\n",
    "registered": "2018-08-26T04:14:49",
    "coordinates": [
        -0.642558,
        -154.849655
    ],
    "tags": [
        "pariatur",
        "consequat",
        "et",
        "amet",
        "fugiat",
        "non",
        "deserunt"
    ],
    "friends": [
        {
            "id": 0,
            "name": "Alejandra Kinney"
        },
        {
            "id": 1,
            "name": "Holmes Graves"
        },
        {
            "id": 2,
            "name": "Barbra Dominguez"
        }
    ],
    "greeting": "Hello, Whitehead Navarro! You have 7 unread messages.",
    "favoriteFruit": "strawberry",
    "favoriteMedia": {
        "name": "Better Call Saul",
        "year": "2022",
        "network": "AMC"
    }
}

1. Receiving and Parsing Data

To parse the data above with Schematics, the first step is to create a Person model:

import datetime
from schematics import Model
from schematics.transforms import blacklist, whitelist
from schematics.types import ModelType, StringType, IntType, UUIDType, BooleanType, URLType, EmailType, DateTimeType, \
    GeoPointType, ListType, serializable, PolyModelType

from model.friend import Friend
from model.game import Game
from model.movie import Movie
from model.schematics_types.currency_type import CurrencyType
from model.tv_show import TVShow
from model.validators import is_uppercase, is_email_valid, is_over_18


class Person(Model):
    id = StringType(deserialize_from='_id')
    index = IntType()
    guid = UUIDType()
    is_active = BooleanType(deserialize_from='isActive')
    balance = CurrencyType()
    picture = URLType()
    age = IntType(validators=[is_over_18])
    eye_color = StringType(deserialize_from='eyeColor')
    name = StringType(required=True)
    gender = StringType()
    company = StringType(validators=[is_uppercase])
    email = EmailType(validators=[is_email_valid], required=True)
    phone = StringType(required=True)
    address = StringType()
    about = StringType()
    registered = DateTimeType()
    coordinates = GeoPointType()
    tags = ListType(StringType)
    friends = ListType(ModelType(Friend))
    greeting = StringType()
    favorite_fruit = StringType(deserialize_from='favoriteFruit')
    favorite_media = PolyModelType([
        Movie,
        TVShow,
        Game
    ], deserialize_from='favoriteMedia')
    created_at = DateTimeType(default=datetime.datetime.now)

    @serializable
    def external_id(self):
        return u'%s-%s' % (self.index, self.id)

    class Options:
        serialize_when_none = False
        roles = {
            'public_person': blacklist('id', 'index', 'guid', 'is_active', 'balance'),
            'profile_info': whitelist('name', 'greeting', 'gender', 'picture', 'about', 'age')
        }

Loads of information here! Let’s unpack it all.

Assigning Types to Fields

The Person class extends Schematics’ Model class. Each attribute is declared with a type, either one provided by Schematics or a custom one. Schematics ships with many types out of the box, such as StringType, UUIDType and IntType (see the Schematics documentation for the full list).

These types coerce and convert incoming data to match the model’s schema; for example, DateTimeType turns the string "2018-08-26T04:14:49" into a datetime object. Types also make field validation straightforward (more on that later). As seen above, it’s also possible to create custom types for specific needs, such as CurrencyType:

from schematics.types import FloatType


class CurrencyType(FloatType):
    def convert(self, value, context=None):
        if not isinstance(value, str):
            return value
        number = value.replace('$', '')
        return float(number.replace(',', ''))

Here, CurrencyType extends Schematics’ FloatType. When the incoming value is a string, it strips the dollar sign and the thousands separators and converts the rest to a float, so "$3,657.06" becomes 3657.06; any other value is returned unchanged.

List of Models

To represent a list of nested models, wrap a ModelType in a ListType and pass it the model class, as in the Person model above: ListType(ModelType(Friend)). The Friend model looks like this:

from schematics import Model
from schematics.types import IntType, StringType


class Friend(Model):
    id = IntType()
    name = StringType()

If the list holds a simple type, like the tags field (a list of strings), pass that type directly to ListType, as in ListType(StringType).

Renaming Fields

Schematics automatically matches each received field to the model field with the same name, so when both share a name no extra work is needed. But what happens when a received field must have a different name in our model?

The fix is simple: add deserialize_from= with the original name of the received field. In the Person model, the fields id, is_active, eye_color, favorite_fruit and favorite_media are all deserialized this way, for example:

    favorite_fruit = StringType(deserialize_from='favoriteFruit')

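This makes it easy to normalize payloads whose keys follow a naming convention other than snake_case, the convention used in Python. Note that on export Schematics uses the model’s field names (eye_color, favorite_fruit), as the JSON output later in this article shows.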

Default Fields

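To give a field a default value, pass default= to its type. In the Person model, created_at defaults to the date and time at which the model is instantiated. Note that datetime.datetime.now is passed without parentheses: Schematics accepts a callable as the default and calls it for each new instance, whereas datetime.datetime.now() would freeze a single timestamp when the class is defined.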

    created_at = DateTimeType(default=datetime.datetime.now)

Compound Fields

To create a field computed from other fields, define a method named after the desired field and decorate it with @serializable, as with external_id in the Person model, which joins the index and id fields. The result can be read like any other field (person.external_id) and is included when the model is exported.

    @serializable
    def external_id(self):
        return u'%s-%s' % (self.index, self.id)

Defining Custom Validators

Custom validators check that received data complies with our own rules. A validator is a function that receives the field value, raises a ValidationError when the value is invalid, and otherwise returns it:

import re

from schematics.exceptions import ValidationError


def is_uppercase(value):
    if value.upper() != value:
        raise ValidationError('Field should be uppercase.')
    return value


def is_email_valid(value):
    # local part, "@", domain, then a 2-7 letter top-level domain (fullmatch anchors both ends)
    regex = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}'
    if not re.fullmatch(regex, value):
        raise ValidationError('E-mail address invalid.')
    return value


def is_over_18(value):
    if value < 18:
        raise ValidationError('User cannot be underage.')
    return value

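To attach validators to a field, pass them as a list through validators=, for example validators=[is_email_valid]. Note that EmailType already checks the e-mail format on its own, so is_email_valid mainly serves as an illustration of a custom validator.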

    email = EmailType(validators=[is_email_valid], required=True)

Required Fields

To mark a field as required, pass required=True to its type. If the value is missing when the model is validated, validate() raises a DataError.


2. Accessing and Manipulating Data

With the model in place, it’s time to load some data: read the .json file into a dict and pass it to the Person constructor:

person = Person(json_input)

Validating Data

To check whether the received data is compliant, call:

person.validate()

The Person model has a validator requiring the company field to be uppercase. If company is not uppercase, validate() fails with the following output:

schematics.exceptions.DataError: {"company": ["Field should be uppercase."]}

Ignoring Rogue Fields

By default, Schematics rejects fields that are present in the input but not declared in the model ("rogue fields"). Passing such a field (here, newField) results in the following error:

schematics.exceptions.DataError: {"newField": "Rogue field"}

In Schematics 2.x this check runs when the model is instantiated, because the Model constructor defaults to strict=True. To ignore extra fields, pass strict=False to the constructor:

person = Person(json_input, strict=False)

Omitting Fields Without Value

To omit fields without a value (None) when exporting a model, set serialize_when_none = False in the model’s inner Options class:

    class Options:
        serialize_when_none = False

Accessing Fields

With the model object created, it is easy to access the fields:

print(person.name, 'is', person.age, 'years old.')

which outputs:

Whitehead Navarro is 22 years old.

Exporting to JSON

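To export the object to JSON, convert it to primitive types with to_primitive() and serialize the result with Python’s json module:

import json
print(json.dumps(person.to_primitive()))

which outputs the following JSON: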

{"id": "641505586f0763093fe5de82", "index": 0, "guid": "3e591c0d-5812-4c1e-9062-91d67db36326", "is_active": true, "balance": 3657.06, "picture": "https://picsum.photos/200", "age": 22, "eye_color": "brown", "name": "Whitehead Navarro", "gender": "male", "company": "ROUGHIES", "email": "whiteheadnavarro@roughies.com", "phone": "(843) 429-2875", "address": "934 River Street, Sparkill, New Mexico, 4866", "about": "Mollit enim aute sint enim ut eiusmod dolore dolore veniam. Esse ad consequat pariatur cupidatat qui deserunt proident minim irure. Proident labore minim ex voluptate ea ut excepteur duis ad minim quis incididunt labore. Laborum sit aliqua aliqua et ad qui qui quis ullamco. Voluptate voluptate consectetur nostrud amet enim. In Lorem voluptate fugiat duis. Ut proident ipsum minim do fugiat sunt laboris voluptate tempor aliqua aliquip deserunt sit.\r\n", "registered": "2018-08-26T04:14:49.000000", "coordinates": [-0.642558, -154.849655], "tags": ["pariatur", "consequat", "et", "amet", "fugiat", "non", "deserunt"], "friends": [{"id": 0, "name": "Alejandra Kinney"}, {"id": 1, "name": "Holmes Graves"}, {"id": 2, "name": "Barbra Dominguez"}], "greeting": "Hello, Whitehead Navarro! You have 7 unread messages.", "favorite_fruit": "strawberry", "created_at": "2023-06-22T22:09:15.939875", "external_id": "0-641505586f0763093fe5de82"}

Roles and Class Options

A role works as a filter applied when exporting data. As in the Person model above, roles are declared in the roles dict of the inner Options class, each one built with whitelist (fields to export) or blacklist (fields to omit):

    class Options:
        roles = {
            'public_person': blacklist('id', 'index', 'guid', 'is_active', 'balance'),
            'profile_info': whitelist('name', 'greeting', 'gender', 'picture', 'about', 'age')
        }

This is the command to print only the whitelisted values:

print(person.to_primitive(role='profile_info'))

which outputs only picture, age, name, gender, about and greeting:

{'picture': 'https://picsum.photos/200', 'age': 22, 'name': 'Whitehead Navarro', 'gender': 'male', 'about': 'Mollit enim aute sint enim ut eiusmod dolore dolore veniam. Esse ad consequat pariatur cupidatat qui deserunt proident minim irure. Proident labore minim ex voluptate ea ut excepteur duis ad minim quis incididunt labore. Laborum sit aliqua aliqua et ad qui qui quis ullamco. Voluptate voluptate consectetur nostrud amet enim. In Lorem voluptate fugiat duis. Ut proident ipsum minim do fugiat sunt laboris voluptate tempor aliqua aliquip deserunt sit.\r\n', 'greeting': 'Hello, Whitehead Navarro! You have 7 unread messages.'}

To output every field except the blacklisted ones, we use:

print(person.to_primitive(role='public_person'))

which omits the fields id, index, guid, is_active and balance:

{'picture': 'https://picsum.photos/200', 'age': 22, 'eye_color': 'brown', 'name': 'Whitehead Navarro', 'gender': 'male', 'company': 'ROUGHIES', 'email': 'whiteheadnavarro@roughies.com', 'phone': '(843) 429-2875', 'address': '934 River Street, Sparkill, New Mexico, 4866', 'about': 'Mollit enim aute sint enim ut eiusmod dolore dolore veniam. Esse ad consequat pariatur cupidatat qui deserunt proident minim irure. Proident labore minim ex voluptate ea ut excepteur duis ad minim quis incididunt labore. Laborum sit aliqua aliqua et ad qui qui quis ullamco. Voluptate voluptate consectetur nostrud amet enim. In Lorem voluptate fugiat duis. Ut proident ipsum minim do fugiat sunt laboris voluptate tempor aliqua aliquip deserunt sit.\r\n', 'registered': '2018-08-26T04:14:49.000000', 'coordinates': [-0.642558, -154.849655], 'tags': ['pariatur', 'consequat', 'et', 'amet', 'fugiat', 'non', 'deserunt'], 'friends': [{'id': 0, 'name': 'Alejandra Kinney'}, {'id': 1, 'name': 'Holmes Graves'}, {'id': 2, 'name': 'Barbra Dominguez'}], 'greeting': 'Hello, Whitehead Navarro! You have 7 unread messages.', 'favorite_fruit': 'strawberry', 'created_at': '2023-06-22T22:17:12.004692', 'external_id': '0-641505586f0763093fe5de82'}

Mocking the Response

For testing, Schematics can generate a mock instance of the model:

print(Person.get_mock_object().to_primitive())

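The mock is filled with random values based on each field’s type. Custom validators are not applied, which is why age is 12 in the run below even though is_over_18 would reject it: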

{'index': 14, 'picture': 'http://aiV9Q.ZZ', 'age': 12, 'eye_color': 'N', 'name': 'hgqn8YPvDqLbf', 'company': 'JQ', 'email': 'ER@example.com', 'phone': 'DQVbv9q', 'address': '61sOKCSGGOkV', 'coordinates': (-83, 53), 'greeting': 'fsxF', 'created_at': '2127-03-09T03:39:51.357227+1130', 'external_id': '14-None'}

3. Advanced Modeling: Polymorphism

Schematics also supports polymorphic fields. When a field can hold one of several models, we can declare it with PolyModelType:

favorite_media = PolyModelType([
    Movie,
    TVShow,
    Game
], deserialize_from='favoriteMedia')

The classes Movie, TVShow and Game represent the accepted types of favorite media:

# model/movie.py
from schematics import Model
from schematics.types import StringType, IntType


class Movie(Model):
    name = StringType()
    year = IntType()


# model/tv_show.py
from schematics import Model
from schematics.types import StringType, IntType


class TVShow(Model):
    name = StringType()
    year = IntType()
    network = StringType()

    @classmethod
    def _claim_polymorphic(cls, data):
        return data.get('network')


# model/game.py
from schematics import Model
from schematics.types import StringType, IntType


class Game(Model):
    name = StringType()
    year = IntType()
    console = StringType()

    @classmethod
    def _claim_polymorphic(cls, data):
        return data.get('console')

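The three models are very similar, so Schematics needs a way to tell them apart from the payload alone. That is the job of the _claim_polymorphic class method: it receives the raw data and returns a truthy value when the data belongs to its model. TVShow claims payloads that contain a network key, and Game claims those that contain a console key. Movie does not define the method, so Schematics falls back to it when no other model claims the data.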

To check which model Schematics selected for favorite_media after ingesting the received data, we print it:

print(person.favorite_media)

which outputs:

<TVShow instance>

Schematics chose TVShow because the received data includes a network key.

To access the name of the media, we can just do:

print(person.favorite_media.name)

which outputs:

Better Call Saul

If we change the input to describe a movie instead, with neither a network nor a console key:

  "favoriteMedia": {
      "name": "Everything Everywhere All At Once",
      "year": "2022"
  }

When outputting the favorite media type, we will get:

<Movie instance>

That’s it! The full source code for this implementation is available on my GitHub.

References

Schematics Python Documentation