Schematics Python: Overview and Tutorial
by Graciele Damasceno
What is Schematics?
Schematics is a Python library that helps with the design, conversion and validation of data structures. It removes the need to manually build payloads and reshape their structure to fit the context of the application. To start, install it with pip:
pip install schematics
In this showcase, we are going to manipulate a complex data schema using Schematics. The data arrives in JSON format, as shown below:
{
  "_id": "641505586f0763093fe5de82",
  "index": 0,
  "guid": "3e591c0d-5812-4c1e-9062-91d67db36326",
  "isActive": true,
  "balance": "$3,657.06",
  "picture": "https://picsum.photos/200",
  "age": 22,
  "eyeColor": "brown",
  "name": "Whitehead Navarro",
  "gender": "male",
  "company": "ROUGHIES",
  "email": "whiteheadnavarro@roughies.com",
  "phone": "(843) 429-2875",
  "address": "934 River Street, Sparkill, New Mexico, 4866",
  "about": "Mollit enim aute sint enim ut eiusmod dolore dolore veniam. Esse ad consequat pariatur cupidatat qui deserunt proident minim irure. Proident labore minim ex voluptate ea ut excepteur duis ad minim quis incididunt labore. Laborum sit aliqua aliqua et ad qui qui quis ullamco. Voluptate voluptate consectetur nostrud amet enim. In Lorem voluptate fugiat duis. Ut proident ipsum minim do fugiat sunt laboris voluptate tempor aliqua aliquip deserunt sit.\r\n",
  "registered": "2018-08-26T04:14:49",
  "coordinates": [
    -0.642558,
    -154.849655
  ],
  "tags": [
    "pariatur",
    "consequat",
    "et",
    "amet",
    "fugiat",
    "non",
    "deserunt"
  ],
  "friends": [
    {
      "id": 0,
      "name": "Alejandra Kinney"
    },
    {
      "id": 1,
      "name": "Holmes Graves"
    },
    {
      "id": 2,
      "name": "Barbra Dominguez"
    }
  ],
  "greeting": "Hello, Whitehead Navarro! You have 7 unread messages.",
  "favoriteFruit": "strawberry",
  "favoriteMedia": {
    "name": "Better Call Saul",
    "year": "2022",
    "network": "AMC"
  }
}
1. Receiving and Parsing Data
To parse the data above using schematics, the first thing to do is create a person model:
import datetime

from schematics import Model
from schematics.transforms import blacklist, whitelist
from schematics.types import ModelType, StringType, IntType, UUIDType, BooleanType, URLType, EmailType, DateTimeType, \
    GeoPointType, ListType, serializable, PolyModelType

from model.friend import Friend
from model.game import Game
from model.movie import Movie
from model.schematics_types.currency_type import CurrencyType
from model.tv_show import TVShow
from model.validators import is_uppercase, is_email_valid, is_over_18


class Person(Model):
    id = StringType(deserialize_from='_id')
    index = IntType()
    guid = UUIDType()
    is_active = BooleanType(deserialize_from='isActive')
    balance = CurrencyType()
    picture = URLType()
    age = IntType(validators=[is_over_18])
    eye_color = StringType(deserialize_from='eyeColor')
    name = StringType(required=True)
    gender = StringType()
    company = StringType(validators=[is_uppercase])
    email = EmailType(validators=[is_email_valid], required=True)
    phone = StringType(required=True)
    address = StringType()
    about = StringType()
    registered = DateTimeType()
    coordinates = GeoPointType()
    tags = ListType(StringType)
    friends = ListType(ModelType(Friend))
    greeting = StringType()
    favorite_fruit = StringType(deserialize_from='favoriteFruit')
    favorite_media = PolyModelType([
        Movie,
        TVShow,
        Game
    ], deserialize_from='favoriteMedia')
    created_at = DateTimeType(default=datetime.datetime.now)

    @serializable
    def external_id(self):
        return u'%s-%s' % (self.index, self.id)

    class Options:
        serialize_when_none = False
        roles = {
            'public_person': blacklist('id', 'index', 'guid', 'is_active', 'balance'),
            'profile_info': whitelist('name', 'greeting', 'gender', 'picture', 'about', 'age')
        }
Loads of information here! Let’s unpack it all.
Assigning Types to Fields
A class Person was created that extends Schematics' Model class. In this class, every attribute is declared with a specific type, either one provided by Schematics or one we created ourselves. Schematics provides many types out of the box, such as StringType, UUIDType and IntType (check the Schematics documentation for the full list).
Those types are pretty handy to coerce and convert data to match our desired class schema. Also, with a type, field validation becomes very easy (more on that later).
As seen above, it's also possible to create custom types to fit our specific needs, like the class CurrencyType:
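from schematics.types import FloatType


class CurrencyType(FloatType):
    """Float field that also accepts currency strings such as "$3,657.06"."""

    def convert(self, value, context=None):
        if isinstance(value, str):
            # Strip the dollar sign and thousands separators before conversion
            value = value.replace('$', '').replace(',', '')
        # Let FloatType perform the float conversion (and reject invalid input)
        return super().convert(value, context)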
In this example, CurrencyType extends Schematics' FloatType and overrides its convert() method. When the received value is a string, the dollar sign and the commas are removed before the value is turned into a float, so "$3,657.06" becomes 3657.06.
List of Models
To represent a list (array) of items of a model, we wrap a ModelType in a ListType and pass the model class, just as in the example above: ListType(ModelType(Friend)). The Friend class is a model of its own:
from schematics import Model
from schematics.types import IntType, StringType


class Friend(Model):
    id = IntType()
    name = StringType()
If the list consists of a simple type, like the field tags (a list of strings only), we declare a ListType with the desired type, such as ListType(StringType).
Renaming Fields
Schematics matches each received key to the model field with the same name, so when both share a name, this works out of the box. But what happens when the received fields must have different names in our context?
The fix is easy: add deserialize_from= followed by the original name of the received field to perform the matching. In the example above, the fields id, is_active, eye_color, favorite_fruit and favorite_media are deserialized with this attribute:
favorite_fruit = StringType(deserialize_from='favoriteFruit')
This is an easy way to normalize payloads that use naming conventions other than snake_case (such as camelCase), which is the convention in Python.
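To make the idea concrete, here is a plain-Python sketch of the renaming that deserialize_from performs on import. It is only an illustration and not how Schematics is implemented internally. ALIASES and rename_fields are hypothetical names, and the alias table mirrors the deserialize_from= arguments used in Person.

```python
# Hypothetical alias table mirroring the deserialize_from= arguments in Person
ALIASES = {
    '_id': 'id',
    'isActive': 'is_active',
    'eyeColor': 'eye_color',
    'favoriteFruit': 'favorite_fruit',
    'favoriteMedia': 'favorite_media',
}


def rename_fields(payload, aliases=ALIASES):
    """Return a new dict whose keys use the model's snake_case field names."""
    # Keys without an alias (e.g. "age") keep their original name
    return {aliases.get(key, key): value for key, value in payload.items()}


renamed = rename_fields({'_id': '641505586f0763093fe5de82', 'eyeColor': 'brown', 'age': 22})
print(renamed)
# {'id': '641505586f0763093fe5de82', 'eye_color': 'brown', 'age': 22}
```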
Default Fields
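To set a default value for a field, pass the default= attribute to the desired field type. In the example above, the field created_at defaults to the date/time at which the model was instantiated. Because the callable datetime.datetime.now is passed (not its result), the timestamp is computed each time a model is created: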
created_at = DateTimeType(default=datetime.datetime.now)
Compound Fields
To create a compound field from existing fields in the class, define a method with the desired field name and decorate it with @serializable. An example is external_id above, which joins the fields index and id. The value can then be accessed just like any other class field.
@serializable
def external_id(self):
    return u'%s-%s' % (self.index, self.id)
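In plain Python, a @serializable field behaves much like a read-only @property that is also included when the object is exported. The sketch below illustrates that behaviour without Schematics. The class PersonRecord and its export() method are hypothetical. It also shows what happens when id is missing, which is why the mock output later in this post contains 'external_id': '14-None'.

```python
class PersonRecord:
    """Hypothetical plain class used only to illustrate a computed, exported field."""

    def __init__(self, index, record_id=None):
        self.index = index
        self.id = record_id

    @property
    def external_id(self):
        # Same formatting as the @serializable method in Person
        return '%s-%s' % (self.index, self.id)

    def export(self):
        # The computed value is exported next to the stored fields
        return {'index': self.index, 'id': self.id, 'external_id': self.external_id}


print(PersonRecord(0, '641505586f0763093fe5de82').external_id)  # 0-641505586f0763093fe5de82
print(PersonRecord(14).external_id)  # 14-None
```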
Defining Custom Validators
We can create custom validators to check whether received data is compliant. To do so, write a function that receives the value as a parameter, returns it when it is valid, and raises a ValidationError exception otherwise, like this:
import re

from schematics.exceptions import ValidationError


def is_uppercase(value):
    if value.upper() != value:
        raise ValidationError('Field should be uppercase.')
    return value


def is_email_valid(value):
    # [A-Za-z] rather than [A-Z|a-z]: inside a character class, "|" is a literal pipe
    regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b'
    if not re.fullmatch(regex, value):
        raise ValidationError('E-mail address invalid.')
    return value


def is_over_18(value):
    if value < 18:
        raise ValidationError('User cannot be underage.')
    return value
To attach validators to a field, use validators=[] with the function names inside the brackets, like validators=[is_email_valid]:
email = EmailType(validators=[is_email_valid], required=True)
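Because validators are plain functions, they can be unit-tested without building a model. So that the sketch runs without Schematics installed, it defines a stand-in ValidationError; in the project itself, you would import it from schematics.exceptions instead. The check helper is hypothetical.

```python
class ValidationError(Exception):
    """Stand-in for schematics.exceptions.ValidationError so this sketch runs on its own."""


def is_uppercase(value):
    if value.upper() != value:
        raise ValidationError('Field should be uppercase.')
    return value


def is_over_18(value):
    if value < 18:
        raise ValidationError('User cannot be underage.')
    return value


def check(validator, value):
    """Hypothetical test helper: None if the value passes, else the error message."""
    try:
        validator(value)
    except ValidationError as exc:
        return str(exc)
    return None


assert check(is_uppercase, 'ROUGHIES') is None
assert check(is_uppercase, 'Roughies') == 'Field should be uppercase.'
assert check(is_over_18, 22) is None
assert check(is_over_18, 12) == 'User cannot be underage.'
```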
Required Fields
To mark a field as required, pass required=True to its type. If the value is missing when the model is validated, validation fails with a DataError.
2. Accessing and Manipulating Data
With our model created, it's time to output some data. We load the .json file as a dict (json_input) and pass it as a parameter when instantiating the Person class:
person = Person(json_input)
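The loading step uses the standard json module. Below is a minimal sketch. The helper load_payload and the file name person.json are hypothetical, and the parsing is shown on a trimmed copy of the payload so that the block runs without a file. Note that JSON's true becomes Python's True.

```python
import json


def load_payload(path):
    """Read a JSON document from disk and return it as a dict."""
    with open(path, encoding='utf-8') as handle:
        return json.load(handle)


# The same parsing applied to a trimmed copy of the payload
json_input = json.loads('{"_id": "641505586f0763093fe5de82", "isActive": true, "age": 22}')
print(json_input['isActive'])  # True

# With the full payload saved to a file (person.json is a hypothetical name):
# json_input = load_payload('person.json')
# person = Person(json_input)
```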
Validating Data
To check whether the received data is compliant, we call:
person.validate()
In our model, we created a validator that requires the field company to be uppercase. Passing a string that is not uppercase results in the following output when validating:
schematics.exceptions.DataError: {"company": ["Field should be uppercase."]}
Ignoring Rogue Fields
To ignore extra fields that are present in the input but not declared in the model, we pass the following argument:
person.validate(strict=False)
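If this argument is not specified, passing a field that is not declared in the model results in the following output:
schematics.exceptions.DataError: {"newField": "Rogue field"}
Depending on the Schematics version, this error may already be raised when the model is instantiated, because the Model constructor can also run in strict mode. In that case, pass strict=False to the constructor as well: Person(json_input, strict=False).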
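A rogue field is simply an input key that does not map to any declared field. The plain-Python sketch below shows the check itself and is not Schematics code. ACCEPTED_KEYS is a hypothetical, trimmed list of the input keys Person accepts, with deserialize_from aliases included.

```python
# Hypothetical subset of the input keys Person accepts (deserialize_from aliases included)
ACCEPTED_KEYS = {'_id', 'index', 'name', 'email', 'phone', 'isActive'}


def find_rogue_fields(payload, accepted=ACCEPTED_KEYS):
    """Return the input keys that no declared field accepts, sorted for stable output."""
    return sorted(set(payload) - set(accepted))


def drop_rogue_fields(payload, accepted=ACCEPTED_KEYS):
    """Keep only the accepted keys, i.e. silently ignore rogue fields."""
    return {key: value for key, value in payload.items() if key in accepted}


payload = {'_id': '641505586f0763093fe5de82', 'name': 'Whitehead Navarro', 'newField': 'x'}
print(find_rogue_fields(payload))  # ['newField']
print(drop_rogue_fields(payload))  # {'_id': '641505586f0763093fe5de82', 'name': 'Whitehead Navarro'}
```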
Omitting Fields Without Value
To omit fields without a value (assigned None) when exporting a model, set serialize_when_none = False inside the inner class Options:
class Options:
    serialize_when_none = False
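The effect on the exported dict can be pictured with a small plain-Python sketch. The helper drop_none is hypothetical, and Schematics applies the option during export rather than afterwards. Only None is dropped; falsy values such as 0 are kept, which is why the exported JSON below still contains "index": 0.

```python
def drop_none(data):
    """Return a copy of a flat dict without the keys whose value is None."""
    return {key: value for key, value in data.items() if value is not None}


exported = drop_none({'name': 'Whitehead Navarro', 'gender': None, 'index': 0, 'tags': []})
print(exported)  # {'name': 'Whitehead Navarro', 'index': 0, 'tags': []}
```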
Accessing Fields
With the model object created, it is easy to access the fields:
print(person.name, 'is', person.age, 'years old.')
which outputs:
Whitehead Navarro is 22 years old.
Exporting to JSON
To export the object to JSON, we convert it to primitive Python types with to_primitive() and serialize the result with the standard json module:
print(json.dumps(person.to_primitive()))
which outputs the json:
{"id": "641505586f0763093fe5de82", "index": 0, "guid": "3e591c0d-5812-4c1e-9062-91d67db36326", "is_active": true, "balance": 3657.06, "picture": "https://picsum.photos/200", "age": 22, "eye_color": "brown", "name": "Whitehead Navarro", "gender": "male", "company": "ROUGHIES", "email": "whiteheadnavarro@roughies.com", "phone": "(843) 429-2875", "address": "934 River Street, Sparkill, New Mexico, 4866", "about": "Mollit enim aute sint enim ut eiusmod dolore dolore veniam. Esse ad consequat pariatur cupidatat qui deserunt proident minim irure. Proident labore minim ex voluptate ea ut excepteur duis ad minim quis incididunt labore. Laborum sit aliqua aliqua et ad qui qui quis ullamco. Voluptate voluptate consectetur nostrud amet enim. In Lorem voluptate fugiat duis. Ut proident ipsum minim do fugiat sunt laboris voluptate tempor aliqua aliquip deserunt sit.\r\n", "registered": "2018-08-26T04:14:49.000000", "coordinates": [-0.642558, -154.849655], "tags": ["pariatur", "consequat", "et", "amet", "fugiat", "non", "deserunt"], "friends": [{"id": 0, "name": "Alejandra Kinney"}, {"id": 1, "name": "Holmes Graves"}, {"id": 2, "name": "Barbra Dominguez"}], "greeting": "Hello, Whitehead Navarro! You have 7 unread messages.", "favorite_fruit": "strawberry", "created_at": "2023-06-22T22:09:15.939875", "external_id": "0-641505586f0763093fe5de82"}
Roles and Class Options
A role works like a filter when exporting data, built from a whitelist (fields to export) or a blacklist (fields to omit). As in the Person model above, we declare roles inside the inner class Options, listing the field names for each rule:
class Options:
    roles = {
        'public_person': blacklist('id', 'index', 'guid', 'is_active', 'balance'),
        'profile_info': whitelist('name', 'greeting', 'gender', 'picture', 'about', 'age')
    }
This is the command to print only the whitelisted values:
print(person.to_primitive(role='profile_info'))
which outputs only the values of picture, age, name, gender, about and greeting:
{'picture': 'https://picsum.photos/200', 'age': 22, 'name': 'Whitehead Navarro', 'gender': 'male', 'about': 'Mollit enim aute sint enim ut eiusmod dolore dolore veniam. Esse ad consequat pariatur cupidatat qui deserunt proident minim irure. Proident labore minim ex voluptate ea ut excepteur duis ad minim quis incididunt labore. Laborum sit aliqua aliqua et ad qui qui quis ullamco. Voluptate voluptate consectetur nostrud amet enim. In Lorem voluptate fugiat duis. Ut proident ipsum minim do fugiat sunt laboris voluptate tempor aliqua aliquip deserunt sit.\r\n', 'greeting': 'Hello, Whitehead Navarro! You have 7 unread messages.'}
To output every field except the blacklisted ones, we use:
print(person.to_primitive(role='public_person'))
which omits the fields id, index, guid, is_active and balance:
{'picture': 'https://picsum.photos/200', 'age': 22, 'eye_color': 'brown', 'name': 'Whitehead Navarro', 'gender': 'male', 'company': 'ROUGHIES', 'email': 'whiteheadnavarro@roughies.com', 'phone': '(843) 429-2875', 'address': '934 River Street, Sparkill, New Mexico, 4866', 'about': 'Mollit enim aute sint enim ut eiusmod dolore dolore veniam. Esse ad consequat pariatur cupidatat qui deserunt proident minim irure. Proident labore minim ex voluptate ea ut excepteur duis ad minim quis incididunt labore. Laborum sit aliqua aliqua et ad qui qui quis ullamco. Voluptate voluptate consectetur nostrud amet enim. In Lorem voluptate fugiat duis. Ut proident ipsum minim do fugiat sunt laboris voluptate tempor aliqua aliquip deserunt sit.\r\n', 'registered': '2018-08-26T04:14:49.000000', 'coordinates': [-0.642558, -154.849655], 'tags': ['pariatur', 'consequat', 'et', 'amet', 'fugiat', 'non', 'deserunt'], 'friends': [{'id': 0, 'name': 'Alejandra Kinney'}, {'id': 1, 'name': 'Holmes Graves'}, {'id': 2, 'name': 'Barbra Dominguez'}], 'greeting': 'Hello, Whitehead Navarro! You have 7 unread messages.', 'favorite_fruit': 'strawberry', 'created_at': '2023-06-22T22:17:12.004692', 'external_id': '0-641505586f0763093fe5de82'}
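Conceptually, each role is a filter over the exported fields. The sketch below reproduces the two roles on a plain dict. whitelist_filter, blacklist_filter and ROLES are hypothetical helpers, not Schematics' implementation, and the sample dict is a trimmed excerpt of the exported data.

```python
def whitelist_filter(data, allowed):
    """Keep only the listed keys, like a whitelist role."""
    return {key: value for key, value in data.items() if key in allowed}


def blacklist_filter(data, blocked):
    """Keep every key except the listed ones, like a blacklist role."""
    return {key: value for key, value in data.items() if key not in blocked}


ROLES = {
    'public_person': lambda d: blacklist_filter(d, {'id', 'index', 'guid', 'is_active', 'balance'}),
    'profile_info': lambda d: whitelist_filter(d, {'name', 'greeting', 'gender', 'picture', 'about', 'age'}),
}

sample = {'id': '641505586f0763093fe5de82', 'index': 0, 'name': 'Whitehead Navarro',
          'age': 22, 'balance': 3657.06, 'email': 'whiteheadnavarro@roughies.com'}
print(ROLES['profile_info'](sample))   # {'name': 'Whitehead Navarro', 'age': 22}
print(ROLES['public_person'](sample))  # {'name': 'Whitehead Navarro', 'age': 22, 'email': 'whiteheadnavarro@roughies.com'}
```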
Mocking the Response
To test our code, we can use Schematics to generate a mock instance of the model:
print(Person.get_mock_object().to_primitive())
The mock is filled with random values based on each field's type:
{'index': 14, 'picture': 'http://aiV9Q.ZZ', 'age': 12, 'eye_color': 'N', 'name': 'hgqn8YPvDqLbf', 'company': 'JQ', 'email': 'ER@example.com', 'phone': 'DQVbv9q', 'address': '61sOKCSGGOkV', 'coordinates': (-83, 53), 'greeting': 'fsxF', 'created_at': '2127-03-09T03:39:51.357227+1130', 'external_id': '14-None'}
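Because these values are generated from the field types, they do not necessarily pass our custom validators: the mock age of 12, for instance, would be rejected by is_over_18.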
3. Advanced Modeling: Polymorphism
Schematics also supports polymorphic fields. When a field can hold one of several model types, we can declare it like this:
favorite_media = PolyModelType([
    Movie,
    TVShow,
    Game
], deserialize_from='favoriteMedia')
The classes Movie, TVShow and Game represent the accepted types of favorite media:
from schematics import Model
from schematics.types import StringType, IntType


class Movie(Model):
    name = StringType()
    year = IntType()

from schematics import Model
from schematics.types import StringType, IntType


class TVShow(Model):
    name = StringType()
    year = IntType()
    network = StringType()

    @classmethod
    def _claim_polymorphic(cls, data):
        return data.get('network')

from schematics import Model
from schematics.types import StringType, IntType


class Game(Model):
    name = StringType()
    year = IntType()
    console = StringType()

    @classmethod
    def _claim_polymorphic(cls, data):
        return data.get('console')
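Our models are very similar, so Schematics needs a way to tell them apart based on the payload. That is the job of the classmethod _claim_polymorphic, which receives the raw data and returns a truthy value when the data belongs to its model. Here, each hook returns data.get() for the attribute that is unique to its class: network for TVShow and console for Game. Movie defines no hook, so it serves as the fallback when no other model claims the payload.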
To check the favorite media type selected by schematics after ingesting the received data, we use:
print(person.favorite_media)
which outputs:
<TVShow instance>
The value is a TVShow instance because the received payload includes a network attribute.
To access the name of the media, we can just do:
print(person.favorite_media.name)
which outputs:
Better Call Saul
If we change the input to represent a movie, for example, with an input like this:
"favoriteMedia": {
  "name": "Everything Everywhere All At Once",
  "year": "2022"
}
When printing the favorite media type, we get:
<Movie instance>
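The selection logic can be sketched in plain Python. This is a simplified re-creation for illustration only. It uses stand-in classes instead of Schematics models and a hypothetical pick_model function. It mirrors the behaviour described above: a hook that returns a truthy value claims the payload, and the first model without a hook is the fallback. The real PolyModelType may report ambiguous input differently.

```python
class Movie:
    """Stand-in with no claim hook: used as the fallback."""


class TVShow:
    @classmethod
    def _claim_polymorphic(cls, data):
        return bool(data.get('network'))


class Game:
    @classmethod
    def _claim_polymorphic(cls, data):
        return bool(data.get('console'))


def pick_model(candidates, data):
    """Simplified claim-based dispatch with a fallback model."""
    fallback = None
    matches = []
    for kls in candidates:
        claim = getattr(kls, '_claim_polymorphic', None)
        if claim is None:
            # The first model without a hook becomes the default
            fallback = fallback or kls
        elif claim(data):
            matches.append(kls)
    if not matches and fallback is not None:
        return fallback
    if len(matches) != 1:
        raise ValueError('Ambiguous or unmatched input for polymorphic field')
    return matches[0]


media = [Movie, TVShow, Game]
print(pick_model(media, {'name': 'Better Call Saul', 'year': '2022', 'network': 'AMC'}).__name__)  # TVShow
print(pick_model(media, {'name': 'Everything Everywhere All At Once', 'year': '2022'}).__name__)  # Movie
```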
That's it! To access this implementation, check out the source code on my GitHub.