Driving Data Quality with Data Contracts by Andrew Jones
Author: Andrew Jones
Language: eng
Format: epub
Publisher: Packt
Published: 2023-11-15T00:00:00+00:00
Using a schema registry as the source of truth
The schemas we've implemented can be used by both data generators and data consumers in several different applications.
Both Apache Avro and Protocol Buffers schemas can be used to generate source code. Because these are binary formats, the data generators must use this generated code to write data that conforms to the schema and is serialized correctly. The data consumers also need the generated code to deserialize the binary representation into something their code can understand.
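To illustrate why both sides of a binary format depend on the same schema, here is a minimal sketch. It does not use Avro or Protocol Buffers; it substitutes Python's standard `struct` module, and the `CUSTOMER_V1` schema and its fields are hypothetical. The point is only that the bytes carry no field names, so the reader needs the same schema the writer used.

```python
import struct

# Hypothetical schema: an ordered list of (field name, struct format code).
# Real Avro or Protocol Buffers tooling would generate equivalent code from a schema file.
CUSTOMER_V1 = [("id", "q"), ("age", "H"), ("balance", "d")]


def _layout(schema):
    # "<" means little-endian with no padding, so the byte layout is fully determined.
    return "<" + "".join(code for _, code in schema)


def serialize(schema, record):
    """Pack the record's fields, in schema order, into bytes."""
    return struct.pack(_layout(schema), *(record[name] for name, _ in schema))


def deserialize(schema, payload):
    """Unpack bytes back into a dict, using the same schema."""
    values = struct.unpack(_layout(schema), payload)
    return {name: value for (name, _), value in zip(schema, values)}


record = {"id": 42, "age": 37, "balance": 12.5}
payload = serialize(CUSTOMER_V1, record)
roundtrip = deserialize(CUSTOMER_V1, payload)
```

A consumer that tries to read `payload` with a different schema either fails outright or silently misreads the fields, which is why the generated code has to come from the same schema version on both sides.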
While JSON Schema events are serialized in the widely used and text-based JSON format, the schemas can be loaded by libraries to help write the data in the correct format and to run the validation checks.
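As a rough sketch of what such validation involves, the example below checks an event against a schema written in JSON Schema style. It is a hand-rolled check that covers only the `required` and `type` keywords, standing in for a full JSON Schema library; the `CUSTOMER_SCHEMA` definition and the `validate` helper are assumptions made for illustration.

```python
import json

# Hypothetical Customer schema, written in JSON Schema style.
CUSTOMER_SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["id", "email"],
  "properties": {
    "id": {"type": "integer"},
    "email": {"type": "string"},
    "age": {"type": "integer"}
  }
}
""")

# Maps the JSON Schema type names used above to Python types.
_TYPES = {"integer": int, "string": str, "number": (int, float), "boolean": bool}


def validate(event, schema):
    """Return a list of error messages; an empty list means the event passed.

    Only checks `required` and each property's `type`.
    """
    errors = []
    for name in schema.get("required", []):
        if name not in event:
            errors.append(f"missing required field: {name}")
    for name, rules in schema.get("properties", {}).items():
        if name not in event:
            continue
        value = event[name]
        expected = _TYPES[rules["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean types explicitly.
        is_bool_mismatch = isinstance(value, bool) and rules["type"] != "boolean"
        if is_bool_mismatch or not isinstance(value, expected):
            errors.append(f"field {name} should be of type {rules['type']}")
    return errors


good = json.loads('{"id": 1, "email": "a@example.com"}')
bad = json.loads('{"id": "1", "age": 30}')
print(validate(good, CUSTOMER_SCHEMA))
print(validate(bad, CUSTOMER_SCHEMA))
```

In practice a dedicated JSON Schema library would replace `validate`, since the full specification covers many more keywords than this sketch handles.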
These schemas can also be used in Continuous Integration (CI) checks, giving both the generators and consumers confidence that their code is using the data models correctly as they develop their services.
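One possible shape for such a CI check is a small test suite that compares what the generator code produces against the fields the schema declares. The sketch below uses Python's standard `unittest` module; `CUSTOMER_V1_FIELDS`, `build_customer_event`, and `run_contract_tests` are hypothetical names, and a real pipeline would load the field definitions from the schema itself rather than hard-coding them.

```python
import unittest

# Hypothetical: field names and Python types taken from version 1 of a Customer schema.
CUSTOMER_V1_FIELDS = {"id": int, "email": str}


def build_customer_event(customer_id, email):
    """Hypothetical generator code under test."""
    return {"id": customer_id, "email": email}


class CustomerEventContractTest(unittest.TestCase):
    def test_event_has_exactly_the_schema_fields(self):
        event = build_customer_event(1, "a@example.com")
        self.assertEqual(set(event), set(CUSTOMER_V1_FIELDS))

    def test_field_types_match_schema(self):
        event = build_customer_event(1, "a@example.com")
        for name, expected_type in CUSTOMER_V1_FIELDS.items():
            self.assertIsInstance(event[name], expected_type)


def run_contract_tests():
    """Run the suite and return True when every check passes, as a CI step would require."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CustomerEventContractTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

If a developer renames or drops a field in `build_customer_event`, the suite fails before the change is merged, which is the kind of early feedback the text describes.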
Furthermore, as open formats, these schemas can often be ingested into other applications or used to define resources such as a table in a data warehouse. We discuss these use cases in more detail later in this chapter, in the Defining governance and controls section.
When using the schemas across many different applications, we need to ensure they are kept in sync. So, when one application refers to version 1 of our Customer schema, it must get exactly the same schema as any other application that refers to that version.
We achieve this by creating a central service to store these schemas. This makes the schemas accessible to any application that needs them and acts as our source of truth for those schemas. We call this the schema registry.
Depending on our requirements, the schema registry can be as simple as a Git repository or a shared folder on a distributed filesystem such as Amazon S3, or it can be a service that presents a rich API for saving and retrieving schemas and performing compatibility checks. Whatever we choose to use as our registry, it should be capable of the following:
Publishing a new schema
Publishing an updated version of an existing schema
Retrieving a schema with a particular version, including those superseded by a newer version
Retrieving the latest version of a schema
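The four capabilities listed above can be sketched as a small in-memory class. This is an illustration under stated assumptions, not a production registry: the `SchemaRegistry` class, its method names, and the 1-based version numbering are all hypothetical, and a real registry would persist schemas in a Git repository, a shared folder, or a dedicated service.

```python
class SchemaRegistry:
    """In-memory registry keyed by schema name, with 1-based version numbers."""

    def __init__(self):
        self._schemas = {}  # name -> list of schema definitions, oldest first

    def publish(self, name, definition):
        """Store a new schema, or a new version of an existing one.

        Returns the version number assigned.
        """
        versions = self._schemas.setdefault(name, [])
        versions.append(definition)
        return len(versions)

    def get(self, name, version):
        """Retrieve a specific version, including superseded ones."""
        versions = self._schemas.get(name, [])
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]

    def latest(self, name):
        """Retrieve the newest version as a (version, definition) pair."""
        versions = self._schemas.get(name)
        if not versions:
            raise KeyError(f"unknown schema: {name}")
        return len(versions), versions[-1]


registry = SchemaRegistry()
v1 = registry.publish("Customer", {"fields": ["id", "email"]})
v2 = registry.publish("Customer", {"fields": ["id", "email", "age"]})
```

Because old versions are never overwritten, an application pinned to version 1 keeps receiving the same definition even after version 2 is published.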
