The AI Organization by David Carmona

Author: David Carmona
Language: eng
Format: epub
Publisher: O'Reilly Media
Published: 2019-11-18T16:00:00+00:00


Consolidating Your Data Estate: Data Hubs

After that brief crash course on data technologies, hopefully you now understand why consolidating a data estate is so difficult. Operational data stores, data warehouses, data lakes, and knowledge graphs each have their own purpose, and it's difficult to standardize on just one. Even if you could, having only one data store for the entire organization is not practical, because different departments or functional areas need the flexibility of their own stores to stay agile. Branches, or even entire companies in the case of acquisitions, will also have their own data estates that must coexist with the rest.

Instead of attempting to consolidate everything in one data store, many organizations are embracing a more practical approach known as a data hub. This architectural pattern focuses on connecting and exposing the existing data sources in an organization. Rather than bringing all the data into the same store, a data hub takes a hybrid approach: the data stays in multiple stores, such as data lakes or data warehouses, but access to and management of that data are centralized.

More than a technology, a data hub is a strategy. Instead of consumers of data (such as AI solutions) having to discover and connect with each individual data source, they can have a unified view of the data in the data hub. This hugely simplifies the development of solutions across multiple previously siloed sources.
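To make the "unified view" concrete, here is a minimal Python sketch, assuming a hypothetical `DataHub` class that records only how to reach each source and never stores the data itself. The two in-memory lists are stand-ins for a data lake and a data warehouse. A production hub would sit on top of real catalog, connector, and query services rather than a class like this.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


class DataHub:
    """Hypothetical minimal data hub: maps logical dataset names to the
    stores that actually hold the data."""

    def __init__(self) -> None:
        # Logical dataset name -> function that reads records from its source.
        self._sources: Dict[str, Callable[[], Iterable[Record]]] = {}

    def register(self, name: str, reader: Callable[[], Iterable[Record]]) -> None:
        # The data stays in its original store; the hub only records how to reach it.
        self._sources[name] = reader

    def datasets(self) -> List[str]:
        # One catalog that consumers can browse instead of hunting for stores.
        return sorted(self._sources)

    def read(self, name: str) -> List[Record]:
        if name not in self._sources:
            raise KeyError(f"unknown dataset: {name}")
        return list(self._sources[name]())


# Stand-ins for two existing stores (in practice, a data lake and a warehouse).
lake_files = [{"device": "d1", "temp": 21.5}, {"device": "d2", "temp": 19.0}]
warehouse_rows = [{"customer_id": 1, "region": "EMEA"}]

hub = DataHub()
hub.register("telemetry", lambda: lake_files)
hub.register("customers", lambda: warehouse_rows)

print(hub.datasets())         # ['customers', 'telemetry']
print(hub.read("customers"))  # [{'customer_id': 1, 'region': 'EMEA'}]
```

Consumers ask for `customers` or `telemetry` by name and never need to know which store holds them, which is exactly the simplification a data hub is meant to provide.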

Because it’s more of a strategy than a technology, a data hub is very flexible. You can decide which capabilities you want to centralize and which you want to leave up to each data source. There are three important capabilities to consider. In some cases you will want all of them to be handled by the data hub, and in other cases perhaps none of them:

Data semantics

A data hub can add semantics to unstructured data or unify semantics from disparate structured data sources. This is extremely powerful for data sources that lack structured semantics, such as data lakes, and for data that is structured but not unified (for example, data sources in different branches or from acquired companies).

Data storage

A data hub can either physically copy the data from a data source or simply refer to it. In the first case, consumers of the data never need to access the original data source. In the second case, the hub is less complex because it doesn't have to deal with loading, synchronization, or storage issues.

Data governance

A data hub brings an excellent opportunity to centralize data governance in an organization. Because all consumers of data connect through the data hub, you can enforce aspects like security, access control, data quality, privacy, and compliance in one place.
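The three capabilities can be sketched together in a toy hub where each registered source declares whether the hub maps its field names to a shared schema (semantics), copies or merely references its data (storage), and which roles may read it (governance). Everything here is hypothetical: `GovernedHub`, `SourceConfig`, `field_map`, `copy`, and `allowed_roles` are illustrative names, and in-memory lists stand in for real stores. A real data hub would rely on a metadata catalog, a query or virtualization engine, and an identity provider.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Record = Dict[str, object]


@dataclass
class SourceConfig:
    """Hypothetical per-source settings for the three capabilities."""
    reader: Callable[[], List[Record]]
    # Data semantics: source field name -> field name in the hub's shared schema.
    field_map: Dict[str, str] = field(default_factory=dict)
    # Data storage: True copies a snapshot into the hub, False reads the source on demand.
    copy: bool = False
    # Data governance: roles allowed to read this source through the hub.
    allowed_roles: Set[str] = field(default_factory=set)


class GovernedHub:
    def __init__(self) -> None:
        # Dataset name -> list of (config, snapshot or None) pairs.
        self._sources: Dict[str, List[Tuple[SourceConfig, Optional[List[Record]]]]] = {}

    def register(self, dataset: str, config: SourceConfig) -> None:
        # Copy mode loads the data now; reference mode keeps only the reader.
        snapshot = [dict(row) for row in config.reader()] if config.copy else None
        self._sources.setdefault(dataset, []).append((config, snapshot))

    def read(self, dataset: str, role: str) -> List[Record]:
        results: List[Record] = []
        for config, snapshot in self._sources.get(dataset, []):
            # Governance is checked in one place, for every consumer.
            if role not in config.allowed_roles:
                raise PermissionError(f"role {role!r} may not read {dataset!r}")
            rows = snapshot if snapshot is not None else config.reader()
            # Semantics: rename fields so every source uses the same vocabulary.
            results.extend(
                {config.field_map.get(key, key): value for key, value in row.items()}
                for row in rows
            )
        return results


# Headquarters and an acquired company describe customers differently.
hq_rows = [{"cust_id": 1, "country": "US"}]
acquired_rows = [{"client_no": 7, "nation": "DE"}]

hub = GovernedHub()
hub.register("customers", SourceConfig(
    reader=lambda: hq_rows,
    field_map={"cust_id": "customer_id"},
    allowed_roles={"analyst"},
))
hub.register("customers", SourceConfig(
    reader=lambda: acquired_rows,
    field_map={"client_no": "customer_id", "nation": "country"},
    copy=True,
    allowed_roles={"analyst"},
))

print(hub.read("customers", role="analyst"))
# [{'customer_id': 1, 'country': 'US'}, {'customer_id': 7, 'country': 'DE'}]

# The referenced source is read live; the copied one keeps serving its snapshot.
hq_rows.append({"cust_id": 2, "country": "CA"})
acquired_rows.append({"client_no": 8, "nation": "FR"})
print(len(hub.read("customers", role="analyst")))  # 3
```

Making each capability a per-source setting mirrors the flexibility of the strategy: one source can be copied and remapped while another is simply referenced as-is, and a caller without an allowed role (say, `role="intern"`) gets a `PermissionError` no matter which store holds the data.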
