Big Data: Principles and best practices of scalable realtime data systems by Nathan Marz & James Warren

Author: Nathan Marz & James Warren
Language: eng
Format: epub, mobi
Publisher: Manning Publications


The next step is to select a single user identifier for each person. This is the most sophisticated portion of the workflow, as it involves a fully distributed iterative graph algorithm. Despite that complexity, it takes only a few small pipe diagrams to express. With the appropriate tooling, you can implement it in about 100 lines of code (as will be demonstrated in the next chapter).

User IDs are marked as belonging to the same person via equiv edges. If you were to visualize these edges from a dataset, you’d see numerous independent subgraphs, as shown in figure 8.7.
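
To make the idea concrete, here is a minimal single-machine sketch in Python. It is not the distributed implementation the book goes on to present. It assumes that equiv edges are plain pairs of integer user IDs and that the smallest ID in each subgraph is the one to keep; the function name select_user_ids and the sample edges are hypothetical. Each pass lets every node adopt the smallest ID chosen so far by itself or its neighbors, and the loop stops once a pass changes nothing.

```python
from collections import defaultdict


def select_user_ids(equiv_edges):
    """Map every user ID to the smallest ID in its connected subgraph.

    Assumption: IDs are comparable values and the smallest one is kept.
    """
    # Build an undirected adjacency list from the equiv edges.
    neighbors = defaultdict(set)
    for a, b in equiv_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Initially every ID represents itself.
    chosen = {node: node for node in neighbors}

    # Iterate until a full pass makes no change (a fixed point).
    changed = True
    while changed:
        changed = False
        for node, adjacent in neighbors.items():
            best = min([chosen[node]] + [chosen[n] for n in adjacent])
            if best != chosen[node]:
                chosen[node] = best
                changed = True
    return chosen


# Hypothetical sample data: two independent subgraphs.
edges = [(1, 4), (4, 3), (3, 1), (5, 11), (11, 2)]
mapping = select_user_ids(edges)
print(mapping)
```

With these sample edges, IDs 1, 3, and 4 all map to 1, and IDs 2, 5, and 11 all map to 2. Repeating passes until nothing changes is what makes the approach iterative: a small ID may need several passes to spread across a long chain of edges.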


