Practical Data Privacy by Katharine Jarmul

Author: Katharine Jarmul, Date: July 23, 2022

Language: eng
Format: epub, mobi
Publisher: O'Reilly Media, Inc.
Published: 2023-07-24T16:00:00+00:00

Understanding Differential Privacy

To help you better understand how differential privacy actually works with your data, let's walk through a few implementation examples.

We'll start by analyzing a real-world use case, and then build our own mechanism and consider its privacy guarantees.

Differential Privacy in Practice: Anonymizing the US Census

The Constitution of the United States of America calls for a full census of all persons every ten years. This count is used for numerous significant decisions, including representation in Congress, federal funding and monetary support for state initiatives. Getting it right, which means ensuring that everyone is counted but only once, requires a huge effort. The privacy implications are equally significant.

In the past, the US Census Bureau has used a variety of obfuscation methods to ensure "anonymization" of the results. These included combinations of aggregation and a method called shuffling (or sometimes scrambling), which took census block data and shuffled the households so that the census blocks were mixed with one another. Because these methods retained a lot of information about individuals, they allowed private information to leak in ways the original Census workers did not anticipate.

To determine the potential for outsiders to re-identify households in the released data, the Census Bureau ran several attacks on the 2010 Census results. By reconstructing combinations of age, gender and race/ethnicity and then combining them with a readily available external source, they correctly re-identified records with 38% accuracy. These external sources often had complete identity information (names, contact information and other details). Such a source could have been a consumer database, a voting or driving record database (for adults) or even an insurance database. For smaller census blocks, they were able to perform this re-identification with much higher success. With ubiquitous data available for free or at low prices, or when performed by a company with broad access to consumer and household data, such as a large e-commerce provider, this type of attack is not only feasible; it is actively used for direct marketing campaigns and targeted advertising.
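The linking step described here can be sketched in a few lines of Python. This is a minimal illustration, not the Census Bureau's actual procedure: the records, the field names (`block`, `age`, `sex`) and the `link` helper are all hypothetical. A reconstructed row counts as re-identified only when exactly one person in the external dataset shares its quasi-identifiers.

```python
# Hypothetical reconstructed census rows: no names, only quasi-identifiers.
reconstructed = [
    {"block": "A", "age": 34, "sex": "F", "race": "White"},
    {"block": "A", "age": 71, "sex": "M", "race": "Black"},
    {"block": "B", "age": 34, "sex": "F", "race": "White"},
]

# Hypothetical external dataset (for example a consumer database) with names.
external = [
    {"name": "Person 1", "block": "A", "age": 34, "sex": "F"},
    {"name": "Person 2", "block": "A", "age": 71, "sex": "M"},
    {"name": "Person 3", "block": "B", "age": 34, "sex": "F"},
    {"name": "Person 4", "block": "B", "age": 34, "sex": "F"},
]


def link(reconstructed, external, keys=("block", "age", "sex")):
    """Return (name, row) pairs where exactly one external record matches."""
    # Index the external data by its quasi-identifier tuple.
    index = {}
    for person in external:
        key = tuple(person[k] for k in keys)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for row in reconstructed:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match is a putative re-identification
            matches.append((candidates[0], row))
    return matches


for name, row in link(reconstructed, external):
    print(name, "->", row)
```

In this toy data the two rows in block A link to a single name each, which attaches a race value to a named person. The row in block B matches two people and stays ambiguous, which is why unique combinations of attributes carry the most risk.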

How exactly did these reconstruction attacks work? They literally built a system of equations and used a solver to determine potential candidates (more details in this article). From those candidates, they were able to deduce the most probable ones by linking this information with a few consumer databases or another dataset acquired via a data breach. Although there were plenty of false positives, the attack proved even more effective in less populated regions (up to 72%).
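To make the idea of solving for candidates concrete, here is a toy sketch. It is an assumption-laden simplification: instead of a real constraint solver it enumerates every possible set of ages by brute force, and the published statistics (count, total age, median, number of minors and seniors) are invented for a hypothetical three-person block.

```python
from itertools import combinations_with_replacement


def reconstruct(count, total_age, median_age, n_minors, n_seniors, max_age=99):
    """List every sorted tuple of ages consistent with the published statistics.

    Assumes an odd `count`, so the median is the middle element.
    """
    candidates = []
    # Each generated tuple is already sorted in ascending order.
    for ages in combinations_with_replacement(range(max_age + 1), count):
        if sum(ages) != total_age:
            continue
        if ages[count // 2] != median_age:
            continue
        if sum(a < 18 for a in ages) != n_minors:
            continue
        if sum(a >= 65 for a in ages) != n_seniors:
            continue
        candidates.append(ages)
    return candidates


# Invented statistics for a block of three people.
candidates = reconstruct(count=3, total_age=120, median_age=35,
                         n_minors=0, n_seniors=1)
print(candidates)
```

Four innocuous-looking aggregates shrink more than 170,000 possible age combinations down to three candidates. Each extra published table adds constraints, and linking against an external dataset then picks out the most probable candidate.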

As a consequence, the US Census Bureau decided that they would use differential privacy for the 2020 Census. The task at hand: could they create a data workflow that allowed for differentially private census results for 308 million persons that was still usable for the critical tasks requiring accurate results?

They refined their privacy parameters using example data and prior census responses, determining exactly what noise measurements and distributions fit their needs. They worked diligently to find the balance between privacy and utility, maximizing accuracy while still guaranteeing basic privacy protections. They built up infrastructure in Apache Spark to run through, aggregate and finalize all results, which are available on the US Census homepage.
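The privacy-to-utility trade-off is easier to see with a small experiment. The sketch below is not the Census Bureau's production algorithm; it assumes the textbook Laplace mechanism applied to a single count with sensitivity 1, and the function names are hypothetical. It also uses ordinary floating-point sampling, which is fine for building intuition but not hardened for real releases.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a noisy count; a smaller epsilon means more noise and more privacy."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, rng, trials=10_000):
    """Average absolute error of the noisy count over many simulated releases."""
    errors = (abs(private_count(true_count, epsilon, rng) - true_count)
              for _ in range(trials))
    return sum(errors) / trials


rng = random.Random(0)  # seeded only so the experiment is repeatable
for epsilon in (0.1, 1.0, 10.0):
    print(f"epsilon={epsilon}: mean abs error = {mean_abs_error(120, epsilon, rng):.2f}")
```

The average error should land close to `sensitivity / epsilon`, so tightening the privacy parameter by a factor of ten makes each released count roughly ten times noisier. Tuning of this kind, repeated across many tables and geographic levels, is what the parameter refinement described above had to balance.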
