Secure Intelligent Machines: Protecting AI from Cyberattack by Joel D Yonts
Poisoning Attack Prevention
Preventing data poisoning is ultimately about maintaining the integrity of the training datasets. Integrity, as it relates to AI data, includes ensuring the integrity of your data collection points, preventing modification of the data along its potentially long and twisting route, and validating the data's authenticity and integrity at its destination. Controlling access goes a long way toward achieving these integrity objectives but leaves opportunity for data poisoning through attacks that circumvent access controls or exploit human error. This section will explore additional technologies and techniques for maintaining the integrity of AI training data, leading to a multilayered defense-in-depth approach that creates resiliency against data poisoning attacks.
Detecting Change
Proving AI data integrity involves measuring the characteristics of the subject data so that modifications are easily detected during later data validation processes. Logical characteristics such as modification dates, file size, and record counts can serve as a starting point for detecting overt dataset modification but will fall short of detecting more subtle changes within the contents of a data record. A more advanced method for detecting change is the application of cryptographic hashing algorithms such as SHA-256. Cryptographic-based integrity controls have two distinct phases: baselining and validation. In baselining, a cryptographic hash is computed on a data collection or unit of data. The corresponding validation phase re-computes the same hash and compares it against the original. Any data modification that occurred between the baselining and validation phases would generate a completely different hash.
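A minimal sketch of the two phases in Python using the standard-library `hashlib` module; the function names and chunked-read approach are illustrative choices, not prescribed by the text:

```python
import hashlib


def baseline_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Baselining phase: compute a SHA-256 hash over a dataset file.

    The file is read in chunks so that large training datasets do not
    need to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate(path: str, expected_hash: str) -> bool:
    """Validation phase: re-compute the hash and compare to the baseline.

    Returns False if any byte of the dataset changed since baselining.
    """
    return baseline_hash(path) == expected_hash
```

Even a single flipped bit in the dataset produces a completely different digest, which is what makes the baseline-then-validate comparison effective.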
Cryptographic integrity controls can be applied to an AI data network in several ways. First, computing a hash early in the data collection process is desirable if the data will not change along its route to the AI training environment. An example is the application of a cryptographic hash to a dataset created by a geographically remote sensor system. Generating a baseline hash on the remote sensor and validating it once the data is delivered to the final or pre-staging environment effectively validates the integrity of the data across all systems, networks, and services used to centrally collect the information. Another strong application is the use of cryptographic integrity controls with third-party-provided datasets. The most impactful implementation in these scenarios is for the organization generating the datasets to perform the baselining function and provide the hash to the receiving organization as a way to validate that nothing changed during inter-organization transport. Finally, this form of integrity control can be applied to curated training, test, and knowledge datasets stored in AI training, test, quality assurance, and knowledge management environments. Implementing such a control would involve baselining at the point an AI dataset is created and validating it at each future use.
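The third-party scenario above can be sketched as a hash manifest: the producing organization hashes every file in the dataset and publishes the manifest out-of-band, and the receiver checks each file on arrival. This is an illustrative sketch, not a specific tool from the book; the function names and manifest layout are assumptions:

```python
import hashlib
from pathlib import Path


def build_manifest(dataset_dir: str) -> dict:
    """Producer side: map each file's relative path to its SHA-256 hash."""
    root = Path(dataset_dir)
    manifest = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            rel = p.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(p.read_bytes()).hexdigest()
    return manifest


def verify_manifest(dataset_dir: str, manifest: dict) -> list:
    """Receiver side: return files that are missing or whose hash differs.

    An empty list means every file in the manifest arrived unmodified.
    """
    root = Path(dataset_dir)
    failures = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or hashlib.sha256(p.read_bytes()).hexdigest() != expected:
            failures.append(rel)
    return failures
```

The same pattern covers the final case in the paragraph: baseline a curated training set when it is created, store the manifest separately, and run `verify_manifest` before each training run.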
Establishing an integrity baseline early in the data life cycle is only effective if the baselining characteristics or cryptographic hashes are protected from modification. This can be achieved by storing baseline information off-system and distributing it via channels that are out-of-band from those used to transport the dataset. The use of alternate