Big Data: Algorithms, Analytics, and Applications
Author: Li, Kuan-Ching; Jiang, Hai; Yang, Laurence T.; Cuzzocrea, Alfredo
Language: eng
Format: epub, pdf
Published: 2015-01-16T12:32:13+00:00
$$k = \frac{m}{n}\ln 2 \approx 0.6\,\frac{m}{n}.$$
If each hash function is perfectly independent of all others, then the probability of a bit
remaining 0 after n elements is
$$p = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}.$$
The FP—an important performance metric of a Bloom filter—is then
$$p_{FP} = (1 - p)^{k} \approx \left(1 - e^{-kn/m}\right)^{k} \approx \frac{1}{2^{k}},$$
for the optimal k. Note that, with increasing k, the probability of an FP is actually supposed to decrease, which is a counterintuitive outcome because one would expect the filter to fill up with keys sooner.
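The three formulas above can be checked numerically. The following is a minimal sketch in Python; the function names and the sample values of m and n are illustrative assumptions, not part of the original text. It uses the exponential approximation for p rather than the exact expression.

```python
import math


def optimal_k(m: int, n: int) -> float:
    """Real-valued optimal number of hash functions, k = (m / n) * ln 2."""
    return (m / n) * math.log(2)


def zero_bit_probability(m: int, n: int, k: float) -> float:
    """Approximate probability that a bit is still 0 after n elements: e^(-kn/m)."""
    return math.exp(-k * n / m)


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Approximate FP probability: (1 - e^(-kn/m))^k."""
    return (1.0 - zero_bit_probability(m, n, k)) ** k


if __name__ == "__main__":
    m, n = 1024, 128  # illustrative sizes only (assumption)
    k = optimal_k(m, n)
    print("optimal k:", k)
    print("p at optimal k:", zero_bit_probability(m, n, k))
    print("FP at optimal k:", false_positive_rate(m, n, k))
    print("1 / 2^k:", 0.5 ** k)
```

At the optimal k, the exponent kn/m equals ln 2, so p evaluates to 1/2 and the FP probability coincides with 1/2^k, which is the last step of the derivation.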
Streaming Algorithms for Big Data Processing on Multicore Architecture ◾ 227
Let us analyze k. For the majority of cases, m ≪ n, which means that the optimal number of hash functions is 1. Two functions are feasible only with m > 2.5 n. In most realistic cases, this is almost never so, because n is normally huge, while m is something practical like 24 or 32 (bits).
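For reference, a traditional Bloom filter of this practical size can be sketched as a single 32-bit word that is updated with the OR operation. This is only an illustration: the use of salted SHA-256 as the hash family, the names `key_mask`, `add`, and `might_contain`, and the default of one hash function are assumptions made for the example.

```python
import hashlib
from typing import List

M_BITS = 32  # filter width in bits (assumption: one machine word)


def bit_positions(key: str, k: int = 1, m: int = M_BITS) -> List[int]:
    """Derive k bit positions by salting SHA-256 with the hash function index."""
    positions = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def key_mask(key: str, k: int = 1, m: int = M_BITS) -> int:
    """Build the m-bit value that has a 1 at every position selected for the key."""
    mask = 0
    for pos in bit_positions(key, k, m):
        mask |= 1 << pos
    return mask


def add(filter_bits: int, key: str, k: int = 1) -> int:
    """Insert a key by OR-ing its mask into the filter word."""
    return filter_bits | key_mask(key, k)


def might_contain(filter_bits: int, key: str, k: int = 1) -> bool:
    """True if all of the key's bits are set (false positives are possible)."""
    mask = key_mask(key, k)
    return (filter_bits & mask) == mask


if __name__ == "__main__":
    bits = 0
    for item in ["alpha", "beta", "gamma"]:
        bits = add(bits, item)
    print(format(bits, "032b"), might_contain(bits, "alpha"))
```

With n much larger than 32, nearly every bit of such a word ends up set, which is why the analysis above settles on a single hash function.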
12.5.2 Unconventional Bloom Filter Designs for Data Streams
Based on Section 12.5.1, the obvious problem with Bloom filters is how to improve their flexibility. As a side note, Bloom filters that offer such flexibility are normally referred to as dynamic.
Figure 12.3 shows the generic model, which applies to most of the proposals of dynamic
Bloom filters. The basic idea is to replace the plain bit string with a richer data structure
(the change in the Bloom filter in the figure). Each bit in the filter now is a pointer
to a structure that supports dynamic operations.
The other change that ensues is that the OR operation is no longer applicable. Instead, a
nontrivial manipulation has to be performed on each bit of the value that was supposed to be
OR-ed in the traditional design. Naturally, this incurs a considerable performance overhead.
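The generic model can be sketched as follows. This is a minimal illustration under stated assumptions, not one of the published designs: the class name `DynamicBloomFilter`, the `update` callback standing in for the nontrivial manipulation, and the salted SHA-256 hash family are all hypothetical.

```python
import hashlib
from typing import Callable, Generic, List, Optional, TypeVar

S = TypeVar("S")  # type of the per-bit state structure


class DynamicBloomFilter(Generic[S]):
    """Each former bit is a slot holding a state object (None stands for bit 0)."""

    def __init__(self, m: int, k: int, update: Callable[[Optional[S]], S]):
        self.m = m
        self.k = k
        self.update = update  # hypothetical hook: the "nontrivial manipulation"
        self.slots: List[Optional[S]] = [None] * m

    def _positions(self, key: str) -> List[int]:
        """Distinct slot indices selected by the k hash functions (assumed family)."""
        found = set()
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            found.add(int.from_bytes(digest[:8], "big") % self.m)
        return sorted(found)

    def add(self, key: str) -> None:
        """Instead of OR-ing bits, manipulate the state behind each selected slot."""
        for pos in self._positions(key):
            self.slots[pos] = self.update(self.slots[pos])

    def might_contain(self, key: str) -> bool:
        """A key may be present only if every selected slot holds some state."""
        return all(self.slots[pos] is not None for pos in self._positions(key))


if __name__ == "__main__":
    # Example state: a plain occurrence counter per slot.
    counting = DynamicBloomFilter(m=64, k=2, update=lambda s: (s or 0) + 1)
    counting.add("flow-1")
    counting.add("flow-1")
    print(counting.might_contain("flow-1"), counting.might_contain("flow-2"))
```

The extra function call and object access per slot, compared with a single OR on a machine word, is one concrete source of the overhead mentioned above.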
The following classes of dynamic Bloom filters are found in the literature.
• Stop additions filter. This filter will stop accepting new keys beyond a given point.
Obviously, this is done in order to keep the FP below a given target value.
• Deletion filter. This filter is tricky to build, but if accomplished, it can revert to a given previous state by forgetting the change introduced by a given key.
• Counting filter. This filter can count both individual bits of potential occurrences and
entire values—combinations of bits. This particular class of filters obviously can find
practical applications in data streaming. In fact, the existing example of the d-left
hashing method in Reference 32 uses a kind of counting Bloom filter [33]. Another
example can be found in Reference 34, where it is used roughly for the same purpose.
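The three classes can be illustrated together in one small sketch. It is not the d-left hashing method from the cited references; the class name `CountingBloomFilter`, the `max_fp` threshold, and the salted SHA-256 hash family are assumptions made for illustration. Counters replace bits, which allows deletion, and additions stop once the estimated FP would exceed the target.

```python
import hashlib
import math
from typing import List, Optional


class CountingBloomFilter:
    """Counters instead of bits; optional FP target that stops further additions."""

    def __init__(self, m: int, k: int, max_fp: Optional[float] = None):
        self.m = m
        self.k = k
        self.max_fp = max_fp  # hypothetical parameter: FP target for stop-additions
        self.counters: List[int] = [0] * m
        self.n = 0  # number of keys currently stored

    def _positions(self, key: str) -> List[int]:
        """Distinct counter indices selected by the k hash functions."""
        found = set()
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            found.add(int.from_bytes(digest[:8], "big") % self.m)
        return sorted(found)

    def estimated_fp(self, n: int) -> float:
        """FP estimate (1 - e^(-kn/m))^k for n stored keys."""
        return (1.0 - math.exp(-self.k * n / self.m)) ** self.k

    def add(self, key: str) -> bool:
        """Stop-additions behavior: refuse the key if the FP target would be exceeded."""
        if self.max_fp is not None and self.estimated_fp(self.n + 1) > self.max_fp:
            return False
        for pos in self._positions(key):
            self.counters[pos] += 1
        self.n += 1
        return True

    def remove(self, key: str) -> bool:
        """Deletion behavior: forget the change introduced by a previously added key."""
        positions = self._positions(key)
        if any(self.counters[pos] == 0 for pos in positions):
            return False  # the key was certainly never added
        for pos in positions:
            self.counters[pos] -= 1
        self.n -= 1
        return True

    def might_contain(self, key: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(key))
```

One caveat of this sketch: calling `remove` on a key that was never added but happens to be a false positive decrements counters that belong to other keys, so callers should delete only keys they know were inserted.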
[Figure 12.3: generic model of a dynamic Bloom filter. Labels recovered from the figure: Item; Multiple hash; Each hash key in turn; Bloom filter; State; A nontrivial manipulation.]
