Big Data and Visual Analytics by Sang C. Suh & Thomas Anthony

Author: Sang C. Suh & Thomas Anthony
Language: eng
Format: epub
Publisher: Springer International Publishing, Cham


3 Statistical Similarity Based Data Compression

Conventional lossy coding schemes generally quantize or threshold data to trade quality for size [51]. Their goal is to compress data without compromising its distinctive attributes. So far, however, these schemes have restricted their attention to signal recovery in which distortion (distance) is measured with Euclidean-distance-based metrics such as the sum of squared errors (SSE) and the signal-to-noise ratio (SNR) [11, 29, 48, 51]. Because such metrics compare original and decoded data element by element, using them as the distance measure requires the decoded data to preserve the order of the original sequence.
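To make the order dependence of Euclidean measures concrete, the following minimal sketch computes SSE and SNR for a synthetic Gaussian sequence and for a shuffled copy of it. The function names `sse` and `snr_db`, the synthetic data, and the decibel form of SNR are illustrative assumptions, not taken from the chapter.

```python
import math
import random

def sse(original, decoded):
    """Sum of squared errors between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(original, decoded))

def snr_db(original, decoded):
    """SNR in decibels: 10 * log10(signal power / error power)."""
    noise = sse(original, decoded)
    if noise == 0:
        return math.inf  # perfect reconstruction
    signal = sum(x * x for x in original)
    return 10.0 * math.log10(signal / noise)

# Hypothetical data: 1000 samples of unit-variance Gaussian noise.
rng = random.Random(0)
original = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# A shuffled copy contains exactly the same values in a different order.
shuffled = original[:]
rng.shuffle(shuffled)

sse_same = sse(original, original)
sse_shuffled = sse(original, shuffled)
same_values = sorted(original) == sorted(shuffled)

print("SSE against itself:      ", sse_same)
print("SSE against shuffled copy:", sse_shuffled)
print("SNR against shuffled copy:", snr_db(original, shuffled), "dB")
print("Same multiset of values: ", same_values)
```

The shuffled copy has the same empirical distribution as the original, yet SSE and SNR report it as a heavily distorted reconstruction, because both measures compare the sequences position by position.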

Employing the concept of a random variable introduces a new way of signal recovery: data is reconstructed from a probability distribution learned during the encoding process, not from the encoded (quality-adjusted) data itself. Thus, the encoded output is not a direct representation of the original data; instead, the encoder informs the decoder how to regenerate it. If we relax the constraint that the decoded data must preserve the order of the original sequence, and treat a sequence of data as if it originates from a random variable, we can achieve a higher compression ratio.

This work presents a new class of compression scheme based on statistical similarity, dubbed IDEALEM (Implementation of Dynamic Extensible Adaptive Locally Exchangeable Measures) [28], that departs from the conventional Euclidean distance measure and instead focuses on the exchangeability of similar data sequences [36]. In particular, relaxing the requirement on the order of the data sequence yields much higher compression ratios.
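The idea of exchanging statistically similar sequences can be sketched as a toy block coder. This is not the IDEALEM implementation: the excerpt does not specify the similarity measure, so the sketch assumes a two-sample Kolmogorov-Smirnov statistic, and the names `ks_statistic`, `encode`, `decode`, `block_size`, and `threshold` are all hypothetical.

```python
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of a and b (assumed similarity measure)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def encode(samples, block_size=32, threshold=0.3):
    """Emit ("raw", block) for new blocks and ("ref", index) for blocks
    that are statistically similar to an earlier raw block."""
    stored, stream = [], []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        match = None
        if len(block) == block_size:  # a short tail block is always sent raw
            for idx, ref in enumerate(stored):
                if ks_statistic(block, ref) <= threshold:
                    match = idx
                    break
        if match is None:
            stored.append(block)
            stream.append(("raw", block))
        else:
            stream.append(("ref", match))
    return stream

def decode(stream):
    """Rebuild a sequence, substituting the referenced block for each ref."""
    stored, out = [], []
    for kind, payload in stream:
        if kind == "raw":
            stored.append(payload)
            out.extend(payload)
        else:
            out.extend(stored[payload])
    return out

# Hypothetical input: background noise from a single distribution.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(3200)]
stream = encode(noise)
decoded = decode(stream)
raw_blocks = sum(1 for kind, _ in stream if kind == "raw")
print(f"{raw_blocks} of {len(stream)} blocks stored verbatim")
```

The decoded output has the same length as the input and a similar distribution within each block, but it does not reproduce the original sample order, which is exactly the constraint being relaxed. A block that matches an earlier one costs only an index instead of `block_size` values.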

Of course, application data cannot in general be explained by random numbers. However, in some situations, devices such as sensors may be measuring background noise for the majority of their operating time. In these cases, faithfully reproducing the random noise is not necessary.


