Recent Trends in Computer Applications by Jihad Mohamad Alja’am & Abdulmotaleb El Saddik & Abdul Hamid Sadka

Author: Jihad Mohamad Alja’am & Abdulmotaleb El Saddik & Abdul Hamid Sadka
Language: eng
Format: epub
ISBN: 9783319899145
Publisher: Springer International Publishing


3.2 Approximate Pattern Matching

The method described above has a limitation: the scan engine must load the signatures in full in order to perform an exact match. With the simple 32-bit hashes used for malware signature lookup, collisions occur regularly and have to be resolved further down the line by byte-to-byte comparison. Given the nature of the algorithm, we can expect a hash match roughly once per million bytes, regardless of the nature of the input. In other words, one extra check is needed for roughly every 1 MB of scanned content. While this does not affect scan speed significantly, it does affect memory consumption, because the large signature database has to be kept in memory for these extra checks. This makes deployment problematic on low-end devices, since it ties up an amount of memory proportional to the size of the signature database.
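The dependency on the full signature database can be illustrated with a minimal sketch. It assumes CRC32 (from Python's `zlib`) as a stand-in for the 32-bit hash, since the actual hash function is not named here, and it assumes every signature is at least `TOKEN_SIZE` bytes long. The names `build_index` and `scan` are hypothetical and chosen only for this illustration.

```python
import zlib

TOKEN_SIZE = 256  # assumed window length; the section later mentions 256-byte tokens


def build_index(signatures):
    """Map the 32-bit hash of each signature's first TOKEN_SIZE bytes
    to the full signature bytes (which therefore stay in memory)."""
    index = {}
    for sig in signatures:
        key = zlib.crc32(sig[:TOKEN_SIZE])
        index.setdefault(key, []).append(sig)
    return index


def scan(data, index):
    """Return (offset, signature) pairs confirmed by byte-to-byte comparison."""
    hits = []
    for off in range(len(data) - TOKEN_SIZE + 1):
        key = zlib.crc32(data[off:off + TOKEN_SIZE])
        # A hash hit is only a candidate: a collision is ruled out
        # by comparing against the stored signature contents.
        for sig in index.get(key, ()):
            if data[off:off + len(sig)] == sig:
                hits.append((off, sig))
    return hits
```

The sketch recomputes the hash for every window for clarity rather than speed. The point to note is that `index` holds the complete signature bytes, so its size grows with the signature database.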

To solve this problem, the scan engine would have to rely entirely on hashes when matching incoming traffic against the malware database. This procedure is not 100% accurate, but it can satisfy current needs with a reasonable degree of certainty. In this case, the scan engine would no longer perform byte-level comparisons to decide whether a match is a false positive, relying instead on a combination of hashes to determine the final result. Given a token size of 256 bytes for a malware signature, we can look for hash functions that have a good spread and are easy to compute.

One solution is to use the same hash function at different offsets. The malware signature would be preprocessed, and the resulting hash values, rather than the actual contents, would be stored in the process’s memory. This would allow the algorithm to reuse the hash computation at different stages of the detection process and would significantly reduce the memory footprint.
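One possible reading of this idea is sketched below. It is an illustration under stated assumptions, not the authors' implementation: CRC32 stands in for the unnamed hash function, the 256-byte token is split into four 64-byte chunks (the chunk size and offsets are assumptions), and the names `fingerprint`, `preprocess`, and `scan` are hypothetical.

```python
import zlib

TOKEN_SIZE = 256                         # token size given in the text
CHUNK = 64                               # assumed chunk length
OFFSETS = range(0, TOKEN_SIZE, CHUNK)    # assumed offsets: 0, 64, 128, 192


def fingerprint(token):
    """Apply the same hash function to the chunk at each offset."""
    return tuple(zlib.crc32(token[o:o + CHUNK]) for o in OFFSETS)


def preprocess(signatures):
    """Keep only hash values in memory, keyed by the first chunk's hash.
    `signatures` maps a signature name to its raw bytes."""
    table = {}
    for name, sig in signatures.items():
        fp = fingerprint(sig[:TOKEN_SIZE])
        table.setdefault(fp[0], []).append((fp[1:], name))
    return table


def scan(data, table):
    """Report (offset, name) when every chunk hash agrees; no byte comparison."""
    hits = []
    for off in range(len(data) - TOKEN_SIZE + 1):
        candidates = table.get(zlib.crc32(data[off:off + CHUNK]))
        if not candidates:
            continue  # cheap first-stage rejection
        rest = tuple(zlib.crc32(data[off + o:off + o + CHUNK])
                     for o in OFFSETS[1:])
        for fp_rest, name in candidates:
            if fp_rest == rest:
                hits.append((off, name))
    return hits
```

In this sketch the table stores four 32-bit values per signature instead of 256 bytes of content, and a match is accepted on hash agreement alone, so a small false-positive probability remains. The hash of a chunk at a given position could also be cached, since the same chunk serves as the first chunk of one window and as a later chunk of an earlier window.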


