Unmasking Repackaged Android Malware

Mobile security has never been more important than in the past decade, and it will only matter more in the coming years. As communication and entertainment continue to shift to mobile phones, so does the use of those phones in illegal activities. Malware authors understand this trend. They are no strangers to the rapidly growing computing power of modern mobile devices, while traditional mobile anti-malware software lags behind with its good ol’ hash matches.

Problem Statement

The year 2019 saw nearly 20 million pieces of malware, a 30% increase over 2018 [1]. This trend shows no sign of slowing down. With the emergence of repackaged, genuine-looking Android malware applications, the problem is only becoming more complex. Direct hash matches cannot be relied upon to detect a malicious Android application, because cryptographic hashes are designed so that even a one-byte change to the input produces a completely different digest. A number of both open-source and commercial anti-virus systems fail to detect repackaged Android applications [2].
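To make the hash-matching problem concrete, here is a minimal sketch using Python's standard `hashlib`. The byte strings are hypothetical stand-ins for an APK and a repackaged copy of it, not real application data; the point is only that a one-byte change leaves almost nothing of the original digest.

```python
import hashlib

# Hypothetical stand-in for the bytes of an APK (real APKs are ZIP archives).
original = b"PK\x03\x04" + b"example application payload" * 4
# Simulate repackaging by changing only the final byte.
repackaged = original[:-1] + b"!"

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(repackaged).hexdigest()

# Count how many hex digits still agree, position by position.
matching = sum(a == b for a, b in zip(h1, h2))

print(h1)
print(h2)
print(f"{matching} of {len(h1)} hex digits match")
```

A signature database keyed on the first digest would simply miss the second file, even though the two inputs are nearly identical.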

Repackaged App Detection

Some solutions train machine learning algorithms to classify applications as malware or non-malware, while others rely on code cloning/reuse detection. Since most Android applications are compiled to Dalvik (DEX) bytecode, they can be decompiled back to Java code using JADX [3] and other Android reverse engineering tools. However, there are faster methods available that do not require going through all the trouble of reversing an application or training ML models.

Solution - Perceptual Hashing

A perceptual hash is a fingerprint of a multimedia file derived from features of its content. While a cryptographic hash tells us whether two images (pictures) are an exact match, a perceptual hash tells us whether two images (pictures) are similar to each other. Threshold values can be set to fit the required classification purpose. Primarily, three methods are used for generating perceptual hashes.
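Similarity between two perceptual hashes is usually measured as the Hamming distance, the number of bit positions in which they differ. The sketch below assumes the hashes are stored as 64-bit integers; the function names are illustrative, and the default threshold of 10 simply mirrors the value quoted in the conclusion of this article.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of bit positions at which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_similar(hash_a: int, hash_b: int, threshold: int = 10) -> bool:
    """Treat two hashes as a match when they differ in at most `threshold` bits."""
    return hamming_distance(hash_a, hash_b) <= threshold


# Illustrative 64-bit values, not hashes of real apps.
reference = 0xF0F0F0F0F0F0F0F0
near_copy = reference ^ 0b111        # 3 bits flipped
unrelated = reference ^ 0xFFFFFFFF   # 32 bits flipped

print(hamming_distance(reference, near_copy), is_similar(reference, near_copy))
print(hamming_distance(reference, unrelated), is_similar(reference, unrelated))
```

A lower threshold reduces false matches at the cost of missing more heavily modified copies, so it has to be tuned for the classification task at hand.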


Types of Perceptual Hashing

  1. Average Hashing: takes a picture as input, converts it to grayscale, and reduces it to 8x8 pixels. The average of the 64 pixel values is then computed. Lastly, each pixel value is compared to that average, and the resulting bits form a 64-bit hash.

  2. Perception Hashing: relies on a discrete cosine transform (DCT) instead of the average grayscale value. The algorithm converts the input image to grayscale and reduces it to 32x32 pixels. It then computes the DCT, which separates the image into a collection of frequencies and scalars, and keeps 64 DCT values (typically the 8x8 block of lowest frequencies). The mean of those DCT values is calculated, and each bit of the 64-bit hash is set to 1 or 0 depending on whether the corresponding DCT value is above or below the mean. Finally, the bits are concatenated to form the 64-bit hash.

  3. Difference Hashing: starts by converting the image to grayscale. Next, the image is downsized to an 8x8 square of gray values. A row hash is then calculated for each row from left to right: a 1 bit is generated if the next grayscale value is greater than or equal to the previous one, and a 0 bit if it is less. Finally, the bits are concatenated to generate the final hash.
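The three methods above can be sketched in plain Python. This is a minimal illustration under some assumptions, not a reference implementation: the input is taken to be an already resized square grid of grayscale values (8x8 for average and difference hashing, 32x32 for perception hashing), the function names are made up for this example, and the perception hash thresholds on the mean of the 8x8 low-frequency DCT block as described in the list. Note that an 8x8 grid gives only 7 comparisons per row, so the difference hash here is 56 bits; common implementations resize to 9 columns to obtain 64.

```python
import math


def bits_to_int(bits):
    """Concatenate an iterable of booleans into one integer, first bit highest."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return bits_to_int(p > mean for p in flat)


def difference_hash(pixels):
    """One bit per horizontal neighbour pair: 1 if the next pixel is >= the previous."""
    bits = []
    for row in pixels:
        for previous, following in zip(row, row[1:]):
            bits.append(following >= previous)
    return bits_to_int(bits)


def dct_low_frequencies(pixels, keep=8):
    """Top-left keep x keep coefficients of an unnormalised 2D DCT-II."""
    n = len(pixels)
    cos = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
           for u in range(keep)]
    # Transform along each row, then along each column.
    rows = [[sum(row[x] * cos[u][x] for x in range(n)) for u in range(keep)]
            for row in pixels]
    return [[sum(rows[y][u] * cos[v][y] for y in range(n)) for u in range(keep)]
            for v in range(keep)]


def perception_hash(pixels):
    """One bit per low-frequency DCT coefficient: 1 if above the mean coefficient."""
    coeffs = [c for row in dct_low_frequencies(pixels) for c in row]
    mean = sum(coeffs) / len(coeffs)
    return bits_to_int(c > mean for c in coeffs)


# Synthetic inputs: a left-to-right gradient and a uniformly gray square.
gradient = [[x * 32 for x in range(8)] for _ in range(8)]
uniform = [[100] * 32 for _ in range(32)]

print(f"{average_hash(gradient):016x}")
print(f"{difference_hash(gradient):014x}")
print(f"{perception_hash(uniform):016x}")
```

In practice an imaging library would handle the grayscale conversion and resizing before these functions are called.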

Experiment Setup

The dataset consists of 4,302 apps used for training and testing. All the Android APKs go through the perceptual hashing algorithms. Note that executable files/binaries are sometimes themselves called images. In this case, however, the executable files are converted into pictures so that the malware analysis experiment can be carried out, because perceptual hashing only works on images/pictures. Converting executable binaries into images/pictures has been done for PE32+ executable files as well.
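One simple way to turn a binary into a picture is to treat every byte as a grayscale pixel and wrap the byte stream into rows of a fixed width. The sketch below follows that idea; the row width, the zero padding, and the block-averaging downsampler are assumptions made for illustration, and the experiment described here may have used a different conversion.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256):
    """Lay raw bytes out as rows of `width` pixels (0-255), zero-padding the last row."""
    if not data:
        raise ValueError("cannot build an image from empty data")
    height = math.ceil(len(data) / width)
    padded = data.ljust(height * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def downsample(pixels, size: int = 8):
    """Shrink a grid to size x size by averaging rectangular blocks of pixels."""
    height, width = len(pixels), len(pixels[0])
    result = []
    for by in range(size):
        y0 = by * height // size
        y1 = max((by + 1) * height // size, y0 + 1)  # at least one source row
        row = []
        for bx in range(size):
            x0 = bx * width // size
            x1 = max((bx + 1) * width // size, x0 + 1)  # at least one source column
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


# Synthetic bytes standing in for the contents of an APK file.
data = bytes(range(256)) * 16
grid = bytes_to_grayscale(data, width=64)
small = downsample(grid, size=8)

print(len(grid), len(grid[0]))
print(len(small), len(small[0]))
```

The resulting 8x8 grid is the kind of input that average or difference hashing expects; for a real file, `data` would come from reading the APK in binary mode.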

Conclusion

Experimental results for all three algorithms, on a dataset of 2,151 benign app pairs, 2,151 repackaged app pairs, and 576 benign apps, indicate that average hashing produced the best accuracy compared to perception and difference hashing [4]. The findings show that, with a Hamming distance of 10, average hashing is able to match a repackaged app to its benign pair with an accuracy of 88.16%.


References

  1. https://blog.drhack.net/threat-analysis-report-2019-android-malware-wins/
  2. https://scholarspace.manoa.hawaii.edu/handle/10125/41911
  3. https://github.com/skylot/jadx
  4. https://www.researchgate.net/publication/339027509_Detecting_Repackaged_Android_Applications_Using_Perceptual_Hashing