How Binary Code AI Changes Malware Defenses

by Wei Song, Research Scientist @Deepbits

Detect and Classify New Malware at First Sight

Abstract:

Existing commercial malware detection engines have a relatively low detection rate for samples on the day they are first discovered, and it often takes two to three days for them to gradually reach a detection rate above 90%. This leaves malware a wide window in which to operate. To close this gap, we developed a new technique that can identify new malware at first sight, without periodically retraining machine learning models. We use DeepDi, our fast and accurate GPU-powered disassembler, to process large volumes of samples every day. Our results show that, for malware discovered on the same day, our detection rate is substantially higher than that of every commercial antivirus engine we compared against on VirusTotal, and we also classify malware families more accurately.

Keywords: DeepDi, Malware Detection

The Problem

For security companies, one of the most important tasks is to detect new malware variants as quickly as possible. The detection models in traditional antivirus systems generally need to be retrained regularly on the latest samples collected each day and then updated, either in full or incrementally. As complex models such as deep neural networks become more popular in the security field, each update consumes substantial computing resources and therefore often takes a long time. This gives attackers a window of opportunity in which newly discovered malware can cause serious damage within the first 24 hours.

To evaluate how well antivirus engines detect newly discovered malware, we regularly collect the latest malware samples, submit them to VirusTotal, and monitor the detection results of every engine on the platform. We found that existing antivirus engines often have a low detection rate on the latest samples. As shown in the following three figures, most engines detect fewer than 70% of new samples on the day they are discovered, and more than half of the engines take two or even three days to reach a detection rate above 90%.

Figure 1. Detection Rate Trends of Different Engines over a Week After Discovery

Figure 2. Ranking of Engines for Samples Scanned on the Day of Discovery

Figure 3. Ranking of Engines for Samples Scanned Two Days After Discovery
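
The trends in these figures come down to two measurements per engine: the fraction of samples flagged on each day after discovery, and the first day on which that fraction exceeds 90%. The sketch below assumes the VirusTotal verdicts for one engine have already been gathered into per-day lists of booleans; the function names and data layout are illustrative assumptions, not part of any VirusTotal API.

```python
from typing import List, Optional


def daily_detection_rates(verdicts_by_day: List[List[bool]]) -> List[float]:
    """Detection rate per day for one engine.

    verdicts_by_day[d][i] is True if the engine flagged sample i when it was
    rescanned d days after discovery (d = 0 is the day of discovery).
    This layout is a hypothetical choice made for the sketch.
    """
    return [sum(day) / len(day) if day else 0.0 for day in verdicts_by_day]


def days_to_exceed(rates: List[float], threshold: float = 0.90) -> Optional[int]:
    """First day on which the detection rate is strictly above `threshold`,
    or None if it never gets there within the observed window."""
    for day, rate in enumerate(rates):
        if rate > threshold:
            return day
    return None


if __name__ == "__main__":
    # Made-up verdicts for four samples over three days, for illustration only.
    verdicts = [
        [True, False, False, True],   # day 0
        [True, True, False, True],    # day 1
        [True, True, True, True],     # day 2
    ]
    rates = daily_detection_rates(verdicts)
    print(rates)                  # [0.5, 0.75, 1.0]
    print(days_to_exceed(rates))  # 2
```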

Our Solution

We observed that malware writers often reuse existing malicious code when developing new malware. As a result, functions found in known malware can serve as signatures for detecting new malware. Based on this insight, we developed a new technique that detects the reuse of malicious code to identify new, previously unknown malware variants.

This technique does not require regular retraining and can recognize new malware at first sight. Our malware detection system works as follows. Every hour, we collect the latest malware samples and use a very fast disassembler to locate the functions in each sample and generate their disassembly code. We then compare these functions against the functions in our database to find similar ones. Finally, we predict the malware family of the sample based on the families to which those similar functions belong.
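
The lookup-then-vote workflow above can be sketched in a few dozen lines. This is a minimal illustration under simplifying assumptions, not DeepDi or the Deepbits implementation: each function arrives as a list of disassembled instruction strings, "similar" is approximated by an exact SHA-256 hash of the instructions after masking hexadecimal literals, and the family is chosen by majority vote. The names `FunctionDatabase`, `function_fingerprint`, and `predict_family` are hypothetical.

```python
import hashlib
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# Hypothetical representation: a function is a list of instruction strings.
Function = List[str]

# Hex literals (addresses, offsets, immediates) change when code is relocated,
# so this sketch masks them before hashing. Input is lower-cased first.
_HEX_LITERAL = re.compile(r"0x[0-9a-f]+")


def function_fingerprint(func: Function) -> str:
    """SHA-256 of a function's instructions after lower-casing and masking hex literals."""
    normalized = "\n".join(_HEX_LITERAL.sub("IMM", ins.strip().lower()) for ins in func)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FunctionDatabase:
    """Maps function fingerprints to the malware families they were seen in."""

    def __init__(self) -> None:
        # fingerprint -> Counter({family: number of known samples containing it})
        self._families: Dict[str, Counter] = defaultdict(Counter)

    def add_sample(self, family: str, functions: List[Function]) -> None:
        # Count each distinct function once per known sample.
        for fp in {function_fingerprint(f) for f in functions}:
            self._families[fp][family] += 1

    def families_of(self, fp: str) -> Counter:
        # .get() avoids inserting empty entries into the defaultdict.
        return self._families.get(fp, Counter())


def predict_family(db: FunctionDatabase, functions: List[Function]) -> Optional[str]:
    """Majority vote: each reused function votes for the family it was seen in most often."""
    votes: Counter = Counter()
    # dict.fromkeys de-duplicates while keeping a deterministic order.
    for fp in dict.fromkeys(function_fingerprint(f) for f in functions):
        families = db.families_of(fp)
        if families:
            votes[families.most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    db = FunctionDatabase()
    db.add_sample("redline", [["push ebp", "mov ebp, esp", "call 0x401000"]])
    db.add_sample("agenttesla", [["xor eax, eax", "ret"]])
    # The first function is the "redline" one relocated to a different address.
    sample = [["push ebp", "mov ebp, esp", "call 0x402abc"], ["nop"]]
    print(predict_family(db, sample))  # redline
```

Exact hashing only catches functions reused nearly verbatim. Matching functions that are "similar" rather than identical would likely need a similarity search, for example over function embeddings or fuzzy hashes, but the overall structure of fingerprint, look up, and vote stays the same.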

The greatest challenge in this workflow is generating disassembly code quickly for the large number of samples we collect every hour. Moreover, malware samples often use obfuscation techniques to make themselves difficult to disassemble. A fast and robust disassembler is therefore the key to this task, but existing disassemblers do not satisfy both requirements: most of them focus on accuracy, and their high overhead keeps them from being widely used in time-critical security practice. We developed DeepDi, a novel disassembler that achieves both accuracy and efficiency by leveraging graph neural networks and GPUs. DeepDi currently outperforms IDA Pro, the commercial disassembler of choice for security analysts, in robustness against obfuscation. On average, it takes less than a second to process a malware sample.

The prediction process of our method is also fully explainable. As shown in Figure 4 and Figure 5, our system displays the evidence behind each prediction: which functions the sample contains and how many times each function has appeared in known malware families. Clicking a function's address shows its disassembly code and the other samples that contain the same function. This makes it easy for users to verify whether a function is malicious.

Figure 4. Malware Sample Scan Results

Figure 5. Function Disassembly Code and Similar Functions
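
The evidence view in Figures 4 and 5 is essentially a per-function breakdown of matches. The sketch below assumes the matching step has already produced, for each function address in the scanned sample, a count of how many known samples from each family contain that function; `explain_prediction` and this data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def explain_prediction(matches: Dict[int, Counter]) -> List[Tuple[str, str, int]]:
    """Flatten per-function matches into (address, family, count) rows.

    matches maps the address of each function in the scanned sample to a
    Counter of {family: number of known samples containing that function}.
    """
    rows = [
        (f"0x{addr:x}", family, count)
        for addr, families in matches.items()
        for family, count in families.items()
    ]
    # Strongest evidence first: functions seen in the most known samples.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical matches for one scanned sample.
    matches = {
        0x401000: Counter({"redline": 12, "vidar": 1}),
        0x402000: Counter({"redline": 3}),
    }
    for address, family, count in explain_prediction(matches):
        print(f"{address}  {family:<10} seen in {count} known samples")
```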

The Results

Detection Rate

To detect newly discovered malware samples promptly, our system retrieves them every hour and immediately submits them for scanning on both Deepbits and VirusTotal. As shown in Figure 6, across 912 malware samples collected over one week, Deepbits achieved the highest detection rate, 94.1%. The second-ranked engine, Elastic, detected fewer than 90% of the samples. These results indicate that Deepbits can identify malware that other engines may miss.

Figure 6. Ranking of Engines for Newly Discovered Malware
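
Figure 6 ranks engines by the fraction of the same sample set that each one flags. The sketch below shows that calculation on toy data; it assumes first-scan verdicts have already been collected per engine, and `rank_engines` is a hypothetical helper. It also puts the reported figure in scale: 94.1% of 912 samples corresponds to 858 detections.

```python
from typing import Dict, List, Tuple


def rank_engines(verdicts: Dict[str, List[bool]]) -> List[Tuple[str, float]]:
    """Rank engines by detection rate on the same set of samples.

    verdicts[engine][i] is True if the engine flagged sample i on its first scan.
    Engines with no verdicts are skipped to avoid dividing by zero.
    """
    rates = {engine: sum(v) / len(v) for engine, v in verdicts.items() if v}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy verdicts for four samples; engine names are placeholders.
    toy = {
        "engine_a": [True, True, False, True],
        "engine_b": [True, False, False, False],
    }
    print(rank_engines(toy))  # [('engine_a', 0.75), ('engine_b', 0.25)]
    # Scale check: 858 detections out of 912 samples rounds to 94.1%.
    print(f"{858 / 912:.1%}")  # 94.1%
```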

Malware Family Detection

The Deepbits malware detection system has built-in malware family detection: it classifies a sample using the family information of existing malware samples. This yields a higher rate of correct family labels than most commercial malware engines achieve. Because each antivirus engine uses its own family naming conventions, we use AVClass to derive a normalized family for each sample from its VirusTotal report. To judge whether an engine's family prediction is correct, we check whether its label contains the AVClass family name as a substring, ignoring case. For instance, if AVClass assigns a sample the family "redline", then the label "Trojan:MSIL/Redline.R!MTB" counts as a correct prediction.
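
The correctness check described above is a case-insensitive substring test. A minimal sketch, assuming the engine labels and AVClass families have already been extracted into dictionaries keyed by sample hash (both helper names are hypothetical):

```python
from typing import Dict


def family_matches(engine_label: str, avclass_family: str) -> bool:
    """True if the AVClass family name occurs anywhere in the engine's label,
    ignoring case. An empty family name never matches."""
    return bool(avclass_family) and avclass_family.lower() in engine_label.lower()


def correct_family_rate(labels: Dict[str, str], families: Dict[str, str]) -> float:
    """Fraction of labeled samples whose engine label contains the AVClass family.

    labels:   sample hash -> label reported by one engine (detected samples only)
    families: sample hash -> family assigned by AVClass
    """
    scored = [h for h in labels if h in families]
    if not scored:
        return 0.0
    hits = sum(family_matches(labels[h], families[h]) for h in scored)
    return hits / len(scored)


if __name__ == "__main__":
    print(family_matches("Trojan:MSIL/Redline.R!MTB", "redline"))  # True
    labels = {"a": "Trojan:MSIL/Redline.R!MTB", "b": "Generic.Malware"}
    families = {"a": "redline", "b": "vidar"}
    print(correct_family_rate(labels, families))  # 0.5
```

Substring matching is deliberately lenient. A very short family name can match unrelated text inside a label, so matching on token boundaries may be worth considering when families have short names.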

Figures 7, 8, and 9 show that Deepbits predicts the correct family for the majority of malware samples. In contrast, other engines such as Kaspersky and FireEye, despite their high detection rates, produce a significantly lower proportion of correct families. In some cases they yield only singleton families, meaning their labels cannot place the sample in any known family. This further highlights the advantage of Deepbits in both malware detection and classification.

Figure 7. Deepbits Scan Results for Latest Samples First Seen within a Week

Figure 8. Kaspersky Scan Results for Latest Samples First Seen within a Week

Figure 9. FireEye Scan Results for Latest Samples First Seen within a Week

Throughput

Our system displays its throughput over the past week in real time. As shown in Figure 10 and Figure 11, queries for samples that have already been processed return in 221 milliseconds on average, with a median of 187 milliseconds. For new samples that have not yet been processed, the system takes an average of 24.6 seconds to process them and return results, with a median of 10.4 seconds. These figures show that the system returns results quickly even while handling a large daily volume of samples.

Figure 10. Throughput in the Last Week (Cached Samples)

Figure 11. Throughput in the Last Week (New Samples)
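
The gap between the mean (24.6 s) and the median (10.4 s) for new samples suggests a right-skewed latency distribution, in which a minority of slow samples pull the average up while a typical sample finishes much sooner. The toy example below uses Python's standard `statistics` module and made-up latencies to illustrate the effect; it is not derived from the measurements behind Figure 11, and `latency_summary` is a hypothetical helper.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(latencies_s: Sequence[float]) -> Dict[str, float]:
    """Mean and median of per-sample processing times, in seconds."""
    return {
        "mean": statistics.mean(latencies_s),
        "median": statistics.median(latencies_s),
    }


if __name__ == "__main__":
    # Made-up processing times: four typical samples and one slow outlier.
    toy = [5.0, 8.0, 10.0, 12.0, 90.0]
    print(latency_summary(toy))  # {'mean': 25.0, 'median': 10.0}
```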

Overall, these fast query times allow the system to deliver timely, reliable malware detection and protection against emerging cyber threats.

Summary

In this article, we described the low detection rate of existing malware detection engines on newly discovered samples. We observed that malware writers often reuse existing malicious code when developing new malware, and based on this observation we developed a new technique that identifies new malware without periodic retraining. We process large volumes of malware samples with DeepDi, our in-house fast and robust disassembler. Our detection rate for malware discovered on the first day is substantially higher than that of every antivirus engine we compared against, and we can also classify malware families accurately. Furthermore, the prediction process of our method is fully explainable.

If you are interested, try our online malware detection service.