Malicious programs are not new, and many detection approaches have been proposed, from the signature-based methods used in most anti-virus products to machine learning approaches that classify samples based on extracted features. Carrying out systematic, in-depth malware analysis nevertheless poses inherent challenges, and very large datasets have become available only recently. There are three families of techniques for malware analysis: static analysis, dynamic analysis, and symbolic execution. None of them is sufficient on its own. Static analysis potentially has good coverage but has limited precision and is difficult to scale. Dynamic analysis has perfect precision, but its coverage is limited in practice, especially when anti-analysis techniques are employed. Symbolic execution techniques combine the advantages of static and dynamic analysis but do not scale. Hybrid approaches that combine these techniques can overcome the limitations to some extent, but they are still not sufficient to perform in-depth malware analysis at scale. Machine learning techniques have been proposed to classify malware based on extracted features, but their success on real malware samples is limited despite reported high accuracies. A shortcoming common to all these methods is that they do not utilize the knowledge gained from previous analyses. We adopt and are implementing the following framework: 1) Given a binary sample, we use counterfactual execution to execute all the branches. Using the call graph, we organize the system calls into overlapping short sequences hierarchically; 2) We use the knowledge base to identify the family of the sample and other related samples in the base. Note that the sequences allow us to match functions without the need to check implementation details; 3) With the metadata from the knowledge base, we perform family-specific analyses. The advantage of the proposed approach is that it is scalable, achieves good coverage, and generalizes well to new malware samples.
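As an illustration of step 2 only, the following minimal Python sketch shows one way overlapping short system call sequences could be matched against a knowledge base. It is not the authors' implementation: the function names, the window length of 3, the Jaccard similarity measure, the family labels, and the toy traces are all assumptions made for this example, and the call-graph hierarchy and counterfactual execution are not modeled.

```python
def syscall_ngrams(calls, n=3):
    """Return the set of overlapping length-n windows over a system call trace."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_family(sample_calls, knowledge_base, n=3):
    """Return (family, score) for the knowledge base entry most similar to the sample.

    knowledge_base is a hypothetical mapping from a family label to one
    representative system call trace for that family.
    """
    grams = syscall_ngrams(sample_calls, n)
    scores = {
        family: jaccard(grams, syscall_ngrams(trace, n))
        for family, trace in knowledge_base.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data, invented for illustration only.
knowledge_base = {
    "family_a": ["open", "read", "connect", "send", "unlink"],
    "family_b": ["fork", "execve", "wait", "exit"],
}
sample = ["open", "read", "connect", "send", "close"]
print(closest_family(sample, knowledge_base))  # ('family_a', 0.5)
```

Because only the order of system calls is compared, two functions with different implementations but the same call behavior yield the same windows, which is the property the abstract relies on when it says functions can be matched without checking implementation details.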
Xiuwen Liu
Thursday Block II
03:00 pm ~ 03:20 pm
Designation Track
Duration 20 minutes