Static malware analysis: a basic workflow
Static malware analysis is the process of analyzing malware samples without executing them. In this post, I’d like to share my basic workflow for static malware analysis, along with the tools and techniques that can be used at each stage.
1. File identification
The first step in static malware analysis is to identify the type of file you are dealing with, because it determines which tools make sense at later stages. You can use tools like the “file” command or TrID to identify the file type from its contents rather than its extension, and VirusTotal to check whether it is a known malware sample.
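The “file” command and TrID identify files largely by matching known byte signatures (“magic bytes”), which is why a misleading extension does not fool them. Below is a minimal sketch of the idea in Python, assuming a small hand-picked signature table; the real tools use far larger databases and also check other offsets and structures. The function names are my own.

```python
# A few well-known file signatures ("magic bytes"). Real tools such as
# file and TrID use much larger databases and more elaborate checks.
SIGNATURES = [
    (b"MZ", "DOS/Windows executable (MZ header)"),
    (b"\x7fELF", "ELF executable"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also DOCX, XLSX, JAR, APK)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Office)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]


def identify(data: bytes) -> str:
    """Return the description of the first matching signature, or 'unknown'."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    # Only the first few bytes are needed for these signatures.
    with open(path, "rb") as f:
        return identify(f.read(16))
```

Note that an “MZ” match only shows a DOS header; confirming a PE file means following the header’s pointer to the “PE” signature, which is what the dedicated tools do.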
Before you start analyzing the file, you should also calculate its hash values (for example SHA-256 and MD5) to have a unique identifier for it. Hashes let you compare the file against known malware samples, for example by searching for the hash on VirusTotal, and track the file’s origin and distribution.
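On Linux, `sha256sum` and `md5sum` compute these hashes from the command line. The sketch below does the same with Python’s standard `hashlib` module, computing several hashes in a single pass over the file; the function name and chunk size are my own choices.

```python
import hashlib


def file_hashes(path: str, chunk_size: int = 65536) -> dict[str, str]:
    """Compute MD5, SHA-1 and SHA-256 of a file in one pass."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as f:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

For example, `file_hashes("sample.bin")["sha256"]` gives the value to search for on VirusTotal (the file name is a placeholder).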
2. File properties and metadata
Once you have identified the file, you can use tools like Exiftool to extract metadata from it, such as the date and time it was created, the software used to create it, and (if it is an image) the camera or device used to capture it. This can provide useful information about the file’s origin and can help you identify potential indicators of compromise.
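For Windows executables, a useful piece of metadata is the TimeDateStamp field in the PE/COFF header, which records when the linker produced the file (Exiftool shows it as a time stamp for EXE files). Below is a minimal parser, assuming a well-formed PE file. Treat the value as a hint only: it can be forged, and some toolchains write a hash there instead of a real time.

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(data: bytes) -> datetime:
    """Return the COFF header TimeDateStamp of a PE file as a UTC datetime."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' header")
    # e_lfanew, the file offset of the PE signature, is a 32-bit value at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing 'PE\\0\\0' signature")
    # COFF header after the signature: Machine (2 bytes), NumberOfSections (2),
    # then TimeDateStamp (4 bytes, seconds since 1970-01-01 UTC).
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```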
3. Disassembling and decompiling code
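Now you can start to analyze the file’s code. Disassemblers like objdump, radare2, Ghidra, or Binary Ninja show the file’s machine code as assembly, so you can follow the instructions, registers, and memory addresses it uses. Decompilers, such as the ones built into Ghidra and Binary Ninja or the Hex-Rays Decompiler for IDA Pro, go a step further and reconstruct C-like pseudocode, which is often quicker to read.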
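To show what a disassembler does at its core, here is a toy linear-sweep disassembler: it walks the bytes front to back and maps known encodings to mnemonics, the basic approach objdump uses, whereas Ghidra and IDA follow control flow instead. The opcode table is a tiny hand-picked subset of 32-bit x86 and is purely illustrative; for real samples, use the tools above.

```python
# A tiny, hand-picked subset of 32-bit x86 encodings (illustration only).
OPCODES = {
    b"\x55": "push ebp",
    b"\x89\xe5": "mov ebp, esp",  # two different encodings of
    b"\x8b\xec": "mov ebp, esp",  # the same instruction
    b"\x31\xc0": "xor eax, eax",
    b"\x5d": "pop ebp",
    b"\xc3": "ret",
    b"\x90": "nop",
}
MAX_LEN = max(len(k) for k in OPCODES)


def linear_sweep(code: bytes, base: int = 0) -> list[tuple[int, str]]:
    """Decode `code` front to back and return (address, text) pairs."""
    listing = []
    offset = 0
    while offset < len(code):
        # Try the longest known encoding first, then shorter ones.
        for length in range(MAX_LEN, 0, -1):
            chunk = code[offset:offset + length]
            if len(chunk) == length and chunk in OPCODES:
                listing.append((base + offset, OPCODES[chunk]))
                offset += length
                break
        else:
            # Unknown byte: emit it as raw data and resume at the next byte.
            listing.append((base + offset, f"db 0x{code[offset]:02x}"))
            offset += 1
    return listing


if __name__ == "__main__":
    for addr, text in linear_sweep(b"\x55\x8b\xec\x31\xc0\x5d\xc3", base=0x401000):
        print(f"{addr:08x}  {text}")
```

Linear sweep is simple but can misread data mixed into code as instructions, which is one reason the interactive tools prefer to follow control flow.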
4. Extracting strings and resources
Use tools like strings to extract sequences of printable characters from the file, and look through them for potential indicators of compromise such as IP addresses, domain names, URLs, or file paths. By default strings only finds ASCII text; `strings -e l` also finds the 16-bit little-endian strings common in Windows binaries. You can also use tools like Exiftool or Ghidra to extract resources embedded in the file, such as images, audio, or video files.
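The sketch below approximates this step in Python: it pulls out runs of printable ASCII (similar to the default behavior of strings) and then applies simple regular expressions for IPv4 addresses and URLs. The patterns are my own and deliberately simple, so expect both false positives and misses; they are a starting point for manual review, not a complete IOC extractor.

```python
import re

# Deliberately simple IOC patterns; review every hit manually.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL = re.compile(r"https?://[^\s\"'<>]+")


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least `min_len` printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def find_iocs(strings: list[str]) -> dict[str, list[str]]:
    """Collect candidate IPv4 addresses and URLs from extracted strings."""
    iocs: dict[str, list[str]] = {"ipv4": [], "url": []}
    for s in strings:
        for ip in IPV4.findall(s):
            # Drop impossible addresses such as 999.1.1.1.
            if all(int(octet) <= 255 for octet in ip.split(".")):
                iocs["ipv4"].append(ip)
        iocs["url"].extend(URL.findall(s))
    return iocs
```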
5. Code and behavior analysis
After you have extracted the strings and resources, you can start to analyze the code and infer the file’s behavior. You can use Yara to write rules that match specific patterns in the file, or PeStudio to examine the file’s properties, imports, exports, resources, and structure. Together with the disassembly, this lets you identify malicious functionality, such as imported functions commonly associated with process injection, without running the sample.
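A Yara rule pairs named string patterns with a condition. The sketch below imitates that idea in plain Python; it is not Yara’s syntax or engine (for real rules, use the yara command-line tool or its Python bindings). The rule name and class are hypothetical, and the three API names were chosen because VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread together are a classic pattern for injecting code into another process.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Toy analogue of a Yara rule: named byte patterns plus an 'N of them' condition."""
    name: str
    strings: dict[str, bytes]
    min_matches: int  # len(strings) corresponds to Yara's "all of them"

    def matching_strings(self, data: bytes) -> list[str]:
        return [ident for ident, pattern in self.strings.items() if pattern in data]

    def matches(self, data: bytes) -> bool:
        return len(self.matching_strings(data)) >= self.min_matches


# Hypothetical rule: flag files that reference all three injection-related APIs.
SUSPECTED_INJECTION = Rule(
    name="suspected_process_injection",
    strings={
        "$alloc": b"VirtualAllocEx",
        "$write": b"WriteProcessMemory",
        "$thread": b"CreateRemoteThread",
    },
    min_matches=3,
)
```

In Yara itself, the same rule would list the three names under `strings:` and use `condition: all of them`.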
6. Report generation
Once you have completed your analysis, you should generate a report that summarizes your findings and includes relevant supporting information such as file hashes, metadata, disassembly excerpts, and strings. You should also include the indicators of compromise you identified and any recommendations for mitigating or responding to the threat. I suggest using this template, published on the SANS Institute blog.
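If you also want a machine-readable summary to attach to the written report or feed into other tooling, a small JSON document works well. The sketch below assembles one with the standard library; the field names are my own assumption and are not taken from the SANS template.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_report(name: str, data: bytes, file_type: str,
                 iocs: dict[str, list[str]], recommendations: list[str]) -> str:
    """Serialize the key findings for one sample as pretty-printed JSON."""
    report = {
        "sample": {
            "name": name,
            "size": len(data),
            "md5": hashlib.md5(data).hexdigest(),
            "sha256": hashlib.sha256(data).hexdigest(),
            "file_type": file_type,
        },
        "indicators_of_compromise": iocs,
        "recommendations": recommendations,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(report, indent=2, sort_keys=True)
```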
Tools used
- file (Linux): a command-line utility included in most Linux and Unix systems that identifies the type of a file from its contents (known signatures, or “magic numbers”) rather than its extension. It recognizes text, images, audio, and video files, as well as executables and scripts.
- strings (Linux): a Unix utility that extracts and displays the printable strings in a binary file. By default, it prints every sequence of at least four printable characters that is followed by an unprintable character; the -n option changes the minimum length, and -e selects other encodings, such as -e l for 16-bit little-endian text. The resulting strings are written to standard output.
- TrID (Windows): a command-line utility that identifies the type of a file based on its structure and contents. It matches files against a database of file-type definitions and recognizes a wide range of types, including executables, scripts, documents, images, and audio files.
- VirusTotal (web based): a website and service that analyzes files and URLs for malware and other threats. Users can upload a file, search for it by hash, or submit a URL, and have it scanned by many antivirus engines and other tools. The service aggregates the results into a report showing which engines flagged the sample and the detection name each one assigned.
- Exiftool (Windows and Linux): a utility that extracts metadata from files, such as EXIF data from images, MP3 tags from audio files, and PDF metadata from documents. It can show the date and time a file was created, the software used to create it, and the camera or device used to capture it.
- objdump (Linux): a command-line utility from GNU binutils, available on most Linux and Unix systems, that disassembles object files such as executables and shared libraries. It shows the assembly code of a file, including the instructions, registers, and memory addresses it uses.
- Ghidra (Windows, macOS, Linux): a free and open-source reverse engineering framework developed by the National Security Agency (NSA), with a built-in disassembler and decompiler.
- radare2 (Windows, Unix): a reverse engineering framework that includes a set of tools for disassembling and analyzing binary files.
- Binary Ninja (Windows, macOS, Linux): a reverse engineering platform with features such as disassembly, decompilation, and automated analysis.
- Hex-Rays Decompiler (Windows, macOS, Linux): a commercial plugin for IDA Pro, a popular disassembler and debugger. It decompiles the machine code of an executable into higher-level, C-like pseudocode.
- Yara (Windows, Linux): a tool for writing descriptions of malware families (or other malicious files) based on textual or binary patterns, and for finding the files that match those descriptions.
- PeStudio (Windows): a tool that analyzes Windows Portable Executable (PE) files, providing detailed information about the file’s structure, resources, and imported functions.