It’s important to have the right tools to analyze suspect documents!

The classic malicious document attached to an email remains the main vehicle for malware infection, so having the right tools to analyze suspect documents matters.

Here is a list of my favorite tools for analyzing Microsoft Office and PDF files.

Microsoft Office


Locates shellcode and VBA macros in MS Office files. It can also extract the shellcode and embed it in an EXE file for further analysis.
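To give an idea of what locating VBA macros involves, here is a minimal Python sketch. It is not how any tool in this list works internally; it is a coarse heuristic that assumes a legacy binary Office file starts with the OLE2 signature and names its macro stream `_VBA_PROJECT`, and that an Open XML file is a ZIP archive carrying a `vbaProject.bin` part. The function name `macro_indicators` is made up for this example.

```python
import io
import zipfile

# Signature found at the start of legacy binary Office (OLE2) files.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def macro_indicators(data: bytes) -> list[str]:
    """Return coarse hints that a document may carry VBA macros.

    Heuristic only: a hit is a reason to look closer, not proof of malice.
    """
    hints = []
    if data.startswith(OLE_MAGIC):
        hints.append("ole2-container")
        # OLE2 directory entry names are stored as UTF-16LE text.
        if "_VBA_PROJECT".encode("utf-16-le") in data:
            hints.append("vba-project-stream-name")
    elif zipfile.is_zipfile(io.BytesIO(data)):
        hints.append("zip-container")
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
        # Macro-enabled Open XML files keep their VBA project in this part.
        if any(name.lower().endswith("vbaproject.bin") for name in names):
            hints.append("vbaProject.bin")
    return hints
```

Calling `macro_indicators(open(path, "rb").read())` on a sample returns an empty list when none of these markers are present. A dedicated tool is still needed to actually extract and decompress the macro source.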


Microsoft OffVis

Shows raw contents and structure of an MS Office file, and identifies some common exploits.



Allows navigation through the structure of binary Office files and viewing of stream contents.


Office Binary Translator

Converts DOC, PPT, and XLS files into Open XML files.



Can examine and decode some aspects of malicious binary Office files.


PDF

Identifies PDFs that contain strings associated with scripts and actions.
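The idea of flagging PDFs by the strings they contain can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation of any tool listed here: the keyword list is a small sample chosen for the example, the function name `count_suspicious_names` is hypothetical, and the scan only sees names that appear in the clear (names hidden inside compressed object streams or written with `#xx` hex escapes are missed).

```python
import re

# PDF names commonly associated with scripts and automatic actions.
# This list is an assumption made for the example, not an exhaustive set.
SUSPICIOUS_NAMES = [
    b"/JS",
    b"/JavaScript",
    b"/OpenAction",
    b"/AA",
    b"/Launch",
    b"/EmbeddedFile",
]


def count_suspicious_names(pdf_bytes: bytes) -> dict[str, int]:
    """Count occurrences of each suspicious name in the raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Require that the keyword is not followed by another letter or digit,
        # so that /AA does not also match a longer name such as /AAPL.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, pdf_bytes))
    return counts
```

A file with non-zero counts for `/JS`, `/JavaScript`, or `/OpenAction` is a good candidate for a closer look with the structure-level tools below.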



Examines the structure of PDF files.



Origami is a framework written in Ruby designed to parse, analyze, and forge PDF documents.

It ships with several tools: pdfwalker examines the structure of PDF files, pdfextract extracts JavaScript from PDF files, and pdfsh offers an interactive command-line shell for examining PDF files.



Allows the extraction of JavaScript from PDF files.


PDF Stream Dumper

Combines many PDF analysis tools under a single graphical user interface.

It has specialized tools for dealing with obfuscated JavaScript, low-level PDF headers and objects, and shellcode. For shellcode analysis, it has an integrated interface for libemu sctest, an updated build of iDefense sclog, and a shellcode_2_exe feature.



Offers a shell for examining PDF files.

With peepdf you can see all the objects in a document, with the suspicious elements pointed out. It supports the most commonly used filters and encodings, and it can parse different versions of a file, object streams, and encrypted files.



Creates an HTML report containing the decoded structure and contents of a PDF file.


SWF mastah

Extracts SWF objects from PDF files.

swf_mastah is a simple command-line tool, built on functions from peepdf, that extracts SWF files from a PDF. According to its author, it handles ObjStms (object streams) and encryption, accounts for multiple object versions, and decoded all the samples its author had.
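To show the basic idea behind pulling a SWF out of a PDF, here is a minimal Python sketch. It is not swf_mastah and does far less: it assumes the SWF sits in a top-level stream that is either stored raw or compressed with a single Flate (zlib) filter, and it does not handle object streams, encryption, filter chains, or multiple object versions. The function name `find_swf_streams` is made up for this example.

```python
import re
import zlib

# SWF files start with one of these signatures:
# FWS = uncompressed, CWS = zlib-compressed, ZWS = LZMA-compressed.
SWF_SIGNATURES = (b"FWS", b"CWS", b"ZWS")

# Naive stream matcher: everything between "stream" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def find_swf_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the decoded contents of streams that look like SWF files."""
    found = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that follow
            # the compressed data before "endstream".
            decoded = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate-compressed: inspect the raw bytes instead.
            decoded = raw
        if decoded[:3] in SWF_SIGNATURES:
            found.append(decoded)
    return found
```

Each returned item can be written to disk and handed to a Flash decompiler. For real samples, where the object is often hidden inside an object stream or an encrypted file, a full parser such as peepdf is the better starting point.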



Malware analysis tool that includes commands for examining and decoding structure and content of PDF files.

Pyew is a command-line Python tool for analyzing malware. It supports hexadecimal viewing, disassembly (Intel 16, 32, and 64 bits), and the PE and ELF file formats, for which it performs code analysis and lets you write scripts against an API to perform many types of analysis. In the interactive command line it follows direct call/jmp instructions and displays function names and string data references. It also supports the OLE2 and PDF formats, among others, and plugins can add more features to the tool.