Basic PDF Analysis - Formbook Malware
Formbook malware has been circulating on the dark web for several years. Advertised as an "as-a-service" commodity, it is an information stealer. In this blog post, I've grabbed a PDF sample from MalwareBazaar that attempts to call out to Formbook C2. This will serve as a good primer on the basics of PDF analysis.
If you need insight on how to set up an environment where you can replicate some of the analysis I demonstrate on my blog, feel free to check out my "Reversing Malware Fundamentals" category.
PDFs, or Portable Document Format files, can contain several kinds of media, such as images and text. The basic structure of a PDF file consists of the following:
Header - PDF version number
Body - File content
Xref Table - Cross-reference table listing the byte offset of each indirect object in the file, so a reader can locate objects without parsing the whole file
Trailer - Specifies how to find the cross reference table and other objects
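To make the four parts listed above a little more concrete, here is a minimal Python sketch that looks for their markers in raw bytes. The function name locate_pdf_sections and the hand-built sample are my own, hypothetical illustrations and not part of any tool used in this post. It also assumes a classic xref table; files that use cross-reference streams have no "trailer" keyword.

```python
import re

def locate_pdf_sections(data: bytes) -> dict:
    """Report the PDF version plus byte offsets of the xref table and trailer."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    # A file with incremental updates can hold several startxref entries;
    # the last one is the one a reader starts from.
    startxrefs = re.findall(rb"startxref\s+(\d+)", data)
    trailer_at = data.rfind(b"trailer")
    return {
        "version": header.group(1).decode() if header else None,
        "xref_offset": int(startxrefs[-1]) if startxrefs else None,
        "trailer_offset": trailer_at if trailer_at != -1 else None,
    }

# Hypothetical one-object PDF, built by hand for illustration only.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
sample = (
    body
    + b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    + b"startxref\n" + str(len(body)).encode() + b"\n%%EOF\n"
)

info = locate_pdf_sections(sample)
print(info)
```

The number after startxref is a byte offset, so slicing the file at that offset should land on the "xref" keyword itself.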
The image below is from the PDF Reference written by Adobe:
One of the tools we'll use in our analysis refers to streams. In the context of a PDF, content streams are objects containing a "series of instructions" on how to depict the graphical elements that are to be displayed on the page. From a reverse engineering standpoint, following these streams can provide some insight without having to execute the file itself.
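As a rough illustration of reading a stream without opening the document, the sketch below pulls stream bodies out of raw bytes and inflates them with zlib. It assumes the streams use the /FlateDecode filter, which is common but not the only option, and the helper name inflate_streams and the sample object are hypothetical. A real parser such as pdf-parser handles the many cases this ignores.

```python
import re
import zlib

# Bytes between the "stream" and "endstream" keywords. The lookbehind stops
# the tail of "endstream" from being mistaken for a new "stream" keyword.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(data: bytes) -> list:
    """Return every stream body, inflated when it is valid zlib data."""
    results = []
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that trail the data
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            results.append(raw)  # not Flate data; keep the raw bytes
    return results

# Hypothetical content stream: drawing instructions that place some text.
content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
sample = (
    b"4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(content)
    + b"\nendstream\nendobj\n"
)

decoded = inflate_streams(sample)
print(decoded)
```

The inflated bytes are the "series of instructions" themselves: operators such as BT, Tf, and Tj that tell a viewer where and how to draw text.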
INITIAL ANALYSIS WITH EXIFTOOL
Analysis with exiftool shows the following:
ANALYSIS WITH PDF-PARSER.PY
Developed by the phenomenal Didier Stevens, pdf-parser offers a quick and straightforward analysis of the elements comprising the PDF file. Run with the -a parameter, it displays statistics about those elements:
For the objects, it shows the object type, the number of instances of that type, and the ID references for each instance of that object type within the file. So for example, there is one object of /XObject type, and it is ID number 27.
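The shape of that summary (type, count, list of IDs) can be approximated with a few lines of Python. This is only a sketch of the idea, not how pdf-parser is implemented; the name tally_object_types and the sample objects are hypothetical, and the regular expressions will miss objects hidden inside compressed object streams.

```python
import re
from collections import defaultdict

# One indirect object: "<id> <generation> obj ... endobj"
OBJ_RE = re.compile(rb"(?<!\d)(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)
TYPE_RE = re.compile(rb"/Type\s*/(\w+)")

def tally_object_types(data: bytes) -> dict:
    """Map each /Type name to the IDs of the objects that declare it."""
    tally = defaultdict(list)
    for obj_id, body in OBJ_RE.findall(data):
        found = TYPE_RE.search(body)
        name = "/" + found.group(1).decode() if found else "(no /Type)"
        tally[name].append(int(obj_id))
    return dict(tally)

# Hypothetical objects, loosely mirroring the /XObject example above.
sample = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Length 0 >>\nendobj\n"
    b"27 0 obj\n<< /Type /XObject /Subtype /Image >>\nendobj\n"
)

stats = tally_object_types(sample)
for name, ids in stats.items():
    print(name, len(ids), ids)
```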
If we wanted the tool to look into object streams, we would have to pass it the -O parameter:
Suddenly we see a lot more content worth analyzing. The URI type stands out in particular, with 5 instances reported. To see what these look like, we can search for the /URI key by passing the -k parameter to pdf-parser:
Based on this output, we can assess that the links in the file serve to reach out to Formbook C2 and download additional "badness" in the form of .rar, .exe, .rev, and .cab files. It's worth noting that threat actors are leveraging Discord as a malware-serving platform, in this instance to host a second-stage payload.
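To show why looking inside streams can surface links that a plain scan misses, here is a hedged Python sketch that searches for /URI entries both in the raw bytes and in any Flate-compressed streams. The function find_uris and the example.com links are hypothetical stand-ins, not values from the sample, and the pattern only handles simple literal strings without escaped parentheses.

```python
import re
import zlib

STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
URI_RE = re.compile(rb"/URI\s*\((.*?)\)")

def find_uris(data: bytes) -> list:
    """Collect /URI strings from the raw file and from inflated streams."""
    chunks = [data]
    for match in STREAM_RE.finditer(data):
        try:
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate data; nothing extra to search
    uris = []
    for chunk in chunks:
        for found in URI_RE.finditer(chunk):
            uri = found.group(1).decode("latin-1")
            if uri not in uris:
                uris.append(uri)
    return uris

# Hypothetical sample: one link in a plain object, one in a compressed stream.
hidden = b"<< /S /URI /URI (https://example.com/hidden) >>"
sample = (
    b"9 0 obj\n<< /S /URI /URI (https://example.com/plain) >>\nendobj\n"
    b"10 0 obj\n<< /Type /ObjStm /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(hidden)
    + b"\nendstream\nendobj\n"
)

links = find_uris(sample)
print(links)
```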
SAY CHEESE: IMAGE DUMP
Sometimes people find satisfaction in things that others may tilt their heads at. For me, that's being able to dump images from files without ever having to open or run them. Seeing the images allows you to see what the threat actor was possibly trying to use from a social engineering standpoint. Maybe they used a fake CAPTCHA image to convince the user they needed to click on it for human user validation purposes. Maybe it's a fake radio button. It offers a sneak peek into the mind of the adversary, which in my book, is always a blast.
pdf-parser allows you to dump images from PDF files. Remember that /XObject type instance I referenced earlier? Let's take a closer look at it:
The subtype is listed as an image. We can dump this object out to a file and then take a look at it using the feh image viewer:
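Conceptually, dumping an image object means copying its stream bytes to a file. The Python sketch below assumes the image uses the /DCTDecode filter, in which case the stream is already a JPEG; I have not confirmed that for this sample, and the names extract_stream and looks_like_jpeg, the fake image bytes, and the output path are all hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

def extract_stream(data: bytes, obj_id: int) -> Optional[bytes]:
    """Return the raw stream bytes of the object with the given ID, if any."""
    obj = re.search(rb"(?<!\d)%d\s+\d+\s+obj\b(.*?)endobj" % obj_id, data, re.DOTALL)
    if obj is None:
        return None
    stream = re.search(rb"stream\r?\n(.*?)endstream", obj.group(1), re.DOTALL)
    if stream is None:
        return None
    raw = stream.group(1)
    # Drop the single end-of-line marker that precedes "endstream".
    if raw.endswith(b"\r\n"):
        return raw[:-2]
    if raw.endswith(b"\n"):
        return raw[:-1]
    return raw

def looks_like_jpeg(raw: bytes) -> bool:
    """JPEG data begins with the marker bytes FF D8 FF."""
    return raw.startswith(b"\xff\xd8\xff")

# Hypothetical stand-in for object 27; the payload is not a real picture.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"not a real image" + b"\xff\xd9"
sample = (
    b"27 0 obj\n<< /Type /XObject /Subtype /Image /Filter /DCTDecode >>\nstream\n"
    + fake_jpeg
    + b"\nendstream\nendobj\n"
)

image = extract_stream(sample, 27)
if image is not None and looks_like_jpeg(image):
    out_path = Path(tempfile.gettempdir()) / "object27.jpg"
    out_path.write_bytes(image)  # the file can then be opened in an image viewer
```

Images stored with other filters, such as /FlateDecode, hold raw pixel data rather than a ready-made image file, so they need extra decoding before a viewer can display them.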
No, your eyes aren't struggling and your screen isn't smudged. The image is purposely hazy, which suggests the threat actor tried to leverage a paywall or other limiting "wall" of sorts to encourage the user to click and follow a link in order to clear up the image. A dynamic detonation of the file corroborates this finding:
INDICATORS OF COMPROMISE
Hash of initial file:
TOOLS AND DOCUMENTATION
PDF Reference, Third Edition (Adobe)
PDF File Format: Basic Structure (InfoSec Institute)