PDF Server-Side Request Forgery | Cyber Advisors Technical Blog

Written by Justin Benjamin | Oct 15, 2024 11:00:00 AM

Server Compromise Through Report Generation: A Case Study

Our penetration testers encounter the full gamut of security vulnerabilities during assessments, from the mundane to the exotic. During a recent web application penetration test, I identified an unusual vulnerability that I want to share in the hope that it helps others identify similar issues in their own applications or security assessments.

To preserve client confidentiality and omit unnecessary detail, I’ve recreated the vulnerability on a deliberately vulnerable web server that reproduces the original scenario as faithfully as possible.

The Application

The assessment focused on a web application which produced status reports for the company’s clients, the purpose of which is omitted here. Users could update their contact information in the system and provide a status for staff at the company to review at a later time.

Here is a simple reproduction of the user website:

Users enter their profile information and a status, then submit the changes, which are shown on a display page:

This page also offers a PDF version of the report:

As part of the enumeration process, we look at the metadata of documents returned by the application. In this case, the PDF metadata revealed that the file was generated by an application called “wkhtmltopdf”, version 0.12.3:
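The metadata check can be approximated without any PDF tooling. The following is a minimal sketch, assuming the document information dictionary is stored uncompressed and uses literal (parenthesized) strings. The function name `pdf_generator_info` and the sample bytes are hypothetical and not taken from the assessment.

```python
import re

# Matches "/Creator (...)" or "/Producer (...)" literal strings, allowing
# backslash-escaped characters inside the parentheses.
INFO_KEY = re.compile(rb"/(Creator|Producer)\s*\(((?:\\.|[^\\)])*)\)", re.DOTALL)


def pdf_generator_info(data: bytes) -> dict:
    """Return the Creator/Producer strings found in raw PDF bytes.

    ASSUMPTION: the info dictionary is uncompressed and uses literal strings;
    hex strings and object streams are not handled by this sketch.
    """
    info = {}
    for key, raw in INFO_KEY.findall(data):
        # Undo the escapes for parentheses and backslashes.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        if raw.startswith(b"\xfe\xff"):
            # Some generators write these strings as UTF-16BE with a BOM.
            text = raw[2:].decode("utf-16-be", errors="replace")
        else:
            text = raw.decode("latin-1")
        info[key.decode("ascii")] = text
    return info


# Made-up sample bytes for illustration only.
sample = b"<< /Creator (wkhtmltopdf 0.12.3) /Producer (Qt 4.8.7) >>"
print(pdf_generator_info(sample))
```

In practice a dedicated metadata tool is more reliable, but a quick byte search like this is often enough to spot a generator name and version during enumeration.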

This was helpful in two ways: it showed that the PDFs are generated from HTML input, and it gave us the name and version of the library or program creating them. A check of the software’s website revealed that the PDF generator is outdated and that processing unsanitized HTML can result in server takeover:

Interestingly, JavaScript is enabled by default when processing HTML pages as input:

This scenario promised to be rewarding for our assessment, so I began to examine avenues for exploitation.

Exploitation

I began probing the application by attempting to inject HTML into the various fields for the user status update. Unfortunately, any attempts to inject tags would result in the entire field being removed from the results page and the report:

However, after some trial and error, it became apparent that including only ending tags (i.e. </b> in this case) would result in the content of the field being retained, but with the end tag mysteriously removed:

This indicated to me that the more aggressive filtering was applied to the opening HTML tag (i.e. <b>). What remained was to determine what triggered the filter and how to bypass it. Further trial and error revealed that an opening bracket (<) immediately followed by a letter caused the entire field content to be removed. Any other character following the opening bracket was retained:

Further probing revealed that HTML-encoding the first character after the opening bracket allowed the HTML tag to be passed through to the application and the resulting PDF:
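The observed behavior can be modeled with a short sketch. It assumes a filter that blanks the field whenever an opening bracket is immediately followed by a letter, and it assumes (this is a guess, not something confirmed during the assessment) that the application decodes HTML entities at some point after the filter runs. The names `naive_filter` and `later_decoding_stage` are hypothetical.

```python
import html
import re

# Model of the observed filter: "<" immediately followed by a letter.
TAG_FILTER = re.compile(r"<[a-z]", re.IGNORECASE)


def naive_filter(value: str) -> str:
    """Blank the whole field if the pattern matches, else pass it through."""
    return "" if TAG_FILTER.search(value) else value


def later_decoding_stage(value: str) -> str:
    """ASSUMPTION: entities are decoded somewhere after filtering."""
    return html.unescape(value)


plain = "<b>bold</b>"
encoded = "<&#98;>bold</b>"  # "&#98;" is the HTML entity for "b"

print(repr(naive_filter(plain)))    # field is blanked
print(repr(naive_filter(encoded)))  # field passes the filter unchanged
print(later_decoding_stage(naive_filter(encoded)))  # tag is reconstituted
```

The filter only ever sees `<&`, which is not a bracket followed by a letter, so the check passes and the tag reappears once the entity is decoded.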

So now we’ve successfully bypassed the filtering, but what can we do with this? I was able to inject an iframe tag pointing at the localhost address, which revealed the default Apache server page:

Unfortunately, attempting to load local files in the iframe failed, leaving a blank frame in the output:

However, wkhtmltopdf’s default settings gave us another option. Recall that JavaScript interpretation is turned on by default, as documented on the main website. This gives us the ability to use JavaScript’s XMLHttpRequest functionality to request files from the local host.

I switched to using the larger status field for better visibility and used XMLHttpRequest to request /etc/passwd, which resulted in the file being embedded in the resulting PDF document:

At this point, we had already proven that we were able to view files local to the server from a remote attacker’s point of view. However, the requested files were being embedded in a PDF, which isn’t particularly elegant. To show that there was more real-world potential for this exploit, I decided to create a payload to exfiltrate the files to a third-party host, in this case, a Burp Collaborator server. Note that, during the assessment, we used a self-contained Collaborator server, but for the purposes of this post, the public Burp Collaborator server was used.

As before, the XMLHttpRequest payload requested the local file, but this time it stored the response in a text field and then made a second request to the Collaborator server to exfiltrate the data:

The resulting request was successfully received by the Collaborator server with the relevant data URL-encoded as GET request parameters:

From this point, an attacker could trivially decode the file contents into a text form:
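Decoding is a one-liner in most languages. Here is a minimal sketch, assuming the data arrived in a single query parameter. The parameter name `d` and the sample request line are made up for illustration and are not the values captured during the assessment.

```python
from urllib.parse import parse_qs, urlsplit


def decode_exfil_request(request_target: str, param: str = "d") -> str:
    """Return the URL-decoded value of `param` from a request target.

    ASSUMPTION: the payload placed the file contents in one query parameter
    named `d`; the real parameter name is not given in this post.
    """
    query = urlsplit(request_target).query
    # parse_qs percent-decodes values (and treats "+" as a space).
    values = parse_qs(query).get(param, [])
    return "".join(values)


# Made-up example of what a collaborator log entry might contain.
logged = "/?d=root%3Ax%3A0%3A0%3Aroot%3A%2Froot%3A%2Fbin%2Fbash%0A"
print(decode_exfil_request(logged), end="")
```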

PDF Server-Side Request Forgery Wrap-Up

While there are likely more efficient, effective, or stealthy methods of exfiltration in this scenario, the approach detailed in this blog was successful in demonstrating the impact of this vulnerability. A low-privileged user of the application was able to compromise the server using a set of fairly simple steps:

  1. Identify the version of the PDF generator being used by the site through PDF metadata

  2. Craft an input filter bypass through persistent trial and error

  3. Inject JavaScript XMLHttpRequest calls into text fields, which were then interpreted by the vulnerable PDF generation library

There are two important lessons learned from this discovery. The first is that filtering of user input should be performed using thoroughly tested libraries. Much like the commonly used phrase “don’t ever roll your own crypto”, one should never “roll their own” input sanitization as there are too many edge cases to be covered. In this situation, the website was using a regex filter which set the content of the field to an empty string if an alphabetical character followed an open bracket (<). The regex likely was something akin to this (for PHP):

$tagfilter = "/\<[a-z]/i";

As we saw in the above example, this is trivial to bypass. There are a number of sanitization libraries available for websites which will more robustly detect and remove malicious inputs.
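Where a field never needs to contain markup, a simpler alternative to sanitization is to encode the value on output so that it can only ever render as text. The sketch below uses the standard library's `html.escape` for this. It illustrates output encoding rather than a full sanitization library, and the helper name `render_field` is hypothetical.

```python
import html


def render_field(value: str) -> str:
    """Encode a user-supplied value for insertion into HTML text content.

    This should be the LAST transformation before the value is placed in the
    page; decoding entities afterwards would undo the protection.
    """
    return html.escape(value, quote=True)


for candidate in ("<b>bold</b>", "<&#98;>bold</b>", "</b>"):
    # Every "<", ">" and "&" is encoded, so none of these can form a tag.
    print(render_field(candidate))
```

Because the ampersand itself is encoded, the entity-based bypass is neutralized along with the plain tag. Fields that must allow some markup need an allow-list sanitizer instead.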

The second lesson is to always keep essential libraries up to date. In this case, the PDF generator was both outdated and had known critical vulnerabilities. It also processed JavaScript code in HTML inputs by default. As a result, once the HTML filtering was bypassed, the attacker was able to include malicious JavaScript and exfiltrate sensitive files. To remediate this, the site owners should have:

  1. Kept the PDF generation library up to date (or in this case, as there are no recent updates, replaced the library completely). Alternatively, if a vulnerable library is absolutely essential, it may be possible to patch it manually.

  2. Turned off JavaScript interpretation by the library unless essential for business use

  3. Blocked outbound HTTP requests to unknown or untrusted URLs, which would have prevented the exfiltration, although file contents could still have been embedded in the PDF output
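As an illustration of the second item, the sketch below shows one way a wrapper might invoke wkhtmltopdf with JavaScript and local file access turned off. The `--disable-javascript` and `--disable-local-file-access` options are taken from the wkhtmltopdf 0.12.x command-line documentation, so verify them against the installed version's `--help` output. The function names and the timeout value are hypothetical.

```python
import subprocess
from typing import List


def build_wkhtmltopdf_command(html_path: str, pdf_path: str,
                              binary: str = "wkhtmltopdf") -> List[str]:
    """Build an argument list with the risky features switched off."""
    return [
        binary,
        "--disable-javascript",         # do not execute scripts in the input
        "--disable-local-file-access",  # do not let the page read local files
        html_path,
        pdf_path,
    ]


def render_report(html_path: str, pdf_path: str) -> None:
    """Run the converter; raises CalledProcessError on a non-zero exit."""
    # Passing a list (not a shell string) avoids shell injection via paths.
    subprocess.run(build_wkhtmltopdf_command(html_path, pdf_path),
                   check=True, timeout=60)


print(" ".join(build_wkhtmltopdf_command("report.html", "report.pdf")))
```

These flags reduce the impact of an injection but do not replace proper output encoding or egress filtering.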

Hopefully, this story was interesting and provides other penetration testers with ideas or approaches for exploitation that they may not have otherwise considered. I also hope it provides application owners and developers with an understanding of what to do to help remediate these issues, as well as what the impact of these vulnerabilities might be.

Thanks!
