PDF.js, as the name suggests, is an open-source JavaScript library. It is primarily used for parsing and rendering PDFs in web browsers and desktop applications.
I have found this vulnerability in a number of applications over the past year. Although these applications serve completely different purposes, all of them interact with PDFs. The vulnerability exists in versions prior to 4.2.67. It has a high CVSS score of 8.8, which in simple terms means that it can have a severe impact and is comparatively easy to exploit.
Why is a 2024 web application vulnerability still relevant in 2026?
The number of web applications on the public internet was already enormous. Then came the big AI players, and vibe-coding has become so mainstream that everyone from junior to senior developers is riding the wave, writing prompts rather than code.
Not all web applications interact with PDFs, but the subset isn’t small. The library’s npm page shows over 12 million weekly downloads, and it is reasonable to guess that a sizeable number of applications are still running an exploitable version.
The major reason you might find this vulnerability in a lot more applications is that owners of web applications already using the library are not aware that their version is vulnerable, because no one has tested it on their end.
The rise of AI has also reduced developers' familiarity with their own codebases. Quite a lot of developers are unaware of the libraries their application uses.
Additionally, tools like GitHub Copilot were initially trained on a large dataset of publicly available source code (including public repositories on GitHub). You never know whether the code you generate with AI is a safe option.
Another interesting point is that there are a number of other libraries for certain frameworks that use the PDF.js implementation internally, which can lead to an indirect vulnerability.
Coming to the most fun part, let’s talk about where the vulnerability lies.
In simple terms, before any PDF is rendered (displayed), a font definition is used internally to construct the font for whatever text is present in the PDF file. The problem lies in an edge case where this font definition is not validated for structure: whatever font definition is supplied is used directly, which allows JavaScript execution.
Going deeper, there is a matrix called FontMatrix that defines things like the size, position and rotation of characters. This matrix is a 6-element array. The assumption is that FontMatrix will always contain numbers, but since there is no validation, non-numeric values are accepted as well.
Taking an example here, if you open a PDF in a text editor, you may find FontMatrix looking something like (the values may differ):
/FontMatrix [.00048828125 0 0 -.00048828125 0 0]
However, the vulnerable version of the library allows something like the following, where test_string is a string rather than a number:
/FontMatrix [1 2 3 4 5 (test_string)]
The problem is that the implementation embeds these values directly into dynamically generated JavaScript code.
The JavaScript is created and then executed in two steps:
getPathGenerator() -> This function builds a JS string that contains the values passed in FontMatrix as they are. In simpler terms, it creates a string containing a call to the transform() function, which is executed in the next step.
The generated string is then executed, which calls the transform() function with the provided parameters.
So, when the above FontMatrix is processed, it is passed to the transform function as:
c.transform(1,2,3,4,5,test_string);
Now, if we know JS, we can manipulate this further. A simple example is to use something else as the last element of FontMatrix:
(0\); alert(1); //)
(0 -> the opening ( and the final ) delimit the PDF string. 0 fills the place of the 6th element in the array, which makes sure that we don’t break the actual syntax.
\) -> \ prevents ) from being treated as the string terminator, so ) is kept as data and ends up closing the transform( call in the generated code. This ensures that all the JS code we inject after it is preserved rather than truncated, and avoids syntax errors.
; -> ends the current statement, which is the call with 6 parameters (the transform function here).
alert(1); -> becomes a separate statement and executes independently. The ; here makes sure that our injected code runs fine even if more code follows.
// -> to comment out anything after the current line.
The above code becomes:
c.transform(1,2,3,4,5,0); alert(1); // anything_here_is_commented_out
This works because attacker-controlled input is embedded directly into executable JavaScript without validation or proper handling.
If you create a PDF with such a payload and upload it to a vulnerable application, the JavaScript executes when the PDF is opened/rendered.
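To make the chain concrete, here is a minimal sketch of the idea, not the actual PDF.js source. It assumes a much-simplified PDF string decoder and a hypothetical recorder object standing in for the canvas context `c`. All function names are made up for illustration, and the payload sets a harmless flag instead of calling alert so that it also runs outside a browser.

```javascript
// Simplified model of the flaw described above; NOT the real PDF.js code.

// Decode a PDF literal string such as "(0\); alert(1); //)".
// Only the rules relevant here are modelled: a backslash keeps the next
// character as data, and unescaped parentheses must be balanced.
function decodePdfLiteralString(raw) {
  if (raw[0] !== "(") throw new Error("not a literal string");
  let depth = 0;
  let out = "";
  for (let i = 1; i < raw.length; i++) {
    const ch = raw[i];
    if (ch === "\\") {
      out += raw[++i]; // escaped character is kept as plain data
    } else if (ch === "(") {
      depth++;
      out += ch;
    } else if (ch === ")") {
      if (depth === 0) return out; // unescaped ")" at depth 0 ends the string
      depth--;
      out += ch;
    } else {
      out += ch;
    }
  }
  throw new Error("unterminated literal string");
}

// Step 1: paste the FontMatrix values into source text with no type check.
function buildGlyphSource(fontMatrix) {
  return "c.transform(" + fontMatrix.join(",") + ");\n";
}

// Hypothetical stand-in for the canvas context; it only records calls.
function makeRecorder() {
  const calls = [];
  return { calls, transform: (...args) => calls.push(args) };
}

const benign = buildGlyphSource([1, 2, 3, 4, 5, 6]);
const sixth = decodePdfLiteralString("(0\\); c.injected = true; //)");
const injected = buildGlyphSource([1, 2, 3, 4, 5, sixth]);

// Step 2: execute the generated text as code.
const ctx = makeRecorder();
new Function("c", injected)(ctx);

console.log(benign.trim());   // c.transform(1,2,3,4,5,6);
console.log(injected.trim()); // c.transform(1,2,3,4,5,0); c.injected = true; //);
console.log(ctx.injected);    // true
```

The trailing `);` that the builder appends lands behind the `//` comment, which is why the injected statement runs without a syntax error.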
Here our target was a college website open for application forms.
We uploaded a dummy PDF file with the following FontMatrix:
/FontMatrix [ 1 2 3 4 5 (1\); alert\('origin: '+window.origin+', pdf url: '+(window.PDFViewerApplication?window.PDFViewerApplication.url:document.URL)\)) ]
The above payload retrieves both the origin and the URL of the loaded PDF document.
When we open the uploaded PDF file, the payload executes and an alert shows the origin and the PDF URL.
Upgrade to a patched version: Upgrading to version 4.2.67 or later will resolve the issue.
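As a quick triage aid, a version string can be compared against the fixed release. The sketch below is only an illustration: `isAffected` and `parseVersion` are hypothetical helpers, and it assumes you already have the installed version as a plain dotted string (for example, read from the library's package.json in your dependency tree).

```javascript
// First release that contains the fix, per the article: 4.2.67.
const FIXED = [4, 2, 67];

function parseVersion(version) {
  // Drops suffixes such as "-beta.1"; a simplification for this sketch.
  return version
    .split("-")[0]
    .split(".")
    .map((part) => Number.parseInt(part, 10));
}

// Returns true when the version is older than the fixed release.
function isAffected(version) {
  const parts = parseVersion(version);
  for (let i = 0; i < FIXED.length; i++) {
    const have = parts[i] ?? 0; // missing components count as 0
    if (Number.isNaN(have)) throw new Error("unparseable version: " + version);
    if (have !== FIXED[i]) return have < FIXED[i];
  }
  return false; // exactly the fixed release
}

console.log(isAffected("3.11.174")); // true
console.log(isAffected("4.2.67"));   // false
console.log(isAffected("4.10.38"));  // false
```

Components are compared as numbers rather than as strings, so 4.10.x is correctly treated as newer than 4.2.67. Remember to check transitive dependencies too, since wrapper libraries may bundle their own copy.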
In case upgrading the library introduces breaking changes:
Setting isEvalSupported to false helps prevent execution of dynamically generated JavaScript.
FontMatrix: Ensure FontMatrix values are strictly validated as numeric types before use.
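The two mitigations above could look roughly like the following. This is a sketch under assumptions rather than the library's own fix: `sanitizeFontMatrix` is a hypothetical helper, the fallback matrix is chosen for illustration, and the file URL is a placeholder.

```javascript
// Fallback used when the supplied matrix is rejected (an assumption for
// this sketch; pick whatever default suits your renderer).
const DEFAULT_FONT_MATRIX = [0.001, 0, 0, 0.001, 0, 0];

// Accept a FontMatrix only if it is exactly six finite numbers.
function sanitizeFontMatrix(candidate) {
  const ok =
    Array.isArray(candidate) &&
    candidate.length === 6 &&
    candidate.every((v) => typeof v === "number" && Number.isFinite(v));
  return ok ? candidate : DEFAULT_FONT_MATRIX.slice();
}

// Loading options with eval-based code generation switched off.
// The url is a placeholder path, not a real file.
const loadingOptions = { url: "/files/sample.pdf", isEvalSupported: false };

console.log(sanitizeFontMatrix([1, 2, 3, 4, 5, 6]));
console.log(sanitizeFontMatrix([1, 2, 3, 4, 5, "0); alert(1); //"]));
```

In PDF.js, an options object like `loadingOptions` is what you would pass to the document loading call (`getDocument`). Wrapper libraries may expose the setting under a different name, so check their documentation.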