How Browser-Based PDF Merging Works
A deep dive into PDF structure and binary array buffers, and how they let us merge multiple files locally in browser memory.
Traditional online converters require you to upload your PDF files to a remote server to combine them. This poses substantial privacy risks when the files are contracts, tax records, or bank statements.
By moving the merge step client-side, we remove the server dependency, and the files never have to leave the user's device. In this article, we'll dive into PDF file structure, binary array buffers, and client-side merging.
The PDF Object Tree
Under the hood, a PDF document is a tree of cross-referenced objects. The file is divided into four main sections:
1. Header: Declares the file syntax version (e.g., %PDF-1.7).
2. Body: Contains the objects representing pages, fonts, vector paths, forms, and images.
3. Cross-Reference Table (XREF): A lookup table specifying the exact byte offset of each object within the file.
4. Trailer: Gives the byte offset of the XREF table and references the document's root Catalog object.
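To make the layout concrete, here is a minimal sketch that reads two of these pieces straight from the raw bytes: the version in the header and the XREF offset recorded after the startxref keyword near the end of the file. The helper names (bytesToLatin1, readPdfVersion, readStartXref) and the 1024-byte search windows are assumptions made for illustration; they are not part of pdf-lib, and the sketch does not try to parse the XREF table itself.

```typescript
// Hypothetical helpers for illustration only; not part of pdf-lib.

/** Decode bytes as Latin-1 so that every byte maps to exactly one character. */
function bytesToLatin1(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

/** Return the version declared in the header (e.g. "1.7"), or null if absent. */
function readPdfVersion(bytes: Uint8Array): string | null {
  // Assumption: the header sits within the first 1024 bytes of the file.
  const head = bytesToLatin1(bytes.subarray(0, 1024));
  const match = /%PDF-(\d+\.\d+)/.exec(head);
  return match?.[1] ?? null;
}

/** Return the byte offset written after the last `startxref`, or null if absent. */
function readStartXref(bytes: Uint8Array): number | null {
  // Assumption: the trailer sits within the last 1024 bytes of the file.
  const tail = bytesToLatin1(bytes.subarray(-1024));
  const idx = tail.lastIndexOf("startxref");
  if (idx === -1) return null;
  const match = /^startxref\s+(\d+)/.exec(tail.slice(idx));
  return match ? Number(match[1]) : null;
}
```

Given an ArrayBuffer read from a file, both helpers could be called on new Uint8Array(buffer). A real parser has more cases to handle, so treat this as a way to peek at the structure rather than a validator.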
When merging multiple files:
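- We cannot simply concatenate the byte streams. Each file numbers its own objects and records byte offsets measured from its own start, so a concatenated file ends up with colliding object numbers and stale offsets.
- We must build a single unified catalog and renumber every object reference so that it points at the right object in the merged file.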
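The renumbering idea can be sketched with a toy model. The ToyObject type and mergeToyObjects function below are hypothetical and deliberately simplified: real PDF objects are dictionaries and streams, and a library such as pdf-lib does the equivalent bookkeeping for you. The sketch only shows why the second file's object numbers and references have to be shifted together.

```typescript
// Toy model only: real PDF objects hold dictionaries and streams, not plain arrays.
interface ToyObject {
  num: number; // object number, unique within a single file
  refs: number[]; // object numbers this object points to
}

function mergeToyObjects(a: ToyObject[], b: ToyObject[]): ToyObject[] {
  // Shift every object number in `b` past the highest number used in `a`.
  const offset = a.reduce((max, obj) => Math.max(max, obj.num), 0);

  // Numbers and references must move by the same amount,
  // otherwise references from `b` would point into `a`.
  const shifted = b.map((obj) => ({
    num: obj.num + offset,
    refs: obj.refs.map((ref) => ref + offset),
  }));

  return [...a, ...shifted];
}
```

If both inputs contain objects numbered 1 and 2, naive concatenation would leave two objects claiming each number. After the shift, the second file's objects become 3 and 4, and its internal references follow them.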
Processing Arrays Locally
In the browser, we read the files as binary arrays using standard JavaScript interfaces:
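```typescript
// Assumes `event` is the change event from an <input type="file"> element
const file = (event.target as HTMLInputElement).files?.[0];
if (!file) throw new Error('No file selected');
const buffer: ArrayBuffer = await file.arrayBuffer();
// buffer now holds the file's raw bytes
```

Using pdf-lib, we load each document, copy its pages (along with the resources they reference) into a new document, and serialize the result, all within the tab's memory: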
```typescript
import { PDFDocument } from 'pdf-lib';

async function mergePdfFiles(files: File[]): Promise<Blob> {
  // Start from an empty document that will receive every page
  const mergedDoc = await PDFDocument.create();

  for (const file of files) {
    const buffer = await file.arrayBuffer();
    const doc = await PDFDocument.load(buffer);

    // Copy all pages into the merged document context
    const copiedPages = await mergedDoc.copyPages(doc, doc.getPageIndices());
    copiedPages.forEach((page) => mergedDoc.addPage(page));
  }

  // Serialize the merged document and wrap the bytes in a Blob
  const mergedPdfBytes = await mergedDoc.save();
  return new Blob([mergedPdfBytes], { type: 'application/pdf' });
}
```

Because the merge runs entirely client-side, the documents are never sent over the network; they exist only in the tab's memory. Closing the tab releases those buffers.