Engineering
June 16, 2026 · 6 min read

How Browser-Based PDF Merging Works

A deep dive into PDF structure and binary array buffers to compile multiple files locally in client memory.

Traditional online converters require you to upload your PDF files to a remote server to combine them. This poses substantial privacy risks when dealing with contracts, tax records, or bank statements.

By moving the merge step to the client, we remove the server round trip, so file contents never leave your device. In this article, we'll dive into PDF file structure, binary array buffers, and client-side compilation.


The PDF Object Tree

Under the hood, a PDF document is a tree of cross-referenced objects. The file is divided into four main sections:

1. Header: Declares the file syntax version (e.g., %PDF-1.7).

2. Body: Contains the objects representing pages, fonts, vector paths, forms, and images.

3. Cross-Reference Table (XREF): A lookup table specifying the exact byte offset of each object within the file.

4. Trailer: References the Catalog root object and is followed by a startxref entry that gives the byte offset of the XREF table.
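
To make the layout concrete, here is a minimal sketch that reads the version from the header and the XREF offset from the startxref entry near the end of the file. The names PdfLayout, bytesToText, and inspectPdfLayout are hypothetical and not part of any library. The sketch assumes a simple file with one classic XREF table and does not cover every variant a real parser has to handle.

```typescript
// Hypothetical result shape for this sketch.
interface PdfLayout {
  version: string | null;   // e.g. "1.7", taken from the header
  startXref: number | null; // byte offset of the XREF table
}

// Decodes bytes one-to-one into characters so string indexes match byte offsets.
function bytesToText(bytes: Uint8Array): string {
  let text = '';
  for (let i = 0; i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

function inspectPdfLayout(bytes: Uint8Array): PdfLayout {
  // Only the first and last kilobyte are scanned; the body is skipped.
  const head = bytesToText(bytes.subarray(0, 1024));
  const tail = bytesToText(bytes.subarray(Math.max(0, bytes.length - 1024)));

  const header = /^%PDF-(\d+\.\d+)/.exec(head);

  // The last "startxref" keyword is the one a reader follows.
  const markerAt = tail.lastIndexOf('startxref');
  const xref = markerAt >= 0 ? /^startxref\s+(\d+)/.exec(tail.slice(markerAt)) : null;

  return {
    version: header ? header[1] : null,
    startXref: xref ? Number(xref[1]) : null,
  };
}
```

A reader works backwards from that offset: it jumps to the XREF table, then uses the table to find each object in the body.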

When merging multiple files:

  • We cannot simply concatenate the byte streams. Doing so produces a corrupt file because object numbers collide and the byte offsets recorded in each XREF table no longer match.
  • We must build a single unified catalog and renumber every object reference.
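
The renumbering step can be illustrated with a toy model. The sketch below is not a PDF parser: ToyObject and renumberAndMerge are hypothetical names, and each "object" is reduced to a number, a label, and the numbers it refers to. It only shows why two documents that both start at object 1 need fresh numbers before they can share one file. A real merge also has to build a single catalog and page tree.

```typescript
// Hypothetical, simplified stand-in for a PDF indirect object.
interface ToyObject {
  num: number;    // object number within its own document
  label: string;  // human-readable description
  refs: number[]; // object numbers this object points to
}

function renumberAndMerge(docs: ToyObject[][]): ToyObject[] {
  const merged: ToyObject[] = [];
  let nextNum = 1;

  for (const doc of docs) {
    // First pass: assign every object in this document a fresh, unique number.
    const remap = new Map<number, number>();
    for (const obj of doc) {
      remap.set(obj.num, nextNum++);
    }

    // Second pass: rewrite the object and each of its references.
    for (const obj of doc) {
      const refs = obj.refs.map((ref) => {
        const target = remap.get(ref);
        if (target === undefined) {
          throw new Error(`Dangling reference to object ${ref}`);
        }
        return target;
      });
      merged.push({ num: remap.get(obj.num)!, label: obj.label, refs });
    }
  }
  return merged;
}
```

With two input documents that each contain objects 1 and 2, the second document's objects come out as 3 and 4, and its internal reference is rewritten to point at 4 instead of 2.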

Processing Arrays Locally

In the browser, we read the files as binary arrays using standard JavaScript interfaces:

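```typescript
// event.target is typed as EventTarget, so narrow it to the file input first
const input = event.target as HTMLInputElement;
const file = input.files?.[0];
if (!file) throw new Error('No file selected');
const buffer: ArrayBuffer = await file.arrayBuffer();
// buffer now holds the file's raw bytes
```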
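
Once the bytes are in memory, a cheap sanity check can reject files that are clearly not PDFs before any parsing starts. The sketch below compares the first five bytes against the "%PDF-" signature. PDF_MAGIC and looksLikePdf are hypothetical names, and the check is deliberately strict: it assumes the signature sits at offset 0, whereas some readers tolerate a few leading bytes.

```typescript
// The ASCII bytes for "%PDF-", which open a well-formed PDF file.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Hypothetical helper: true when the buffer starts with the PDF signature.
function looksLikePdf(buffer: ArrayBuffer): boolean {
  if (buffer.byteLength < PDF_MAGIC.length) return false;

  // View only the leading bytes; no data is copied.
  const bytes = new Uint8Array(buffer, 0, PDF_MAGIC.length);
  return PDF_MAGIC.every((expected, i) => bytes[i] === expected);
}
```

Passing this check does not prove the file is valid. It only filters out obvious mismatches, such as an image or archive picked by mistake.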

Using pdf-lib, we load the documents, copy their pages along with the resources those pages depend on, and compile a clean new PDF tree in the tab's memory:

```typescript
import { PDFDocument } from 'pdf-lib';

async function mergePdfFiles(files: File[]): Promise<Blob> {
  const mergedDoc = await PDFDocument.create();

  for (const file of files) {
    const buffer = await file.arrayBuffer();
    const doc = await PDFDocument.load(buffer);

    // Copy all pages into the merged document context
    const copiedPages = await mergedDoc.copyPages(doc, doc.getPageIndices());
    copiedPages.forEach((page) => mergedDoc.addPage(page));
  }

  const mergedPdfBytes = await mergedDoc.save();
  return new Blob([mergedPdfBytes], { type: 'application/pdf' });
}
```
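
The merge function above copies every page, but copyPages accepts any array of zero-based page indices. As a small illustration, the helper below turns a one-based, inclusive page range into that index array. The name pageRangeToIndices is hypothetical, and clamping out-of-range values instead of throwing is an assumption made for this sketch.

```typescript
// Hypothetical helper: converts a 1-based inclusive page range into the
// 0-based indices that copyPages expects, clamped to the document length.
function pageRangeToIndices(
  pageCount: number,
  firstPage: number,
  lastPage: number,
): number[] {
  const start = Math.max(1, firstPage);
  const end = Math.min(pageCount, lastPage);

  const indices: number[] = [];
  for (let page = start; page <= end; page++) {
    indices.push(page - 1);
  }
  return indices; // empty when the range falls outside the document
}
```

Inside the loop, `doc.getPageIndices()` could then be replaced with something like `pageRangeToIndices(doc.getPageCount(), 2, 5)` to keep only pages 2 through 5 of each file.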

Because the merge runs entirely in the browser, the documents stay in the tab's memory and are never sent over the network. Closing the tab releases those buffers so the browser can reclaim the memory.
