How Browser-Based PDF Merging Works

A deep dive into PDF structure and binary array buffers, and how to merge multiple files locally in client memory.

Traditional online converters require you to upload your PDF files to a remote server to combine them. This poses substantial privacy risks when dealing with contracts, tax records, or bank statements.

By merging documents in the browser, we remove the server from the process, so the files never leave your device. In this article, we'll look at PDF file structure, binary buffers, and client-side merging.

The PDF Object Tree

Under the hood, a PDF document is a tree of cross-referenced objects. The file is divided into four main sections:

1. Header: Declares the file syntax version (e.g., %PDF-1.7).

2. Body: Contains the objects representing pages, fonts, vector paths, forms, and images.

3. Cross-Reference Table (XREF): A lookup table specifying the exact byte offset of each object within the file.

4. Trailer: Records where the XREF table starts and identifies the Catalog, the root object of the document tree.
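To make the trailer's role concrete, here is a minimal sketch of how a reader might find the XREF offset: it scans the end of the file for the `startxref` keyword and parses the number that follows. The helper name `readStartXref`, the 1024-byte scan window, and the sample bytes are assumptions for illustration, and the sample is a toy fragment rather than a complete PDF.

```typescript
// Hypothetical helper: returns the byte offset recorded after the last
// `startxref` keyword, or null if the keyword is not found.
function readStartXref(bytes: Uint8Array): number | null {
  const keyword = 'startxref';
  // Only the tail needs scanning; the 1024-byte window is an arbitrary choice.
  const start = Math.max(0, bytes.length - 1024);
  let tail = '';
  for (let i = start; i < bytes.length; i++) {
    tail += String.fromCharCode(bytes[i]);
  }
  const at = tail.lastIndexOf(keyword);
  if (at === -1) return null;
  // The offset is the first run of digits after the keyword.
  const match = /^\s*(\d+)/.exec(tail.slice(at + keyword.length));
  return match ? Number(match[1]) : null;
}

// Toy fragment (not a valid PDF) that ends the way a real file does.
const sample =
  '%PDF-1.7\n1 0 obj\n<< >>\nendobj\ntrailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n1234\n%%EOF\n';
const sampleBytes = Uint8Array.from(sample, (ch) => ch.charCodeAt(0));

console.log(readStartXref(sampleBytes)); // prints 1234
```

Newer PDFs may store the cross-reference data in a stream object instead of a plain table, but `startxref` still records where that data begins.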

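When merging multiple files:

We cannot simply concatenate the byte streams. Each file numbers its objects independently and records byte offsets relative to its own start, so after concatenation the object numbers collide and the offsets point at the wrong bytes.

We must build a single unified catalog and renumber every object reference so that it points at the correct object in the merged file.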
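The renumbering step can be illustrated with a toy model. This sketch does not parse real PDFs: `ToyObject` and `mergeToyDocs` are hypothetical names, and each "object" is reduced to a number plus the numbers it references. It only shows why every document needs its own old-to-new mapping.

```typescript
// Toy stand-in for a PDF object: its number and the numbers it points to.
interface ToyObject {
  id: number;
  refs: number[];
}

// Gives every object a fresh number and rewrites references to match.
function mergeToyDocs(docs: ToyObject[][]): ToyObject[] {
  const merged: ToyObject[] = [];
  let nextId = 1;
  for (const doc of docs) {
    // Each source document gets its own old-number -> new-number table.
    const remap = new Map<number, number>();
    for (const obj of doc) {
      remap.set(obj.id, nextId++);
    }
    for (const obj of doc) {
      const refs = obj.refs.map((ref) => {
        const mapped = remap.get(ref);
        if (mapped === undefined) {
          throw new Error(`Dangling reference to object ${ref}`);
        }
        return mapped;
      });
      merged.push({ id: remap.get(obj.id) as number, refs });
    }
  }
  return merged;
}

// Both documents use object numbers 1 and 2, so they would collide if
// their bodies were simply appended.
const docA: ToyObject[] = [{ id: 1, refs: [2] }, { id: 2, refs: [] }];
const docB: ToyObject[] = [{ id: 1, refs: [2] }, { id: 2, refs: [] }];

const mergedObjects = mergeToyDocs([docA, docB]);
console.log(JSON.stringify(mergedObjects));
// prints [{"id":1,"refs":[2]},{"id":2,"refs":[]},{"id":3,"refs":[4]},{"id":4,"refs":[]}]
```

A real merge also has to rebuild the page tree and write fresh byte offsets, which is the work a PDF library takes on.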

Processing Arrays Locally

In the browser, we read the files as binary arrays using standard JavaScript interfaces:

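```typescript
// Inside a change handler for an <input type="file"> element
const input = event.target as HTMLInputElement;
const file = input.files?.[0];
if (!file) throw new Error('No file selected');
const buffer: ArrayBuffer = await file.arrayBuffer();
// buffer now holds the file's raw bytes
```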
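Once the bytes are in memory, a cheap sanity check can reject files that are not PDFs before any parsing starts. The following is a sketch under assumptions: `readPdfVersion` is a hypothetical helper, and it only looks for the `%PDF-` marker at the very start of the buffer.

```typescript
// Hypothetical pre-check: returns the header version (for example "1.7"),
// or null when the buffer does not begin with "%PDF-".
function readPdfVersion(buffer: ArrayBuffer): string | null {
  // A typed-array view reads the bytes without copying the buffer.
  const bytes = new Uint8Array(buffer, 0, Math.min(buffer.byteLength, 16));
  let head = '';
  for (let i = 0; i < bytes.length; i++) {
    head += String.fromCharCode(bytes[i]);
  }
  const match = /^%PDF-(\d\.\d)/.exec(head);
  return match ? match[1] : null;
}

// Builds an ArrayBuffer from ASCII text so the example runs without a file.
function asciiBuffer(text: string): ArrayBuffer {
  const buf = new ArrayBuffer(text.length);
  const view = new Uint8Array(buf);
  for (let i = 0; i < text.length; i++) {
    view[i] = text.charCodeAt(i);
  }
  return buf;
}

console.log(readPdfVersion(asciiBuffer('%PDF-1.7\n'))); // prints 1.7
console.log(readPdfVersion(asciiBuffer('not a pdf'))); // prints null
```

A passing check only shows that the header looks right; the file can still be malformed further in, so the load call should still be wrapped in error handling.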

Using pdf-lib, we load each document, copy its pages along with the resources they depend on, and build a new PDF object tree in the tab's memory:

```typescript
import { PDFDocument } from 'pdf-lib';

async function mergePdfFiles(files: File[]): Promise<Blob> {
  const mergedDoc = await PDFDocument.create();

  for (const file of files) {
    const buffer = await file.arrayBuffer();
    const doc = await PDFDocument.load(buffer);

    // Copy all pages into the merged document context
    const copiedPages = await mergedDoc.copyPages(doc, doc.getPageIndices());
    copiedPages.forEach((page) => mergedDoc.addPage(page));
  }

  const mergedPdfBytes = await mergedDoc.save();
  return new Blob([mergedPdfBytes], { type: 'application/pdf' });
}
```

Because the merge happens client-side, the documents stay in your device's volatile memory and are never uploaded. Closing the tab releases the buffers so the browser can reclaim that memory.

