18 Sept 2023

Building a PDF Merger with Python and PyPDF2

PDFs (Portable Document Format) have become an essential part of our digital lives. They are commonly used for sharing documents, reports, forms, and other important information. However, managing multiple PDF files can sometimes be cumbersome, especially when you need to combine or organize them. In this blog, we'll explore how to build a PDF merger using Python and the PyPDF2 library, enabling you to efficiently merge and organize your PDFs.

Table of Contents

  1. Prerequisites
  2. Installing PyPDF2
  3. Understanding PyPDF2
  4. Building the PDF Merger
    1. Importing the Necessary Modules
    2. Getting User Input
    3. Merging PDFs
    4. Saving the Merged PDF
  5. Testing the PDF Merger
  6. Conclusion

In this digital age, PDFs have become ubiquitous, and being able to manipulate them programmatically can save time and effort. We'll build a PDF merger using Python, a versatile and easy-to-learn programming language, and the PyPDF2 library, a popular tool for handling PDF files. Our PDF merger will take multiple PDFs as input and combine them into a single, organized document.

Prerequisites

Before we begin building the PDF merger, make sure you have the following installed on your system:
Python (3.6 or later)
PyPDF2 library

Installing PyPDF2

If you don't have PyPDF2 installed, you can install it using pip, Python's package manager. Open a terminal or command prompt and enter the following command:

pip install PyPDF2

Understanding PyPDF2

PyPDF2 is a pure-Python library that allows us to manipulate PDF files. It provides functionalities like reading, writing, merging, splitting, and more. For our PDF merger, we'll primarily use the PdfReader and PdfWriter classes from PyPDF2.

Building the PDF Merger

Now, let's dive into the code and build our PDF merger step-by-step.

Importing the Necessary Modules

We'll start by importing the modules we need. PyPDF2 handles the PDF manipulation, and os handles file paths:

import PyPDF2
import os

Getting User Input

To create a user-friendly PDF merger, we'll let users enter the paths of the PDF files they want to merge. We'll keep prompting until the user types q, accept only paths that exist and end in .pdf, and store them in a list:

def get_user_input():
    pdf_files = []
    while True:
        file_path = input("Enter the path of the PDF to merge (or 'q' to quit): ")
        # Typing 'q' ends the prompt loop.
        if file_path.lower() == 'q':
            break
        # Accept the path only if it exists and has a .pdf extension.
        if os.path.exists(file_path) and file_path.lower().endswith('.pdf'):
            pdf_files.append(file_path)
        else:
            print("Invalid file path or not a PDF.")
    return pdf_files
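The check above trusts the file extension, so a text file renamed to .pdf would pass. As a rough sketch of a stricter check, the helper below (looks_like_pdf is a hypothetical name, not part of PyPDF2) confirms that the file starts with the "%PDF-" marker. Some PDF readers tolerate a few stray bytes before that marker, so treat this as a conservative filter rather than a full validation.

```python
import os


def looks_like_pdf(file_path):
    """Return True if the file exists and its first bytes are the PDF marker."""
    if not os.path.isfile(file_path):
        return False
    with open(file_path, 'rb') as candidate:
        # Well-formed PDF files begin with the ASCII marker "%PDF-".
        return candidate.read(5) == b'%PDF-'
```

If you want to use it, the condition in get_user_input could call looks_like_pdf(file_path) in place of the existence and extension test.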

Merging PDFs

Next, we'll define a function to merge the PDF files from the provided list. We'll iterate through each PDF file, read its pages, and append them to the output PDF:

def merge_pdfs(pdf_files, output_path):
    pdf_writer = PyPDF2.PdfWriter()

    for file_path in pdf_files:
        with open(file_path, 'rb') as pdf_file:
            pdf_reader = PyPDF2.PdfReader(pdf_file)
            # PyPDF2 3.x removed numPages, getPage, and addPage;
            # iterate over reader.pages and call add_page instead.
            for page in pdf_reader.pages:
                pdf_writer.add_page(page)

    # Write every collected page to the output file.
    with open(output_path, 'wb') as output_file:
        pdf_writer.write(output_file)

Saving the Merged PDF

Finally, we'll define a function to prompt the user for the output file path and call the merge_pdfs function to combine the PDFs:

def save_merged_pdf():
    pdf_files = get_user_input()

    if not pdf_files:
        print("No valid PDFs selected. Exiting.")
        return

    output_path = input("Enter the path to save the merged PDF: ")
    # Add the .pdf extension if the user left it off.
    if not output_path.lower().endswith('.pdf'):
        output_path += '.pdf'

    merge_pdfs(pdf_files, output_path)
    print(f"PDFs merged successfully! The merged PDF is saved at '{output_path}'.")


# Start the merger when the file is run as a script.
if __name__ == "__main__":
    save_merged_pdf()
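One thing this flow does not check is whether the output path is also one of the input files, in which case the merge would overwrite that input. A small guard along these lines could catch it. The function name output_collides_with_input is hypothetical, and the comparison is a sketch that normalizes paths without resolving symbolic links.

```python
import os


def output_collides_with_input(pdf_files, output_path):
    """Return True if output_path refers to the same location as any input path."""
    target = os.path.normcase(os.path.abspath(output_path))
    return any(
        os.path.normcase(os.path.abspath(path)) == target
        for path in pdf_files
    )
```

Calling it right after the output path is read, and asking for a different path when it returns True, would keep the original files intact.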

Testing the PDF Merger

Now that we've completed building the PDF merger, it's time to test our application. Create a new Python file (e.g., pdf_merger.py) and paste the entire code from the previous sections into it. Save the file.

To run the PDF merger, execute the Python script:

python pdf_merger.py

The script will prompt you to enter the file paths of the PDFs you want to merge. Type the paths one by one and press Enter. When you're done adding files, type q and press Enter. Then, you'll be asked to provide the output path for the merged PDF. Once you've done that, the script will merge the PDFs and save the output file.
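Typing paths by hand gets tedious if you test often. As a sketch of one alternative, the prompt loop can be driven with scripted answers by patching the built-in input function with unittest.mock. The get_user_input function is repeated here only so the example runs on its own; run_scripted_check and the file names are made up for illustration.

```python
import os
import tempfile
from unittest import mock


def get_user_input():
    """Same prompt loop as in the article, repeated so this sketch runs alone."""
    pdf_files = []
    while True:
        file_path = input("Enter the path of the PDF to merge (or 'q' to quit): ")
        if file_path.lower() == 'q':
            break
        if os.path.exists(file_path) and file_path.lower().endswith('.pdf'):
            pdf_files.append(file_path)
        else:
            print("Invalid file path or not a PDF.")
    return pdf_files


def run_scripted_check():
    """Feed scripted answers to the prompt loop and check what it collected."""
    with tempfile.TemporaryDirectory() as temp_dir:
        real_pdf = os.path.join(temp_dir, "sample.pdf")
        with open(real_pdf, 'wb') as handle:
            # Placeholder bytes; the loop only checks the path and extension.
            handle.write(b'%PDF-1.4\n')
        # One missing file, one real file, then 'q' to end the loop.
        answers = [os.path.join(temp_dir, "missing.pdf"), real_pdf, 'q']
        with mock.patch('builtins.input', side_effect=answers):
            collected = get_user_input()
        return collected == [real_pdf]


print(run_scripted_check())
```

The missing path should be rejected and the real one kept, so the check is expected to report success without any manual typing.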

Conclusion

In this blog, we have explored how to build a PDF merger using Python and the PyPDF2 library. Combining multiple PDFs into a single document can be immensely useful for organizing information and improving document management. By providing a user-friendly interface and leveraging the power of Python and PyPDF2, we have successfully created an efficient PDF merger.

Feel free to extend the functionality further by adding options to reorder pages, delete unwanted pages, or compress the output PDF. Exploring more features of PyPDF2 can open up new possibilities for enhancing your PDF manipulation capabilities. Happy coding!
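As a starting point for the reordering and deleting ideas, here is a small sketch that turns a user-supplied selection such as "3,1-2" into zero-based page indices. The function parse_page_selection and its input format are assumptions made for this example, not something PyPDF2 provides.

```python
def parse_page_selection(selection, page_count):
    """Turn a string such as "3,1-2" into zero-based page indices, in the order given."""
    indices = []
    for part in selection.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            # A range such as "1-2" covers both endpoints.
            start_text, end_text = part.split('-', 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"Page selection '{part}' is outside 1-{page_count}")
        # Convert from 1-based page numbers to 0-based indices.
        indices.extend(range(start - 1, end))
    return indices


print(parse_page_selection("3,1-2", page_count=5))  # prints [2, 0, 1]
```

Each index can then be used to pick a page from a reader's pages list and pass it to the writer's add_page method, so pages left out of the selection are dropped and the rest appear in the requested order.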