1 Oct 2023

Building a Python Script to Automatically Sort and Organize Photos

We capture countless photos using our smartphones, cameras, and other devices. Over time, these photos accumulate and become disorganized, making it hard to find a specific image when we need it. With Python, we can write a script that automatically sorts and organizes a photo collection, saving us time and effort. In this post, we'll walk through the steps to build such a script.

Understanding the Project Scope

Before diving into coding, let's outline the main objectives and features of our photo organizer script:

  1. Image Metadata Extraction: The script will extract essential metadata (e.g., date, time, camera model) from each photo. This metadata will help us categorize and sort the images efficiently.
  2. Organize by Date: The script will sort and create folders based on the photo's capture date. All images taken on the same date will be placed in the corresponding folder.
  3. Duplicate Handling: To avoid clutter and redundancy, the script should identify and handle duplicate images, preventing the same photo from being stored in multiple folders.
  4. Flexible Configuration: Users should be able to customize certain aspects of the script, such as the output directory, file naming conventions, and supported image formats.

Getting Started

Before we proceed, ensure you have Python installed on your system. You'll also need the Pillow library, which we use to open images and read their Exif metadata. You can install it using pip:

pip install Pillow

Now, let's begin building our Python script!

Importing Required Libraries

import os
import shutil
from PIL import Image
from PIL.ExifTags import TAGS

We start by importing the necessary libraries. os will be used for file and folder operations, shutil for moving files, and Pillow for image processing and metadata extraction.

Configuring the Script

# Configuration
INPUT_DIR = "path/to/your/photo/directory"
OUTPUT_DIR = "path/to/organized/photos"
SUPPORTED_FORMATS = (".jpg", ".jpeg", ".png", ".gif")

Next, we define some configurable parameters for the script. You need to set INPUT_DIR to the directory where your photos are located. The sorted and organized photos will be placed in the OUTPUT_DIR directory. The SUPPORTED_FORMATS variable determines which image file formats the script will process.
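If you would rather not edit the constants by hand, the same settings could be read from the command line. The following is only a minimal sketch using the standard argparse module; the function name parse_args and the option names --input-dir, --output-dir, and --formats are assumptions made for illustration and are not part of the script in this post.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; names and defaults here are illustrative."""
    parser = argparse.ArgumentParser(description="Sort photos into dated folders.")
    parser.add_argument("--input-dir", default="path/to/your/photo/directory")
    parser.add_argument("--output-dir", default="path/to/organized/photos")
    parser.add_argument(
        "--formats",
        nargs="+",
        default=[".jpg", ".jpeg", ".png", ".gif"],
        help="File extensions to process (case-insensitive).",
    )
    args = parser.parse_args(argv)
    # str.endswith() accepts a tuple, so normalize to a lowercase tuple.
    args.formats = tuple(ext.lower() for ext in args.formats)
    return args
```

The resulting args.input_dir, args.output_dir, and args.formats could then stand in for INPUT_DIR, OUTPUT_DIR, and SUPPORTED_FORMATS.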

Metadata Extraction

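def get_image_metadata(image_path):
    """Return a dict of Exif tag names to values (empty if the image has none)."""
    metadata = {}

    # The context manager closes the file so it can be moved later.
    with Image.open(image_path) as image:
        exif_data = image.getexif()  # public API, available for every format
        tags = dict(exif_data)
        # DateTimeOriginal is stored in the Exif sub-IFD (tag 0x8769).
        tags.update(exif_data.get_ifd(0x8769))

    for tag_id, value in tags.items():
        tag_name = TAGS.get(tag_id, tag_id)
        metadata[tag_name] = value

    return metadata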

In this step, we define a function get_image_metadata(image_path) to extract image metadata using the Pillow library. The function reads the image, retrieves its Exif data, and stores it in a dictionary called metadata.
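Exif stores timestamps such as DateTimeOriginal as text in the form "YYYY:MM:DD HH:MM:SS". As a small sketch of how such a value could be validated and turned into a folder-friendly date, the helper below uses datetime.strptime. The name folder_name_from_exif is hypothetical, and returning None for unusable values is an assumption about how you might want to treat them.

```python
from datetime import datetime


def folder_name_from_exif(date_string):
    """Turn an Exif timestamp such as '2023:10:01 14:30:00' into '2023-10-01'.

    Returns None if the value is missing or not in the expected format.
    """
    if not isinstance(date_string, str):
        return None
    try:
        taken = datetime.strptime(date_string.strip(), "%Y:%m:%d %H:%M:%S")
    except ValueError:
        return None
    return taken.strftime("%Y-%m-%d")
```

Parsing the value, rather than splitting the string, also rejects placeholder timestamps that some devices write when the clock was never set.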

Organizing Photos

def organize_photos(input_dir, output_dir):
    for root, _, files in os.walk(input_dir):
        for file_name in files:
            if not file_name.lower().endswith(SUPPORTED_FORMATS):
                continue

            file_path = os.path.join(root, file_name)
            metadata = get_image_metadata(file_path)
            if "DateTimeOriginal" not in metadata:
                continue  # no capture date; leave the file where it is

            # Exif dates look like "2023:10:01 14:30:00". Keep the date part and
            # swap the colons for hyphens, since ":" is not allowed in Windows
            # folder names.
            date_taken = str(metadata["DateTimeOriginal"]).split()[0].replace(":", "-")
            destination_folder = os.path.join(output_dir, date_taken)
            os.makedirs(destination_folder, exist_ok=True)

            destination_path = os.path.join(destination_folder, file_name)

            # Never overwrite: skip the move if the name is already taken.
            if not os.path.exists(destination_path):
                shutil.move(file_path, destination_path)

The organize_photos(input_dir, output_dir) function is responsible for sorting and organizing the photos. It walks input_dir and its subfolders, checks whether each file has a supported extension, extracts its metadata, and reads the capture date from the DateTimeOriginal field.

If the photo has a capture date, the function builds a destination folder for that date under output_dir, creating it if necessary, and moves the photo into it. Photos without a DateTimeOriginal field are left where they are, and a photo is not moved if a file with the same name already exists in the destination folder.
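Because a photo is skipped when its name is already taken in the destination folder, two different pictures that happen to share a file name (for example, from two cameras) will not both end up organized. One possible way around this, sketched below, is to pick a free name by appending a counter. The helper unique_destination and its naming pattern are hypothetical and not part of the script above.

```python
import os


def unique_destination(folder, file_name):
    """Return a path in `folder` that does not exist yet.

    If 'IMG_1.jpg' is taken, try 'IMG_1_1.jpg', 'IMG_1_2.jpg', and so on.
    """
    stem, ext = os.path.splitext(file_name)
    candidate = os.path.join(folder, file_name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(folder, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate
```

With a helper like this, the move could always go to unique_destination(destination_folder, file_name), leaving it to the duplicate check to remove files whose contents turn out to be identical.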

Handling Duplicates

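import hashlib

def handle_duplicates(output_dir):
    seen = set()  # content hashes seen so far, across all folders
    for root, _, files in os.walk(output_dir):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            with open(file_path, 'rb') as f:
                file_hash = hashlib.sha256(f.read()).hexdigest()

            # The file is closed at this point, so it is safe to delete.
            if file_hash in seen:
                os.remove(file_path)
            else:
                seen.add(file_hash)

The handle_duplicates(output_dir) function iterates through each file in the output_dir and uses a hash-based approach to detect and remove duplicate photos. It computes a SHA-256 digest of each file's contents, keeps the digests it has seen in a single set shared across all folders, and deletes any file whose digest has already been recorded. We use hashlib rather than Python's built-in hash(), which is not intended for comparing file contents: a collision between two different photos would cause one of them to be deleted.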
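Reading every photo fully into memory before hashing can be wasteful for large files. As a rough sketch of an alternative, the helper below feeds the file to hashlib.sha256 in fixed-size chunks. The name sha256_of_file and the 1 MiB default chunk size are assumptions chosen for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest is the same as hashing the whole file at once, so the returned string can be stored in the seen set in the same way.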

Putting It All Together

def main():
    print("Organizing photos...")
    organize_photos(INPUT_DIR, OUTPUT_DIR)
    print("Handling duplicates...")
    handle_duplicates(OUTPUT_DIR)
    print("Organizing complete!")

if __name__ == "__main__":
    main()

In the main() function, we call organize_photos() to sort the photos and handle_duplicates() to remove duplicates. The if __name__ == "__main__" check ensures that main() runs only when the file is executed directly, not when it is imported as a module.

Conclusion

In this post, we've built a Python script that automatically sorts and organizes photos based on their capture date. It extracts image metadata, moves each photo into a folder named after the date it was taken, and removes duplicate files. Feel free to customize the script further to suit your needs, for example by adding other sorting criteria or supporting more image formats. Now you can run the script and enjoy a neatly organized photo library with minimal effort!