Thursday, May 30, 2024

Merging PDF Files Based on Name Segments in C#

This post shows how to merge multiple PDF files in C# with iTextSharp. Files are grouped by a specific segment of their filenames, and each group is merged into one PDF.

Scenario Overview

Assume a folder of PDF files whose names consist of underscore-separated segments. The goal is to combine all files that share the same second segment into one PDF per segment value.
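To make the naming convention concrete, here is a minimal sketch of pulling out the second underscore-separated segment. The class and method names (`FileNameSegments`, `TryGetSecondSegment`) are hypothetical and not part of the original code; the post itself indexes `splitName[1]` directly, which throws when a filename contains no underscore.

```csharp
using System;
using System.IO;

public static class FileNameSegments
{
    // Hypothetical helper: returns true and the second underscore-separated
    // segment of the file name (extension removed), or false if there is none.
    public static bool TryGetSecondSegment(string path, out string segment)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        string[] parts = name.Split('_');

        if (parts.Length < 2 || parts[1].Length == 0)
        {
            segment = string.Empty;
            return false;
        }

        segment = parts[1];
        return true;
    }
}
```

A caller could use the boolean result to skip or report files that do not follow the naming convention instead of letting the whole run fail.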

Step-by-Step Solution

  1. Setup and Initialization
     We start by defining the paths to the input and output folders:

    string folderPath = "C:\\Users\\ersan\\Downloads\\Form10BE_AAPAM3947A_2023_238914720240524_1";
    string outputFolder = "C:\\Users\\ersan\\Downloads\\output2\\";
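The string concatenation used later in the post (`outputFolder + Path.GetFileName(...)`) only produces a valid path because `outputFolder` ends with a backslash. As a hedged alternative, `Path.Combine` handles the separator either way. The helper name `OutputPathFor` below is an assumption made for illustration.

```csharp
using System.IO;

public static class MergePaths
{
    // Hypothetical helper: joins the output folder and the source file's name,
    // so a missing trailing separator on the folder no longer matters.
    public static string OutputPathFor(string outputFolder, string sourceFile)
    {
        return Path.Combine(outputFolder, Path.GetFileName(sourceFile));
    }
}
```

Calling `Directory.CreateDirectory(outputFolder)` once before merging is also worth considering: it creates the folder if it is missing and does nothing if it already exists.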
  2. Sorting Files
     We retrieve all files from the input folder and sort them by filename:

    var files = Directory.GetFiles(folderPath);
    Array.Sort(files, (a, b) => string.Compare(Path.GetFileName(a), Path.GetFileName(b)));
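Sorting by the whole filename keeps files with the same second segment next to each other only when the first segment is identical across all files. If that does not hold, one option is to sort by the second segment first and by the full name second. The sketch below assumes exactly that; `SegmentSort` and `SortBySecondSegment` are illustrative names, and the ordinal, case-insensitive comparison is an assumption chosen to match the post's case-insensitive segment check.

```csharp
using System;
using System.IO;

public static class SegmentSort
{
    // Second underscore-separated segment of the file name, or "" if absent.
    private static string SecondSegment(string path)
    {
        string[] parts = Path.GetFileNameWithoutExtension(path).Split('_');
        return parts.Length > 1 ? parts[1] : string.Empty;
    }

    // Sorts in place: primary key is the second segment, secondary key is the
    // file name; both are compared ordinally and case-insensitively.
    public static void SortBySecondSegment(string[] files)
    {
        Array.Sort(files, (a, b) =>
        {
            int bySegment = string.Compare(SecondSegment(a), SecondSegment(b),
                                           StringComparison.OrdinalIgnoreCase);
            if (bySegment != 0)
                return bySegment;

            return string.Compare(Path.GetFileName(a), Path.GetFileName(b),
                                  StringComparison.OrdinalIgnoreCase);
        });
    }
}
```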
  3. Iterating and Merging
     We iterate through the sorted files and extract the second segment of each filename. When the segment changes, the files collected so far are merged into a single PDF named after the first file of that group. After the loop, the last group still has to be merged:

    string lastName = "";
    var file2 = outputFolder + Path.GetFileName(files[0]);
    List<string> list = new List<string>();
    
    foreach (var file in files)
    {
        var fileName = Path.GetFileNameWithoutExtension(file);
        var splitName = fileName.Split('_');
        var name = splitName[1];
    
        // Segment changed: merge the previous group and start a new one
        if (!string.IsNullOrEmpty(lastName) && !string.Equals(lastName, name, StringComparison.OrdinalIgnoreCase))
        {
            CombineMultiplePDFs(list.ToArray(), file2);
            file2 = outputFolder + Path.GetFileName(file);
            list = new List<string>();
        }
    
        list.Add(file);
        lastName = name;
    }
    
    // Merge the final group, which the loop itself never flushes
    if (list.Count > 0)
        CombineMultiplePDFs(list.ToArray(), file2);
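The loop above depends on files with the same segment being adjacent after sorting. As an alternative sketch (not the method used in this post), LINQ's `GroupBy` collects files by segment regardless of their order. The names `SegmentGrouping` and `GroupBySecondSegment` are hypothetical, and skipping files that lack a second segment is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SegmentGrouping
{
    // Groups paths by their second underscore-separated segment, ignoring case.
    // Files without a second segment are skipped (an assumption of this sketch).
    // Groups appear in order of first occurrence; files inside a group are
    // ordered by file name.
    public static List<List<string>> GroupBySecondSegment(IEnumerable<string> files)
    {
        return files
            .Select(f => new
            {
                FilePath = f,
                Parts = Path.GetFileNameWithoutExtension(f).Split('_')
            })
            .Where(x => x.Parts.Length > 1)
            .GroupBy(x => x.Parts[1], x => x.FilePath, StringComparer.OrdinalIgnoreCase)
            .Select(g => g.OrderBy(p => Path.GetFileName(p), StringComparer.OrdinalIgnoreCase).ToList())
            .ToList();
    }
}
```

Each inner list could then be passed to `CombineMultiplePDFs` with an output path derived from its first element, which mirrors how the loop names its output files.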
  4. PDF Merging Method
     The CombineMultiplePDFs method takes an array of file paths and merges them into a single PDF written to outFile. It throws if the output file already exists:

    public static void CombineMultiplePDFs(string[] fileNames, string outFile)
    {
        if (File.Exists(outFile))
            throw new Exception("File already exists.");
    
        Document document = new Document();
        using (FileStream newFileStream = new FileStream(outFile, FileMode.Create))
        {
            PdfCopy writer = new PdfCopy(document, newFileStream);
            document.Open();
    
            foreach (string fileName in fileNames)
            {
                PdfReader reader = new PdfReader(fileName);
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    PdfImportedPage page = writer.GetImportedPage(reader, i);
                    writer.AddPage(page);
                }
                reader.Close();
            }
    
            writer.Close();
            document.Close();
        }
    }
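Because `CombineMultiplePDFs` throws when the output file already exists, a run can stop partway through and leave some groups merged and others not. One way to reduce that risk is to check every planned output path before merging anything. The sketch below is only an illustration under that assumption: `MergePreflight` and `FindExistingOutputs` are hypothetical names, and the code inspects the file system without opening any PDF.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MergePreflight
{
    // Returns the planned output paths that already exist on disk.
    // An empty result means none of them would trigger the
    // "file already exists" check in CombineMultiplePDFs.
    public static List<string> FindExistingOutputs(IEnumerable<string> plannedOutputs)
    {
        return plannedOutputs.Where(p => File.Exists(p)).ToList();
    }
}
```

If the returned list is not empty, the caller can report the conflicting paths and exit before any output is written.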
    
  5. Conclusion
     After all files are processed, the merged PDFs are saved in the output folder, each named after the first file in its group. A success message is printed to the console on completion.

Complete Code

using iTextSharp.text;
using iTextSharp.text.pdf;
using System;
using System.Collections.Generic;
using System.IO;

namespace PDFMerger
{
    internal class Program
    {
        static void Main(string[] args)
        {
            string folderPath = "C:\\Users\\ersan\\Downloads\\Form10BE_AAPAM3947A_2023_238914720240524_1";
            string outputFolder = "C:\\Users\\ersan\\Downloads\\output2\\";
            var files = Directory.GetFiles(folderPath);
            Array.Sort(files, (a, b) => string.Compare(Path.GetFileName(a), Path.GetFileName(b)));

            string lastName = "";
            // Output path for the current group, named after its first file
            var file2 = outputFolder + Path.GetFileName(files[0]);

            List<string> list = new List<string>();

            foreach (var file in files)
            {
                var fileName = Path.GetFileNameWithoutExtension(file);
                var splitName = fileName.Split('_');
                var name = splitName[1];
                // Segment changed: merge the previous group and start a new one
                if (!string.IsNullOrEmpty(lastName) && !string.Equals(lastName, name, StringComparison.OrdinalIgnoreCase))
                {
                    CombineMultiplePDFs(list.ToArray(), file2);
                    file2 = outputFolder + Path.GetFileName(file);
                    list = new List<string>();
                }
                list.Add(file);
                lastName = name;
            }

            // Merge the final group, which the loop itself never flushes
            if (list.Count > 0)
                CombineMultiplePDFs(list.ToArray(), file2);

            // Print the success message
            Console.WriteLine("PDF files merged successfully!");
        }

        public static void CombineMultiplePDFs(string[] fileNames, string outFile)
        {
            if (File.Exists(outFile))
                throw new Exception("File already exists.");
            // step 1: creation of a document-object
            Document document = new Document();
            //create newFileStream object which will be disposed at the end
            using (FileStream newFileStream = new FileStream(outFile, FileMode.Create))
            {
                // step 2: we create a writer that listens to the document
                PdfCopy writer = new PdfCopy(document, newFileStream);

                // step 3: we open the document
                document.Open();

                foreach (string fileName in fileNames)
                {
                    // we create a reader for a certain document
                    PdfReader reader = new PdfReader(fileName);
                    reader.ConsolidateNamedDestinations();

                    // step 4: we add content
                    for (int i = 1; i <= reader.NumberOfPages; i++)
                    {
                        PdfImportedPage page = writer.GetImportedPage(reader, i);
                        writer.AddPage(page);
                    }

                    reader.Close();
                }
                writer.Close();
                document.Close();
            }
        }
    }
}
      
Final Thoughts

This solution merges PDF files programmatically based on a naming convention in C#. It uses iTextSharp's PdfCopy class to copy the pages of each source file into the output document.

The approach can be extended to group on a different filename segment or to apply additional criteria as needed.
