Merging PDF Files Based on Name Segments in C#
In this blog post, we will explore how to merge multiple PDF files using C# and the iTextSharp library. The merging process will be based on a specific segment of each file's filename.
Scenario Overview
Assume you have a folder containing several PDF files whose names consist of underscore-separated segments. The goal is to combine these PDFs so that all files sharing the same second segment end up in one merged PDF.
Step-by-Step Solution
- Setup and Initialization
- Sorting Files
- Iterating and Merging
- PDF Merging Method
- Conclusion
We start by defining the paths to our input and output folders:
string folderPath = "C:\\Users\\ersan\\Downloads\\Form10BE_AAPAM3947A_2023_238914720240524_1";
string outputFolder = "C:\\Users\\ersan\\Downloads\\output2\\";
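The output folder above is assumed to exist already, and the later steps build output paths by string concatenation, which only works because the folder path ends with a separator. A minimal sketch of a more forgiving setup, assuming it is acceptable to create the folder on demand; the class and method names (OutputPaths, PrepareOutputPath) are hypothetical and not part of the original code:

```csharp
using System.IO;

public static class OutputPaths
{
    // Hypothetical helper: ensures the output folder exists and returns
    // the full path of an output file named after the given source file.
    public static string PrepareOutputPath(string outputFolder, string sourceFile)
    {
        // CreateDirectory does nothing if the folder already exists.
        Directory.CreateDirectory(outputFolder);
        // Path.Combine inserts the separator only when it is missing.
        return Path.Combine(outputFolder, Path.GetFileName(sourceFile));
    }
}
```

With a helper like this, the trailing "\\" on outputFolder would no longer matter.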
We retrieve all files from the input folder and sort them based on their filenames:
var files = Directory.GetFiles(folderPath);
Array.Sort(files, (a, b) => string.Compare(Path.GetFileName(a), Path.GetFileName(b)));
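Directory.GetFiles without a search pattern returns every file in the folder, and string.Compare without a comparison type uses culture-sensitive ordering. A minimal sketch of a stricter variant, assuming only .pdf files should be merged and that ordinal, case-insensitive ordering is acceptable; PdfFileListing, SortByFileName and ListPdfs are hypothetical names:

```csharp
using System;
using System.IO;

public static class PdfFileListing
{
    // Hypothetical helper: returns a copy of the paths sorted by file name
    // only, using an ordinal, case-insensitive comparison.
    public static string[] SortByFileName(string[] paths)
    {
        var sorted = (string[])paths.Clone();
        Array.Sort(sorted, (a, b) => string.Compare(
            Path.GetFileName(a), Path.GetFileName(b),
            StringComparison.OrdinalIgnoreCase));
        return sorted;
    }

    // Hypothetical helper: lists the PDF files in a folder, sorted by name.
    public static string[] ListPdfs(string folderPath)
    {
        return SortByFileName(Directory.GetFiles(folderPath, "*.pdf"));
    }
}
```

Calling PdfFileListing.ListPdfs(folderPath) would then stand in for the two lines above.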
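We iterate through the sorted list of files, extracting the second segment of each filename. When the segment changes, we merge the files collected so far into a single PDF named after the first file in the group, then start a new group. After the loop ends, the last group still has to be merged:
string lastName = "";
var file2 = outputFolder + Path.GetFileName(files[0]);
List<string> list = new List<string>();
foreach (var file in files)
{
    var fileName = Path.GetFileNameWithoutExtension(file);
    var splitName = fileName.Split('_');
    var name = splitName[1];
    // The second segment changed: merge the group collected so far
    if (!string.IsNullOrEmpty(lastName) && lastName.ToLower() != name.ToLower())
    {
        CombineMultiplePDFs(list.ToArray(), file2);
        file2 = outputFolder + Path.GetFileName(file);
        list = new List<string>();
    }
    list.Add(file);
    lastName = name;
}
// Merge the final group, which the loop itself never flushes
if (list.Count > 0)
    CombineMultiplePDFs(list.ToArray(), file2);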
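The loop above assumes that every filename contains at least one underscore and that files sharing a second segment sit next to each other after sorting, which holds only when their first segments sort together. A minimal sketch of an alternative that groups explicitly with LINQ, so it does not depend on sort order; PdfGrouping, SecondSegment and GroupBySecondSegment are hypothetical names, and skipping files without a second segment is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PdfGrouping
{
    // Returns the second underscore-separated segment of the file name,
    // or null when the name has fewer than two segments.
    public static string SecondSegment(string path)
    {
        var parts = Path.GetFileNameWithoutExtension(path).Split('_');
        return parts.Length > 1 ? parts[1] : null;
    }

    // Groups paths by second segment, ignoring case; names without a
    // second segment are skipped. Input order is kept within each group.
    public static Dictionary<string, List<string>> GroupBySecondSegment(IEnumerable<string> paths)
    {
        return paths
            .Where(p => SecondSegment(p) != null)
            .GroupBy(p => SecondSegment(p), StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.ToList(), StringComparer.OrdinalIgnoreCase);
    }
}
```

Each resulting list could be passed to CombineMultiplePDFs via ToArray(), with the output name taken from the first file in the group as in the loop above.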
The CombineMultiplePDFs method takes an array of filenames and merges them into a single PDF:
public static void CombineMultiplePDFs(string[] fileNames, string outFile)
{
    // Refuse to overwrite an existing output file
    if (File.Exists(outFile))
        throw new IOException("Output file already exists: " + outFile);

    Document document = new Document();
    using (FileStream newFileStream = new FileStream(outFile, FileMode.Create))
    {
        // PdfCopy writes the pages added to it into the output stream
        PdfCopy writer = new PdfCopy(document, newFileStream);
        document.Open();
        foreach (string fileName in fileNames)
        {
            PdfReader reader = new PdfReader(fileName);
            // Copy every page of the source file (page numbers start at 1)
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                PdfImportedPage page = writer.GetImportedPage(reader, i);
                writer.AddPage(page);
            }
            reader.Close();
        }
        writer.Close();
        document.Close();
    }
}
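The method above throws when the output file already exists, which stops the whole run on a second execution. One possible alternative, sketched here as an assumption rather than as part of the original design, is to pick a free filename instead; OutputNaming and NextAvailablePath are hypothetical names, and the " (1)", " (2)" suffix scheme is arbitrary:

```csharp
using System.IO;

public static class OutputNaming
{
    // Hypothetical helper: returns the path unchanged if no file exists
    // there, otherwise the first free variant such as "name (1).pdf".
    public static string NextAvailablePath(string path)
    {
        if (!File.Exists(path))
            return path;

        string dir = Path.GetDirectoryName(path) ?? "";
        string stem = Path.GetFileNameWithoutExtension(path);
        string ext = Path.GetExtension(path);
        for (int i = 1; ; i++)
        {
            string candidate = Path.Combine(dir, stem + " (" + i + ")" + ext);
            if (!File.Exists(candidate))
                return candidate;
        }
    }
}
```

A caller could pass OutputNaming.NextAvailablePath(file2) as the outFile argument. The check and the later file creation are not atomic, so this suits a single-user tool rather than concurrent runs.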
After processing all files, the merged PDFs are saved in the output folder, each named after the first file in its group. A success message is printed in the console upon completion.
Complete Code
using iTextSharp.text;
using iTextSharp.text.pdf;
using System;
using System.Collections.Generic;
using System.IO;

namespace PDFMerger
{
    internal class Program
    {
        static void Main(string[] args)
        {
            string folderPath = "C:\\Users\\ersan\\Downloads\\Form10BE_AAPAM3947A_2023_238914720240524_1";
            string outputFolder = "C:\\Users\\ersan\\Downloads\\output2\\";

            var files = Directory.GetFiles(folderPath);
            Array.Sort(files, (a, b) => string.Compare(Path.GetFileName(a), Path.GetFileName(b)));
            if (files.Length == 0)
            {
                Console.WriteLine("No files found to merge.");
                return;
            }

            string lastName = "";
            // Each merged file is named after the first file in its group
            var file2 = outputFolder + Path.GetFileName(files[0]);
            List<string> list = new List<string>();
            foreach (var file in files)
            {
                var fileName = Path.GetFileNameWithoutExtension(file);
                var splitName = fileName.Split('_');
                var name = splitName[1];
                // The second segment changed: merge the group collected so far
                if (!string.IsNullOrEmpty(lastName) && lastName.ToLower() != name.ToLower())
                {
                    CombineMultiplePDFs(list.ToArray(), file2);
                    file2 = outputFolder + Path.GetFileName(file);
                    list = new List<string>();
                }
                list.Add(file);
                lastName = name;
            }
            // Merge the final group, which the loop itself never flushes
            if (list.Count > 0)
                CombineMultiplePDFs(list.ToArray(), file2);

            // Print the success message
            Console.WriteLine("PDF files merged successfully!");
        }

        public static void CombineMultiplePDFs(string[] fileNames, string outFile)
        {
            // Refuse to overwrite an existing output file
            if (File.Exists(outFile))
                throw new IOException("Output file already exists: " + outFile);

            // step 1: creation of a document-object
            Document document = new Document();
            // create newFileStream object which will be disposed at the end
            using (FileStream newFileStream = new FileStream(outFile, FileMode.Create))
            {
                // step 2: we create a writer that listens to the document
                PdfCopy writer = new PdfCopy(document, newFileStream);
                // step 3: we open the document
                document.Open();
                foreach (string fileName in fileNames)
                {
                    // we create a reader for a certain document
                    PdfReader reader = new PdfReader(fileName);
                    reader.ConsolidateNamedDestinations();
                    // step 4: we add content, one page at a time (pages start at 1)
                    for (int i = 1; i <= reader.NumberOfPages; i++)
                    {
                        PdfImportedPage page = writer.GetImportedPage(reader, i);
                        writer.AddPage(page);
                    }
                    reader.Close();
                }
                writer.Close();
                document.Close();
            }
        }
    }
}
Final Thoughts
This solution provides a straightforward approach to programmatically merge PDF files based on a specific naming convention using C#. By leveraging libraries like iTextSharp (via PdfCopy), you can efficiently handle and consolidate PDF documents based on your application's requirements.
This approach can be extended or customized further to meet additional criteria or integrate with other functionalities as needed.
For more details, you can refer to the complete code and explanation above.