Capturing and Analyzing QR Codes from Image, PDF and Word Email Attachments — Part 2

Bora Kaşmer

Hi,

In Part 1, we analyzed QR codes found in the body of an email and in attached JPEG files. In this part, we focus on email attachments: we will detect QR codes in PDF and Word files and search for URLs within their text. Finally, we will analyze whether the URLs are malicious.

Part 1: Separate an Email as Attachment

As in Part 1, we will use the “EAGetMail” library from NuGet to parse emails.

“After loading the .eml file of the email with the EAGetMail library, we assign its attachments to a variable.”

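// Path to the .eml file saved on disk
string emlFilePath = @"C:\mails\output_email_with_attachments3.eml";
if (string.IsNullOrEmpty(emlFilePath) || !File.Exists(emlFilePath))
{
    Console.WriteLine("Invalid file path. Exiting.");
    return;
}

// "TryIt" is the EAGetMail trial license code
Mail email = new Mail("TryIt");
byte[] emlContent = await File.ReadAllBytesAsync(emlFilePath);

// Parse the raw .eml bytes into the Mail object
email.Load(emlContent);

// Keep a reference to the attachments of the email
var allAttachment = email.Attachments;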

Part 2: Loop in All Attachments

In the code below, we identify QR codes within email attachments and extract any URLs found in them. We loop through each attachment, generate a unique file name for it, and temporarily save it to local storage. If the file has no extension, we set it to “.jpg”: when you send an email from Outlook with an image in the body, Outlook references the image from the body by its CID (Content-ID) and adds it as an attachment without any extension. We then check whether the file is an image by examining its extension. If it is, we use the barcode scanner discussed in Part 1 to detect any QR codes.

Once QR codes are found, we extract the URLs they contain. To make sure we’re not adding the same URL multiple times, we store them in a set that only keeps unique entries.

This part of the code helps us efficiently find QR codes in image attachments from emails and collect the URLs they contain.
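The deduplication step can be sketched on its own. The snippet below is a minimal sketch, not the original implementation: `UrlCollector`, `AddUrls`, and the regular expression are hypothetical names and choices made for illustration, and the `ExtractUrl` here is only a stand-in for the method from Part 1.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlCollector
{
    // Hypothetical pattern: an http(s) URL up to the next whitespace, quote or angle bracket.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^\s""'<>]+", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Stand-in for the article's ExtractUrl(); returns an empty string when no URL is found.
    public static string ExtractUrl(string? text)
    {
        if (string.IsNullOrWhiteSpace(text)) return string.Empty;
        Match match = UrlPattern.Match(text);
        return match.Success ? match.Value : string.Empty;
    }

    // Adds every newly seen URL to the set and returns how many were new.
    public static int AddUrls(IEnumerable<string> qrContents, HashSet<string> uniqueUrls)
    {
        int added = 0;
        foreach (string content in qrContents)
        {
            string url = ExtractUrl(content);
            // HashSet<T>.Add returns false for a duplicate, so no separate Contains() check is needed.
            if (url.Length > 0 && uniqueUrls.Add(url)) added++;
        }
        return added;
    }
}
```

Calling `UrlCollector.AddUrls(qrCodeContentList, uniqueUrls)` for every scanned image keeps one shared set across all attachments. The default `HashSet<string>` comparison is case-sensitive, which is usually what we want because URL paths can be case-sensitive.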

Incorrect documentation is often worse than no documentation. — Bertrand Meyer

// Check mail attachments for QR codes
foreach (Attachment attachment in email.Attachments)
{
    string tempFile = string.Empty;
    try
    {
        // Read the extension; ToLowerInvariant() avoids culture-specific casing (e.g. the Turkish "I")
        string extension = Path.GetExtension(attachment.Name).ToLowerInvariant();
        if (string.IsNullOrEmpty(extension))
        {
            // If no extension is found, assume the file is an image (default to .jpg)
            extension = ".jpg";
        }

        // Generate a GUID-based unique file name
        string uniqueFileName = Guid.NewGuid().ToString() + extension;
        tempFile = Path.Combine(Path.GetTempPath(), uniqueFileName);

        // Save the attachment with the unique file name
        attachment.SaveAs(tempFile, true);
        Console.WriteLine($"Attachment saved: {tempFile}");

        // Now check if the saved file is an image
        if (IsImageFile(tempFile))
        {
            string[]? qrCodeContentList = ScanBarcode(tempFile);
            if (qrCodeContentList != null && qrCodeContentList.Length > 0)
            {
                foreach (string qrCodeContent in qrCodeContentList)
                {
                    string extractedUrl = ExtractUrl(qrCodeContent);
                    // Add() returns false for duplicates, so each URL is reported once
                    if (!string.IsNullOrEmpty(extractedUrl) && uniqueUrls.Add(extractedUrl))
                    {
                        Console.WriteLine($"Unique URL Found: {extractedUrl}");
                    }
                }
            }
        }
        // The snippet continues with the PDF branch shown later in the article.

IsImageFile(): This method checks whether a file is an image based on its file extension. It first checks whether filePath is null or empty and, if so, returns “false” because the path is not valid. It then compares the lowercased extension against an array of common image extensions: “.jpg”, “.jpeg”, “.png”, “.bmp”, and “.gif”. If the extension matches one of them, the method returns “true”; otherwise (including when the file has no extension) it returns “false”.

static bool IsImageFile(string filePath)
{
    // A null or empty path cannot be an image
    if (string.IsNullOrEmpty(filePath)) return false;

    string[] validExtensions = { ".jpg", ".jpeg", ".png", ".bmp", ".gif" };
    // ToLowerInvariant() keeps ".GIF" matching ".gif" regardless of the current culture
    string? extension = Path.GetExtension(filePath)?.ToLowerInvariant();

    // Ensure the file has an extension
    if (string.IsNullOrEmpty(extension)) return false;

    return Array.Exists(validExtensions, ext => ext == extension);
}
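Because attachments without an extension are saved as “.jpg”, an extension check alone cannot tell whether the bytes are really an image. As an optional extra check that is not part of the original code, we could also look at the first bytes of the file. The sketch below assumes the well-known signatures of JPEG, PNG, GIF and BMP files; `ImageSignature`, `LooksLikeImage`, and `FileLooksLikeImage` are hypothetical names.

```csharp
using System;
using System.IO;

public static class ImageSignature
{
    // Leading bytes ("magic numbers") of the formats that IsImageFile() accepts.
    private static readonly byte[][] Signatures =
    {
        new byte[] { 0xFF, 0xD8, 0xFF },                               // JPEG
        new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, // PNG
        new byte[] { 0x47, 0x49, 0x46, 0x38 },                         // GIF ("GIF8")
        new byte[] { 0x42, 0x4D },                                     // BMP ("BM")
    };

    // Returns true when the buffer starts with one of the known signatures.
    public static bool LooksLikeImage(byte[] header)
    {
        if (header == null) return false;
        foreach (byte[] signature in Signatures)
        {
            if (header.Length >= signature.Length &&
                header.AsSpan(0, signature.Length).SequenceEqual(signature))
            {
                return true;
            }
        }
        return false;
    }

    // Reads the first eight bytes of a file and checks them against the signatures.
    public static bool FileLooksLikeImage(string filePath)
    {
        if (string.IsNullOrEmpty(filePath) || !File.Exists(filePath)) return false;
        byte[] header = new byte[8];
        using FileStream stream = File.OpenRead(filePath);
        int read = stream.Read(header, 0, header.Length);
        return LooksLikeImage(header.AsSpan(0, read).ToArray());
    }
}
```

A possible use is `IsImageFile(tempFile) && ImageSignature.FileLooksLikeImage(tempFile)`. The two-byte BMP signature is short, so a non-image file that happens to start with “BM” would still pass; treat this as a cheap filter, not a validation.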

Part 3: Find QR Codes in PDF Files

If the attachment is a PDF, we call the method below. I tested many PDF tools for .NET, and in my tests only “Spire.PDF” worked perfectly.

Install “Spire.PDF” from NuGet.

ProcessPdfAttachment():

We load the PDF document, generate a GUID to use as the base file name, and loop through all the pages.

We render every page as an image and save it to temporary storage as a PNG file with a unique name (the GUID plus the page number).

We scan each page image using the ScanBarcode() method to detect whether it contains a QR code. Next, we extract the URL if one exists and, finally, add it to the `uniqueUrls` set if it is not already present.

Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing.

— Dick Brandon

Every page image must be deleted after the QR code detection is complete. An error on one page should not interrupt the whole operation, so each page is processed in its own “try” block and the file is deleted in the “finally” sections, which run whether or not an error occurred.
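The cleanup rule can be isolated into a small helper. This is a minimal sketch under stated assumptions: `TempFileScope` and `TryWithTempFile` are hypothetical names that do not appear in the original project, and the helper only illustrates the try/catch/finally shape used by the methods in this article.

```csharp
using System;
using System.IO;

public static class TempFileScope
{
    // Runs 'work' with a fresh temp path and always deletes the file afterwards.
    // Returns false (after logging) when 'work' throws, so a caller's loop can continue.
    public static bool TryWithTempFile(string extension, Action<string> work)
    {
        string tempPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() + extension);
        try
        {
            work(tempPath);
            return true;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error processing {tempPath}: {ex.Message}");
            return false;
        }
        finally
        {
            // File.Delete does not throw when the file is already gone.
            File.Delete(tempPath);
        }
    }
}
```

With a helper like this, each page or image could be handled as `TempFileScope.TryWithTempFile(".png", path => { /* save, scan, collect URLs */ });`, and the temporary file disappears even when the scanner throws.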

ProcessPdfAttachment(Full):

static void ProcessPdfAttachment(string tempFile, HashSet<string> uniqueUrls)
{
    string tempPDFImageFile = string.Empty;
    try
    {
        using (PdfDocument pdf = new PdfDocument())
        {
            // Load the PDF document
            pdf.LoadFromFile(tempFile);
            // One GUID per PDF; the page number is appended for each page image
            string uniquePDFimageName = Guid.NewGuid().ToString();
            // Loop through each page in the PDF
            for (int i = 0; i < pdf.Pages.Count; i++)
            {
                try
                {
                    // Generate a unique file name for each page image
                    string uniqueImageName = $"{uniquePDFimageName}-{i + 1}.png";
                    tempPDFImageFile = Path.Combine(Path.GetTempPath(), uniqueImageName);

                    // Render the page at 300 DPI and dispose the image once it is saved
                    using (Image image = pdf.SaveAsImage(i, PdfImageType.Bitmap, 300, 300))
                    {
                        image.Save(tempPDFImageFile, ImageFormat.Png);
                    }

                    Console.WriteLine($"Saved PDF page as image: {tempPDFImageFile}");

                    string[]? qrCodeContentList = ScanBarcode(tempPDFImageFile);
                    if (qrCodeContentList != null && qrCodeContentList.Length > 0)
                    {
                        foreach (string qrCodeContent in qrCodeContentList)
                        {
                            string extractedUrl = ExtractUrl(qrCodeContent);
                            if (!string.IsNullOrEmpty(extractedUrl) && uniqueUrls.Add(extractedUrl))
                            {
                                Console.WriteLine($"Unique URL Found: {extractedUrl}");
                            }
                        }
                    }
                }
                catch (Exception ex)
                {
                    // Log the error and continue with the next page
                    Console.WriteLine($"Error processing PDF page {i + 1}: {ex.Message}");
                }
                finally
                {
                    // Always remove the page image, whether or not scanning succeeded
                    if (File.Exists(tempPDFImageFile))
                        File.Delete(tempPDFImageFile);
                }
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error processing PDF attachment: {ex.Message}");
    }
    finally
    {
        // Safety net in case a page image is still on disk
        if (File.Exists(tempPDFImageFile))
            File.Delete(tempPDFImageFile);
    }
}

We call “ProcessPdfAttachment()” when the extension of an email attachment is “.pdf”.

else if (extension.Trim() == ".pdf")
{
// Call method for PDF processing
ProcessPdfAttachment(tempFile, uniqueUrls);
}
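The branching by extension can also be written as one small classifier. The sketch below is an illustration, not the original code: `AttachmentKind` and `AttachmentClassifier` are hypothetical names, and mapping “.docx” to the Word branch is an assumption, since the article does not show which extension triggers it.

```csharp
using System;
using System.IO;

public enum AttachmentKind { Image, Pdf, Word, Unsupported }

public static class AttachmentClassifier
{
    // Maps an attachment file name to the processing branch it should take.
    public static AttachmentKind Classify(string? fileName)
    {
        string extension = Path.GetExtension(fileName ?? string.Empty).Trim().ToLowerInvariant();
        switch (extension)
        {
            case "":      // no extension: treated as an image, as the article does for Outlook body images
            case ".jpg":
            case ".jpeg":
            case ".png":
            case ".bmp":
            case ".gif":
                return AttachmentKind.Image;
            case ".pdf":
                return AttachmentKind.Pdf;
            case ".docx": // assumption: the Word branch is selected by this extension
                return AttachmentKind.Word;
            default:
                return AttachmentKind.Unsupported;
        }
    }
}
```

The loop could then `switch` on `AttachmentClassifier.Classify(attachment.Name)` and call ScanBarcode(), ProcessPdfAttachment() or ProcessWordAttachment() accordingly, which keeps the extension rules in one place.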

Part 4: Find QR Codes in Word Document

If the attachment is a Word document, we call the method below. We use the “DocumentFormat.OpenXml” library, and it works perfectly. Note that it reads Open XML (.docx) files; it cannot open legacy binary .doc files.

Install “DocumentFormat.OpenXml” from NuGet.

Good code is its own best documentation.

— Steve McConnell

ProcessWordAttachment():

We check that the document file exists and then open the Word document.

We retrieve the “MainDocumentPart” and iterate through all the images in the document. Each image is saved to local storage with a unique name.

We scan each of the document’s images using the ScanBarcode() method to detect whether it contains a QR code. Next, we extract the URL if one exists and, finally, add it to the “uniqueUrls” set if it is not already present.

Every image extracted from the Word document must be deleted after the QR code detection is complete. An error on one image should not interrupt the whole operation, so each image is processed in its own “try” block and the file is deleted in the “finally” sections, which run whether or not an error occurred.
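Images embedded in a Word document are not always PNG files, yet every extracted image gets the “.png” extension in the full method. If the barcode scanner relies on the file extension, the name could instead be derived from the image part’s content type. The sketch below is a hedged illustration: `ImageFileExtension` and `FromContentType` are hypothetical names, and only a few common MIME types are mapped.

```csharp
using System;

public static class ImageFileExtension
{
    // Maps an image MIME type to a file extension; unknown types fall back to ".png",
    // which matches the extension hard-coded in the article's method.
    public static string FromContentType(string? contentType)
    {
        switch ((contentType ?? string.Empty).Trim().ToLowerInvariant())
        {
            case "image/jpeg": return ".jpg";
            case "image/png":  return ".png";
            case "image/gif":  return ".gif";
            case "image/bmp":  return ".bmp";
            default:           return ".png";
        }
    }
}
```

Inside the image loop this could be used as `Guid.NewGuid().ToString() + ImageFileExtension.FromContentType(imagePart.ContentType)`, where `ContentType` is the property the Open XML SDK exposes on each part. Word can also embed vector formats such as EMF or WMF, which a barcode scanner may not be able to read at all.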

Documentation is a love letter that you write to your future self. — Damian Conway

ProcessWordAttachment(Full):

static void ProcessWordAttachment(string tempFile, HashSet<string> uniqueUrls)
{
    string tempWordImageFile = string.Empty;
    try
    {
        if (string.IsNullOrEmpty(tempFile) || !File.Exists(tempFile))
        {
            Console.WriteLine("Invalid .docx file path.");
            return;
        }
        // Open the .docx file read-only (the Open XML SDK cannot open legacy binary .doc files)
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(tempFile, false))
        {
            var mainPart = wordDoc.MainDocumentPart;

            // Check for embedded images in the document
            if (mainPart != null && mainPart.ImageParts.Any())
            {
                foreach (var imagePart in mainPart.ImageParts)
                {
                    try
                    {
                        // Save the image to a temporary file for QR code scanning
                        tempWordImageFile = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() + ".png");
                        using (var stream = imagePart.GetStream())
                        using (var fileStream = new FileStream(tempWordImageFile, FileMode.Create, FileAccess.Write))
                        {
                            stream.CopyTo(fileStream);
                        }

                        Console.WriteLine($"Image extracted: {tempWordImageFile}");

                        string[]? qrCodeContentList = ScanBarcode(tempWordImageFile);
                        if (qrCodeContentList != null && qrCodeContentList.Length > 0)
                        {
                            foreach (string qrCodeContent in qrCodeContentList)
                            {
                                string extractedUrl = ExtractUrl(qrCodeContent);
                                if (!string.IsNullOrEmpty(extractedUrl) && uniqueUrls.Add(extractedUrl))
                                {
                                    Console.WriteLine($"Unique URL Found in .docx: {extractedUrl}");
                                }
                            }
                        }
                    }
                    catch (Exception ex)
                    {
                        // Log the error and continue with the next image
                        Console.WriteLine($"Error processing image in .docx: {ex.Message}");
                    }
                    finally
                    {
                        // Always remove the temporary image, whether or not scanning succeeded
                        if (File.Exists(tempWordImageFile))
                            File.Delete(tempWordImageFile);
                    }
                }
            }
            else
            {
                Console.WriteLine("No images found in the .docx file.");
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error processing .docx file: {ex.Message}");
    }
    finally
    {
        // Safety net in case an extracted image is still on disk
        if (File.Exists(tempWordImageFile))
            File.Delete(tempWordImageFile);
    }
}

Conclusion:

This article walked through detecting and analyzing QR codes hidden within email attachments, uncovering the URLs they contain, and evaluating their potential risks. By leveraging powerful libraries and efficient methods, we transformed seemingly simple files into a treasure trove of actionable insights.

The techniques shared here open the door to enhanced email security and deeper content analysis, sparking curiosity about what else might be hiding in plain sight. What secrets are waiting to be uncovered in the next attachment? The journey continues!

Goodbye!

See you in the next article.

“If you have read so far, first of all, thank you for your patience and support. I welcome all of you to my blog for more!”

GitHub:

https://github.com/borakasmer/ScanQRCodeInMail/blob/main/Program.cs
