Capturing and Analyzing QR Codes in an Email: Part 1
Hi,
Today, we will discuss how to detect QR codes in an email and search for URLs within its text. Finally, we will analyze whether the URLs are malicious or not. In this part, we will focus only on the email body.
First, we will create a Console Application with .NET 9.0.
Part 1: Read And Separate an Email as Body and Attachment
We will use several libraries in this application. First, we will use the “EAGetMail” library from NuGet for parsing emails.
The following code is used to read and process an email file in `.eml` format. Once the file path is confirmed to be valid, the content of the `.eml` file is read asynchronously into a byte array. This approach ensures that the program can handle the file efficiently without blocking other tasks. The content of the file is then loaded into an email object, allowing access to the email’s data, such as its subject, body, sender, and attachments. Later, the email object will be used to parse, analyze, or manipulate email data programmatically.
string emlFilePath = @"C:\mails\test.eml";
if (string.IsNullOrEmpty(emlFilePath) || !File.Exists(emlFilePath))
{
Console.WriteLine("Invalid file path. Exiting.");
return;
}
Mail email = new Mail("TryIt");
byte[] emlContent = await File.ReadAllBytesAsync(emlFilePath);
email.Load(emlContent);
After that, we can get the attachments and the HTML body of the email like this.
var allAttachment = email.Attachments;
var mailBody = email.HtmlBody;
I’m not an analyzer. I’ve got a son that analyzes everything and everybody. But I don’t analyze people. — Billy Graham
Part 2: Determine if the Email Body is HTML and Use a Headless Browser to Load JavaScript-Generated Images
We will define a method to check if an email’s body is in HTML format, and if it is, we will proceed to extract and process the images from the email.
static bool IsHtml(Mail email)
{
return !string.IsNullOrEmpty(email.HtmlBody);
}
if (IsHtml(email))
{
var imageList = await GetMailImages(email);
.
.
.
}
The next library we will download is “OpenQA.Selenium.Winium”, which we use to open a headless browser and retrieve all images. Then, we will download “WebDriverManager”, which automatically detects the required driver version based on the browser and operating system and downloads the appropriate “chromedriver.exe” into the specified file path.
GetMailImages() Part 1:
We define a method to extract images from an email by first checking if the email’s body contains HTML content. If valid HTML is found, it sets up a headless Chrome browser using WebDriverManager to handle browser automation. We configure various browser settings, such as running in headless mode (no visible UI), disabling pop-ups and downloads, enabling secure browsing, and disabling unnecessary features like plugins and caching. These settings ensure that the browser runs efficiently and securely in the background while processing the HTML content and extracting images from the email.
If the HTML body is not opened in a browser, we cannot access QR codes that are rendered with JavaScript.
public static async Task<List<string>> GetMailImages(Mail email)
{
try
{
// Get HTML content from EML file
string htmlContent = email.HtmlBody;
if (string.IsNullOrEmpty(htmlContent))
{
throw new Exception("HTML content is empty or invalid.");
}
// Install ChromeDriver with WebDriverManager
new DriverManager("C://custom//chromedriver//location").SetUpDriver(new ChromeConfig());
// Start ChromeDriver
var options = new ChromeOptions();
options.AddArgument("--headless"); // Run browser in background
options.AddArgument("--disable-gpu");
//options.AddArgument("--no-sandbox");
//Automatic Download Blocking
options.AddUserProfilePreference("download.prompt_for_download", false); // Do not prompt before downloading.
options.AddUserProfilePreference("download.default_directory", ""); // Leaves the default download directory empty.
options.AddUserProfilePreference("profile.default_content_settings.popups", 0); // Blocks pop-ups.
options.AddUserProfilePreference("safebrowsing.enabled", true); // Safe Browsing is on.
options.AddUserProfilePreference("profile.default_content_settings.automatic_downloads", 2); // Prevent automatic downloading (1 = allow, 2 = block).
//Extra Security
options.AddArgument("--incognito"); // Starts in incognito mode.
options.AddArgument("--disable-dev-shm-usage"); // Avoids using /dev/shm for shared memory (Linux).
options.AddArgument("--disable-web-security"); // Caution: this turns off the same-origin policy, so it loosens security rather than tightening it.
// Deny permission prompts and block pages from opening new windows or tabs.
options.AddArgument("--deny-permission-prompts");
options.AddArgument("--block-new-web-contents");
// Script and Plugin Setup
options.AddArgument("--disable-plugins"); // Disable plugins.
// 2 = block JavaScript. Note: QR codes generated by scripts will not render unless this is set to 1 (allow).
options.AddUserProfilePreference("profile.managed_default_content_settings.javascript", 2);
// Data Storage and Cache Settings
options.AddArgument("--disable-cache"); // Disable cache
options.AddArgument("--disable-application-cache"); // Disable app cache
options.AddArgument("--disable-logging"); // Turn off Chrome logging
// Security Policies
options.AddArgument("--disable-popup-blocking"); // Turn off popup blocking
options.AddArgument("--disable-extensions"); // Disable extensions
options.AddArgument("--disable-blink-features=AutomationControlled"); // Hide the automation flag that sites use for bot detection
options.AddArgument("--disable-features=IsolateOrigins,site-per-process"); // Disable site isolation
.
.
.
GetMailImages() Part 2:
We create a new `ChromeDriver` instance with the specified options and set timeouts for page loading and element searching. We then write the HTML content to a uniquely named temporary file and navigate the browser to that file. Finally, `WebDriverWait` pauses execution until at least one image element is present on the page, or until the 10-second timeout expires.
using (var driver = new ChromeDriver(options))
{
string tempHtmlPath = string.Empty;
try
{
driver.Manage().Timeouts().PageLoad = TimeSpan.FromSeconds(15); // Page load time
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(10); // Wait for elements to be found
// Write HTML content to a file
string uniqueFileName = Guid.NewGuid().ToString() + ".html";
tempHtmlPath = Path.Combine(Path.GetTempPath(), uniqueFileName);
//string tempHtmlPath = Path.Combine(Path.GetTempPath(), "tempEmail.html");
await File.WriteAllTextAsync(tempHtmlPath, htmlContent);
// Load the HTML file in the browser (Uri builds a valid file:/// URL from the local path)
driver.Navigate().GoToUrl(new Uri(tempHtmlPath).AbsoluteUri);
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
wait.Until(d => d.FindElements(By.TagName("img")).Any()); // Wait until at least one image element is present
.
.
.
GetMailImages() Part 3:
We extract all image sources (`src`) from the HTML page by running JavaScript through `IJavaScriptExecutor`. The script collects the sources of all `<img>` tags; we then convert them into a distinct list of strings and return that list. If a timeout or error occurs, we log the error message and return an empty list. Finally, we make sure that the temporary HTML file is deleted and the browser is closed so that resources are released.
// Use JavaScript to extract all image sources
var allImages = ((IJavaScriptExecutor)driver).ExecuteScript(
"return Array.from(document.querySelectorAll('img')).map(img => img.src);"
) as IReadOnlyCollection<object>;
File.Delete(tempHtmlPath);
// Convert to a list of strings
return allImages?.Select(img => img?.ToString())
.Where(img => img != null)
.Distinct()
.Cast<string>()
.ToList() ?? new List<string>();
}
catch (WebDriverTimeoutException)
{
Console.WriteLine("Timeout: No image elements found in the HTML body.");
return new List<string>(); // Return empty list
}
finally
{
if (File.Exists(tempHtmlPath))
File.Delete(tempHtmlPath);
// Close browser
driver.Quit();
}
}
}
catch (Exception ex)
{
Console.WriteLine($"Error extracting images from HTML: {ex.Message}");
return new List<string>();
}
}
So far, we have divided an email into pieces. If the body is HTML, we made sure that all images were loaded and collected all image sources.
Physicists analyze systems. Web scientists, however, can create the systems. — Tim Berners-Lee
Part 3: Scanning the Received Images, Identifying Those with QR Codes, and Finding Those with URL Text
So far, we have checked whether the email body is HTML and collected all images from the body with the GetMailImages() method. Next, we will scan each image to detect and decode QR codes with the ScanBarcode() method.
if (IsHtml(email))
{
var imageList = await GetMailImages(email);
foreach (string imgSrc in imageList)
{
string[]? qrCodeContentList = ScanBarcode(imgSrc);
.
.
.
We will download the ZXing.Net library for detecting and scanning QR codes.
We will also download the free Svg library for working with SVG files.
ScanBarcode():
Here, we will take the image, format it according to its type, send it to the BarcodeReader library, and process it if it contains a QR code or URL text. Each image format is treated as a separate case, and we will handle each case individually.
Case One SVG Image:
1-) BarcodeReader(): An instance of the “BarcodeReader” class is created, which is used to decode barcodes or QR codes. It is part of the free ZXing library.
2-) Base64 Image Input: If the provided “imagePath” is a Base64-encoded image (indicated by a `data:` URI), the code extracts the Base64 payload and converts it into a byte array to send to the “barcodeReader”.
3-) Check for SVG Images: If the image is an SVG (indicated by “image/svg+xml”), it processes the byte array as an SVG document.
4-) Convert SVG to Bitmap: The SVG content is loaded into a “SvgDocument” and converted into a “Bitmap” using the “Draw” method, which creates a rasterized version of the SVG.
5-) Decoding QR Codes: To decode multiple barcodes or QR codes from the “Bitmap”, we use the ZXing “BarcodeReader”. If any QR codes are found in the image, it returns their textual representations; otherwise, it outputs a message saying no QR codes were detected.
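Step 2 above assumes the `data:` URI always carries a Base64 payload. Inline SVG images are sometimes embedded without the `;base64` marker, in which case the payload is percent-encoded text. The following is a minimal sketch of a more defensive parser; `DataUriParser` and `TryParse` are hypothetical names that are not part of the original project, and the sketch only illustrates the idea rather than replacing the author's code.

```csharp
using System;
using System.Text;

public static class DataUriParser
{
    // Splits a data: URI into its MIME type and decoded bytes.
    // Returns false when the input is not a usable data: URI.
    public static bool TryParse(string dataUri, out string mimeType, out byte[] bytes)
    {
        mimeType = string.Empty;
        bytes = Array.Empty<byte>();

        if (string.IsNullOrEmpty(dataUri) ||
            !dataUri.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        int comma = dataUri.IndexOf(',');
        if (comma < 0)
        {
            return false;
        }

        // The header looks like "image/png;base64" or "image/svg+xml;utf8"
        string header = dataUri.Substring(5, comma - 5);
        string payload = dataUri.Substring(comma + 1);

        string[] parts = header.Split(';');
        mimeType = parts[0].Trim().ToLowerInvariant();
        bool isBase64 = Array.Exists(
            parts, p => p.Trim().Equals("base64", StringComparison.OrdinalIgnoreCase));

        try
        {
            // Base64 payloads are decoded; anything else is treated as percent-encoded text
            bytes = isBase64
                ? Convert.FromBase64String(payload)
                : Encoding.UTF8.GetBytes(Uri.UnescapeDataString(payload));
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```

With a helper like this, the SVG check can compare the returned MIME type with "image/svg+xml" instead of searching the whole string, and the resulting bytes can be passed to a `MemoryStream` exactly as in the steps above.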
Case Two PNG, JPEG, etc. Image:
If the image is another type, such as “.png” or “.jpeg”, we convert it to a bitmap and send it to the “barcodeReader” to detect any codes and get their textual representations.
All are lunatics, but he who can analyze his delusions is called a philosopher. — Ambrose Bierce
Case Three URL Image:
If the `imagePath` is a valid URL, this part of the code runs. We check whether the provided string is a properly formed absolute URI. If it is, we download the image from the URL using an `HttpClient`. The downloaded image is then loaded into a `MemoryStream`, converted into a `Bitmap`, and processed using the barcode reader. The barcode reader tries to decode multiple barcodes or QR codes from the image. If any are found, their text values are returned as an array. Otherwise, a message is printed saying no barcodes or QR codes were detected, and the method returns null.
Case Four Local File Image:
If the `imagePath` is a local file path (not a URL), we load the image directly from the file using `Image.FromFile`, which creates a `Bitmap` object. Then we pass the image to the `barcodeReader` to attempt decoding multiple barcodes or QR codes. If any codes are found, their text values are returned as an array. If no barcodes or QR codes are detected, we print the “No QR code or barcode detected in the image.” message and return null.
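The branching across these cases can be made explicit with a small classifier that runs before any decoding. This is only a sketch under assumptions: `ImageSourceKind` and `ImageSourceClassifier` are hypothetical names introduced here for illustration. It also restricts the "URL" case to `http` and `https`, because a `file:` URI is an absolute URI too but cannot be fetched with `HttpClient`.

```csharp
using System;

public enum ImageSourceKind
{
    DataUri,    // inline image, e.g. data:image/png;base64,...
    RemoteUrl,  // http or https address that must be downloaded
    LocalFile   // anything else is treated as a local path
}

public static class ImageSourceClassifier
{
    public static ImageSourceKind Classify(string src)
    {
        if (src.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
        {
            return ImageSourceKind.DataUri;
        }

        // Only http/https count as remote; file: URIs fall through to LocalFile
        if (Uri.TryCreate(src, UriKind.Absolute, out var uri) &&
            (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            return ImageSourceKind.RemoteUrl;
        }

        return ImageSourceKind.LocalFile;
    }
}
```

A `switch` on the returned value keeps each case in its own branch and makes it easy to skip or log remote images before anything is downloaded.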
barcodeReader.DecodeMultiple(bitmap): This method returns every barcode or QR code found in an image.
barcodeReader.Decode(bitmap): This method returns only the first barcode or QR code found in an image.
ScanBarcode() (full listing):
static string[]? ScanBarcode(string imagePath)
//static string ScanBarcode(string imagePath, byte[] base64Image = null)
{
try
{
// Barcode reader instance
var barcodeReader = new BarcodeReader();
// Check if the imagePath is a 'data:' URI (Base64 encoded image)
if (imagePath.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
{
// Extract Base64 string after "data:image/...;base64,"
string base64Data = imagePath.Substring(imagePath.IndexOf(",") + 1);
// Convert the Base64 string to byte array
byte[] imageBytes = Convert.FromBase64String(base64Data);
// Check if the data is SVG
if (imagePath.Contains("image/svg+xml"))
{
// Load the SVG content from the byte array
using (var memoryStream = new MemoryStream(imageBytes))
{
var svgDocument = SvgDocument.Open<SvgDocument>(memoryStream);
// Convert the SVG to a Bitmap
using (var bitmap = svgDocument.Draw())
{
// Now you can scan the barcode from the bitmap
//var result = barcodeReader.Decode(bitmap);
var result = barcodeReader.DecodeMultiple(bitmap);
if (result != null && result.Length > 0)
{
return result.Select(r => r.Text).ToArray();
}
else
{
Console.WriteLine("No QR code or barcode detected in the image.");
return null;
}
}
}
}
else
{
// If not SVG, treat it as another image type (e.g., PNG, JPEG)
using (var memoryStream = new MemoryStream(imageBytes))
{
using (var bitmap = new Bitmap(memoryStream))
{
var result = barcodeReader.DecodeMultiple(bitmap);
if (result != null && result.Length > 0)
{
return result.Select(r => r.Text).ToArray();
}
else
{
Console.WriteLine("No QR code or barcode detected in the image.");
return null;
}
}
}
}
}
else if (Uri.IsWellFormedUriString(imagePath, UriKind.Absolute))
{
// If it's a regular URL, download it
using (var httpClient = new HttpClient())
{
byte[] imageBytes = httpClient.GetByteArrayAsync(imagePath).Result;
using (var memoryStream = new MemoryStream(imageBytes))
{
// Load image from MemoryStream
using (var bitmap = new Bitmap(memoryStream))
{
var result = barcodeReader.DecodeMultiple(bitmap);
if (result != null && result.Length > 0)
{
return result.Select(r => r.Text).ToArray();
}
else
{
Console.WriteLine("No QR code or barcode detected in the image.");
return null;
}
}
}
}
}
else
{
// If it's a local file path, load it directly
using (var bitmap = (Bitmap)Image.FromFile(imagePath))
{
var result = barcodeReader.DecodeMultiple(bitmap);
if (result != null && result.Length > 0)
{
return result.Select(r => r.Text).ToArray();
}
else
{
Console.WriteLine("No QR code or barcode detected in the image.");
return null;
}
}
}
}
catch (System.FormatException ex)
{
Console.WriteLine($"Error decoding Base64 image data: {ex.Message}");
return null;
}
catch (FileNotFoundException)
{
Console.WriteLine("Error: The specified image file was not found.");
return null;
}
catch (OutOfMemoryException)
{
Console.WriteLine("Error: The file is not a valid image or is too large.");
return null;
}
catch (Exception ex)
{
Console.WriteLine($"Unexpected error scanning barcode: {ex.Message}");
return null;
}
}
Part 4: Extracting Unique URLs from QR Code Text
We got all images from the email body using the “GetMailImages()” method. Then, we scanned all images to detect and decode QR codes using the “ScanBarcode()” method. Now that we have all the QR code texts, we will check whether each text contains a URL.
// Assumed declaration: a HashSet, so Add() returns false for duplicates
var uniqueUrls = new HashSet<string>();
foreach (string imgSrc in imageList)
{
string[]? qrCodeContentList = ScanBarcode(imgSrc);
if (qrCodeContentList != null && qrCodeContentList.Length > 0)
{
foreach (string qrCodeContent in qrCodeContentList)
{
string? extractedUrl = ExtractUrl(qrCodeContent);
if (!string.IsNullOrEmpty(extractedUrl) && uniqueUrls.Add(extractedUrl))
{
Console.WriteLine($"Unique URL Found in HTML: {extractedUrl}");
}
}
}
}
ExtractUrl(): We will use a regular expression to detect a URL in the QR code text.
static string? ExtractUrl(string text)
{
var match = Regex.Match(text, @"https?://[\w./?=&%-]+", RegexOptions.IgnoreCase);
return match.Success ? match.Value : null;
}
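The character class in the pattern above does not include `:` or `#`, so a QR code containing a port number or a fragment would be cut short at that character. As one possible refinement, the sketch below matches any run of non-whitespace characters and then validates each candidate with `Uri.TryCreate`. The `UrlExtractor` class and its `ExtractAll` method are hypothetical names that are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // http or https, followed by characters that are not whitespace, quotes, or angle brackets
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^\s""'<>]+", RegexOptions.IgnoreCase);

    // Returns every distinct, well-formed http/https URL found in the text
    public static List<string> ExtractAll(string text)
    {
        var urls = new List<string>();
        var seen = new HashSet<string>();
        if (string.IsNullOrWhiteSpace(text))
        {
            return urls;
        }

        foreach (Match match in UrlPattern.Matches(text))
        {
            // Drop punctuation that often trails a URL inside a sentence
            string candidate = match.Value.TrimEnd('.', ',', ';', ')', ']');

            if (Uri.TryCreate(candidate, UriKind.Absolute, out var uri) &&
                (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps) &&
                seen.Add(uri.AbsoluteUri))
            {
                urls.Add(uri.AbsoluteUri);
            }
        }

        return urls;
    }
}
```

Returning a list also covers QR codes whose text contains more than one link, which `Regex.Match` alone would miss because it stops at the first match.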
Result Screen:
// Print all unique URLs
Console.WriteLine("\nAll Unique URLs:");
foreach (string url in uniqueUrls)
{
Console.WriteLine(url);
}
Conclusion:
In conclusion, we have successfully developed a process to extract and analyze QR codes from email bodies. By leveraging various libraries and tools, we have efficiently parsed email content, identified and scanned QR codes, and extracted any URLs contained within them. In the next part, we will expand our analysis to include attachments, such as PDF and Word documents. If these attachments contain QR codes, they will also be scanned and analyzed. Stay tuned as we continue to enhance our process for comprehensive email and attachment analysis.
See you in the next article.
“If you have read this far, first of all, thank you for your patience and support. I welcome all of you to my blog for more!”
GitHub: I will share the GitHub repo at the end of the next part.