Digital breadcrumbs: tracking Threat Actors through Favicon hashes

Cyber Threat Intelligence (CTI) is critical for identifying, monitoring, and responding to malicious actors and their infrastructure. Traditionally, CTI has relied on obvious indicators of compromise (IoCs) such as IP addresses, domain names, and malware hashes. However, these indicators quickly lose their value because adversaries can rotate IP addresses, register new domains, and recompile malware at little cost.

As a result, cybersecurity researchers are increasingly turning to less obvious but more persistent digital artifacts, and among these the humble favicon stands out. Though small and seemingly trivial, favicons, together with the way browsers cache them, can be a surprisingly robust tool for CTI.

Favicons: more than just browser candy

What is a Favicon?

A favicon, short for “favorite icon”, is a small file containing an image that represents a website (Wikipedia). It enhances user experience by visually identifying a website among open tabs, bookmarks, and search engine results (MDN Web Docs).

Because they contribute to branding and recognition, favicons tend to stay the same across a site’s pages and over long periods. Threat analysts can exploit this consistency to detect malicious activity such as phishing, or to identify related infrastructure on both the surface web and the dark web.

Favicons are typically implemented using a simple HTML <link> tag within the <head> section:

<link rel="icon" href="/path/to/favicon.ico">
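An automated collector has to locate the icon before it can hash it. The sketch below, which uses only Python's standard library, collects the href of every <link> tag whose rel attribute contains the token "icon" (so both "icon" and "shortcut icon" match), resolves it against the page URL, and falls back to /favicon.ico at the site root when nothing is declared, since most browsers request that path by default. The names find_favicon_urls and _IconLinkParser are illustrative, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _IconLinkParser(HTMLParser):
    """Collects href values from <link> tags whose rel contains the token 'icon'."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated token list, e.g. "icon" or "shortcut icon"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "icon" in rel_tokens and href:
            self.hrefs.append(href)


def find_favicon_urls(page_url: str, html: str) -> list[str]:
    """Return absolute favicon URLs declared in the page, or the /favicon.ico fallback."""
    parser = _IconLinkParser()
    parser.feed(html)
    if parser.hrefs:
        return [urljoin(page_url, href) for href in parser.hrefs]
    # Nothing declared: fall back to the conventional location at the site root
    return [urljoin(page_url, "/favicon.ico")]
```

For example, a login page at https://example.com/login that declares href="/static/fav.png" yields ["https://example.com/static/fav.png"]. Relative hrefs are resolved the same way a browser would resolve them.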

Common formats include ICO, PNG, GIF, JPEG, and SVG (Wix Blog). Their small size and predictable structure make them well suited to automated extraction and analysis, an essential feature for CTI operations.

The role of caching in Favicon distribution

Browsers cache favicons to optimize page load times and reduce bandwidth use.

This caching can both assist and hinder CTI:

It allows persistent identification of sites.
It may result in outdated favicon data if a site changes its icon.

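If a phishing site mimics a legitimate site’s favicon, caching can even help the fraud outlive the attacker’s infrastructure: the spoofed icon may still be shown from the browser’s cache, for example next to a bookmark, after the phishing server has gone offline.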

Turning images into fingerprints: Favicon hashing

Hashing maps input data of any size to a fixed-size value, a fingerprint, usually written as a hexadecimal string or an integer.

Good hashing functions are:

Deterministic
Fast
Hard to reverse (required of cryptographic hashes, but not of fast non-cryptographic ones such as MurmurHash3)
Sensitive to small input changes (CrowdStrike)
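The first and last properties are easy to check. The short example below hashes a stand-in byte string with SHA-256 from Python's hashlib, hashes it again, and then flips a single bit. The input bytes are made up for illustration; a real favicon file behaves the same way.

```python
import hashlib

# Stand-in for favicon bytes; a real icon would be read from disk or the network.
icon = bytes(range(64))

digest_a = hashlib.sha256(icon).hexdigest()
digest_b = hashlib.sha256(icon).hexdigest()

# Flip the lowest bit of the first byte to simulate a tiny edit to the image.
tweaked = bytes([icon[0] ^ 0x01]) + icon[1:]
digest_c = hashlib.sha256(tweaked).hexdigest()

print("deterministic:", digest_a == digest_b)  # same input, same digest
print("changed after 1-bit edit:", digest_a != digest_c)
# Count how many of the 64 hex characters differ between the two digests.
print("differing hex chars:", sum(x != y for x, y in zip(digest_a, digest_c)))
```

The same sensitivity that makes hashes reliable identifiers also makes exact matching easy to evade: re-encoding an icon or changing one pixel produces an unrelated digest.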

In CTI, favicon hashing allows analysts to efficiently index, compare, and search across huge datasets without examining entire images.

How to generate a Favicon hash

The process typically involves:

Retrieving the favicon file.
Reading its raw contents.
Optionally encoding it (e.g., Base64 for Shodan compatibility).
Applying a hashing algorithm (Shodan Blog).
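A minimal sketch of these steps follows. It downloads the icon with urllib.request (the article mentions requests; the standard library is used here only to keep the example dependency-free), reads the raw bytes, and computes MD5, SHA-1 and SHA-256 digests. The Cache-Control: no-cache request header asks intermediate caches for a fresh copy, since cached icons can be stale. The optional Base64 step only matters for Shodan-style MurmurHash3 hashes, so it is omitted. The function names and User-Agent string are hypothetical.

```python
import hashlib
import urllib.request


def fetch_favicon(url: str, timeout: float = 10.0) -> bytes:
    """Steps 1-2: download the favicon and return its raw bytes."""
    request = urllib.request.Request(
        url,
        headers={
            "User-Agent": "favicon-hash-sketch/0.1",  # hypothetical identifier
            "Cache-Control": "no-cache",  # ask caches for a fresh copy
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def digest_favicon(data: bytes) -> dict[str, str]:
    """Step 4: compute common cryptographic digests over the raw bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

Usage: digest_favicon(fetch_favicon("https://example.com/favicon.ico")) returns a dictionary keyed by algorithm name. Hashing the raw bytes, rather than a decoded image, keeps the result reproducible across tools.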

Common hashing algorithms: MD5, SHA-1, SHA-256

Standard cryptographic hashes (like MD5, SHA-1, SHA-256) are suitable for favicon hashing, but speed is often more critical than cryptographic security in this context (MDN Web Docs).

Spotlight on MurmurHash3 (mmh3)

For platforms like Shodan and ZoomEye, MurmurHash3 is preferred. It’s a non-cryptographic, extremely fast hash function that supports quick lookups (Shodan Blog).

Python libraries like mmh3 make these hashes easy to calculate. Note that Shodan hashes the Base64-encoded favicon rather than the raw bytes, so the exact encoding step must be reproduced to obtain matching values.
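The example below reproduces the Shodan-style favicon hash. The raw icon bytes are Base64-encoded with codecs.encode(data, "base64"), which inserts a newline after every 76 output characters, and the encoded bytes are hashed with 32-bit MurmurHash3 (x86 variant, seed 0) returned as a signed integer. In practice you would simply call mmh3.hash(encoded). A pure-Python MurmurHash3 is included only so the sketch runs without third-party packages, and it is intended to return the same values as mmh3.hash. The helper shodan_query is hypothetical and just formats the http.favicon.hash filter that Shodan exposes.

```python
import codecs

_MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & _MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32, returned as a signed 32-bit int (like mmh3.hash)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & _MASK32
    block_end = len(data) - (len(data) % 4)

    # Body: mix each 4-byte little-endian block into the state.
    for i in range(0, block_end, 4):
        k1 = int.from_bytes(data[i:i + 4], "little")
        k1 = _rotl32((k1 * c1) & _MASK32, 15)
        h1 ^= (k1 * c2) & _MASK32
        h1 = (_rotl32(h1, 13) * 5 + 0xE6546B64) & _MASK32

    # Tail: the remaining 1-3 bytes, if any.
    tail = data[block_end:]
    if tail:
        k1 = int.from_bytes(tail, "little")
        k1 = _rotl32((k1 * c1) & _MASK32, 15)
        h1 ^= (k1 * c2) & _MASK32

    # Finalization: fold in the length, then the fmix32 avalanche step.
    h1 ^= len(data) & _MASK32
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & _MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & _MASK32
    h1 ^= h1 >> 16

    # Convert to signed, matching mmh3.hash's default output.
    return h1 - (1 << 32) if h1 & 0x80000000 else h1


def shodan_favicon_hash(icon: bytes) -> int:
    """Base64-encode the raw icon (with 76-character line breaks) and hash it."""
    encoded = codecs.encode(icon, "base64")
    return murmur3_32(encoded)


def shodan_query(icon: bytes) -> str:
    """Format a Shodan search filter for the icon's hash."""
    return f"http.favicon.hash:{shodan_favicon_hash(icon)}"
```

Given the raw bytes of an icon, shodan_query(icon_bytes) returns a string of the form http.favicon.hash:<integer>. Because the hash is computed over the Base64 text, swapping codecs.encode for base64.b64encode (which omits line breaks) would yield a different value that does not match Shodan's index.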

Weaponizing Favicon hashes against phishing attacks

Phishers’ reliance on familiar Favicons

Attackers often reuse favicons from legitimate sites to enhance the credibility of phishing pages (Bolster AI). This predictable behavior can be leveraged to spot malicious sites by favicon similarity.

Building databases of legitimate Favicon hashes

Maintaining an updated database of trusted favicon hashes (e.g., OWASP Favicon Database) allows security teams to rapidly compare and flag anomalies.

Detecting suspicious websites

By comparing the favicon hash of a suspicious site with a trusted database, analysts can spot contextual mismatches (e.g., a domain name that does not match the favicon’s brand) that signal phishing attempts (SANS Internet Storm Center). Several internet-scanning platforms support searching by favicon:

Shodan: Query using http.favicon.hash.
ZoomEye: Use the iconhash field.
Criminal IP: Search by favicon.
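A minimal sketch of the brand-mismatch check: a hypothetical TRUSTED_ICONS table maps known favicon hashes to a brand and its official domains, and any host serving one of those icons from an unlisted domain is flagged. The hash value and domain below are placeholders, not entries from the OWASP database or any other real source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Brand:
    name: str
    official_domains: frozenset[str]


# Hypothetical trusted table: favicon hash -> brand. Values are placeholders.
TRUSTED_ICONS: dict[str, Brand] = {
    "hash-of-examplebank-icon": Brand("ExampleBank", frozenset({"examplebank.com"})),
}


def is_official(host: str, domains: frozenset[str]) -> bool:
    """True if host equals an official domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def check_site(host: str, favicon_hash: str) -> Optional[str]:
    """Return a warning if a known brand's icon is served from an unofficial host."""
    brand = TRUSTED_ICONS.get(favicon_hash)
    if brand is None:
        return None  # icon not in the trusted table: this check gives no verdict
    if is_official(host, brand.official_domains):
        return None
    return f"{host} serves the {brand.name} favicon but is not an official domain"
```

The subdomain test matters: examplebank.com.secure-login.net contains the brand name but does not end in ".examplebank.com", so it is flagged, while login.examplebank.com is not.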

Real-World Case Studies

Researchers have used favicon hashing to uncover phishing campaigns impersonating PayPal and AnyDesk.

In academic research, one approach that combined favicon analysis with DNS and PageRank features reported over 99.5% detection accuracy (ResearchGate).

Linking Dark Web and Clean Web entities

Attribution challenges on the Dark Web

The dark web’s anonymity, especially on the Tor network, complicates the attribution of malicious activities (MDPI).

Favicon hashing as a linkage mechanism

If a dark web (.onion) site shares a favicon with a clean web server, hash matching can reveal connections between hidden and public infrastructures (Silent Push Blog).

Tools like Silent Push Community Edition can facilitate these cross-layer correlations.
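At its core, this linkage is a grouping step. The sketch below takes (host, favicon hash) observations, for example exported from a scanner, groups hosts by hash, and keeps only hashes shared by at least one .onion host and at least one clearnet host. The observation format and function name are assumptions for illustration, and the hostnames, IP address (from a documentation range) and hash values are placeholders.

```python
from collections import defaultdict
from collections.abc import Iterable


def onion_clearnet_overlaps(
    observations: Iterable[tuple[str, int]],
) -> dict[int, dict[str, list[str]]]:
    """Return favicon hashes seen on both .onion and clearnet hosts, with those hosts."""
    groups: dict[int, dict[str, list[str]]] = defaultdict(
        lambda: {"onion": [], "clearnet": []}
    )
    for host, favicon_hash in observations:
        # Classify each host by whether it is a Tor onion service address.
        is_onion = host.lower().rstrip(".").endswith(".onion")
        groups[favicon_hash]["onion" if is_onion else "clearnet"].append(host)
    # Keep only hashes that bridge the two worlds.
    return {
        icon_hash: sides
        for icon_hash, sides in groups.items()
        if sides["onion"] and sides["clearnet"]
    }


# Placeholder observations; none of these hosts or hashes are real.
observations = [
    ("exampleleaksiteplaceholder.onion", 123456789),
    ("203.0.113.10", 123456789),
    ("unrelated.example", -42),
]
print(onion_clearnet_overlaps(observations))
```

Here only hash 123456789 survives, pairing the onion host with 203.0.113.10. A match is a lead, not proof: the shared icon could also be a common template, so it still needs corroboration.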

Real-World Success Stories

Quantum Ransomware: Researchers from Talos linked Quantum’s dark web leak site to a public server by matching favicon hashes (Cisco Talos Blog).
Other Case Studies: Researchers have demonstrated de-anonymization of Tor servers via favicon matching (TechRepublic).

Tools and libraries for Favicon intelligence

Online tools

Several web-based tools allow quick favicon retrieval and hashing without programming.

Powering up with Python

Libraries like requests, hashlib, and mmh3 enable automation of:

Favicon downloading
Hash generation (MD5, SHA-1, SHA-256, MurmurHash3)
Base64 encoding (where needed)

Other specialized tools

FavFreak (GitHub): Automated favicon fingerprinting.
IconsExtract (NirSoft): Local extraction of icons from binaries.

Challenges and limitations

Evasion techniques by threat actors

Sophisticated attackers may avoid reusing favicons altogether or modify them slightly to evade detection (Security Boulevard).

In some cases, malicious scripts have even been embedded inside favicon files (Sucuri Blog).
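One inexpensive defense against disguised files is to check that a downloaded "favicon" actually begins with an image signature and does not contain obvious code. The sketch below checks the well-known leading bytes of ICO, PNG, GIF and JPEG files, treats text starting with <svg or <?xml as SVG, and flags a few script markers. The marker list is a hypothetical starting point, not a complete detector. SVG files are XML text and can legitimately contain script elements, so SVG findings warrant review rather than an automatic verdict.

```python
# Leading "magic" bytes of common raster icon formats.
SIGNATURES = {
    b"\x00\x00\x01\x00": "ico",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\xff\xd8\xff": "jpeg",
}

# Hypothetical markers of embedded code; extend to suit your environment.
SUSPICIOUS_MARKERS = (b"<?php", b"<script", b"eval(")


def sniff_format(data: bytes) -> str:
    """Guess the icon format from its leading bytes ("unknown" if none match)."""
    for magic, fmt in SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    head = data[:256].lstrip().lower()
    if head.startswith(b"<svg") or head.startswith(b"<?xml"):
        return "svg"
    return "unknown"


def inspect_favicon(data: bytes) -> list[str]:
    """Return a list of findings; an empty list means nothing obvious was found."""
    findings = []
    if sniff_format(data) == "unknown":
        findings.append("content does not match a common image signature")
    lowered = data.lower()  # ASCII-only lowercasing of the raw bytes
    for marker in SUSPICIOUS_MARKERS:
        if marker in lowered:
            findings.append(f"contains {marker.decode()!r}")
    return findings
```

A file named favicon.ico that starts with <?php produces two findings: no image signature, and an embedded PHP marker.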

Risk of False Positives and Negatives

False positives: Multiple legitimate sites using identical favicons (e.g., CMS templates).
False negatives: Unique favicons on phishing sites.

Thus, favicon analysis should always be correlated with other indicators such as domain WHOIS data and network behavior.
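One way to apply this is to treat a favicon match as one signal among several and escalate only when independent indicators agree. Everything in the sketch below is hypothetical: the signal names, the 30-day domain-age threshold, the set of generic (CMS or server-default) icon hashes, and the two-signal rule are illustrative choices, not values taken from the sources cited here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Placeholder hashes for generic icons shared by many unrelated sites
# (e.g. CMS or web-server defaults); populate these from your own data.
GENERIC_ICON_HASHES = {"placeholder-cms-default-icon"}

NEW_DOMAIN_DAYS = 30  # hypothetical cut-off for a "recently registered" domain


@dataclass
class SiteEvidence:
    favicon_hash: str
    brand_icon_on_unofficial_domain: bool  # outcome of a favicon/brand comparison
    domain_created: Optional[date]         # WHOIS creation date, if known
    suspicious_network_activity: bool      # e.g. hosting or traffic anomalies


def signals(evidence: SiteEvidence, today: date) -> list[str]:
    """Collect independent indicators; generic icons never count as a favicon match."""
    found = []
    if (evidence.brand_icon_on_unofficial_domain
            and evidence.favicon_hash not in GENERIC_ICON_HASHES):
        found.append("brand favicon on unofficial domain")
    if (evidence.domain_created is not None
            and (today - evidence.domain_created).days <= NEW_DOMAIN_DAYS):
        found.append("recently registered domain")
    if evidence.suspicious_network_activity:
        found.append("suspicious network behavior")
    return found


def should_escalate(evidence: SiteEvidence, today: date) -> bool:
    """Escalate only when at least two independent signals agree (hypothetical rule)."""
    return len(signals(evidence, today)) >= 2
```

Excluding generic icons addresses the false-positive case above, while requiring a second signal keeps a single coincidental icon match from triggering an alert.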

Context is Key

Relying only on favicon hashes for detection is risky. Comprehensive CTI should combine favicon data with multiple layers of analysis (UpGuard Blog).

Looking Ahead

Integration with emerging threat detection technologies

Machine learning (ML) and artificial intelligence (AI) can enhance favicon-based detection by:

Recognizing subtle variations through perceptual hashing.
Spotting trends and anomalies beyond exact matches (Bolster AI).
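To show how perceptual hashing tolerates edits that break exact hashes, here is a sketch of the classic "average hash" (aHash). It assumes the icon has already been decoded and downscaled to an 8x8 grid of grayscale values (in practice an imaging library such as Pillow would do that). Each bit records whether a pixel is brighter than the mean, and two icons are compared by Hamming distance. aHash is one simple perceptual hash among several and is not necessarily what any product mentioned here uses.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Compute a 64-bit aHash from an 8x8 grid of grayscale values (0-255)."""
    flat = [value for row in pixels for value in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # Append 1 if the pixel is brighter than average, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic example: a dark-left/bright-right icon and a copy with one pixel nudged.
original = [[40] * 4 + [220] * 4 for _ in range(8)]
tweaked = [row[:] for row in original]
tweaked[0][0] = 45  # a tiny edit that would completely change an MD5 or SHA digest

print(hamming_distance(average_hash(original), average_hash(tweaked)))  # prints 0
```

A small Hamming distance, rather than exact equality, is what flags two icons as near-duplicates. The threshold is a tuning choice that trades missed variants against false matches.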

AI and ML for smarter Favicon analysis

Instead of basic hashing, deep learning models can analyze favicon imagery to detect sophisticated impersonations or malware-laden icons (Kanerika Blog).

Future innovations

Real-time favicon databases with version histories.
Browser-level security features alerting users about suspicious favicon changes.
Cross-network analysis tools linking dark web to clean web via shared favicons.
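The first idea can be prototyped with very little code. The sketch below keeps a per-host history of (timestamp, hash) observations and reports when a host's favicon hash changes, which is the kind of event a change-alerting feature would surface. The class and its in-memory storage are hypothetical; a real service would persist this data.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional


class FaviconHistory:
    """In-memory record of favicon hashes observed per host over time."""

    def __init__(self) -> None:
        self._history: dict[str, list[tuple[datetime, str]]] = defaultdict(list)

    def record(self, host: str, favicon_hash: str,
               seen_at: Optional[datetime] = None) -> Optional[str]:
        """Store an observation; return the previous hash if the icon changed."""
        seen_at = seen_at or datetime.now(timezone.utc)
        entries = self._history[host]
        previous = entries[-1][1] if entries else None
        entries.append((seen_at, favicon_hash))
        if previous is not None and previous != favicon_hash:
            return previous
        return None

    def versions(self, host: str) -> list[tuple[datetime, str]]:
        """Return a host's observations in the order they were recorded."""
        return list(self._history.get(host, []))
```

A non-None return value from record() is the trigger for an alert, and versions() provides the history needed to see when a site switched icons.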