As phishing attacks become more sophisticated, investigators and security professionals need innovative techniques to identify and combat these threats. In this article, we look at an often-overlooked method: using favicon hashes in conjunction with Shodan to uncover potential phishing sites.

The growing sophistication of phishing

Phishing has come a long way from the days of poorly formatted emails claiming you’ve won a foreign lottery. Today’s phishing sites are often pixel-perfect replicas of legitimate websites, making them increasingly difficult to spot with the naked eye. Cybercriminals invest significant time and resources into cloning entire websites, including HTML structure, CSS styling, and even JavaScript functionality.

Understanding favicons: the digital fingerprint

At the heart of our investigation technique lies a small but significant element of web design: the favicon.

What is a favicon?

A favicon, short for “favorite icon,” is a small image associated with a website that appears in various places:

  • In the browser’s address bar
  • Next to the site name in a list of bookmarks
  • On browser tabs

While often overlooked by users, favicons serve as a quick visual identifier for websites. They're typically 16x16 or 32x32 pixels in size and are usually saved as .ico, .png, or .gif files, though many modern sites also use .svg.
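Because a favicon hash is computed over whatever bytes the server returns, it's worth confirming that a download really is an image and not, say, an HTML error page. The minimal sketch below checks the well-known file signatures ("magic numbers") of these formats. The function name `sniff_favicon_format` is our own and purely illustrative, and the SVG test is a rough heuristic.

```python
def sniff_favicon_format(data: bytes) -> str:
    """Guess a favicon's format from its leading bytes."""
    if data.startswith(b"\x00\x00\x01\x00"):
        return "ico"  # ICO header: reserved field 0, image type 1
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # Text formats: inspect the start of the file, ignoring case and leading whitespace
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"  # probably an error or login page, not an icon
    if b"<svg" in head:
        return "svg"
    return "unknown"
```

For example, `sniff_favicon_format(b"<!DOCTYPE html><html>")` returns `"html"`, a sign that the favicon URL didn't serve an icon at all.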

The significance of favicons in phishing detection

Here’s where things get interesting: when phishers clone a website, they often copy everything—including the favicon. This seemingly minor detail becomes a powerful tool for investigators. By focusing on the favicon, we can potentially identify multiple phishing domains that are impersonating the same legitimate site.

Hashing: turning images into searchable data

To leverage favicons in our investigation, we need a way to compare them efficiently across millions of websites. This is where hashing comes into play.

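Hashing is a process that takes an input (in our case, a favicon image) and produces a fixed-size output, usually written as a hexadecimal string or, as Shodan does, as an integer. The properties of a hash function that matter for this technique are:

  1. It's deterministic: the same input always produces the same hash.
  2. It's quick to compute.
  3. Even a small change in the input produces a very different hash output.

Cryptographic hash functions such as SHA-256 also make it practically impossible to reconstruct the original input from the hash alone. The MurmurHash3 function that Shodan uses is not a cryptographic hash and offers no such guarantee, but we don't need one: all we need is for identical favicons to produce identical hashes.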
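These properties are easy to observe directly. The sketch below uses SHA-256 from Python's standard library purely as a convenient stand-in (it is not the function Shodan uses) and a few made-up bytes in place of a real favicon.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

# Stand-in bytes for a favicon (an ICO header followed by dummy pixel data)
icon = b"\x00\x00\x01\x00" + bytes(range(64))
# The same bytes with a single bit flipped in the last byte
tweaked = icon[:-1] + bytes([icon[-1] ^ 0x01])

print(sha256_hex(icon) == sha256_hex(icon))                 # True: deterministic
print(len(sha256_hex(icon)), len(sha256_hex(icon * 1000)))  # 64 64: fixed size
print(sha256_hex(icon)[:16], sha256_hex(tweaked)[:16])      # unrelated-looking prefixes
```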

Why hash favicons?

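By hashing favicons, we transform visual data into a format that's easy to store, compare, and search. This allows us to quickly identify websites using identical favicons, even if they're hosted on different domains or IP addresses. "Identical" here means byte-for-byte identical: because even a small change in the input changes the hash, a favicon that has been resized, recompressed, or converted to another format will not match.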
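Conceptually, a favicon search is a lookup in an index keyed by hash. The toy sketch below builds such an index from a dictionary of hosts and favicon bytes. The hosts and bytes are hypothetical, `build_favicon_index` is our own name, and SHA-256 again stands in for Shodan's hash.

```python
import hashlib
from collections import defaultdict

def build_favicon_index(favicons: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each favicon hash to the sorted list of hosts serving that favicon."""
    index: dict[str, list[str]] = defaultdict(list)
    for host, data in favicons.items():
        index[hashlib.sha256(data).hexdigest()].append(host)
    return {h: sorted(hosts) for h, hosts in index.items()}

# Hypothetical hosts and favicon bytes, for illustration only
legit_icon = b"\x00\x00\x01\x00legit-icon-bytes"
other_icon = b"\x89PNG\r\n\x1a\nunrelated-icon"
crawl = {
    "www.securebank.ch": legit_icon,
    "secure-bank.ch": legit_icon,  # cloned, byte-identical favicon
    "203.0.113.7": legit_icon,     # same favicon on a bare IP (documentation range)
    "example.org": other_icon,
}

index = build_favicon_index(crawl)
print(index[hashlib.sha256(legit_icon).hexdigest()])
# ['203.0.113.7', 'secure-bank.ch', 'www.securebank.ch']
```

One hash lookup returns every host sharing the legitimate favicon, regardless of domain name or IP address.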

Enter Shodan: the internet’s search engine

Shodan is often described as “Google for hackers,” but this characterization doesn’t do justice to its legitimate uses in cybersecurity. More accurately, Shodan is a search engine for Internet-connected devices. It continuously scans the Internet, collecting data on servers, IoT devices, industrial control systems, and more.

Shodan’s favicon hash feature

One of Shodan’s lesser-known but powerful features is its ability to search for websites based on their favicon hash. This capability turns Shodan into an invaluable tool for identifying potential phishing sites that share the same favicon as a legitimate website.

The investigation process: a step-by-step guide

Now that we’ve covered the theoretical background, let’s walk through the practical steps of using favicon hashes and Shodan to investigate potential phishing sites.

Step 1: identify the target website

First, we need to identify the legitimate website that’s being impersonated. For this example, let’s say we’re investigating potential phishing sites targeting a fictional Swiss bank called “SecureBank AG.”

Step 2: locate the favicon URL

Visit the legitimate website and view the page source, looking for a <link> tag that references the favicon. (If there is none, browsers request /favicon.ico from the site root by default.) A typical tag looks like this:

<link rel="shortcut icon" href="/path/to/favicon.ico" type="image/x-icon">

In our example, let’s say we find:

<link rel="icon" href="https://www.securebank.ch/assets/favicon.ico" type="image/x-icon">
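Finding the tag by hand works for one site; for many, it can be scripted with Python's standard library. In this sketch (the class and function names are ours), any <link> whose rel attribute contains the token "icon" counts, relative paths are resolved against the page URL, and /favicon.ico at the site root is the fallback. Note that it deliberately ignores variants such as apple-touch-icon.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class FaviconLinkParser(HTMLParser):
    """Collect href values of <link> tags whose rel attribute contains "icon"."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "shortcut icon"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "icon" in rel_tokens and href:
            self.hrefs.append(href)

def find_favicon_urls(page_url: str, html: str) -> list[str]:
    """Return absolute favicon URLs declared in html, or the /favicon.ico default."""
    parser = FaviconLinkParser()
    parser.feed(html)
    urls = [urljoin(page_url, href) for href in parser.hrefs]
    return urls or [urljoin(page_url, "/favicon.ico")]
```

For example, feeding it the generic tag shown earlier with a page URL of https://www.securebank.ch/ yields `["https://www.securebank.ch/path/to/favicon.ico"]`.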

Step 3: generate the favicon hash

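Now we need to generate a hash of this favicon. Shodan uses the 32-bit MurmurHash3 (MMH3) algorithm for its favicon hashes, and it doesn't hash the raw image bytes directly: it hashes their base64 encoding, with a line break after every 76 characters. The following Python script reproduces this (it needs the third-party mmh3 and requests packages):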

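import codecs

import mmh3
import requests

# Download the favicon exactly as the server delivers it
response = requests.get("https://www.securebank.ch/assets/favicon.ico", timeout=10)
response.raise_for_status()

# Base64-encode with a newline every 76 characters, the form Shodan hashes
favicon_b64 = codecs.encode(response.content, "base64")

# mmh3.hash returns a signed 32-bit integer, so the result may be negative
favicon_hash = mmh3.hash(favicon_b64)
print(f"Favicon hash: {favicon_hash}")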

Let’s say this script outputs:

Favicon hash: 1234567890
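To see what `mmh3.hash` computes under the hood, or to hash favicons where the mmh3 package isn't available, here is a pure-Python sketch of 32-bit MurmurHash3: the x86_32 variant with seed 0 and a signed result, which is what `mmh3.hash` returns by default. It is wrapped in the same base64 step as the script above. The function names are ours, it is far slower than mmh3, and its output should be checked against mmh3 on a few real favicons before you rely on it.

```python
import codecs

MASK32 = 0xFFFFFFFF

def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32, returned as a signed 32-bit int like mmh3.hash()."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4

    # Body: mix in each 4-byte little-endian block
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1-3 bytes, also read little-endian
    tail = data[4 * n_blocks :]
    if tail:
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32

    # Finalization: fold in the length, then mix the bits thoroughly
    h ^= len(data) & MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16

    # Reinterpret the unsigned result as a signed 32-bit value
    return h - 0x100000000 if h & 0x80000000 else h

def shodan_style_favicon_hash(raw: bytes) -> int:
    """Hash base64-encoded favicon bytes (newline every 76 characters)."""
    return murmur3_32(codecs.encode(raw, "base64"))
```

As a quick sanity check, `murmur3_32(b"")` returns 0.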

Step 4: search Shodan


With our favicon hash in hand, we can now search Shodan. The query format is:

http.favicon.hash:1234567890

Execute this search on Shodan’s website or through its API.
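For the API route, here is a minimal sketch assuming the official `shodan` Python package (`pip install shodan`) and an API key of your own. The helper names are ours. `summarize_match` reads standard Shodan banner fields (`ip_str`, `port`, `hostnames`, `location`, `org`) and falls back to empty values if any are missing.

```python
def favicon_query(favicon_hash: int) -> str:
    """Build the Shodan search filter for a favicon hash."""
    return f"http.favicon.hash:{favicon_hash}"

def summarize_match(match: dict) -> dict:
    """Pull out the banner fields most useful for triage."""
    return {
        "ip": match.get("ip_str"),
        "port": match.get("port"),
        "hostnames": match.get("hostnames", []),
        "country": (match.get("location") or {}).get("country_code"),
        "org": match.get("org"),
    }

def search_favicon(favicon_hash: int, api_key: str) -> list[dict]:
    """Run the favicon query through the official shodan library."""
    import shodan  # imported here so the helpers above work without the package

    api = shodan.Shodan(api_key)
    results = api.search(favicon_query(favicon_hash))
    return [summarize_match(m) for m in results["matches"]]
```

A call might look like `search_favicon(1234567890, os.environ["SHODAN_API_KEY"])`, where the environment variable name is only an assumed convention. `api.search` returns a single page of results; the library's `search_cursor` method iterates over all of them.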

Step 5: analyze the results

Shodan will return a list of IP addresses and domains that use a favicon with this exact hash. Here’s where your investigative skills come into play:

  1. Look for domains that are similar to but not exactly matching the legitimate domain. For example:
    • secure-bank.ch
    • securebank-online.com
    • securebankag.net
  2. Check the registration dates of suspicious domains. Newly registered domains that match the favicon of a well-established site are red flags.

  3. Examine the geographic location of the hosting servers. If SecureBank AG hosts its site in Switzerland but you find its favicon on servers in other countries, or with hosting providers the bank has no relationship with, that's suspicious.

  4. Look at other services running on the same IP addresses. Legitimate banking websites typically have a very specific and limited set of open ports and services.

Step 6: deeper investigation

For domains that look suspicious:

  1. Use WHOIS lookups to gather more information about domain registration.
  2. Perform visual comparisons of the suspected phishing sites with the legitimate site.
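  3. Check for SSL/TLS certificate discrepancies, such as a different issuer or certificate type than the legitimate site uses, or a certificate issued only days ago.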
  4. Look for subtle differences in content that might reveal the site’s fraudulent nature.
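The certificate check can be partly automated with Python's standard ssl module. This sketch fetches a host's certificate and exposes two things worth comparing with the legitimate site: the DNS names it covers and how long ago it was issued. The function names are ours, and what counts as "too new" is left to your judgment.

```python
import socket
import ssl
import time
from typing import Optional

def fetch_certificate(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Return the peer certificate as parsed by Python's ssl module.

    With the default context, a certificate that fails verification raises
    ssl.SSLCertVerificationError, which is itself worth recording.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()

def certificate_names(cert: dict) -> list[str]:
    """DNS names listed in the certificate's subjectAltName extension."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]

def certificate_age_days(cert: dict, now: Optional[float] = None) -> float:
    """Days elapsed since the certificate's notBefore date."""
    issued = ssl.cert_time_to_seconds(cert["notBefore"])
    current = time.time() if now is None else now
    return (current - issued) / 86400
```

Run `fetch_certificate` against both the legitimate domain and each suspect (the hosts in this article are fictional, so substitute real ones), then compare the issuer, the names, and the age side by side.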

Step 7: reporting and action

If you confirm that a site is indeed a phishing attempt:

  1. Report the site to the relevant parties (e.g., the legitimate bank's security team, anti-phishing organizations, and law enforcement).
  2. If you have the capability, submit the site to browser blocklists and security vendors to protect users.