Information Gathering Tools: my updated shortlist
During the first phase of a penetration test, especially when the test is performed in black-box mode, it is really important to gather accurate information from company websites and employees' social accounts.
Here is a short list of tools that are useful during this essential phase.
An OSINT scraping framework that scrapes target websites in order to gather information passively.
Developed and maintained by xillwillx, skiptracer
utilizes some basic python webscraping (BeautifulSoup) of PII paywall sites to compile passive information on a target on a ramen noodle budget.
A Python script, developed by DisK0nn3cT, that scrapes LinkedIn without API restrictions.
A simple Python script by Nick Sanzotta that queries LinkedIn in order to enumerate employee names.
FOCA (Fingerprinting Organizations with Collected Archives) is a tool used to find metadata and hidden information in the documents it scans. These documents may be found on corporate web pages, using search engines such as Google, Bing, and DuckDuckGo.
FOCA is capable of analyzing a wide variety of documents, with the most common being Microsoft Office, Open Office, or PDF files, although it also analyzes Adobe InDesign or SVG files, for instance.
A Python tool developed by Christian Martorella for gathering subdomain names, e-mail addresses, virtual hosts, open ports/banners, and employee names from different public sources.
- threatcrowd: Open source threat intelligence - https://www.threatcrowd.org/
- crtsh: Comodo Certificate search - www.crt.sh
- google: google search engine - www.google.com
- googleCSE: google custom search engine
- google-profiles: google search engine, specific search for Google profiles
- bing: microsoft search engine - www.bing.com
- bingapi: microsoft search engine, through the API (you need to add your Key in the discovery/bingsearch.py file)
- dogpile: Dogpile search engine - www.dogpile.com
- pgp: pgp key server - mit.edu
- linkedin: google search engine, specific search for Linkedin users
- vhost: Bing virtual hosts search
- twitter: twitter accounts related to a specific domain (uses google search)
- googleplus: users that work in the target company (uses google search)
- yahoo: Yahoo search engine
- baidu: Baidu search engine
- shodan: Shodan Computer search engine, will search for ports and banner of the discovered hosts (http://www.shodanhq.com/)
- DNS brute force: this plugin will run a dictionary brute force enumeration
- DNS reverse lookup: reverse lookup of the IPs discovered in order to find hostnames
- DNS TLD expansion: TLD dictionary brute force enumeration
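To make the DNS brute force idea above concrete, here is a minimal sketch of a dictionary enumeration. It is not theHarvester's own code: the function names `resolve` and `dns_brute_force` are made up for illustration, and the resolver is injectable so the logic can be tried without touching the network.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve(hostname: str) -> Optional[str]:
    """Return an IPv4 address for hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # hostnames that cannot be encoded (e.g. an over-long label).
        return None


def dns_brute_force(
    domain: str,
    words: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve,
) -> Dict[str, str]:
    """Try word.domain for every dictionary word and keep the names that resolve."""
    found = {}
    for word in words:
        candidate = f"{word}.{domain}"
        address = resolver(candidate)
        if address is not None:
            found[candidate] = address
    return found
```

Calling `dns_brute_force("example.com", ["www", "mail", "vpn"])` would issue real DNS queries, so run it only against domains that are in scope for the test.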
Another tool by Christian Martorella for extracting metadata from public documents (pdf, doc, xls, ppt, etc.) available on the target websites.
This information can be useful because it yields valid usernames and people's names to use later in brute-force password attacks (VPN, FTP, web apps). The tool also extracts interesting "paths" from the documents, which can reveal shared resource names, server names, etc.
SimplyEmail is an email recon tool based on theHarvester.
It expands on what was used to build theHarvester and incorporates that work, while allowing users to easily build modules for the framework.
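As a rough illustration of why those "paths" matter, the sketch below pulls usernames, server names and share names out of metadata strings. It is only an assumption about how such parsing could look, not the tool's actual implementation: the regular expressions cover just Windows profile paths and UNC paths, and `extract_leads` is a hypothetical helper.

```python
import re

# Windows profile paths such as C:\Users\<name>\ or C:\Documents and Settings\<name>\
USER_PATH = re.compile(
    r"[A-Za-z]:\\(?:Users|Documents and Settings)\\([^\\]+)\\", re.IGNORECASE
)
# UNC paths such as \\server\share\...
UNC_PATH = re.compile(r"\\\\([A-Za-z0-9._-]+)\\([^\\\s]+)")


def extract_leads(metadata_values):
    """Collect usernames, server names and shares found in metadata strings."""
    users, servers, shares = set(), set(), set()
    for value in metadata_values:
        users.update(USER_PATH.findall(value))
        for server, share in UNC_PATH.findall(value):
            servers.add(server)
            shares.add(f"\\\\{server}\\{share}")
    return {
        "users": sorted(users),
        "servers": sorted(servers),
        "shares": sorted(shares),
    }
```

The input is expected to be a list of strings already extracted from document metadata (for example template or "last saved" path fields); reading the documents themselves is out of scope for this sketch.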
A Python script that scans git repositories for secrets, digging deep into commit history and branches in order to find sensitive data that was accidentally committed.
How it works
This module will go through the entire commit history of each branch, check each diff from each commit, and look for secrets, both by regex and by entropy. For entropy checks, truffleHog evaluates the Shannon entropy for both the base64 character set and the hexadecimal character set for every blob of text longer than 20 characters composed of those character sets in each diff. If at any point a high-entropy string longer than 20 characters is detected, it is printed to the screen.
Developed by Dylan Ayrey
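The entropy check described above can be sketched in a few lines. This is a simplified illustration and not the tool's source: it works on a single line of text instead of a git diff, and the thresholds (4.5 for base64, 3.0 for hex) are assumptions chosen for the example.

```python
import math
from collections import Counter

BASE64_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=")
HEX_CHARS = set("0123456789abcdefABCDEF")


def shannon_entropy(text):
    """Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def runs_of_charset(line, charset, min_length=20):
    """Return the runs of consecutive charset characters longer than min_length."""
    runs, current = [], []
    for ch in line:
        if ch in charset:
            current.append(ch)
        else:
            if len(current) > min_length:
                runs.append("".join(current))
            current = []
    if len(current) > min_length:
        runs.append("".join(current))
    return runs


def find_high_entropy_strings(line, base64_threshold=4.5, hex_threshold=3.0):
    """Flag long base64 or hex runs whose entropy exceeds the (assumed) thresholds."""
    hits = []
    for run in runs_of_charset(line, BASE64_CHARS):
        if shannon_entropy(run) > base64_threshold:
            hits.append(run)
    for run in runs_of_charset(line, HEX_CHARS):
        if shannon_entropy(run) > hex_threshold and run not in hits:
            hits.append(run)
    return hits
```

A long run of one repeated character scores 0 bits and is ignored, while a random-looking API key of the same length scores high and gets reported.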
Just-Metadata is a tool that can be used to gather intelligence information passively about a large number of IP addresses, and attempt to extrapolate relationships that might not otherwise be seen.
Just-Metadata will allow you to quickly find the Top "X" number of states, cities, timezones, etc. that the loaded IP addresses are located in. It will allow you to search for IP addresses by country. You can search all IPs to find which ones are used in callbacks as identified by VirusTotal. Want to see if any of the loaded IPs have been documented as taking part in attacks via the Animus Project? Just-Metadata can do it.
Additionally, it is easy to create new analysis modules to let people find other relationships between IPs loaded based on the available data.
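The Top "X" and search-by-country analyses mentioned above boil down to counting and filtering. The following sketch shows the idea on already-collected records; the record layout (`ip`, `country`, `city` keys) and the function names are assumptions made for this example, not Just-Metadata's real data model.

```python
from collections import Counter


def top_values(records, field, x=3):
    """Return the x most common values of field as (value, count) pairs."""
    counts = Counter(record[field] for record in records if record.get(field))
    return counts.most_common(x)


def ips_in_country(records, country):
    """Return the IP addresses whose record is located in the given country."""
    return [record["ip"] for record in records if record.get("country") == country]
```

With a list of dictionaries such as `{"ip": "192.0.2.1", "country": "IT", "city": "Rome"}`, `top_values(records, "country", 2)` gives the two countries hosting the most loaded IPs.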
A domain typo discovery tool, written in Python and released by NCC Group.
- Domain to IP
- MX records
- A and AAAA
- www address records
- webmail address records
- m address records
- A keyboard map template system (currently UK supplied)
- Geographic IP to flag
- Google safe browsing integration
- Bit flipping / squatting - http://dinaburg.org/bitsquatting.html
- Homoglyph attack identification
A demo is also available at https://labs.nccgroup.trust/typofinder/
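The bit flipping / squatting feature listed above is easy to illustrate: flip one bit at a time in each character of the name and keep the results that are still valid hostname characters. This is a minimal sketch written for this post, not typofinder's code, and it assumes that only the first label of the domain is mutated.

```python
import string

VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitsquat_candidates(domain):
    """Return domains that differ from domain by a single flipped bit in the first label."""
    label, dot, rest = domain.lower().partition(".")
    candidates = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped not in VALID_CHARS or flipped == ch:
                continue  # not a hostname character, or only a case change
            new_label = label[:i] + flipped + label[i + 1:]
            if new_label.startswith("-") or new_label.endswith("-"):
                continue  # a label cannot start or end with a hyphen
            candidates.add(new_label + dot + rest)
    return sorted(candidates)
```

For example, flipping a single bit of the "r" in microsoft.com yields mic2osoft.com, which appears in the list returned by `bitsquat_candidates("microsoft.com")`.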
A tool that checks whether an email account has been compromised in a data breach.
If the email account is compromised, it proceeds to search known data breaches for passwords.
It uses the haveibeenpwned v2 API to test email accounts and searches for the password in Pastebin dumps.
Some useful information available on haveibeenpwned can be displayed by this script:
- Name of Breach
- Domain Name
- Date of Breach
- Fabrication status
- Verification Status
- Retirement status
- Spam Status
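As a rough sketch of how those fields could be pulled out of a breach record, the function below maps an already-decoded JSON object to the labels listed above. It makes no network request; the key names (`Name`, `Domain`, `BreachDate`, `IsFabricated`, `IsVerified`, `IsRetired`, `IsSpamList`) follow the haveibeenpwned breach model as I understand it and should be treated as assumptions, and `summarize_breach` is a hypothetical helper, not part of the script.

```python
def summarize_breach(breach):
    """Map a decoded breach record (a dict) to the fields listed above."""
    return {
        "Name of Breach": breach.get("Name"),
        "Domain Name": breach.get("Domain"),
        "Date of Breach": breach.get("BreachDate"),
        "Fabrication status": breach.get("IsFabricated"),
        "Verification Status": breach.get("IsVerified"),
        "Retirement status": breach.get("IsRetired"),
        "Spam Status": breach.get("IsSpamList"),
    }


def format_summary(summary):
    """Render the summary as one 'label: value' line per field."""
    return "\n".join(f"{label}: {value}" for label, value in summary.items())
```

Missing keys simply come back as `None`, so the sketch keeps working if a record lacks one of the fields.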