OSINT investigations using the Wayback Machine
The Wayback Machine is a digital archive of the internet, maintained by the Internet Archive. It allows you to view past versions of websites, which can be a valuable tool for OSINT investigations.
For example, you can use the Wayback Machine to:
- Find information that has been deleted from a website.
- See how a website has changed over time.
- Investigate the history of a website or organization.
Here are some of the ways you can use the Wayback Machine for OSINT investigations:
Finding all saved copies of a page
You can use the Wayback Machine’s CDX API to search for all of the archived copies of a particular web page. For example:
https://web.archive.org/cdx/search/cdx?url=andreafortuna.org
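This query returns one row per archived copy of the site's home page. Without a trailing wildcard or a matchType parameter, the CDX server matches the given URL exactly.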
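A minimal Python sketch of consuming this query follows. It assumes the CDX server's default plain-text output, where each line holds the space-separated fields urlkey, timestamp, original, mimetype, statuscode, digest and length. The function names (`build_query`, `parse_cdx_lines`, `snapshot_url`, `fetch_captures`) are illustrative and not part of any library.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
# Assumed default column order of the CDX server's plain-text output.
CDX_FIELDS = ["urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length"]


def build_query(url, **params):
    """Return a CDX query URL for `url` plus any extra parameters."""
    query = {"url": url, **params}
    # Keep "/", ":" and "*" readable instead of percent-encoding them.
    return CDX_ENDPOINT + "?" + urlencode(query, safe="/:*")


def parse_cdx_lines(text):
    """Turn the space-separated CDX response into a list of dicts."""
    captures = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines rather than guessing their layout.
        if len(parts) == len(CDX_FIELDS):
            captures.append(dict(zip(CDX_FIELDS, parts)))
    return captures


def snapshot_url(capture):
    """Build the Wayback Machine replay URL for one capture."""
    return "https://web.archive.org/web/{timestamp}/{original}".format(**capture)


def fetch_captures(url, timeout=30):
    """Download and parse all captures of `url` (performs a network request)."""
    with urlopen(build_query(url), timeout=timeout) as response:
        return parse_cdx_lines(response.read().decode("utf-8", "replace"))
```

Calling `fetch_captures("andreafortuna.org")` and passing each result to `snapshot_url` gives a list of links that open the archived copies directly in the browser.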
Finding all saved pages in a particular section of a website
You can search for archived copies within a specific section:
https://web.archive.org/cdx/search/cdx?url=https://andreafortuna.org/2022/*&collapse=urlkey
This query lists the archived pages under the “2022” directory on this website. Because of collapse=urlkey, each distinct URL appears once instead of once per capture.
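The same query is easier to process if the server returns JSON. The sketch below assumes the `output=json` parameter, which returns an array of arrays whose first row holds the field names. The helper names `rows_from_json` and `unique_pages` are hypothetical.

```python
import json

# The query from the text with output=json appended (an added parameter).
QUERY = ("https://web.archive.org/cdx/search/cdx"
         "?url=https://andreafortuna.org/2022/*&collapse=urlkey&output=json")


def rows_from_json(payload):
    """Convert the CDX JSON array-of-arrays into a list of dicts."""
    table = json.loads(payload) if payload.strip() else []
    if not table:
        return []
    # The first row is assumed to be the header naming each column.
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


def unique_pages(payload):
    """Return the sorted set of original URLs found in the response."""
    return sorted({row["original"] for row in rows_from_json(payload)})
```

Downloading `QUERY` with any HTTP client and passing the response body to `unique_pages` yields a deduplicated list of the pages archived in that section.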
Finding all URLs of a website (with subdomains)
You can list every archived URL for a domain and its subdomains:
https://web.archive.org/cdx/search/cdx?url=*.andreafortuna.org&collapse=urlkey
This query finds all archived URLs on this domain and its subdomains, listing each distinct URL once.
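Once the URL list is in hand, grouping it by hostname is a quick way to enumerate the subdomains the archive has seen. The following is a small sketch under the assumption that the input is the list of "original" URLs extracted from the response; `hosts_from_urls` is an illustrative name.

```python
from collections import Counter
from urllib.parse import urlsplit


def hosts_from_urls(urls):
    """Count how many archived URLs belong to each hostname."""
    counts = Counter()
    for url in urls:
        # Defensive: if a value lacks a scheme, add one so urlsplit
        # can locate the host part.
        if "://" not in url:
            url = "http://" + url
        # .hostname lowercases the host and drops any port such as ":80".
        host = urlsplit(url).hostname
        if host:
            counts[host] += 1
    return counts
```

Sorting the result with `counts.most_common()` shows which hosts have the most archived URLs, which can help decide where to look first.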
Finding URL copies over a given period of time
You can search for archived copies within a specific date range:
https://web.archive.org/cdx/search/cdx?url=https://andreafortuna.org/*&to=2021&from=2020
This query finds archived copies of this website from 2020 to 2021.
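The from and to values use the same yyyyMMddhhmmss layout as the capture timestamps, and shorter prefixes such as a bare year are accepted. As a hedged sketch, the helpers below build a day-precision range query from Python dates and convert returned 14-digit timestamps back into datetime objects. `build_range_query` and `parse_timestamp` are hypothetical names.

```python
from datetime import date, datetime
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_range_query(url, start, end):
    """Return a CDX query URL limited to captures between two dates."""
    params = {
        "url": url,
        # Day-precision prefixes of the 14-digit timestamp format.
        "from": start.strftime("%Y%m%d"),
        "to": end.strftime("%Y%m%d"),
    }
    # Keep "/", ":" and "*" readable instead of percent-encoding them.
    return CDX_ENDPOINT + "?" + urlencode(params, safe="/:*")


def parse_timestamp(timestamp):
    """Convert a 14-digit CDX timestamp into a datetime (assumed UTC)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")
```

For example, `build_range_query("andreafortuna.org/*", date(2020, 1, 1), date(2021, 12, 31))` produces a query for the same two-year window with explicit day boundaries.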
Finding all saved files of a certain type
You can search for archived files of a certain type:
https://web.archive.org/cdx/search/cdx?url=andreafortuna.org/*&filter=mimetype:text/javascript&collapse=urlkey
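This query finds the archived files on this website that were recorded with the MIME type text/javascript. Scripts served under another type, such as application/javascript, need a different filter value.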
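Several filter parameters can be combined in one request. The sketch below assumes that the filter value is treated as a regular expression matched against the named field, so `mimetype:.*javascript` would cover both text/javascript and application/javascript. That pattern and the function name `build_filter_query` are illustrative assumptions, so check the output against a known file before relying on it.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_filter_query(url, filters, collapse="urlkey"):
    """Return a CDX query URL with one filter parameter per entry."""
    # A list of pairs lets the "filter" key repeat, which a dict cannot do.
    params = [("url", url)]
    params += [("filter", expression) for expression in filters]
    if collapse:
        params.append(("collapse", collapse))
    # Keep "/", ":", "*" and "!" readable instead of percent-encoding them.
    return CDX_ENDPOINT + "?" + urlencode(params, safe="/:*!")
```

Passing `["mimetype:.*javascript", "statuscode:200"]` as the filters restricts the result to script files that were captured with a successful response.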
There are also some useful command-line tools for automating searches:
- GAU: Fetch known URLs from AlienVault’s Open Threat Exchange, the Wayback Machine, and Common Crawl.
- Waymore: Find way more from the Wayback Machine, Common Crawl, AlienVault OTX, URLScan & VirusTotal!
- WaybackUrls: Fetch all the URLs that the Wayback Machine knows about for a domain.
- Katana: A next-generation crawling and spidering framework.
- Wayback Keyword Search: Downloads each page from the Wayback Machine for a specific domain and enables further keyword search on each saved page.
Finally, alternatives to the Wayback Machine include:
The Wayback Machine is a powerful tool for gathering information in OSINT investigations. By using the CDX API and the tools mentioned above, you can automate searches and recover information that is no longer visible on the live site.