OSINT Toolbox Talk: Extracting media content, collecting targeted open-source data, and creating network relationship charts

OSINT Tool Review

Scraping Dark Web URLs using 'OnionSearch'

https://github.com/megadose/OnionSearch

Admittedly, we have neglected the Dark Web for some time when it comes to showcasing effective tools that can be deployed against Onion sites. One primary reason is that our own research, along with that of other crime analysts, suggests that criminals are beginning to abandon the Dark Web as a space for illicit activity. Instead, various marketplaces are now appearing on messaging applications, with Telegram being the primary example. Not only have some marketplaces moved away from the Dark Web towards messaging platforms, but criminals engaged in the making and distribution of child sexual abuse (CSA) content have also taken advantage of the privacy controls offered by several messaging applications. That aside, the Dark Web remains a well-established space for criminals; as such, Digital Investigators and OSINT analysts should remain aware of the latest tools and techniques that can be deployed in this regard.

One tool that we shall talk about is ‘OnionSearch’, a well-developed Python-based script by the same individual behind ‘Holehe’ and ‘Toutatis’. The tool runs a search term against several Onion search engines and then extracts the URLs matching that term. The script’s GitHub repository states that ‘OnionSearch’ conducts searches on the following engines:

  • ahmia
  • darksearchio
  • onionland
  • notevil
  • darksearchenginer
  • phobos
  • onionsearchserver
  • torgle
  • onionsearchengine
  • tordex
  • tor66
  • tormax
  • haystack
  • multivac
  • evosearch
  • deeplink

However, it should be noted that during our review, some of the search engines returned errors; these were due to a combination of session timeouts and the search engine itself being unavailable or removed altogether. That aside, during our test, we collected a considerable number of URLs pointing to sources of illicit activity. As effective as the tool is at conducting multiple general searches on openly visible Onion pages, the script only scratches the surface, because a vast number of Onion pages are protected behind encryption protocols and invite-only features. Nevertheless, the tool does what it sets out to do. We especially like its ability to output a TXT file containing all of the extracted URLs. We also like that the tool is flexible enough to let users set their own Tor proxy, set page load limits, and have the tool write continuously to the output file (which is very useful during a large investigation).
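Because several engines can return the same page, the output file tends to contain repeated URLs. The following is a minimal Python sketch of how the extracted URLs could be de-duplicated before further analysis. It is not part of ‘OnionSearch’; the function name, the sample rows, and the placeholder hostnames are our own illustrative assumptions, and the only thing assumed about the output file is that each line contains an Onion URL somewhere in it.

```python
import re

# Matches v3 (56-character) and legacy v2 (16-character) onion hostnames,
# with an optional scheme, optional subdomains, and an optional path.
ONION_URL = re.compile(
    r"\b(?:https?://)?(?:[a-z0-9-]+\.)*(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion"
    r"(?:/[^\s\"',]*)?",
    re.IGNORECASE,
)


def extract_onion_urls(text):
    """Return the unique onion URLs found in text, in first-seen order."""
    seen = set()
    urls = []
    for match in ONION_URL.finditer(text):
        # Treat "http://host.onion/" and "http://host.onion" as the same URL.
        url = match.group(0).rstrip("/")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Placeholder hostnames (not real services), built only to have valid lengths.
host_a = "a" * 56
host_b = "b2" * 28

# Hypothetical output rows: two engines report the same site, one a second site.
sample = (
    f'"ahmia","Example market","http://{host_a}.onion/"\n'
    f'"tor66","Example market mirror","http://{host_a}.onion"\n'
    f'"phobos","Example forum","http://{host_b}.onion/index.php?id=1"\n'
)

unique_urls = extract_onion_urls(sample)
print(len(unique_urls), "unique URLs")
```

In practice the `sample` string would be replaced by the contents of the output file, read with `open(path, encoding="utf-8").read()`.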

Whilst the tool’s output is of intelligence value in itself, it is worth showing how we processed this data using Paliscope Discovry, specifically the platform’s built-in Tor browser and its connected service to Web-IQ. Using Discovry, we navigated to the URLs extracted by ‘OnionSearch’ and then created a forensic copy of several pages of investigative interest. Through Discovry’s connected service to Web-IQ, we gained visibility of several target URLs over a long time period and discovered several Bitcoin addresses associated with those URLs. All of the data collected through ‘OnionSearch’ and Paliscope Discovry was then used to create a visual intelligence product in the form of a link chart, in addition to a Forensic Investigation report.
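To illustrate the idea behind such a link chart, here is a rough Python sketch of the underlying data structure: URLs and Bitcoin addresses become nodes, and each observed association becomes a link. This is not how Paliscope Discovry or Web-IQ work internally; the function names are hypothetical, and the site and wallet labels are placeholders rather than real findings.

```python
from collections import defaultdict

# Placeholder (URL, Bitcoin address) pairs; none of these are real findings.
OBSERVATIONS = [
    ("site-a.onion", "wallet-1"),
    ("site-b.onion", "wallet-1"),
    ("site-b.onion", "wallet-2"),
    ("site-c.onion", "wallet-3"),
]


def build_link_chart(observations):
    """Map every node (URL or address) to the set of nodes it is linked to."""
    chart = defaultdict(set)
    for url, address in observations:
        chart[url].add(address)
        chart[address].add(url)
    return chart


def shared_addresses(observations):
    """Return addresses seen on more than one URL, with the URLs sharing them."""
    urls_by_address = defaultdict(set)
    for url, address in observations:
        urls_by_address[address].add(url)
    return {
        address: sorted(urls)
        for address, urls in urls_by_address.items()
        if len(urls) > 1
    }


chart = build_link_chart(OBSERVATIONS)
shared = shared_addresses(OBSERVATIONS)
print(shared)
```

An address that appears against more than one URL is often the most interesting link in a chart of this kind, since it suggests that otherwise separate sites may have a common operator.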

All in all, ‘OnionSearch’ is a great tool that is effective at searching across multiple Onion search engines. Its output is also really useful for Digital Investigators who need to compile a list of target URLs. Undoubtedly, the real power lies in what the investigator does with the collected data. Using Paliscope Discovry, we created forensic snapshots of several target URLs and searched Web-IQ’s database to extract further information of intelligence value and develop it even further.
