OSINT Toolbox Talk: Extracting media content, collecting targeted open-source data, and creating network relationship charts


Our latest OSINT Toolbox Talk brings together three of the most effective and best-received tools we have reviewed over the past fortnight. We will begin by looking closely at DownThemAll and its capability to extract a wide range of media content from web pages. Next, we will introduce SN0INT, a command-line tool that packs a lot of punch when it comes to conducting targeted OSINT against online users. Specifically, we will discuss how SN0INT’s wide range of packages makes it the Swiss Army Knife of OSINT collection tools. Lastly, we will introduce a very slick Link Analysis / Network Relationship chart tool called Aleph Data Desktop, and discuss how it can be used to build visual intelligence that will complement investigation reports.
Looking ahead to our next OSINT Toolbox Talk and OSINT Tool Reviews, we have plenty of very effective tools to introduce to our readers, including an extension-based solution that can scrape comments posted on YouTube videos.


Extracting content from archived Geocities webpages using DownThemAll
https://chrome.google.com/webstore/detail/downthemall/nljkibfhlpcnanjgbnlnbjecgicbjkge?hl=en

In this latest OSINT Tool Review, we are going to take a nostalgic trip down memory lane and explain how Digital Investigators can extract content – including media – from archived Geocities pages – yes, that’s right, Geocities! For those of us who are old enough to remember, Geocities was a web hosting service that allowed users to create and publish websites for free and to browse user-created websites by their theme or interest. In its original form, site users selected a “city” in which to list the hyperlinks to their web pages. The “cities” were named after real cities or regions according to their content – for example, computer-related sites were placed in “SiliconValley” and those dealing with entertainment were assigned to “Hollywood”; hence the name of the site. Soon after its acquisition by Yahoo!, this practice was abandoned in favour of using the Yahoo! member names in the URLs. In April 2009, the company announced that it would end the United States GeoCities service on October 26, 2009. There were at least 38 million pages displayed by GeoCities before it was terminated, most user-written. The GeoCities Japan version of the service endured until March 31, 2019.

Although Geocities was abandoned, the good folks at Archive.org still maintain a vast collection of archived pages which can be extracted and downloaded in their entirety through the use of various tools (we will touch on those tools in an upcoming article). Some of our readers may wonder why on earth a Digital Investigator or OSINT Analyst would be interested in collecting media content from a defunct series of web pages from the early 2000s. We very much said the same until a recent investigation we supported led us to the Geocities archive. Indeed, after several weeks of OSINT research against several Geocities pages of interest, we found instances where links to nefarious sites (including those involved in the distribution of CSAM) were being shared. So, the bottom line is that conducting investigations against archived Geocities pages remains very relevant, even though it is incredibly time-consuming!

To extract media content directly from the archived Geocities webpages, we recommend using DownThemAll, a simple but very powerful extension-based media extraction tool that can be used within Google Chrome or Mozilla Firefox. We should also point out that DownThemAll is not limited to archived webpages. It can be used against most other webpages, provided that the HTML markup can be easily read by the tool itself.

DownThemAll can extract media embedded in web pages as well as files that pages merely link to. The latter is the more useful capability, given that most web pages insert images as links rather than embedding them. In all, the tool has the capability to download and extract the following:

  • Software files including exe and msi
  • Image files including jpg, jpeg, png, gif and svg
  • Archive files including zip, rar and 7z
  • Document files including all Microsoft-supported files
  • Video files including mp4, webm and mkv
  • Audio files including mp3, flac and wav
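To make the idea of extension-based extraction concrete, here is a minimal Python sketch that collects every linked or embedded file from a page and groups it by the categories listed above. It is an illustration only and not how DownThemAll is implemented: the names `CATEGORIES`, `LinkCollector` and `categorise_links` are our own, and document types are left out because the list above does not enumerate them.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Extension groups taken from the list above (illustrative, not exhaustive).
CATEGORIES = {
    "software": {".exe", ".msi"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".svg"},
    "archive": {".zip", ".rar", ".7z"},
    "video": {".mp4", ".webm", ".mkv"},
    "audio": {".mp3", ".flac", ".wav"},
}


class LinkCollector(HTMLParser):
    """Collects absolute URLs from href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page address.
                self.urls.append(urljoin(self.base_url, value))


def categorise_links(html, base_url):
    """Group every linked or embedded file by the categories above."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    grouped = {}
    for url in collector.urls:
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        for category, extensions in CATEGORIES.items():
            if suffix in extensions:
                grouped.setdefault(category, []).append(url)
    return grouped
```

Links whose extension falls outside the groups (an ordinary .html page, for instance) are simply ignored, which mirrors the way a media extraction tool lets the investigator focus on files rather than navigation.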

The tool also allows Digital Investigators to filter for any additional file types through its ‘Fast Filtering’ and / or ‘Mask’ filtering capability. In situations where investigators have access to the public root of a webpage, they can use ‘Fast Filtering’ to isolate and extract entire web pages as a single HTML file. Lastly, in cases where websites (including Archive.org) apply rate-limiting restrictions, investigators can configure DownThemAll to control the number and speed of downloads undertaken over a given period of time.
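For readers who script their own collection, the same courtesy towards rate-limited sites can be sketched in a few lines of Python. The `Throttle` class below is a hypothetical helper of our own, not part of DownThemAll. It simply enforces a minimum gap between consecutive requests, and the clock and sleep functions are injectable so that the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforces a minimum delay between consecutive downloads."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous download, if any

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` immediately before each download keeps the request rate at or below one request per `min_interval` seconds. A suitable interval depends on the site being collected from and is left to the investigator.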

To conclude, we have explained why OSINT research against very old Geocities webpages remains relevant to some investigations and how media content from such pages can be effectively extracted by using DownThemAll. However, as many of our readers will understand, the quality of media hosted on Geocities was incredibly low, mainly because the majority of Geocities users were on dial-up connections. Nevertheless, such media content could prove to be useful in any investigation.


Hunting and collecting targeted open-source information with SN0INT
https://github.com/kpcyrd/sn0int

As far as multi-capability tools go, SN0INT is undoubtedly the Swiss Army Knife of OSINT frameworks: it is lightweight, yet significant in terms of output and scalability. SN0INT is a semi-automatic framework and package manager that is primarily written in Rust. For our readers who have no idea what Rust is, it is a general-purpose programming language that is somewhat similar to C++. The SN0INT framework itself was primarily built for IT security professionals who need to gather intelligence against a given target in order to assess its attack surface. However, when looking at the range of modules included within the framework, it is clear that this tool has significant value for Digital Investigators as it can:

  • Harvest subdomains from certificate transparency logs and passive DNS
  • Enrich IP addresses with ASN and GeoIP info
  • Harvest emails from PGP Keyservers and WHOIS
  • Discover compromised logins in breaches
  • Find somebody’s profiles across the internet
  • Enumerate local networks with unique techniques like passive ARP
  • Gather information about phone numbers
  • Attempt to bypass Cloudflare with Shodan
  • Harvest data and images from Instagram profiles
  • Scan images for nudity
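To give a feel for the first capability in the list, the following Python sketch shows how subdomains can be pulled out of certificate transparency records. It is not SN0INT’s own code. We assume, for illustration, the public crt.sh search service and its JSON output, in which each record carries a `name_value` field holding one or more newline-separated hostnames. The function names are our own.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def extract_subdomains(records, domain):
    """Return the sorted, de-duplicated subdomains of `domain` in CT records."""
    suffix = "." + domain.lower()
    found = set()
    for record in records:
        # One certificate can list several names, separated by newlines.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            if name.endswith(suffix):
                found.add(name)
    return sorted(found)


def fetch_ct_records(domain, timeout=30):
    """Query crt.sh for certificates under `domain` (needs network access)."""
    # Assumed endpoint and parameters; "%" is the crt.sh wildcard character.
    url = "https://crt.sh/?q=" + quote("%." + domain) + "&output=json"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

The parsing step is kept separate from the network call so that it can be run against saved records, which is often preferable when results need to be reproduced later for a report.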

All that said, from a tactical Digital Investigations standpoint, several modules within SN0INT add particular value to the tool, including:

  • Tinder: Search for target profiles on Tinder
  • Exif: Extract all manner of Exif data from images
  • Instagram: Collect user data from Instagram profiles
  • Pornhub: Collect account and user information from Pornhub profiles
  • TikTok: Collect user data from publicly viewable TikTok profiles
  • Twilio-Lookup: Retrieve information regarding telephone numbers
  • Twitch: Collect information from Twitch streams
  • WhatsApp: Fetch public profile information from WhatsApp

The easiest option to install and deploy SN0INT is via Docker, by running the following from the command-line interface:

  docker run --rm --init -it -v "$PWD/.cache:/cache" -v "$PWD/.data:/data" kpcyrd/sn0int

From here, users can use SN0INT to set up their case, then use a wide range of modules to gather information that is pertinent to it.
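For novice Docker users, it may help to see what each part of that command does. The Python sketch below rebuilds the same invocation as an argument list with one comment per option. The function name `build_sn0int_command` and the idea of passing a per-case working directory are our own additions, and the sketch only assembles the command, it does not run Docker.

```python
import shlex
from pathlib import PurePosixPath


def build_sn0int_command(workdir):
    """Return the argument list for the Docker invocation quoted above."""
    workdir = PurePosixPath(workdir)
    return [
        "docker", "run",
        "--rm",    # remove the container when the session ends
        "--init",  # run a small init process so signals are handled cleanly
        "-it",     # interactive session with a terminal attached
        "-v", f"{workdir / '.cache'}:/cache",  # host folder mounted at /cache
        "-v", f"{workdir / '.data'}:/data",    # host folder mounted at /data
        "kpcyrd/sn0int",
    ]


if __name__ == "__main__":
    # Print the command for a hypothetical case folder.
    print(shlex.join(build_sn0int_command("/cases/alpha")))
```

Because the two `-v` options mount host folders into the container, whatever SN0INT writes to /cache and /data stays on the host after the container is removed. Pointing `workdir` at a separate folder per case is one way to keep investigations apart.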

What we especially like about the SN0INT framework is the developers’ attention to detail and understanding of user needs, shown by the highly detailed series of instructions that can be accessed from https://sn0int.readthedocs.io/en/latest/index.html. However, we do feel that SN0INT could be a particularly tricky framework for any novice Docker user, as the complex set of commands needed to run the various modules takes quite a bit of patience. Even so, we are highly confident that our readers will not be disappointed with SN0INT’s capabilities. That issue aside, SN0INT has searching and data extraction capabilities that cover several web sources that other OSINT tools have yet to fully provide.

To conclude, yes, SN0INT may be complex for some of our readers to run for the very first time, but it is a highly effective framework that offers plenty of capabilities for Digital Investigators. Patience is key when it comes to learning about the various commands associated with the SN0INT framework and the Rust programming language, but patience does indeed pay off. All in all, we really like what SN0INT has to offer, and we can also see that it has the long-term potential to grow into a very comprehensive toolkit for Digital Investigators and OSINT’ers. As such, this tool comes highly recommended.


Creating Link Analysis charts for your investigation with Aleph Data Desktop
https://github.com/alephdata/datadesktop/

This latest OSINT Tool Review is unusual in that it focuses on a visual analysis tool created by an organisation that also provides an exceptional resource in the form of company data. To provide our readers with some context, link analysis tools such as IBM i2 Analyst’s Notebook can be quite costly, especially for Digital Investigators and Intelligence Analysts who only require a simple tool they can use to create link charts. There are several data visualisation tools, such as Gephi, that provide some link chart capability. However, when you want a tool that enables you to create a chart and select icons based on your node type (for example, persons, companies and accounts), you will quickly find that Gephi is not the right tool for the job.

Here, we introduce Aleph Data Desktop, a lightweight tool that allows you to map networks of people and influence pertinent to your investigation. Taking a step back, the tool can be used alongside Aleph’s online dataset of leaked files and corporate information (which can be found at https://data.occrp.org/). This resource is truly impressive with over 250 datasets that can be readily accessed covering 140 countries / territories.

Focusing on the capabilities of Aleph Data Desktop as a standalone tool, we have to say that it is very effective. It enables Investigators and Analysts to build their own case based on individual relationships, then populate nodes with information that is pertinent to the case. It is also very customisable, as it allows Investigators to modify the colour and size of nodes as well as the overall view of the network charts (hierarchical, circular, vertical and horizontal alignments). The tool also allows users to create and modify nodes within a table view, which can most certainly make the task of populating data into Aleph much easier and faster.

Link Analysis charts produced on Aleph Data Desktop can be saved as a project file, enabling Investigators to switch between files as they see fit. The chart itself can be exported as an SVG, which is a very good choice: SVG is a vector format that stays sharp at any zoom level or print size, whereas raster formats such as JPG and PNG lose quality when enlarged.
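To illustrate what a circular layout and an SVG export involve, here is a minimal Python sketch that places nodes evenly on a circle, draws the links between them and returns the result as SVG markup. It is a toy example under our own assumptions and bears no relation to Aleph Data Desktop’s internals or project file format. The function name, colours and sizes are all hypothetical.

```python
import math
from xml.sax.saxutils import escape


def render_link_chart(nodes, links, size=400, radius=150):
    """Lay nodes out on a circle and return the chart as an SVG string."""
    centre = size / 2
    positions = {}
    for index, name in enumerate(nodes):
        # Spread the nodes evenly around the circle.
        angle = 2 * math.pi * index / len(nodes)
        positions[name] = (
            centre + radius * math.cos(angle),
            centre + radius * math.sin(angle),
        )

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
    ]
    # Draw links first so the node circles sit on top of the lines.
    for source, target in links:
        x1, y1 = positions[source]
        x2, y2 = positions[target]
        parts.append(
            f'<line x1="{x1:.1f}" y1="{y1:.1f}" '
            f'x2="{x2:.1f}" y2="{y2:.1f}" stroke="grey"/>'
        )
    for name, (x, y) in positions.items():
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="12" fill="steelblue"/>')
        parts.append(
            f'<text x="{x:.1f}" y="{y - 18:.1f}" '
            f'text-anchor="middle">{escape(name)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the returned string to a file with an .svg extension gives a chart that opens in any modern browser. Because the shapes are stored as coordinates rather than pixels, the chart can be scaled for a report without any loss of sharpness.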

So, Aleph Data Desktop is a strong choice for Investigators and Analysts who require a lightweight and simple capability to create Link Analysis charts. Whilst the tool can be used as a much broader case tool by allowing users to input information in relation to each node, we believe that its value lies in its user interface and its easy drag-and-drop functionality. Overall, this tool comes highly recommended based on its main capability, output and lightweight composition.

