OSINT Toolbox Talk: Extracting TikTok user data, Instagram user information and Dark Web URLs



In this week’s OSINT Toolbox Talk, we once again cover three of the most effective OSINT and SOCMINT tools we reviewed this week. We begin with ‘TikTok Scraper’, a very effective NodeJS-based script that extracts TikTok user data and media, enabling Digital Investigators and OSINT Analysts to process the output through an integrated data processing system such as Paliscope YOSE. Secondly, we discuss the capabilities of ‘Toutatis’, a very easy-to-use Python script that extracts Instagram user data such as emails and telephone numbers from target profiles. Last, but not least, we look at ‘OnionSearch’, another Python-based script that enables investigators to run searches against multiple Onion search engines simultaneously, then extract the resulting URLs into a TXT file. In our review of ‘OnionSearch’, we also point out the value of using an effective investigations platform such as Paliscope Discovry to further develop digital investigations against Dark Web targets.


Extracting TikTok user data and media with 'TikTok Scraper'
https://github.com/drawrowfly/tiktok-scraper

There is a general lack of complete OSINT tools dedicated to TikTok, but in this OSINT Tool Review we look closely at one very effective NodeJS-based script that gives Digital Investigators a wide range of capabilities: extracting TikTok video posts, scraping user data, and extracting media and data associated with hashtags. ‘TikTok Scraper’ is a lightweight but powerful tool that is very easy to install and deploy, either directly through the NodeJS command-line interface or by using Docker. According to the GitHub repository for ‘TikTok Scraper’, the tool has the following key capabilities:

  • Download unlimited post metadata from the user, hashtag, trends, or music-ID pages
  • Save post metadata to JSON/CSV files
  • Download media (with or without watermark) and save it to a ZIP file
  • Sign URLs and create a custom request to the TikTok API
  • Extract metadata from the user, hashtag and single video pages
  • Save previous progress and download only new videos that weren’t downloaded before
  • View and manage previously downloaded post history
  • Scrape and download user, hashtag, music feeds and single videos in batch mode
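
As a rough illustration of how the JSON export listed above might be triaged before it is loaded into an analysis platform, the following Python sketch tallies hashtags across saved post records. The field names (`hashtags`, `name`) and the sample records are assumptions made for illustration, not a statement of the tool's actual output schema; check them against the file the tool produces.

```python
import json
from collections import Counter
from pathlib import Path


def load_posts(path):
    """Load a JSON export that is expected to hold a list of post records."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of posts")
    return data


def count_hashtags(posts):
    """Tally hashtag names across a list of post-metadata dicts.

    Assumption: each post may carry a "hashtags" list whose items are
    dicts with a "name" key. Posts without that field are skipped.
    """
    counts = Counter()
    for post in posts:
        for tag in post.get("hashtags") or []:
            name = (tag.get("name") or "").strip().lower()
            if name:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Made-up records standing in for an exported file.
    sample = [
        {"id": "1", "hashtags": [{"name": "OSINT"}, {"name": "fyp"}]},
        {"id": "2", "hashtags": [{"name": "osint"}]},
        {"id": "3"},
    ]
    for name, total in count_hashtags(sample).most_common():
        print(name, total)
```

With a real export, `load_posts("posts.json")` (a hypothetical file name) would replace the inline sample.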

We highly recommend this tool for its wide range of capabilities and because it is quick to install and deploy. Whilst using NodeJS for the first time may be daunting, the installation instructions provided by the developers on GitHub are comprehensive enough for any beginner. The script functioned without fault during our testing, though we hope it will be complemented with a visual interface, which would lend greater power and flexibility to the tool.

The variety of data that TikTok Scraper can extract is wide-ranging. During our test, we uploaded extracted videos and user data into Paliscope YOSE, and the results were extensive. YOSE’s AI-driven video analysis capabilities provided us with a detailed view of the content contained within the media, such as violent activity, money, drugs and sexual activity. At the same time, YOSE’s AI capabilities enabled us to visualise the extracted user data within an interactive link analysis chart. All-in-all, TikTok Scraper combined with Paliscope YOSE proved to be exceptionally effective.


Extracting Instagram user data with 'Toutatis'
https://github.com/megadose/toutatis

The number of OSINT tools for Instagram is high, to the extent that the number of tools listed on GitHub far exceeds the number listed for other social media platforms, including Facebook, VKontakte and TikTok. Several factors account for this high volume of Instagram-focused tools. For example, Instagram’s source code makes it considerably easier to extract information and media, and the ratio of new users joining Instagram versus Facebook and others is quite significant. For this review, we sifted through the collection of Instagram-focused OSINT tools to show you ‘Toutatis’, a lightweight but extremely effective tool that can extract public user information from Instagram accounts. When we say ‘public’ information, we are referring to user data that is visible to anyone, but also to information that belongs to private accounts (provided that your sock puppet is following the target account).

Toutatis has been developed by the same individual behind ‘Holehe’, a highly effective Python-based script that can verify email addresses against 200 online sources. Much of what we said in our review of ‘Holehe’ applies equally to ‘Toutatis’: the developer clearly has attention to detail and has built this tool so that it is easy to deploy. One detail we particularly like is that the script does not require your username and password. Instead, the user has to obtain the sessionID value, which can be found in the browser’s developer tools (under the Application tab in Google Chrome, or the Storage tab in Mozilla Firefox). This value is then passed on the command line alongside the target username being searched. Supplying the sessionID means that ‘Toutatis’ reuses the Instagram session already open in your browser, bypassing the need to input your username and password for each search.
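
For readers who copy the whole cookie string out of the browser rather than picking a single value from the cookie table, the following minimal Python sketch shows one way to pull the session value out of it using only the standard library. It assumes the cookie is named `sessionid`; the cookie string in the example is entirely made up, and the helper name `extract_session_id` is ours, not part of ‘Toutatis’.

```python
from http.cookies import SimpleCookie


def extract_session_id(cookie_header):
    """Return the value of the `sessionid` cookie, or None if it is absent.

    `cookie_header` is a raw "name=value; name=value" string, for example
    copied from the Cookie request header in the browser's developer tools.
    """
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get("sessionid")
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    # Fabricated example values; a real session value should be treated
    # like a password and never shared or committed to a repository.
    raw = "csrftoken=abc123; sessionid=12345%3AEXAMPLE%3A7; ds_user_id=12345"
    print(extract_session_id(raw))
```

The value is returned exactly as it appears in the cookie string, without URL-decoding.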

The script was tested against an Instagram profile created for our new office puppy, so by all means give him a follow! As you can see in the above image, the data extracted by the script was relatively comprehensive. The data extracted by the tool includes:

  • Username
  • Profile name
  • User ID
  • Whether the target is a verified account
  • Whether the target is a business account
  • Number of followers
  • Number of profiles following the target
  • Number of posts
  • Number of tags in the posts
  • Number of external URLs
  • Number of IGTV posts
  • Biography
  • Public Email (this can only be extracted provided that the privacy settings implemented by the user allow it)
  • Public Phone Number (again, this is dependent on the privacy settings or lack thereof for the target account)
  • Obfuscated Phone Number (this information is provided regardless of the privacy settings and consists of the country code and the last two digits of the number)
  • Profile Picture URL
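
The obfuscated phone number in the list above can still be useful for corroboration: if an investigator already holds a candidate number from another source, the country code and last two digits are enough to rule it in or out. The Python sketch below illustrates that check under stated assumptions: the helper name is hypothetical, the candidate must be written in international format, and the numbers shown are fictitious.

```python
import re


def matches_obfuscated(candidate, country_code, last_two):
    """Check a candidate number against a known country code and last two digits.

    Only digits are compared, so spaces, dashes and a leading "+" are ignored.
    A match does not confirm ownership; it only means the candidate is not
    excluded by the partial number.
    """
    digits = re.sub(r"\D", "", candidate)
    code = re.sub(r"\D", "", country_code)
    if len(digits) < len(code) + 2:
        return False  # too short to contain both the code and the suffix
    return digits.startswith(code) and digits.endswith(last_two)


if __name__ == "__main__":
    # Fictitious numbers used purely for illustration.
    for number in ["+44 7700 900123", "+44 7700 900124", "+1 202 555 0123"]:
        print(number, matches_obfuscated(number, "+44", "23"))
```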

To wrap up, ‘Toutatis’ is a really nice tool for Digital Investigators to include in their toolkit. It is lightweight, easy to install and easy to deploy against target Instagram users. The developer has done a great job and has clearly kept the tool focused, so that it is not over-burdened with capabilities such as media extraction. It is worth noting that Instagram continuously refines its security safeguards to protect user data, and such safeguards are usually aimed at preventing tools from scraping Instagram user media. Focusing purely on user information as opposed to media is therefore a good decision. All-in-all, ‘Toutatis’ is a great tool that delivers very good results!


Scraping Dark Web URLs using 'OnionSearch'
https://github.com/megadose/OnionSearch

Admittedly, we have neglected the Dark Web for some time when it comes to showcasing effective tools that can be deployed against Onion sites. One primary reason is that research undertaken by ourselves and other crime analysts suggests that criminals are beginning to abandon the Dark Web as a space for illicit activity. Instead, various marketplaces are now appearing on messaging applications, Telegram being the primary example. Not only have some marketplaces moved from the Dark Web to messaging platforms, but criminals engaged in the making and distribution of child sexual abuse (CSA) content have also taken advantage of the privacy controls offered by several messaging applications. That aside, the Dark Web remains a well-established space for criminals; as such, Digital Investigators and OSINT analysts should remain aware of the latest tools and techniques that can be deployed in this regard.

One tool that we shall talk about is ‘OnionSearch’, a very well developed Python-based script by the same individual behind ‘Holehe’ and ‘Toutatis’. The tool runs a search term through several Onion search engines, then extracts the URLs matching that term. The script’s GitHub repository points out that ‘OnionSearch’ conducts searches on the following engines:

  • ahmia
  • darksearchio
  • onionland
  • notevil
  • darksearchenginer
  • phobos
  • onionsearchserver
  • torgle
  • onionsearchengine
  • tordex
  • tor66
  • tormax
  • haystack
  • multivac
  • evosearch
  • deeplink

However, it should be noted that during our review some of the search engines returned various errors. These were all due to a combination of session timeouts and the search engine itself being unavailable or removed altogether. That aside, during our test we collected a considerable number of URLs pointing to sources of illicit activity. As effective as the tool is at running multiple general searches on openly visible Onion pages, the script is only scratching the surface, because a vast number of Onion pages are protected behind encryption protocols and invite-only features. Nevertheless, the tool is quite effective, as it does what it sets out to do. What we especially like is its capability to output a TXT file containing all of the extracted URLs. We also like that the tool is flexible enough to let users set their own Tor proxy, set page load limits, and have the tool write continuously to the output file (which is very useful during a large investigation).
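
Because several engines can return the same site, the output file tends to contain duplicates. The Python sketch below shows one way to reduce such a file to a de-duplicated list of Onion hostnames. It makes no assumption about the file's exact layout and simply scans the text with a regular expression for 56-character (v3) and legacy 16-character (v2) addresses; the file name in the usage note and the address in the sample are hypothetical.

```python
import re

# v3 onion addresses are 56 base32 characters (a-z, 2-7);
# legacy v2 addresses were 16 characters.
ONION_RE = re.compile(
    r"\b((?:[a-z2-7]{56}|[a-z2-7]{16})\.onion)\b", re.IGNORECASE
)


def unique_onion_hosts(text):
    """Return de-duplicated .onion hostnames in first-seen order."""
    seen = {}
    for match in ONION_RE.finditer(text):
        # Hostnames are case-insensitive, so normalise before comparing.
        seen.setdefault(match.group(1).lower(), None)
    return list(seen)


if __name__ == "__main__":
    fake_host = "a" * 56 + ".onion"  # placeholder, not a real service
    sample = (
        '"engine-one","Example","http://' + fake_host + '/index"\n'
        '"engine-two","Example again","http://' + fake_host + '/"\n'
    )
    for host in unique_onion_hosts(sample):
        print(host)
```

For a real run, the sample text would be replaced with the contents of the output file, for example `Path("output.txt").read_text(encoding="utf-8")` after importing `Path` from `pathlib`.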

Whilst the tool’s output is of intelligence value in itself, it is worth showing how we processed this data using Paliscope Discovry, specifically using the platform’s built-in Tor browser and its connected service to Web-IQ. Using Discovry, we navigated to the URLs extracted using ‘OnionSearch’ then created a forensic copy of several pages of investigative interest. Using Discovry’s connected service to Web-IQ, we gained maximum visibility of several target URLs over a long time period and discovered several Bitcoin addresses associated with the URLs. All of the data collected through ‘OnionSearch’ and Paliscope Discovry was then used to create a visual intelligence product in the form of a link chart in addition to a Forensic Investigation report.

All-in-all, ‘OnionSearch’ is a great tool that is effective at searching across multiple Onion search engines. Its output is also really useful for Digital Investigators who need to compile a list of target URLs. Undoubtedly, the real power lies in what the investigator does with the collected data. Using Paliscope Discovry, we created forensic snapshots of several target URLs and searched Web-IQ’s database to extract further information of intelligence value and develop it even further.

