OSINT Toolbox Talk: Investigating social media usernames, extracting Pastebin data, and extracting Telegram chats


Our first OSINT Toolbox Talk article for 2022 is now finally here after several weeks of tool testing by the OS2INT team! This latest article builds on the successes we have had since we established OS2INT in 2021, and we will continue to provide our readers with all the latest insight regarding new OSINT tools for the foreseeable future.

So, let’s get straight into it! The first tool we will discuss in this article is ‘Marple’, yet another Python-based utility by the very talented developer known as ‘soxoj’. This tool is quite unique as it enables OSINT Analysts and Digital Investigators to conduct username-based searches across multiple search engines simultaneously. Next on our list is ‘Pasta’, a very lightweight and effective Python script that scrapes Pastebin without the need for an API, then outputs raw pastes very neatly in a .txt file. Last, and by no means least, we discuss the outstanding capabilities of TeleParser, a very effective and fast utility that uses the Telegram API to extract chat data from channels and groups, then outputs the chats in a structured CSV or JSON file. We go even further by demonstrating how the data from TeleParser can be analysed through the awesome Chat Analytics capabilities of Paliscope YOSE.

This neatly leads us onto our big announcement that Joseph Jones will be presenting a series of webinars for Paliscope focusing on OSINT tools for scraping chat data and subsequent analysis. To sign up for the webinars, go to: https://www.paliscope.com/2022/01/12/live-webinar-learn-how-to-collect-and-analyze-chat-data-from-open-sources-paliscope/

Identifying and extracting social media user links with Marple

https://github.com/soxoj/marple

Undoubtedly, there is a large volume of username-based OSINT utilities; each of these tools has its own unique advantages and disadvantages. The majority of such tools will query websites and social media applications themselves, returning any relevant results or, at the very least, a probability of a positive match. However, there are few OSINT tools that will conduct username queries across multiple search engines simultaneously and provide Digital Investigators with the results. Here, we introduce ‘Marple’, another lightweight Python utility developed by the very talented GitHub contributor ‘soxoj’.

So, what is Marple and what does it do? In short, Marple will run several search instances for a username via a range of search engines and present results either within the command-line interface or through its own comma-separated value (CSV) output. The search engines that Marple will query include:

  • Google
  • DuckDuckGo
  • Yandex
  • AOL
  • Ask
  • Bing
  • Startpage
  • Yahoo
  • Mojeek
  • Dogpile
  • Torch
  • Qwant

With the exception of Yandex, Marple queries these search engines via a scraping method. What this means is that Marple runs searches on behalf of the user in the background without needing an API for each search engine. Yandex, however, restricts automated searches, meaning that users will need to obtain Yandex API access before Marple can run searches against it.
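
To illustrate the idea of fanning one username out to several search engines, here is a minimal Python sketch. It is not Marple's implementation: it only builds the search URLs that an investigator would otherwise type by hand, and it does no scraping. The names `ENGINES` and `build_search_urls` are our own illustrative choices, and the URL patterns are the publicly visible query formats for three of the engines listed above.

```python
from urllib.parse import quote_plus

# Public query-URL patterns for three engines.
# Illustrative subset only; this is NOT Marple's own engine list or code.
ENGINES = {
    "google": "https://www.google.com/search?q={query}",
    "bing": "https://www.bing.com/search?q={query}",
    "duckduckgo": "https://duckduckgo.com/?q={query}",
}


def build_search_urls(username: str) -> dict[str, str]:
    """Return one exact-phrase search URL per engine for the given username."""
    # Wrapping the username in double quotes asks the engine for an exact match.
    query = quote_plus(f'"{username}"')
    return {name: pattern.format(query=query) for name, pattern in ENGINES.items()}


if __name__ == "__main__":
    for engine, url in build_search_urls("example_user").items():
        print(f"{engine}: {url}")
```

Keeping the engines in a simple dictionary means that adding another engine is a one-line change, which is the kind of flexibility a multi-engine tool depends on.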

In terms of the overall functionality of Marple, it is very easy to install and deploy – its lightweight configuration makes it quite user-friendly. Also, the utility itself comes with a range of options that can be useful for Digital Investigators. For example, it has the capability for any searches to be passed through a proxy – providing Digital Investigators with the capability to conduct regional-based searches whilst also providing greater anonymity. The tool’s output can either be displayed within the command line interface or saved to a CSV file, the latter of which is really useful if Digital Investigators intend to analyse collected data through an analytics platform such as Paliscope YOSE. However, what impresses us the most is Marple’s capability to identify PDF documents associated with target usernames.

Overall, we ran several tests of Marple against a number of target usernames, and the results were very accurate. In some instances, we were also able to identify PDF documents associated with the target usernames. So, our readers may ask why we would use Marple in our Digital Investigations; the answer is that it saves time by conducting multiple searches simultaneously and presenting us with extracted information concerning usernames and associated documents. If we had to conduct these searches manually, we estimate that it would take us in the region of one hour; with Marple, it takes several seconds!

All-in-all, we love this tool: it is lightweight, easy to use, and delivers very effective results. In our view, Marple is another fine example of an effective OSINT tool developed by the very talented ‘soxoj’, and we certainly look forward to testing out their other tools in the very near future!

Scraping targeted data from Pastebin with 'Pasta'

https://github.com/Kr0ff/Pasta

Pastebin is an essential source of valuable information for Digital Investigators as it is often used to ‘dump’ leaked information ranging from details of alleged CSAM criminals to breached user credentials. Additionally, in an interesting research paper, authors Adrian Riesco, Eduardo Fidalgo, Mhd Wesam Al-Nabki, Francisco Janez-Martino, and Enrique Alegre rightly pointed out that Pastebin is used by CSAM criminals to distribute links to CSAM content. Until April 2020, Digital Investigators could use Pastebin’s native search capability to conduct their own targeted searches for illegal content. However, as a result of alleged ‘abuse’ by Pastebin users, this search capability was disabled. Thereafter, Digital Investigators had no choice but to conduct Google searches for pastes by applying the ‘site:’ Google Dork technique. However, this method is highly unreliable as it depends on whether Google has indexed the pastes of investigative interest; even then, the indexing of Pastebin content could, in theory, take up to several weeks.
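
For readers unfamiliar with the ‘site:’ technique mentioned above, the following short Python sketch shows how such a query can be assembled. The function name `pastebin_dork_url` is our own, and the sketch only builds a Google search URL; it does not run the search or work around the indexing delays described above.

```python
from urllib.parse import quote_plus


def pastebin_dork_url(keyword: str) -> str:
    """Build a Google search URL restricted to pastebin.com for an exact keyword."""
    # 'site:' limits results to one domain; the quotes request an exact phrase.
    query = f'site:pastebin.com "{keyword}"'
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    print(pastebin_dork_url("leak"))
```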

One method that Digital Investigators can use to overcome the aforementioned challenges is to scrape content directly from Pastebin and then analyse the extracted content. More effective scraping can be conducted by using the Pastebin API, though it should be noted that access to this API comes at a cost. However, there is a highly effective tool that can scrape content from Pastebin without the need for an API. That tool is called ‘Pasta’, the subject of this latest OSINT Tool Review from the OS2INT team.

So, what is ‘Pasta’? It is another Python utility that is ultra-lightweight in composition that can be installed quite simply by invoking pip install -r requirements.txt via the command-line interface. Thereafter, Pasta can be deployed immediately by selecting the relevant options from the command-line interface. The utility is highly flexible as it offers the following core capabilities:

  • Generate random eight-character-long strings similar to those identifying real Pastebin entries
  • Use randomly generated strings to brute-force Pastebin for possible hidden valid pastes
  • View the contents of Pastebin entries
  • Scrape the most recent archive of Pastebin
  • Search for sensitive information from downloaded pastes including emails, usernames, and IP addresses
  • Scrape all pastes from a user account
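
Two of the capabilities above, generating candidate paste identifiers and searching downloaded pastes for sensitive information, can be sketched in a few lines of Python. This is our own illustration and not Pasta's code: the function names are hypothetical, the assumption that identifiers are drawn from letters and digits is ours, and the regular expressions are deliberately simple patterns rather than complete validators.

```python
import ipaddress
import re
import secrets
import string

# Assumed character set for paste identifiers (letters and digits).
PASTE_ID_ALPHABET = string.ascii_letters + string.digits

# Deliberately simple patterns; real-world data will need tuning.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def random_paste_id(length: int = 8) -> str:
    """Generate a random identifier shaped like a paste ID."""
    return "".join(secrets.choice(PASTE_ID_ALPHABET) for _ in range(length))


def _is_ipv4(candidate: str) -> bool:
    """Reject regex matches such as 999.1.1.1 that are not valid addresses."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


def find_indicators(text: str) -> dict[str, list[str]]:
    """Return the unique email addresses and IPv4 addresses found in a paste."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    ips = sorted({match for match in IPV4_RE.findall(text) if _is_ipv4(match)})
    return {"emails": emails, "ipv4": ips}


if __name__ == "__main__":
    print(random_paste_id())
    print(find_indicators("contact admin@example.com from 192.168.1.10"))
```

Randomly guessing identifiers will mostly produce misses, so any real monitoring workflow should be rate-limited and respectful of the site's terms of use.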

Overall, Pasta is a very flexible and fast scraper that produces some excellent results in the form of extracted pastes which are saved locally as .txt files. Most certainly, this tool can be used by Digital Investigators to continuously monitor Pastebin in order to identify potential criminal activity, including the dumping of leaked user credentials, CSAM link sharing, and information that exposes alleged CSAM criminals. However, we should stress that the latter form of information is attributed to vigilantism and should not be considered as evidence in any way, shape, or form.

So, Pasta is undoubtedly a fantastic resource for Digital Investigators to scrape data from Pastebin. During our test of the tool, we were able to extract over 2000 individual pastes, the contents of which varied considerably. Analysing this data manually would take an extraordinary amount of time and resources, to the extent where the question would inevitably be asked as to whether scraping Pastebin data is worth the time and effort. However, we overcame this issue by processing the data through YOSE [Your Own Search Engine] by Paliscope. Using this tool, we were able to visualise all of the collected data and how it was connected. Even better, we used YOSE’s super-charged AI to visualise instances where references to CSAM were contained within the data and connect that same data to individual usernames.

All-in-all, Pasta is a must-have tool for Digital Investigators who require the capability to scrape content from Pastebin, and it most certainly comes with our highest recommendation. However, it must be remembered that scraped data alone does not hold any intelligence value unless it is processed and analysed accordingly. To achieve this task, we highly recommend the use of an AI-driven analysis solution such as Paliscope YOSE.

Scraping and parsing Telegram chat data with TeleParser

https://github.com/artmih24/TeleParser

First and foremost, we would like to wish our readers a very happy New Year, and our huge thanks for returning to our page to read our very first OSINT Tool Review of 2022.

Without further ado, we will jump right in and introduce a very lightweight, but effective, Python utility called TeleParser. It is a single-capability utility that uses the Telegram API to extract Telegram chat data from channels and groups, then presents the scraped data in CSV or JSON format. One very important reason why we are presenting this tool is that it coincides with an upcoming webinar by our strategic partner Paliscope that will focus on chat analytics. As such, we thoroughly recommend that our readers sign up to watch the webinar scheduled for 26 January by registering via this link: https://www.paliscope.com/2022/01/12/live-webinar-learn-how-to-collect-and-analyze-chat-data-from-open-sources-paliscope/.

By now, many of our readers will be correctly pointing out that the Telegram web application provides users with the capability to export chat data from groups, channels and individual interactions alongside any uploaded media including images, videos and GIFs. Currently, the exported chat data that Telegram provides is either in HTML or JSON format. Whilst JSON is normally a structured format that allows Digital Investigators to analyse the data contained within, unfortunately this isn’t the case for Telegram’s export. So, should a situation arise where Digital Investigators need to analyse a Telegram channel that could contain several thousand messages, the most efficient way to do so would normally involve the use of a third-party analysis tool such as Paliscope YOSE. However, what the Digital Investigator needs to do first is structure the Telegram chat data in a format that can be processed and analysed accordingly. Doing this manually could take an extraordinary amount of time, to the extent where it may raise the question as to whether such a task is worth the time and resources required. However, the subject of this OSINT Tool Review, TeleParser, will achieve this task efficiently and quickly.

As mentioned earlier, TeleParser is a lightweight utility that can be run without installing any prerequisite modules. However, this utility requires the user to enter their Telegram API credentials in the config.ini file. The utility can then be run by simply invoking python teleparser.py. From this point, the utility will prompt for the target channel or group ID. Once this is entered, the scraper will run and gather all available chat data, then present the extracted data in a CSV or JSON file.
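
For readers who have not worked with .ini files before, the sketch below shows how a Python script can read API credentials from one using the standard library. It is not TeleParser's code, and the section name `Telegram` and the key names `api_id` and `api_hash` are our assumptions for illustration; check the config.ini shipped with the tool for the names it actually expects. Telegram itself issues an API ID (a number) and an API hash (a string) to registered developers.

```python
import configparser


def load_telegram_credentials(path: str = "config.ini") -> tuple[int, str]:
    """Read an API ID and API hash from an .ini file.

    The [Telegram] section and its key names are hypothetical; adjust them
    to match the configuration file of the tool being used.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse.
    if not parser.read(path):
        raise FileNotFoundError(f"Could not read configuration file: {path}")
    section = parser["Telegram"]
    return int(section["api_id"]), section["api_hash"]
```

Keeping credentials in a separate file, rather than in the script itself, makes it easier to avoid sharing them by accident when the script is passed between colleagues.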

So, our readers may now be asking what can be done with this data once it is extracted. During our tests of the utility, we scraped a chatroom associated with a far-right organisation, resulting in a CSV file 91MB in size being generated. This file could be analysed manually within Microsoft Excel or, for our more proficient users, processed using the programming language ‘R’. Instead, we opted to use the super-charged AI-power of Paliscope YOSE to visualise the extracted chat data, the results of which were simply fantastic.
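
As a simple illustration of what a first pass over such a CSV file might look like without any specialist software, the Python sketch below counts messages per sender. The column name `sender` is a hypothetical placeholder, as we do not reproduce TeleParser's actual column headers here; replace it with whichever column identifies the author of each message in your export.

```python
import csv
from collections import Counter
from typing import Iterable


def top_senders(
    csv_lines: Iterable[str], sender_column: str = "sender", limit: int = 10
) -> list[tuple[str, int]]:
    """Return the most active senders as (name, message_count) pairs.

    'csv_lines' can be an open file object, so even a large export is
    streamed row by row instead of being loaded into memory at once.
    """
    reader = csv.DictReader(csv_lines)
    counts = Counter(row[sender_column] for row in reader)
    return counts.most_common(limit)
```

To run this against a real export, open the file with `open("chat.csv", newline="", encoding="utf-8")` and pass the file object to `top_senders`; the file name here is only an example.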

All-in-all, TeleParser is a utility that deserves a great deal of credit based on its out-of-the-box and easy-to-use functionality. In total, we were able to use TeleParser to scrape just short of 100 Telegram channels and groups within one single day. Not only does this point out the risk from far-right groups using Telegram to spread hateful and racially-charged content, but also the fact that Digital Investigators have a great opportunity to leverage chat data from the instant messaging application and analyse it accordingly. Overall, TeleParser comes highly recommended by the OS2INT team. At this point, we will once again encourage our readers to sign up to the upcoming Paliscope webinars to learn more about how you can scrape chat data using OSINT utilities such as TeleParser, then use Paliscope YOSE to analyse such data using its native Chat Analytics capability. Our readers can register for the first of a two-part webinar series via the URL: https://www.paliscope.com/2022/01/12/live-webinar-learn-how-to-collect-and-analyze-chat-data-from-open-sources-paliscope/.
