OSINT Toolbox Talk: Scraping Telegram and Vkontakte like a pro, and streamlining video content searches

It has been a very busy period for us at OS2INT – plenty of OSINT tool testing, training activities, and of course, a well-deserved summer break! For our readers who have yet to catch up on our latest OSINT Tool Review articles, now is the perfect opportunity to do so by reading about the top three OSINT tools we recently reviewed. First, we have the fantastic RUBY, a lightweight Python utility that enables users to search for video content on YouTube, BitChute and Rumble. Next is the truly amazing Telegram scraping and analysis utility Telepathy, developed by @JordanWildon. Every OSINT’er who wants to keep track of Telepathy’s capabilities, as well as its new and upcoming features, should follow Jordan on Twitter and be sure to give him well-deserved feedback on his tool! Last, and by no means least, is our review of Spevktator, a very effective tool for OSINT and disinformation analysts to conduct passive scraping of Vkontakte accounts and analyse results in a local SQL database. Likewise, our readers should be sure to follow @MischaU8 on Twitter in order to receive the latest updates regarding the tool. It goes without saying that we have made great use of Telepathy and Spevktator during the past few weeks; both are highly relevant for readers who are keeping a constant eye on developments in Ukraine.

The importance of effective video searches

The war in Ukraine is taking place in near real-time. Unlike previous conflicts of our time, events are being captured and distributed online as they happen. With regard to the ongoing effort by many organisations to document Russian war crimes in Ukraine, the capability to streamline searches for video content containing suspected war crimes is of the utmost importance. Not only does such a capability enable OSINT Analysts and Digital Investigators to combine searches across multiple video streaming sites into one, thus reducing time spent, it can also allow for the discovery of content that may otherwise be difficult to locate.

What is RUBY?

OSINT on YouTube, BitChute and Rumble
No, we are not talking about the Ruby programming language – you may be happy to know! For this article, we are referring to a super lightweight, yet highly effective, Python utility whose name is an acronym of Rumble, BitChute, and YouTube – the popular video hosting and streaming platforms.

What does it do?

We at OS2INT often say that the most effective tools are those that are the simplest to install and use – RUBY is by no means an exception! Broadly speaking, RUBY is a keyword-based search tool that will query Rumble, BitChute, and YouTube for keyword matches, and then extract results into a comma-separated value (CSV) file. The file itself contains scraped information for each search result including video author, profile username, author URL, and video URL. The latter two can enable OSINT Analysts and Digital Investigators to use additional tools to extract available metadata, and even the videos themselves.

Installation and deployment

As we have already pointed out, RUBY is incredibly easy to install and configure. Quite simply, users should clone the utility’s GitHub repository via git clone. Thereafter, the tool’s dependencies are installed by invoking pip install -r requirements.txt on the command-line interface. Deploying the tool is done by invoking python ruby.py followed by the user’s target keyword; for example, python ruby.py mariupol. And that is quite simply all that is required!

Analysing the output

Whilst the utility is running in the command-line interface, it will display search results for each of the video hosting and streaming sites. However, as we pointed out, the tool also outputs the results into a CSV file that contains all of the scraped search results. At this stage, it is for the user to determine what to do with the search results; for YouTube, there are additional tools that can be used to further analyse video and author metadata – including the awesome YouTube Metadata utility by Matt Wright!
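As a starting point for that next step, the following is a minimal Python sketch that reads a RUBY-style CSV file and groups the video URLs by platform, so that each group can be passed on to a platform-specific tool. The column name video_url is an assumption made for illustration; check the header row of your own output file and adjust it accordingly.

```python
import csv
from collections import defaultdict
from urllib.parse import urlparse

# Hostnames mapped to platform names. "other" catches anything unexpected.
PLATFORMS = {
    "youtube.com": "YouTube",
    "youtu.be": "YouTube",
    "bitchute.com": "BitChute",
    "rumble.com": "Rumble",
}


def video_urls_by_platform(csv_path, url_column="video_url"):
    """Group video URLs from a CSV file by hosting platform.

    url_column is an assumed column name, not confirmed RUBY output.
    """
    grouped = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            url = (row.get(url_column) or "").strip()
            if not url:
                continue  # skip rows without a video URL
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            grouped[PLATFORMS.get(host, "other")].append(url)
    return dict(grouped)
```

Calling video_urls_by_platform("results.csv") (a hypothetical file name) returns a dictionary keyed by platform name, which makes it easy to hand only the YouTube links to a YouTube-specific metadata tool.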

Conducting multiple searches on YouTube, BitChute and Rumble

However, in our case, we opted to harness the awesome AI capabilities of Paliscope YOSE to collate our search results and visualise them. To demonstrate a fine use-case for RUBY and YOSE, we performed several searches for videos using keywords associated with known pro-Russian and pro-invasion disinformation actors. This resulted in several CSV files, which we then processed through YOSE. As you can see in the image below, there were considerable overlaps between several users sharing likely disinformation content across several of the video sharing platforms.

Analysing OSINT outputs with Paliscope YOSE
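For readers without access to a visualisation tool, the kind of overlap described above can be roughly approximated in Python. The sketch below reads several CSV files and reports author names that appear on more than one platform. The column names author and platform are assumptions for illustration and may not match RUBY’s actual header row. Matching on name alone is only a lead: the same name on two sites does not prove it is the same person.

```python
import csv
from collections import defaultdict


def authors_on_multiple_platforms(csv_paths, author_column="author",
                                  platform_column="platform"):
    """Return {author: [platforms]} for authors seen on 2+ platforms.

    Both column names are assumed, not confirmed RUBY output.
    Author names are lower-cased so that "Foo" and "foo" match.
    """
    platforms_by_author = defaultdict(set)
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                author = (row.get(author_column) or "").strip().lower()
                platform = (row.get(platform_column) or "").strip()
                if author and platform:
                    platforms_by_author[author].add(platform)
    return {
        author: sorted(platforms)
        for author, platforms in platforms_by_author.items()
        if len(platforms) > 1
    }
```

Passing the list of CSV files from several keyword searches gives a quick shortlist of accounts worth a closer manual look.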

Our final thoughts

Whether you are involved in monitoring current events or detecting disinformation actors involved in the distribution of video content, RUBY is a tool you should at least test in order to combine searches into one very smooth utility. There is certainly tonnes of potential for this tool, and we at OS2INT are very keen to see how it develops over time. More importantly, the tool’s CSV output presents many opportunities for OSINT Analysts and Digital Investigators, as it allows them to visualise their results using a third-party application such as YOSE. So, all we can say is try this tool out and see how it works for you!

The Russian disinformation landscape

As the war in Ukraine continues, Telegram has undoubtedly become a primary source of on-the-ground reporting of Russian activity in near real-time. On the other side of the coin, pro-Russian and pro-invasion disinformation actors have taken to Telegram by establishing a significant number of accounts that produce and circulate disinformation aimed at countering the Ukrainian narrative. These Russian disinformation channels and groups all have the same modus operandi: each will circulate content among the others and encourage subscribers and group members to redistribute false content on pro-Ukrainian channels and groups. So, with Telegram becoming a central focus in the information war, OSINT and disinformation analysts require effective tools to be able to acquire data concerning sources of disinformation and analyse the spread of false content.

What is Telepathy, and what does it do?

This is not the first time we at OS2INT have written about Telepathy. However, considering that Jordan Wildon, the talented developer behind Telepathy, has released the second version of the tool, we felt it absolutely necessary to produce another article and video to explain the tool and the awesome capabilities it provides. So, what is Telepathy?

Scraping disinformation data from Telegram

In our view, it is the ‘Swiss Army Knife’ of Telegram tools – whilst version 1 had awesome capabilities in itself, version 2 is most certainly on another level. Telepathy is a Python utility that provides the user with several data scanning / acquisition options including:

  • Target (Chat / Group): This is the default basic scan which will find the title, description, number of participants, username, URL, chat type, chat ID, access hash, first post date and any applicable restrictions to the chat. For group chats, Telepathy will also generate a CSV-based memberlist (up to 5,000 members).
  • Comprehensive: This option will retrieve the same information as the basic scan, but will also archive a chat’s message history.
  • Forwards: This option will scan for messages that have been forwarded into a target channel/group, then create a CSV-based edgelist that can subsequently be analysed via a third-party data visualisation tool.
  • Media: This option will archive all media in a target channel / group alongside a comprehensive scan. Understandably, this process may take some time based on the amount of media contained within a channel / group.

Why Telepathy is your ‘go-to’ utility

It cannot be said enough – Telepathy has amazing capabilities and is reliably efficient in the way it operates. Unlike other Telegram-based utilities, it writes data to CSV output files asynchronously. This means that if you are rate-limited by Telegram during a data acquisition process, the data already collected is not lost. Also, the utility’s output is very straightforward and not over-complicated, so the CSV files generated during the data acquisition process can be loaded into a data visualisation utility quickly and effectively. Lastly, Telepathy has been developed by a very talented person who has indicated that further features are currently in development. All too many times, OSINT tools are developed and then abandoned – but this is most certainly not the case with Telepathy!
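To illustrate why writing rows to disk as they arrive protects against an interrupted scan, here is a minimal sketch. It is not Telepathy’s code, and it is synchronous for simplicity. RateLimited is a hypothetical stand-in for whatever error an API client raises when it is throttled, and the column names are made up for the example.

```python
import csv


class RateLimited(Exception):
    """Hypothetical stand-in for a rate-limit error from an API client."""


def archive_messages(messages, csv_path):
    """Write each (message_id, text) pair to disk as soon as it arrives.

    Returns the number of rows written. If the message source raises
    RateLimited part-way through, rows written so far stay in the file.
    """
    written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["message_id", "text"])  # illustrative header
        try:
            for message_id, text in messages:
                writer.writerow([message_id, text])
                handle.flush()  # push the row out before fetching the next
                written += 1
        except RateLimited:
            pass  # stop early; everything flushed so far is kept
    return written
```

The alternative design, collecting every message in memory and writing the file once at the end, would leave nothing on disk if the scan were cut short.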

Installation and deployment

Telepathy can be installed either by invoking pip install telepathy, or alternatively (as in our case) by manually cloning the tool from the GitHub repository and installing it by invoking python3 setup.py install via the command-line interface. Instructions regarding the use of the tool can be found on the tool’s GitHub repository. On the first run, the utility will prompt the user to input their Telegram API details.

Data analysis

During each scan, Telepathy will indicate the Telegram user IDs that are most active on each channel / group. However, for OSINT and disinformation analysts looking to do a more comprehensive analysis of Telegram data, the outputs generated by Telepathy can be easily visualised using third-party applications such as Gephi.

Scraped data from Telegram of pro-Russian disinformation groups
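Before loading a forwards edgelist into a tool such as Gephi, it can help to collapse repeated forwards between the same pair of chats into a single weighted edge. The sketch below does this with the standard library. The input column names source and target are assumptions for illustration, so check the header of your own edgelist file. The output uses Source, Target and Weight headers, which Gephi’s spreadsheet import recognises for edge tables.

```python
import csv
from collections import Counter


def weighted_edges(edgelist_path, source_column="source",
                   target_column="target"):
    """Count how often each (source, target) pair occurs in an edgelist.

    The column names are assumed, not confirmed Telepathy output.
    """
    counts = Counter()
    with open(edgelist_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            source = (row.get(source_column) or "").strip()
            target = (row.get(target_column) or "").strip()
            if source and target:
                counts[(source, target)] += 1
    return counts


def write_gephi_edges(counts, out_path):
    """Write weighted edges as a CSV edge table, heaviest edges first."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Source", "Target", "Weight"])
        for (source, target), weight in counts.most_common():
            writer.writerow([source, target, weight])
```

The heaviest edges point to the channels whose content is forwarded most often into the target, which is usually where to look next.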

However, considering that we wanted to analyse interactions across several scraped disinformation groups whilst also analysing memberlists and forward lists, we opted to use the Chat Analytics capabilities of Paliscope YOSE. If you are – like us – scraping from multiple groups, and you require the capability to isolate particular Telegram users in order to identify content that they have been posting – then YOSE is the solution you need. Considering that the channels and groups we scraped from were all in Russian, we then made use of the Offline Translation module that is integrated into YOSE – this took care of that issue for us!

Analysing Telegram chat data with Paliscope YOSE

Taking our analysis even further, we took our member and forward edgelists and dropped them into YOSE, enabling us to visually inspect Telegram IDs that were members of multiple groups. Then, we analysed the spread of messages being forwarded from other sources of pro-Russian and pro-invasion disinformation, enabling us to identify additional groups and channels to analyse even further.

Analysing relationships between pro-Russian disinformation groups on Telegram
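The first of those checks, finding Telegram IDs that are members of more than one scraped group, can also be sketched in Python as a sanity check alongside a visual tool. The column name user_id is an assumption for illustration, and the group names and file paths passed in are whatever you choose to call your own memberlist files.

```python
import csv
from collections import defaultdict


def shared_members(memberlists, id_column="user_id"):
    """Return {user_id: [groups]} for IDs found in 2+ memberlists.

    memberlists maps a group name to the path of its memberlist CSV.
    id_column is an assumed column name, not confirmed Telepathy output.
    """
    groups_by_user = defaultdict(set)
    for group, path in memberlists.items():
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                user_id = (row.get(id_column) or "").strip()
                if user_id:
                    groups_by_user[user_id].add(group)
    return {
        user_id: sorted(groups)
        for user_id, groups in groups_by_user.items()
        if len(groups) >= 2
    }
```

Because numeric Telegram IDs stay the same when a user changes their display name or username, matching on the ID is more reliable than matching on names.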

What more can be said!?

Overall, Telepathy is undoubtedly the ‘Swiss Army Knife’ of Python-based OSINT tools for Telegram. Easy installation, seamless configuration, and effective results – this tool is a must-have for any OSINT and disinformation analyst who is keeping a close eye on sources of pro-Russian and pro-invasion disinformation on Telegram. Whilst collecting data is one thing, analysing the data is the most crucial step. This is why we opted to use YOSE’s capability to analyse and visualise interaction data and relationships between Telegram users, channels, and groups.
