Scraping Telegram and Vkontakte like a pro, and streamlining video content searches
It has been a very busy period for us at OS2INT – plenty of OSINT tool testing, training activities, and of course, a well-deserved summer break! For our readers who have yet to catch up on our latest OSINT Tool Review articles, now is the perfect opportunity to do so by reading about our top three recently reviewed OSINT tools. First, we have the fantastic RUBY, a lightweight Python utility that enables users to conduct searches for video content on YouTube, BitChute and Rumble. Next is the truly amazing Telegram scraping and analysis utility Telepathy, developed by @JordanWildon. Every OSINT’er who wants to keep track of Telepathy’s capabilities, as well as its new and exciting upcoming features, should follow Jordan on Twitter and be sure to give him well-deserved feedback on his tool! Last, and by no means least, is our review of Spevktator, a very effective tool for OSINT and disinformation analysts to conduct passive scraping of Vkontakte accounts and analyse results in a local SQL database. Likewise, our readers should be sure to follow @MischaU8 on Twitter in order to receive the latest updates regarding the tool. It goes without saying that we have made great use of Telepathy and Spevktator during the past few weeks, both of which are highly relevant for our readers who are keeping a constant eye on developments taking place in Ukraine.
RUBY: Streamlining video content searches on Rumble, YouTube and BitChute
The war in Ukraine is taking place in near real-time. Unlike previous conflicts of our time, events are being captured and distributed online as they are taking place. With regard to the ongoing effort by many organisations to document Russian war crimes taking place in Ukraine, the capability to streamline searches for video content containing suspected war crimes is of the utmost importance. Not only does such a capability enable OSINT Analysts and Digital Investigators to combine searches across multiple video streaming sites into one, thus reducing time spent, it can also allow for the discovery of content that may otherwise be difficult to locate.
What is RUBY?
No, we are not talking about the Ruby programming language – you may be happy to know! For this article, we are referring to a super lightweight, yet highly effective, Python utility whose name is an acronym for Rumble, BitChute, and YouTube – the popular video hosting and streaming platforms.
What does it do?
We at OS2INT often say that the most effective tools are those that are the simplest to install and use – RUBY is by no means an exception! Broadly speaking, RUBY is a keyword-based search tool that will query Rumble, BitChute, and YouTube for keyword matches, and then extract results into a comma-separated value (CSV) file. The file itself contains scraped information for each search result including video author, profile username, author URL, and video URL. The latter two fields enable OSINT Analysts and Digital Investigators to use additional tools to extract available metadata, and even the videos themselves.
Installation and deployment
As we have already pointed out, RUBY is incredibly easy to install and configure. Quite simply, users should clone the utility’s Github repository using git clone. Thereafter, the tool’s dependencies are installed by invoking pip install -r requirements.txt on the command-line interface. Deploying the tool is done by invoking python ruby.py followed by the user’s target keyword; for example, python ruby.py mariupol. And that is quite simply all that is required!
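For analysts who want to run several keywords in one sitting, the single command above can be wrapped in a short script. The following is only a minimal sketch, assuming that ruby.py sits in the current working directory and takes one keyword as its only argument, as in the example above. The helper names build_command and run_keywords, and the keyword list in the usage comment, are ours for illustration and are not part of RUBY.

```python
import subprocess
import sys


def build_command(keyword, script="ruby.py"):
    """Return the command for one RUBY search (mirrors 'python ruby.py mariupol')."""
    return [sys.executable, script, keyword]


def run_keywords(keywords, script="ruby.py"):
    """Run RUBY once per keyword; assumes the script is in the current directory."""
    for keyword in keywords:
        # check=True stops the batch as soon as one run exits with an error
        subprocess.run(build_command(keyword, script), check=True)


# Example usage (hypothetical keyword list, run from inside the cloned repository):
# run_keywords(["mariupol", "kherson"])
```

Each run is a separate process, so it is worth checking how RUBY names its CSV output before batching, in case a later run overwrites an earlier file.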
Analysing the output
Whilst the utility is running in the command-line interface, it will display search results for each of the video hosting and streaming sites. However, as we pointed out, the tool also outputs the results into a CSV file that contains all of the scraped search results. At this stage, it is for the user to determine what to do with the search results; for YouTube, there are additional tools that can be used to further analyse video and author metadata – including the awesome YouTube Metadata utility by Matt Wright!
However, in our case, we opted to harness the awesome AI capabilities of Paliscope YOSE to collate our search results and visualise them. To demonstrate a fine use-case for RUBY and YOSE, we performed several searches for videos using keywords associated with known pro-Russian and pro-invasion disinformation actors. This resulted in several CSV files, which we then processed through YOSE. As you can see in the image below, there were considerable overlaps between several users sharing likely disinformation content across several of the video sharing platforms.
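For readers without access to YOSE, a rough first pass at spotting the same kind of overlap can be made with a few lines of Python. This is a minimal sketch and not the workflow we used: the column names author and video_url are assumptions (check the header row of your own RUBY output), and the platform is simply inferred from the hostname of each video URL.

```python
import csv
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical column names; check the header row of your own RUBY output
AUTHOR_COLUMN = "author"
VIDEO_URL_COLUMN = "video_url"


def platform_of(url):
    """Derive a platform label from a video URL's hostname."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host


def cross_platform_authors(rows):
    """Return author names seen on more than one platform, sorted by name."""
    seen = defaultdict(set)
    for row in rows:
        # Names are lower-cased so that trivial differences do not hide a match
        author = row[AUTHOR_COLUMN].strip().lower()
        seen[author].add(platform_of(row[VIDEO_URL_COLUMN]))
    return sorted(name for name, platforms in seen.items() if len(platforms) > 1)


def load_rows(paths):
    """Yield rows from several CSV files in turn."""
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)
```

Matching on display name alone is crude, since different people can share a name and one actor can use several, so any hit should be treated as a lead to verify rather than a finding.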
Our final thoughts
Whether you are involved in monitoring current events or detecting disinformation actors involved in the distribution of video content, RUBY is a tool you should at least test in order to combine searches into one very smooth utility. There is certainly tonnes of potential for this tool, and we at OS2INT are very keen to see how it develops over time. More importantly, the tool’s CSV output presents many opportunities for OSINT Analysts and Digital Investigators, as it allows them to visualise their results using a third-party application such as YOSE. So, all we can say is try this tool out and see how it works for you!
Telepathy: Extract and analyse disinformation data from Telegram
As the war in Ukraine continues, Telegram has undoubtedly become a primary source of on-the-ground reporting of Russian activity in near real-time. On the other side of the coin, pro-Russian and pro-invasion disinformation actors have taken to Telegram by establishing a significant number of accounts that produce and circulate disinformation aimed at countering the Ukrainian narrative. These Russian disinformation channels and groups all have the same modus operandi: they circulate content among each other and encourage subscribers and group members to redistribute false content on pro-Ukrainian channels and groups. So, with Telegram becoming a central focus in the information war, OSINT and disinformation analysts require effective tools to be able to acquire data concerning sources of disinformation and analyse the spread of false content.
What is Telepathy, and what does it do?
This is not the first time we at OS2INT have written about Telepathy. However, considering that Jordan Wildon, the talented developer behind Telepathy, has released the second version of the tool, we felt it absolutely necessary to produce another article and video to explain the tool and the awesome capabilities it provides. So, what is Telepathy?
In our view, it is the ‘Swiss Army Knife’ of Telegram tools – whilst version 1 had awesome capabilities in itself, version 2 is most certainly on another level. Telepathy is a Python utility that provides the user with several data scanning / acquisition options including:
Target (Chat / Group): This is the default basic scan which will find the title, description, number of participants, username, URL, chat type, chat ID, access hash, first post date and any applicable restrictions to the chat. For group chats, Telepathy will also generate a CSV-based memberlist (up to 5,000 members).
Comprehensive: This option will retrieve the same information as the basic scan, but will also archive a chat’s message history.
Forwards: This option will scan for messages that have been forwarded into a target channel/group, then create a CSV-based edgelist that can subsequently be analysed via a third-party data visualisation tool.
Media: This option will archive all media in a target channel / group alongside a comprehensive scan. Understandably, this process may take some time based on the amount of media contained within a channel / group.
Why Telepathy is your ‘go-to’ utility
It cannot be said enough – Telepathy has amazing capabilities and is reliably efficient in the way it operates. Unlike other Telegram-based utilities, it writes data to CSV output files asynchronously – this means that if you are rate-limited by Telegram during a data acquisition process, the data is not lost. Also, the utility’s output is very straightforward and not over-complicated, which means that CSV files generated during the data acquisition process can be loaded into a data visualisation utility quickly and effectively. Lastly, Telepathy has been developed by a very talented person who has indicated that further features are currently in development. All too often, OSINT tools are developed and then abandoned – but this is most certainly not the case with Telepathy!
Installation and Deployment
Telepathy can be installed either by invoking pip install telepathy, or alternatively (as in our case) by manually cloning the tool from the Github repository and installing it by invoking python3 setup.py install via the command line interface. Instructions regarding the use of the tool can be easily read on the tool’s Github repository. At the first run, the utility will prompt the user to input their Telegram API details.
During each scan, Telepathy will indicate the Telegram user IDs most active on each channel / group. However, for OSINT and disinformation analysts looking to do a more comprehensive analysis of Telegram data, the outputs generated by Telepathy can be easily visualised using third-party applications such as Gephi.
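Gephi can import an edge table from a CSV file with Source, Target and Weight columns, so one way to prepare a forwards edgelist for visualisation is to collapse repeated forwards between the same pair of channels into a single weighted edge. The following is a minimal sketch only: the input column names From and To are assumptions about the edgelist layout, so adjust them to match the header of the file Telepathy produced for you.

```python
import csv
from collections import Counter

# Hypothetical column names for the forwards edgelist; adjust to your file
FROM_COLUMN = "From"
TO_COLUMN = "To"


def weighted_edges(rows):
    """Count how many times each (source, target) pair occurs."""
    return Counter((row[FROM_COLUMN], row[TO_COLUMN]) for row in rows)


def gephi_rows(edges):
    """Build the rows of a Source,Target,Weight edge table, header first."""
    rows = [["Source", "Target", "Weight"]]
    for (source, target), weight in sorted(edges.items()):
        rows.append([source, target, weight])
    return rows


def convert(in_path, out_path):
    """Read an edgelist CSV and write the weighted edge table for Gephi."""
    with open(in_path, newline="", encoding="utf-8") as src:
        edges = weighted_edges(csv.DictReader(src))
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(gephi_rows(edges))
```

With weights in place, edge thickness in Gephi reflects how often one channel forwards from another, which makes the heaviest amplification routes easier to pick out.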
However, considering that we wanted to analyse interactions across several scraped disinformation groups whilst also analysing memberlists and forward lists, we opted to use the Chat Analytics capabilities of Paliscope YOSE. If you are – like us – scraping from multiple groups, and you require the capability to isolate particular Telegram users in order to identify content that they have been posting – then YOSE is the solution you need. Considering that the channels and groups we scraped from were all in Russian, we then made use of the Offline Translation module that is integrated into YOSE – this took care of that issue for us!
Taking our analysis even further, we took our member and forward edgelists and dropped them into YOSE, enabling us to visually identify Telegram IDs that were members of multiple groups. Then, we analysed the spread of messages being forwarded from other sources of pro-Russian and pro-invasion disinformation, enabling us to identify additional groups and channels that we can analyse even further.
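The membership-overlap step can also be approximated outside of YOSE. The sketch below is only an illustration under stated assumptions: it expects one memberlist CSV per group, and the User ID column name is hypothetical, so check it against the header of your own memberlist files.

```python
import csv
from collections import defaultdict

# Hypothetical column name holding the Telegram user ID in each memberlist
USER_ID_COLUMN = "User ID"


def read_member_ids(path):
    """Return the set of user IDs listed in one memberlist CSV."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[USER_ID_COLUMN] for row in csv.DictReader(handle)}


def shared_members(groups, minimum=2):
    """Map user IDs to their groups, keeping IDs found in at least `minimum` groups.

    `groups` maps a group name to an iterable of member IDs.
    """
    membership = defaultdict(set)
    for group, member_ids in groups.items():
        for user_id in member_ids:
            membership[user_id].add(group)
    return {
        user_id: sorted(names)
        for user_id, names in membership.items()
        if len(names) >= minimum
    }
```

A typical call would build the groups mapping with read_member_ids for each scraped group and then pass it to shared_members; raising minimum narrows the result to the accounts present in the most groups.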
What more can be said!?
Overall, Telepathy is undoubtedly the ‘Swiss Army Knife’ of Python-based OSINT tools for Telegram. Easy installation, seamless configuration, and effective results – this tool is a must-have for any OSINT and disinformation analyst who is keeping a close eye on sources of pro-Russian and pro-invasion disinformation on Telegram. Whilst collecting data is one thing, analysing the data is the most crucial step. This is why we opted to use YOSE’s capability to analyse and visualise interaction data and relationships between Telegram users, channels, and groups.
Spevktator: Scraping post content from Vkontakte channels and groups
Disinformation and Russian social media
Russian disinformation is certainly not a new phenomenon. Analysts will no doubt agree that social media has enabled Russian disinformation actors to extend their reach to an extent where the only practical solution to address this threat is through effective monitoring and counter-narratives. For the most part, Telegram has become the battleground between ordinary Ukrainians reporting on developments in the war and pro-Russian and pro-invasion disinformation actors. But what about social media? According to recent figures by Statista, WhatsApp is used by most Russians (80.9%), followed by Vkontakte (VK) (76.4%), Instagram (63.7%), Telegram (50.8%), TikTok (46.6%), then Odnoklassniki (OK) (45.1%).
VK is most certainly a unique social media platform as it is best described as Russia’s answer to Facebook. As of 2021, VK had 60.4 million active users, and is noted to be most popular among the younger generation of Russian users. More interestingly, in December 2021, Russian state-owned bank Gazprombank and insurance company Sogaz bought out 57.3% of Vkontakte shares, thus becoming the holders of the company’s controlling interest.
Disinformation on Vkontakte
For argument’s sake, Vkontakte can be seen as a state-owned Russian social media platform. With 60.4 million active users (and counting!), it is most certainly a valuable asset for the Russian government as it seeks to monitor and control the Russian narrative concerning the ongoing war in Ukraine. Essentially, Vkontakte has become a means for the Russian government to influence Russian public opinion through propaganda.
Russia’s invasion of Ukraine has been somewhat of a tipping point for the OSINT community, to the extent that several very talented developers on Github have created tools that can be used to extract data and information directly from Vkontakte. In this OSINT Tool Review, we will look at an impressive scraping utility called ‘Spevktator’.
What is Spevktator, and what does it do?
As detailed in the tool’s Github repository, Spevktator was created to help research domestic Russian propaganda narratives and serve as a monitoring hub for Vkontakte content. The best thing about the utility is that it does not require a Vkontakte API, nor a username and password combination. But this feature does come with some disadvantages.
Based on Python, the tool enables OSINT and disinformation analysts to passively scrape posts from public Vkontakte channels and groups from the command line. The tool can also be used to listen to targets, then perform some analytical functions through an SQL interface. Key features of the utility include:
Fetch all wall posts from public Vkontakte groups and channels
Extract all named entities from scraped text
Retrieve the backlog of wall posts from Vkontakte communities from a specific date
Perform sentiment analysis on scraped posts
Translate extracted entities and post text from Russian to English (Requires DeepL translation API key)
Now that we have covered what the tool can do, our readers will by now be wondering whether the tool also scrapes media content with the posts – unfortunately, no, it doesn’t.
Installation and deployment
On the face of it, Spevktator appears to be a complicated tool for an OSINT novice to use. In reality, it is surprisingly straightforward; but beware, to get the most out of the utility, users are strongly encouraged to look up basic SQL commands.
Installation of the tool can be achieved through the usual process: simply clone the Github repo and invoke pip3 install . in your command-line of choice. The developers of Spevktator have gone to additional lengths to provide users with a sample SQLite database dump with which they can experiment. This sample database can be downloaded and unpacked by invoking wget -v -O data/vk.db.xz https://spevktator.io/static/vk_2022-09-04_lite.db.xz then xz -d data/vk.db.xz.
Otherwise, users can get straight to it and begin scraping their own dataset by first creating their SQLite database and configuring the sentiment analysis module by invoking spevktator install data/vk.db. Then, users can put the scraper to work against their target by invoking spevktator [OPTIONS: listen / fetch / backfill] [TARGET DB: data/vk.db] CHANNEL_ID, for example, spevktator listen data/vk.db vkusnoitochka.
To use the DeepL translation API to translate scraped content from Russian to English, users will need to supply their API key by setting the DEEPL_AUTH_KEY environment variable, then applying --deepl-auth-key as an additional command-line argument. Additionally, users can run the utility through a proxy by setting the SPEVKTATOR_PROXY environment variable, then invoking the --spevktator-proxy command-line argument.
Spevktator also enables users to visualise their scraped data locally. This is done by invoking datasette data/ in the command line, then accessing the web interface via the localhost address http://127.0.0.1:8001. Here, users can see the scraped content in a number of different table views. Scraped data includes post IDs, channel / group names, post date timestamps, text, translated text, likes, shares, and views.
Additionally, users can visualise sentiment analysis in a chart interface and conduct keyword searches against collected data. It should also be said that a lot more can be done by the users in this regard by learning a little SQL and using the web interface’s own custom SQL query interface to do a more thorough data drill-down.
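The same kind of drill-down can be scripted against the SQLite file directly. The following is a minimal sketch rather than a reflection of Spevktator’s actual schema: the posts table and its domain, text and views columns are assumptions for illustration, so inspect your own database (for example, in the web interface) for the real table and column names before adapting the query.

```python
import sqlite3

# Hypothetical schema: a 'posts' table with 'domain', 'text' and 'views' columns.
# Inspect your own database for the real table and column names.
QUERY = """
    SELECT domain, text, views
    FROM posts
    WHERE text LIKE ?
    ORDER BY views DESC
    LIMIT ?
"""


def most_viewed_matching(conn, keyword, limit=10):
    """Return the most-viewed posts whose text contains the keyword."""
    # The keyword is passed as a bound parameter, never formatted into the SQL
    pattern = f"%{keyword}%"
    return conn.execute(QUERY, (pattern, limit)).fetchall()


# Example usage against a scraped database:
# conn = sqlite3.connect("data/vk.db")
# for domain, text, views in most_viewed_matching(conn, "Мариуполь"):
#     print(domain, views, text[:80])
```

The identical SELECT statement can be pasted into the web interface’s custom SQL query box, with the placeholders replaced by literal values.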
Summing it all up
What we really like about this utility is that it conducts passive scraping: no Vkontakte API or username / password combination is required. But this does obviously have limitations, which Spevktator’s developers have actively pointed out. For example, the utility cannot scrape from private groups. Additional limitations that should be noted include:
The tool doesn’t collect comments and other personal information
Sentiment prediction is only of moderate quality
Post metrics are only tracked for a limited duration
Post text longer than 2500 characters is not translated
The tool has limited error handling and data loss recovery
So, though the tool has obvious limitations, we do like the fact that the developers are forthcoming about them. Equally, a lot of credit should also go to the developers for being very transparent about future developments, which include the ability to extract images, videos, and user comments; a more interactive UI where scraping actions can be configured; user authentication for private groups; and other installation options.
Overall, our hats go off to the developers of Spevktator for releasing a tool that shows a huge amount of potential and brings a lot of capabilities to OSINT and disinformation analysts. This is especially the case when taking into account that OSINT tools and scrapers for Vkontakte are limited in quantity when compared against other tools for Instagram, Telegram, and Twitter, for example. So, we at OS2INT will be keeping a very close eye on future releases of Spevktator!
Joseph Jones | Founder of OS2INT and Director of Capability Development at Paliscope
Joseph Jones is a former British military intelligence operator and former National Crime Agency intelligence officer with more than 16 years of intelligence-gathering and investigative experience. He holds a BSc (Hons) Intelligence and Cyber Security from Staffordshire University and is also an external expert for the European Union Agency for Law Enforcement Training (CEPOL), the European Border and Coast Guard Agency (FRONTEX), the European Union Agency for Cybersecurity (ENISA) and Expertise France.