OSINT Toolbox Talk: Scraping social media data, analysing disinformation, and batch scraping from Telegram

OSINT Tool Review

Batch scraping from Telegram channels and groups using TG-API

The value of Telegram for OSINT analysts

Telegram is undoubtedly a vital source of data and information concerning the ongoing war in Ukraine. On the one hand, local civilians have created Telegram channels and groups to report on Russian troop movements; on the other hand, pro-Russian and pro-invasion disinformation actors have created a significant number of channels to broadcast their false narratives. Whilst the Telegram desktop application does offer a native feature for exporting chat histories, tools that can extract chats in bulk are few and far between. One tool that does give OSINT analysts the capability to batch scrape from Telegram channels and groups is a Python-based utility named TG-API [Telegram API].

Why would you need to batch scrape?

Using the ongoing military situation in Ukraine as an example, there is a real risk of ‘information (or data) overload’. This is caused by the huge number of Telegram channels and groups in this space and the vast amount of information users post every day. Using the native export feature of the Telegram desktop application is simply not feasible, as archiving each group individually would take days, or perhaps weeks. Batch scraping, by contrast, enables OSINT analysts to scrape continuously from many Telegram channels and groups and then analyse the extracted data in an effective third-party analysis tool.

What can TG-API do?

TG-API provides several very useful functions. Its core capability is scraping Telegram channels and groups, either individually or in batches, and generating JSON files containing the scraped data. This data includes information about the target channel / group as well as the scraped user posts. The utility can also generate a CSV file from these JSON files, which is especially useful when analysing the results in a third-party platform.

Installation and deployment

Cloning the tool from its GitHub repository is very straightforward, and its dependencies are installed with the standard command pip install -r requirements.txt. However, depending on your operating system, some of the required Python libraries, such as Louvain, Matplotlib, and Pandas, will need to be installed manually by invoking pip install [INSERT TARGET LIBRARY HERE]. Once all of the required libraries are installed, insert your Telegram API credentials into the config.ini file located in the utility’s root folder.
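Before running the scraper, it can help to confirm that config.ini actually contains your credentials. The sketch below uses Python's standard-library configparser module. The section name Telegram and the keys api_id, api_hash and phone are hypothetical placeholders, so check them against the config.ini template in TG-API's repository before relying on this.

```python
from __future__ import annotations

import configparser

# Hypothetical section and key names: compare them with the config.ini
# template in TG-API's repository before use.
SECTION = "Telegram"
REQUIRED_KEYS = ("api_id", "api_hash", "phone")


def missing_from(config: configparser.ConfigParser, section: str = SECTION) -> list[str]:
    """Return the required keys that are absent or left empty."""
    if not config.has_section(section):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS
            if not config.get(section, key, fallback="").strip()]


def check_file(path: str = "config.ini") -> list[str]:
    """Read config.ini from disk and report any missing credentials."""
    config = configparser.ConfigParser()
    if not config.read(path, encoding="utf-8"):  # read() returns the files it parsed
        raise FileNotFoundError(path)
    return missing_from(config)


if __name__ == "__main__":
    try:
        missing = check_file()
        print("Missing credentials:", ", ".join(missing) if missing else "none")
    except FileNotFoundError:
        print("config.ini not found; run this from TG-API's root folder")
```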

Once configuration is complete, the tool can scrape from a target Telegram channel / group by invoking python main.py --telegram-channel [INSERT CHANNEL NAME]. To batch scrape from multiple sources, create a .txt file listing the target Telegram channels / groups (one per line) and save it in the utility’s root folder. You can then run the tool against all of them by invoking python main.py --batch-file [PATH TO TXT FILE].
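Channel lists gathered during an investigation often mix formats: @handles, t.me links, duplicates and notes. The sketch below normalises such a list into bare channel names, one per line, ready to pass to --batch-file. It assumes TG-API expects bare channel names, as in the --telegram-channel example above; the helper names and the channels.txt filename are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def normalise_channel(entry: str) -> str:
    """Reduce '@name', 't.me/name' or 'https://t.me/name' to 'name'."""
    name = entry.strip()
    for prefix in ("https://", "http://"):
        if name.lower().startswith(prefix):
            name = name[len(prefix):]
    if name.lower().startswith("t.me/"):
        name = name[len("t.me/"):]
    # Drop a leading '@' plus any trailing path segment or query string.
    return name.lstrip("@").split("/")[0].split("?")[0]


def batch_lines(entries) -> list[str]:
    """Normalise entries, skipping blanks and '#' notes.

    Duplicates are removed case-insensitively, since Telegram usernames
    are not case-sensitive; the first spelling seen is kept.
    """
    names: list[str] = []
    seen: set[str] = set()
    for entry in entries:
        stripped = entry.strip()
        if not stripped or stripped.startswith("#"):
            continue
        name = normalise_channel(stripped)
        if name and name.lower() not in seen:
            seen.add(name.lower())
            names.append(name)
    return names


def write_batch_file(names: list[str], path: str = "channels.txt") -> None:
    """Write one channel name per line (hypothetical default filename)."""
    Path(path).write_text("\n".join(names) + "\n", encoding="utf-8")
```

For example, calling write_batch_file(batch_lines(raw_entries)) in the utility's root folder and then running python main.py --batch-file channels.txt would scrape every channel on the cleaned list.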

The utility can also scrape only the new messages from a target Telegram channel / group, by invoking python main.py --telegram-channel channelname --min-id [INSERT LAST ID NUMBER SCRAPED].
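To keep an archive up to date, the last scraped ID can be read from a previous JSON output rather than looked up by hand. This sketch assumes, hypothetically, that the JSON file holds either a list of message objects or an object with a messages list, each message carrying an integer id field. TG-API's real output schema may differ, so adjust latest_message_id accordingly; only the --telegram-channel and --min-id flags come from the tool itself.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_json(path: str):
    """Load a previously scraped JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def latest_message_id(data) -> int | None:
    """Return the highest message ID found, or None if there are no messages.

    Hypothetical schema: a list of message dicts, or a dict with a
    "messages" list, where each message has an integer "id".
    """
    messages = data.get("messages", []) if isinstance(data, dict) else data
    ids = [m["id"] for m in messages
           if isinstance(m, dict) and isinstance(m.get("id"), int)]
    return max(ids) if ids else None


def update_command(channel: str, last_id: int | None) -> list[str]:
    """Build the TG-API command line for an incremental scrape."""
    cmd = ["python", "main.py", "--telegram-channel", channel]
    if last_id is not None:  # no previous ID: fall back to a full scrape
        cmd += ["--min-id", str(last_id)]
    return cmd
```

The resulting list can be passed to subprocess.run(..., check=True) from the utility's root folder, with the previous output file loaded via load_json.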

As noted earlier, TG-API scrapes Telegram channel / group data, including metadata and posts, and saves it in JSON format. If you require the scraped data in CSV format, this can easily be achieved by invoking python build-datasets.py.
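build-datasets.py is the tool's own route to CSV. If you instead want a slimmer file with only a few columns for a particular analysis platform, the conversion is simple to do yourself. This standard-library sketch flattens a list of message dictionaries into CSV text. The field names (channel, id, date, message, views) are hypothetical and would need to match the keys in your JSON output; it is not a reproduction of what build-datasets.py does.

```python
from __future__ import annotations

import csv
import io

# Hypothetical column names: match them to the keys in your JSON output.
FIELDS = ["channel", "id", "date", "message", "views"]


def messages_to_csv(channel: str, messages: list[dict]) -> str:
    """Flatten message dicts into CSV text with a fixed set of columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        restval="",             # missing keys become empty cells
        extrasaction="ignore",  # keys not in FIELDS are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    for message in messages:
        writer.writerow({"channel": channel, **message})
    return buffer.getvalue()
```

The returned text can be saved with open(path, "w", encoding="utf-8", newline=""), which keeps non-Latin message text intact; the csv module quotes any field containing commas or line breaks.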

Analysing the output

Extracting chat data from Telegram
TG-API is also intended to let users produce a Gephi file from its output, enabling them to visualise the collected data. Unfortunately, we found that this feature has a bug which prevents it from working (hopefully the utility’s developer can resolve this issue). That said, and returning to the point we made earlier, collecting vast amounts of data from Telegram is of little use if you have no way to analyse it effectively.
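Until the Gephi export is fixed, one possible workaround is to build an edge list yourself and load it through Gephi's spreadsheet (CSV) import, which recognises Source, Target and Weight columns. The sketch counts how often posts originating in one channel were forwarded into another. The record keys channel and forwarded_from are hypothetical stand-ins for whatever fields your scraped data uses to record where a post appeared and where it came from; this is not TG-API's own Gephi feature.

```python
from __future__ import annotations

import csv
import io
from collections import Counter


def forward_edges(records) -> Counter:
    """Count (origin channel, forwarding channel) pairs.

    'channel' and 'forwarded_from' are hypothetical keys.
    """
    counts: Counter = Counter()
    for record in records:
        source = record.get("forwarded_from")
        target = record.get("channel")
        if source and target and source != target:  # skip originals and self-forwards
            counts[(source, target)] += 1
    return counts


def edges_to_gephi_csv(counts: Counter) -> str:
    """Render an edge table with Source, Target and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(counts.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()
```

Each edge points from the channel a post originated in to the channel that forwarded it, so heavily weighted outgoing edges highlight the channels whose content spreads furthest.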

To analyse the scraped data, we turned to YOSE by Paliscope, specifically its Chat Analytics module. In YOSE, we built a comprehensive keyword list covering a wide range of Russian military equipment so that we could later identify and analyse interactions containing keyword matches. To import our Telegram data into YOSE, we simply used its drag-and-drop feature and then identified the relevant columns containing the Telegram chat data. The results (as shown below) are very good!
Analysing chat data from Telegram using Paliscope YOSE
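YOSE's keyword matching is a proprietary feature, but the underlying idea can be illustrated in a few lines of Python. The sketch below flags messages containing any keyword from a list and tallies matches per keyword. The keywords shown are illustrative examples only, and plain case-insensitive substring matching is a simplifying choice, not a description of how YOSE works.

```python
from __future__ import annotations

from collections import Counter

# Illustrative keywords only. A working list would be far longer and should
# include Cyrillic spellings, as many posts are in Russian or Ukrainian.
KEYWORDS = ["T-72", "T-90", "BMP-2", "Iskander", "Искандер", "TOS-1"]


def keyword_hits(messages, keywords=KEYWORDS):
    """Return (index, matched keywords) per matching message, plus per-keyword totals.

    Matching is case-insensitive substring search via str.casefold().
    """
    folded = [(keyword, keyword.casefold()) for keyword in keywords]
    hits = []
    totals: Counter = Counter()
    for index, text in enumerate(messages):
        haystack = (text or "").casefold()  # tolerate empty or missing text
        matched = [keyword for keyword, needle in folded if needle in haystack]
        if matched:
            hits.append((index, matched))
            totals.update(matched)
    return hits, totals
```

Substring matching deliberately catches variants such as T-72B3 under T-72 and inflected forms such as Искандером under Искандер, at the cost of occasional false positives; a regular expression with word boundaries would be stricter but would miss those variants.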
Telegram chat and interaction analysis using OSINT

Taking our analysis even further, we used YOSE to visualise the flow of chats between users and the instances where messages had been forwarded from one channel to another. As you can see from the image below, this gave us an effective intelligence picture of our scraped chats and showed how chats and messages are being shared between Telegram channels / groups.
Paliscope YOSE analysing Telegram chat data

Our final thoughts

TG-API is quite a good tool for OSINT analysts who need to batch scrape from multiple Telegram channels and groups. The utility is relatively new, however, and some features contain bugs or raise deprecation warnings; unless these issues are addressed soon, the tool may stop functioning effectively in the near future. Issues aside, the tool is very capable of extracting vast amounts of Telegram data from multiple sources and generating datasets that can be analysed effectively in third-party applications. As we have already pointed out, OSINT analysts monitoring the situation in Ukraine have a genuine need to batch scrape from multiple Telegram channels / groups. But this data is useless unless you can also process and analyse it effectively.
