TeleParser: Scraping and parsing Telegram chat data
Link to tool: https://github.com/artmih24/TeleParser
First and foremost, we would like to wish our readers a very happy New Year and offer our huge thanks for returning to our page to read our very first OSINT Tool Review of 2022.
Without further ado, we will jump right in and introduce a very lightweight but effective Python utility called TeleParser. It is a single-capability utility that uses the Telegram API to extract chat data from Telegram channels and groups, then presents the scraped data in CSV or JSON format. One very important reason for presenting this tool now is that it coincides with an upcoming webinar by our strategic partner Paliscope that will focus on chat analytics. As such, we thoroughly recommend that our readers sign up to watch the webinar scheduled for 26 January by registering via this link: https://www.paliscope.com/2022/01/12/live-webinar-learn-how-to-collect-and-analyze-chat-data-from-open-sources-paliscope/.
By now, many of our readers will be correctly pointing out that the Telegram web application provides users with the capability to export chat data from groups, channels and individual interactions alongside any uploaded media including images, videos and GIFs. Currently, the exported chat data that Telegram provides is in either HTML or JSON format. Whilst JSON is a structured format that would normally allow Digital Investigators to analyse the data contained within it, this unfortunately isn't the case for Telegram's export. So, should a situation arise where Digital Investigators need to analyse a Telegram channel that could contain several thousand messages, the most efficient way to do so would normally involve the use of a third-party analysis tool such as Paliscope YOSE. However, what the Digital Investigator needs to do first is structure the Telegram chat data in a format that can be processed and analysed accordingly. Doing this manually could take an extraordinary amount of time, to the extent where it may raise the question as to whether such a task is worth the time and resources required. However, the subject of this OSINT Tool Review, TeleParser, will achieve this task efficiently and quickly.
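To illustrate the kind of restructuring described above, here is a minimal Python sketch that flattens a nested chat export into flat rows ready for CSV. It is not TeleParser's code. The sample data, the field names (`messages`, `from`, `text`) and the assumption that a message's text may be either a plain string or a list mixing strings and objects are ours, and a real export may well differ.

```python
import csv
import io
import json

# Hypothetical sample shaped like a chat export: a "messages" list whose
# "text" field is either a plain string or a list mixing strings and dicts.
SAMPLE_EXPORT = json.dumps({
    "name": "Example channel",
    "messages": [
        {"id": 1, "date": "2022-01-05T10:00:00", "from": "alice", "text": "hello"},
        {"id": 2, "date": "2022-01-05T10:01:00", "from": "bob",
         "text": ["see ", {"type": "link", "text": "https://example.org"}]},
    ],
})


def flatten_text(text):
    """Collapse a string-or-list text field into one plain string."""
    if isinstance(text, str):
        return text
    parts = []
    for part in text:
        # Dict fragments are assumed to carry their visible text under "text".
        parts.append(part if isinstance(part, str) else part.get("text", ""))
    return "".join(parts)


def export_to_rows(raw_json):
    """Turn the nested export into flat dicts, one per message."""
    data = json.loads(raw_json)
    return [
        {
            "id": msg.get("id"),
            "date": msg.get("date", ""),
            "sender": msg.get("from") or "",
            "text": flatten_text(msg.get("text", "")),
        }
        for msg in data.get("messages", [])
    ]


def rows_to_csv(rows):
    """Serialise the flat rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "date", "sender", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(export_to_rows(SAMPLE_EXPORT)))
```

Even for this toy sample, the mixed text field needs special handling, which gives a sense of why doing the same by hand across several thousand messages quickly becomes impractical.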
As mentioned earlier, TeleParser is a lightweight utility that can be run without installing any prerequisite modules. However, the utility does require the user to enter their Telegram API credentials in the config.ini file. The utility can then be run by simply invoking python teleparser.py. From this point, the utility will prompt for the ID of the target channel or group. Once this is entered, the scraper will run and gather all available chat data, then present the extracted data in a CSV or JSON file.
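For readers unfamiliar with INI files, the following Python sketch shows how credentials of this kind are typically read using the standard library. It is an illustration only: the section name `[Telegram]` and the key names `api_id` and `api_hash` are our assumptions, so please check the config.ini shipped with TeleParser for the names it actually expects.

```python
import configparser

# Hypothetical config.ini contents; the section and key names are
# assumptions made for this sketch, and the values are placeholders.
SAMPLE_CONFIG = """
[Telegram]
api_id = 123456
api_hash = 0123456789abcdef0123456789abcdef
"""


def load_credentials(config_text):
    """Parse INI text and return (api_id, api_hash), failing early if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["Telegram"]
    api_id = section.getint("api_id")          # None when the key is missing
    api_hash = (section.get("api_hash") or "").strip()
    if not api_id or not api_hash:
        raise ValueError("api_id and api_hash must both be set")
    return api_id, api_hash


if __name__ == "__main__":
    api_id, api_hash = load_credentials(SAMPLE_CONFIG)
    # Avoid printing the full hash; it is a secret.
    print(api_id, api_hash[:4] + "...")
```

Checking the credentials up front means a missing or empty value is reported immediately, rather than surfacing later as a failed API connection.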
So, our readers may now be asking what can be done with this data once it is extracted. During our tests of the utility, we scraped a chatroom associated with a far-right organisation, which generated a CSV file 91MB in size. This file can be analysed manually within Microsoft Excel or, for our more proficient users, processed using the programming language R. Instead, we opted to use the super-charged AI-power of Paliscope YOSE to visualise the extracted chat data, the results of which were simply fantastic.
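For readers who prefer a scripted first look at such a file before loading it into a dedicated tool, here is a minimal Python sketch that counts messages per sender and per day. The column names (`date`, `sender`, `text`) and the sample rows are hypothetical, so they will need adjusting to match the header of the CSV that TeleParser actually produces.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; a real TeleParser CSV may use different column names.
SAMPLE_CSV = """date,sender,text
2022-01-03 09:15:00,user_a,first message
2022-01-03 09:20:00,user_b,second message
2022-01-04 18:02:00,user_a,third message
"""


def summarise(csv_file):
    """Count messages per sender and per calendar day, one row at a time."""
    per_sender = Counter()
    per_day = Counter()
    for row in csv.DictReader(csv_file):
        per_sender[row["sender"]] += 1
        per_day[row["date"][:10]] += 1  # keep only the YYYY-MM-DD prefix
    return per_sender, per_day


if __name__ == "__main__":
    per_sender, per_day = summarise(io.StringIO(SAMPLE_CSV))
    print(per_sender.most_common(5))
    print(sorted(per_day.items()))
```

Because the rows are read one at a time, the same function can be pointed at a large file opened with open(path, newline="", encoding="utf-8") without holding the whole file in memory.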
All in all, TeleParser is a utility that deserves a great deal of credit for its out-of-the-box and easy-to-use functionality. In total, we were able to use TeleParser to scrape just short of 100 Telegram channels and groups within a single day. Not only does this highlight the risk posed by far-right groups using Telegram to spread hateful and racially-charged content, but it also shows that Digital Investigators have a great opportunity to leverage chat data from the instant messaging application and analyse it accordingly. Overall, TeleParser comes highly recommended by the OS2INT team. At this point, we will once again encourage our readers to sign up to the upcoming Paliscope webinars to learn more about how you can scrape chat data using OSINT utilities such as TeleParser, then use Paliscope YOSE to analyse such data using its native Chat Analytics capability. Our readers can register for the first part of the two-part webinar series via the URL: https://www.paliscope.com/2022/01/12/live-webinar-learn-how-to-collect-and-analyze-chat-data-from-open-sources-paliscope/.