Spevktator: Scraping post content from Vkontakte channels and groups
Disinformation and Russian social media
Russian disinformation is certainly not a new phenomenon. Analysts will no doubt agree, however, that social media has extended the reach of Russian disinformation actors to the point where the only practical way to address this threat is through effective monitoring and counter-narratives. For the most part, Telegram has become the main battleground between ordinary Ukrainians reporting on developments in the war and pro-Russian, pro-invasion disinformation actors. But what about other social media platforms? According to recent figures from Statista, WhatsApp is used by most Russians (80.9%), followed by Vkontakte (VK) (76.4%), Instagram (63.7%), Telegram (50.8%), TikTok (46.6%), and Odnoklassniki (OK) (45.1%).
VK is a unique social media platform, perhaps best described as Russia’s answer to Facebook. As of 2021, VK had 60.4 million active users and is noted to be most popular among younger Russian users. More interestingly, in December 2021 the Russian state-owned bank Gazprombank and the insurance company Sogaz bought 57.3% of Vkontakte’s shares, thereby becoming holders of the company’s controlling interest.
Disinformation on Vkontakte
For argument’s sake, Vkontakte can be viewed as a state-owned Russian social media platform. With 60.4 million active users (and counting!), it is a valuable asset for the Russian government as it seeks to monitor and control the domestic narrative about the ongoing war in Ukraine. In effect, Vkontakte has become a means for the Russian government to shape Russian public opinion through propaganda.
Russia’s invasion of Ukraine has been something of a tipping point for the OSINT community, to the extent that several very talented developers on GitHub have created tools that can be used to extract data directly from Vkontakte. In this OSINT Tool Review, we will look at an impressive scraping utility called ‘Spevktator’.
What is Spevktator, and what does it do?
As detailed in the tool’s GitHub repository, Spevktator was created to help research domestic Russian propaganda narratives and to serve as a monitoring hub for Vkontakte content. The most notable thing about the utility is that it requires neither Vkontakte API access nor a username and password. However, this approach does come with some disadvantages.
Written in Python, the tool enables OSINT and disinformation analysts to passively scrape posts from public Vkontakte channels and groups from the command line. It can also listen to targets on an ongoing basis, and the collected data can then be analysed through an SQL interface. Key features of the utility include:
- Fetch all wall posts from public Vkontakte groups and channels
- Extract all named entities from scraped text
- Retrieve the backlog of wall posts from Vkontakte communities from a specific date
- Perform sentiment analysis on scraped posts
- Translate extracted entities and post text from Russian to English (requires a DeepL translation API key)
Having covered what the tool can do, readers may be wondering whether it also scrapes media content attached to posts. Unfortunately, it does not.
Installation and deployment
On the face of it, Spevktator may appear complicated for an OSINT novice to use. In reality, it is surprisingly straightforward, although users who want to get the most out of it are strongly encouraged to learn some basic SQL commands.
Installation follows the usual process: clone the GitHub repository, then run the following in your command line of choice:
pip3 install .
The developers of Spevktator have also gone to the additional length of providing a sample SQLite database dump for users to experiment with. It can be downloaded and decompressed by running:
wget -v -O data/vk.db.xz https://spevktator.io/static/vk_2022-09-04_lite.db.xz
xz -d data/vk.db.xz
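Where the xz utility is not available (for example on a stock Windows install), the decompression step can also be done with Python’s standard library. The minimal sketch below assumes the archive has already been downloaded to data/vk.db.xz as above; it decompresses the file and then lists the tables inside, so you can see what you are working with before writing any queries. The helper names decompress_xz and list_tables are our own and are not part of Spevktator.

```python
import lzma
import shutil
import sqlite3
from pathlib import Path


def decompress_xz(src: str, dest: str) -> Path:
    """Decompress an .xz file to dest, similar to `xz -d` but keeping src."""
    with lzma.open(src, "rb") as compressed, open(dest, "wb") as out:
        # Stream in chunks so a large database dump is never held in memory at once.
        shutil.copyfileobj(compressed, out)
    return Path(dest)


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database, sorted by name."""
    # Open read-only so a mistyped path raises an error instead of
    # silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, calling decompress_xz("data/vk.db.xz", "data/vk.db") and then list_tables("data/vk.db") gives you the table names to explore further in Datasette or with your own SQL.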
Alternatively, users can go straight to scraping their own dataset. First, create the SQLite database and configure the sentiment analysis module by running:
spevktator install data/vk.db
Then put the scraper to work against a target with:
spevktator {listen | fetch | backfill} DB_PATH CHANNEL_ID
where the first argument selects the mode (listen, fetch, or backfill), DB_PATH is the target database (for example data/vk.db), and CHANNEL_ID identifies the channel or group to scrape. For example:
spevktator listen data/vk.db vkusnoitochka
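Analysts tracking several communities may want to script these invocations. The following is a hypothetical sketch, not part of Spevktator: it assumes that every mode takes the same three positional arguments shown above, and it uses fetch on the further assumption that, unlike listen, a fetch run returns once it has finished. The helper names build_command and fetch_channels are our own.

```python
import subprocess
from collections.abc import Iterable

# The three modes named in the command syntax above.
VALID_MODES = {"listen", "fetch", "backfill"}


def build_command(mode: str, db_path: str, channel_id: str) -> list[str]:
    """Build the argument list for `spevktator MODE DB_PATH CHANNEL_ID`."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}")
    return ["spevktator", mode, db_path, channel_id]


def fetch_channels(db_path: str, channel_ids: Iterable[str]) -> dict[str, int]:
    """Run a fetch for each channel in turn and record each exit code.

    Channels are processed independently, so one failing channel does not
    stop the rest. Raises FileNotFoundError if spevktator is not on PATH.
    """
    results: dict[str, int] = {}
    for channel_id in channel_ids:
        proc = subprocess.run(build_command("fetch", db_path, channel_id))
        results[channel_id] = proc.returncode
    return results
```

A call such as fetch_channels("data/vk.db", ["vkusnoitochka"]) would then return a mapping from each channel ID to its exit code, where a non-zero value flags a channel worth re-running.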
To translate scraped content from Russian to English with the DeepL API, users need to set the DEEPL_AUTH_KEY environment variable and then pass the --deepl-auth-key command-line argument. Users can also route the utility through a proxy by setting the SPEVKTATOR_PROXY environment variable and passing the --spevktator-proxy command-line argument.
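When Spevktator is launched from a script, one way to supply these variables is to pass a modified copy of the environment to the child process, which keeps the API key out of your shell history. The sketch below is illustrative only: spevktator_env is a hypothetical helper of our own, and it handles just the two environment variables named above. Check the project’s README for exactly how they interact with the command-line arguments and what proxy address format is expected.

```python
import os
from typing import Optional


def spevktator_env(deepl_key: Optional[str] = None,
                   proxy: Optional[str] = None) -> dict[str, str]:
    """Return a copy of os.environ with Spevktator's optional variables set.

    A variable is only added when a value is given; otherwise whatever is
    already set in the parent environment is inherited unchanged.
    """
    env = dict(os.environ)  # copy, so the calling process is not modified
    if deepl_key:
        env["DEEPL_AUTH_KEY"] = deepl_key
    if proxy:
        env["SPEVKTATOR_PROXY"] = proxy
    return env
```

The result can be handed to the standard library, for example subprocess.run(["spevktator", "fetch", "data/vk.db", "vkusnoitochka"], env=spevktator_env(deepl_key=key)), where key is loaded from wherever you store secrets.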
As mentioned earlier, Spevktator also lets users explore and visualise their scraped data locally. To do this, run:
datasette data/
then open the web interface at http://127.0.0.1:8001. Here, users can browse the scraped content in a number of different table views. Scraped data includes post IDs, channel / group names, post date timestamps, original text, translated text, likes, shares, and views.
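Because the data sits in an ordinary SQLite file, the same fields can also be queried outside the web interface. The sketch below summarises engagement per channel. It assumes a hypothetical posts table with channel, views, likes and shares columns mirroring the fields listed above; Spevktator’s actual table and column names may differ, so check them in Datasette first.

```python
import sqlite3

# Hypothetical schema: posts(channel, views, likes, shares, ...).
# Rename tables/columns to match the real Spevktator database.
CHANNEL_SUMMARY_SQL = """
SELECT channel,
       COUNT(*)             AS n_posts,
       SUM(views)           AS total_views,
       ROUND(AVG(likes), 1) AS avg_likes,
       SUM(shares)          AS total_shares
FROM posts
GROUP BY channel
ORDER BY total_views DESC
"""


def channel_summary(conn: sqlite3.Connection) -> list[tuple]:
    """Return one engagement row per channel, most-viewed channel first."""
    return conn.execute(CHANNEL_SUMMARY_SQL).fetchall()
```

Usage would look like channel_summary(sqlite3.connect("data/vk.db")); the SQL string itself can equally be pasted into Datasette’s custom query box.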
Users can also visualise the results of the sentiment analysis in a chart interface and run keyword searches against the collected data. Much more is possible with a little SQL knowledge: the web interface includes its own custom SQL query interface for a more thorough drill-down into the data.
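As an illustration of that kind of drill-down, the sketch below combines a keyword search with the sentiment data, counting matching posts per day and sentiment label. Again, the schema is hypothetical: it assumes a posts table with posted_at stored as ISO-8601 text (e.g. "2022-09-04T10:15:00") and a sentiment column holding labels such as "positive" or "negative". If timestamps are stored as Unix epoch integers instead, date(posted_at, 'unixepoch') would be needed.

```python
import sqlite3

# Hypothetical schema: posts(posted_at TEXT, text TEXT, sentiment TEXT, ...).
DAILY_SENTIMENT_SQL = """
SELECT date(posted_at) AS day,
       sentiment,
       COUNT(*)        AS n_posts
FROM posts
WHERE text LIKE '%' || ? || '%'
GROUP BY day, sentiment
ORDER BY day, sentiment
"""


def daily_sentiment(conn: sqlite3.Connection, keyword: str) -> list[tuple]:
    """Count posts containing `keyword` for each day and sentiment label.

    By default SQLite's LIKE only ignores case for ASCII letters, so a
    Cyrillic keyword such as "мост" will not match "Мост".
    """
    return conn.execute(DAILY_SENTIMENT_SQL, (keyword,)).fetchall()
```

The rows returned map naturally onto a per-day chart of positive versus negative coverage of a given topic.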
Summing it all up
What we really like about this utility is that it conducts passive scraping: no Vkontakte API access or username / password combination is required. This approach does, however, have limitations, which Spevktator’s developers have openly pointed out. For example, the utility cannot scrape private groups. Other limitations to note include:
- The tool doesn’t collect comments and other personal information
- Sentiment prediction is only of moderate quality
- Post metrics are only tracked for a limited period
- Posts longer than 2,500 characters are not translated
- The tool has limited error handling and data loss recovery
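The translation limit is easy to work around once you know which posts it affects. The sketch below lists posts whose text exceeds 2,500 characters, longest first, so they can be translated by other means. As with the earlier examples, the posts table and its id, channel and text columns are hypothetical stand-ins for the real schema, and long_untranslated_posts is our own helper name.

```python
import sqlite3

# Spevktator does not translate posts longer than 2,500 characters.
TRANSLATION_LIMIT = 2500

# Hypothetical schema: posts(id, channel, text, ...); adjust to the real database.
LONG_POSTS_SQL = """
SELECT id, channel, length(text) AS n_chars
FROM posts
WHERE length(text) > ?
ORDER BY n_chars DESC
"""


def long_untranslated_posts(conn: sqlite3.Connection,
                            limit: int = TRANSLATION_LIMIT) -> list[tuple]:
    """List posts whose text is longer than `limit` characters, longest first.

    SQLite's length() counts characters rather than bytes for text values,
    so Cyrillic posts are measured correctly.
    """
    return conn.execute(LONG_POSTS_SQL, (limit,)).fetchall()
```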
So, although the tool has obvious limitations, we like the fact that the developers are forthcoming about them. Credit should also go to the developers for being transparent about planned developments, which include the ability to extract images, videos, and user comments; a more interactive UI in which scraping actions can be configured; user authentication for private groups; and additional installation options.
Overall, our hats go off to the developers of Spevktator for releasing a tool that shows a huge amount of potential and brings many capabilities to OSINT and disinformation analysts. This is especially true given that OSINT tools and scrapers for Vkontakte are few in number compared with those available for platforms such as Instagram, Telegram, and Twitter. We at OS2INT will be keeping a very close eye on future releases of Spevktator!