Instant Data Scraper: Scrape social media data
In this latest OSINT Tool Review, we begin by pointing out that Twitter limits the amount of data that can be scraped through the Twitter API. These limits present significant issues for Digital Investigators and Intelligence Analysts involved in identifying and analysing suspected Russian disinformation actors, especially given the ongoing military situation in Ukraine and heightened geopolitical tensions. In this article, we at OS2INT demonstrate how we overcame these limitations by using Instant Data Scraper to collect data from Twitter profiles suspected of disseminating disinformation.
What is Instant Data Scraper?
This is not the first time we have introduced Instant Data Scraper to our readers. In a previous OSINT Workflow article, we demonstrated how this tool can be used to scrape friends lists from Facebook. To reacquaint our readers with it: Instant Data Scraper is a Google Chrome Extension developed by Web Robots that automates data extraction from any website. It uses AI to predict which data is most relevant on an HTML page and allows that data to be saved to Excel or CSV files (XLS, XLSX, CSV). The tool does not require website-specific scripts. Instead, it applies AI to analyse the HTML structure of a website and detect data for extraction. If the prediction is not satisfactory, the user can customise the selections for greater accuracy. Additionally, the tool includes an ‘infinite scroll’ capability that allows users to acquire HTML data that is auto-loaded when the user reaches the bottom of a web page.
Installation and deployment
As Instant Data Scraper is a Google Chrome Extension, it only needs to be downloaded and installed via the Google Chrome Web Store, and we must stress that the tool is 100% free to use. Deploying the tool is just as easy: the user simply navigates to the page they wish to scrape and starts the tool by clicking the Instant Data Scraper icon in the Chrome browser toolbar.
What social media platforms can it scrape from?
The list of websites and social media platforms that Instant Data Scraper is compatible with is extremely long. We have spent a considerable amount of time using the tool to scrape from:
- Telegram (Web App)
However, we have found that the tool is not well-suited to LinkedIn, most likely due to the platform’s HTML structure and markup. That said, the tool has been developed to give the user as much control as possible over the type of data to be scraped, so it is possible to configure it to detect specific HTML elements on any target page.
Identifying suspected disinformation actors
Identifying suspected disinformation actors was very easy given the current military situation in Ukraine and the well-established use of Twitter by pro-Russian disinformation actors. We understood that the notorious letter ‘Z’ has become synonymous with support for the invasion, so this symbol was the basis for our basic search. Immediately, we found a substantial number of pro-Russian and pro-invasion Twitter accounts. On closer inspection, the majority of these accounts were publishing and circulating disinformation.
Scraping Twitter follower lists
As we pointed out earlier, Twitter has implemented a range of measures to limit the amount of data that can be scraped via its API. These limits affect the amount of media, tweets and follower data that can be extracted. However, because Instant Data Scraper does not rely on the Twitter API to scrape data, we can use it to extract follower lists from the Twitter profiles of a range of suspected disinformation actors. To do this, we simply navigated to the follower list of each target profile and started the scraper by clicking on the Instant Data Scraper icon located in the Chrome extensions toolbar (top right-hand side).
When initiated, the Instant Data Scraper window will open. In this window, we select the ‘Infinite Scroll’ checkbox. We use Infinite Scroll because every follower list on Twitter is built with a feature called ‘lazy load’. This means that Twitter defers the initialisation of an object (such as followers) until the point at which it is needed. With the Infinite Scroll checkbox selected, we can now set our minimum and maximum delay values. These values will not only help us to bypass rate-limiting mechanisms, but will also ensure a more accurate crawl and scrape of the target follower lists. In our case, we set the minimum delay value to ‘3’ and the maximum delay value to ‘20’. At this point, we selected the blue box labelled ‘Start Crawling’. Once the crawl and scrape process had finished, we downloaded the follower list as a CSV. We then repeated the same process across several additional Twitter profiles we suspected of spreading disinformation.
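The interplay between lazy loading and a randomised scroll delay can be sketched in Python. This is only an illustration of the general idea, not Instant Data Scraper's actual implementation: the `crawl` function, the `load_more` callback, the stopping rule (stop once a scroll yields no new rows) and the assumption that the delay values are seconds are all ours.

```python
import random
import time


def crawl(load_more, min_delay=3.0, max_delay=20.0, sleep=time.sleep):
    """Collect lazily loaded rows until a scroll yields nothing new.

    load_more is a hypothetical callback that scrolls once and returns
    the rows that are currently visible (an empty list when none remain).
    """
    seen = []         # rows in the order they were first encountered
    seen_set = set()  # fast duplicate check
    while True:
        added = 0
        for row in load_more():
            if row not in seen_set:
                seen_set.add(row)
                seen.append(row)
                added += 1
        if added == 0:
            break  # nothing new appeared, so assume the list has ended
        # Pause for a random interval between the two delay values
        # (assumed here to be seconds) before scrolling again.
        sleep(random.uniform(min_delay, max_delay))
    return seen


# Usage with made-up data and a no-op sleep so the example runs instantly.
fake_pages = iter([["@a", "@b"], ["@b", "@c"], []])
followers = crawl(lambda: next(fake_pages), sleep=lambda seconds: None)
```

Varying the pause between scrolls, rather than using a fixed interval, gives each newly loaded batch time to render and makes the request pattern less regular.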
Analysing the data
With our scraped lists of Twitter profiles suspected to be involved in spreading pro-Russian disinformation, we chose to process that same data within a link chart. This would allow us to visualise how each of the Twitter profiles is connected and where those connections intersect. To achieve this, we can combine each of our scraped follower lists into a ‘Node’ and ‘Edge’ list, then visualise the scraped data using Gephi. Instructions on how to process scraped data and visualise it using Gephi can be found in our previous OSINT Workflow Article.
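As a rough sketch of the ‘Node’ and ‘Edge’ list step, the Python below merges several follower CSV files into the two spreadsheets Gephi can import (`Id`/`Label` for nodes, `Source`/`Target` for edges). It rests on assumptions that are not taken from the tool itself: each CSV is named after the profile it was scraped from, and the follower handle sits in a column called `handle`. The column names in a real export will differ, so `handle_column` should be changed to match.

```python
import csv
from pathlib import Path


def build_gephi_lists(csv_paths, handle_column="handle"):
    """Return (nodes, edges) built from per-profile follower CSV files.

    Assumes each file is named after the scraped profile, e.g.
    "some_profile.csv", and that handle_column (a hypothetical name)
    holds the follower's handle.
    """
    nodes = set()
    edges = set()
    for path in map(Path, csv_paths):
        target = path.stem  # the profile whose followers were scraped
        nodes.add(target)
        # utf-8-sig tolerates a byte-order mark at the start of the file
        with path.open(newline="", encoding="utf-8-sig") as fh:
            for row in csv.DictReader(fh):
                follower = (row.get(handle_column) or "").strip()
                if not follower:
                    continue  # skip rows without a handle
                nodes.add(follower)
                edges.add((follower, target))  # follower -> followed profile
    return sorted(nodes), sorted(edges)


def write_gephi_csvs(nodes, edges, out_dir):
    """Write nodes.csv and edges.csv using Gephi's column headers."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with (out / "nodes.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Id", "Label"])
        writer.writerows((node, node) for node in nodes)
    with (out / "edges.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Source", "Target"])
        writer.writerows(edges)
```

Using sets means that a follower who appears in several scraped lists becomes a single node with several edges, which is what makes the intersections visible in the graph.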
However, in our case, we chose to upload our data into Paliscope YOSE by simply dragging and dropping our CSV files containing the scraped data into the YOSE database. The result – as you can see from the image above – shows that we produced an extensive link graph showing how each of the suspected disinformation actors was connected and where those connections intersect. The visual intelligence produced by YOSE now enables us to identify additional disinformation actors who may be of intelligence interest.
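Independently of any particular visualisation tool, the idea of finding where connections intersect can be illustrated with a few lines of Python. This is a sketch under our own assumptions, not a description of how YOSE works: the `shared_followers` function and the handles in the example are hypothetical, and the input is assumed to be a list of (follower, followed profile) pairs.

```python
from collections import defaultdict


def shared_followers(edges, min_targets=2):
    """Map each follower to the profiles it follows, keeping only
    followers linked to at least min_targets distinct profiles."""
    targets_by_follower = defaultdict(set)
    for follower, target in edges:
        targets_by_follower[follower].add(target)
    return {
        follower: sorted(targets)
        for follower, targets in targets_by_follower.items()
        if len(targets) >= min_targets
    }


# Made-up example: "@y" follows two of the scraped profiles.
example_edges = [
    ("@x", "profile_a"),
    ("@y", "profile_a"),
    ("@y", "profile_b"),
]
overlap = shared_followers(example_edges)
```

Accounts that follow several suspected disinformation profiles are only candidates for closer inspection, and each one still needs to be reviewed manually.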
To conclude, Instant Data Scraper is a highly effective tool that delivers outstanding results. Whilst it does not have the automation capabilities normally found in Python-based scripts, it gives any user a simple and effective way to extract data from a variety of web pages and social media platforms. In our case, we used Instant Data Scraper to bypass limitations associated with the Twitter API and scrape data from suspected pro-Russian and pro-invasion disinformation actors. This article has also given us the basis for a wider workflow focused on disinformation actors operating on Twitter… watch this space!