Chan Scraper: Extracting media from 2ch.hk and 4Chan imageboards
Link to tool: https://github.com/m3tro1d/chan-scraper
In one of our previous OSINT Tool Review articles, we introduced a scraper capable of extracting user posts and media from the 4Chan imageboard. In the same article, we discussed several reasons why 4Chan remains a hub for political extremism, disinformation, and CSAM criminality. However, 4Chan is certainly not unique in this regard. As one of our esteemed readers, a renowned independent CSAM researcher, put it: “CSAM criminals count on the ‘hard-to-surf-Chans’ structure to protect what is often meaningful intelligence”. Indeed, her comment rightly points out that investigations on Chan imageboards should not be overlooked.
Our previous article also pointed out that OSINT tools for Chan imageboards are few in number, especially for the lesser-known Chans, which often host a high volume of criminal users. The 2ch.hk imageboard, also referred to as ‘Dvach’, is Russia’s largest anonymous imageboard and is well known for cyberbullying, misogyny, and toxic trolling. During our research, we also discovered CSAM being distributed across several of its threads.
So, if we (as Digital Investigators) want to scrape media content from 2ch.hk and 4Chan and use it to build an intelligence picture, the ideal tool is ‘Chan Scraper’, a lightweight Python script capable of downloading attachments (images, videos, or both) from individual threads. Downloading the tool is incredibly easy, as is deploying it: the user invokes python chan-scraper.py and, in the same command, specifies the output directory, the type of content to download, and the target imageboard thread. Additionally, the tool can scrape from more than one thread at the same time and can be adapted for other imageboards with some tweaks to the script itself.
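To illustrate the general idea of thread-level media extraction, here is a minimal Python sketch. It is not Chan Scraper's own code. It assumes 4Chan's public read-only JSON API, in which a thread is served from a.4cdn.org, each post carrying an attachment has a `tim` and an `ext` field, and the file itself lives on i.4cdn.org. The function names, the `mode` parameter, and the extension groupings are hypothetical and chosen only for this example.

```python
import re

# Extension groupings are an assumption made for this sketch.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif"}
VIDEO_EXTS = {".webm", ".mp4"}

# Matches thread links such as https://boards.4chan.org/<board>/thread/<id>
THREAD_URL_RE = re.compile(r"boards\.4chan(?:nel)?\.org/(\w+)/thread/(\d+)")


def parse_thread_url(url):
    """Return (board, thread_id) extracted from a 4Chan thread URL."""
    match = THREAD_URL_RE.search(url)
    if match is None:
        raise ValueError(f"not a recognised 4Chan thread URL: {url}")
    return match.group(1), int(match.group(2))


def api_url(board, thread_id):
    """Build the JSON endpoint for a thread (4Chan read-only API)."""
    return f"https://a.4cdn.org/{board}/thread/{thread_id}.json"


def media_urls(board, thread_json, mode="all"):
    """List attachment URLs found in an already-parsed thread JSON dict.

    mode is hypothetical: "images", "videos", or "all".
    """
    if mode == "images":
        wanted = IMAGE_EXTS
    elif mode == "videos":
        wanted = VIDEO_EXTS
    else:
        wanted = IMAGE_EXTS | VIDEO_EXTS

    urls = []
    for post in thread_json.get("posts", []):
        if "tim" not in post:
            continue  # post has no attachment
        ext = post.get("ext", "")
        if ext.lower() in wanted:
            urls.append(f"https://i.4cdn.org/{board}/{post['tim']}{ext}")
    return urls
```

In practice, the thread JSON would be fetched from the address returned by `api_url` and each resulting URL saved to the chosen output directory, for example with the standard library's `urllib.request`. A 2ch.hk variant would need its own URL pattern and field names, which this sketch does not cover.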
The overall performance and output of this tool are impressive. During our tests, we sought to extract a range of media content linked to Russian right-wing extremists associated with the now-disbanded National Socialist Society (NSO). The content extracted included several high-quality images of armed individuals, in addition to other graphic content depicting extreme politically motivated violence. The tool does have limitations, however: without some corrections to the script, it is unable to scrape from archived threads, and it cannot scrape entire imageboard catalogs.
As in our previous article, we sought to see what intelligence we could extract from the images that we scraped. Again, we turned to Paliscope YOSE and its AI capabilities to index and analyse all of the images, and the results were once again fantastic. Within a few mouse clicks, we had categorised our images according to their content; the main categories discovered included weapons, violence, pornography and CSAM. We went further and used YOSE’s imagery analysis capabilities to extract several weapon serial numbers and Russian ID cards. Lastly, we mapped the links between many Russian and European right-wing extremists.
To conclude, Chan Scraper is a very useful tool for Digital Investigators who need to scrape media content from 2ch.hk and 4Chan. However, as great as it is for collecting data, Digital Investigators should combine this script with an analysis tool that enables them to categorise collected media and connect the dots between individuals of interest. All in all, Chan Scraper and Paliscope YOSE make an excellent combination!