
OSINT Tool Review

Scraping targeted data from Pastebin with 'Pasta'


https://github.com/Kr0ff/Pasta

Pastebin is an essential source of information for Digital Investigators, as it is often used to ‘dump’ leaked information ranging from details of alleged CSAM criminals to breached user credentials. Additionally, in an interesting research paper, Adrian Riesco, Eduardo Fidalgo, Mhd Wesam Al-Nabki, Francisco Janez-Martino, and Enrique Alegre rightly pointed out that Pastebin is used by CSAM criminals to distribute links to CSAM content. Until April 2020, Digital Investigators could use Pastebin’s native search capability to conduct their own targeted searches for illegal content. However, as a result of alleged ‘abuse’ by Pastebin users, this search capability was disabled. Thereafter, Digital Investigators had no choice but to search Google for pastes using the ‘site:’ operator, a Google Dork technique. However, this method is highly unreliable, as it depends on whether Google has indexed the pastes of investigative interest; even when it has, the indexing of Pastebin content could, in theory, take up to several weeks.
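As a rough illustration of the ‘site:’ search technique described above, the following Python sketch builds a query restricted to pastebin.com and turns it into a Google search URL. The function names and the example keyword are our own illustrative assumptions, not part of any tool discussed in this review.

```python
from urllib.parse import urlencode


def pastebin_dork(keyword: str) -> str:
    """Build a Google query limited to pastebin.com (hypothetical helper)."""
    # The quotes ask Google for an exact-phrase match on the keyword.
    return f'site:pastebin.com "{keyword}"'


def google_search_url(query: str) -> str:
    """Encode the query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = pastebin_dork("leaked credentials")  # example keyword only
    print(query)
    print(google_search_url(query))
```

Such a query only returns pastes that Google has already indexed, which is exactly the limitation noted above.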

One way for Digital Investigators to overcome these challenges is to scrape content directly from Pastebin and then analyse the extracted content. Scraping is more effective through the Pastebin API, though it should be noted that access to this API comes at a cost. However, there is a highly effective tool that can scrape content from Pastebin without the need for an API: ‘Pasta’, the subject of this latest OSINT Tool Review from the OS2INT team.

So, what is ‘Pasta’? It is another ultra-lightweight Python utility, and its dependencies can be installed simply by running pip install -r requirements.txt from the command-line interface. Thereafter, Pasta can be used immediately by selecting the relevant options on the command line. The utility is highly flexible, offering the following core capabilities:

  • Generate random eight-character strings similar to those identifying real Pastebin entries
  • Use randomly generated strings to brute-force Pastebin for possible hidden valid pastes
  • View the contents of Pastebin entries
  • Scrape the most recent archive of Pastebin
  • Search for sensitive information in downloaded pastes, including emails, usernames, and IP addresses
  • Scrape all pastes from a user account
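To make the first capability more concrete, here is a minimal Python sketch of how random eight-character identifiers could be generated and turned into candidate URLs. This is not Pasta's actual code: the alphabet (ASCII letters and digits), the base URL, and the function names are our own assumptions for illustration.

```python
import secrets
import string

# Assumption: paste identifiers are eight characters drawn from ASCII
# letters and digits. The real character set may differ.
ALPHABET = string.ascii_letters + string.digits


def random_paste_id(length: int = 8) -> str:
    """Return one random identifier of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def candidate_urls(count: int, base: str = "https://pastebin.com/") -> list[str]:
    """Return `count` candidate paste URLs built from random identifiers."""
    return [base + random_paste_id() for _ in range(count)]


if __name__ == "__main__":
    for url in candidate_urls(3):
        print(url)
```

With 62 possible characters in each of eight positions, the identifier space under this assumption is very large, so the vast majority of random guesses would not correspond to a real paste.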

Overall, Pasta is a very flexible and fast scraper that produces excellent results in the form of extracted pastes, which are saved locally as .txt files. This tool can certainly be used by Digital Investigators to continuously monitor Pastebin in order to identify potential criminal activity, including the dumping of leaked user credentials, CSAM link sharing, and the posting of information that exposes alleged CSAM criminals. However, we should stress that the latter form of information is attributed to vigilantism and should not be considered as evidence in any way, shape, or form.
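For readers who want to triage a folder of saved .txt pastes themselves, the following Python sketch shows one way to pull out email addresses and IPv4 addresses using regular expressions. It is an independent illustration rather than Pasta's own search routine; the patterns are deliberately simple, and the function names and folder layout are assumptions.

```python
import ipaddress
import re
from pathlib import Path

# Deliberately simple patterns: they favour recall over strict validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_indicators(text: str) -> dict[str, set[str]]:
    """Return the unique emails and valid IPv4 addresses found in text."""
    emails = set(EMAIL_RE.findall(text))
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects e.g. 999.1.1.1
        except ValueError:
            continue
        ips.add(candidate)
    return {"emails": emails, "ips": ips}


def scan_directory(folder: str) -> dict[str, dict[str, set[str]]]:
    """Scan every .txt file in folder; keep only files with at least one hit."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        found = extract_indicators(text)
        if found["emails"] or found["ips"]:
            results[path.name] = found
    return results
```

Calling scan_directory on the folder holding the saved pastes returns a mapping from file name to the indicators found in that file, which can then be reviewed or passed to an analysis tool.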

Pasta is undoubtedly a fantastic resource for Digital Investigators to scrape data from Pastebin. During our test of the tool, we were able to extract over 2000 individual pastes, the contents of which varied considerably. Analysing this data manually would take an extraordinary amount of time and resources, to the extent that the question would inevitably be asked whether scraping Pastebin data is worth the time and effort. However, we overcame this issue by processing the data through YOSE [Your Own Search Engine] by Paliscope. Using this tool, we were able to visualise all of the collected data and how they were connected. Even better, we used YOSE’s super-charged AI to visualise instances where references to CSAM were contained within the data and to connect that same data to individual usernames.

All in all, Pasta is a must-have tool for Digital Investigators who require the capability to scrape content from Pastebin, and it most certainly comes with our highest recommendation. However, it must be remembered that scraped data alone does not hold any intelligence value unless it is processed and analysed accordingly. To achieve this, we highly recommend the use of an AI-driven analysis solution such as Paliscope YOSE.

