YouTube-based OSINT, what is possible?
Despite the prominence of content-sharing social media platforms including TikTok, YouTube very much remains an important source of data that can be used for OSINT purposes. Use cases where YouTube data can be leveraged are considerably broad. For example, YouTube remains a space where criminals target underage children for grooming and sexual exploitation.
At the same time, video content relating to the awful events taking place in Ukraine is posted daily. Such content is valuable for military intelligence operators looking to maintain situational awareness of events on the ground, especially in illegally occupied areas of the country.
Scratching the surface of YouTube
On the face of it, exploitable data on YouTube appears somewhat limited. For example, when we study the YouTube OSINT Attack Surface Map by OSINT Dojo (a great resource we should add!), we can identify the categories and classifications of data that is present on the surface of YouTube. Visible content includes the media content itself, users / channels associated, video captions and comments (to name but a few).
But when we scratch the surface, we often find a substantial amount of highly valuable metadata, which can include a broad range of video / channel metrics, geolocation data, and time / date stamps. All of this data can be leveraged for OSINT purposes; the question is how such data can be collected effectively.
YARK – YouTube archiving made simple
We have been closely following YARK on GitHub for some time and decided that now was a great moment to test out this tool. YARK was created and developed by Liverpool-based developer Owen Griffiths, and from the very start we were very impressed with what he has created and how it can be used.
This utility has been developed in Python and enables users to continuously archive all videos and associated metadata from a YouTube channel, and then view their archives in a local web-app interface. To batch download video content from YouTube channels, YARK uses the YoutubeDL library, which is welcome considering that YoutubeDL is a great downloading utility in its own right!
Installation and deployment
The installation process of YARK is surprisingly straightforward thanks to the instructions provided by Owen on the GitHub repo page. Quite simply, it was a case of creating a folder for YARK within our toolbox and installing the tool in our virtual environment by invoking pip3 install yark. To deploy the utility, you should create an archive for the target YouTube channel by invoking yark new [archive name] [YouTube Channel ID].
For example, if you invoke yark new vice VICENews, you will create an archive named ‘vice’ and associate that archive with the Vice News YouTube channel. To begin the download process, users can then invoke yark refresh [archive name]. This process will store downloaded video content within the YARK root folder.
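The deployment steps above can be sketched as a small Python wrapper. This is only an illustration, assuming yark is already installed and on the PATH; the function names yark_commands and run_all are our own and are not part of YARK. It simply builds the two invocations described above and runs them in order.

```python
import shlex
import subprocess
import sys
from typing import List, Optional


def yark_commands(archive_name: str, channel_id: str) -> List[List[str]]:
    """Return the two yark invocations described in the text, in order."""
    return [
        ["yark", "new", archive_name, channel_id],  # create the archive
        ["yark", "refresh", archive_name],          # download videos and metadata
    ]


def run_all(commands: List[List[str]], cwd: Optional[str] = None) -> None:
    """Run each command in turn, stopping at the first failure."""
    for argv in commands:
        print("$", shlex.join(argv))
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python make_archive.py vice VICENews
    run_all(yark_commands(sys.argv[1], sys.argv[2]))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if an archive name contains spaces.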
Visualising your YouTube archives
Unlike the standard YoutubeDL library, YARK enables users to visualise archives of YouTube channels through a web-app interface that runs on localhost port 7667. To run and view the web app, invoke yark view in the command-line interface and open localhost:7667 in your browser of choice. Once opened, users can search, open and view archived YouTube channels.
For each video collected by YARK, users can use the web app to visualise a range of information including:
- History: Shows whether the video title or description has been changed since the video was first archived, when each change was made, and whether any additional changes have been applied.
- Views over time: YARK provides a graphical view to show the number of video views from the point that the archive containing the video was created.
As an added feature, YARK also has the capability to allow users to attach notes to collected videos.
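To make the history and views-over-time ideas above concrete, here is a minimal sketch of the underlying logic. It does not read YARK's archive files: the snapshots list is hypothetical sample data in a format we made up for illustration, with one (date, title, views) record per refresh.

```python
from datetime import date

# Hypothetical per-refresh snapshots of one video: (date, title, views).
# This is NOT YARK's on-disk format, only illustrative sample data.
snapshots = [
    (date(2023, 1, 1), "Original title", 1000),
    (date(2023, 1, 8), "Original title", 1800),
    (date(2023, 1, 15), "Edited title", 2100),
]


def view_growth(snaps):
    """Views gained between each pair of consecutive snapshots."""
    return [
        (later[0], later[2] - earlier[2])
        for earlier, later in zip(snaps, snaps[1:])
    ]


def title_changes(snaps):
    """Snapshots whose title differs from the previous snapshot."""
    return [
        (later[0], earlier[1], later[1])
        for earlier, later in zip(snaps, snaps[1:])
        if earlier[1] != later[1]
    ]


print(view_growth(snapshots))
print(title_changes(snapshots))
```

The point of the sketch is that both views depend on repeated refreshes: with only one snapshot there is nothing to compare, so neither a view trend nor a title change can be detected.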
Refreshing YouTube archives
One reason why we appreciate the work that Owen has put into developing YARK is that he has been upfront that automatic archive refreshing is not yet a feature. That said, he has rightly pointed out that users can implement a cron job to automate the process of updating their YouTube archives. For now, at least, users can manually refresh their YouTube archives by invoking yark refresh [archive name]. Doing so will update the video metrics and the chart showing video views over time.
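As a rough sketch of the cron approach mentioned above, the helper below builds a crontab entry that runs yark refresh on a schedule. The directory /opt/yark, the 03:00 daily schedule and the log file name are assumptions chosen for illustration, and crontab_line is our own helper rather than anything shipped with YARK.

```python
import shlex


def crontab_line(archive_dir: str, archive_name: str,
                 schedule: str = "0 3 * * *") -> str:
    """Build a crontab entry that refreshes one archive on a schedule.

    Assumes the archive lives inside archive_dir and that the yark
    executable is on the PATH that cron uses (with a virtual environment
    you may need the full path to yark instead).
    """
    return (
        f"{schedule} cd {shlex.quote(archive_dir)} && "
        f"yark refresh {shlex.quote(archive_name)} >> refresh.log 2>&1"
    )


# Daily at 03:00; paste the printed line into `crontab -e`.
print(crontab_line("/opt/yark", "vice"))
```

Redirecting output to a log file is a design choice worth keeping: cron runs unattended, so without a log a failed refresh would go unnoticed until the view graphs stop updating.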
On a final note!
Owen has done a great job with YARK. We spent a considerable amount of time testing out YARK on several channels associated with video content concerning events taking place in occupied eastern Ukraine. This enabled us to batch-scrape a considerably large volume of video content showing military activity taking place on the ground. Additionally, we could see a broad range of useful information concerning video views and changes to video titles. This information was also quite useful when analysing videos containing pro-invasion and pro-Russian disinformation.
So, for OSINT’ers looking for a very elegant YouTube channel scraping and visualisation utility, YARK is most certainly a tool that everyone should consider. What we also like is Owen’s approach to the development of this tool and that he welcomes any feedback and feature suggestions. Naturally, many OSINT’ers will have their own specific features that they would like to see implemented in YARK. We believe that YARK could also be combined with other YouTube-based OSINT tools to provide users with the capability to scrape and export user comments in CSV, in addition to geolocation data. However, such features would very likely require YouTube Data API access. Nevertheless, even in its current form, YARK is a really elegant and effective tool for OSINT’ers looking to maintain an archive of YouTube content relating to their subject matter areas. So, what more can we say other than “great work Owen!”