Twayback: Discover and extract deleted Tweets and Twitter user activity
Link to tool: https://github.com/Mennaruuk/twayback
The capability to recover deleted Tweets and Twitter user data is a highly sought-after requirement for digital investigators, investigative journalists, and the like. Given that Twitter has become the number one platform political figures use to communicate key messages, political journalists, analysts, and commentators undoubtedly need Twitter-focused OSINT tools that can identify inconsistencies in those messages. Additionally, in several well-reported instances worldwide, journalists have uncovered historical Tweets in which politicians engaged in anti-Semitic, anti-Islamic, or homophobic speech; a simple Google search will reveal such cases. For law enforcement digital investigators, historical Tweets and Twitter user activity have, on many occasions, been used as evidence in criminal cases, especially those involving hate speech, online harassment, and sexual abuse.
To that end, we present ‘Twayback’, a lightweight Python utility that can identify and extract archived deleted Tweets and Twitter user activity. What makes this utility even better is that it does not require the Twitter API; instead, it queries the Wayback Machine to identify archived deleted Tweets, then saves them as HTML files. The key capabilities of this tool include:
- Download some or all of a Twitter user’s archived deleted Tweets (of course, this is dependent on whether the Wayback Machine has archived those Tweets!)
- Extract deleted Tweets, retweets, and replies to an HTML file
- Apply a custom time range so that digital investigators can narrow their search to Tweets archived between two dates
- Differentiate between accounts that are active, suspended, or deleted
- Indicate whether a target Twitter user’s Tweets have been excluded from the Wayback Machine.
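The lookup that Twayback automates can be approximated with the Wayback Machine's public CDX server API. The sketch below is not Twayback's own code: it assumes the CDX parameters `url`, `matchType`, `output`, `fl`, `from` and `to`, and a JSON response whose first row holds the field names. It lists archived `twitter.com/<user>/status/` captures, keeps one snapshot URL per Tweet ID, and flags as "deleted candidates" any ID missing from a set of currently live IDs. The function names are hypothetical, and the live-ID set is a hypothetical input you would have to obtain separately.

```python
import json
import re
from urllib.parse import urlencode

# Public Wayback Machine CDX server endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Extracts the numeric Tweet ID from an archived status URL.
STATUS_RE = re.compile(r"/status(?:es)?/(\d+)")


def build_cdx_query(username, from_date=None, to_date=None):
    """Build a CDX query listing archived status URLs for one user.

    Only the twitter.com host is matched here (an assumption; captures on
    other hosts such as mobile.twitter.com are not included).
    """
    params = {
        "url": f"twitter.com/{username}/status/",
        "matchType": "prefix",
        "output": "json",
        "fl": "timestamp,original,statuscode",
    }
    # 'from'/'to' filter on the capture timestamp, not on when the Tweet was posted.
    if from_date:
        params["from"] = from_date
    if to_date:
        params["to"] = to_date
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(cdx_json):
    """Map each Tweet ID to a snapshot URL, using only HTTP 200 captures."""
    rows = json.loads(cdx_json)
    if not rows:
        return {}
    header, *records = rows  # first row holds the field names
    col = {name: i for i, name in enumerate(header)}
    snapshots = {}
    for rec in records:
        if rec[col["statuscode"]] != "200":
            continue
        original = rec[col["original"]]
        match = STATUS_RE.search(original)
        if not match:
            continue
        # Keep the first capture returned for each Tweet ID.
        snapshots.setdefault(
            match.group(1),
            f"https://web.archive.org/web/{rec[col['timestamp']]}/{original}",
        )
    return snapshots


def deleted_candidates(snapshots, live_ids):
    """Archived Tweets whose IDs are absent from a (hypothetical) set of live IDs."""
    return {tid: url for tid, url in snapshots.items() if tid not in live_ids}
```

In practice you would fetch the URL returned by `build_cdx_query` (for example with `urllib.request.urlopen`) and pass the response body to `parse_cdx_rows`. Deciding which IDs are still live is the hard part, and this sketch deliberately leaves it out.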
Installing, configuring, and deploying the tool is very straightforward. For Windows users, the developers of Twayback provide an executable file that launches the utility from the command line. Alternatively, users can clone the repository directly from GitHub and install the tool's dependencies by running pip install -r requirements.txt from the command line. And that is it! With the utility installed, users can run it by invoking twayback.py -u USERNAME (e.g. os2int). To apply a custom time range, append -from YYMMDD -to YYMMDD to the command.
The utility saves archived deleted Tweets to a dedicated folder within its root directory, with each Tweet stored in its own subfolder. Depending on your system, the extracted Tweets may be saved without a file extension (displayed as a generic 'File' type). This can be quite frustrating, because the user then has to add a .html extension to every extracted Tweet manually. That said, for what the utility is designed to do, it is simply fantastic.
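Renaming the extracted files by hand quickly becomes tedious, so a small script can do it instead. The following is a minimal sketch, assuming that `root` points at the folder Twayback created (the path is a placeholder) and that every extension-less file inside it is an extracted Tweet. `add_html_extension` is a hypothetical helper, not part of Twayback.

```python
from pathlib import Path


def add_html_extension(root):
    """Give every extension-less file under `root` a .html suffix.

    Returns the new paths. Files that already have a suffix, or whose
    .html counterpart already exists, are left untouched.
    """
    renamed = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix == "":
            target = path.with_suffix(".html")
            if not target.exists():
                path.rename(target)
                renamed.append(target)
    return renamed
```

Because `rglob` walks subfolders, this works with the one-folder-per-Tweet layout. Point it only at Twayback's output folder, since it will rename any extension-less file it finds there.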
To put the utility to the test, we ran it against the Twitter accounts of UK Prime Minister Boris Johnson and Labour Party leader Sir Keir Starmer (to highlight our political neutrality, of course!). The results revealed a wide range of deleted Tweets from each of these individuals, spanning several years; for a political journalist, this is very likely to be pure gold dust! It goes without saying that viewing the archived deleted Tweets of these two individuals manually would involve a great deal of time spent searching through the Wayback Machine and identifying which Tweets have been archived. Twayback does all of the hard work for us by automating the entire process. However, as the developer behind the tool has rightly pointed out, there are some considerations that every user should be aware of:
- The quality of the extracted Tweets can vary drastically depending on how the Wayback Machine has archived them. Of course, this isn’t ideal, though it is most certainly through no fault of the developer.
- Again, depending on how the Wayback Machine has archived the Tweets, you may or may not be able to extract embedded images. Videos cannot be extracted at all.
- If a Twitter account is suspended or deleted, this can affect the number of Tweets that can be extracted.
- The custom date range does not reflect when the Tweets were posted, but rather when they were archived. For example, a Tweet posted in 2020 may only have been archived today, in which case it would fall outside a 2020 date range.
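Because the date range applies to capture time, it can help to recover the posting time separately. Since late 2010, Tweet IDs have been Snowflake IDs whose upper bits encode milliseconds since Twitter's epoch (1288834974657 ms). Wayback capture timestamps use the 14-digit `YYYYMMDDhhmmss` UTC form. The sketch below compares the two. It is a supplementary technique, not something Twayback is described as doing, and the function names are hypothetical.

```python
from datetime import datetime, timezone

# Twitter's Snowflake epoch in Unix milliseconds (4 November 2010, UTC).
TWITTER_EPOCH_MS = 1288834974657


def tweet_created_at(tweet_id):
    """Decode the creation time from a Snowflake Tweet ID.

    Only meaningful for IDs issued after Snowflake was introduced in late 2010;
    older, sequential IDs do not encode a timestamp.
    """
    ms = (int(tweet_id) >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def capture_time(cdx_timestamp):
    """Parse a 14-digit Wayback capture timestamp (YYYYMMDDhhmmss, UTC)."""
    return datetime.strptime(cdx_timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def archive_lag_days(tweet_id, cdx_timestamp):
    """Whole days between when a Tweet was posted and when it was archived."""
    return (capture_time(cdx_timestamp) - tweet_created_at(tweet_id)).days
```

A large lag means the archive date says little about when the Tweet was written. Filtering on `tweet_created_at` rather than on the capture date gives a range based on posting time.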
Overall, we at OS2INT really love this tool for its simplicity of installation and deployment. We also believe it can save investigators and journalists an incredible amount of time and streamline their workflows quite effectively. On that note, this tool comes highly recommended!