PREPARING OUR TOOLS
For the activity we are about to demonstrate, you will need the following tools:
- A Telegram account with API access, preferably registered with a burner telephone number. You will also need to be a member of your target groups.
- Telegram Scraper, a lightweight Python-based tool that is used to scrape user data from groups and create CSV lists of users within specific groups: https://github.com/TechRahul20/TelegramScraper
- Microsoft Excel, to be used to create ‘Node’ and ‘Edge’ lists
- Gephi, an open-source data visualisation tool that will enable us to visualise user group lists
PREPARING TELEGRAM
The first thing you need to do is obtain API access for your Telegram account. Log in to https://my.telegram.org/apps, register an app, and take note of your API ID and API Hash. You will need to enter both values into Telegram Scraper shortly.
PREPARING TELEGRAM SCRAPER
Installing and deploying Telegram Scraper is straightforward. Once you have downloaded the script from GitHub and stored it in your chosen directory, use your command-line interface to install the prerequisite packages by running pip install pandas and then pip install telethon.
Next, set up Telegram Scraper by running python setup.py. This command walks you through the Telegram API login and prepares the toolkit with your details.
SCRAPING TELEGRAM GROUPS
Telegram Scraper should launch immediately once setup has completed. By now, you should be a member of your target groups; otherwise, you will be unable to scrape the required data. If Telegram Scraper has not launched by this point, you can start it by running python launcher.py. Telegram Scraper provides three capabilities:
- Scrape group members: scrapes all members of a Telegram group you belong to and exports them as a CSV containing the username (when available), user ID, name, group name and group ID. The file is named after the group.
- Scrape forwards from chats you are in: scrapes all forwards from a chat you are following and saves the from, from ID, to and to ID fields to forwards_data.csv. It can then scrape forwards from all the discovered channels to build a larger network map. This second stage takes a long time to run, but is worthwhile for a broader analysis.
- Scrape forwards from a channel: scrapes all forwards from any channel you specify. It can then scrape forwards from all the discovered channels to build a larger network map. This second stage also takes a long time to run.
For our purposes, we will select the first option, ‘Scrape group members’. Telegram Scraper will then list the groups you are a member of; select the group you want to scrape by typing its corresponding number and pressing the ‘Enter’ key. To scrape multiple groups, repeat this process for each one.
We chose to scrape several UK-based hard-right groups, simply because there were so many of them and their membership lists were very large.
Once the process has finished, Telegram Scraper will produce a CSV file containing the members of each group. The CSV files are stored in the tool’s directory; in our case, this was d:/Scripts/Python/TelegramScanner.
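Before moving to Excel, you may want to sanity-check the scraped files. The following is a minimal Python sketch (standard library only) that reads every CSV in a directory and counts the membership rows per group. It assumes each file has a header row and is UTF-8 encoded; the header name ‘group’ follows the column names used in this walkthrough but is an assumption, so check it against your own files. The function names are hypothetical helpers, not part of Telegram Scraper.

```python
import csv
from collections import Counter
from pathlib import Path


def read_member_rows(lines):
    """Parse one scraped CSV (any iterable of text lines) into row dicts.

    Values stay as strings, so long user IDs are never turned into numbers.
    """
    return list(csv.DictReader(lines))


def load_member_csvs(directory):
    """Read every *.csv in `directory` (assumed UTF-8, with a header row)."""
    rows = []
    for path in sorted(Path(directory).glob("*.csv")):
        # utf-8-sig also copes with a byte-order mark at the start of the file
        with open(path, newline="", encoding="utf-8-sig") as f:
            rows.extend(read_member_rows(f))
    return rows


def members_per_group(rows, group_col="group"):
    """Count membership rows per group; 'group' is an assumed header name."""
    return Counter(row[group_col] for row in rows)
```

For example, members_per_group(load_member_csvs("d:/Scripts/Python/TelegramScanner")).most_common() would list the groups from largest to smallest. Keeping every value as a string also sidesteps the ID formatting problem that the Excel steps below have to work around.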
CREATING THE NODE LIST
What we now want to do is create the ‘Node List’ – a ‘Node’ being an individual group member that has been scraped.
- Open Microsoft Excel and open a blank worksheet.
- Navigate to the ‘Data’ tab and click ‘From Text/CSV’, located on the left-hand side of the ribbon.
- In the Import Data dialog box, navigate to where the user lists have been saved, select one, then click ‘Open’.
- A new window will open in which you can specify the File Origin of the CSV. In our case, select ‘65001: Unicode (UTF-8)’ from the drop-down list, then click ‘Transform Data’.
- The Power Query Editor window will now open, allowing you to view the CSV file and check for obvious errors.
- If there are none, select ‘Close & Load’ in the top left-hand corner. The CSV will now be loaded into Microsoft Excel.
- In Microsoft Excel, insert three new columns before Column A and name them ‘ID’, ‘Label’ and ‘Role’.
- In the ‘ID’ column, paste the data from Column E (‘user id’). You may notice that the pasted values appear altered; if so, select the ‘ID’ column, right-click and select ‘Format Cells’. In the ‘Format Cells’ dialog, choose the ‘Number’ category and set the number of decimal places to ‘0’.
- In the ‘Label’ column, paste the data from Column F (‘name’).
- In the ‘Role’ column, enter the value ‘Member’ in each row.
Using the same process, import each of the other group member lists you have collected into its own blank worksheet. There is no need to add the extra columns to these worksheets: instead, copy their rows and paste them beneath the existing rows of the Node List, lining them up with the original CSV columns, then fill in the ‘ID’, ‘Label’ and ‘Role’ columns for the new rows as described above.
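The column mapping above can also be expressed in code. This is a hedged sketch, assuming the membership rows are dictionaries keyed by the CSV headers (‘user id’ and ‘name’ are assumed header names); build_node_rows is a hypothetical helper. It copies ‘user id’ into ‘ID’, ‘name’ into ‘Label’ and sets ‘Role’ to ‘Member’, and it rejects IDs that are not plain whole numbers, which is the kind of damage the ‘Format Cells’ step guards against.

```python
def build_node_rows(member_rows, id_col="user id", name_col="name"):
    """Map scraped membership rows to Gephi node rows (ID, Label, Role).

    Duplicates are kept here; they are removed when the list is finalised.
    The header names 'user id' and 'name' are assumptions.
    """
    nodes = []
    for row in member_rows:
        user_id = row[id_col].strip()
        # Telegram user IDs are whole numbers; anything else (e.g. "1.23E+09"
        # or "1234.0") suggests the ID was mangled by a spreadsheet.
        if not user_id.isdigit():
            raise ValueError(f"unexpected user id: {user_id!r}")
        nodes.append({"ID": user_id, "Label": row[name_col], "Role": "Member"})
    return nodes
```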
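We will now set the Node List aside and create the Edge List. Keep the Node List open, as we will return to it shortly. The Edge List must be built before duplicates are removed from the Node List, because a member of several groups appears once per group, and each of those rows becomes an edge.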
CREATING THE EDGE LIST
We will now create the ‘Edge’ list – this list defines the links between the group members and the groups.
- Open a blank Microsoft Excel worksheet alongside the Node List that we have created.
- Name Column A ‘Source’, Column B ‘Target’, Column C ‘Type’ and Column D ‘Weight’.
- Return to the Node List, copy the data in Column G (‘group’) and paste it into Column A (‘Source’) of the Edge List.
- Return to the Node List, copy the data in Column A (‘ID’) and paste it into Column B (‘Target’) of the Edge List.
- In Column C (‘Type’) of the Edge List, give every cell the value ‘Undirected’.
- In Column D (‘Weight’) of the Edge List, give every cell the value ‘1’.
- Export the Edge List as a CSV by selecting ‘File’, then ‘Export’, then ‘Change File Type’, then ‘CSV (Comma delimited)’, then ‘Save As’. In the ‘Save As’ dialog window, save the Edge List to your desired location.
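The Edge List steps can be sketched the same way. Assuming membership rows keyed by the CSV headers (‘group’ and ‘user id’ are assumed header names), each row becomes one undirected edge of weight 1 between the group and the member. build_edge_rows and write_edge_csv are hypothetical helpers, a minimal sketch rather than part of any tool mentioned here.

```python
import csv


def build_edge_rows(member_rows, id_col="user id", group_col="group"):
    """One undirected edge per membership row: group (Source), member (Target).

    Built from the un-deduplicated rows, so a member of three groups gets
    three edges. Header names 'user id' and 'group' are assumptions.
    """
    return [
        {
            "Source": row[group_col],
            "Target": row[id_col].strip(),
            "Type": "Undirected",
            "Weight": 1,
        }
        for row in member_rows
    ]


def write_edge_csv(path, edge_rows):
    """Write the Edge List with a Source, Target, Type, Weight header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Source", "Target", "Type", "Weight"])
        writer.writeheader()
        writer.writerows(edge_rows)
```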
FINALISING THE NODE LIST
Let’s now return to the Node List, which should now contain all of the members of every group we scraped. It needs a few finishing touches.
- Select Column A (‘ID’), go to the ‘Home’ ribbon tab and open the ‘Conditional Formatting’ drop-down. Hover over ‘Highlight Cells Rules’ and select ‘Duplicate Values’.
- Scroll through the Node List and remove the repeated rows, keeping one row for each ID. Alternatively, use the ‘Remove Duplicates’ feature on the ‘Data’ ribbon tab, making sure that only the ‘ID’ column is ticked; otherwise, rows for the same member in different groups will not be treated as duplicates.
- With the duplicates removed, export the Node List as a CSV by selecting ‘File’, then ‘Export’, then ‘Change File Type’, then ‘CSV (Comma delimited)’, then ‘Save As’. In the ‘Save As’ dialog window, save the Node List to your desired location.
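Deduplication and export can likewise be scripted. A minimal sketch, assuming node rows are dictionaries with ‘ID’, ‘Label’ and ‘Role’ keys: dedupe_nodes keeps the first row for each ID, which is what Excel’s ‘Remove Duplicates’ does when only the ‘ID’ column is selected, and write_node_csv writes the result with a header row. Both are hypothetical helpers.

```python
import csv


def dedupe_nodes(node_rows):
    """Keep the first row seen for each ID and drop later repeats."""
    seen = set()
    unique = []
    for row in node_rows:
        if row["ID"] not in seen:
            seen.add(row["ID"])
            unique.append(row)
    return unique


def write_node_csv(path, node_rows):
    """Write the Node List with an ID, Label, Role header row (UTF-8)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["ID", "Label", "Role"])
        writer.writeheader()
        writer.writerows(node_rows)
```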
IMPORTING NODE AND EDGE LISTS INTO GEPHI
With the hard work done, it is time to import our lists. Open Gephi and select ‘New Project’ when the dialog box appears, then select the ‘Data Laboratory’ tab. This is where we will import our lists.
- Ensure that the ‘Node’ button, located in the ‘Data Table’ tab, is selected.
- Select the ‘Import Spreadsheet’ button. An ‘Open’ dialog box will appear; use it to navigate to your lists and select the Node List.
- When the import dialog box appears, click ‘Next’, then select ‘Finish’ in the following section.
- The ‘Import Report’ dialog box should now appear. Select the ‘Append to existing workspace’ radio button, then select ‘OK’.
The ‘Node List’ should now appear in the ‘Data Laboratory’ window. At this point, we need to repeat the process for the ‘Edge List’.
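Before importing the Edge List, you may want to check which edge endpoints have no matching ID in the Node List. In this workflow the group names in the ‘Source’ column never appear in the Node List, so they are expected to show up; as far as we are aware, Gephi’s spreadsheet importer offers a ‘Create missing nodes’ option that adds such nodes on import, which is how the group nodes appear in the chart. Any unmatched member IDs, by contrast, would point to a copy-and-paste error. missing_endpoints is a hypothetical helper in this minimal sketch.

```python
def missing_endpoints(node_ids, edge_rows):
    """Return every Source/Target value that has no matching ID in the Node List."""
    known = set(node_ids)
    missing = set()
    for edge in edge_rows:
        for endpoint in (edge["Source"], edge["Target"]):
            if endpoint not in known:
                missing.add(endpoint)
    return missing
```

If the returned set contains anything other than your group names, revisit the copy-and-paste steps above.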
- Under the ‘Data Table’ tab, select the ‘Edge’ button, located next to the ‘Node’ button.
- Select the ‘Import Spreadsheet’ button. An ‘Open’ dialog box will appear; use it to navigate to your lists and select the Edge List.
- When the import dialog box appears, click ‘Next’, then select ‘Finish’ in the following section.
- The ‘Import Report’ dialog box should now appear. Select the ‘Append to existing workspace’ radio button, then select ‘OK’.
That’s it! We have now imported our Node and Edge Lists into Gephi, so it is time to visualise our data.
VISUALISING THE DATA
With the Node and Edge Lists containing our Telegram user data imported into Gephi, we can now apply several processes to visualise the data.
- Select the ‘Overview’ button at the top of Gephi, just below the navigation tabs. This returns you to the main graph area. The data is now displayed, but it looks rather disorganised, so let’s fix this.
- In the ‘Layout’ panel at the bottom left of the screen, select ‘ForceAtlas 2’ from the drop-down box.
- A series of options should appear; set ‘Threads number’ to ‘15’.
- Check the boxes for ‘Approximate Repulsion’, ‘LinLog mode’ and ‘Prevent Overlap’.
- Now click the ‘Run’ button. You will see the chart rearrange itself; feel free to increase or decrease the scaling in the ‘Layout’ panel if you feel the need to.
- Directly beneath the chart window, you will find options to turn the node labels on and off and to widen the edges (connector lines). Use these to adjust your network chart as you see fit.
Our chart is now laid out, but we can apply some further settings to make it more visually effective. First, we will size the nodes according to their centrality.
- In the ‘Statistics’ panel on the right-hand side of Gephi, run the ‘Network Diameter’ function. Alongside the diameter, this calculates each node’s betweenness centrality, which we will use below.
- A window should appear; ensure that the ‘Undirected’ radio button is selected, then press ‘OK’.
- Once Gephi has finished calculating the network diameter, a report window will appear; select the ‘Close’ button.
- In the ‘Appearance’ panel in the upper left-hand side of Gephi, select ‘Nodes’, then the ‘Size’ button, then ‘Ranking’.
- From the drop-down directly below, select ‘Betweenness Centrality’. Set ‘Min size’ to ‘5’ and ‘Max size’ to ‘500’, then select the ‘Apply’ button. The nodes are now resized by betweenness centrality: the nodes representing the Telegram groups, which sit on the shortest paths between their members, typically become the largest, while members who belong to several groups also grow because they bridge those groups.
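Gephi calculates betweenness centrality for us, but it can help to see what it measures: for each node, the share of shortest paths between all other pairs of nodes that pass through it. The sketch below is a plain-Python version of Brandes’ algorithm for an unweighted, undirected graph, normalised by the number of node pairs that exclude the node itself. It is for illustration only; we have not checked that its normalisation matches Gephi’s exactly, and betweenness and membership_graph are hypothetical helpers.

```python
from collections import deque


def betweenness(adjacency, normalized=True):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adjacency` maps every node to the set of its neighbours (both directions).
    """
    nodes = list(adjacency)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma) and
        # recording each node's predecessors on those paths.
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    for v in nodes:
        score[v] /= 2  # each unordered pair was counted from both ends
        if normalized and n > 2:
            score[v] /= (n - 1) * (n - 2) / 2
    return score


def membership_graph(edges):
    """Build an undirected adjacency map from (group, member) pairs."""
    adjacency = {}
    for group, member in edges:
        adjacency.setdefault(group, set()).add(member)
        adjacency.setdefault(member, set()).add(group)
    return adjacency
```

On a toy network where G1 has members u1, u2 and u3, and G2 has members u3 and u4, G1 scores 0.7, u3 scores 0.6, G2 scores 0.4 and the single-group members score 0. The member u3, who belongs to both groups, outranks the smaller group G2, which is why well-connected members can also appear large in the chart.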
Next, we will run the modularity statistic, which detects communities of densely connected nodes, and colour the chart by community. In a membership network like this one, those communities tend to form around the individual Telegram groups.
- In the ‘Statistics’ panel on the right-hand side of Gephi, run the ‘Modularity’ function.
- A window should appear; ensure that the ‘Randomize’ and ‘Use weights’ checkboxes are checked, then press ‘OK’.
- Once Gephi has finished calculating modularity, a report window will appear; select the ‘Close’ button.
- In the ‘Appearance’ panel in the upper left-hand side of Gephi, select ‘Nodes’, then the palette icon for ‘Colour’, then ‘Partition’.
- From the drop-down directly below, select ‘Modularity Class’, then select the ‘Apply’ button. The chart is now coloured by modularity class, which in our case broadly corresponded to each of the groups.
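Gephi’s Modularity statistic uses the Louvain method to search for a partition of the nodes with high modularity Q, a score comparing the number of edges inside each community with the number expected at random given the node degrees. The sketch below does not perform that search; it only evaluates Q for a partition you supply, for an unweighted, undirected graph (with every weight set to 1, as in our Edge List, the weighted and unweighted scores coincide). The function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    Q = sum over communities c of L_c / m - (d_c / 2m)^2, where m is the number
    of edges, L_c the number of edges inside c and d_c the summed degree of c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two separate triangles, each placed in its own community, Q is 0.5; putting every node into one community gives 0. If two groups share most of their members, the Louvain search may merge them into a single class, so a colour does not always correspond to exactly one group.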
And that’s it! We have now scraped several Telegram groups, processed the members of each group and visualised the resulting network.