DSC 180 – The Spread of Misinformation Online
Weeks 09 and 10 - k-Core Decomposition
Reading
We will be using the concept of a k-core decomposition of a graph. It might be helpful to re-read the definition of a k-core on Wikipedia or elsewhere. It will also be useful to read the networkx documentation for its k_core algorithm.
Tasks
In all of the following tasks, you will need to construct a graph of Twitter users. In this graph, place an edge between users A and B if A retweets B. We will compute k-cores of this graph. You will need to make two decisions. First, you may need to subsample your data again in order to compute k-cores in a reasonable amount of time. Second, for tasks that require you to compute a sequence of k-cores, you may choose a reasonable interval between successive values of k; for example, k = 5, 10, 15, etc. There is no single correct choice in either case, but you should document and explain your choices in your final report.
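As a starting point, here is a minimal sketch of the graph construction and the k-core sequence using networkx. It assumes your retweets are available as (retweeter, original author) pairs. The toy `retweets` list and the `build_retweet_graph` helper are hypothetical, not part of the assignment. The sketch uses an undirected simple graph, which is one possible modeling choice, and it drops self-retweets because `nx.k_core` does not accept graphs with self-loops.

```python
import networkx as nx

# Hypothetical toy data: (retweeter, original_author) pairs.
retweets = [
    ("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"),
    ("d", "b"), ("d", "c"), ("e", "a"), ("e", "e"),
]


def build_retweet_graph(pairs):
    """Build an undirected simple graph with an edge A-B if A retweets B."""
    G = nx.Graph()
    G.add_edges_from(pairs)
    # k_core rejects graphs with self-loops, so drop self-retweets.
    G.remove_edges_from(list(nx.selfloop_edges(G)))
    return G


G = build_retweet_graph(retweets)

# Compute every node's core number once and reuse it for each k.
core_number = nx.core_number(G)

# On real data a coarser interval (e.g. k = 5, 10, 15, ...) may be more practical.
k_values = range(1, max(core_number.values()) + 1)
cores = {k: nx.k_core(G, k=k, core_number=core_number) for k in k_values}

for k, core in cores.items():
    print(k, core.number_of_nodes(), "users")
```

Passing the precomputed `core_number` dictionary avoids recomputing the decomposition for every value of k, which may matter on a large retweet graph.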
- At what value of k is the main core found? In the original paper, this was k = 50.
- Given any set of Twitter users, we can compute the proportion of their retweets that link to fact-checking sites. Perform this computation for all users in each core in a sequence of k-cores and plot the trend. That is, try to replicate Figure 5 from the paper.
- Compute the average number of tweets per user in each k-core in a sequence of k-cores and plot the trend.
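The three tasks above could be approached along the following lines. This is only a sketch under stated assumptions: the toy graph and the per-user dictionaries `total_retweets`, `fact_check_retweets`, and `tweet_counts` are hypothetical stand-ins with made-up numbers, and you would derive the real values from your dataset. The fact-checking proportion here is pooled over all retweets by users in a core, which is one reading of the task; averaging per-user proportions is another, and you should document whichever you choose.

```python
import networkx as nx

# Hypothetical toy graph: four users who all retweet each other, plus one peripheral user.
G = nx.Graph([
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("e", "a"),
])

# Hypothetical per-user counts (made-up numbers for illustration only).
total_retweets = {"a": 10, "b": 4, "c": 6, "d": 5, "e": 20}
fact_check_retweets = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 8}
tweet_counts = {"a": 50, "b": 30, "c": 40, "d": 40, "e": 10}

core_number = nx.core_number(G)

# The main core is the k-core with the largest k for which the core is non-empty.
main_k = max(core_number.values())


def core_stats(G, k, core_number):
    """Return (pooled fact-check proportion, average tweets per user) for the k-core."""
    users = list(nx.k_core(G, k=k, core_number=core_number).nodes())
    retweets = sum(total_retweets[u] for u in users)
    fact_checks = sum(fact_check_retweets[u] for u in users)
    proportion = fact_checks / retweets if retweets else float("nan")
    avg_tweets = sum(tweet_counts[u] for u in users) / len(users)
    return proportion, avg_tweets


k_values = list(range(1, main_k + 1))
stats = [core_stats(G, k, core_number) for k in k_values]
fact_check_trend = [p for p, _ in stats]
avg_tweets_trend = [t for _, t in stats]

print("main core at k =", main_k)
for k, (p, t) in zip(k_values, stats):
    print(k, p, t)
```

The lists `fact_check_trend` and `avg_tweets_trend` can then be plotted against `k_values` with the plotting library of your choice to show each trend.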