DSC 180 – The Spread of Misinformation Online


Week 04 - Robust Data Collection

Topics

This week, we'll see how to collect a very large data set in a way that survives interruptions.

Readings and Tasks

  • Devise a robust strategy for downloading a large data set of tweets, so that if your internet connection fails midway through, you don't lose all of your work. The strategy should include some way of resuming the download from where it stopped. You do not need to code up your strategy yet, but you should describe it in your answers to the weekly participation.

  • Re-read Anatomy of an online misinformation network up to the "Discussion" section, and answer the following questions:

    • How do the authors handle the problem of URL duplication, where two different URLs point to the same article?
    • How do the authors measure the efficiency of the spread of misinformation from the central nodes out to the periphery of the network?
    • How does the likelihood that a user is a bot change as we move further into the "core" of the Twitter misinformation network?
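
As a starting point for thinking about the download task above, here is a minimal sketch of one possible approach (checkpointing), not the required or only answer. It assumes the tweets are fetched in batches from a list of IDs. The names `download_all`, `fetch_batch`, `load_checkpoint`, and `save_checkpoint`, and the file layout, are all hypothetical; `fetch_batch` stands in for whatever API call you end up using.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next unprocessed ID, or 0 if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write the checkpoint to a temp file, then rename it into place.

    os.replace swaps the file in one step, so a crash never leaves a
    half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def download_all(ids, fetch_batch, out_path, ckpt_path, batch_size=100):
    """Fetch records for `ids` in batches, resuming from the last checkpoint.

    `fetch_batch` is a hypothetical callable that takes a list of IDs and
    returns a list of JSON-serializable records; it may raise on failure.
    """
    start = load_checkpoint(ckpt_path)
    for i in range(start, len(ids), batch_size):
        batch = ids[i:i + batch_size]
        records = fetch_batch(batch)  # an exception here stops the run
        # Append results first, then record progress.
        with open(out_path, "a") as out:
            for rec in records:
                out.write(json.dumps(rec) + "\n")
        save_checkpoint(ckpt_path, i + len(batch))
```

Rerunning `download_all` with the same arguments after a failure picks up at the first batch that was not checkpointed. One trade-off of this ordering: if the program dies after appending a batch but before saving the checkpoint, that batch is downloaded again on restart, so the output may need to be de-duplicated by ID afterward.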