Twitter 15M

This page links to the Twitter data used in the paper “Determinants of Meme Popularity” by James P. Gleeson, Kevin P. O’Sullivan, Raquel A. Baños and Yamir Moreno; please cite this paper if you use the data. All data processing was performed by Raquel A. Baños at Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza.

The zip file twitter15M_data.zip (16MB) contains the following files:

  • 1year_hashtags.txt
    Text file with one line per tweet; the tweets were gathered from March 2011 to March 2012. With NF denoting the number of fields on a line, column 1 gives NF-1 (the number of fields minus one), column 2 gives the date on which the tweet was posted (in days), and columns 3 to NF list the hashtags used in the tweet.
  • ccdf_1year_S200
    Folder containing multiple text files. The file ccdf_axx.txt gives the complementary cumulative distribution function (CCDF) for hashtags of age xx days, as used in Figure 3 of the paper.
  • pjk_final.txt
    Text file from a sample of 8.2 × 10^5 (820,000) randomly chosen Twitter users, sampled in October 2013. The file has three columns:
    • Column 1: number of followers (out-degree)
    • Column 2: number of friends, i.e. accounts the user follows (in-degree)
    • Column 3: frequency of that (followers, friends) combination
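The line format of 1year_hashtags.txt can be read with a short script. The following is a minimal Python sketch, not code distributed with the data: it assumes the fields are whitespace-separated and that the first column is an integer, and the helper names parse_line and count_hashtags are hypothetical. It checks each line against its NF-1 column and tallies hashtag occurrences.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_line(line: str) -> Tuple[float, List[str]]:
    """Split one line of 1year_hashtags.txt into (date_in_days, hashtags).

    Assumption: fields are whitespace-separated; field 1 is NF-1,
    field 2 is the posting date in days, fields 3..NF are hashtags.
    """
    fields = line.split()
    nf = len(fields)
    if nf < 2:
        raise ValueError(f"expected at least 2 fields, got {nf}: {line!r}")
    # Sanity check: the first column should equal the field count minus one.
    if int(fields[0]) != nf - 1:
        raise ValueError(f"first column {fields[0]} != NF-1 = {nf - 1}: {line!r}")
    return float(fields[1]), fields[2:]


def count_hashtags(lines: Iterable[str]) -> Counter:
    """Tally occurrences of each hashtag over all non-blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            _, hashtags = parse_line(line)
            counts.update(hashtags)
    return counts


if __name__ == "__main__":
    # Made-up lines in the documented format (not taken from the dataset).
    sample = ["3 12.5 tagA tagB", "2 13.0 tagA"]
    print(count_hashtags(sample))
    # For the real file (hypothetical local path):
    # with open("1year_hashtags.txt") as f:
    #     counts = count_hashtags(f)
```

Because count_hashtags accepts any iterable of strings, an open file handle can be passed directly without loading the whole file into memory.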
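The files in ccdf_1year_S200 hold precomputed CCDFs, and their column layout is not described on this page. To illustrate what a CCDF is, here is a hedged Python sketch of an empirical CCDF using the convention CCDF(x) = P(X ≥ x); this convention is an assumption (some authors use P(X > x)), the function name empirical_ccdf is hypothetical, and whether Figure 3 was computed this way is not stated here.

```python
from typing import Dict, Sequence


def empirical_ccdf(values: Sequence[float]) -> Dict[float, float]:
    """Map each distinct value x to the fraction of values >= x.

    Convention assumed here: CCDF(x) = P(X >= x).
    """
    n = len(values)
    ccdf: Dict[float, float] = {}
    for i, x in enumerate(sorted(values)):
        # In sorted order, the first occurrence of x has n - i values at or above it.
        if x not in ccdf:
            ccdf[x] = (n - i) / n
    return ccdf


if __name__ == "__main__":
    # Illustrative values, e.g. usage counts of a few hashtags at a fixed age.
    print(empirical_ccdf([1, 1, 2, 5]))  # {1: 1.0, 2: 0.5, 5: 0.25}
```

Sorting once and recording the first position of each distinct value gives the CCDF in O(n log n) time; an empty input simply yields an empty mapping.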
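The three columns of pjk_final.txt can be combined into marginal degree distributions by summing the frequency column over one of the two degrees: p_out(j) = Σ_k p(j, k) and p_in(k) = Σ_j p(j, k). The Python sketch below assumes whitespace-separated columns with integer degrees, reads the file name pjk as presumably denoting a joint distribution p(j, k) with j the number of followers and k the number of friends, and normalizes the frequencies so that it works whether they are raw counts or fractions. The function names read_pjk and marginals are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Joint = Dict[Tuple[int, int], float]


def read_pjk(lines: Iterable[str]) -> Joint:
    """Parse pjk_final.txt lines into {(followers, friends): frequency}.

    Assumption: three whitespace-separated columns, the first two integers.
    Repeated (followers, friends) pairs, if any, are summed.
    """
    joint: Joint = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        key = (int(parts[0]), int(parts[1]))
        joint[key] = joint.get(key, 0.0) + float(parts[2])
    return joint


def marginals(joint: Joint) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Return normalized (p_out, p_in): p_out[j] = sum_k p(j,k), p_in[k] = sum_j p(j,k)."""
    total = sum(joint.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    p_out: Dict[int, float] = defaultdict(float)  # indexed by followers (out-degree)
    p_in: Dict[int, float] = defaultdict(float)   # indexed by friends (in-degree)
    for (j, k), freq in joint.items():
        p_out[j] += freq / total
        p_in[k] += freq / total
    return dict(p_out), dict(p_in)


if __name__ == "__main__":
    # Made-up rows in the documented three-column format.
    rows = ["10 5 2", "10 7 1", "3 5 1"]
    p_out, p_in = marginals(read_pjk(rows))
    print(p_out, p_in)
```

Normalizing by the total rather than assuming the frequencies already sum to one keeps the sketch agnostic about how the third column was recorded.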