Twitter 15M

This page links to the Twitter data used in the paper “Determinants of Meme Popularity” by James P. Gleeson, Kevin P. O’Sullivan, Raquel A. Baños and Yamir Moreno; please cite this paper if you use the data. All data processing was performed by Raquel A. Baños at Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza.

The zip file twitter15M_data.zip (16MB) contains the following files:

  • 1year_hashtags.txt
    Text file with one line per tweet. Tweets were gathered from March 2011 to March 2012. Each line has the following structure: the first column gives the number of fields in the line minus 1 (NF-1), the second column gives the date when the tweet was posted (in units of days), and columns 3 to NF list the hashtags used in the tweet.
  • ccdf_1year_S200
    Folder containing multiple text files. Each file ccdf_axx.txt gives the complementary cumulative distribution function (CCDF) for hashtags of age xx days, as used in Figure 3 of the paper.
  • pjk_final.txt
    Text file obtained by sampling 8.2E5 random Twitter users (sampled in October 2013). The file has three columns:
    • Column 1: number of followers (out-degree)
    • Column 2: number of friends (following, or in-degree)
    • Column 3: frequency
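
The file layouts described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' processing code. It assumes that fields are separated by whitespace, that the date column parses as a number, and that the first two columns of pjk_final.txt are integers; the function and class names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Tweet(NamedTuple):
    day: float           # posting date, in days (column 2)
    hashtags: List[str]  # hashtags used (columns 3 to NF)


def parse_hashtag_line(line: str) -> Tweet:
    """Parse one line of 1year_hashtags.txt (whitespace-separated: assumption)."""
    fields = line.split()
    declared = int(fields[0])  # column 1 holds NF-1
    if declared != len(fields) - 1:
        raise ValueError(f"expected {declared + 1} fields, found {len(fields)}")
    return Tweet(day=float(fields[1]), hashtags=fields[2:])


def count_hashtags(lines: Iterable[str]) -> Counter:
    """Count how many times each hashtag appears across all tweets."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            counts.update(parse_hashtag_line(line).hashtags)
    return counts


def parse_pjk_line(line: str) -> Tuple[int, int, float]:
    """Parse one line of pjk_final.txt: followers, friends, frequency."""
    followers, friends, frequency = line.split()
    # Integer degrees are an assumption about the file contents.
    return int(followers), int(friends), float(frequency)
```

To use the sketch on the real data, pass an open file object to `count_hashtags`, for example `with open("1year_hashtags.txt") as f: counts = count_hashtags(f)`. The path depends on where the zip file is extracted.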