Notes on a presentation by Pablo Paredes. The abstract for the seminar is:
This presentation will be about how to conduct social network analysis (SNA) of data from social media services such as Facebook and Twitter. Although traditional SNA packages are able to analyse data from any source, the volume of data from these new services can make the use of additional technologies convenient. The case in the presentation will be a study of the degrees of distance on Twitter, covering steps such as using the streaming API, filtering, and computing results.
The presentation is drawn from the paper: Fabrega, J. and Paredes, P. (2013) Social Contagion and Cascade Behaviours on Twitter. Information 4(2): 171-181.
These are my brief and partial notes on the seminar taken live (so “typos ahead!”).
The talk looked at gathering data from social network sites and at a research project on contagion in digital data.
Data access requires knowledge of each platform's API, but Apigee documents the APIs of most social networks (although, as an intermediary, it may introduce further issues when interfacing with different software tools; Python toolkits, for example, may allow APIs to be accessed directly rather than through Apigee). In their research, Twitter data was extracted using Python tools such as Tweepy (for calls to the Twitter API) and NetworkX (a Python library for network analysis), along with additional tools including Apigee. These tools allow the investigation of forms of SNA beyond ego-centric analysis.
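To make the difference between ego-centric and whole-network analysis concrete, here is a minimal sketch using only the Python standard library. It assumes the collected data has already been reduced to hypothetical "who retweeted whom" pairs; the record format and the names `RETWEETS`, `build_graph`, `ego_network` and `retweeter_counts` are illustrative, not taken from the paper. In NetworkX the same structure would be a `DiGraph`, and `nx.ego_graph` extracts an ego network.

```python
from collections import defaultdict

# Hypothetical "who retweeted whom" records: (retweeter, original_author).
# In a real study these would be extracted from Tweets collected through
# the API (e.g. with Tweepy); this flat format is an assumption.
RETWEETS = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "bob"),
    ("erin", "carol"),
    ("frank", "dave"),
]


def build_graph(records):
    """Directed adjacency map: author -> set of users who retweeted them.

    Edges point in the direction information flows (author to retweeter).
    """
    graph = defaultdict(set)
    for retweeter, author in records:
        graph[author].add(retweeter)
        graph.setdefault(retweeter, set())  # every user appears as a node
    return dict(graph)


def ego_network(graph, ego):
    """Only the edges touching `ego`: the view an ego-centric tool provides."""
    outgoing = {(ego, v) for v in graph.get(ego, set())}
    incoming = {(u, ego) for u, targets in graph.items() if ego in targets}
    return outgoing | incoming


def retweeter_counts(graph):
    """Whole-network measure: distinct retweeters per user (out-degree)."""
    return {user: len(targets) for user, targets in graph.items()}


if __name__ == "__main__":
    g = build_graph(RETWEETS)
    print(sorted(ego_network(g, "bob")))
    print(retweeter_counts(g))
```

Bob's ego network contains only alice → bob and bob → dave; the longer chain alice → bob → dave → frank is visible only when the whole graph is analysed.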
Pablo presented a Twitter network diagram built in NodeXL from ego-networks, but direct access to the Twitter API would give more options for alternative network analyses. Analysing the diffusion of information on Twitter was not possible in NodeXL.
The study used the three degrees of influence theory of Christakis & Fowler (2008): social influence diffuses to three degrees of separation but not beyond, because noisy communication and technology/time issues lead to information decay. For example, most RTs take place within 48 hours, so diffusion tends not to extend beyond a friend's friend's friend! This relates to network instability and loss of interest among users beyond three degrees, alongside increasing information competition, which becomes so intense beyond three degrees that diffusion breaks down.
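One way to operationalise "degrees" for a single Tweet is a breadth-first search from the originator through the follower graph, expanding only through users who themselves retweeted within a time window. This is a minimal sketch under that assumption, not necessarily how Fabrega and Paredes measured distance. A follower graph is needed because, in Twitter API v1.1 Tweet objects, a retweet of a retweet references the original Tweet rather than the intermediate retweet, so the path has to be inferred. The data and the names `FOLLOWERS`, `RETWEET_HOURS` and `cascade_degrees` are hypothetical; the 48-hour default mirrors the figure above.

```python
from collections import deque

# Hypothetical data. `FOLLOWERS` maps each user to the users who follow them,
# so an edge u -> v means v sees what u posts or retweets.
FOLLOWERS = {
    "alice": {"bob", "carol", "zoe"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
    "dave": {"frank"},
    "frank": {"gina"},
    "gina": {"hal"},
}
# Hours after the original Tweet at which each user retweeted it.
RETWEET_HOURS = {"bob": 0.5, "carol": 2, "dave": 5, "frank": 20,
                 "gina": 30, "hal": 70, "ivan": 3}


def cascade_degrees(followers, origin, retweet_hours, window_hours=48):
    """Breadth-first search from `origin` through users who retweeted in time.

    Returns {retweeter: degree}, where degree is the length of the shortest
    chain of followers-who-retweeted linking the retweeter back to `origin`.
    Retweeters outside the window, or not reachable through the follower
    graph, are left out.
    """
    spreaders = {u for u, h in retweet_hours.items() if h <= window_hours}
    degree = {origin: 0}
    queue = deque([origin])
    while queue:
        user = queue.popleft()
        for follower in followers.get(user, set()):
            if follower in spreaders and follower not in degree:
                degree[follower] = degree[user] + 1
                queue.append(follower)
    del degree[origin]
    return degree


if __name__ == "__main__":
    degrees = cascade_degrees(FOLLOWERS, "alice", RETWEET_HOURS)
    within_three = sum(1 for d in degrees.values() if d <= 3)
    print(degrees, within_three, len(degrees))
```

Here hal is dropped because the retweet falls outside the 48-hour window, and ivan because no chain of retweeting followers connects him to alice; BFS guarantees each retweeter gets the shortest such chain.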
Their own research found a 3-5% RT rate in the diffusion of a single Tweet. RT rates were higher when a hashtag was used and correlated positively with the number of followers of the originator, but negatively with @mentions in the original Tweet, possibly because Tweets containing @mentions are seen as private conversations. Overall, less than 1% of RTs went beyond three degrees.
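The two kinds of result above, a share of retweets beyond three degrees and correlations between Tweet features and retweet counts, can be computed with a few lines of standard-library Python. This is a hedged sketch: the helper names `share_beyond` and `pearson` are hypothetical, the toy numbers are invented purely to show the mechanics (they are not the study's data), and the paper's actual statistical method is not assumed to be Pearson correlation. On Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
import math


def share_beyond(degrees, k=3):
    """Fraction of retweets whose degree from the originator exceeds k."""
    degrees = list(degrees)
    if not degrees:
        return 0.0
    return sum(1 for d in degrees if d > k) / len(degrees)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy per-Tweet values, invented for illustration only (not study data).
FOLLOWER_COUNTS = [120, 450, 80, 1500, 300]
MENTION_COUNTS = [2, 0, 3, 0, 1]
RETWEET_COUNTS = [3, 12, 2, 40, 9]

if __name__ == "__main__":
    print(share_beyond([1, 1, 2, 3, 4, 5]))
    print(pearson(FOLLOWER_COUNTS, RETWEET_COUNTS))
    print(pearson(MENTION_COUNTS, RETWEET_COUNTS))
```

With only five toy points the coefficients merely show direction (positive for followers, negative for @mentions); a real analysis would need far more Tweets and some control for confounders such as hashtag use.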
The conclusion is that diffusion in digital networks is similar to that found in physical networks, which implies that there are human barriers to communication in online spaces. But the research is limited by restrictions on access to the Twitter API as well as by the privacy policies governing it. As a result, replicability becomes very difficult, and this issue is compounded as API versions change, so that software libraries and tools no longer work, or no longer work in the same way. It is worth noting that there is no way of knowing how Twitter samples the 1% of Tweets provided through the API. Therefore, there is a need for access to 100% of Twitter data to provide a clear baseline for understanding Twitter samples and to justify the network boundaries.
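The sampling point can be illustrated with a back-of-envelope calculation. Purely as an assumption, since Twitter's actual sampling method is unknown, suppose each Tweet is included in the sample independently with probability p. A cascade reaching degree d involves a chain of d + 1 Tweets (the original plus d retweets), so the chance of observing the whole chain is:

```latex
P(\text{chain observed}) = p^{\,d+1}, \qquad p = 0.01,\; d = 3 \;\Rightarrow\; P = 0.01^{4} = 10^{-8}
```

Under this idealised assumption a single three-degree chain is almost never seen intact in a 1% sample, which is one way to see why full data matters for studying cascades.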
Pablo pointed to the importance of researchers writing their own code, with R or Python preferable as they are easier to learn and have larger support communities.