Digital Scholarship day of ideas: data 
This is the second session of the day I wanted to note in detail (the first is here). The session is Robert Procter, Professor of social informatics at the University of Warwick, on Big Data and the sociological imagination. These notes are written live from the live stream. So here we go:
The title has changed to Big Data and the Co-Production of Social Scientific Knowledge. The talk will explain a bit more about social informatics, from the perspective of a hybrid computer scientist and sociologist; the meaning of ‘big data’ and how academic sociology can use such data, including the development of new tools and methods of inquiry – see COSMOS – and will conclude with remarks on how these elements may combine in an exciting understanding of how social science and technology may emerge through different stakeholders, including crowd-sourced approaches.
Social informatics is the inter-disciplinary study of the factors that shape the adoption of ICT and the social shaping of technology. Processes of innovation involving districted technologies are large in scale and involve a diverse range of publics, such as understanding social media as processes of large-scale social learning. It asks how social media works and how people can use it to further their aims. Because this is public and involves social media, it is in many ways easier to see what is going on, as the technology makes much of the data available (although it’s not entirely straightforward).
Social media is Rob’s primary area of interest. Recent research includes work on the use of social media in scholarly communications to put research in the public domain. But the value of this is not entirely clear: the research identified both positive and negative viewpoints. It also looked at how academic publishers were responding to such changes in scholarly communications, such as supporting the use of social media as well as developing tools to trace and aggregate the use of research data. This showed mixed results.
Another research project, in conjunction with The Guardian, was on the use of Twitter during the 2011 riots in England. In particular, was social media important in spreading false information during such events? So the research looked at particular rumours identified in the corpus of Tweets: how do rumours emerge in social media, and how do people behave and respond to such rumours?
This leads to the question of how to analyse 2.5m Tweets, which are qualitative data. Research needs to seek out structures and patterns so that scarce human resources can be focused on closer analysis of the Tweets.
Savage and Burrows (2007), on empirical sociology, argue that the best sociology is being done by the commercial sector as it has access to data, and that academic sociology is becoming irrelevant. However, newer sources of data provide for enhanced relevance of academic sociology, and this is reinforced by the rise of open data initiatives. So we can feel more confident about the future of academic sociology.
But how this data is being used raises further issues, such as linking mood in social media with stock market movements, which confuses correlation and causation. Other analysis has focused on challenges to dictatorial regimes, the promotion of democracy and political change, and the ability of social movements to self-organise. The methodological challenges are concerned with dealing with the volume of data, so combining computational tools with sociological sensitivity and understanding of the world. But many sociologists are wary of the ‘computational turn’.
Returning to the England riots, Rob looks at the rumour of rioters attacking a children’s hospital in Birmingham. This involves an interpretive piece of work focused on data that may provide useful and interesting results. The rumour started with people reporting police congregating at the hospital, and so people inferred that the hospital was under threat. The computational component was to discover a useful structure in the data using sentiment and topic analysis – Tweets were divided into originals and retweets that combine into information flows, and some flows are bigger than others. Taking the size of an information flow as an indicator of significance can provide an indication of where to focus the analysis. Coding frames were used to capture the relevant ways people were responding to the information, including accepting and challenging Tweets. This coding was used to visualise how information flows through Twitter. The rumour was initially framed as a possibility but mushroomed, and different threads of the rumour emerged. The rumour initially spread without challenge, but later people began to Tweet alternative explanations for the police being near the hospital, i.e., a police station is next to the hospital. So rumours do not go unchallenged and people apply common-sense reasoning to rumours. While rumours grow quickly in social media, the crowd-sourcing effects of social media help in establishing what the likely truth is. This could be further enhanced through engagement from trusted sources such as news organisations or the police. This could be augmented by computational work to help address such rumour flows (see Pheme).
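As a rough illustration of the computational step described above (grouping originals and retweets into information flows, ranking flows by size, and tallying coded responses), here is a minimal Python sketch. It is not the method or tooling used in the research: the `Tweet` structure, the field names, the code labels "accept" and "challenge", and the sample Tweets are all hypothetical, and it assumes every retweet points directly at its original Tweet.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tweet:
    """Hypothetical record; field names are assumptions for this sketch."""
    tweet_id: str
    text: str
    retweet_of: Optional[str] = None  # id of the original Tweet, None for originals
    code: Optional[str] = None        # hand-assigned label from a coding frame


def flow_root(tweet: Tweet) -> str:
    """An information flow is identified by the id of its original Tweet."""
    return tweet.retweet_of if tweet.retweet_of is not None else tweet.tweet_id


def flow_sizes(tweets: list[Tweet]) -> Counter:
    """Count the original plus its retweets for every flow."""
    return Counter(flow_root(t) for t in tweets)


def largest_flows(tweets: list[Tweet], top_n: int = 3) -> list[tuple[str, int]]:
    """Rank flows by size (ties broken by id) to suggest where to look first."""
    ranked = sorted(flow_sizes(tweets).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def code_totals(tweets: list[Tweet]) -> Counter:
    """Tally coding-frame labels across all coded Tweets."""
    return Counter(t.code for t in tweets if t.code is not None)


# Invented sample data, for illustration only.
sample = [
    Tweet("a", "Police gathering outside the children's hospital?", code="accept"),
    Tweet("r1", "RT: police gathering outside the hospital", "a", "accept"),
    Tweet("r2", "RT: police gathering outside the hospital", "a", "accept"),
    Tweet("b", "There is a police station next door to it", code="challenge"),
    Tweet("r3", "RT: there is a police station next door", "b", "challenge"),
    Tweet("c", "Does anyone know what is going on?"),
]

if __name__ == "__main__":
    print(largest_flows(sample, top_n=2))
    print(code_totals(sample))
```

Ranking by flow size only suggests where to direct closer reading; the interpretive coding itself is still done by people.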
There is also the question of what the police were doing on Twitter at the time. In Manchester, accounts were created to disseminate what was happening and to draw the police’s attention to events, so acting to inform public services.
This research indicates innovation as a co-production. People are collectively experimenting with and discovering the limitations and benefits of social media. Uses of social media are emergent and shaped through exploration.
On to the development of tools for sociologists to analyse ‘big’ social data, including COSMOS to help interrogate large social media data. This also involves linking social media data with other data sets [and so links to the open data]. So COSMOS assists in forging interdisciplinary working between sociologists and computing scientists, provides interoperable analysis tools and evolves capabilities for analysis. In particular, Rob points to the issue of the black-boxing of computational analysis, and COSMOS aims to make the computational processes as transparent as possible.
COSMOS tools include text analysis and social network analysis linked to other data sets. A couple of experimental tools are being developed on geolocation and on topic identification and clustering around related words. COSMOS research is looking at social media and civil society; hate speech and social media; citizen science; crime sensing; suicide clusters and social media; and the BBC and tweeting the Olympics. Rob points to an educational need for people to understand the public nature of social media, especially in relation to hate speech.
Social media as digital agora, on the role of social media in developing civil society and social resilience through sharing information, holding institutions to account, inter-subjective sense-making, cohesion and so forth.
Sociology beyond the academy and the co-production of scientific knowledge. Rob points to examples such as the Channel 4 fact checker as an example of wider data awareness and understanding, and to citizen journalism, which mobilises people to document and disseminate what is going on in the world. He also gives the example of sousveillance of the police as a counter to the rise of the surveillance state, and of The Guardian’s use of volunteers to analyse MP expenses. So ‘the crowd’ is involved in social science through collecting and analysing data, sociology is spanning the academy, and the boundaries of the academy are becoming more porous. These developments create an opportunity to realise a ‘public sociology’ (Burawoy 2005), but this requires greater facilitation from the academy through engaging with diverse stakeholders, provision of tools, new forms of scholarly communication, training and capacity building, and developing more open dialogues on research problems. Rob points to public lab and hackathons as means for people to engage with and do (social) science themselves.