
ReCon, Research in the 21st Century: Data, Analytics and Impact

So here we are at ReCon, Research in the 21st Century: Data, Analytics and Impact at the University of Edinburgh’s Business School. I’ll be taking notes here throughout the day, but these will be partial, picking up the main points of interest to me.

The conference is opening with Jo Young from the Scientific Editing Co giving the welcome and introduction to the event.

The first session is from Scott Edmunds of GigaScience on “Beyond Paper”. Have the 350-year-old practices of academic publishing had their day, with the advertising of scholarship formulated around academic clickbait? Taken to extremes, we can see the use of bribery around impact factors, writing papers to order, guaranteed publication and so on. This has led to an increase in retractions (fifteen-fold in the last decade), so that by 2045 as many papers will be retracted as published – and then we’re into negative publishing.
We need to think of new systems of incentives, and we now have the infrastructure to do this, especially data publishing of the kind GigaScience provides.
GigaScience has its own data-publishing repository as well as an open access journal with an open and transparent review process. Open data and data publishing are not new: it was how Darwin worked, depositing collections in museums and publishing descriptions of his finds before the analysis that led to On the Origin of Species.
Open data has a moral imperative regarding data on natural disasters, disease outbreaks and so forth. Releasing data leads to sharing of data and analysis of that data, for example the E. coli genome analysis. Traditional academic outputs were created, but it is also used as an example of the impact of open data. See the Royal Society report here. The crowd-sourced approach to genome sequencing is being used in, eg, Ebola and in rice genomes addressing the global food crisis. But publishing of analysis remains slow and needs to move closer to realtime publishing.
So we’re now interested in executable data, looking at the research cycle of interacting data and analysis leading to publications, including micro and nano publications that retain DOIs. A lot of this is collected on GitHub.
Also looking at the sharing of workflows using the Galaxy system and, again, giving DOIs to particular workflows (see GigaGalaxy), and at sharing virtual machines (via Amazon).
Through analysis of published papers they found high rates of errors, but also that replication was very costly.
So the call is “death to the publication, long live the research object”, to reward replication rather than scholarly advertising.

Question: how is the quality of the data assured?
Journal publications are peer reviewed, and checks are done using GigaScience’s own data scientists, while open data is not checked. Tools are available, and being developed, that will help improve this.

Now on to Arfon Smith from GitHub on predicting the future of publishing, looking at open source software communities for ideas that could inform academic publishing. GitHub is a solution to the issues of version control for collaboration, using Git technology. People use GitHub for different things: from single files through to massive software projects involving 7m+ lines of code. There are about 24m projects on GitHub and it is often used by academics.
He will be talking about the publication of software and data rather than papers. Assumptions for the talk are: 1. open is the new normal; 2. the PDF is an increasingly unsatisfactory way of sharing research; and 3. we are unprepared to share data and software in useful ways.
GitHub is especially being used in the data-intensive sciences. There is the argument that we are moving into a new paradigm of science, beyond computational science into data-intensive science (data abundance) and Big Science.
Big Science requires new tools, ways of working and ways of publishing research. But as we become more data intensive, reproducibility declines under traditional publishing. In the biosciences, many methods are black-boxed and so it is difficult to really understand the findings – which is not good!
To help, GitHub has a guide for academics on how to cite code by giving a GitHub repository a DOI (via Zenodo).
The open source practices that are most applicable are:
1. rapid verification, eg, through verification of pull requests where the community and third-party providers undertake testing, or through metrics that check the quality of the code, eg, Code Climate. So verification can and should be automated, and open source is “reproducible by necessity”. So in academia we can see the rise of benchmarking services – see, for example, Recast or the benchmarking of algorithm performance (a minimal sketch of such an automated check appears below).
2. innovation where there are data challenges, by drawing on a culture of reuse around data products to filter out noise in research and enable focus on the specific phenomena of interest (by eliminating data accounted for by other analyses)
3. normal citations are not sufficient for software, and academic environments do not reward tool builders. So there is an idea of distributing credit to authors, tools, data and previous papers, which makes the credit tree transparent and comprehensive.
These innovations depend on the forming of communities around challenges and/or where open data is available.
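To make point 1 concrete, here is a minimal sketch (in Python, with entirely hypothetical script and file names) of the kind of automated reproducibility check a continuous-integration service could run against every pull request to an analysis repository:

```python
# Hypothetical CI check: re-run a small analysis script and compare its
# output against a committed reference result. Paths and the tolerance
# are illustrative, not taken from any real project.
import json
import subprocess


def test_analysis_reproduces_reference():
    # Re-run the analysis exactly as a reviewer or CI job would.
    subprocess.run(["python", "analysis/run_analysis.py",
                    "--out", "results/current.json"], check=True)

    with open("results/current.json") as f:
        current = json.load(f)
    with open("results/reference.json") as f:
        reference = json.load(f)

    # Fail the pull request if any reported statistic drifts beyond tolerance.
    for key, expected in reference.items():
        assert abs(current[key] - expected) < 1e-6, f"{key} no longer reproduces"
```

Run under a test runner such as pytest, a failing check blocks the merge, which is what makes the verification “rapid” rather than something left to post-publication replication.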
The open source software community has developed a number of solutions to the challenges faced in academic publishing.

Now we’ve moved on to Stephanie Dawson, CEO of ScienceOpen, on “The Big Picture: Open Access content aggregators as drivers of impact” – which is framed in terms of information overload, a growth trend that is not going away. This is reinforced by the economic advantage in open access of publishing more, along with increased interest in open data, micro-publications etc. At the same time, the science information market is extending to new countries such as India, Brazil and China.
Discovery is largely through search engines, indexing services (Scopus, Web of Science), personal and online networking (conferences, Mendeley) and so on. But these do not rank knowledge in ways that provide reputation, orientation, context and inspiration.
Current tools: the journal impact factor, which is a blunt tool that doesn’t work at the individual paper level but is still perceived as important by academics – and by publishers, as pricing correlates with impact factor. Article-based tools such as usage and dissemination metrics are common.
There is an opportunity for open access to make access to published papers easier, which may undermine publishing paywalls and encourage academics to look to open access channels. But open access publications are about 10% of the total and on a lower growth trajectory. So further incentives are needed for academics to support open access publication.
ScienceOpen is an open access communication platform with 1.5m open access articles plus social networking and collaboration tools. The platform allows commenting on, disseminating, reviewing or ‘liking’ an article. It will develop an approach to enable the ranking of individual articles, which can be bundled with others, eg, by platform users or by publishers [so there is a shift towards alternative and personalised forms of article aggregation that can be shared as collections?].

Question: impact factors can be gamed, as can alternative metrics. What is key is the quality of the data used and the analysis – are there metrics for how believable articles are?

We’re looking at how to note the reproducibility of article findings, but this isn’t always possible, so edited collections are a way forward.

Q: the issue of trust is not about people but should be about the data and analysis and the transparency of these – how the data came about?

So there is a need to rethink how methods sections are written. We’re also enhancing the transparency of the review process.

The final session in this section is Peter Burnhill, Director of EDINA, on “Where data and journal content collide: what does it mean to ‘publish your data’?”, looking at two case studies:
1. a project on reference rot (link rot plus content drift) to develop ways of archiving the web and capturing how sites/URLs have changed over time. It tracked the growth in web citations in academic articles and found that 20%+ of URLs are ‘rotten’ and the original pages cited have disappeared, including from open archives. A remedy is to use reference management software to snapshot and archive web pages at the time of citation; the project has developed a Zotero plug-in to do this (see video here, and the sketch after these case studies).
2. an ongoing project on URL preservation by publishers. There are many smaller publishers that are ‘at risk’ of being lost. Considers data as working capital (which can be private as work-in-progress) or as something to be shared.
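On the reference-rot remedy above: here is a minimal sketch of the general approach of snapshotting a cited URL at the time of citation, assuming the Internet Archive’s public “Save Page Now” endpoint. This is an illustration of the idea, not the Zotero plug-in’s actual implementation.

```python
# Snapshot a cited URL and record when it was captured, using the
# Internet Archive's "Save Page Now" endpoint. Illustrative only.
import datetime
import requests


def archive_citation(url: str) -> dict:
    """Request a web archive snapshot of `url` and note when it was taken."""
    response = requests.get(f"https://web.archive.org/save/{url}", timeout=60)
    response.raise_for_status()
    return {
        "original_url": url,
        "archived_url": response.url,   # the endpoint redirects to the snapshot
        "captured_at": datetime.datetime.utcnow().isoformat() + "Z",
    }


if __name__ == "__main__":
    print(archive_citation("https://example.org/cited-page"))
```

A reference manager plug-in would do something similar at the moment a citation is added, storing the archived URL alongside the original one.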
The idea of open data is not new to science and can be seen in comments on science from the 19th Century.
The web and web archiving problematise the issues of fixity and malleability of data.
__________________________________________

We’re back following a brief coffee break.

Next up is Steve Wheeler on “The Future is Open: Education in the digital age”. He will be talking about ‘openness’ and what we do with the content and knowledge that we produce and have available. Publishing is about educating our community and so should be as open as possible and freely accessible, to better educate that community.
Pedagogy comes first and technology provides the tools: we don’t want technological determinism. You have to have a purpose in subscribing to a tool – technology is not a silver bullet.
“Meet Student 2.0”: has been using digital tools from six months old onwards. Most of our students are younger than Google! and are immersed in the digital. But I don’t follow the digital natives idea, though I do see merit in the digital residents and visitors concept from White and Le Cornu.
Teachers fearing technology: 1. how to make it work; 2. how to avoid looking like an idiot; 3. they’ll know more than me. For learners, the concerns are about access to WiFi and power. Uses the example of the floppy disk being recognised as the save icon but not as a storage device.
Students in lectures use laptops as ‘windows on the world’ to check on and expand on what is being presented to them. But what do these windows do? Find information, engage in conversations. Another example is asking about a text on Twitter and getting a response directly from the author of that text. UNESCO talks about communities of users (2002).
Openness is based on the premise of sharing and becomes more prominent as technology makes sharing possible at scale. Mentions Martin Weller’s Battle for Open and how openness as an idea has ‘won’, although the implementation still has a long way to go.
Community is key, based on common interest rather than proximity – communities of practice and of interest. Being online, en masse, reduces the scope for anonymity and drives towards open scholarship, where academics open themselves up for constructive criticism. Everything can be collaborative if we want it to be.
Celebration, connection, collaboration and communication all go into User Generated Content (UGC). Defines UGC as having *not* been through peer review, though there is a form of peer review through blog comments, Wikipedia, Twitter conversations. Notes Wikipedia as the largest human rhizomatic structure in the world.
Moving on to copyleft and the Creative Commons. Rheingold on networking as a key literacy of the 21st century, in terms of amplifying your content and knowledge.
Communities of learning and professional learning networks – with a nod to six degrees of separation, though he thinks it is down to two to three degrees as we can network with people much more easily. Collaborative open networks where information counts as knowledge if it is useful to the community. David Cormier (2007) on rhizomatic knowledge that has no core or centre, where the connections become more important than the knowledge itself; knowledge comes out of the processes of working together. This can be contrasted with the closed nature of the LMS/VLE, and students will shift as much as possible to their personal learning environments.
Have to mention MOOCs: the original cMOOCs were very much about opening content on a massive scale and were led by students. The xMOOC has closed and boxed the concept, generating accusations of a shallow learning experience.
Open access publishing. Gives the example of two of his papers, one in an open access journal that underwent open peer review. The original paper, the reviewer comments, the response and the final paper were all published – open publishing at its best! The other paper went to a closed journal and took three years to publish, where the open journal took five months. The closed-journal paper has 27 citations against 1,023 for the open journal.
Open publishing amplifies your content, eg, the interactions generated through sharing content on SlideShare. His blog has about 100k readers a month; it is another form of publication and all available under Creative Commons.
This is about adaptation to make our research and knowledge more available and more impactful.

Question: how are universities responding to openness?
It depends on the university’s business model – cites the freemium model, with a basic ‘product’ being available for free. In the example of FutureLearn, partner content is given away for free, with either paid-for certification or enhanced recruitment to mainstream courses as the return.

Now time for lunch
______________________________________________________

Now back and looking at measuring impact with Euan Adie from Altmetric.
Using the idea that the impact of research is about making a difference. This covers:
quality: rigour, significance, originality, replicability;
attention: the right people see it;
impact: it makes a difference in terms of social, economic and cultural benefits.

The REF assesses research on quality and impact. A ‘high impact journal’ assumes the journal is of quality and that the right people see it (attention).

Impact is increasingly important in research funding across the world. And it is important to look at impact.

Traditional citation counts measure attention – scholars reading scholarship.

Altmetrics manifesto – acknowledging that research is available and used online, we can capture some measures of attention and impact (not quality). This tends to look at non-academic attention through blog posts and comments, tweets and newspapers, and at impact on policy-makers. But what this gives is data: a human still has to interpret it and put it into context via narrative.
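As a toy illustration of that point – counts of online attention are easy to produce, but the narrative still has to be built by a person – here is a small Python sketch with entirely made-up sources and URLs:

```python
# Count mentions of a paper across non-academic sources while keeping
# enough context for a human to interpret. All entries are invented.
from collections import Counter

mentions = [
    {"source": "blog",      "url": "https://example.org/post-1"},
    {"source": "twitter",   "url": "https://twitter.com/example/status/1"},
    {"source": "twitter",   "url": "https://twitter.com/example/status/2"},
    {"source": "newspaper", "url": "https://news.example.com/story"},
    {"source": "policy",    "url": "https://gov.example/consultation.pdf"},
]

counts = Counter(m["source"] for m in mentions)
print(dict(counts))   # {'twitter': 2, 'blog': 1, 'newspaper': 1, 'policy': 1}

# The counts say nothing about quality: someone still has to read the
# underlying posts and explain who is paying attention and why.
for m in mentions:
    print(f"{m['source']:>9}: {m['url']}")
```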

Anna Clements on the university library at the University of St Andrews. What are the policy drivers for the focus on data? Research assessments, open access requirements (HEFCE, RCUK) and research data management policies (EPSRC, 2015). These require HE to focus on the quality of research data with a view to REF2020, asset exploitation, promotion and reputation, and managing research income – as well as student demand/expectations, especially following the increase in fees. So libraries are taking the lead in institutional data science, within the context of financial constraints and ROI, and working with academics.
Developing metrics jointly with other HEIs as Snowball Metrics, involving the UK, US and ANZ as well as publishers; the metrics are open and free to use.

Kaveh Bazargan from River Valley Technologies on “Letting go of 350 years’ legacy – painful but necessary”. The company specialises in typesetting maths-heavy texts but has more recently developed publishing platforms.

Digital Scholarship day of ideas: data

Live notes from the day.

Starting the day with Dorothy Miell, Head of CHSS, introducing the third annual day of ideas as a forum for those interested in digital scholarship across the University and the College. Today has a mixture of internal and external speakers. She also mentions the other Digital HSS activities, including the website and the other events listed there.
Today’s focus is on data as a contested but popular term. What does it mean for HSS, what traction does it have in the humanities, what currency does big data have for the humanities, and what are the implications of the computational turn for digital scholarship?
The event is being streamed on the website and the presentations will be posted there later.

Sian Bayne introducing Annette Markham, a theorist of the internet currently at Aarhus University. Her focus is on ethnographic research and the ethics of online research. She also has a good line in paper titles.

Annette Markham asking “‘Data’: what does that mean anyway?”. For the last five years or so she has been pushing at thinking about method in order to better represent the complexities of 21st-century life. She works with STS scholars, informaticians, ethnographers, social scientists, linguists, machine learning scholars etc. The presentation is based on a series of workshops published in a First Monday special issue, October 2013.
Annette argues that we need to be careful about using the term data, as it assumes we all mean the same thing. She takes a post-humanist, or at least non-positivist, stance. It is our responsibility to critique the word “data”. For other researchers, data and big data are terms that seem unproblematic.

Annette is providing an overview of the debates on data and a provocation to start the day. Asking what method means for our forms of inquiry requires ‘method’ to be looked at sideways, and from above and below, to take account of the epistemological and political conditions for inquiry. Such conditions include funding constraints and demands around, for example, developing evidence bases and requirements for the archiving of data. But the latter is problematic in terms of capturing and tracing ethnographic research and ‘data’. We can also look below ‘method’ at the practices of inquiry that involve the gathering and analysing of data, as well as the practices of “writing up”.
The notion of framing inquiry (Goffman) involves drawing attention to some things and excluding others – those outside the frame. Changing the frame changes the focus of inquiry and the perspective on the phenomenon. Different images such as frames, a globe/sphere, or a cluster of connected nodes (a sociogram or DNA) are used to critique the notion of a ‘frame’. A frame guides our view of the world but is often invisible until it is disrupted. So it is important to make the frame visible.
The term data acts as a frame but is highly ambiguous, yet it is often perceived as being universally understood, ie, it is not visible as a framing mechanism.
How are our research sensibilities being framed? To understand the question, we need to ask how we are framing culture, objects and processes of analysis, and how we frame legitimate inquiry. Culture is framed in internet studies through the changes due to the internet, as a networked culture, but also through our understanding of the internet as embodied informational spaces. Interfaces developed from an interest in architecturalised spaces towards standardised interfaces and then simplification, as represented by Google; this is linked to the rise of commercial interests in the internet. The framing of objects and processes of inquiry has not changed much, and not changed sufficiently. Inquiry involves entangled processes of social interaction online, yet methods remain largely based on 19th-century practices. Research models are generally based on linear (deductive) processes, which acts to value linear research over the messy and complex; we are still expected to draw conclusions, for example. The framing of legitimate inquiry has gone backwards, from the feminist work on situated knowledge and practice in the 1960s towards evidence- and solutions-based practices.
So what is data? An easy term to toss around to cover a lot of stuff. It is a vague yet arguably powerful rhetorical term, shaping what we see and think. The term comes from 18th-century science and was popularised via translations of scientific works. As a term, ‘data’ was used as preceding an argument or analysis, so data is unarguable and pre-existing – it has an “itness”. Data cannot be falsified. Data as a term refers to what a researcher seeks and needs for inquiry. Yet there are alternative sociological approaches involving the collection of ‘material’ to construct ‘data’ through practices of interpretation – a very different meaning from ‘data’ as more widely used.
Refers to boyd and Crawford’s 2011 provocations on big data and Baym’s 2013 work arguing that all social metrics are partial and non-representative, and that there is ambiguity involved in decontextualising material from its context.
Technology is now pervasive in everyday life, as represented in a Galaxy S II advert. Experiences are flattened and equalised with everything else, and then flattened again as informational bits that can be diffused and shared through technology.
The humans-as-data argument. She has nothing against data and computational analysis: such analysis is important and powerful. But she wants to critique the idea that data speaks for itself and that human interaction with technology produces just data. Not all human experience is reducible to data points. Data is never raw; it is always filtered and framed. Data operates in a larger framework of inquiry, and other frameworks of inquiry exist that do not focus on data – inquiry that focuses instead on the analysis of phenomena and on playing around with understandings of those phenomena (not data).
Data functions powerfully as a term and acts as a frame on inquiry, and this should be subject to critique. Inquiry can and should be playful and generative in its entanglements with ‘the world’.

Q: what is the alternative to data? What is human experience reducible to?
A: that’s not the key question. We don’t want to think in terms of reduction, which is how data generally frames inquiry.

 

This talk was followed by a fascinating use of crowd-sourced data coding by Prof Ken Benoit. This included completing an analysis of a UKIP manifesto during the course of the talk via CrowdFlower.

Digital Scholarship: day of ideas 2

I’m listening now to Tara McPherson on humanities research in a networked world as the opening session of the Digital Scholarship day of ideas. (I’ve started late due to a change in the start time.)

Discussing how large data sets can be presented in a variety of interfaces: for schools, researchers, publishers; and we are only now beginning to realise the variety of modes of presenting information across all discipline areas. But humanities scholars are not trained in tool building; they should engage in it, drawing on their historic work on text, embodiment etc., and she points to working with artists on such interpretive tool building – see Mukurtu, an archive platform designed by an anthropologist based on work with indigenous people in Australia. The tools allow indigenous people to control access to knowledge according to their knowledge exchange protocols.

Open Ended Group create immersive 3D spaces that are designed not to be realistic but engaging, more usually found in an experimental art gallery. Also showing an example of a project of audio recordings of interviews with drug users at needle exchanges.

Vectors is a journal examining these sorts of interactive and immersive experiences and research. It involves ‘papers’ that interact, mutate and change, which challenges the notion of scholarship as stable. Interactive experiences are developed in collaboration with scholars in a long iterative process that is not particularly scalable.

The development of a tool-building process was a reaction to problematising interaction with data-sets. Example of HyperCities extending Google Maps across space and time.

The Alliance for Networking Visual Culture includes universities and publishers working together to reconsider the scales of scholarship and to use material from visual archives. The process starts with the development of prototypes. Scalar emerged from the Vectors work as a publishing platform for scholars using visual materials. It allows scholars to explore multiple views of visual materials linked to archives and associated online materials linked to Critical Commons (under US ‘fair use’, allowing legal use of commercial material). Scalar allows a high level of interactivity with the material of (virtual) books and learning materials.

Aim to expand the process of scholarly production and to rethink education. For example, USC has a new PhD programme in media studies in which PhD students make (rather than write) a dissertation – see Take Action Games as an example.

Thinking about scholarly practice in an era of big data and archives: valuing openness; thinking of users as co-creators; assuming multiple front-ends/interfaces; scaling scholarship from micro to macro; learning from experimental and artistic practices; engaging designers and information architects; valuing and rewarding collaboration across skill sets.

Scalar treats all items in a data-set as being at the same ‘level’, affording alternative and different ways of examining and interacting with the data.

The USC School of Cinematic Arts has a long history of using multimedia in assessment practices and of developing criteria for it. It has also developed guidance on the evaluation of digital scholarship for appointment and tenure. The key issue here has been dealing with attribution in collaborative production.

…………..

Now moved on to the next sessions of the day with Jeremy Knox, who is researching open education, questioning the current calls for restructuring higher education around autonomous learning and developing a critique of the open education movement. He is discussing data collection on MOOCs in terms of:

  • Space
  • Objectives of education
  • Bodies and how the human body might be involved in online education

Starts with discussing what a MOOC is: free, delivered online and massive. Delivered via universities on platforms provided by the main players such as Udacity, Coursera and edX.

Most MOOCs involve video lectures and quizzes supported by a discussion forum, and are assessed through an automatic process (often multiple-choice quizzes) due to the number of students.

Data collection in MOOCs is an example of big data in education, allowing learning analytics to optimise the educational experience, including through personalisation.

Data is collected specifically from the MOOC platforms. edX claims to use data to inform both its MOOC delivery and the development of campus-based programmes at MIT.

Space – where is the MOOC? The edX website includes images of campus students congregating around the bricks and mortar of the university. Coursera makes use of many images of physical campus buildings. There are also many images of where students are from, through images of the globe – see here.

The metaphor of the space of the MOOC is both local and global.

He taught on one of the six MOOCs delivered by the University of Edinburgh. Students often used visual metaphors of space in their experience of the MOOC – network spaces, flows and spaces of confusion. The space metaphor is also used by instructors in delivering MOOCs, such as in video tours of spaces. The instructors seek to project the campus building as the ‘space of the MOOC’, and this impacts on the student experience of the MOOC. The buildings may have agency.

What else might have agency in the experience of education? For example, the book as a key ‘tool’ of education. Developed an RFID system so that tagged books send a tweet with a random sentence from the book when placed on a book-stand/sensor, as a playful way of collecting data. So Twitter streams include tweets from students/people and from books.
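A rough sketch of that “tweeting books” idea is below: when a tagged book is placed on the reader, pick a random sentence from its text and tweet it. The RFID-reading hook, the tag IDs and the file layout are all hypothetical, and the tweet is posted with the tweepy library (credentials omitted); this is not the project’s actual code.

```python
# Hypothetical sketch: an RFID-tagged book on the book-stand triggers a
# tweet containing a random sentence from that book.
import random
import tweepy

BOOKS = {
    "04A2B9C1": "texts/frankenstein.txt",   # tag ID -> plain-text copy of the book
    "04A2B9C2": "texts/middlemarch.txt",
}


def random_sentence(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        sentences = [s.strip() for s in f.read().split(".") if s.strip()]
    return random.choice(sentences) + "."


def on_tag_detected(tag_id: str, api: tweepy.API) -> None:
    """Called whenever the (hypothetical) book-stand sensor reads a tag."""
    path = BOOKS.get(tag_id)
    if path:
        api.update_status(random_sentence(path)[:280])   # stay within tweet limit
```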

Another example is YouTube’s recommended videos, which recontextualise a video among other videos as a mesh of videos and algorithms.

The body in the MOOC? It is taken into account through Signature Track, which uses the body to track the individual student. Now showing a Kinect sensor used to analyse how body position changes interaction with a MOOC course, which allows the body to intervene in and impact on the course space.

How can the body of the teacher be other than the body of the external gaze?

……….

Now moving to a Skype session with Sophia Lycouris, Reader in Digital Choreography at Edinburgh College of Art, who is working on research into using haptic technologies to enable people with impaired sight to experience live dance performance – see here. A prototype has been developed to allow users to experience some movements of the dance through vibrations. Again, this uses a Kinect.

The project explores the relationship between the arts and humanities and innovations in digital technology as trans-disciplinary, alongside accessing and experiencing forms of performing arts. In particular, she is interested in how technologies change the practice itself and how arts practice can drive technological change (not just respond to it).

The Kinect senses movement, which is transformed into vibrations in a pad held by the participant.
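A much-simplified sketch of that mapping: take the frame-to-frame movement of tracked Kinect joints and turn it into a single vibration intensity for the haptic pad. The Kinect and pad interfaces here are hypothetical stand-ins, not the project’s actual code.

```python
# Map total joint displacement between two Kinect frames to one
# vibration intensity for a haptic pad (illustrative only).
from typing import Dict, Tuple

Joint = Tuple[float, float, float]   # x, y, z position of one tracked joint


def vibration_intensity(prev: Dict[str, Joint], curr: Dict[str, Joint]) -> float:
    """Map total joint displacement between two frames to a 0.0-1.0 intensity."""
    total = 0.0
    for name, (x, y, z) in curr.items():
        px, py, pz = prev.get(name, (x, y, z))
        total += ((x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2) ** 0.5
    return min(total / 2.0, 1.0)     # scale factor chosen arbitrarily


def send_to_pad(intensity: float) -> None:
    # Placeholder for whatever protocol the vibration pad actually uses.
    print(f"pad intensity: {intensity:.2f}")
```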

Discussing some problems, as Microsoft is now limiting the code changes needed for the project.

The device does not translate the dance but does provide an alternative experience equivalent to seeing it. The haptic device becomes a performance space in its own right that is not necessarily similar to a visual experience. So the visual landscape of a performance becomes a haptic landscape to be explored by the wandering fingers of blind users.

The project is part of a number of projects around the world looking at kinesthetic empathy.

Question on what models are being used to investigate the intersection of the human and the digital. Sophia focuses on using the technology as a choreographic medium, moving away from the dancing body. Jeremy’s research is underpinned by theories of post-humanism that decentre the human: socio-materialism, Actor Network Theory and spatial theory.

…………

Now on to Mariza Dima on design-led knowledge exchange with the creative industries in the Moving Targets project, focusing today on the methodological approach to knowledge exchange.

Moving Targets is a three-year project funded by the SFC for the creative industries in Scotland, including sector intermediaries and universities, to involve audiences in collaboration and co-design. Interdisciplinary research team including design, games and management. The project targets SMEs as well as working with BBC Scotland.

Knowledge exchange as an alternative to the transfer model: the exchange model emphasises interaction between all participants to develop new knowledge and experiences. Used design as a methodological approach in the co-design of problem identification and problem-solving.

Used experiential design, which is design as experience – the designer is not an expert but supports collaboration; it is transdisciplinary; and experience and knowledge are closely related and interactional, working in a context of complexity.

Process stages of research, design and innovation, with innovation tending towards incremental improvement that returns to research. Knowledge is developed as a concept through research and as an experience through design and innovation. Phases:

Research involves secondments into companies as immersion: researching areas for improvement, gaining and sharing knowledge and undertaking tasks/activities. Example of working with CulturalSparks on community consultation related to the cultural programme of the Commonwealth Games 2014. Research workshops were also held on a quarterly basis.

Design of interventions with companies and audiences using an e-business voucher scheme. Ran a number of prototyping projects, including looking at pre-consumption theatre audience engagement.

Innovation based on two streams: (a) application of knowledge within the company and (b) identifying transferable knowledge. They have developed new processes, digital tools and products with the aim of creating longer-term impact through process improvements and tacit understandings by both the companies and the universities/intermediaries.

Experience of the clients was very variable. Agencies were much more receptive to working with higher education, while micro-enterprises were more cautious as they have limited resources. So with such companies the project took a more business-like approach focused on outcomes, and has had a positive impact.

The focus of the project is on supporting creative industries companies to engage with rapid changes in audiences driven by technological change.

 

Now on to looking at invisible work in software development, data curatorship and invisible data consumption in industry, government and research. The research framework is based on the social shaping of technology, infrastructure studies and the sociology of business knowledge.

Focused on climate science due to the importance of the interface between data and modelling projections through software, and also on modelling data in manufacturing. In manufacturing there is a question of generic software vs localisation via specific vagueness, where metadata is under-emphasised and under-developed. Sharing data in government, meanwhile, involved a more specific focus on curation of data and on sharing data without affecting data ownership. Discourse on disintermediation tends to downplay the costs of co-ordination, particularly in respect of trust relations.

Data consumption is linked to issues in data visualisation, which aggregates and simplifies data presentation and encourages careless consumption of data. Consumers have a preference for simplified visualisations such as the two-by-two matrix to aid prioritisation. Such matrices become the shared language for users and the market, or are amended into different simplified visualisations such as waves or landscapes.
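For readers unfamiliar with the format, here is a minimal sketch of the kind of two-by-two matrix being described: items plotted on two scored dimensions with the plane divided into quadrants. The items, dimensions and scores are invented for illustration.

```python
# Plot invented items on two scored dimensions and divide the plane
# into the four quadrants of a classic two-by-two prioritisation matrix.
import matplotlib.pyplot as plt

items = {                # (ease of adoption, expected benefit), both 0-10
    "Tool A": (8, 7),
    "Tool B": (3, 9),
    "Tool C": (6, 2),
    "Tool D": (2, 3),
}

fig, ax = plt.subplots()
for name, (x, y) in items.items():
    ax.scatter(x, y)
    ax.annotate(name, (x, y), xytext=(5, 5), textcoords="offset points")

ax.axvline(5, color="grey", linewidth=1)   # quadrant boundaries
ax.axhline(5, color="grey", linewidth=1)
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.set_xlabel("Ease of adoption")
ax.set_ylabel("Expected benefit")
plt.show()
```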

The specific vagueness of the software ontologies makes comparability of the data across platforms and contexts impossible.

The study on ERP involved videoed observation; situational analysis was used in the study on government software to generate grounded data analysis; and the study on data visualisation involved direct interviews with providers and users of data.

Ontologies were discovered to be useless – a life-changing discovery!