ReCon, Research in the 21st Century: Data, Analytics and Impact
So here we are at ReCon, Research in the 21st Century: Data, Analytics and Impact at the University of Edinburgh’s Business School. I’ll be taking notes here throughout the day but these will be partial and picking up main points of interest to me.
The conference is opening with Jo Young from the Scientific Editing Co giving the welcome and introduction to the event.
The first session is from Scott Edmunds from GigaScience on “Beyond Paper”. Have the 350-year-old practices of academic publishing had their day, now that publishing has become the advertising of scholarship, formulated around academic clickbait? Taken to extremes, we can see the use of bribery around impact factors, writing papers to order, guaranteed publications etc. This has led to an increase in retractions (x15 in the last decade) so that, on that trend, by 2045 as many papers will be retracted as published and then we’re into negative publishing.
We need to think of new systems of incentives and we now have the infrastructure to do this, especially for data publishing such as GigaScience provides.
GigaScience has its own data publishing repository as well as an open access journal with an open and transparent review process. Open data and data publishing are not new: this was how Darwin worked, depositing collections in museums and publishing descriptions of finds before the analysis that led to On the Origin of Species.
Open data has a moral imperative regarding data on natural disasters, disease outbreaks and so forth. Releasing data leads to sharing of data and analysis of that data, for example in the E. coli genome analysis. Traditional academic outputs were created but it is also used as an example of the impact of open data. See the Royal Society report here. The crowd-sourced approach to genome sequencing is being used for, eg, Ebola and for rice genomes to address the global food crisis. But publishing of analysis remains slow and needs to be closer to real-time publishing.
So we’re now interested in executable data, looking at the research cycle of interacting data and analysis leading to publications in the form of micro- and nano-publications that retain DOIs. A lot of this is collected on GitHub.
Also looking at the sharing of workflows using the Galaxy system and, again, giving DOIs to particular workflows (see GigaGalaxy), and at sharing virtual machines (via Amazon).
Analysis of published papers found high rates of errors but also that replication was very costly.
So the call is “death to the publication, long live the research object” to reward replication rather than scholarly advertising.
Question: how is the quality of the data assured?
Journal publications are peer reviewed and the journal does checks using its own data scientists, while open data is not checked. Tools are available and being developed that will help improve this.
Now on to Arfon Smith from GitHub on Predicting the future of Publishing. Looking at open source software communities for ideas that could inform academic publishing. GitHub is a solution to the issues of version control for collaboration using Git technology. People use GitHub for different things: from single files, through to massive software projects involving 7m+ lines of code. There are about 24m projects on GitHub and it is often used by academics.
Will be talking about the publication of software and data rather than papers. Assumptions for the talk are: 1. open is the new normal; 2. the PDF is an increasingly unsatisfactory way of sharing research; and 3. we are unprepared to share data and software in useful ways.
GitHub is especially used in data intensive sciences. There is the argument that we are moving into a new paradigm of science, beyond computational data into data intensive sciences (data abundance) & Big Science.
Big Science requires new tools, ways of working and ways of publishing research. But as we become more data intensive, reproducibility declines under traditional publishing. In the biosciences, many methods are black boxed and so it is difficult to really understand the findings – which is not good!
To help, GitHub have a guide on how to cite code by giving a GitHub repository a DOI (via Zenodo) for academics.
The open source practices that are most applicable are:
1. rapid verification, eg, through verification of pull-requests where the community and 3rd party providers undertake testing or use metrics that check the quality of the code, eg, Code Climate. So verification can and should be automated and open source is “reproducible by necessity”. So in academia we can see the rise of benchmarking services – see for example, Recast or benchmarking algorithm performance.
2. innovation where there are data challenges, by drawing on a culture of reuse around data products to filter out noise in research and enable a focus on the specific phenomena of interest (the noise is eliminated using data from other analyses).
3. Normal citations are not sufficient for software. Academic environments do not reward tool builders. So there is an idea of distributing credit to authors, tools, data and previous papers. This makes the credit tree transparent and comprehensive.
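As a rough illustration of the credit tree idea in point 3, here is a minimal sketch of splitting one unit of credit for a paper across the tools, data and earlier papers it depends on. The `Work` class, the `distribute_credit` function and the rule that each work keeps half its share and passes the rest on equally are all assumptions made for this example; the talk did not specify a scheme.

```python
from typing import Dict, List, Optional


class Work:
    """A paper, tool or dataset plus the works it builds on (hypothetical model)."""

    def __init__(self, name: str, depends_on: Optional[List["Work"]] = None):
        self.name = name
        self.depends_on = depends_on or []


def distribute_credit(
    work: Work,
    share: float = 1.0,
    keep: float = 0.5,  # assumed fraction a work retains; arbitrary choice
    totals: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Walk the dependency tree and return the credit each work ends up with."""
    if totals is None:
        totals = {}
    if not work.depends_on:
        # A leaf has nothing to pass on, so it keeps its whole share.
        totals[work.name] = totals.get(work.name, 0.0) + share
        return totals
    totals[work.name] = totals.get(work.name, 0.0) + share * keep
    passed_on = share * (1.0 - keep) / len(work.depends_on)
    for dependency in work.depends_on:
        distribute_credit(dependency, passed_on, keep, totals)
    return totals


# Example: a paper built on a dataset and a tool, where the tool
# itself builds on an earlier paper.
earlier_paper = Work("earlier_paper")
tool = Work("tool", [earlier_paper])
dataset = Work("dataset")
paper = Work("paper", [tool, dataset])

credit = distribute_credit(paper)
for name, value in credit.items():
    print(f"{name}: {value:.3f}")
```

Because every share is either kept or passed on, the totals always sum to the original unit of credit, which is what makes the tree transparent: anyone can see where each fraction went.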
These innovations depend on the forming of communities around challenges and/or where open data is available.
The open software community have developed a number of solutions for the challenges faced in academic publishing.
Now we’ve moved on to Stephanie Dawson, CEO, ScienceOpen on “The Big Picture: Open Access content aggregators as drivers of impact” – which is framed in terms of information overload, a growth trend that is not going to go away. This is reinforced by the economic advantage in open access of publishing more, along with increased interest in open data, micro-publications etc. At the same time, the science information market is extending to new countries such as India, Brazil & China.
Discovery is largely through search engines, indexing services (Scopus, Web of Science), personal and online networking (conferences, Mendeley) and so on. But these do not rank knowledge or provide reputation, orientation, context or inspiration.
Current tools: journal impact factor but this is a blunt tool that doesn’t work at the individual paper level but is still perceived as important for academics – and for publishers as pricing correlates to impact factor. Article based tools such as usage and dissemination metrics are common.
There is an opportunity for open access to make access to published papers easier, which may undermine publishing paywalls and encourage academics to look to open access channels. But open access publications are about 10% of the total and on a lower growth trajectory. So there need to be further incentives for academics to support open access publications.
ScienceOpen is an open access communication platform with 1.5m open access articles, social networking and collaboration tools. The platform allows commenting on, disseminating, reviewing or ‘liking’ an article. Will develop an approach to enable the ranking of individual articles that can be bundled with others, eg, by platform users, or by publishers [so there is a shift towards alternative and personalised forms of article aggregation that can be shared as collections?].
Question: impact factors can be gamed as can alternative metrics. What is key is the quality of the data used and analysis – metrics for how believable articles are?
We’re looking at how to note the reproducibility of article findings but this isn’t always possible, so edited collections are a way forward.
Q: this issue of trust is not about people but should be about the data and analysis and the transparency of these – how the data came about?
So there is a need to rethink how methods sections are written. We’re also enhancing the transparency of the review process.
The final session in this section is Peter Burnhill, Director, EDINA on “Where data and journal content collide: what does it mean to ‘publish your data’?”. Looking at two case studies:
1. a project on reference rot (link rot + content drift) to develop ways of archiving the web and capturing how sites/URLs have changed over time. Tracked the growth in web citations in academic articles and found 20%+ of URLs are ‘rotten’ and the original pages cited have disappeared, including from open archives. A remedy is to use reference management software to snapshot and archive web pages at the time of capture. The project has developed a Zotero plug-in to do this (see video here).
2. an ongoing project on URL preservation by publishers. There are many smaller publishers that are ‘at risk’ of being lost. Considers data as working capital (that can be private as work-in-progress) or as something to be shared.
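To make the distinction between link rot and content drift in the first case study concrete, here is a minimal sketch of how a cited URL might be classified against a snapshot taken at citation time. The `Snapshot` record, the `fingerprint` and `classify` functions and the use of a whitespace-normalised SHA-256 hash are assumptions for illustration only; this is not how the project or its Zotero plug-in works.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """What was recorded when the URL was cited (hypothetical record format)."""

    url: str
    content_hash: str


def fingerprint(body: str) -> str:
    # Normalise whitespace so trivial reformatting is not counted as drift.
    normalised = " ".join(body.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def classify(snapshot: Snapshot, status: int, body: Optional[str]) -> str:
    """Return 'link rot', 'content drift' or 'intact' for one cited URL.

    `status` and `body` stand in for the result of fetching the URL today;
    the fetch itself is left out so the sketch runs without a network.
    """
    if status >= 400 or body is None:
        return "link rot"  # the page can no longer be retrieved
    if fingerprint(body) != snapshot.content_hash:
        return "content drift"  # the page resolves but says something else
    return "intact"


# Example with a made-up URL and page text.
cited = Snapshot("https://example.org/page", fingerprint("Original text"))
print(classify(cited, 404, None))
print(classify(cited, 200, "Completely different text"))
print(classify(cited, 200, "Original   text"))
```

A real check would need a more forgiving comparison than an exact hash, since pages change adverts and navigation all the time without the cited content changing.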
The idea of open data is not new to science and can be seen in comments on science from the 19th Century.
The web and archiving problematises the issues of fixity and malleability of data.
We’re back following a brief coffee break.
Next up is Steve Wheeler on “The Future is Open: Education in the digital age”. Will be talking about ‘openness’ and what we do with the content and knowledge that we produce and have available. Publishing is about educating our community and so should be as open as possible and freely accessible to better educate that community.
Pedagogy comes first and technologies are the tools: we don’t want technological determinism. You have to have a purpose in subscribing to a tool – technology is not a silver bullet.
“Meet Student 2.0”: has been using digital tools from six months old onwards. Most of our students are younger than Google! and are immersed in the digital. But I don’t follow the digital natives idea but do see merit in the digital residents and visitors concept from White and Le Cornu.
Teachers fearing technology: 1. how to make it work; 2. how to avoid looking like an idiot; 3. they’ll know more than me. For learners the concerns are about access to WiFi and power. Uses the example of the floppy disk, recognised as the save icon but not as a storage device.
Students in lectures with laptops as ‘windows on the world’ to check on and expand on what is being presented to them. But what do these windows do: find information, engage in conversations. Another example is that asking about a text on Twitter leads to a response directly from the author of that text. UNESCO talks about communities of users (2002).
Openness is based on the premise of sharing and becomes more prominent as technology makes sharing possible at scale. Mentions Martin Weller’s Battle for Open and how openness as an idea has ‘won’ but implementation still has a long way to go.
Community is key based on common interest rather than proximity – as communities of practice and of interest. Online, en masse reduces the scope for anonymity and drives towards open scholarship where the academic opens themselves up for constructive criticism. Everything can be collaborative if we want it to be.
Celebration, connection, collaboration and communication all go into User Generated Content (UGC). Defines UGC as having *not* been through peer review but there is peer review through blog comments, Wikipedia, Twitter conversations. Notes Wikipedia as the largest human Rhizomatic structure in the world.
Moving on to CopyLeft and the Creative Commons. Rheingold on networking as a key literacy of the 21st Century in terms of amplifying your content and knowledge.
Communities of Learning and professional learning networks – with a nod to six degrees of separation but thinks it is down to two to three degrees as we can network with people much more easily. Collaborative Open Networks where information is counted as knowledge if it is useful to the community. David Cormier (2007) on Rhizomatic knowledge that has no core or centre and the connections become more important than the knowledge. Knowledge comes out of the processes of working together. This can be contrasted with the closed nature of the LMS/VLE and students will shift as much as possible to their personal learning environments.
Have to mention MOOCs, and the original cMOOCs were very much about opening content on a massive scale and led by students. The xMOOC has closed and boxed the concept, generating accusations of a shallow learning experience.
Open access publishing. Gives the example of two papers of his. One was in an open access journal that underwent open peer review. The original paper, the reviewer comments, the response and the final paper were published – open publishing at its best! But the other paper was in a closed journal and took three years to publish – the open journal took five months. The closed journal paper has 27 citations against 1023 for the open journal paper.
Open publishing amplifies your content, eg, the interactions generated through sharing content on SlideShare. His blog has about 100k readers a month and is another form of publication and all available under Creative Commons.
This is about adaptation to make our research and knowledge more available and more impactful.
Question: how are universities responding to openness?
It depends on the universities’ business model – cites the freemium model with a basic ‘product’ being available for free. In the example of FutureLearn, partner content is given away for free, with either paid-for certification or as a way of enhancing recruitment to mainstream courses.
Now time for lunch
Now back and looking at measuring impact with Euan Adie from Altmetric.
Uses the idea that the impact of research is about making a difference. Impact includes:
quality: rigour, significance, originality, replicability
attention: the right people see it
impact: makes a difference in terms of social, economic, cultural benefits.
REF impact is assessed on quality and impact. A ‘high impact journal’ assumes the journal is of quality and the right people see it (attention).
Impact is increasingly important in research funding across the world. And it is important to look at impact.
Traditional citation counts measure attention – scholars reading scholarship.
Altmetrics manifesto – if we acknowledge that research is available and used online, then we can capture some measures of attention and impact (not quality). This tends to look at non-academic attention through blog posts and comments, Tweets, newspapers; and impact on policy-makers. What this gives is data, but a human has to interpret it and put it into context via narrative.
Anna Clements on the university library at St Andrews University. What are the policy drivers for the focus on data: research assessments, open access requirements (HEFCE, RCUK) and research data management policies (EPSRC, 2015). These require HE to focus on the quality of research data with a view to REF2020, asset exploitation, promotion and reputation and managing research income – as well as student demand/expectations, especially following the increase in fees. So libraries are taking the lead in institutional data science within the context of financial constraints and ROI, and working with academics.
Developing metrics jointly with other HEIs as Snowball Metrics, involving the UK, US and ANZ as well as publishers, and the metrics are open and free to use.
Kaveh Bazargan from River Valley Technologies on “Letting go of 350 years’ legacy – painful but necessary”. The company specialises in typesetting maths-heavy texts but has more recently developed publishing platforms.