All listed times are in British Summer Time.
Programme subject to change.
Wednesday April 28, 2021
Opening Keynote Address
Ingrid Bauer, TU Wien: “Next Generation Metrics – A White Paper by CESAER”
CESAER released a White Paper on “Next generation metrics” in June 2020, including a set of varied and in some cases progressive metrics. It arose from the genuine interest of the CESAER Members to stay at the forefront of science, education and innovation, to benchmark over time in order to pursue institutional development paths and, ultimately, to optimise their contributions to society and the world. The findings, recommendations and indicators presented are neither conclusive nor exhaustive, but are based on the excellence, expertise and best practices of CESAER Members and build on the longstanding and extensive work of the Task Force Benchmark. The White Paper will be introduced and discussed in this session.
Instructor: Dr. A. Cecile J.W. Janssens, Emory University
CoCites is a new method for searching the scientific literature through citations rather than keywords. The method is comparable with the “related articles” feature in PubMed and other databases, but more effective. CoCites co-citation search ranks articles that are frequently cited together with an article of interest. Related articles can be combined into a query set to find articles that are frequently (1) cited together with any of the articles in the set (the co-citation search), or (2) cited by or citing articles from the set, to find recent articles (the citation search). By virtue of its design, the method is ideal for literature searches in systematic and rapid reviews and meta-analyses, and for updating reviews. CoCites is embedded in PubMed and Google Scholar, and exports to EndNote, Mendeley, Zotero and others.
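The co-citation idea at the heart of this approach can be illustrated with a small sketch (hypothetical data and simplified logic, not CoCites’ actual implementation): given the reference lists of a set of citing papers, count how often each other paper appears in the same reference list as a paper of interest.

```python
from collections import Counter

def co_citation_counts(target, reference_lists):
    """Count how often other papers appear alongside `target`
    in the same reference lists (i.e. are co-cited with it)."""
    counts = Counter()
    for refs in reference_lists:
        if target in refs:
            counts.update(r for r in refs if r != target)
    return counts

# Hypothetical reference lists of four citing papers.
reference_lists = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "D"],
    ["B", "C"],
]

# Papers most frequently co-cited with "A", ranked by count.
print(co_citation_counts("A", reference_lists).most_common())
```

Papers that are repeatedly listed in the same bibliographies as the paper of interest rise to the top, which is the signal a co-citation search ranks by.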
Shiobhan Smith: “Where AI and IQ meet: can researchers and bibliometricians trust classifications of citations by algorithms? – a case study”
Two recent tools, Scite.ai and Semantic Scholar, have been developed to support the contextualisation of citations using artificial intelligence (AI) and machine learning. Scite.ai classifies citations into three categories: supporting, disputing, and mentioning. Semantic Scholar identifies citations according to citation intent and importance, culminating in the identification of Highly Influential Citations. These tools provide immediate contextualisation of a publication’s citations in bulk and have great potential to improve literature reviewing practices. Contextualising citations in this way also supports more nuanced assessment and reporting of impact using citation data, and serves as a line of defence against unsuspecting researchers citing retracted papers. However, bibliometricians are expressing curiosity in, and perhaps some concern about, the accuracy of the AI classifications and the mechanisms used to correct misclassifications.
As part of a project to celebrate the ten-year anniversary of his article “A critique of current practice: Ten foundational guidelines for autoethnographers”, and to consider its impact on autoethnographic research, the University of Otago’s Associate Professor Martin Tolich created a dataset of all citations to this publication. This dataset was subsequently assessed by Martin Tolich to form the basis of a follow-up article. Serendipitously, Library Research Support Unit Manager Shiobhan Smith became privy to the project and its dataset and suggested a collaboration with Martin to compare his citation classifications to the AI-generated classifications in Semantic Scholar and Scite.ai.
This presentation introduces the methodology used, discusses preliminary findings, and considers what this case study reveals about the strengths and limitations of citation classification by both humans and machines. The aim is to contribute to answering the question “how much should researchers and bibliometricians trust classifications of citations by algorithms?”.
Tolich, M. (2010). A critique of current practice: Ten foundational guidelines for autoethnographers. Qualitative Health Research, 20 (12), 1599–1610. https://doi.org/10.1177/1049732310376076
Barrie Hayes, Adam Dodd & Michelle Cawley: “Using Machine Learning Methods to Refine a (Health Equity) Publication Set for Bibliometric Analysis”
The Impact Measurement and Visualization (IMV) Team at the University of North Carolina at Chapel Hill (UNC-Chapel Hill) Health Sciences Library (HSL) has partnered with the UNC-Chapel Hill Gillings School of Global Public Health (SPH) since its formation in 2017. The IMV team and SPH administration are currently collaborating on multiple projects to highlight SPH research impact, reveal collaboration patterns, and illustrate the evolution of research foci over time. Analyses are centered on publications from SPH researchers in key areas such as COVID-19, infectious diseases, and health equity. For the analyses around health equity, we conducted a search using author names and health equity keywords that returned approximately 8200 studies. Our objective was to use machine learning to reduce the proportion of false positives in the dataset used for bibliometric analyses. Specifically, the team used supervised clustering, a type of semi-supervised learning, to identify publications in the full data set likely to be not relevant. Our approach is based on one commonly used to identify the studies most likely to be relevant in a comprehensive literature search; given that most of our results were relevant to the topic, we inverted it to identify the results most likely to be not relevant. A random set of results from the full corpus was screened to create training data for the machine learning model: SPH domain experts manually reviewed 250 studies and identified 95 as not relevant to health equity, indicating a 38% false positive rate. These studies were then used to identify other studies in the total corpus most likely to be not relevant. This project is ongoing and updated results will be shared in this presentation. Using machine learning shows promise for efficiently improving the overall quality and precision of input datasets for bibliometric analysis.
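As an illustration only (a toy sketch with hypothetical data, not the IMV team’s actual model or pipeline), the general idea of flagging likely-irrelevant results can be shown as a similarity ranking against the manually screened not-relevant studies:

```python
def tokens(text):
    """Lowercased word set: a crude bag-of-words representation."""
    return set(text.lower().split())

def similarity(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_likely_irrelevant(corpus, not_relevant_seed):
    """Rank unlabeled titles by their maximum similarity to any
    manually screened not-relevant study (higher = more suspect)."""
    seeds = [tokens(t) for t in not_relevant_seed]
    scored = [(max(similarity(tokens(t), s) for s in seeds), t) for t in corpus]
    return sorted(scored, reverse=True)

# Hypothetical screened-out studies and unlabeled search results.
not_relevant_seed = ["protein folding simulation methods"]
corpus = [
    "health equity in rural communities",
    "simulation methods for protein structure",
]

for score, title in rank_likely_irrelevant(corpus, not_relevant_seed):
    print(f"{score:.2f}  {title}")
```

A real workflow would use richer features and a trained classifier, but the principle is the same: results that resemble the known false positives are surfaced for exclusion before bibliometric analysis.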
William Mischo & Mary Schlembach: “Research Impact Indicator Visualizations and Analysis”
University libraries are partnering with campus administrators, departments, and research groups to provide bibliometric and research impact services and metrics. Research impact data is being used as an assessment tool for measuring faculty and research group productivity; in promotion and tenure processes; in grant and funding applications; and in program and institution reviews and rankings. Libraries and librarians have the tools and expertise to assist in the gathering, organization, and visualization of research impact indicators and measures. The introduction of various bibliometric and database tools and APIs has provided libraries with the ability to responsibly extract a variety of research impact measures, including publication, citation, co-authorship, H-index, and journal impact metrics, as well as awards and grant and patent activity. The University of Illinois at Urbana-Champaign Library has developed a system for generating dynamic and interactive visualizations and dashboards for displaying research impact data at the individual faculty, department, and research group level. This presentation will describe the elements of the Illinois research visualization system and look at the relationships between the various available research indicators, including providing correlation analyses.
Samuel Hansen: “Using CADRE to support Large Scale Bibliometric Studies”
In my presentation I will discuss how I have used the Collaborative Archive & Data Research Environment (CADRE) platform in large-scale bibliometric studies, investigating the citation aging patterns of scientific publications, the differences between legacy and modern mathematical biology journal publication patterns, and the disciplinary differences in publications that awaken late and receive delayed citations. I will highlight the data available within CADRE, the multiple methods that can be used to access the data, and the support available from CADRE itself. I will also discuss the work of the CADRE fellows, of whom I am lucky to call myself a member.
Simon Linacre: “Mind The Sustainability Gap: Thoughts from Developing a Metric for Journals and SDG”
Juergen Wastl: “Using Machine Learning to analyse research in the context of Sustainable Development”
Simon Kerridge: “CRediT – Where next for the contributor role taxonomy?”
Andreas Pacher: “Open Editors: A New Dataset and Its Challenges”
Vicky Wallace: “Bibliometrics in context: a personalised action plan for time-poor academics”
Ernesto Priego: “Using the Altmetric Explorer for Responsible Metrics as an Author for Research Output Impact Statements and Research Environment Case Studies”
Euan Adie: “Off the Record: analyzing citations between scholarly work and policy”
Fei Yu: “A systematic review on how machine learning has been applied to bibliometrics”
Corrado Cuccurullo: “Biblioshiny: A Comprehensive Science Mapping App for Systematic Literature Review”
Instructor: Silvio Peroni
OpenCitations has been established as a fully free and open infrastructure providing access to global scholarly bibliographic and citation data. It provides a data model to describe such data; several collections of bibliographic and citation data released under CC0; a repository of open-source software developed for gathering the data; and online services for accessing citation data, such as REST APIs and data dumps. In this workshop, we show how to use OpenCitations as a source of bibliometric data, in particular demonstrating (a) how the REST APIs can return separate information about the citing entities, the cited entities, the citations themselves, and citation counts, and (b) how these data can be called from, integrated with and used in other applications.
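As a minimal sketch of point (a), the queries can be built against the COCI index, one of the OpenCitations REST services. The endpoint names below follow the published COCI API documentation, but consult the current documentation before relying on them:

```python
import json
from urllib.request import urlopen

# Base URL of the COCI (OpenCitations Index of Crossref citations) REST API.
BASE = "https://opencitations.net/index/coci/api/v1"

def coci_url(operation, doi):
    """Build a COCI REST call for one of the documented operations:
    'references', 'citations', 'citation-count', or 'metadata'."""
    return f"{BASE}/{operation}/{doi}"

def citation_count(doi):
    """Fetch the citation count for a DOI (requires network access)."""
    with urlopen(coci_url("citation-count", doi)) as resp:
        return int(json.load(resp)[0]["count"])

# Example DOI: the Tolich (2010) article cited elsewhere in this programme.
doi = "10.1177/1049732310376076"
print(coci_url("citations", doi))
# citation_count(doi) would return the current count as an integer.
```

Each operation returns JSON, so the responses are straightforward to integrate into other applications, which is point (b) of the workshop.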
Wednesday May 5, 2021
Ludo Waltman, CWTS: “New tools and technologies: How to build a better research system?”
We are living in exciting times. The bibliometric community is facing the rapid emergence of new tools and technologies, many of them enabled by new infrastructures and data sources. I will review these developments, focusing in particular on those advances that are most likely to have a long-lasting impact. At the same time, the expectations we have of new tools and technologies are immense. In addition to the use of these tools and technologies to promote ‘research excellence’, we seem to expect that they will also help us to make research assessment more responsible, to make science more open and inclusive, and perhaps even to contribute to a more sustainable world. Can our tools and technologies live up to these expectations, or could there also be adverse effects? How can we use our expanded toolbox to build a better research system?
Barbara S. Lancho Barrantes & Sally Dalton: “Predatory Journals under the Bibliometric Gaze”
A negative consequence of the growth of open access publishing is the emergence of a new type of scholarly publication: the predatory journal. In 2012, Jeffrey Beall, an American librarian, announced the rise of this new phenomenon. Beall created a list which has been widely used as a reference by librarians and researchers. Predatory journals are driven by self-interest, usually financial, at the expense of scholarship, publishing any article for payment. They are characterized by criteria such as false or misleading information, quick acceptance of low-quality papers, exorbitant publication charges, fake journal metrics, fraudulent ISSNs, incorrect addresses, non-existent editorial boards, spelling or grammar mistakes, and suspiciously fast peer review and publication.
This presentation will revisit the topic of predatory journals, contextualise it for today, and address the following questions: What is the current state of predatory journals in academic publishing? Is the open access community supporting the control of predatory journals? How have predatory journals managed to sneak into international databases? How can bibliometrics help in the detection of these bad practices? We will also explore how data sources are responding to curate their contents, how new advances in publishing are being used to tackle this problem, and how new technologies and artificial intelligence, through machine or deep learning, might help to detect and classify them. Publishing in a predatory journal can have negative repercussions for a research career, especially for early career researchers who feel the weight of “publish or perish” to continue existing in academia. We will suggest that research libraries play a crucial role in helping researchers identify predatory journals and monitor these harmful publishing practices. We will explain the benefits of creating a library service which supports researchers in the detection and monitoring of these publishing frauds.
Panel discussion on ethics of new tools & technologies
Chair: Elizabeth Gadd
Panelists: Erin Young, Stephen Pinfield, David Pride, Josh Nicholson, Cecile Janssens & Ludo Waltman
The responsible metrics agenda has taught us that just because we can do something doesn’t mean we should. With the advancement of new citation-based tools and technologies, are we in danger of taking a “build fast, fix later” approach? If so, what can we do about it? This panel invites six experts in the fields of ethics, AI and citation-based technologies to consider questions such as: does it matter that new technology development is so male-dominated? Does AI-based peer review support bake in existing biases? Should the design of new tools and indicators go through an ethics process? Attendees can also pose their own questions to the panel, with the ultimate aim of understanding where the responsibilities lie for the appropriate production and use of new technologies, and what we can do to ensure this is done in an ethical way.
Workshop: Open Knowledge Maps – A Visual Interface to the World’s Scientific Knowledge
Instructor: Dr. Peter Kraker, Open Knowledge Maps, @PeterKraker
Open Knowledge Maps is the world’s largest visual search engine for research. On its website, users can create knowledge maps of research topics in any discipline based on 260+ million research outputs. Knowledge maps provide an immediate overview of a topic by showing important sub-areas at a glance and linking them to relevant resources and concepts. As such, Open Knowledge Maps enables a diverse set of users to explore, discover, and make use of scientific content.
Head Start, the open-source software behind Open Knowledge Maps, has deep roots in bibliometric and altmetrics research. In this session, Dr. Kraker will present Open Knowledge Maps and its journey from a prototype based on co-readership data to the large-scale system based on co-word occurrence that it is today. He will also highlight where metrics still play an important role in the software and discuss future plans in this direction, addressing the importance of responsible metrics in visualizations and the opportunities afforded by the increased availability of open metrics data.
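The raw signal behind a co-word occurrence map can be sketched as follows (hypothetical toy data; Head Start’s real pipeline is considerably more sophisticated): terms that frequently appear in the same documents are counted as related, and those counts drive the clustering into sub-areas.

```python
from collections import Counter
from itertools import combinations

def co_word_counts(documents):
    """Count how often each pair of terms occurs in the same document,
    the basic signal behind a co-word occurrence map."""
    pairs = Counter()
    for doc in documents:
        terms = sorted(set(doc.lower().split()))
        pairs.update(combinations(terms, 2))
    return pairs

# Two hypothetical document titles.
docs = [
    "open metrics data",
    "open metrics infrastructure",
]
print(co_word_counts(docs).most_common(3))
```

Term pairs with high co-occurrence counts end up close together on the map, while rarely co-occurring terms fall into separate sub-areas.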
Instructor: Barney Walker, Imperial College London
CitationGecko is an open-source project that allows users to interactively build, visualise and explore the local citation network around an initial set of ‘seed papers’. As you explore the citation network, you can add the relevant papers you find to your seed set to expand the network further, and hide irrelevant ones to focus attention. CitationGecko makes use of open APIs to fetch citation data in real time from multiple sources and integrates with reference managers such as Zotero and Mendeley.
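The seed-based expansion described above can be sketched as a one-hop traversal over local citation data (hypothetical data; not CitationGecko’s actual code): starting from the seed set, collect both the papers the seeds cite and the papers that cite a seed.

```python
def expand_network(seeds, cites):
    """Collect all papers one citation hop from the seed set:
    papers the seeds cite, plus papers that cite a seed.
    `cites` maps each paper to the set of papers it cites."""
    neighbours = set()
    for paper, refs in cites.items():
        if paper in seeds:
            neighbours |= refs           # outgoing: references of a seed
        if refs & seeds:
            neighbours.add(paper)        # incoming: a paper citing a seed
    return neighbours - seeds

# Hypothetical local citation data.
cites = {
    "seed1": {"p1", "p2"},
    "p3": {"seed1"},
    "p4": {"p1"},
}
print(sorted(expand_network({"seed1"}, cites)))
```

Promoting a discovered paper to the seed set and re-running the expansion grows the network iteratively, which is the interactive loop the tool supports.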