Photo courtesy: Unsplash
Hi there! My name is Ruby, and I am a Master of Library and Information Studies (MLIS) student at the UBC iSchool and the cIRcle Research Assistant. This post is part of a series in which I explain and discuss some of the projects I have worked on at cIRcle. This one details my exploration of OpenAlex and how it may be relevant and useful to cIRcle.
OpenAlex was created by the nonprofit OurResearch, which also developed tools such as Unpaywall and Unsub. OpenAlex is an open dataset of scholarly research: an index of hundreds of millions of interconnected scholarly entities (i.e., works, authors, venues, institutions, and concepts) (“OpenAlex,” n.d.). The main goal of this project was to investigate possible uses of this tool in cIRcle workflows. Given my student position and my drive to learn as much as possible during my time at cIRcle, I was excited to explore an innovative new tool in the world of scholarly communication and institutional repositories.
OpenAlex provides data that scholarly support services, such as institutional repositories, can use to assess the scholarly output of their institution. To gain insight into how other libraries and institutional repositories use the OpenAlex API in their workflows, I attended an open call for librarians, hosted by OpenCon, that discussed this tool. Most of the attendees were like me: curious about the possibilities of OpenAlex and its API and how these tools are or may be used in libraries. One attendee mentioned that they collected statistics through the OpenAlex API to determine the percentage of their institutional output that was Open Access within a given timeframe. They then used this data to inform outreach initiatives encouraging more authors to engage in Open Access publishing. Attendees also mentioned institutional repository managers using the OpenAlex API to identify publications that could be specifically requested from authors for deposit.
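To make the Open Access percentage idea concrete, here is a minimal Python sketch of the arithmetic involved. It assumes the counts have already been retrieved from the API as a list of groups, each with a "key" ("true" or "false") and a "count"; that shape is my assumption about a grouped response, and the function name and numbers below are made up purely for illustration.

```python
def open_access_share(groups):
    """Return the percentage of works flagged as Open Access.

    `groups` is assumed to look like a grouped API response:
    a list of dicts, each with a "key" and a "count".
    """
    # Normalise keys so that True, "true", and "True" are treated alike.
    counts = {str(group["key"]).lower(): group["count"] for group in groups}
    total = sum(counts.values())
    if total == 0:
        return 0.0  # avoid dividing by zero when nothing matched
    return 100.0 * counts.get("true", 0) / total


# Made-up counts for one institution and one timeframe.
sample_groups = [
    {"key": "true", "count": 600},
    {"key": "false", "count": 400},
]
print(f"{open_access_share(sample_groups):.1f}% open access")
```

A figure like this could be tracked from year to year to see whether outreach is having an effect.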
Photo courtesy: OpenAlex
Querying the OpenAlex API
To filter the OpenAlex dataset, I needed to learn how to query the OpenAlex API. Querying means searching or filtering the dataset by sending a request URL written in a specific syntax, which can be as simple as typing that URL into a browser address bar. Although OpenAlex has a website for exploring the dataset, the API is considered the primary way to get OpenAlex data (“Overview,” n.d.).
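As a rough illustration of what such a query can look like, the Python sketch below assembles a query URL from a set of filters. The helper name is hypothetical, the ROR identifier is a placeholder rather than a real one, and the filter names (institutions.ror, is_oa, from_publication_date, to_publication_date) and the per-page parameter reflect my understanding of the API; they should be checked against the current OpenAlex documentation before use.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"


def build_query_url(entity, filters, per_page=25):
    """Return a query URL for an entity type (e.g. "works") and a dict of filters."""
    # Filters are combined as comma-separated key:value pairs.
    filter_value = ",".join(f"{key}:{value}" for key, value in filters.items())
    params = {"filter": filter_value, "per-page": per_page}
    # Leave ':', ',' and '/' unescaped so the URL stays readable.
    return f"{BASE_URL}/{entity}?{urlencode(params, safe=':,/')}"


# Illustrative filters: Open Access works from one institution in one year.
example_filters = {
    "institutions.ror": "https://ror.org/EXAMPLE",  # placeholder, not a real ROR ID
    "is_oa": "true",
    "from_publication_date": "2021-01-01",
    "to_publication_date": "2021-12-31",
}
print(build_query_url("works", example_filters))
```

The printed URL can be pasted into a browser address bar, which is how I ran most of my own test queries.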
Learning how to build a query involved exploring the OpenAlex API website and following the instructions and tutorials they provide. As a beginner to this process, understanding APIs and how to query them took some practice and hours of trial and error. Making sure I understood each aspect of the query and then testing different query combinations was the first step. Then, I had to figure out how to read and sort through the query results using a JSON Viewer. Through these processes, I learned about the specific formatting and syntax that is required when querying the OpenAlex API. After learning these operations, I created an internal cIRcle-specific OpenAlex API tutorial document to assist with creating future OpenAlex API queries. I’ve included a sample UBC query on the Adding Faculty Publications in cIRcle wiki.
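For anyone who would rather read query results with a script than with a JSON viewer, here is a small Python sketch. The sample response is made up and heavily shortened; I am assuming a top-level "meta" object and a "results" list whose items have display_name, publication_year, and open_access.is_oa fields, so treat the field names as assumptions to verify against a real response.

```python
import json

# Made-up, shortened response; real responses contain many more fields.
SAMPLE_RESPONSE = """
{
  "meta": {"count": 2, "page": 1, "per_page": 25},
  "results": [
    {"id": "https://openalex.org/W0000000001",
     "display_name": "Example article B",
     "publication_year": 2021,
     "open_access": {"is_oa": true}},
    {"id": "https://openalex.org/W0000000002",
     "display_name": "Example article A",
     "publication_year": 2022,
     "open_access": {"is_oa": false}}
  ]
}
"""


def summarize_works(response_text):
    """Return (year, title, is_oa) tuples, newest publication year first."""
    data = json.loads(response_text)
    works = sorted(data["results"], key=lambda work: work["publication_year"], reverse=True)
    return [
        (work["publication_year"], work["display_name"], work["open_access"]["is_oa"])
        for work in works
    ]


for year, title, is_oa in summarize_works(SAMPLE_RESPONSE):
    print(year, title, "OA" if is_oa else "not OA")
```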
Potential Uses at cIRcle
In general, OpenAlex provides access to a large dataset of scholarly research that can be filtered according to one’s goals. For cIRcle, this could entail searching the OpenAlex dataset for Open Access journal articles by UBC-affiliated authors within a certain date range or on a particular subject. This data could then inform campaigns to recruit content, such as articles related to COVID-19 or climate crisis research. The query results could also be used to report on cIRcle’s impact and activity in relation to scholarly output at UBC.
I also investigated whether OpenAlex might be suitable for author name disambiguation projects at cIRcle. Author name disambiguation is a huge challenge for repositories. It refers to the process of ensuring that each author has a correct, unique name entry across a database so that all their works can be found and easily distinguished from those of other authors with the same name. However, through my research, I learned that OpenAlex’s own name disambiguation procedures are still being refined and that they are in the process of deploying new methods of author name disambiguation based on BERT (Bidirectional Encoder Representations from Transformers) (Priem et al., 2022). Given these findings, I concluded that OpenAlex was not yet an ideal candidate for assisting with cIRcle name disambiguation projects, so the work proceeded with alternate methods.
Given the newness of OpenAlex (Priem et al., 2022) and its capacity for growth and adaptation in the months and years to come, it is worthwhile for cIRcle to stay up to date with OpenAlex and in touch with how other institutions, libraries, and institutional repositories are using this tool.
Follow the cIRcle Blog for updates on this and other projects.
References
Adding faculty publications to cIRcle. (n.d.). In UBC Wiki. https://wiki.ubc.ca/Library:Circle/Adding_faculty_publications_to_cIRcle
API tutorial. (n.d.). OpenAlex documentation. https://docs.openalex.org/additional-help/tutorials/filter-and-group-with-api
Author name disambiguation. (2022, July 10). In Wikipedia. https://en.wikipedia.org/w/index.php?title=Author_name_disambiguation&oldid=1097412489
explore.OpenAlex. (n.d.). https://explore.openalex.org/
OpenAlex. (n.d.). https://openalex.org/
OurResearch. (n.d.). https://ourresearch.org/
Overview. (n.d.). OpenAlex documentation. https://docs.openalex.org
Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. 26th International Conference on Science and Technology Indicators. https://doi.org/10.48550/arXiv.2205.01833
ROR. (n.d.). https://ror.org/
Unpaywall. (n.d.). https://unpaywall.org/
Unsub. (n.d.). https://unsub.org/