Posted on behalf of Emma Shaw (KB+ Data Manager)
I started working at Jisc as a Data Manager back in February 2016. My background is in academic libraries (my previous role was Journals and E-resources Librarian) and so my approach to data management is very much from the librarian’s point of view. I understand both Higher Education and the journals publishing landscape, and appreciate the frustrations that can come with working in those fields. Having access to reliable and accurate data about your library’s holdings is the Holy Grail for most librarians. Yet the lack of compliance with data standards and recommendations within publishing makes this almost impossible to achieve. The quality and types of information provided by publishers differ greatly. Some publishers seem to take a somewhat cavalier attitude to their own data (such as the flagrant use of titles and ISSNs with no regard to their bibliographic history). This is the world of journals publishing! The reality is that content providers’ information is often unreliable.
For the last six months, in addition to my work on KB+, I have participated in a joint project with the Jisc Journal Archives (JJA). The aim of this project was to check the JJA holdings in order to identify any gaps and to enrich the metadata.
When Jisc originally purchased the JJA material, from 2004 onwards, the data was supplied in the form of a black box (literally!); it contained millions of individual articles (as PDFs) and their corresponding metadata. The quality of the data was somewhat patchy: for example, there were articles with no metadata as well as metadata with no articles. The JJA team did a fantastic job of ingesting this content and so producing the platform we now know as the JJA. We were aware, however, that some omissions in the content remain, and also that some of the metadata could be improved. These are the issues I set out to resolve.
Assessing licence information: My journey began with locating the original licence agreements in order to confirm what licensed material is included (and consequently what the JJA should have). Usually the licence is in the form of a Word document or a PDF, so my first task was to extract this data and put it into Excel. The quality of this information varies enormously between publishers and agreements. Title-level information can be sparse, often with inaccurate or non-corresponding date ranges; volume or issue information may be missing, and no related titles or bibliographic histories may be supplied. For example, both Oxford University Press and the Institution of Civil Engineers provided only a basic list of titles (lacking ISSNs) with a coverage date range for the entire collection (rather than by title). Other publishers, such as Brill, Cambridge University Press (CUP) and Taylor and Francis, provided coverage information down to the issue level, which obviously made my job a lot easier. The Institute of Physics (IOP) was the only publisher to attempt to include a full bibliographic history (and the corresponding coverage) for each archive title.
Checking, verifying and enriching the data: Next I looked at any discrepancies around the journal title; these were checked against the publisher’s information and the ISSN Portal. Title publication and coverage dates were checked and verified, start and end volume and issue numbers were checked and added where necessary, and any preceding/succeeding titles (within the licensed coverage date range) were added as separate titles, each as a new data line. The bibliographic histories for titles in archive collections are, by their nature, often extensive. For example, in the IOP collection, Measurement Science and Technology (0957-0233) has had six title changes since its first publication in 1923!
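As a small illustration of the kind of sanity check that can be run before looking a title up in the ISSN Portal, here is a minimal Python sketch that validates an ISSN's check digit using the standard modulus 11 rule. It is not the tooling used in this project (that work was done by hand in Excel), and the function name `issn_is_valid` is purely illustrative. A passing check only shows that the number is well formed; it says nothing about whether the ISSN belongs to the right title.

```python
def issn_is_valid(issn: str) -> bool:
    """Return True if the ISSN has a correct check digit (modulus 11 rule)."""
    chars = issn.replace("-", "").strip().upper()
    if len(chars) != 8:
        return False
    body, check_char = chars[:7], chars[7]
    # The first seven characters must be plain digits; the last may be X.
    if not all(c in "0123456789" for c in body):
        return False
    if check_char not in "0123456789X":
        return False
    # Weight the seven digits 8, 7, 6, 5, 4, 3, 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(body, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return check_char == expected


# The ISSN quoted above for Measurement Science and Technology.
print(issn_is_valid("0957-0233"))
```

A check like this catches typing errors in a title list, but confirming that an ISSN matches the correct title and its bibliographic history still needs the ISSN Portal.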
Creating coverage notes: Then I added the bibliographic history for each title and any other useful details, for example information to cover gaps in publisher content.
I then had a checked and verified title list for the licence agreement.
Checking and updating KB+ file: Fortunately, we already have KB+ title lists for these Jisc agreements, so my next task was to double-check the KB+ file against the verified licence information and update the title lists as necessary. Publishers have the annoying habit of selling on their content, so any titles which had transferred out since the original agreement needed to be updated. Then, using the checked and verified title list for each agreement, I made the necessary updates to the KB+ file, for example noting publisher transfer outs, to ensure it matched the licences and the JJA.
The Data Managers will continue to monitor this information and include the archive agreements in their workflows to ensure titles continue to stay up to date.
Confirming JJA data: The final phase was to check the verified and updated KB+ title list against the JJA holdings data. This check needs to be done at issue level (rather than title level) to ensure all issues are included for each title within the licensed coverage range.
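To show why the check has to happen at issue level, here is a minimal Python sketch of the comparison. It assumes, purely for illustration, that both the verified title list and the JJA holdings have been reduced to a mapping from ISSN to a set of (volume, issue) pairs; the function name `find_missing_issues`, the data layout and the sample values are all hypothetical, not the project's actual files.

```python
def find_missing_issues(licensed, held):
    """Return {issn: sorted list of (volume, issue)} licensed but not held."""
    gaps = {}
    for issn, issues in licensed.items():
        # A title with no holdings at all is treated as holding nothing.
        missing = sorted(set(issues) - set(held.get(issn, ())))
        if missing:
            gaps[issn] = missing
    return gaps


# Placeholder data: the ISSN and issue numbers are invented for the example.
licensed = {"1234-5679": {(1, 1), (1, 2), (2, 1), (2, 2)}}
held = {"1234-5679": {(1, 1), (2, 1), (2, 2)}}

print(find_missing_issues(licensed, held))
```

A title-level comparison would report this title as present in both lists; only the issue-level comparison shows that volume 1, issue 2 is missing.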
This five-stage process enabled me to identify any gaps in the archive holdings. Armed with this information, we are now in a position to fill these gaps, and the JJA team are currently in contact with the publishers to obtain the missing content. Many of the problems I encountered with the JJA data were due to a lack of bibliographic history information, where a holding was listed under a different (preceding or succeeding) title or ISSN. In Brill, for example, the holdings for six titles (of 194) were indexed under a different title/ISSN; for the IOP, the holdings for eight titles (of 64) were listed under a different title/ISSN; and for CUP, four titles (of 200+) had the same problem. I have rectified any such issues, updating and amending the metadata for each holding to ensure that the bibliographic information is accurate. We also aim to enrich this data further by adding the coverage note information for each record (which includes any bibliographic history). The JJA team are in the process of implementing these changes and updates across the archive collections.
Once completed, and once the gaps in content have been filled, the end result will be a comprehensive journal archives collection with accurate and rich bibliographic metadata. As a Data Manager (and Librarian) I understand the importance of having access to a trusted data source. Quality data not only helps the librarian community; it also means better searching and discoverability, which ultimately benefits the end user. This is why we (Data Managers at KB+) do what we do. It may be painstaking work, but we know it’s worth it.