I have been in my new role as data manager for a few months, and one of my main tasks is adding new collections of titles into KB+. Our first priority has been NESLi2 collections. In some cases publishers kindly supply KBART files of their collection; in others they direct us to a title list on their website, or we garner the information directly from their web pages. Once we receive the files we need to reformat them for upload into KB+.
Before uploading we check the data quality wherever we can. This can include comparing against a title list acquired from a different source, comparing against last year's list (using the comparison basket feature in KB+), and checking for known title cessations or transfers. We are currently using tools such as SunCat to help with the data verification, but are looking to other services and to collaboration with international services to ensure consistency and accuracy. Depending on the outcome of these checks, we must then decide how best to resolve any issues found, such as title misspellings, incorrectly assigned DOIs, or incorrect ISSNs, before the data is loaded into KB+.
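One check of this kind that lends itself to automation is validating ISSN check digits: the final character of an ISSN is a check digit computed from the first seven. This is a minimal sketch of that standard calculation in Python, not a description of any tool we currently use:

```python
def valid_issn(issn: str) -> bool:
    """Check an ISSN's check digit (e.g. '0317-8471').

    The first seven digits are weighted 8 down to 2; the check digit
    is (11 - sum mod 11) mod 11, with 10 written as 'X'.
    """
    s = issn.replace("-", "").upper()
    if len(s) != 8 or not s[:7].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(s[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return s[7] == expected
```

Running a title list through a check like this flags transcription errors (a mistyped digit almost always breaks the checksum) before they reach KB+, though it cannot catch an ISSN that is valid but assigned to the wrong title.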
I have been mainly using Excel to manage and manipulate the data prior to loading it into KB+, and features such as conditional formatting to find duplicate ISSNs have proved useful. I am planning to investigate tools such as Google Refine to help manage this data more effectively, especially with regard to breaking down large title lists into appropriate collections and automating some standard formatting.
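The same duplicate check that conditional formatting performs in Excel can also be scripted, which helps once title lists grow large. A small illustrative sketch (the list of ISSNs here is hypothetical; in practice it would be read from a KBART file's ISSN column):

```python
from collections import Counter

def find_duplicate_issns(issns: list[str]) -> list[str]:
    """Return the ISSNs that appear more than once, in first-seen order."""
    counts = Counter(issns)
    return [issn for issn in counts if counts[issn] > 1]

# Hypothetical example list; real input would come from a KBART export.
sample = ["0317-8471", "2049-3630", "0317-8471"]
```

A script like this can be extended to report which titles share the duplicated ISSN, which is usually the detail needed to decide whether the duplication is an error or a legitimate repeat.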
I am also looking forward to release 3.1 of KB+, which is set to provide more integrated communication channels.
By working collaboratively and sharing details of data errors, I think we can help build a good-quality source of data.