innovation in metadata design, implementation & best practices

Papers, Presentations, Sessions, Posters, Workshops, Tutorials and Hands-on sessions

Session: Metadata
National Diet Library Data for Open Knowledge and Community Empowerment [Presentation]
Saho Yasumatsu, Tomoko Okuda
The National Diet Library (NDL) has been promoting utilization of the data it creates and provides on the Internet since it established its "Policy of providing databases created by the National Diet Library." The NDL provides bulk download of open datasets and takes part in public events related to open data and civic technology, which has increased the visibility of NDL data in communities throughout Japan. The NDL also organizes ideathons and hackathons to promote its data and services. These outreach activities have resulted in a number of interesting and potentially useful initiatives. This presentation will demonstrate the NDL's efforts and achievements in promoting the use of its data, while showcasing some of the best civic-driven applications and visualizations of library data.
Metadata as Content: Navigating the Intersection of Repositories, Documentation, and Legacy Futures [Presentation]
Erik Radio
Documentary Relations of the Southwest (DRSW) is a dataset of bibliographic metadata derived from over 1500 reels of microfilmed documents that trace the history of the southwest from the 16th century until Mexico's independence in 1821. Originally made available to scholars through a now-defunct proprietary repository, DRSW is currently completing a migration from a home-grown solution to Blacklight as a sustainable option. While migrating content is a familiar scenario, this migration highlights key challenges in navigating the intersection of legacy design and possible futures for metadata curation and repository selection. This presentation deals with challenges revolving around three paradigms: metadata as content, system documentation generation, and metadata futures for indexing and integration.
Wikidata & Scholia for scholarly profiles: the IU Lilly Family School of Philanthropy pilot project [Presentation]
Mairelys Lemus-Rojas, Jere Odell
During recent years, cultural heritage institutions have become increasingly interested in participating in open knowledge projects. The most widely known of these projects is Wikipedia, the online encyclopedia. Libraries and archives in particular are also showing an interest in contributing their data to Wikidata, the newest project of the Wikimedia Foundation. Wikidata, a sister project to Wikipedia, is a free knowledge base where structured, linked data is stored. It aims to be the data hub for all Wikimedia projects. The Wiki community has developed numerous tools and web-based applications to facilitate the contribution of content to Wikidata and to display the data in more meaningful ways. One such web-based application is Scholia, which was created to provide users with complete scholarly profiles by making live queries to Wikidata and displaying the information in an appealing and effective manner. Scholia provides a comprehensive sketch of an author’s scholarship. This presentation will demonstrate our efforts to contribute data related to our faculty members to Wikidata and will provide a demo of Scholia’s functionalities. At IUPUI (Indiana University-Purdue University Indianapolis) University Library, we conducted a pilot project in which we selected 18 faculty members from the IU Lilly Family School of Philanthropy to be included in Wikidata. The School of Philanthropy, located on the IUPUI campus, is the leading school in the subject in the United States. The scholarship produced by its faculty is known to be widely used. We wanted to provide a presence in Wikidata not just for the faculty, but also for their publications and co-authors. For the creation of Wikidata items, we used a combination of semi-automated and manual processes. Once the items were created in Wikidata, we used Scholia to generate the scholarly profiles. Academic libraries have the capacity to create and curate data about scholars affiliated with their institutions.
We expect that the data set we built in Wikidata will help our institution better understand and describe the value of this school to global research on philanthropic giving and nonprofit management. Our pilot project is just a first step toward more efficient and systematic library-based contributions to Wikidata.
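As a sketch of how Scholia assembles a profile, the following builds the kind of SPARQL query it sends to the Wikidata Query Service to list an author's works. The QID and the exact query shape are illustrative assumptions, not the pilot project's actual code.

```python
# Hypothetical sketch: building a Scholia-style SPARQL query that lists
# the works of one author item in Wikidata. P50 is Wikidata's "author"
# property; the QID below is a placeholder, not a real faculty item.

def author_works_query(author_qid: str) -> str:
    """Return a SPARQL query for works whose author (P50) is author_qid."""
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:P50 wd:{author_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )

print(author_works_query("Q42000000"))  # hypothetical author item
```

Scholia runs queries of this kind live against query.wikidata.org and renders the results as lists and charts on an author's profile page.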
Session: RDF
Linking knowledge organization systems via Wikidata [Presentation]
Joachim Neubert
Wikidata is a large collaboratively curated knowledge base, which connects all of the roughly 300 Wikipedia projects in different languages and provides common data for them. Its items also link to more than 1500 different sources of authority information. Wikidata can therefore serve as a linking hub for the authorities and knowledge organization systems represented by these “external identifiers”. In the past, this approach has been applied successfully to rather straightforward cases such as personal name authorities. Knowledge organization systems with more abstract concepts are more challenging due to, e.g., partial overlaps in meaning and different granularities of concepts.
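The linking-hub idea can be sketched in a few lines: two knowledge organization systems are mapped to each other indirectly, via the Wikidata item that carries both external identifiers. The identifier values below are invented for illustration.

```python
# Minimal sketch of Wikidata as a linking hub. Each vocabulary's external
# identifiers map to Wikidata items (QIDs); pairing identifiers that share
# a QID yields a cross-vocabulary mapping. All values here are invented.

gnd_to_qid = {"gnd:118529579": "Q937", "gnd:118540238": "Q5879"}
stw_to_qid = {"stw:19220-5": "Q937"}  # only one concept overlaps

def link_via_wikidata(a_to_qid, b_to_qid):
    """Pair identifiers from vocabularies A and B that share a QID."""
    qid_to_b = {qid: b_id for b_id, qid in b_to_qid.items()}
    return {a_id: qid_to_b[qid]
            for a_id, qid in a_to_qid.items() if qid in qid_to_b}

print(link_via_wikidata(gnd_to_qid, stw_to_qid))
# {'gnd:118529579': 'stw:19220-5'}
```

For abstract concepts, the hard part this glosses over is deciding whether a shared item really means the two concepts are equivalent, rather than merely overlapping.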
An Approach to Enabling RDF Data in Querying to Invoke REST API for Complex Calculating [Paper]
Xianming Zhang
RDF falls short in calculation, especially complex calculation. SPARQL Inferencing Notation (SPIN) has been proposed with a specific capability of returning a value by executing an external JavaScript file, which in part performs complex calculation; however, it is still far from meeting many practical needs. This paper investigates SPIN's capability of executing JavaScript, namely the SPINx framework, and presents a method of equipping RDF data with a new capability of invoking a REST API, by which a querying user can obtain the value returned by the REST API performing the complex calculation; the value is then semantically annotated for further use. Calculation of the lift coefficient of an airfoil is taken as a use case, in which, with a given angle of attack as input, the desired value is returned by invoking a particular REST API while querying the RDF data. Through this use case, it is shown that RDF data invoking a REST API for complex calculation is feasible and valuable for both practical applications and the Semantic Web.
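The pattern the paper describes can be sketched in plain Python: a value pulled from RDF data (here, an angle of attack) is handed to an external calculation service, and the result is annotated back onto the data. The REST call is stubbed with the thin-airfoil approximation Cl = 2*pi*alpha; the actual service, property names, and formula in the paper may differ.

```python
import math

# Illustrative triples; "ex:" names are invented for this sketch.
triples = [("ex:airfoil1", "ex:attackAngleDeg", 5.0)]

def rest_lift_coefficient(alpha_deg: float) -> float:
    """Stand-in for the REST API: thin-airfoil lift coefficient 2*pi*alpha."""
    return 2 * math.pi * math.radians(alpha_deg)

# "Query" the RDF data, invoke the service, annotate the result back.
for s, p, o in list(triples):
    if p == "ex:attackAngleDeg":
        cl = rest_lift_coefficient(o)
        triples.append((s, "ex:liftCoefficient", round(cl, 4)))

print(triples[-1])  # ('ex:airfoil1', 'ex:liftCoefficient', 0.5483)
```

In the paper's setting, the invocation happens inside SPARQL query evaluation via SPINx-executed JavaScript rather than in application code as here.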
Experiments in Operationalizing Metadata Quality Interfaces: A Case Study at the University of North Texas Libraries [Paper]
Mark Edward Phillips, Hannah Tarver
This case study presents work underway at the University of North Texas (UNT) Libraries to design and implement interfaces and tools for analyzing metadata quality in their local metadata editing environment. It explains the rationale for including these kinds of tools in locally-developed systems and discusses several interfaces currently being used at UNT to improve the quality of metadata managed within the Digital Collections.
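A simple example of the kind of statistic such quality interfaces surface is field completeness: the share of records with a non-empty value per field. The field names and records below are illustrative, not the UNTL metadata scheme itself.

```python
# Toy completeness check over a handful of invented records.
records = [
    {"title": "Map of Denton", "creator": "Smith, J.", "date": "1923"},
    {"title": "Oral history interview", "date": ""},
    {"title": "Photograph", "creator": "Doe, A."},
]

def field_completeness(records, fields):
    """Fraction of records with a non-empty value for each field."""
    n = len(records)
    return {f: sum(1 for r in records if r.get(f, "").strip()) / n
            for f in fields}

print(field_completeness(records, ["title", "creator", "date"]))
```

Surfacing such counts per field in the editing environment lets editors spot sparse or inconsistently filled elements at a glance.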
Session: Multilingual
A study of multilingual semantic data integration [Presentation]
Douglas Tudhope, Ceri Binding
The availability of the various forms of open data today offers great opportunity for meta-level research that draws on combinations of data previously considered only in isolation. There are also great challenges to be overcome: datasets may have different schemas, may employ different terminology or languages, and data may only be represented by textual reports. Metadata and vocabularies of different kinds have the potential to help address many of these issues. Previous work explored semantic integration of English-language archaeological datasets and reports (Binding et al., 2015; Tudhope et al., 2011). This presentation reflects on initial experience from a semantic integration exercise involving archaeological datasets and reports in different languages. Different forms of Knowledge Organization Systems (KOS) were key to the exercise. The Getty Art and Architecture Thesaurus (AAT) was used as the underlying value vocabulary and the CIDOC CRM ontology as the metadata element set (Isaac et al. 2011) for the semantic integration. Linked data expressions of the vocabularies formed part of an integration dataset (RDF) extracted from the source data, together with subject metadata automatically generated from the reports via Natural Language Processing (NLP) techniques. The data was selected following a broad theme of wooden material, objects and samples dated via dendrochronological analysis. The investigation was conducted as an advanced data integration case study for the ARIADNE FP7 archaeological infrastructure project (ARIADNE 2017), with the datasets and reports provided by Dutch, English and Swedish ARIADNE project partners.
Designing a Multilingual Knowledge Graph as a Service for Cultural Heritage – Some Challenges and Solutions [Paper]
Valentine Charles, Hugo Manguinhas, Antoine Isaac, Nuno Freire and Sergiu Gordea
Europeana gives access to data from Galleries, Libraries, Archives & Museums across Europe. Semantic and multilingual diversity, as well as the variable quality of our metadata, makes it difficult to create a digital library offering end-user services such as multilingual search. To mitigate this, we build an “Entity Collection”, a knowledge graph that holds data about entities (places, people, concepts and organizations), bringing context to the cultural heritage objects. The diversity and heterogeneity of our metadata has encouraged us to re-use and combine third-party data instead of relying only on data contributed by our own providers. This raises, however, a number of design issues. This paper lists the most important of these and describes our choices for tackling them using Linked Data and Semantic Web approaches.
Session: Application Profiles
Modeling and application profiles in the Art and Rare Materials BIBFRAME Ontology Extension [Presentation]
Jason Kovari, Melanie Wacker, Huda Khan, Steven Folsom
Since April 2016, the Art Libraries Society of North America's Cataloging Advisory Committee (CAC) and the RBMS Bibliographic Standards Committee (BSC) have collaborated with the Andrew W. Mellon Foundation-funded Linked Data for Production project on the Art and Rare Materials BIBFRAME Ontology Extension (ARM). BIBFRAME leaves some areas underdefined that need to be expanded by specialized communities. More specifically, ARM addresses the descriptive needs of the art and rare materials communities in areas such as exhibitions, materials, measurements, physical condition and much more. Between April 2016 and February 2018, work focused on modeling. In February 2018, our focus shifted to the development of SHACL application profiles for Art resources and Rare Monographs, which we are using to define forms and display for the cataloging environment in VitroLib, an RDF-based, ontology-agnostic cataloging tool being developed as part of the Linked Data for Libraries - Labs project that was discussed at DCMI 2017. Since these application profiles are being implemented in VitroLib, catalogers will be able to test the ARM modeling in a real-world environment, providing feedback to the project for potential future development. This presentation will provide an overview of select ARM modeling components, detail the process of creating and defining SHACL application profiles for ARM, and discuss challenges and opportunities for implementing these profiles in VitroLib. Further, we will discuss our strategy for low-threshold hosting of the ontology and administrative questions regarding long-term maintenance of this BIBFRAME extension.
Developing a Metadata Application Profile for the Daily Hire Labor [Presentation]
Sangeeta Sen, Nisat Raza, Animesh Dutta, Mariana Curado Malta, Ana Alice Baptista
EMPOWER SSE is a research project financed by the Fundação para a Ciência e Tecnologia (FCT, Portugal) and the Department of Science & Technology (DST, India) that aims to use the Linked Open Data framework to empower Social and Solidarity Economy (SSE) agents. It is a collaborative project between India and Portugal focused on defining a Semantic Web framework to consolidate players of the informal sector, enabling a paradigm shift. The Indian economy can be categorized into two sectors: formal and informal. The informal sector differs from the formal in that it is unorganized and comprises economic activities that are not covered by formal arrangements such as taxation, labor protections, minimum wage regulations, unemployment benefits, or documentation. A major part of the Indian economy depends on the skilled labor of this informal sector, e.g. daily labor, farming, electrical work, food production, and small-scale industries (Kalyani, 2016). The informal sector is mainly made up of skilled people who follow their family job traditions; sometimes they are not even formally trained. This sector struggles with a lack of information, data sharing needs and interoperability issues across systems and organizational boundaries. In fact, this sector has little visibility in society and little possibility to do business, as most of its agents do not reach the end of the chain. This blocks them from getting proper exposure and a better livelihood.
Session: Validation
Metadata quality: Generating SHACL rules from UML class diagram [Presentation]
Emidio Stani
Metadata plays a fundamental role beyond classifying data, as data needs to be transformed, integrated, and transmitted. Like data, metadata needs to be harvested, standardized and validated. Metadata management processes require resources. The challenge for organizations is to make these processes more efficient, while maintaining and even increasing confidence in their data. While RDF harvesting has already become an important step implemented at large scale (e.g. the European Data Portal), there is now a need to introduce an RDF validation mechanism. However, such a mechanism will depend upon the definition of RDF standards. When a standard is set, the provision of a validation service is necessary to determine whether metadata complies, as with, for example, the HTML validation service.
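In the spirit of the presentation's title, here is a toy generator that turns a UML-like class description (name, attributes with multiplicities) into a SHACL node shape. The `ex:` prefix, the input structure, and the attribute-to-constraint mapping are simplified assumptions, not the presenter's actual tooling.

```python
# Hypothetical UML-like input: one class with typed, multiplicity-bounded
# attributes. "max": None stands for an unbounded upper multiplicity (*).
uml_class = {
    "name": "Dataset",
    "attributes": [
        {"name": "title", "datatype": "xsd:string", "min": 1, "max": 1},
        {"name": "keyword", "datatype": "xsd:string", "min": 0, "max": None},
    ],
}

def to_shacl(cls: dict) -> str:
    """Emit a SHACL NodeShape (Turtle) with one property shape per attribute."""
    lines = [f"ex:{cls['name']}Shape a sh:NodeShape ;",
             f"    sh:targetClass ex:{cls['name']} ;"]
    for attr in cls["attributes"]:
        lines.append(f"    sh:property [ sh:path ex:{attr['name']} ;")
        lines.append(f"        sh:datatype {attr['datatype']} ;")
        lines.append(f"        sh:minCount {attr['min']} ;")
        if attr["max"] is not None:
            lines.append(f"        sh:maxCount {attr['max']} ;")
        lines.append("    ] ;")
    # Terminate the final statement with "." instead of ";".
    lines[-1] = lines[-1].rstrip().rstrip(";").rstrip() + " ."
    return "\n".join(lines)

print(to_shacl(uml_class))
```

A SHACL engine can then validate harvested RDF against the generated shapes, which is the validation mechanism the abstract calls for.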
Validation of the Metadata Application Profile for Scholarly Articles in the Field of Oncology – Onco-MAP [Paper]
Morgana Andrade, Ana Alice Baptista
Background: a metadata application profile named Onco-MAP was developed for scholarly articles in the context of scientific digital repositories in the domain of Oncology. This article reports the validation process of Onco-MAP, in which the Delphi method was used. The Delphi method has been used in many knowledge fields due to advantages such as anonymity, controlled feedback and results shown by statistical data. It is a method that can be carried out using questionnaires in two or more rounds, and its purpose is to reach consensus about a specific topic among experts. Purpose: to validate the detailed data model of Onco-MAP. Methodology: an electronic questionnaire (SurveyMonkey) was submitted to eight librarians who work in medical and Oncology-specialized libraries in Brazil. The study was developed in two rounds. Results: the detailed data model submitted for validation achieved 75% agreement and the answers remained stable at the end of the second round. Twenty classes, 140 properties and their respective descriptions of use representing a scholarly article were approved. Conclusions: the validation of Onco-MAP through the Delphi method demonstrated the viability of this method for this purpose. It was possible to validate the detailed data model, which includes: a) the class representing a scholarly article; b) the classes associated with a scholarly article; c) the properties of a class; and d) the relationships among the classes (cardinality).
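The consensus measure behind a Delphi round can be computed very simply: the share of experts approving each element, checked against the 75% acceptance threshold the paper reports. The element names and votes below are invented for illustration.

```python
# Toy Delphi tally: element -> list of expert votes (True = approve).
responses = {
    "dc:title":   [True, True, True, True, True, True, True, True],
    "dc:subject": [True, True, True, True, True, True, False, False],
    "ex:custom":  [True, True, False, False, False, True, True, False],
}

def accepted(responses, threshold=0.75):
    """Mark each element accepted if its approval share meets the threshold."""
    return {elem: sum(votes) / len(votes) >= threshold
            for elem, votes in responses.items()}

print(accepted(responses))
# {'dc:title': True, 'dc:subject': True, 'ex:custom': False}
```

In a real Delphi study this tally would be fed back to the panel between rounds, and stability of answers across rounds (as in the paper's second round) would also be checked.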
Validation of a metadata application profile domain model [Paper]
Mariana Curado Malta, Helena Bermúdez-Sabel, Ana Alice Baptista, Elena González-Blanco
The development of Metadata Application Profiles is done in several phases. According to the Me4MAP method, one of these phases is the validation of the domain model. This paper reports the validation process of a complex domain model developed under the project POSTDATA - Poetry Standardization and Linked Open Data. The development of the domain model ran with two steps of construction and two of validation. The validation steps drew on the participation of specialists in European poetry and the use of real resources. In the first validation we used tables with information about resource-related properties, for which the experts had to fill in certain fields such as, for example, the values. The second validation used an XML framework to control the input of values in the model. The validation process allowed us to find and fix flaws in the domain model that would otherwise have been passed on to the Description Set Profile and possibly would only have been found after implementing the application profile in a real case.
Session: Models
Research data management in the field of Ecology: an overview [Paper]
Cristiana Alves, João Aguiar Castro, João Pradinho Honrado, Angela Lomba
The diversity of research topics and resulting datasets in the field of Ecology has grown in line with developments in research data management. Based on a meta-analysis performed on 93 scientific references, this paper presents a comprehensive overview of the use of metadata models in the ecology domain through time. Overall, 40 metadata models were found to be either referenced or used by the biodiversity community from 1997 to 2018. In the same period, 50 different initiatives in ecology and biodiversity were conceptualized and implemented to promote effective data sharing in the community. A relevant concern that stems from this analysis is the need to establish simple methods to promote data interoperability and reuse, so far limited by the production of metadata according to different standards. With this study, we also highlight challenges and perspectives in research data management in the domain of Ecology towards best practice guidelines.
Metadata Models for Organizing Digital Archives on the Web: Metadata-Centric Projects at Tsukuba and Lessons Learned [Paper]
Shigeo Sugimoto, Senan Kiryakos, Chiranthi Wijesundara, Winda Monika, Tetsuya Mihara, Mitsuharu Nagamori
There are many digital collections of cultural and historical resources, referred to as digital archives in this paper. Domains of digital archives are expanding from traditional cultural heritage objects to new areas such as pop culture and intangible objects. Though it is known that metadata models and authority records, such as subject vocabularies, are essential in building digital archives, they are not yet well established in these new domains. Another crucial issue is semantic linking among resources within a digital archive and across digital archives. Metadata aggregation is an essential aspect of such resource linking. This paper overviews three on-going metadata-centric research projects by the authors and discusses some lessons learned from them. The subject domains of these projects are disaster records of the Great East Japan Earthquake of 2011; Japanese pop culture such as Manga, Anime and Games; and cultural heritage resources in South and Southeast Asia. These domains are poorly covered by conventional digital archives at memory institutions because of the nature of their contents. The main goal of this paper is not to report these projects as completed research, but to discuss issues of metadata models and aggregation that are important in organizing digital archives in the Web-based information environment.
Session: Categorisation
Why Build Custom Categorizers Using Boolean Queries Instead of Machine Learning? Robert Wood Johnson Foundation Case Study [Presentation]
Joseph Busch, Vivian Bliss
This presentation will cover a case study of using Boolean queries to scope custom categories, provide a Boolean query syntax primer, and then present a step-by-step process for building a Boolean query categorizer. The Robert Wood Johnson Foundation (RWJF) is the largest philanthropy dedicated solely to health in the United States. Taxonomy Strategies has been working with RWJF to develop an enterprise metadata framework and taxonomy to support needs across areas including program management, research and evaluation, communications, finance, etc. We have also been working with RWJF on methods to apply automation to support taxonomy development and implementation within their various information management applications. Machine learning has become a popular and hyped method promoted by large information management application vendors including Microsoft, IBM, Salesforce and others. The benefit is that you don’t need to do any preparation; content just gets processed. The problem is that machine learning is opaque: the categories are generic, may be irrelevant, can be biased, and are difficult to change or tune. Pre-defined categories (e.g., a controlled vocabulary or taxonomy), plus Boolean queries to scope the context for categories, are much more transparent. The benefit is relevant categories. The problem is that pre-defined categories require work to set up, and specialized skills. But how hard is it to do this?
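A minimal Boolean-query categorizer in the spirit of the step-by-step process described above might look like the following. The category name and terms are invented examples, not RWJF's actual taxonomy, and matching is naive whole-word matching.

```python
# Each category is scoped by required terms (AND), alternatives (OR),
# and exclusions (NOT). Rules here are made up for illustration.
rules = {
    "Childhood Obesity": {
        "all": ["child"],                        # every term required (AND)
        "any": ["obesity", "overweight", "bmi"], # at least one (OR)
        "none": ["veterinary"],                  # excluded terms (NOT)
    },
}

def categorize(text: str, rules: dict) -> list:
    """Return the categories whose Boolean rule matches the text."""
    words = set(text.lower().split())
    hits = []
    for category, rule in rules.items():
        ok_all = all(t in words for t in rule.get("all", []))
        any_terms = rule.get("any", [])
        ok_any = not any_terms or any(t in words for t in any_terms)
        ok_none = not any(t in words for t in rule.get("none", []))
        if ok_all and ok_any and ok_none:
            hits.append(category)
    return hits

print(categorize("child obesity rates in urban schools", rules))
# ['Childhood Obesity']
```

Unlike an opaque machine-learned classifier, every match here can be explained by pointing at the terms that fired, and a rule can be tuned by editing its term lists.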
Categorization Ethics: Questions about Lying, Moral Truth, Privacy and Big Data [Presentation]
Joseph Busch
Categorization is a common human behavior and it has many social implications. While categorization helps us make sense of the world around us, it also affects how we perceive the world, what we like and dislike, who we feel comfortable with and whom we fear. Categorization is affected by our family, culture and education. But we can take responsibility for our own perceptions; misperceptions can be pointed out and sometimes changed. But what about categorization imposed from outside that affects us? Should that be allowed? How is it determined? How can it be changed? These are difficult issues. For information aggregators and information analyzers, the guidelines for appropriate behavior are not always clear, nor is the responsibility for outcomes resulting from errors, bias and worse. When errors and bias are commonly held, this can be reflected in the information ecology. The tipping point need not be a majority, truth or based on ethics. It is easy enough to identify cases of mis-categorization, but when do you do something about it? What can you do about it?
Other Presentations and Special Sessions
Metadata 2020: Metadata Connections Across Scholarly Communications [Presentation]
Patricia Feeney, Head of Metadata at Crossref
Metadata 2020 is a nonprofit collaboration that advocates for richer, connected, reusable, and open metadata for all research outputs. The collaboration of over 100 individuals includes representatives from publisher, librarian, service provider, data publisher and repository, researcher, and funder communities. In 2018, Metadata 2020 formed six cross-community, collaborative projects. These projects include activities to map between schemas; define core element terminology; create principles and share best practices; chart metadata evaluation tools; and, significantly, communicate with researchers and organizations about incentives for improving metadata. In this presentation, we will briefly outline each project and then present in more detail the ‘Metadata Recommendations and Element Mappings’ and ‘Incentives for Improving Metadata’ projects (the latter of which includes the development of a metadata flow diagram), showing work to date and inviting participation from attendees to help progress the work to more fully represent the librarian community. While the Metadata 2020 collaboration has many highly experienced individuals participating, we believe that it is important to learn from the experience of others who have worked on similar projects in the past, and would be grateful for input from the DCMI community.
Lightweight rights modeling and linked data publication for online cultural heritage [Special Session]
Antoine Isaac, Mark Matienzo, Michael Steidl
Institutional websites and aggregation initiatives like Europeana and DPLA seek to facilitate access to and re-use of vast amounts of digitized cultural material online. Metadata about digitized content has long been identified as a key asset to facilitate these ends, and these initiatives have created metadata frameworks that enhance interoperability across information spaces and systems. Expressing the conditions for re-use that derive from intellectual property rights remains an issue, however. Published (meta)datasets still often indicate copyrights and other access conditions using ad-hoc descriptions that are specific to sectors, languages and national contexts. Creative Commons is a great leap forward, as it provides a standardized set of licenses and public domain marks that can be used to label open digital heritage resources in an interoperable way. Its focus on full openness, however, means that it cannot be used for a significant part of cultural collections published online. Recently, W3C has published the Open Digital Rights Language (ODRL) for representing policies that combine permissions and duties. While ODRL enables the expression of rights-related statements of arbitrary complexity, it does not provide a set of community-backed statements that can be reused out of the box to label cultural resources. RightsStatements.org is an international initiative that aims to fill these gaps, offering the cultural heritage domain the resources to label, in an interoperable way (using Linked Data technology), digitized objects that are not always in scope for fully open publication. In this special session, we will present the challenges that RightsStatements.org has to address to provide a service useful to the digital heritage domain. After a discussion of the context and issues of expressing rights to access and re-use digital cultural material, we will present RightsStatements.org's offering as a complement to initiatives like Creative Commons.
We will then dive into the details of implementation and use of the statements and services that RightsStatements.org provides. We will focus first on data modeling, presenting how rights statements are expressed in a lightweight and interoperable way, both for machines and humans, based on Linked Data principles and vocabularies. We will then relate our work to other relevant initiatives in the community, both in terms of (1) standardized and/or shareable sets of statements, including projects such as Wikidata, and (2) frameworks to express statements in a more complex way, such as W3C's ODRL. Finally, we will seek to build bridges with efforts to express rights and licenses in other domains relevant to the Dublin Core audience, such as the ongoing work on the W3C DCAT vocabulary. For every main agenda item in the session, we have planned "interaction points", not only opening the floor to questions from the audience, but also asking attendees about their experience with expressing intellectual property rights and other (non-)legal conditions, asking them for feedback on the modeling choices made in RightsStatements.org, evaluating the labeling of some objects in Europeana, and discussing how the community should further organize itself to tackle rights issues better, if needed.
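To illustrate the lightweight modeling discussed above, the snippet below serializes the kind of triple used to label a digitized object with a RightsStatements.org statement via the Europeana Data Model's rights property. The object URI is a made-up example; InC/1.0/ ("In Copyright") is one of the initiative's published statements.

```python
# Sketch: attaching a RightsStatements.org statement to an object in
# Turtle. The object URI below is hypothetical; edm:rights is the
# Europeana Data Model property commonly used for this labeling.

RS = "http://rightsstatements.org/vocab/"
EDM_RIGHTS = "http://www.europeana.eu/schemas/edm/rights"

def rights_triple(object_uri: str, statement: str) -> str:
    """Serialize one edm:rights triple (statement e.g. 'InC/1.0/')."""
    return f"<{object_uri}> <{EDM_RIGHTS}> <{RS}{statement}> ."

print(rights_triple("http://example.org/object/1", "InC/1.0/"))
```

Because the statement is identified by a dereferenceable URI rather than free text, machines can interpret it consistently and humans can follow it to a multilingual explanation.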
LOD-KOS: A Framework for Private Enterprise Data as well as Public Open Data [Presentation]
Dave Clarke
Linked Open Data Knowledge Organization Systems (LOD-KOS) are defined by Dr. Zeng and Dr. Mayr in their 2018 paper Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review as ‘value vocabularies and lightweight ontologies within the Semantic Web framework’. The paper surveys open data examples in the sciences and humanities and describes a community movement to convert and make sharable ‘thesauri, classification schemes, name authorities and lists of codes and terms, produced before the arrival of the ontology-wave’ into the ‘Semantic Web mainstream’. This session will review several examples of open data LOD-KOS, and then contrast them with examples of how commercial enterprises are currently using the Linked Data model to manage commercially sensitive enterprise data. The session will explore the practical challenges faced by any enterprise that is required to manage a mixture of both public open data KOS resources and commercially sensitive KOS resources. The need emerges to support both collaboration and compartmentalization, and to do so flexibly. In order to do this, KOS management systems need to support flexible and extensible access control lists (ACLs) and also to assign ACL metadata to entities, predicates and data values. Both the public open data community and the private enterprise data community stand to benefit from a shared framework for curating KOS, and from mechanisms that will easily allow the selective sharing of some resources while protecting the confidentiality of others.
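The compartmentalization requirement described above can be sketched as access-control metadata attached to individual concepts, so that a mixed public/private KOS can be shared selectively. The roles and concept identifiers below are invented for illustration.

```python
# Hypothetical sketch: per-concept ACLs on a KOS, filtering which
# concepts a given role may see. Concept IDs and roles are invented.

concepts = {
    "skos:Concept/pricing-tier-internal": {"acl": {"staff"}},
    "skos:Concept/renewable-energy":      {"acl": {"public", "staff"}},
}

def visible_concepts(concepts, role):
    """Return the concept IDs whose ACL grants access to the role."""
    return [c for c, meta in concepts.items() if role in meta["acl"]]

print(visible_concepts(concepts, "public"))
# ['skos:Concept/renewable-energy']
```

A real system would, as the abstract notes, also scope ACLs to predicates and data values, not just whole entities.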
Metadata for Smart Sustainable Cities [Special Session]
Claudia Sousa Monteiro, Catarina Selada, Vera Nunes, Paula Monteiro, Ana Alice Baptista, João Tremoceiro
In the past years, the Smart City concept has emerged as a way of optimizing the management of resource use, mainly due to increasing urbanization and population growth. There are many definitions of this concept, highlighting the central role that new digital technologies play in improving the operation of cities. Under this scope, Urban Analytics has evolved as a new research field, looking to data as a means to understand and study urban systems by transforming data into information and knowledge. Therefore, there is significant potential to improve data collection, integration and processing efforts concerning cities, with the successful use of linked data depending on its ability to provide an overview of city data at multiple scales and for different dimensions. In this sense, metadata has a central role in city data management, usefulness, and human/machine readability. This session aims to foster collaboration between various city stakeholders and bring to the discussion the challenges and opportunities of data availability, access and applications in developing new and innovative solutions to make cities smarter in enhancing sustainable change.
The Use Of Persistent Identifiers In Dublin Core Metadata [Special Session]
Paul Walk, Tom Baker
This session will bring together stakeholders and metadata experts to discuss the representation of persistent identifiers (PIDs) in Dublin Core metadata, with a particular focus on the domain of scholarly communications and Open Access. This domain recognises the importance of PIDs in metadata - especially to identify scholarly outputs (using DOIs) and, increasingly, to identify authors (often using ORCIDs). The experiences and recommendations discussed here will almost certainly have wider applicability in many other domains. This session will be a working meeting. It will follow on from a project initiated by DCMI to develop some candidate recommendations. The anticipated outcome of this session will be a formal recommendation from DCMI.
Providing Access to Cultural Objects Curated in Digital Collections – Models and Profiles [Special Session]
Marcia Zeng, Shigeo Sugimoto, Chiranthi Wijesundara, Keven Liu, Cuijian Xia, Maja Žumer
This panel brings together researchers involved in the research and development (R&D) of structured data about information resources, with a focus on cultural objects (mainly non-conventional) curated in digital collections. The uniqueness of these collections is that they are not constrained by physical location or the premises of an institution; thus, aggregation and re-organization of metadata based on a common model is needed. This uniqueness is also reflected in the cases of digital collections with which the panelists have been involved, including: intangible cultural heritage in developing countries in South/Southeast Asia; Japanese pop culture (particularly Manga); disaster archive records; genealogy records; ancient Chinese books; and music resources in general. The panelists will share developments and research findings in three layers: (1) modeling for the domain in question, (2) extension and refinement of conceptual models, and (3) construction of application profiles and knowledge bases, which are used in real object descriptions, as well as platform construction built on the data models. The panelists will discuss challenges, processes, limitations, and strategies.
The times they are a changin' - implementing a modern library and information science curriculum [Presentation]
Magnus Pfeiffer
In the past decade, the consensus on what competencies a graduate with a library and information science degree should have has started shifting. With the ongoing digitization of workflows and the creation of new online services, IT competencies have risen in demand. Three years ago, the school of library and information management at Stuttgart Media University started the process of overhauling its curriculum in response to this change. The presentation will cover the challenges of integrating new IT subjects such as programming, data management, database design and web-based services into a library science curriculum. It will discuss which competencies were considered necessary for modern metadata management tasks and the didactic concepts that were developed to teach them to a heterogeneous audience. As the changes have now been in effect for two years, the results of an evaluation will also be presented.
Posters
Linked Data Publishing and Ontology in Korea Libraries [Poster]
Mihwa Lee, Yoonkyung Choi
This poster analyzes LOD publishing and the reuse of external LOD, and suggests future directions for LOD services in Korea. For this study, literature reviews and case studies were carried out. For the case study, KERIS, NLK, and KISTI were selected, as the major organizations involved in library linked data in Korea. They have been publishing linked open data for bibliographic records and authority data, interlinking with external LOD such as VIAF, LDS, BNB, ISNI, WorldCat, and so on. We analyzed the characteristics of the three services in terms of (1) subject domain, (2) volumes of bibliographic, authority, and subject data, (3) bibliographic, name, and subject ontology, (4) local ontology, and (5) interlinking with external LOD. Comparing the three LOD services in terms of ontology, FOAF, SKOS, DC, and BIBO are common to all, while MODS, DCTERMS, BIBFRAME, PRISM, and BibTeX are used only by some. All three services also define their own local ontologies (properties and classes). These local properties and classes lack consistency and may conflict across ontologies. Interoperability is a key metadata requirement, yet local ontologies were developed because existing ontologies lacked the classes and properties needed to describe the data being published as LOD. It is for this reason that the Library of Congress developed BIBFRAME as an ontology specific to the library sector.
Author Identifier Analysis: Name Authority Control in Two Institutional Repositories [Poster]
Marina Morgan, Naomi Eichenlaub
The aim of this poster is to analyze name authority control in two institutional repositories to determine the extent to which faculty researchers are represented in researcher identifier databases. A purposive sample of 50 faculty authors from Florida Southern College (FSC) and Ryerson University (RU) was compared against five authority databases: Library of Congress Name Authority File (LCNAF), Scopus, Open Researcher and Contributor ID (ORCID), Virtual International Authority File (VIAF), and International Standard Name Identifier (ISNI). We first analyzed the results locally, then compared them between the two institutions. The findings show that while the LCNAF and Scopus results are comparable between the two institutions, the differences in ORCID, VIAF, and ISNI coverage are considerable. Additionally, the results show that the majority of authors at each institution are represented in two or three external databases. This has implications for enhancing local authority data by linking to external identifier authority data to augment institutional repository metadata.
Visualizing Library Metadata for Discovery [Poster]
Myung-Ja K. Han, Stephanie R. Baker, Peiyuan Zhao, Jiawei Li
The benefits of visualization have been widely discussed, and visualization has in fact been implemented in library services. However, visualization efforts have mostly focused on collection analysis to improve collection development policies and budget management, not on discovery services that exploit the library's catalog records to their full capacity. One of the challenges of working with library catalog records for visualization is the sheer number of elements in the MAchine-Readable Cataloging (MARC) format record, such as control fields, data fields, subfields, and indicators, used to describe library resources. As is well known, there are more than 1,900 fields in MARC, which is simply too many to use for visualization (Moen and Benardino, 2003). Instead of showing clear relationships between resources, including so many elements may muddle those relationships. The question, then, is whether all information included in the library catalog record should be used for discovery and visualization services, and if not, what the essential information to include should be.
Building a Framework to Encourage the use of Metadata in Modern Web-Design [Poster]
Jackson Morgan
When Tim Berners-Lee published the roadmap for the semantic web in 1998, it was a promising glimpse into what could be accomplished with a standardized metadata system, but nearly 20 years later, adoption of the semantic web has been less than stellar. In those years, web technology has changed drastically, and techniques for implementing semantic web compliant sites have become relatively inaccessible. This poster outlines a JavaScript framework called Beltline.js which seeks to encourage the use of metadata by making it easy to integrate into modern web best-practices.
Analysis of user-supplied metadata in a health sciences institutional repository [Poster]
Joelen Pastva
Launched in October 2015 by the Galter Health Sciences Library, the DigitalHub repository is designed to capture and preserve the scholarly outputs of Northwestern Medicine. A major motivation to deposit in the repository is the possibility of improved citations and discovery of resources; however, one of the largest barriers hampering discovery is a lack of descriptive metadata. Because DigitalHub was designed for ease of use, very minimal metadata is required to successfully deposit a resource. However, many optional descriptive metadata fields are also made available to encourage the consistent and detailed entry of descriptive information. The library wanted to evaluate how users approached the available metadata fields and accompanying instructions prior to the library's metadata enhancement operations. To evaluate user-supplied metadata, all of the metadata in DigitalHub for a 2.5-year period was exported. Records previously enhanced by librarians, or initially deposited by library staff, were excluded from consideration. The metadata was then evaluated for completeness, choice of dropdown terms for resource type, inclusion of collaborators, use of controlled vocabulary fields, and any areas that indicated a clear misunderstanding of the intended use of a metadata field. This poster presents the preliminary findings of this analysis of user-supplied metadata. It is hoped that the findings will help guide future system and interface design decisions, cleanup activities, and library instruction. Ultimately the goal is to make the interface as usable and effective as possible, encouraging depositors to supply an optimal amount of descriptive metadata upfront and to continue using the repository in the future. These results should be of interest to repository managers who rely on users to supply initial descriptive metadata, especially in the health sciences.
Workshops
Domain Specific Extensions for Machine-actionable Data Management Plans [Workshop]
João Cardoso, Tomasz Miksa
The current manifestation of Data Management Plans (DMPs) only contributes to the perception that DMPs are an annoying administrative exercise. What they really are—or at least should be—is an integral part of research practice, since today most research across all disciplines involves data, code, and other digital components. There is now widespread recognition that, underneath, the DMP could have more thematic, machine-actionable richness with added value for all stakeholders: researchers, funders, repository managers, ICT providers, librarians, etc. As a result, parts of the DMP could be automatically generated and shared with collaborators or funders. To achieve this goal we need: (1) a good understanding of research data workflows, (2) research data management infrastructure, and (3) a common data model for machine-actionable DMPs. In this workshop we will focus on the common data model for machine-actionable DMPs and will seek to identify which domain-specific extensions must be implemented to fulfill the requirements of stakeholders such as digital libraries and repositories. We will discuss which information they can provide and which information they can expect, and how existing and future systems and services can support and potentially automate this information flow. [more information]
18th European Networked Knowledge Organization Systems (NKOS) [Workshop]
The proposed joint NKOS workshop at TPDL2018 / DCMI2018 will explore the potential of KOS such as classification systems, taxonomies, thesauri, ontologies, and lexical databases, in the context of current developments and possibilities. These tools help to model the underlying semantic structure of a domain for purposes of information retrieval, knowledge discovery, language engineering, and the Semantic Web. The workshop provides an opportunity to discuss projects, research and development activities, evaluation approaches, lessons learned, and research findings. A further objective is to systematically engage in discussions in common areas of interest with selected related communities and to investigate potential cooperation. [more information]
Web Archive – An introduction to web archives for Humanities and Social Science research [Workshop]
Daniel Gomes, Jane Winters
We now have access to two decades of web archives, collected in different ways and at different times, which constitute an invaluable resource for the study of the late 20th and early 21st centuries. Researchers are only just beginning to explore the potential of these vast archives, and to develop the theoretical and methodological frameworks within which to study them, but recognition of that potential is becoming ever more widespread. This workshop seeks to explore the value of web archives for scholarly use, to highlight innovative research, to investigate the challenges and benefits of working with the archived web, to identify opportunities for incorporating web archives in learning and teaching, and to discuss and inform archival provision in all senses. [more information]
1st International Workshop on Reframing Research (RefResh) [Workshop]
Andrea Mannocci, Francesco Osborne, Paolo Manghi
Over the last decade, research has been scaling up in terms of publications, authors, contributing institutions and funded projects. The research literature is now estimated to amount to 100–150 million publications, growing at a rate of around 1.5 million new publications per year. The fourth paradigm shift of science on the one hand, and the ever-increasing availability of data about research drivers and outcomes on the other, have enabled scientists and researchers to “place the practice of science itself under the microscope” and to dissect and analyse it in unprecedented ways. For these reasons, it is of paramount importance to study such an articulated, global-scale, evolving system in order to understand its dynamics, patterns, internal equilibria and interactions among diverse scientific actors. In particular, recent studies have shown that a holistic study of research as a complex phenomenon embedded in a delicate socioeconomic and geopolitical context, rather than as an isolated, context-unaware system, can provide deeper insight into how research and researchers influence, and are influenced by, the world outside academia. Such analysis can answer socioeconomic questions, frame academic research on a geopolitical canvas, provide insight into the factors that generate successful science, and support better resource allocation, with greater impact and efficacy as a result. [more information]
Multi-domain Research Data Management: from metadata collection to data deposit [Workshop]
Ângela Lomba, João Aguiar Castro
Framed by the many initiatives pushing for Open Science, the Research Data Management workshop at TPDL/DCMI 2018 will offer participants an informal venue to share experience with domain experts, deal with practical data management issues, and explore open-source RDM tools. Researchers are expected to participate and reflect on aspects relevant to Open Science policies. The workshop is organized in two sessions. The first is dedicated to domain-specific challenges and perspectives, based on presentations and a round table for open discussion. The second is a hands-on event in which participants collaborate in a field experiment, collecting metadata with LabTablet and then synchronizing it with the Dendro data organization platform.
Internet of Things Workshop: Live Repositories of Streaming Data [Workshop]
Artur Rocha, Alexandre Valente Sousa, Joaquin Del Rio Fernandez, Hylke van der Schaaf
The workshop focuses on demonstrating a set of standards and good practices used in the Internet of Things and discussing how they can be used to leverage FAIR, evidence-based science. Implementations based on standards such as the OGC Sensor Observation Service or the OGC SensorThings API, and on well-established open frameworks such as FIWARE, will be demonstrated. Participants will be given the opportunity to try out these tools and take part in moderated panels. [more information]
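To give a flavour of the standards named above: the OGC SensorThings API exposes sensor data as linked REST collections (Things, Datastreams, Observations) queried with OData-style options. The sketch below only builds such a query URL; the server address and Datastream id are hypothetical assumptions for illustration, not details from the workshop.

```python
# Illustrative sketch; hypothetical endpoint, not part of the workshop text.
BASE = "https://sensors.example.org/v1.0"  # assumed SensorThings service root

def latest_observations_url(datastream_id: int, count: int = 5) -> str:
    """URL for the `count` most recent observations of one Datastream,
    using the standard OData-style query options of SensorThings v1.0."""
    options = "&".join([
        f"$top={count}",                   # limit result size
        "$orderby=phenomenonTime%20desc",  # newest readings first
        "$select=phenomenonTime,result",   # only the fields we need
    ])
    return f"{BASE}/Datastreams({datastream_id})/Observations?{options}"

url = latest_observations_url(42)
print(url)
```

A GET on such a URL returns a JSON document whose `value` array holds the observations; pagination and filtering work through further `$`-prefixed options.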
Metadata for Manufacturing [Workshop]
Ana Alice Baptista, João P. Mendonça, Paula Monteiro
The use of Linked Data (LD) in manufacturing holds great potential, given the diversity of products requiring technical description and interrelation, whether within one sector, between industrial sectors, or between manufacturing and other sectors of activity such as logistics and trade. One obvious use is in catalogs of parts or end products. Having this information in RDF potentially facilitates not only business-to-business but also business-to-consumer relationships, by making detailed searches, comparisons and product relationships across the Web much faster and more reliable. The potential of LD principles and technologies in manufacturing goes well beyond catalogs, however. Business-to-business data sharing requires interoperability and independence from proprietary formats. While in some cases standards exist that ensure this, in others standards either do not exist or are insufficient to guarantee semantic interoperability without endangering industrial property rights. DCMI, as a central worldwide entity in the field of metadata, has a leading role in all Linked Data developments. It cannot, therefore, fail to keep up with, and in some cases even lead, developments related to metadata in manufacturing. This special session is intended as a seed for the creation of a community or Special Interest Group on metadata for manufacturing within DCMI.
Tutorials and Hands-on Sessions
Linked Data Generation from Digital Libraries [Tutorial]
Anastasia Dimou, Pieter Heyvaert, Ben Demeester
Knowledge acquisition, modeling and publishing are important in digital libraries with large heterogeneous data sources for constructing knowledge-intensive systems for the Semantic Web. Linked Data increases data shareability, extensibility and reusability. However, using Linked Data as a means to represent knowledge has proven to be easier said than done! During this tutorial, we will elaborate on the importance of semantically annotating data and on how existing technologies facilitate the generation of the corresponding Linked Data. We will introduce the [R2]RML language(s) for generating Linked Data from heterogeneous data, and non-Semantic Web experts will annotate their data with the RMLEditor, which keeps all underlying Semantic Web technologies invisible. By the end, participants, independently of their background, will have modeled, annotated and published some Linked Data on their own! [more information]
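To illustrate what "generating Linked Data from heterogeneous data" amounts to: the tutorial itself uses declarative [R2]RML mappings and the RMLEditor, but the underlying idea, turning source records into RDF triples, can be sketched in plain code. The record data and base URI below are made up for illustration.

```python
# Illustrative sketch only: the tutorial uses [R2]RML mappings, not
# hand-written code. This shows the idea of mapping tabular records
# to RDF triples in N-Triples syntax, using Dublin Core properties.
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

records = [  # hypothetical source data
    {"id": "book1", "title": "Metadata Basics", "creator": "A. Author"},
]

def to_ntriples(record, base="http://example.org/item/"):
    """Serialize one record as N-Triples lines: one triple per field."""
    subject = f"<{base}{record['id']}>"
    return [
        f'{subject} <{DCT}title> "{record["title"]}" .',
        f'{subject} <{DCT}creator> "{record["creator"]}" .',
    ]

triples = to_ntriples(records[0])
print("\n".join(triples))
```

An RML mapping expresses the same subject/predicate/object rules declaratively, so the transformation can be reused across data formats without changing code.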
Research the Past Web using Web archives [Tutorial]
Daniel Gomes, Daniel Bicho, Fernando Melo
The Web is the largest source of public information ever built. However, 80% of web pages disappear or change content within one year. The main objectives of this tutorial, provided by the Arquivo.pt team, are to motivate the pertinence of web archiving, present use cases, and share recommendations for creating preservable websites for future access. The tutorial introduces tools to create and explore web archives, and presents methods and technologies for developing web applications that automatically access and process information preserved in web archives, for instance using the Wayback Machine, the Memento Time Travel protocol or the Arquivo.pt API. [more information]
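The Memento protocol mentioned above negotiates in the time dimension: a client sends an `Accept-Datetime` header to a TimeGate, which points it to the archived snapshot closest to that moment. A minimal sketch of building that header (the example date is the one used in the Memento specification; the surrounding workflow is an illustration, not Arquivo.pt-specific):

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def accept_datetime_header(moment: datetime) -> dict:
    """Headers for a Memento (RFC 7089) TimeGate request: ask an archive
    for the capture of a page closest to `moment` (RFC 1123 date format)."""
    stamp = format_datetime(moment.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": stamp}

# Example: request the version of a page as it was on 31 May 2007.
headers = accept_datetime_header(datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc))
print(headers["Accept-Datetime"])  # Thu, 31 May 2007 20:35:00 GMT
```

The TimeGate answers with a `Memento-Datetime` header and a `Link` header pointing at the original resource and neighbouring captures.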
Europeana hands-on session [Tutorial]
The Europeana REST API allows you to build applications that use the wealth of Europeana collections drawn from the major libraries, museums, archives, and galleries across Europe. The Europeana collections contain over 54 million cultural heritage items, from books and paintings to 3D objects and audiovisual material, from over 3,500 cultural institutions across Europe. Over the past couple of years, the Europeana REST API has grown beyond its initial scope as set out in September 2011 into a wide range of specialized APIs. At the moment, we offer several APIs that you can use not only to get the most out of Europeana but also to contribute back. This tutorial session will walk you through the wide range of APIs that Europeana now offers, followed by a hands-on session where you can experience first-hand what you can do with them. [more information]
DCMI Meetings
DCMI Governing Board Meeting [Closed meeting]
This is the annual meeting of the DCMI Governing Board. This is a closed meeting.
DCMI Open Community meeting [Open meeting]
This meeting is intended to allow anyone from the DCMI community to bring ideas for discussion. All are invited. The meeting will be facilitated informally, in an unconference style, so bring your idea!

Twitter hashtag: #dcmi18