Panel 4: Metadata Creation, Discovery, and AI Applications

Starts at
Wed, Nov 8, 2023, 10:30 South Korea Time
(08 Nov 23 01:30 UTC)
Finishes at
Wed, Nov 8, 2023, 12:00 South Korea Time
(08 Nov 23 03:00 UTC)
Venue
Gyeongha Hall 1
Moderator
Jian Qin
Rapid developments in AI and machine learning (ML) are generating new excitement in almost all fields of learning and sectors of industry. Libraries, archives, and museums (LAMs), as cultural and social institutions, are no exception. In fact, LAMs have long been doing the foundational work of creating structured, semantically rich (meta)data, which prepares them for the disruptive changes that AI and ML will bring to LAM institutions in many ways. Metadata creation and discovery, the core area of LAM work, is quietly changing from traditional records of text strings to new data structures, workflows, and metadata products. Linked data, ontology models, and metadata for digitized and born-digital assets now form established new services and operations in LAM institutions. Issues remain to be explored and challenges to be addressed in how LAM institutions can take full advantage of AI and ML to develop new data models and workflows. This panel presents three projects that explore the application of knowledge models/ontologies and AI and ML techniques in representing and enhancing the power of metadata in information discovery and use/reuse.

Moderator

  • Jian Qin

    Syracuse University

    Jian Qin is a Professor at the School of Information Studies, Syracuse University. Her research focuses on metadata and knowledge modeling, knowledge organization, research data management, and scholarly communication. She has published widely and given presentations at numerous national and international conferences and workshops. Her research has been funded by the U.S. National Science Foundation, the U.S. National Institutes of Health, and the Institute of Museum and Library Services, among others. Jian Qin is a co-author of the book Metadata and the recipient of the 2020 Frederick G. Kilgour Award for Research in Library and Information Technology.

Presentations

Knowledge Organization in Digital Humanities Research: A Case Study of Archives of Qing Secret Societies

This report examines the knowledge organization involved in digital humanities research, using the project "The Secret Societies in Qing China: Archival Studies and Digital Humanities" as a case study. The study focuses on the following aspects: (1) Transforming historians' preliminary research questions into competency questions that are machine-processable and analyzable, then developing the domain ontology and a SPARQL query template for use with Linked Open Data. (2) Defining Qing dynasty official documents as a specific genre and deconstructing the important features of this type of text from perspectives such as form, content, and function. This serves as the design foundation for the metadata format, authority files, and ontology. (3) Exploring ways for humans and machines to collaborate, using tools such as a text analysis system and ChatGPT to assist in the named entity recognition (NER) of related subjects, properties, and objects within the archival content, as well as in extracting the roles of semantic relationships between entities. This can also strengthen automatic metadata construction, so that entity information from different archival sources can be identified and integrated.

  • Shu-Jiun (Sophy) Chen

    Academia Sinica Center for Digital Culture

    Shu-Jiun (Sophy) Chen holds a PhD in library and information science from National Taiwan University and a master's degree in information science from the University of Sheffield, UK. She is the executive secretary of the Academia Sinica Center for Digital Culture and has long conducted vocabulary research projects with the Getty Research Institute. Her research interests include knowledge organization, linked data, and digital humanities. She specializes in multilingual indexes for knowledge-based digital collection systems and focuses her research on the theory and methods of linking digital resources with ontologies for cultural memory institutions. Dr. Chen has published widely and given numerous presentations in the knowledge organization and digital humanities areas.

Towards AI/ML-Friendly Metadata: A Metadata Architecture for Effective AI/ML Dataset Management

In AI/ML, data serves as the raw material from which models learn and make predictions. Training AI/ML models demands substantial quantities of high-quality data. Regardless of the algorithm's sophistication, realizing its full potential hinges on the presence of a robust data infrastructure. This data infrastructure is composed of AI/ML datasets. An AI/ML dataset is a collection of data used to train and evaluate artificial intelligence (AI) and machine learning (ML) algorithms. For AI/ML projects, datasets may be built from scratch, derived from large datasets, such as library collections, or reused from existing pre-packaged datasets. Effectively managing them is crucial to ensure that AI and ML models are trained on high-quality, representative data, which is essential for achieving accurate and reliable results in AI/ML applications.

An AI/ML-friendly metadata architecture is critical for effective dataset management. Such an architecture needs to meet the following basic technical requirements: 1) furnish structured information about data content, sources, and context to enhance dataset discovery; 2) precisely specify the dataset's size; 3) enumerate and define labels comprehensively; 4) clearly identify data types; 5) offer comprehensive insights into data preprocessing, cleaning, and transformation procedures; 6) deliver in-depth details about data scope, including contents, timeframes, and geographic coverage.

A practical use case will be presented for developing a metadata solution for a dataset that is used in an AI/ML-powered project for detecting illustration objects within Chinese rare books.
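The six requirements listed above could be sketched as a simple record type, shown here only as an illustration. This is a minimal sketch and not the architecture the presentation describes: the class name `DatasetMetadata`, every field name, and all sample values are hypothetical placeholders invented for this example.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class DatasetMetadata:
    """Hypothetical record with one group of fields per listed requirement."""
    # 1) Content, sources, and context (supports dataset discovery)
    title: str
    description: str
    sources: List[str]
    # 2) Dataset size
    num_items: int
    # 3) Labels, each with a definition
    labels: Dict[str, str]
    # 4) Data types
    data_types: List[str]
    # 5) Preprocessing, cleaning, and transformation steps
    preprocessing_steps: List[str]
    # 6) Scope: timeframe and geographic coverage
    temporal_coverage: str
    geographic_coverage: str

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are empty (or not positive, for size)."""
        missing = []
        for name, value in asdict(self).items():
            if isinstance(value, int):
                if value <= 0:
                    missing.append(name)
            elif not value:
                missing.append(name)
        return missing


# All values below are invented placeholders, not data from the project.
record = DatasetMetadata(
    title="Rare-book illustration detection set (placeholder)",
    description="Page images annotated with illustration bounding boxes.",
    sources=["placeholder: digitized rare-book collection"],
    num_items=1000,
    labels={"illustration": "A pictorial region on a page."},
    data_types=["image/jpeg", "bounding-box annotations"],
    preprocessing_steps=[],
    temporal_coverage="placeholder timeframe",
    geographic_coverage="",
)
print(record.missing_fields())  # ['preprocessing_steps', 'geographic_coverage']
```

A completeness check like `missing_fields` is one way to flag records that fall short of the requirements before a dataset is published or reused.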

  • Haiqing Lin

    C.V. Starr East Asian Library, UC Berkeley

    Mr. Haiqing Lin is the director of the East Asian Library Technology Department at the University of California, Berkeley. Before joining the University of California, he served as the head of the Asian Language Department and the Asian Studies Subject Librarian at the University of Auckland, New Zealand. His research interests focus on digital libraries and the application of new information technologies in library services, especially how the development of network-based technologies promotes the transformation of academic libraries.

Metadata and Trustworthy AI

As powerful and useful as AI and ML tools are, there are risks in applying them if they are not sufficiently explainable, robust, or transparent, or if they do not adhere to fairness and privacy principles. Any of these shortcomings can seriously jeopardize the trustworthiness of AI. Metadata play an important role in capturing and documenting the data (input and output), parameters, models, environment configurations, and versions of data and models. All of these contribute to the trustworthiness of AI. This presentation reviews the requirements for trustworthy AI as well as the metadata categories that have been used in representing trustworthy AI. Through examples, issues and challenges will be discussed for AI applications in metadata generation/creation.
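As an illustration of the kinds of information named above, the following minimal sketch records the inputs, outputs, parameters, model, environment, and versions for a single AI-assisted metadata run. The function names, dictionary keys, and sample values are assumptions made for this example and are not taken from the presentation.

```python
import hashlib
import json
import platform
import sys


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact bytes used."""
    return hashlib.sha256(data).hexdigest()


def build_run_record(input_data, output_data, parameters,
                     model_name, model_version, data_version):
    """Collect hypothetical provenance metadata for one model run."""
    return {
        "input_sha256": fingerprint(input_data),
        "output_sha256": fingerprint(output_data),
        "data_version": data_version,
        "parameters": dict(parameters),
        "model": {"name": model_name, "version": model_version},
        "environment": {
            "python": platform.python_version(),
            "platform": sys.platform,
        },
    }


# Placeholder values only; no real model or dataset is referenced.
record = build_run_record(
    input_data=b"scanned page text",
    output_data=b'{"title": "placeholder"}',
    parameters={"temperature": 0.0},
    model_name="example-model",
    model_version="0.1",
    data_version="2023-11",
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Hashing the input and output rather than storing them keeps the record small while still allowing a later check that the same data were used.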

  • Jian Qin

    Syracuse University

    Jian Qin is a Professor at the School of Information Studies, Syracuse University. Her research focuses on metadata and knowledge modeling, knowledge organization, research data management, and scholarly communication. She has published widely and given presentations at numerous national and international conferences and workshops. Her research has been funded by the U.S. National Science Foundation, the U.S. National Institutes of Health, and the Institute of Museum and Library Services, among others. Jian Qin is a co-author of the book Metadata and the recipient of the 2020 Frederick G. Kilgour Award for Research in Library and Information Technology.