NKOS workshop

This session includes a 60-minute lunch break from 13:00 to 14:00.
Starts at
Sun, Oct 20, 2024, 09:00 EDT
Finishes at
Sun, Oct 20, 2024, 17:30 EDT
Venue
Room B
Moderator
Joseph Busch

Artificial intelligence (AI) is broadly defined as the use of automation to solve problems by reasoning autonomously. Today, the most popular AI method is large language models (LLMs). But there are many other automation methods, such as rules-based, machine learning, vectors, n-grams, clustering, filtering, NLP (natural language processing), NLG (natural language generation), etc., that can make automation intelligent. While there is a tendency to focus on one primary method, most AI applications use several methods.

The NKOS Workshop is particularly interested in how knowledge organization systems (KOS) are being used or can be used to make automation intelligent. For example, one problem with LLMs is “hallucinations,” where the application generates a response to a prompt that is “correct” but not true. How can KOS be integrated with LLMs to guide their responses so that they do not produce “hallucinations”?

Moderator

  • Joseph Busch

    Taxonomy Strategies

    Mr. Busch is an authority in the field of information science, with an emphasis on helping organizations develop metadata frameworks and taxonomy strategies to ensure that content realizes its highest value through re-use and re-purposing. He has extensive knowledge and experience developing content architectures consisting of metadata frameworks, taxonomies and other information management methods to implement effective applications. He is currently on a full-time assignment as the senior business classification analyst for the African Development Bank, which is based in Abidjan, Côte d’Ivoire.

Presentations

Exploring Patient Perspectives on Anticipating and Mitigating Potential Harms of LLMs in Depression Self-Management

Large Language Models (LLMs), such as ChatGPT, are increasingly being integrated into healthcare, and their application in mental health, particularly in managing depression, presents both potential benefits and challenges. This study investigates how LLM-based chatbots can empower users by providing instant, personalized support while addressing the need for robust safety mechanisms in sensitive mental health contexts. Participants will engage in one-hour remote interviews, interacting with a ChatGPT API-powered chatbot focused on the self-management of depression. Reflexive thematic analysis will be used to identify themes related to user perceptions and potential harms. Anticipated outcomes include insights into the effectiveness of chatbots in managing depression, potential harms, and design implications for safe and effective LLMs for depression self-management. The findings aim to enhance knowledge organization systems within LLMs by improving the structuring and access of mental health information. Preliminary results will be presented at the workshop, showcasing the data collection webpage developed using the ChatGPT API.
  • Dong Whi Yoo

    Kent State University

    Dong Whi Yoo is a researcher and media artist who studies the interaction between society and emerging technologies. As a human-computer interaction (HCI) researcher, he explores design implications for emerging technologies, particularly for marginalized and underrepresented groups such as people with mental health disorders. Over the past few years, he has worked with individuals with schizophrenia to understand their underrepresentation in AI development and to design predictive algorithms that support their work practices. He investigates how people with psychotic disorders make sense of their symptoms and build their identities. His studies have been published in leading HCI and digital mental health venues, including CHI, CSCW, PervasiveHealth, JMIR and Internet Interventions.

Using Gene Ontology and ML Algorithms for Dataset Design and Creation for ML/AI Modeling

Authors: Qiaoyi Liu, Jian Qin

This demo proposal presents a case study that uses Gene Ontology (GO) and ML/AI algorithms to design and create KO-derived datasets for ML/AI applications. We discuss the characteristics and requirements of KO practices and products in implementing ML algorithms. The focus of this demo is on how knowledge organization systems can be used to derive datasets that deliver the quality and trustworthiness needed for precision, semantic similarity computation, and interoperability in these algorithms.
  • Qiaoyi Liu

    Syracuse University

    I’m a PhD student studying Information Science and Technology at Syracuse University School of Information Studies. I’m also a member of the Metadata Lab. I have an MS degree in Library and Information Science (SU, G’23) and a BS degree in Biological Sciences (CNU, G’19). My research interests are Science of Science (SoS) and Knowledge Organization Systems (KOS).
  • Jian Qin

    Syracuse University

    Jian Qin is a Professor in the iSchool at Syracuse University. She conducts research in metadata, knowledge modeling and representation, ontologies, research collaboration networks, research impact assessment, and data curation. Jian Qin directs the Metadata Lab, a research group focusing on big metadata analytics and knowledge modeling. Her research has received funding from the US NSF, NIH, and IMLS, among others. She has published more than 100 journal and conference papers in the fields of information science, scientometrics, knowledge organization, and metadata, and has been invited to give keynotes, lectures, and presentations at conferences and institutions inside and outside of the U.S. She is the co-author of the book Metadata and co-editor of several special journal issues on knowledge discovery in databases and knowledge representation. Jian Qin has served as the DCMI conference program chair and track chair and as a member or chair of numerous other conference program committees, including ASIST, iConference, and JCDL, among others. She received the 2020 Frederick G. Kilgour Award for Research in Library and Information Technology. Jian Qin holds a Ph.D. from the University of Illinois at Urbana-Champaign. Further information can be found at https://ischool.syr.edu/jian-qin/. A complete CV can be found at https://jianqin.metadataetc.org/wp-content/uploads/2023/08/Qin_CV.pdf.

Evaluating AI Assignment of Library of Congress Subject Headings (LCSH)

Authors: Brian Dobreski, Christopher Hastings

As with many areas of research and practice, the cultural heritage domain has shown increasing interest in the use of AI in recent years, with cultural heritage institutions such as libraries, archives, and museums actively exploring the use of AI tools in their workflows. Large language model (LLM)-based text applications including ChatGPT have been touted as holding great promise for cultural heritage work. One of the most challenging parts of producing library metadata specifically may be subject cataloging: the assignment of subject headings and classification numbers. This task requires cataloger fluency in the formal and often complex knowledge organization systems (KOS) used to represent aboutness and genre in bibliographic records. The work presented here is part of a larger, ongoing research project assessing the effectiveness of AI tools for performing subject analysis and representation tasks for cultural heritage data. In this presentation, researchers offer the results of a structured test of freely available AI tools to assign headings from Library of Congress Subject Headings (LCSH) to library materials. The findings add further empirical evidence to current discussions concerning the quality and reliability of AI-performed metadata work, and, more broadly, contribute to the growing discourse around the use of AI in applying KOS.
  • Brian Dobreski

    University of Tennessee, Knoxville

    Brian Dobreski is an Assistant Professor in the School of Information Sciences at the University of Tennessee, Knoxville. His research focuses on the practices and implications of knowledge and information organization, as well as the concepts of personhood and personal identity in information. Brian received his Ph.D. in information science from Syracuse University. He has authored works in publications including Journal of Documentation, Knowledge Organization, Cataloging & Classification Quarterly, Social Media + Society, Journal of Information Ethics, and Journal of Education for Library and Information Science.
  • Christopher Hastings

    University of Tennessee, Knoxville

    Christopher Hastings holds a B.A. in History from the University of California, San Diego. During and after his undergraduate studies he worked as a manuscript processor in the UCSD Special Collections and Archives. Currently, he is attending the University of Tennessee, Knoxville, pursuing an M.S. in Information Science. During his MSIS studies he assisted Dr. Brian Dobreski with research on the use of Artificial Intelligence for library cataloging. Hastings is involved with the Polar Libraries Colloquy and presented his own research on the mammoth ivory trade in Siberia at the 29th colloquy in Tromsø, Norway in June 2024.

Patent citation link prediction based on graph neural network

Authors: Wei Hu, Shuying Li, Ning Yang

Patent citation relationships constitute a citation network, and the predictability of edges in a network is a frontier research issue in complex networks. This article explores a prediction model for patent citation relationships. By integrating patent technical text content and classification code features, a graph neural network is trained for patent citation link prediction. The aim is to provide methodological support for technology knowledge diffusion and patent data management. This study collects patent data in the field of quantum sensing, constructs a network based on patent citation relationships, and extracts text features such as technical problems, solutions, functions, and effects. Taking into account the characteristics of natural language in patent documents, this article proposes a new model framework for link prediction based on graph neural networks.

In terms of the model framework, we first apply the GraphSAGE model to the training citation network to obtain the embedding vectors of patent nodes. Then, the semantic vectors of patent technical text are derived from pre-trained models such as PatentBERT. These two sets of vectors are then integrated and fed into a Random Forest model. Ultimately, we derive the predicted probability values for patent citation link prediction. Furthermore, in terms of interpretability, this study constructs a decision tree model based on the integrated results of the two sets of vectors. This model effectively measures the impact of multidimensional technical text content, local network structure, individual heterogeneity, and other factors on network edge formation.
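The vector integration step in this abstract can be illustrated with a minimal sketch. It assumes the structural embeddings (for example from GraphSAGE) and the text embeddings (for example from PatentBERT) have already been computed. The toy vectors, the patent identifiers, and the choice of plain concatenation for both fusion and pair construction are all assumptions made for illustration; the abstract does not say how the authors combine the vectors.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Toy placeholder embeddings (assumed values, not real model output).
graph_emb: Dict[str, Vector] = {
    "P1": [0.1, 0.9],
    "P2": [0.4, 0.2],
    "P3": [0.8, 0.3],
}
text_emb: Dict[str, Vector] = {
    "P1": [0.5, 0.5, 0.1],
    "P2": [0.2, 0.7, 0.6],
    "P3": [0.9, 0.1, 0.4],
}


def fuse_node(structural: Vector, textual: Vector) -> Vector:
    """Concatenate a patent's graph embedding with its text embedding."""
    return structural + textual


def pair_features(citing: Vector, cited: Vector) -> Vector:
    """Build one feature row for a candidate citation edge.

    Assumption: the pair is represented by concatenating both fused vectors.
    """
    return citing + cited


def build_rows(pairs: List[Tuple[str, str]]) -> List[Vector]:
    """Return one feature row per (citing, cited) candidate pair."""
    fused = {pid: fuse_node(graph_emb[pid], text_emb[pid]) for pid in graph_emb}
    return [pair_features(fused[a], fused[b]) for a, b in pairs]


rows = build_rows([("P1", "P2"), ("P3", "P1")])
```

Each row in `rows` could then be passed, together with a 0/1 label for whether the citation exists, to a classifier such as a Random Forest to obtain link probabilities.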

  • Wei Hu

    National Science Library (Chengdu), Chinese Academy of Sciences

    Dr. Wei Hu is an Assistant Research Fellow at the National Science Library (Chengdu) within the Chinese Academy of Sciences. He obtained his Ph.D. in Statistics from the School of Statistics at Renmin University of China. Dr. Hu's research interests encompass complex network modeling, link prediction, text mining, and knowledge organization. His work has been featured in esteemed journals such as Computational Statistics & Data Analysis, Electronic Journal of Statistics, and Data Analysis and Knowledge Discovery.
  • Ning Yang

    National Science Library (Chengdu), Chinese Academy of Sciences

    Yang Ning is a Senior Engineer at the National Science Library (Chengdu), Chinese Academy of Sciences. He has been selected as a Distinguished Research Fellow at CAS. He currently serves as the Deputy Director of the Knowledge Systems Department, as well as the Deputy Director of the Sichuan Province Engineering Research Center for Intelligent Mining and Application of Scientific and Technological Information. He obtained his PhD degree in Management at the University of the Chinese Academy of Sciences and was a Visiting Scholar at the School of Information at Kent State University in the United States. He has long been engaged in research in the fields of information organization and utilization, knowledge mining and services, and scientific data management and application. He has led one project funded by the National Social Science Fund, published over 20 papers in core journals and academic conferences such as Scientometrics and Library and Information Service, co-authored two books, holds two authorized invention patents, and has three software copyrights. He also serves as a peer reviewer for multiple journals and conferences.

Leveraging Generative AI for Multilingual Thesaurus Development: Insights from the Confucius Ceremony Cultural Vocabulary

Generative artificial intelligence (GAI), particularly that based on large language models (LLMs), has become an increasingly important tool in digital humanities. It enhances research efficiency in tasks such as content analysis, keyword extraction, automated metadata creation, and data management, uncovering previously difficult-to-observe phenomena and tackling challenging issues. Beyond data generation, GAI’s rapid content analysis and knowledge structure design capabilities offer new exploratory directions for constructing and designing thesauri based on Knowledge Organization Systems (KOS). Using the multilingual "Art & Architecture Thesaurus" (AAT) developed by the Getty Research Institute (GRI) as an example, the Academia Sinica Center for Digital Cultures (ASCDC) has collaborated with GRI for over a decade to address the inadequacies of localized cultural vocabulary. Chinese-language terms and concepts of material culture are converted into English and integrated into the AAT through translation and mapping. During this process, the conceptual structures of controlled vocabularies in Chinese and English terms present multiple alignment patterns, and a systematic methodology has been developed to support editorial work. This study aims to explore how GAI can assist in constructing a structured thesaurus based on the cultural conceptual vocabulary related to the Confucius Ceremony, with the goal of contributing this localized vocabulary to AAT.
  • Sophy Shu-Jiun Chen

    Academia Sinica

    Sophy Shu-Jiun Chen, Associate Research Fellow at Academia Sinica’s Institute of History and Philology, also serves as Executive Secretary of the Academia Sinica Center for Digital Cultures. She holds an M.A. in Information Studies from the University of Sheffield, UK, and a Ph.D. in Library and Information Science from National Taiwan University. Her research spans cultural heritage informatics, digital libraries, digital humanities, knowledge organization, and linked data. She initiated the Chinese AAT Taiwan project and established the Linked Open Data Lab at Academia Sinica.

The Ontology enhanced multimodal large language models for the Knowledge Organization and Representation of multi-modal cultural memory resources

The development of multimodal large language models (MLLMs) provides new solutions for the knowledge organization and representation of multimodal cultural memory resources. However, for some special cultural memory resources, such as the text, image, audio, and video resources related to Guqin Subtractive Character Notation (see Fig. 1 and Fig. 2), existing MLLMs need further optimization to achieve the expected results. Guqin Subtractive Character Notation is a distinct notation system rich in Chinese cultural significance, differing from both simplified notation and traditional staff notation. It shows the fingering techniques for playing the Guqin. Its symbols are not Chinese characters and can be recognized only by a very small number of professionals who have undergone long-term training. They cannot be recognized with existing OCR technologies.
This study will use multimodal Guqin Subtractive Character Notation resources as training data (see Tab. 1), combined with the Guqin ontology application profile and RDF data as prompt-tuning data, to explore a vertical application path for MLLMs in the field of cultural heritage and to develop a prototype system that displays the research results. The screen recording presents preliminary research results. By using the multimodal resources and the Guqin ontology with RDF data as instruction fine-tuning data to fine-tune the multimodal large language model, cross-modal retrieval with images and audio as the input query can be achieved. The ultimate goal of this study is to use the optimized MLLMs to help more people understand Guqin Subtractive Character Notation, especially the notation found in the large collections of ancient books in libraries.
  • Cuijuan XIA

    Shanghai Library

    Cuijuan (Jada) Xia is a Researcher at Shanghai Library, team leader of Shanghai Library's Digital Humanities (DH) projects, senior DH platform architect, and KOS (knowledge organization system) designer. She has played a major part in designing and developing the DH projects of Shanghai Library. She has collaborated with digital humanities researchers in different fields of the humanities and has participated in research projects at a number of digital humanities research institutions. She hosts and participates in many national research projects. Her research focuses on metadata, ontology, knowledge organization, linked data, digital humanities, and digital memory. She has published 3 books and more than 90 papers in academic journals. She is currently focusing on knowledge representation research of multimodal cultural memory resources for GenAI.

Using LLMs for Enriching Metadata with Links to KOS and Knowledge Graphs: Case Finnish Named Entity Linking

Authors: Rafael Leal, Annastiina Ahola, and Eero Hyvönen

This paper presents work on using Large Language Models (LLMs) to disambiguate Named Entity Linking candidates, with the aim of enriching the metadata of textual documents by linking them to Knowledge Organization Systems, a.k.a. domain ontologies, and Knowledge Graphs. We propose a zero-shot classification method that has similarities with Retrieval-Augmented Generation (RAG), and discuss a prototype tool, still under development, that allows for human intervention when making final disambiguation decisions, especially when these cannot be made reliably by automatic means. The focus of this work is on Finnish texts, so our methods must take into account the particularities of this language and the resources available for processing it.
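One way to picture zero-shot candidate disambiguation with a human fallback is the sketch below. It is an illustration under stated assumptions, not the authors' tool: the prompt wording, the `build_prompt` and `resolve` helpers, and the `ex:` identifiers are all hypothetical, and no model is actually called. Candidate descriptions are assumed to have been retrieved from a KOS or knowledge graph beforehand, which is where the resemblance to RAG comes in.

```python
from typing import Dict, List, Optional

Candidate = Dict[str, str]


def build_prompt(mention: str, context: str, candidates: List[Candidate]) -> str:
    """Assemble a zero-shot prompt listing the retrieved candidates."""
    lines = [
        "Which candidate does the mention refer to?",
        "Answer with the candidate number, or 0 if none fits.",
        f"Text: {context}",
        f"Mention: {mention}",
        "Candidates:",
    ]
    for number, cand in enumerate(candidates, start=1):
        lines.append(f"{number}. {cand['label']}: {cand['description']}")
    return "\n".join(lines)


def resolve(answer: str, candidates: List[Candidate]) -> Optional[Candidate]:
    """Map a model reply to a candidate; None means 'defer to a human reviewer'."""
    reply = answer.strip()
    if reply.isdecimal() and 1 <= int(reply) <= len(candidates):
        return candidates[int(reply) - 1]
    return None


# Hypothetical candidates for an ambiguous Finnish mention.
candidates = [
    {"id": "ex:nokia-town", "label": "Nokia (town)",
     "description": "town in the Pirkanmaa region of Finland"},
    {"id": "ex:nokia-company", "label": "Nokia (company)",
     "description": "Finnish telecommunications company"},
]
prompt = build_prompt("Nokia", "Nokia julkaisi uuden puhelimen.", candidates)
```

Any reply that is not a valid candidate number, including 0, returns `None`, which a tool could treat as the signal to ask a person for the final decision.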
  • Rafael Leal

    Aalto University, Department of Computer Science, Finland

    Rafael Leal's research interests lie in developing and using natural language processing technologies, such as large language models, for digital humanities research and applications.