Invited Talk 1: Metadata in the Age of AI

Starts at
Mon, Nov 6, 2023, 11:00 South Korea Time
(06 Nov 2023, 02:00 UTC)
Finishes at
Mon, Nov 6, 2023, 12:00 South Korea Time
(06 Nov 2023, 03:00 UTC)
Venue
Gyeongha Hall 1
Moderator
Sam Oh

Moderator

  • Sam Oh

    Sungkyunkwan University and DCMI

    Sam Oh is a Distinguished Professor for Global Affairs at Sungkyunkwan University in Seoul, Korea, and the current Executive Director of DCMI. He chairs the ISO/IEC JTC1/SC34 (Document Description & Processing Languages) and ISO TC46/SC9 (Identification & Description) committees, and he represents the National Library of Korea on the DCMI Governing Board.

    His main research interest is metadata and ontology modeling. He has extensive experience consulting for companies and government agencies on the design of metadata and ontologies. He has taught courses on database design, Web database design, designing XML and metadata schemas, ontology modeling, information architecture, and designing knowledge management systems.

    He received his Ph.D. in Information Science and Technology from Syracuse University, NY, USA, in 1995, and worked at the Information School of the University of Washington for four years (1994-1998) before taking his current post.

Presentations

Properties, Classes, Concepts, Shapes, Vectors: Metadata in the Age of AI

Last November, AI and large language models (LLMs) emerged from the shadows of academia with the promise, or threat, of disruptively changing how we find information, learn, and solve problems. How does this fit with the evolution of metadata technology since the arrival of the World Wide Web thirty years ago? Modern metadata began with RDF, a model for expressing data as "graphs" of related entities with global identity (URIs). RDF provided the basis for "ontologies": carefully engineered models of logically related classes in support of automatic reasoning. "Linked Data" and the "Simple Knowledge Organization System" (SKOS) achieved adoption at scale by promoting easier methods for building "knowledge graphs". Google became the dominant search engine by leveraging full-text indexing and weblink statistics to answer queries with relevant Web "hits". In November 2022, ChatGPT raised the bar for search engines by answering queries not with Web links but with (ideally) well-formulated explanations.
This talk offers a few guesses on how generative AI will change the nature of metadata. Modern metadata in the style of Dublin Core uses a "language of description", with RDF or OWL properties and classes along with SKOS concepts, all identified with URIs, to describe things in the world (metadata "descriptions"). It uses ShEx or SHACL "data shapes" ("application profiles") to describe those descriptions. Seen from the standpoint of neural networks, the terms of this language now appear as tokens in layered models of statistically weighted vectors. Models that have traditionally required explicit engineering are now "learned" from training data. Creators of metadata models can increasingly become trainers of AI models. Curated models built around controlled vocabularies can potentially improve the focus and reduce the financial and environmental costs of LLMs.
Using the example of "NALT in the Machine Age", a project of the USDA National Agricultural Library, this talk argues that intelligent use of AI can potentially improve the quality and scalability of information management in an environment where the sheer mass of information keeps growing while budgets for labor-intensive manual curation of that information keep shrinking.
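The distinction the abstract draws between metadata "descriptions" and the "shapes" that describe them can be made concrete with a minimal Python sketch. It is not ShEx or SHACL and uses no RDF library: it assumes a description is simply a set of (subject, property, value) triples, and models an application profile as a list of hypothetical `PropertyConstraint` records loosely inspired by SHACL's `sh:minCount`, `sh:maxCount`, and `sh:class`. The Dublin Core, DCMI Type, SKOS, and RDF namespace URIs are the real ones; the `example.org` resource, the concept, and all helper names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Real vocabulary namespaces (Dublin Core terms, SKOS, RDF).
DCT = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace for the example resource and concept.
EX = "http://example.org/"

Triple = tuple[str, str, str]

# A metadata "description": a small graph of triples about one resource,
# whose subject is a SKOS concept identified by a URI.
description: set[Triple] = {
    (EX + "report42", RDF_TYPE, "http://purl.org/dc/dcmitype/Text"),
    (EX + "report42", DCT + "title", "Soil moisture survey"),
    (EX + "report42", DCT + "subject", EX + "concept/soil-moisture"),
    (EX + "concept/soil-moisture", RDF_TYPE, SKOS + "Concept"),
    (EX + "concept/soil-moisture", SKOS + "prefLabel", "soil moisture"),
}


@dataclass(frozen=True)
class PropertyConstraint:
    """Hypothetical, SHACL-like constraint on one property of a focus node."""
    path: str                       # property URI
    min_count: int = 0              # minimum number of values
    max_count: int | None = None    # None means unbounded
    value_class: str | None = None  # required rdf:type of each value, if any


def values(graph: set[Triple], subject: str, predicate: str) -> list[str]:
    """Return all objects of triples matching (subject, predicate, ?)."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def validate(graph: set[Triple], focus: str,
             shape: list[PropertyConstraint]) -> list[str]:
    """Check a focus node against a shape; return a list of error messages."""
    errors: list[str] = []
    for c in shape:
        vals = values(graph, focus, c.path)
        if len(vals) < c.min_count:
            errors.append(f"{c.path}: expected at least {c.min_count}, found {len(vals)}")
        if c.max_count is not None and len(vals) > c.max_count:
            errors.append(f"{c.path}: expected at most {c.max_count}, found {len(vals)}")
        if c.value_class is not None:
            for v in vals:
                if c.value_class not in values(graph, v, RDF_TYPE):
                    errors.append(f"{c.path}: value {v} is not a {c.value_class}")
    return errors


# A hypothetical "application profile": exactly one title, and at least
# one subject that must be typed as a skos:Concept.
report_shape = [
    PropertyConstraint(DCT + "title", min_count=1, max_count=1),
    PropertyConstraint(DCT + "subject", min_count=1, value_class=SKOS + "Concept"),
]

if __name__ == "__main__":
    print(validate(description, EX + "report42", report_shape))  # → []
    # Dropping the concept's rdf:type makes the subject value fail the shape.
    broken = description - {(EX + "concept/soil-moisture", RDF_TYPE, SKOS + "Concept")}
    print(validate(broken, EX + "report42", report_shape))
```

The design keeps the two layers separate, as the abstract describes: the triples say something about a thing in the world, while the shape says something about what a well-formed description must contain. A real deployment would express the shape in ShEx or SHACL and run it with a conforming validator rather than hand-written checks.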

  • Tom Baker

    DCMI

    Tom Baker, DCMI Technology Director and Usage Board chair, has worked on metadata and the Semantic Web since the 1990s. He helped publish SKOS in the 2000s and currently contributes to the Shape Expressions language (ShEx). Tom consults on projects, most recently on data in agriculture. He has worked as a researcher in Italy and Germany, notably at the Fraunhofer Society and the Goettingen State Library. Tom has an MLS from Rutgers University and an MA and PhD from Stanford University, and he has taught at the Asian Institute of Technology (Bangkok) and Sungkyunkwan University (Seoul).