
2024 | Book

The Semantic Web

21st International Conference, ESWC 2024, Hersonissos, Crete, Greece, May 26–30, 2024, Proceedings, Part I

Edited by: Albert Meroño Peñuela, Anastasia Dimou, Raphaël Troncy, Olaf Hartig, Maribel Acosta, Mehwish Alam, Heiko Paulheim, Pasquale Lisena

Publisher: Springer Nature Switzerland

Book series: Lecture Notes in Computer Science


About this book

The two-volume set LNCS 14664 and 14665 constitutes the refereed proceedings of the 21st International Conference on The Semantic Web, ESWC 2024, held in Hersonissos, Crete, Greece, during May 26-30, 2024.

The 32 full papers presented were carefully reviewed and selected from 138 submissions. They cover theoretical, analytical, and empirical aspects of the Semantic Web, semantic technologies, knowledge graphs, and semantics on the Web in general.

Table of Contents

Frontmatter

Research

Frontmatter
Do Similar Entities Have Similar Embeddings?
Abstract
Knowledge graph embedding models (KGEMs) developed for link prediction learn vector representations for entities in a knowledge graph, known as embeddings. A common tacit assumption is the KGE entity similarity assumption, which states that these KGEMs retain the graph’s structure within their embedding space, i.e., position entities that are similar in the graph close to one another. This desirable property makes KGEMs widely used in downstream tasks such as recommender systems or drug repurposing. Yet, the relation between entity similarity in the graph and similarity in the embedding space has rarely been formally evaluated. Typically, KGEMs are assessed solely on their link prediction capabilities, using rank-based metrics such as Hits@K or Mean Rank. This paper challenges the prevailing assumption that entity similarity in the graph is inherently mirrored in the embedding space. Therefore, we conduct extensive experiments to measure the capability of KGEMs to cluster similar entities together and investigate the nature of the underlying factors. Moreover, we study whether different KGEMs expose different notions of similarity. Datasets, pre-trained embeddings, and code are available at: https://github.com/nicolas-hbt/similar-embeddings/.
Nicolas Hubert, Heiko Paulheim, Armelle Brun, Davy Monticolo
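A minimal sketch of the kind of check this paper formalizes (toy embeddings and class labels, not the authors’ code): compare cosine similarity in embedding space between entities that are similar in the graph (here, sharing a class) and entities that are not.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 8))      # 6 entities, 8-dimensional KGE vectors (placeholder)
classes = np.array([0, 0, 0, 1, 1, 1])    # graph-based grouping (e.g. shared rdf:type)

sims = cosine_similarity(embeddings)
same_class = classes[:, None] == classes[None, :]
off_diag = ~np.eye(len(classes), dtype=bool)

intra = sims[same_class & off_diag].mean()
inter = sims[~same_class].mean()
print(f"mean intra-class similarity: {intra:.3f}")
print(f"mean inter-class similarity: {inter:.3f}")
# Under the KGE entity similarity assumption, intra should clearly exceed inter.
```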
Treat Different Negatives Differently: Enriching Loss Functions with Domain and Range Constraints for Link Prediction
Abstract
Knowledge graph embedding models (KGEMs) are used for various tasks related to knowledge graphs (KGs), including link prediction. They are trained with loss functions that consider batches of true and false triples. However, different kinds of false triples exist, and recent works suggest that they should not be valued equally, leading to specific negative sampling procedures. In line with this recent assumption, we posit that negative triples that are semantically valid w.r.t. the signatures of relations (domain and range) are high-quality negatives. Hence, we enrich the three main loss functions for link prediction such that all kinds of negatives are sampled but treated differently based on their semantic validity. In an extensive and controlled experimental setting, we show that the proposed loss functions consistently provide satisfying results, which demonstrates both the generality and superiority of our approach. In particular, the proposed loss functions (1) lead to better MRR and Hits@10 values, and (2) drive KGEMs towards better semantic correctness as measured by the Sem@K metric. This highlights that relation signatures globally improve KGEMs and should therefore be incorporated into loss functions. Since domains and ranges of relations are widely available in schema-defined KGs, our approach is both beneficial and widely usable in practice.
Nicolas Hubert, Pierre Monnin, Armelle Brun, Davy Monticolo
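A minimal PyTorch sketch of how a margin-based link prediction loss could treat semantically valid negatives differently from other negatives; the margin values and the direction of the asymmetry are illustrative assumptions, not the paper’s exact formulation.

```python
import torch

def signature_aware_margin_loss(pos_scores, neg_scores, neg_is_valid,
                                margin_valid=2.0, margin_other=1.0):
    """pos_scores, neg_scores: higher = more plausible triple.
    neg_is_valid: bool tensor, True if the negative satisfies the relation's domain/range."""
    margins = torch.where(neg_is_valid,
                          torch.full_like(neg_scores, margin_valid),
                          torch.full_like(neg_scores, margin_other))
    # Hinge loss: push positives above negatives by a margin that depends on negative quality.
    return torch.relu(margins + neg_scores - pos_scores).mean()

# toy usage
pos = torch.tensor([2.0, 1.5])
neg = torch.tensor([0.5, 1.0])
valid = torch.tensor([True, False])
print(signature_aware_margin_loss(pos, neg, valid))
```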
QAGCN: Answering Multi-relation Questions via Single-Step Implicit Reasoning over Knowledge Graphs
Abstract
Multi-relation question answering (QA) is a challenging task, where the given questions usually require long reasoning chains in KGs consisting of multiple relations. Recently, methods with explicit multi-step reasoning over KGs have been prominently used for this task and have demonstrated promising performance. Examples include methods that perform stepwise label propagation through KG triples and methods that navigate over KG triples based on reinforcement learning. A main weakness of these methods is that their reasoning mechanisms are usually complex and difficult to implement or train. In this paper, we argue that multi-relation QA can be achieved via end-to-end single-step implicit reasoning, which is simpler, more efficient, and easier to adopt. We propose QAGCN, a Question-Aware Graph Convolutional Network (GCN)-based method that includes a novel GCN architecture with controlled question-dependent message propagation for implicit reasoning. In extensive experiments, QAGCN achieved competitive and even superior performance compared to state-of-the-art explicit-reasoning methods. Our code and pre-trained models are available at: https://github.com/ruijie-wang-uzh/QAGCN.
Ruijie Wang, Luca Rossetto, Michael Cochez, Abraham Bernstein
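A minimal sketch of question-dependent message propagation in a GCN layer, assuming gates computed from a question embedding; this illustrates the general idea only and is not the QAGCN architecture.

```python
import torch
import torch.nn as nn

class QuestionAwareGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, node_feats, adj, question):
        # node_feats: (N, d), adj: (N, N) 0/1 adjacency, question: (d,)
        gates = torch.sigmoid(node_feats @ question)     # (N,) relevance of each sender to the question
        messages = adj * gates.unsqueeze(0)              # scale incoming edges by sender relevance
        agg = messages @ node_feats / (messages.sum(1, keepdim=True) + 1e-8)
        return torch.relu(self.lin(agg))

layer = QuestionAwareGCNLayer(dim=16)
x, adj, q = torch.randn(5, 16), torch.ones(5, 5), torch.randn(16)
print(layer(x, adj, q).shape)  # torch.Size([5, 16])
```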
Leveraging Pre-trained Language Models for Time Interval Prediction in Text-Enhanced Temporal Knowledge Graphs
Abstract
Most knowledge graph completion (KGC) methods rely solely on structural information, even though a large number of publicly available KGs contain additional temporal (validity time intervals) and textual data (entity descriptions). While recent temporal KGC methods utilize time information to enhance link prediction, they do not leverage textual descriptions or support inductive inference (prediction for entities that have not been seen during training).
In this work, we propose a novel framework called TEMT that exploits the power of pre-trained language models (PLMs) for temporal KGC. TEMT predicts time intervals of facts by fusing their textual and temporal information. It also supports inductive inference by utilizing PLMs. To showcase the power of TEMT, we carry out several experiments, including time interval prediction in both transductive and inductive settings, as well as triple classification. The experimental results demonstrate that TEMT is competitive with the state of the art while also supporting inductive inference.
Duygu Sezen Islakoglu, Melisachew Wudage Chekol, Yannis Velegrakis
A Language Model Based Framework for New Concept Placement in Ontologies
Abstract
We investigate the task of inserting new concepts extracted from texts into an ontology using language models. We explore an approach with three steps: edge search, which finds a set of candidate locations for insertion (i.e., subsumptions between concepts); edge formation and enrichment, which leverages the ontological structure to produce and enhance the edge candidates; and edge selection, which finally selects the edge at which the new concept is placed. In all steps, we propose to leverage neural methods: we apply embedding-based methods and contrastive learning with Pre-trained Language Models (PLMs) such as BERT for edge search, and adapt a BERT fine-tuning-based multi-label Edge-Cross-encoder as well as Large Language Models (LLMs) such as the GPT series, FLAN-T5, and Llama 2 for edge selection. We evaluate the methods on recent datasets created using the SNOMED CT ontology and the MedMentions entity linking benchmark. The best settings in our framework use a fine-tuned PLM for search and a multi-label Cross-encoder for selection. Zero-shot prompting of LLMs is still not adequate for the task, and we propose explainable instruction tuning of LLMs for improved performance. Our study shows the advantages of PLMs and highlights the encouraging performance of LLMs, which motivates future studies.
Hang Dong, Jiaoyan Chen, Yuan He, Yongsheng Gao, Ian Horrocks
Low-Dimensional Hyperbolic Knowledge Graph Embedding for Better Extrapolation to Under-Represented Data
Abstract
Past works have shown that knowledge graph embedding (KGE) methods learn from facts in the form of triples and extrapolate to unseen triples. KGE in hyperbolic space can achieve impressive performance even in low-dimensional embedding spaces. However, existing work has studied extrapolation to under-represented data, including under-represented entities and relations, only to a limited extent. To this end, we propose HolmE, a general form of KGE method on hyperbolic manifolds. HolmE addresses extrapolation to under-represented entities through a special treatment of the bias term, and extrapolation to under-represented relations by supporting strong composition. We provide empirical evidence that HolmE achieves promising performance in modelling unseen triples, under-represented entities, and under-represented relations. We prove that mainstream KGE methods either (1) are special cases of HolmE and thus support strong composition, or (2) do not support strong composition. The code and data are open-sourced at https://github.com/nsai-uio/HolmE-KGE.
Zhuoxun Zheng, Baifan Zhou, Hui Yang, Zhipeng Tan, Arild Waaler, Evgeny Kharlamov, Ahmet Soylu
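A minimal sketch of the geodesic distance on the Poincaré ball, the kind of hyperbolic-space quantity that low-dimensional hyperbolic KGE methods build on; this is generic hyperbolic geometry, not HolmE’s scoring function.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points u, v on the Poincaré ball (||u||, ||v|| < 1)."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_dist / (denom + eps))

u = np.array([0.10, 0.20])
v = np.array([-0.30, 0.05])
print(poincare_distance(u, v))
```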
SC-Block: Supervised Contrastive Blocking Within Entity Resolution Pipelines
Abstract
Millions of websites use the schema.org vocabulary to annotate structured data describing products, local businesses, or events within their HTML pages. Integrating schema.org data from the Semantic Web poses distinct requirements on entity resolution methods: (1) the methods must scale to millions of entity descriptions, and (2) the methods must be able to deal with the heterogeneity that results from a large number of data sources. In order to scale to numerous entity descriptions, entity resolution methods combine a blocker for candidate pair selection with a matcher for the fine-grained comparison of the pairs in the candidate set. This paper introduces SC-Block, a blocking method that uses supervised contrastive learning to cluster entity descriptions in an embedding space. The embedding enables SC-Block to generate small candidate sets even for use cases that involve a large number of unique tokens within entity descriptions. To measure the effectiveness of blocking methods for Semantic Web use cases, we present a new benchmark, WDC-Block. WDC-Block requires blocking product offers from 3,259 e-shops that use the schema.org vocabulary. The benchmark has a maximum Cartesian product of 200 billion pairs of offers and a vocabulary size of 7 million unique tokens. Our experiments using WDC-Block and other blocking benchmarks demonstrate that SC-Block produces candidate sets that are on average 50% smaller than the candidate sets generated by competing blocking methods. Entity resolution pipelines that combine SC-Block with state-of-the-art matchers finish 1.5 to 4 times faster than pipelines using other blockers, without any loss in F1 score.
Alexander Brinkmann, Roee Shraga, Christian Bizer
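A minimal sketch of embedding-based blocking with an off-the-shelf encoder (the supervised contrastive fine-tuning that gives SC-Block its name is omitted, and the model name is an assumption): only the top-k nearest neighbours of each offer become candidate pairs for the matcher.

```python
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

offers_a = ["Apple iPhone 13 128GB black", "Bosch WAN28K30 washing machine"]
offers_b = ["iPhone 13 (128 GB) - black", "Washing machine Bosch WAN28K30", "LEGO 42115 Lamborghini"]

model = SentenceTransformer("all-MiniLM-L6-v2")      # placeholder encoder
emb_a = model.encode(offers_a, normalize_embeddings=True)
emb_b = model.encode(offers_b, normalize_embeddings=True)

# Blocking: keep only the 2 nearest offers from the other source as candidate pairs.
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(emb_b)
_, idx = nn.kneighbors(emb_a)
for i, neighbours in enumerate(idx):
    for j in neighbours:
        print("candidate pair:", offers_a[i], "<->", offers_b[j])
```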
Navigating Ontology Development with Large Language Models
Abstract
Ontology engineering is a complex and time-consuming task, even with the help of current modelling environments, and the result is often error-prone unless developed by experienced ontology engineers. However, with the emergence of new tools such as generative AI, inexperienced modellers might receive assistance. This study investigates the capability of Large Language Models (LLMs) to generate OWL ontologies directly from ontological requirements. Specifically, our research question centres on the potential of LLMs to assist human modellers by generating OWL modelling suggestions and alternatives. We experiment with several state-of-the-art models. Our methodology incorporates diverse prompting techniques such as Chain of Thought (CoT), Graph of Thoughts (GoT), and Decomposed Prompting, along with the zero-shot method. Results show that, currently, GPT-4 is the only model capable of providing suggestions of sufficient quality, and we also note the benefits and drawbacks of the prompting techniques. Overall, we conclude that it seems feasible to use advanced LLMs to generate OWL suggestions that are at least comparable in quality to those of human novice modellers. Our research is a pioneering contribution in this area, being the first to systematically study the ability of LLMs to assist ontology engineers.
Mohammad Javad Saeedizade, Eva Blomqvist
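A minimal sketch of prompting an LLM for an OWL modelling suggestion with a Chain-of-Thought-style instruction; the prompt wording, model name, and use of the OpenAI Python client are illustrative assumptions, not the paper’s setup.

```python
from openai import OpenAI

requirement = ("A museum exhibits artworks; every artwork was created by at least "
               "one artist and belongs to exactly one collection.")

prompt = (
    "You are an ontology engineer. Think step by step:\n"
    "1. List the classes, object properties and datatype properties implied by the requirement.\n"
    "2. State domain, range and cardinality restrictions.\n"
    "3. Output the result as an OWL ontology in Turtle syntax only.\n\n"
    f"Requirement: {requirement}"
)

client = OpenAI()  # expects OPENAI_API_KEY in the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # candidate OWL modelling suggestion
```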
ESLM: Improving Entity Summarization by Leveraging Language Models
Abstract
Entity summarizers for knowledge graphs are crucial in various applications, so achieving high performance on the entity summarization task is critical for many knowledge-graph-based applications. The currently best-performing approaches integrate knowledge graphs with text embeddings to encode entity-related triples. However, these approaches still rely on static word embeddings that cannot cover multiple contexts. We hypothesize that incorporating contextual language models into entity summarizers can further improve their performance. We hence propose ESLM (Entity Summarization using Language Models), an approach for enhancing the performance of entity summarization that integrates contextual language models along with knowledge graph embeddings. We evaluate our models on the DBpedia and LinkedMDB datasets from ESBM version 1.2, and on the FACES dataset. In our experiments, ESLM achieves an F-measure of up to 0.591 and outperforms state-of-the-art approaches in four out of six experimental settings with respect to the F-measure. In addition, ESLM outperforms state-of-the-art models in all experimental settings when evaluated using the NDCG metric. Moreover, contextual language models notably enhance the performance of our entity summarization model, especially when combined with knowledge graph embeddings; we observed a notable boost in our model’s efficiency on DBpedia and FACES. Our approach and the code to rerun our experiments are available at https://github.com/dice-group/ESLM.
Asep Fajar Firmansyah, Diego Moussallem, Axel-Cyrille Ngonga Ngomo
Explanation of Link Predictions on Knowledge Graphs via Levelwise Filtering and Graph Summarization
Abstract
Link prediction methods aim at predicting missing facts in Knowledge Graphs (KGs), as KGs are inherently incomplete. Several methods rely on Knowledge Graph Embeddings, which are numerical representations of the elements of a Knowledge Graph. Embeddings are effective and scalable for large KGs; however, they lack explainability. Kelpie is a recent and versatile framework that provides post-hoc explanations for predictions based on embeddings by revealing the facts that enabled them. Problems have been recognized, however, with filtering potential explanations and dealing with an overload of candidates. We aim at enhancing Kelpie by targeting three goals: reducing the number of candidates, producing explanations at different levels of detail, and improving the effectiveness of the explanations. To accomplish these goals, we adopt a semantic similarity measure to enhance the filtering of potential explanations, and we focus on a condensed representation of the search space in the form of a quotient graph based on entity types. Three quotient formulations of different granularity are considered to reduce the risk of losing valuable information. We conduct a quantitative and qualitative experimental evaluation of the proposed solutions, using Kelpie as a baseline.
Roberto Barile, Claudia d’Amato, Nicola Fanizzi
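A minimal sketch of the quotient-graph idea, under the assumption that entities are collapsed into their types: each summarized edge stands for all facts between entities of the corresponding types. This is one possible formulation for illustration, not the paper’s exact construction.

```python
triples = [
    ("Paris", "capitalOf", "France"),
    ("Berlin", "capitalOf", "Germany"),
    ("France", "memberOf", "EU"),
]
entity_type = {"Paris": "City", "Berlin": "City",
               "France": "Country", "Germany": "Country", "EU": "Organisation"}

# Quotient graph: one summarized edge per (source type, relation, target type).
quotient = {(entity_type[h], r, entity_type[t]) for h, r, t in triples}
print(sorted(quotient))
# [('City', 'capitalOf', 'Country'), ('Country', 'memberOf', 'Organisation')]
```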
Large Language Models for Scientific Question Answering: An Extensive Analysis of the SciQA Benchmark
Abstract
The SciQA benchmark for scientific question answering aims to represent a challenging task for next-generation question answering systems, on which vanilla large language models fail. In this article, we provide an analysis of the performance of language models on this benchmark, including prompting and fine-tuning techniques to adapt them to the SciQA task. We show that both fine-tuning and prompting with intelligent few-shot selection allow us to obtain excellent results on the SciQA benchmark. We discuss the valuable lessons and common error categories, and outline their implications for how to optimise large language models for question answering over knowledge graphs.
Jens Lehmann, Antonello Meloni, Enrico Motta, Francesco Osborne, Diego Reforgiato Recupero, Angelo Antonio Salatino, Sahar Vahdati
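A minimal sketch of similarity-based few-shot selection, one plausible reading of “intelligent few-shot selection”: the training question most similar to the test question contributes its question/SPARQL pair to the prompt. The model name and data are placeholders.

```python
from sentence_transformers import SentenceTransformer, util

train = [
    ("How many papers address the topic of knowledge graphs?", "SELECT (COUNT(?p) AS ?n) WHERE { ... }"),
    ("Which benchmark has the highest accuracy score?", "SELECT ?b WHERE { ... } ORDER BY DESC(?acc) LIMIT 1"),
]
test_question = "How many papers are about question answering?"

model = SentenceTransformer("all-MiniLM-L6-v2")
q_emb = model.encode(test_question, convert_to_tensor=True)
t_emb = model.encode([q for q, _ in train], convert_to_tensor=True)
best = int(util.cos_sim(q_emb, t_emb)[0].argmax())   # most similar training question

prompt = (f"Question: {train[best][0]}\nSPARQL: {train[best][1]}\n\n"
          f"Question: {test_question}\nSPARQL:")
print(prompt)   # few-shot prompt passed to the language model
```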
Efficient Evaluation of Conjunctive Regular Path Queries Using Multi-way Joins
Abstract
Recent analyses of real-world queries show that a prominent type of queries is that of conjunctive regular path queries. Despite the increasing popularity of this type of queries, only limited efforts have been invested in their efficient evaluation. Motivated by recent results on the efficiency of worst-case optimal multi-way join algorithms for the evaluation of conjunctive queries, we present a novel multi-way join algorithm for the efficient evaluation of conjunctive regular path queries. The hallmark of our algorithm is the evaluation of the regular path queries found in conjunctive regular path queries using multi-way joins. This enables the exploitation of regular path queries in the planning steps of the proposed algorithm, which is crucial for the algorithm’s efficiency, as shown by the results of our detailed evaluation using the Wikidata-based benchmark WDBench. The results of this evaluation also show that our approach achieves a value of query mixes per hour that is 4.3 higher than the state of the art and that it outperforms all of the competing graph storage solutions in almost 70% of the benchmark’s queries.
Nikolaos Karalis, Alexander Bigerl, Liss Heidrich, Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo
Can Contrastive Learning Refine Embeddings
Abstract
Recent advancements in contrastive learning have revolutionized self-supervised representation learning and achieved state-of-the-art performance on benchmark tasks. While most existing methods focus on applying contrastive learning to input data modalities such as images, natural language sentences, or networks, they overlook the potential of utilizing the output of previously trained encoders. In this paper, we introduce SimSkip, a novel contrastive learning framework that specifically refines input embeddings for downstream tasks. Unlike traditional unsupervised learning approaches, SimSkip takes the output embeddings of encoder models as its input. Through theoretical analysis, we provide evidence that applying SimSkip does not lead to larger upper bounds on downstream task errors than those of the original embedding that serves as SimSkip’s input. Experimental results on various open datasets demonstrate that embeddings refined by SimSkip improve performance on downstream tasks.
Lihui Liu, Jinha Kim, Vidit Bansal
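A minimal sketch of contrastively refining pre-computed embeddings with a small network and an InfoNCE loss; the architecture and noise-based augmentation are illustrative assumptions, not the SimSkip design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

refiner = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
optimizer = torch.optim.Adam(refiner.parameters(), lr=1e-3)

def info_nce(z1, z2, temperature=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature          # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))         # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

pretrained = torch.randn(32, 64)              # output of an upstream, already-trained encoder
for _ in range(10):
    v1 = pretrained + 0.1 * torch.randn_like(pretrained)   # two noisy "views" of each embedding
    v2 = pretrained + 0.1 * torch.randn_like(pretrained)
    loss = info_nce(refiner(v1), refiner(v2))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
print("final loss:", loss.item())
```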

In-Use

Frontmatter
Automation of Electronic Invoice Validation Using Knowledge Graph Technologies
Abstract
Invoicing is a crucial part of any business’s financial and administrative activities. Nowadays, invoicing is handled in the form of Electronic Data Interchange (EDI), where invoices are managed in a standardized electronic or digital format rather than on paper. In this context, EDI increases the efficiency of creating, distributing, and processing invoices. The most widely used standard for representing electronic invoices is EDIFACT. Yet, the validation of EDIFACT invoices is not standardized. In this work, we tackle the problem of automatically validating electronic invoices in the EDIFACT format by leveraging knowledge graph (KG) technologies. The core of our proposed solution is to represent EDIFACT invoices as RDF knowledge graphs. We developed an OWL ontology to model EDIFACT terms with semantic descriptions. The invoice KG can then be validated using SHACL constraints acquired from domain experts. We evaluated our ontology and the invoice validation process. The results show that our proposed solution is complete, correct, and efficient, and requires significantly less effort than current manual validation.
Johannes Mäkelburg, Christian John, Maribel Acosta
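A minimal sketch of validating an invoice knowledge graph with SHACL using pySHACL; the ex: vocabulary and constraint are invented for illustration and are not the paper’s EDIFACT ontology.

```python
from rdflib import Graph
from pyshacl import validate

# Toy invoice KG: one invoice with a negative total amount.
data = Graph().parse(data="""
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:invoice1 a ex:Invoice ; ex:totalAmount "-5.00"^^xsd:decimal .
""", format="turtle")

# SHACL shape: every invoice needs exactly one positive decimal total amount.
shapes = Graph().parse(data="""
@prefix ex: <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:InvoiceShape a sh:NodeShape ;
    sh:targetClass ex:Invoice ;
    sh:property [ sh:path ex:totalAmount ;
                  sh:datatype xsd:decimal ;
                  sh:minExclusive 0 ;
                  sh:minCount 1 ] .
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)   # False: the total amount must be positive
print(report)
```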
Towards Cyber Mapping the German Financial System with Knowledge Graphs
Abstract
The increasing outsourcing by financial intermediaries intensifies the interconnection of the financial system with third-party providers. Concentration risks can materialize and threaten financial stability if these third-party providers are affected by cyber incidents. With the goal of preserving financial stability, regulators are interested in tracing cyber incidents efficiently. One method to achieve this is cyber mapping, which allows them to analyze the connections between the financial network and the cyber network. In this paper, a provenance-aware knowledge graph is constructed to model this kind of mapping for investment funds which are part of the German financial system. As a first application, we provide a front-end for analyzing the funds’ outsourcing behaviors. In a user study with ten experts, we evaluate and show the application’s usability and usefulness. Time estimations for certain scenarios indicate our application’s potential to reduce time and effort for supervisors. Especially for complex analysis tasks, our cyber mapping solution could provide benefits for cyber risk monitoring.
Markus Schröder, Jacqueline Krüger, Neda Foroutan, Philipp Horn, Christoph Fricke, Ezgi Delikanli, Heiko Maus, Andreas Dengel
Integrating Domain Knowledge for Enhanced Concept Model Explainability in Plant Disease Classification
Abstract
Deep learning-based plant disease detection has seen promising advancements, particularly in its remarkable ability to identify diseases from digital images. Nevertheless, the opacity and lack of transparency of these systems, which often offer no human-interpretable explanations for their predictions, raise concerns with respect to their robustness and reliability. While many methods have attempted post-hoc model explainability, few have specifically targeted the integration and impact of domain knowledge. In this study, we propose a novel framework that combines a tomato disease ontology with the concept explainability method Testing with Concept Activation Vectors (TCAV). Unlike the original TCAV method, which requires users to gather diverse image concepts manually, our approach automates the creation of images based on relevant concepts used by domain experts in plant disease identification. This not only simplifies the concept collection and labelling process but also reduces the burden on users with limited domain knowledge, ultimately mitigating potential biases in concept selection. Besides automating the concept image generation for the TCAV method, our framework gives insights into the significance of disease-related concepts, identified through the ontology, in the deep learning model’s decision-making process. Consequently, our approach enhances the efficiency and interpretability of the model’s diagnostic capabilities, promising a more trustworthy and reliable disease detection model.
Jihen Amara, Sheeba Samuel, Birgitta König-Ries
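A minimal sketch of the core TCAV step on pre-computed layer activations: a Concept Activation Vector is the weight vector of a linear classifier separating concept examples from random examples, and the TCAV score is the fraction of inputs whose gradients point along it. All data here are placeholders; the ontology-driven concept image generation is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
concept_acts = rng.normal(loc=1.0, size=(50, 128))   # activations for concept images (e.g. "leaf spots")
random_acts = rng.normal(loc=0.0, size=(50, 128))    # activations for random images

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 50 + [0] * 50)
cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]   # Concept Activation Vector

# TCAV score: fraction of inputs whose class-logit gradient has a positive directional
# derivative along the CAV (gradients below are placeholders for the real model's).
grads = rng.normal(size=(20, 128))
tcav_score = float(np.mean(grads @ cav > 0))
print("TCAV score:", tcav_score)
```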
Generative Expression Constrained Knowledge-Based Decoding for Open Data
Abstract
In this paper, we present GECKO, a knowledge graph question answering (KGQA) system for data from Statistics Netherlands (Centraal Bureau voor de Statistiek). QA poses great challenges in terms of generating relevant answers as well as preventing hallucinations, a phenomenon found in language models that creates issues when attempting factual QA with these models alone. To overcome these limitations, Statistics Netherlands’ publicly available OData4 data was used to create a knowledge graph in which the answer generation decoding process is grounded, ensuring faithful answers. When processing a question, GECKO performs entity and schema retrieval, does schema-constrained expression decoding, makes assumptions where needed, and executes the generated expression as an OData4 query to retrieve information. A novel method was implemented to perform the constrained knowledge-based expression decoding using an encoder-decoder model. Both sparse and dense entity retrieval methods were evaluated. While the encoder-decoder model did not achieve production-ready performance, experiments show promising results for a rule-based baseline using a sparse entity retriever. Additionally, the results of qualitative user testing were positive. We therefore formulate recommendations for deployment to help guide users of Statistics Netherlands data to their answers more quickly.
Lucas Lageweg, Benno Kruit
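A minimal sketch of schema-constrained decoding, under the assumption that valid expressions can be enumerated as token sequences: at each step, only tokens that keep the partial expression a prefix of some valid expression are allowed. This illustrates the general idea only, not GECKO’s decoder; the expressions are placeholders.

```python
valid_expressions = [
    ["Observations", "filter", "Periods eq '2023'"],
    ["Observations", "filter", "Regions eq 'NL01'"],
    ["Dimensions"],
]

def allowed_next_tokens(prefix):
    """Tokens that keep the partial expression a prefix of a valid expression."""
    return {expr[len(prefix)] for expr in valid_expressions
            if len(expr) > len(prefix) and expr[:len(prefix)] == prefix}

def constrained_decode(score_fn):
    """Greedy decoding: score_fn(prefix, token) stands in for the model's logits."""
    prefix = []
    while True:
        options = allowed_next_tokens(prefix)
        if not options:
            return prefix
        prefix.append(max(sorted(options), key=lambda t: score_fn(prefix, t)))

print(constrained_decode(lambda prefix, token: len(token)))  # toy scorer
```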
OntoEditor: Real-Time Collaboration via Distributed Version Control for Ontology Development
Abstract
In today’s remote work environment, the demand for real-time collaborative tools has surged. Our research targets efficient collaboration among knowledge engineers and domain experts in ontology development. We developed a web-based tool for real-time collaboration, compatible with GitLab, GitHub, and Bitbucket. To tackle the challenge of concurrent modifications leading to potential inconsistencies, we integrated an Operational Transformation-based real-time database. This integration enables multiple users to collaboratively build and edit their ontologies while ensuring both consistency and atomicity. Furthermore, our tool enhances the user experience by providing meaningful syntax error messages for ontologies expressed in various RDF serialization formats, which streamlines the manual correction process. Additionally, we established a reliable synchronization channel that allows users to pull and commit changes to distributed repositories for their developed ontologies. Our evaluation, which yielded promising results, focused on two key aspects: first, assessing the tool’s collaborative editing consistency via an automated typing script; second, conducting a comprehensive user study to evaluate its features and compare its functionalities with similar tools.
Ahmad Hemid, Waleed Shabbir, Abderrahmane Khiat, Christoph Lange, Christoph Quix, Stefan Decker
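A minimal sketch of the classic operational transformation rule for two concurrent text insertions, the kind of mechanism an OT-based real-time database applies so that all collaborators converge on the same ontology text; this is textbook OT, not the tool’s actual engine.

```python
def transform_insert(op_a, op_b):
    """op = (position, text). Return op_a adjusted so it can be applied after op_b."""
    pos_a, text_a = op_a
    pos_b, text_b = op_b
    if pos_b <= pos_a:
        return (pos_a + len(text_b), text_a)   # shift by the earlier insert's length
    return op_a

doc = "ex:Person a owl:Class ."
op_a = (0, "@prefix ex: <http://example.org/> .\n")        # user A adds a prefix at the top
op_b = (len(doc), "\nex:name a owl:DatatypeProperty .")    # user B appends a property

# Apply B, then A transformed against B; applying them in the opposite order
# (with B transformed against A) yields the same document, i.e. the users converge.
after_b = doc[:op_b[0]] + op_b[1] + doc[op_b[0]:]
pos, text = transform_insert(op_a, op_b)
print(after_b[:pos] + text + after_b[pos:])
```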
Backmatter
Metadata
Title
The Semantic Web
Edited by
Albert Meroño Peñuela
Anastasia Dimou
Raphaël Troncy
Olaf Hartig
Maribel Acosta
Mehwish Alam
Heiko Paulheim
Pasquale Lisena
Copyright year
2024
Electronic ISBN
978-3-031-60626-7
Print ISBN
978-3-031-60625-0
DOI
https://doi.org/10.1007/978-3-031-60626-7