1 Introduction
2 Requirements for handling assets and metadata
-
a distributed ecosystem of individual storage entities which keep all data assets under the control of their respective owners, while facilitating federated sharing subject to individual usage conditions. Given the variety of potentially interesting data assets in agricultural scenarios, such a system must be independent of the actual data formats.
-
a universally usable metadata system to describe content and usage of available data assets. Common understanding across various participants (even world-wide) needs formalized, computer-processable descriptions based on established, widely usable domain ontologies.
-
dynamic extension of descriptive information and data schema. To allow for the different viewpoints of participants in agricultural scenarios and to cope with varying and unforeseen needs of AI development, such a metadata system needs to let participants add descriptive metadata dynamically over time.
-
A formal definition of the classes and concepts describing the various information elements and data types is the basis for the elementary system functionalities. The properties contained in such definitions are indispensable for the functioning of the system and are thus considered mandatory.
-
Further descriptive properties in any dataset’s metadata reflect the intended usage and interpretation considered by the data generators. While technically optional, the richness of such annotation is crucial for wide usability.
-
As later data consumers’ interests might not be known at the time of data generation and annotation, any metadata must be grounded in a solid domain ontology (and thus use both a common vocabulary and represented domain knowledge) and be complemented by suitable mechanisms for browsing and semantic search. Taking into account the dynamic modifications over time, any interactive system has to provide means to dynamically adapt its user interfaces and data entry forms to the ever-changing ontological basis.
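The distinction between mandatory, definition-derived properties and optional descriptive annotation can be illustrated with a minimal sketch. The property names (`title`, `owner`, `format`, `crop`) and the helper `check_metadata` are hypothetical and chosen for illustration only; they are not taken from the system described here.

```python
# Hypothetical mandatory properties; in practice they would be derived
# from the formal class definitions of the ontology.
MANDATORY = {"title", "owner", "format"}


def check_metadata(record: dict) -> dict:
    """Report missing mandatory properties and list the optional ones present."""
    # Treat None and empty strings as "not provided".
    present = {key for key, value in record.items() if value not in (None, "")}
    return {
        "missing": sorted(MANDATORY - present),
        "optional": sorted(present - MANDATORY),
    }


# Hypothetical metadata record with one optional annotation ("crop").
record = {"title": "Field images 2023", "owner": "Farm A", "crop": "maize"}
report = check_metadata(record)
print(report)
```

In this sketch a record is only rejected for missing mandatory properties, while optional annotations are simply collected, which mirrors the idea that they are technically optional but add to the usability of a dataset.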
3 The Agri-Gaia federated basic architecture
4 Related Work
5 Technology decisions
6 Ontology-based metadata graph
6.1 Referenced Ontologies
-
vocabularies for describing people, organizations and contact information, such as vCard [13] and FOAF
-
spelling and/or language variants would have to be taken into account
-
multilingual search is limited to the languages of the assigned keywords; a search will not return datasets that have been annotated in another language
-
assets using synonyms of terms will not be found
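The limitations listed above can be made concrete with a small sketch that contrasts plain keyword matching with matching on ontology concepts. The concept identifiers (`ex:Maize`, `ex:Wheat`), their labels and the example assets are invented for illustration and do not come from any of the referenced ontologies.

```python
# Hypothetical mini-vocabulary: each concept carries labels covering
# synonyms and language variants.
CONCEPT_LABELS = {
    "ex:Maize": {"maize", "corn", "mais"},
    "ex:Wheat": {"wheat", "weizen"},
}

# Hypothetical assets, annotated with free-text keywords and with concepts.
ASSETS = [
    {"id": "a1", "keywords": {"corn"}, "concepts": {"ex:Maize"}},
    {"id": "a2", "keywords": {"Mais"}, "concepts": {"ex:Maize"}},
    {"id": "a3", "keywords": {"weizen"}, "concepts": {"ex:Wheat"}},
]


def keyword_search(term: str) -> list:
    """Exact string match against the assigned keywords."""
    return [asset["id"] for asset in ASSETS if term in asset["keywords"]]


def concept_search(term: str) -> list:
    """Resolve the term to concepts via their labels, then match annotations."""
    wanted = {
        concept
        for concept, labels in CONCEPT_LABELS.items()
        if term.lower() in labels
    }
    return [asset["id"] for asset in ASSETS if asset["concepts"] & wanted]


keyword_hits = keyword_search("maize")   # misses the synonym and the German variant
concept_hits = concept_search("maize")   # finds both maize assets
print(keyword_hits, concept_hits)
```

The keyword search finds nothing because no asset carries the exact string, whereas the concept-based lookup retrieves both maize datasets regardless of the label used during annotation.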
6.2 Dynamic extensions: Growing the metadata space
6.2.1 Agri-Gaia Ontology
The class AgriDataResource encompasses all datasets related to agriculture. This class extends the W3C standard Dataset class from DCAT and the DataResource class from the Gaia‑X ontology gax-trust-framework. This ensures that any resource described in our ontology conforms to Gaia‑X standards and can also use the rich variety of classes and properties that are connected to DCAT. Additionally, following the open-world RDF model, it can include attributes from other standards for a more versatile metadata description.
The AgriDataResource class is extended to three format-specific classes, namely for image datasets, JSON datasets and CSV datasets meant for the agriculture domain (Fig. 2). When users want to provide metadata for one of these dataset types, they are prompted with a property list that is specific to the chosen type. For image datasets, the number of images, resolution, image channels etc. might be important, but these properties do not apply to a dataset of CSV files. Furthermore, since the ontology is connected to Gaia‑X and other W3C standards, many of the generic properties can be reused to enhance the quality of metadata while also conforming to W3C and Gaia‑X standards. As Gaia‑X expands and we continue to work with our partners to gather more information on various types of resources and how to describe them, we can build on the existing ontology and add further elements to widen our coverage for metadata description.
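How a type-specific property list can be assembled from such a class hierarchy is sketched below. Only `AgriDataResource`, `AgriImageDataResource` and `imageCount` are names used in this paper; the CSV and JSON class names and all other property names are assumptions made for the example.

```python
# Parent relation of the classes. The CSV and JSON class names are assumed.
CLASS_PARENT = {
    "AgriDataResource": None,
    "AgriImageDataResource": "AgriDataResource",
    "AgriCsvDataResource": "AgriDataResource",
    "AgriJsonDataResource": "AgriDataResource",
}

# Properties declared directly on each class. Apart from imageCount,
# the property names are hypothetical.
CLASS_PROPERTIES = {
    "AgriDataResource": ["title", "description"],
    "AgriImageDataResource": ["imageCount", "resolution", "imageChannels"],
    "AgriCsvDataResource": ["delimiter", "columnCount"],
}


def properties_for(cls: str) -> list:
    """Collect the properties of a class and its ancestors, generic ones first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = CLASS_PARENT[cls]
    props = []
    for name in reversed(chain):
        props.extend(CLASS_PROPERTIES.get(name, []))
    return props


image_form = properties_for("AgriImageDataResource")
csv_form = properties_for("AgriCsvDataResource")
print(image_form)
print(csv_form)
```

In this sketch the generic properties are inherited by every format-specific class, while image-specific properties never appear in the list offered for a CSV dataset.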
6.2.2 Applying Constraints
The NodeShape AgriImageDataResourceShape is defined for the class AgriImageDataResource. Within that NodeShape, each property of the class (e.g. imageCount) has a PropertyShape (e.g. imageCountShape). A subset of properties for AgriImageDataResource is shown in Listing Fig. 4
sh:property without editing the ontology itself. This gives us full control over which attributes are shown to external users as inputs for describing their particular type of resource. Listing Fig. 5 shows an example of the shapes that were created for the class AgriImageDataResource using the query:
imageCount property and add constraints like the maximum value or the type (for example integer) for the property.
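The effect of such constraints can be illustrated with a plain-Python stand-in for a PropertyShape. This is not a SHACL engine: the dictionary keys only mimic the roles of `sh:path`, `sh:datatype` and `sh:maxInclusive`, and the upper bound of 100000 as well as the helper `validate` are assumptions made for the example.

```python
# Plain-Python stand-in for a PropertyShape on imageCount.
image_count_shape = {
    "path": "imageCount",
    "datatype": int,          # plays the role of a datatype constraint (integer)
    "max_inclusive": 100000,  # hypothetical maximum value
}


def validate(value, shape: dict) -> list:
    """Return a list of violation messages; an empty list means the value conforms."""
    errors = []
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if not isinstance(value, shape["datatype"]) or isinstance(value, bool):
        errors.append(f"{shape['path']}: expected {shape['datatype'].__name__}")
    elif value > shape["max_inclusive"]:
        errors.append(f"{shape['path']}: exceeds maximum {shape['max_inclusive']}")
    return errors


ok = validate(250, image_count_shape)
wrong_type = validate("250", image_count_shape)
too_large = validate(10**6, image_count_shape)
print(ok, wrong_type, too_large)
```

Keeping the constraints in a separate structure, rather than in the class definition, reflects the idea that shapes can be changed without touching the ontology itself.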