Thesis start date: 10/01/2024
Thesis end date: 09/30/2027
Expected defense date: —
Abstract
The thesis aims to find alternatives to large language models (LLMs), which are characterized by a large number of parameters and/or a large number of symbols in their training corpus. LLMs entail considerable energy consumption, both during training and during use (inference), and a lack of transparency regarding the text they produce. The thesis will seek to demonstrate that knowledge graphs such as DBpedia, BabelNet, or ConceptNet can address both problems. These graphs are already widely used for question-answering tasks, despite well-known gaps in their modeling of the physical world (spatio-temporal reasoning, among others). The incompleteness of a large knowledge graph can be compensated for by learning vector representations of its main concepts (its foundational ontology) whose geometric properties remain semantically interpretable.
The doctoral work will develop a cost-effective method for learning vector representations of concepts and will produce a pre-trained language model from DBpedia (or a similar knowledge graph). The pre-trained model can then be used for spatio-temporal reasoning in an application related to cyber-physical systems.
Keywords
Knowledge graphs, Sustainability, Explainability, Neuro-symbolic integration
Relevant Sustainable Development Goals

