Researcher Alaa El-Ebshihy took part in the first symposium on AI and robotics in Austria, “AIRoV” in Innsbruck, where she attended a workshop on “Knowledge Graphs and Neurosymbolic AI”. Also on the agenda was the presentation of an article that she had written together with RSA FG Senior Researcher Florina Piroi and other colleagues from TU Wien. The two researchers from the “Data Science” studio, El-Ebshihy and Piroi, focussed on “Extending Content-based Scientific Knowledge Graphs with Research Results”.
These Scientific Knowledge Graphs (SKG) are a structured representation of knowledge gained from scientific texts. They are used, for example, in research databases and libraries, but also in AI development and in the training of machine learning models to provide algorithms with structured knowledge. For example, they can be used to answer questions in natural language or to generate new hypotheses from existing data.
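To illustrate what "structured knowledge" can mean here, the following is a minimal sketch of a knowledge graph stored as subject-predicate-object triples, with a simple pattern lookup that answers a question such as "which papers use a given method?". The paper identifiers, predicate names and facts are invented for illustration and do not come from the article.

```python
# Hypothetical toy graph: each fact is a (subject, predicate, object) triple.
TRIPLES = [
    ("paper:1", "uses_method", "BERT"),
    ("paper:1", "addresses_task", "question answering"),
    ("paper:2", "uses_method", "BERT"),
    ("paper:2", "addresses_task", "summarization"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that agree with the given fields; None matches anything."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def papers_using(method):
    """Answer 'which papers use this method?' with a single pattern lookup."""
    found = match(TRIPLES, predicate="uses_method", obj=method)
    return sorted(s for (s, _, _) in found)
```

Calling `papers_using("BERT")` returns both invented paper identifiers. Real SKGs are far larger and are typically queried with a dedicated query language, but the principle of matching patterns against triples is the same.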
There are two main challenges with SKGs:
- SKGs mainly contain information that is extracted from the abstracts. As abstracts only represent a small part of the entire article, the information obtained is often incomplete.
- Because of this focus on abstracts, often only certain parts of the article are taken into account, such as the methodology. Other important parts, especially the research results described in the full text of the article, are often overlooked.
El-Ebshihy and her colleagues attempt to solve these problems with a general framework. Their approach describes the process, the data and the techniques needed to enrich SKGs with more comprehensive information. The aim is to extract the research results from the full text of a scientific article and include them in the SKGs.
The authors tested this approach on a small selection of articles from the same subject area and showed the challenges of extracting research results from full text. In addition, the article presents an investigation of LLMs (“Large Language Models”), which can automatically represent semantically important text components in the computer-readable RDF format. RDF (Resource Description Framework) is a standard data model used to store information in knowledge graphs.
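To make the RDF representation concrete, here is a minimal sketch that writes a research result as RDF triples in the N-Triples syntax. It is not the authors' pipeline: the `http://example.org/` namespace, the property names (`reportsResult`, `metric`, `value`) and the numeric value are placeholders invented for this example.

```python
# Placeholder namespace; an assumption for this sketch, not from the article.
EX = "http://example.org/"


def iri(local_name):
    """Wrap a local name in the placeholder namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def literal(value):
    """Format a plain string literal, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for (s, p, o) in triples)


# Invented example: one paper reporting one result with a metric and a value.
result_triples = [
    (iri("paper1"), iri("reportsResult"), iri("result1")),
    (iri("result1"), iri("metric"), literal("F1")),
    (iri("result1"), iri("value"), literal("0.5")),
]

if __name__ == "__main__":
    print(to_ntriples(result_triples))
```

Each output line is one statement linking a subject to an object through a predicate, which is the form in which a research result extracted from the full text could be attached to an existing graph.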
In addition to presenting her results and research, the doctoral student from TU Wien was also able to learn a lot from other presentations, for example how knowledge graphs can be used to detect hallucinations of LLMs. Hallucination refers to the phenomenon in which language models such as ChatGPT invent information.
AIRoV 2024 was the joint symposium of the Austrian Society for Artificial Intelligence (ASAI), the Austrian Society of Measurement, Automation and Robot Technologies (GMAR), and the Austrian Association for Pattern Recognition (OAGM). Following a long tradition of individual and joint OAGM and ARW/GMAR workshops, this was the first joint symposium of the three closely related associations.