Despite recent progress in integrating Large Language Models (LLMs) to enhance robotic capabilities, key challenges remain—particularly in interpreting perceptions, interacting with humans, and executing tasks in real-world environments. These challenges stem largely from LLM hallucinations and a lack of formal semantic knowledge, which hinder contextual understanding. To address this issue, we propose a semantic framework that combines an LLM and a commonsense knowledge graph to align the robot’s perceptions with a suitable task execution plan. The framework integrates pre-trained deep learning models for object detection, face identification, and speech recognition, an LLM, and a local knowledge graph based on the Resource Description Framework (RDF). The knowledge graph, which includes scene and object descriptions as well as spatial knowledge, is used to select and execute the suitable task plan for the given context. Experimental results demonstrate the effectiveness of the framework’s components in improving the robot’s interaction and actuation in its environment.