Interaction with a humanoid robot through a conversational interface using DeepSeek

Autores/as

Palabras clave:

Human-robot interaction, Chat completion, Large language model, Text-to-speech, Automatic speech recognition, Edge computing, Humanoid robotics

Resumen

This work presents an application that enables users to hold a conversation with a chat agent, aimed to be launched aboard a humanoid robot equipped with audio sink and source hardware. It was designed to fulfill two constraints: all components are backed up by free and open-source software, and all computations can be carried out offline, either locally or offloaded to a dedicated edge server on the robot itself. The YARP robotics middleware and framework was leveraged to be at the foundation of a distributed architecture of newly developed modules: text-to-speech (implemented with eSpeak and Piper), automatic speech recognition (powered by Kaldi, wrapped by Vosk), wake word detection (via openWakeWord) and inference on large language models (with llama.cpp). The application was tested on the humanoid robot TEO using a distilled variant of the DeepSeek-R1 language model. Results show that a fully offline low-latency conversational agent can be adopted to achieve human-robot interaction tasks.

Descargas

Publicado

2025-05-31