IJCAI 2025 Tutorial

Beyond Text: Advanced Retrieval-Augmented Generation for Complex and Multimodal Data

Abstract

Retrieval-Augmented Generation (RAG) combines retrieval-based methods with generative models: it retrieves relevant information from a knowledge base before generating an answer, which improves the accuracy and relevance of responses. This makes it well suited to complex, knowledge-intensive tasks such as question answering, document summarization, and conversational AI, with applications in healthcare, finance, education, and other domains.
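
The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It is an assumption-laden toy, not the tutorial's method: the bag-of-words cosine scorer stands in for a learned retriever, the three-passage knowledge base is made up for illustration, and the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical. The generation step is left out; the sketch stops at the prompt that would be passed to a generative model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, knowledge_base, k=2):
    """Return the k passages most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        knowledge_base,
        key=lambda passage: cosine(q, Counter(tokenize(passage))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Place the retrieved passages ahead of the question for the generator."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical knowledge base, invented for this illustration only.
knowledge_base = [
    "Aspirin is commonly used to reduce fever and relieve mild pain.",
    "A bond's yield moves inversely to its price.",
    "Photosynthesis converts light energy into chemical energy in plants.",
]

query = "What is aspirin used for?"
passages = retrieve(query, knowledge_base, k=1)
prompt = build_prompt(query, passages)
print(prompt)  # this prompt would then be sent to a generative model
```

In practice the word-overlap scorer would typically be replaced by a dense or hybrid retriever, and the prompt would be sent to a language model; the structure of the pipeline (retrieve, assemble context, generate) stays the same.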

As RAG rapidly evolves, it is being applied to increasingly diverse domains, requiring it to handle broader types of data, including text, images, tables, graphs, and time-series data. However, this expansion introduces challenges such as cross-modal retrieval, unified representation learning, data fusion, scalability, noise handling, and evaluation.

This tutorial covers recent progress in retrieval-augmented generation, including fundamental concepts, technical advances for complex data, corresponding applications, and evaluation.

Presenters

Outline

  1. Introduction to RAG
  2. Naive RAG and its limitations
  3. Challenges with complex data
  4. Structured and semi-structured RAG
  5. Multimodal RAG
  6. Agents and future directions

Materials

About

This tutorial provides a concise overview of advanced retrieval-augmented generation methods for structured, semi-structured, and multimodal data.