IJCAI 2025 Tutorial - Advanced RAG

Abstract

Retrieval-Augmented Generation (RAG) combines retrieval-based methods with generative models: it retrieves relevant information from a knowledge base before generating an answer, which improves the accuracy and relevance of responses. It is well suited to complex, knowledge-intensive tasks such as question answering, document summarization, and conversational AI, with applications in healthcare, finance, education, and other fields. As RAG is applied to increasingly diverse domains, it must handle broader types of data, including text, images, tables, graphs, and time series. This expansion introduces challenges in cross-modal retrieval, unified representation learning, data fusion, scalability, noise handling, and evaluation. Addressing these challenges is necessary for RAG to be effective in real-world applications, where data is often heterogeneous, dynamic, and imperfect.

This tutorial covers recent progress in retrieval-augmented generation: the fundamental concepts and algorithms of RAG, new research frontiers and technical advances in RAG for complex data, and the corresponding applications and evaluations. It also introduces tutorial materials that help the audience gain a systematic understanding beyond our recently published survey paper and open-source repositories of state-of-the-art RAG algorithms.
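
As a rough illustration of the retrieve-before-generate idea described in the abstract, the following minimal sketch ranks passages in a small knowledge base against a query and builds a prompt from the top hits. It is not taken from the tutorial materials: the function names (`retrieve`, `build_prompt`, `generate`, `rag_answer`), the bag-of-words cosine scoring, and the stubbed generator are all assumptions made for illustration. A real system would typically use a learned encoder for retrieval and an LLM call in place of the stub.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, knowledge_base, k=2):
    # Hypothetical retriever: score every passage, keep the top k
    # that share at least one token with the query.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, passages):
    # Place the retrieved passages ahead of the question.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt):
    # Placeholder (assumption): a real pipeline would call a generative model here.
    return f"[model output for a prompt of {len(prompt)} characters]"


def rag_answer(query, knowledge_base, k=2):
    # Retrieve first, then generate from the augmented prompt.
    passages = retrieve(query, knowledge_base, k)
    return generate(build_prompt(query, passages))


if __name__ == "__main__":
    kb = [
        "Paris is the capital of France.",
        "The Nile is a river in Africa.",
        "Tables store structured data in rows and columns.",
    ]
    print(build_prompt("What is the capital of France?",
                       retrieve("What is the capital of France?", kb, k=1)))
```

The retriever and generator are kept as separate functions so that either can be swapped out independently, for example when replacing text-only retrieval with a retriever over tables, graphs, or images.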

Presenters


Liang Zhao¹	Chao Huang²

Outline

Introduction to RAG
- Motivation, limits of LLMs
- Basics of retrieval augmentation
Naive RAG and Its Limitations
- Flattened text
- Single-hop retrieval
- Lack of structure
Challenges with Complex Data
- Unstructured
- Semi-structured
- Structured
- Multimodal
Unstructured RAG – Standard pipelines and their constraints
Structured RAG
- Tables
  - Neural retrieval + reader
  - Structure-aware encodings
  - Tool-augmented (SQL, APIs)
- Structured Documents
  - Structure-preserving encodings
  - Structure-aware retrieval
  - Tool-augmented navigation
Semi-Structured RAG
- Definition & characteristics
- Graph-indexed texts
- Text-attributed graphs
- Hybrid approaches
Hybrid RAG
- Fusion in retriever
- Fusion in generator
- Iterative & planning-based retrieval
Multimodal RAG – Architecture
- Workflow: query → encoder → retriever → fusion → answer
Multimodal Retrieval
- Text retrieval
- Vision retrieval
- Video retrieval
- Audio retrieval
- Entire multimodal object retrieval
Post-Retrieval Processing
- Re-ranking
- Joint selection
- Filtering
Generation & Reasoning
- Fusion strategies
- In-context learning
- Multimodal reasoning (CoT, branching, compositional)
- Instruction tuning
Agents & Future Directions
- Agentic frameworks
- Multi-agent RAG
- Planning
- Challenges: trust, grounding, evaluation paradigms

Resources

TBA

About

TBA