Abstract
Retrieval-Augmented Generation (RAG) combines retrieval-based methods with generative models: it retrieves relevant information from a knowledge base before generating an answer, which improves the accuracy and relevance of responses. This makes it well suited to complex, knowledge-intensive tasks such as question answering, document summarization, and conversational AI, with applications in healthcare, finance, education, and more. As RAG rapidly evolves, it is being applied to increasingly diverse domains and must handle broader types of data, including text, images, tables, graphs, and time-series data. This expansion introduces challenges in cross-modal retrieval, unified representation learning, data fusion, scalability, noise handling, and evaluation. Addressing these challenges is urgent if RAG is to remain effective in real-world applications, where data is often heterogeneous, dynamic, and imperfect, and if it is to reach its full potential across a wide range of industries and use cases.
This tutorial covers a broad range of topics in recent progress on retrieval-augmented generation: the fundamental concepts and algorithms of RAG, new research frontiers and technical advances in RAG for complex data, and the corresponding applications and evaluations. Rich tutorial materials will also be provided to help the audience gain a systematic understanding beyond our recently published survey paper and open-source repositories of state-of-the-art RAG algorithms.
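To make the retrieve-then-generate flow described in the abstract concrete, here is a minimal sketch of a naive text-only RAG loop. It is an illustration under stated assumptions, not code from the tutorial materials: the bag-of-words cosine retriever, the toy `KNOWLEDGE_BASE`, and the helper names `retrieve` and `build_prompt` are all hypothetical, and the generation step is left as prompt assembly so that any language model could consume the result.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, knowledge_base, k=2):
    """Return up to k passages with non-zero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Assemble the retrieved passages and the question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical toy knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "RAG retrieves passages from a knowledge base before generating an answer.",
    "Time-series data records measurements ordered by time.",
    "Tables store values in rows and columns.",
]

query = "What does RAG retrieve before generating an answer?"
passages = retrieve(query, KNOWLEDGE_BASE, k=1)
prompt = build_prompt(query, passages)  # would be passed to a generative model
print(prompt)
```

The sketch treats every passage as flat text and retrieves in a single hop, which is the setting the outline below calls naive RAG; structured, semi-structured, and multimodal data would need different encoders and retrievers.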
Presenters
Liang Zhao¹ | Chao Huang²
Outline
- Introduction to RAG
  - Motivation, limits of LLMs
  - Basics of retrieval augmentation
- Naive RAG and Its Limitations
  - Flattened text
  - Single-hop retrieval
  - Lack of structure
- Challenges with Complex Data
  - Unstructured
  - Semi-structured
  - Structured
  - Multimodal
- Unstructured RAG – Standard pipelines and their constraints
- Structured RAG
  - Tables
    - Neural retrieval + reader
    - Structure-aware encodings
    - Tool-augmented (SQL, APIs)
  - Structured Documents
    - Structure-preserving encodings
    - Structure-aware retrieval
    - Tool-augmented navigation
- Semi-Structured RAG
  - Definition & characteristics
  - Graph-indexed texts
  - Text-attributed graphs
  - Hybrid approaches
- Hybrid RAG
  - Fusion in retriever
  - Fusion in generator
  - Iterative & planning-based retrieval
- Multimodal RAG – Architecture
  - Workflow: query → encoder → retriever → fusion → answer
- Multimodal Retrieval
  - Text retrieval
  - Vision retrieval
  - Video retrieval
  - Audio retrieval
  - Entire multimodal object retrieval
- Post-Retrieval Processing
  - Re-ranking
  - Joint selection
  - Filtering
- Generation & Reasoning
  - Fusion strategies
  - In-context learning
  - Multimodal reasoning (CoT, branching, compositional)
  - Instruction tuning
- Agents & Future Directions
  - Agentic frameworks
  - Multi-agent RAG
  - Planning
  - Challenges: trust, grounding, evaluation paradigms
Resources
TBA
About
TBA