Retrieval-Augmented Generation (RAG) for Older Adult Mobility and Health Information

Author: Jia Yang

Supervisor: Dr. Rong Zheng

Institution: McMaster University

Project Description and Approach
• Motivation
• System Overview
• Key Design Features
• Architecture Components
Results
• Retrieval Performance (BEIR Benchmark)
• Local vs Web Retrieval
• Qualitative Evaluation
Source Code
References

1. Project Description and Approach

Motivation

Older adults increasingly rely on online resources for health information related to mobility, fall prevention, balance training, and rehabilitation. However:

• Search engines often return fragmented or unreliable results.
• Large Language Models (LLMs) may generate fluent but unverified responses.
• Health misinformation can negatively impact safety and decision-making.

This project develops a reliable, citation-grounded Retrieval-Augmented Generation (RAG) system designed specifically for older-adult mobility and health information.

System Overview

The system integrates:

• BM25 lexical retrieval
• Cross-encoder (CE) semantic reranking
• A relevance gate on the CE score (threshold τ = 0.6)
• Web fallback through the Google Search API
• LLM-based answer generation with explicit citations
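
To make the first stage concrete, the following is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the project's implementation: the whitespace tokenizer, the parameter values k1 = 1.5 and b = 0.75, the helper names `bm25_scores` and `top_k`, and the three sample passages are all assumptions made for illustration. The real system reads its passages from the MongoDB corpus.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a real system would also strip punctuation)."""
    return text.lower().split()

def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per passage for the given query."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" keeps it non-negative for common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def top_k(query, passages, k=3):
    """Return (index, score) pairs for the k highest-scoring passages."""
    scores = bm25_scores(query, passages)
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Invented sample passages, used only to exercise the functions.
passages = [
    "balance training exercises reduce the risk of falls in older adults",
    "a healthy diet includes fruit and vegetables",
    "walking aids such as canes improve mobility and balance",
]
ranking = top_k("balance training to prevent falls", passages, k=2)
print(ranking)
```

Because BM25 only matches exact terms, the second passage scores zero for this query. That limitation is what the cross-encoder reranking stage is meant to offset.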

Pipeline Summary

1. The user submits a natural-language query.
2. BM25 retrieves the top-k passages from a curated MongoDB corpus.
3. A cross-encoder reranks the candidates.
4. A relevance gate compares the cross-encoder (CE) score against the threshold τ:
   • High CE score → use local database evidence
   • Low CE score → trigger web fallback
5. The LLM generates a plain-language answer with citations.
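
The gating step in the pipeline above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the gate is assumed to look at the top-ranked local passage and to use a `>=` comparison, web results are assumed to be reranked by the same scorer, and `gather_evidence`, `rerank`, `Evidence`, and `toy_score` are hypothetical names. The `score_fn` argument stands in for the cross-encoder; in a real setup it could wrap a Sentence-Transformers `CrossEncoder`, but a word-overlap toy is used here so the example runs without downloading a model.

```python
from dataclasses import dataclass

TAU = 0.6  # cross-encoder relevance threshold stated in the System Overview

@dataclass
class Evidence:
    text: str
    source: str      # "local" or "web"
    ce_score: float

def rerank(query, passages, score_fn):
    """Score every passage against the query and sort best-first."""
    scored = [(p, score_fn(query, p)) for p in passages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def gather_evidence(query, local_search, web_search, score_fn, tau=TAU, top_n=3):
    """Use local evidence when its best CE score clears tau, else fall back to the web."""
    local_ranked = rerank(query, local_search(query), score_fn)
    if local_ranked and local_ranked[0][1] >= tau:
        chosen, source = local_ranked, "local"
    else:
        chosen, source = rerank(query, web_search(query), score_fn), "web"
    return [Evidence(text, source, score) for text, score in chosen[:top_n]]

def toy_score(query, passage):
    """Stand-in for a cross-encoder: fraction of query words found in the passage."""
    q_words = set(query.lower().split())
    return len(q_words & set(passage.lower().split())) / len(q_words)

# Hypothetical retrievers returning fixed passages, for demonstration only.
def local_search(query):
    return ["balance exercises help prevent falls"]

def web_search(query):
    return ["the capital of france is paris"]

in_domain = gather_evidence("exercises prevent falls", local_search, web_search, toy_score)
out_domain = gather_evidence("capital of france", local_search, web_search, toy_score)
print(in_domain[0].source, out_domain[0].source)
```

Passing the retrievers and the scorer in as functions keeps the gate independent of any particular database, search API, or model, which matches the modular layering described for the system.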

Key Design Features

• Curated corpus of trusted health documents and passages
• Explicit source citations in every answer
• Plain-language output tailored for older adults
• Safety-aware prompts with medical disclaimers
• Modular Python implementation (BM25, Sentence-Transformers, Flask, Streamlit)
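
As a rough illustration of how the citation, plain-language, and disclaimer features could come together in a prompt, consider the sketch below. The prompt wording, the disclaimer text, the `build_prompt` name, the `title`/`url`/`text` evidence schema, and the example.org sources are all hypothetical placeholders; the project's actual prompts are not reproduced here.

```python
# Hypothetical disclaimer wording, for illustration only.
DISCLAIMER = (
    "This information is for general education only and is not a "
    "substitute for advice from your doctor or physiotherapist."
)

def build_prompt(question, evidence):
    """Build an LLM prompt that numbers each source so the answer can cite it.

    evidence: list of dicts with 'title', 'url', and 'text' keys (assumed schema).
    """
    sources = "\n".join(
        f"[{i}] {item['title']} ({item['url']})\n{item['text']}"
        for i, item in enumerate(evidence, start=1)
    )
    return (
        "Answer the question using only the numbered sources below.\n"
        "Write in short, plain-language sentences suitable for older adults.\n"
        "Cite the supporting source after each claim, for example [1].\n"
        "If the sources do not answer the question, say so.\n"
        f"End with this disclaimer: {DISCLAIMER}\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
    )

# Placeholder evidence, not taken from the project's corpus.
evidence = [
    {"title": "Fall prevention basics", "url": "https://example.org/falls",
     "text": "Regular balance exercise lowers fall risk."},
    {"title": "Home safety checklist", "url": "https://example.org/home",
     "text": "Remove loose rugs and improve lighting."},
]
prompt = build_prompt("How can I lower my risk of falling at home?", evidence)
print(prompt)
```

Numbering the sources in the prompt gives the model a fixed label to cite, so each bracketed reference in the answer can be mapped back to a title and URL when the response is displayed.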

Architecture Components

• Crawler Layer (PDF + Web)
• Retrieval Layer (BM25)
• Reranking Layer (Cross-encoder)
• Decision Layer (Relevance Gate)
• Generation Layer (LLM synthesis)
• User Interface Layer (CLI, Flask API, Streamlit UI)

2. Results

Retrieval Performance (BEIR Benchmark)

On both BEIR datasets tested (SciFact and NFCorpus), semantic retrieval outperformed BM25 on every reported metric:

Dataset	Method	nDCG@10	MAP@10	Recall@100
SciFact	BM25	0.5178	0.4760	0.7896
SciFact	Semantic	0.6451	0.5959	0.9250
NFCorpus	BM25	0.3804	0.3415	0.6827
NFCorpus	Semantic	0.4686	0.4317	0.8541

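A hybrid BM25 + cross-encoder pipeline balances computational efficiency and semantic precision: BM25 cheaply narrows the corpus to a short candidate list, and the more expensive cross-encoder is applied only to those candidates.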
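
For readers unfamiliar with the metrics in the table above, here is a simplified sketch of how nDCG@k and Recall@k are computed for a single query. The reported numbers come from the BEIR benchmark's own evaluation, not from this code; the linear-gain form of DCG used here, the function names, and the made-up document IDs and relevance judgments are assumptions for illustration. MAP@10 is left out for brevity.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k for one query; relevance maps doc id -> graded relevance (0 if absent)."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(ranked_ids, relevance, k=100):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = {doc_id for doc_id, rel in relevance.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)

# Made-up judgments and ranking for a single query.
relevance = {"d1": 1, "d3": 1}
ranked = ["d1", "d2", "d3"]

ndcg = ndcg_at_k(ranked, relevance, k=10)
recall_top2 = recall_at_k(ranked, relevance, k=2)
print(round(ndcg, 4), recall_top2)
```

A benchmark score is the average of these per-query values over all test queries. nDCG rewards placing relevant documents near the top of the list, while Recall@100 only asks whether they appear anywhere in the first 100 results, which is why it matters most for a first-stage retriever feeding a reranker.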

Local vs Web Retrieval

Query Type	Local CE	Web CE
In-domain	≈ 0.997	≈ 0.892
Out-of-domain	≈ 0.000018	≈ 0.958

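With the gate threshold τ = 0.6, these results indicate strong domain sensitivity: in-domain queries score far above the threshold on local evidence (≈ 0.997) and are answered from the curated corpus, while out-of-domain queries score near zero locally (≈ 0.000018), trigger the web fallback, and obtain high-scoring web evidence (≈ 0.958).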

Qualitative Evaluation

Evaluation using real-world older-adult mobility queries demonstrates:

• Evidence-grounded responses
• High in-domain relevance
• Appropriate fallback behavior
• Clear citation display
• Inclusion of safety disclaimers

3. Source Code

The full implementation, retrieval pipeline, and deployment instructions are maintained in a private GitHub repository:

GitHub Repository: View GitHub Repository

4. References

Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.
Izacard, G., & Grave, E. (2021). Leveraging Passage Retrieval with Generative Models for Open-Domain QA. EACL 2021.
Nogueira, R., & Cho, K. (2019). Passage Re-ranking with BERT. arXiv:1901.04085.
Thakur, N., et al. (2021). BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of IR Models. NeurIPS 2021.
World Health Organization. (2022). Healthy Ageing.