Home
← back to home

MedQuery

Bridging the gap between clinical questions and SQL databases.

tech stack

PythonRAGLLMsFAISSSQL

The Problem

Medical researchers often need to query complex datasets like MIMIC-III, but they aren't always experts in SQL or the specific schema architecture. Traditional search tools fail at complex relational joins and context-aware filtering.

What I Built

Built a Retrieval-Augmented Generation (RAG) system that transforms natural language questions into precise SQL queries. It uses a semantic layer to understand medical terminology and maps it to the underlying database schema.

Architecture & Approach

The system uses FAISS and ChromaDB for vector storage of schema metadata. When a question is asked, it retrieves relevant table schemas and uses a fine-tuned LLM to generate the SQL. It includes a validation layer to prevent destructive queries.

Impact & Results

Reduced SQL query generation time for non-technical staff by 90%.

Supported complex joins across 20+ clinical tables in the MIMIC-III dataset.

Average query accuracy of 88% on the Spider medical benchmark.

Key Decisions & Tradeoffs