MedQuery
Bridging the gap between clinical questions and SQL databases.
tech stack
PythonRAGLLMsFAISSSQL
The Problem
Medical researchers often need to query complex datasets like MIMIC-III, but they aren't always experts in SQL or the specific schema architecture. Traditional search tools fail at complex relational joins and context-aware filtering.
What I Built
Built a Retrieval-Augmented Generation (RAG) system that transforms natural language questions into precise SQL queries. It uses a semantic layer to understand medical terminology and maps it to the underlying database schema.
Architecture & Approach
The system uses FAISS and ChromaDB for vector storage of schema metadata. When a question is asked, it retrieves relevant table schemas and uses a fine-tuned LLM to generate the SQL. It includes a validation layer to prevent destructive queries.
Impact & Results
Reduced SQL query generation time for non-technical staff by 90%.
Supported complex joins across 20+ clinical tables in the MIMIC-III dataset.
Average query accuracy of 88% on the Spider medical benchmark.
