Under the hood.
How every tool actually works.
A technical reference for every AI feature on JustHigherEd — from RAG pipelines and vector search to SQL agents and streaming. Written so future-you can pick right back up.
The 30,000-foot view
A Next.js frontend, a FastAPI backend, and two AI models. Nginx ties it together in production.
AI Program Advisor — RAG pipeline
Pathfinder blends Retrieval-Augmented Generation with a hybrid search engine to give every student a personalised conversation grounded in real program data.
+ (0.4 × keyword_score)
semantic_score — cosine similarity of query embedding vs stored program embedding (ChromaDB).
keyword_score — fraction of query words that appear in the program name or description.
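The two scores can be combined as in the minimal sketch below. The visible part of the formula gives keyword_score a weight of 0.4. The semantic weight is not shown, so the sketch assumes the two weights sum to 1. The helper names, the program dict layout, and the tokenisation rule are also assumptions for illustration, and the embeddings are passed in directly rather than fetched from ChromaDB.

```python
import math
import re

KEYWORD_WEIGHT = 0.4                     # taken from the formula
SEMANTIC_WEIGHT = 1.0 - KEYWORD_WEIGHT   # assumption: weights sum to 1


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_score(query, name, description):
    """Fraction of query words that appear in the program name or description."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return 0.0
    haystack = set(re.findall(r"[a-z0-9]+", f"{name} {description}".lower()))
    return sum(1 for w in words if w in haystack) / len(words)


def hybrid_score(query, query_vec, program):
    """Weighted blend of semantic and keyword relevance for one program."""
    semantic = cosine_similarity(query_vec, program["embedding"])
    keyword = keyword_score(query, program["name"], program["description"])
    return SEMANTIC_WEIGHT * semantic + KEYWORD_WEIGHT * keyword
```

Blending the two signals lets an exact name match surface even when the embedding is a weak match, and the reverse.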
- Persona: "Pathfinder", a friendly, knowledgeable guide
- Always ask about degree level early if unknown
- Never overwhelm — show max 3 programs at once
- Cite BLS for every salary figure
- Call the school "Carvard University", never "University of Utah"
- Trigger handoff for: financial aid, visas, disabilities, emotional distress
When a student logs in with their ID, format_student_context() injects their name, GPA, credits completed, enrollment status, interests, and up to 10 recent courses directly into the context window — so Pathfinder can give advice that accounts for where they actually are in their degree.
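A rough sketch of what such a formatter might look like follows. Only the function name comes from the text above. The record keys, the output layout, and the assumption that completed courses are stored oldest to newest are hypothetical.

```python
def format_student_context(student: dict, max_courses: int = 10) -> str:
    """Render a student record as plain text for the model's context window."""
    courses = student.get("completed_courses", [])
    # Assumption: courses are ordered oldest to newest, so the tail is "recent".
    recent = courses[-max_courses:] if max_courses > 0 else []
    interests = student.get("interests", [])
    lines = [
        f"Name: {student['name']}",
        f"GPA: {student['gpa']:.2f}",
        f"Credits completed: {student['credits']}",
        f"Enrollment status: {student['enrollment_status']}",
        f"Interests: {', '.join(interests) or 'none listed'}",
        f"Recent courses: {', '.join(recent) or 'none listed'}",
    ]
    return "\n".join(lines)
```

Capping the course list keeps the injected context small and predictable, however long the student's transcript is.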
The backend stores one PathfinderChatbot instance per session in an in-memory Python dict (chatbot_sessions). Each instance holds conversation history, the inferred degree level, and the handoff flag. In production this should migrate to Redis for horizontal scaling.
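The session store described above could be sketched as follows. The names chatbot_sessions and PathfinderChatbot come from the text. The fields on the class and the get_chatbot helper are assumptions for illustration, not the real implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PathfinderChatbot:
    """Stand-in for the real class: holds only the per-session state."""
    history: List[dict] = field(default_factory=list)
    degree_level: Optional[str] = None   # inferred during the conversation
    handoff_triggered: bool = False


# One chatbot instance per session ID, kept in process memory.
chatbot_sessions: Dict[str, PathfinderChatbot] = {}


def get_chatbot(session_id: str) -> PathfinderChatbot:
    """Return the session's chatbot, creating it on first use (hypothetical helper)."""
    bot = chatbot_sessions.get(session_id)
    if bot is None:
        bot = PathfinderChatbot()
        chatbot_sessions[session_id] = bot
    return bot
```

Because the dict lives inside a single Python process, a second worker or server would not see these sessions. An external store such as Redis solves that problem.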
Side-by-side program comparison
A direct API call — no AI involved. Raw program data is fetched, enriched with BLS salary figures, and laid out in a structured table.
Parsed from the official University of Utah 2025–2026 Academic Catalog PDF (18 MB) using a custom parser.py script.
- 557 programs indexed
- Fields: name, degree type, credits, description, required courses, admissions requirements, contact info
- Re-run parser.py whenever the catalog updates
- Program name + degree type + total credits
- Tuition estimate (resident / non-resident)
- Required courses list
- Admission requirements summary
- BLS job title(s) + median annual salary
- Program website URL
Compare is deterministic by design. Students need reliable, verifiable facts when making a multi-year financial decision. There is no LLM inference in the compare flow — just structured data fetched and formatted.
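As an illustration of what "fetched and formatted" can mean in practice, here is a minimal sketch that joins a program record to its mapped occupations with plain dictionary lookups. The function names, record keys, and the shape of the occupation map are assumptions, not the actual API.

```python
from typing import Dict, List


def build_compare_row(program: dict, occupation_map: Dict[str, List[dict]]) -> dict:
    """Assemble one comparison row from structured data only (no model call)."""
    occupations = occupation_map.get(program["name"], [])
    return {
        "name": program["name"],
        "degree_type": program["degree_type"],
        "total_credits": program["total_credits"],
        "careers": [
            {"title": o["title"], "median_annual_wage": o["median_annual_wage"]}
            for o in occupations
        ],
    }


def compare(programs: List[dict], occupation_map: Dict[str, List[dict]]) -> List[dict]:
    """Build rows for the programs being compared, in the order given."""
    return [build_compare_row(p, occupation_map) for p in programs]
```

The same inputs always produce the same rows, and that is the property the compare flow relies on.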
Financial modelling — cost vs. salary
Pure math, real data. The calculator models total program cost and projects career earnings from BLS to compute a break-even timeline.
Salaries come from program_occupation_map.json — a hand-curated mapping from program name to BLS Standard Occupational Classification (SOC) codes. Each SOC entry includes the latest BLS median annual wage and employment outlook (projected 10-year job growth %). Data was pulled from the BLS Occupational Outlook Handbook.
Like Compare, the ROI Calculator is deterministic. All calculations happen in the FastAPI tuition router — no LLM call. This keeps latency under 200 ms and results fully reproducible.
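The exact formula used by the tuition router is not reproduced here. As one plausible sketch, break-even can be modelled as total cost divided by the annual earnings gain over a baseline salary. The parameter names, the per-credit cost model, and the baseline_salary input are all assumptions.

```python
from typing import Optional


def total_program_cost(tuition_per_credit: float, total_credits: int,
                       fees_per_year: float = 0.0, years: float = 4.0) -> float:
    """Tuition for all credits plus flat yearly fees (assumed cost model)."""
    return tuition_per_credit * total_credits + fees_per_year * years


def break_even_years(total_cost: float, median_salary: float,
                     baseline_salary: float = 0.0) -> Optional[float]:
    """Years of post-graduation earnings gain needed to cover total_cost.

    Returns None when the degree yields no salary gain over the baseline.
    """
    annual_gain = median_salary - baseline_salary
    if annual_gain <= 0:
        return None
    return total_cost / annual_gain
```

The calculation is pure arithmetic on stored figures, so the same inputs always give the same timeline.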
Text-to-SQL — asking questions of enrollment data
University leadership types a plain-English question. Claude Sonnet 4.6 interprets it, writes a SQL query, executes it against a SQLite database, and returns a formatted answer.
SQLite database at backend/enrollment/enrollment.db. Key tables:
- enrollment — headcount per program per semester/year
- retention — cohort retention rates by year and program
- yield — admitted → enrolled conversion rates
- demographics — breakdown by gender, ethnicity, residency
Enrollment questions often require multi-table JOINs, GROUP BY aggregations, window functions for year-over-year deltas, and careful column aliasing. Claude Haiku (optimised for conversational speed) occasionally produces subtly incorrect SQL for complex analytical queries. Sonnet 4.6 is reliably accurate for these — and since administrators ask far fewer questions per minute than students chat with Pathfinder, the higher per-token cost is acceptable.
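To give a sense of the SQL involved, the sketch below computes a year-over-year headcount delta with a window function. The column names and the sample rows are invented for illustration and do not reflect the real schema. It runs against an in-memory database and needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified version of the enrollment table.
conn.execute("CREATE TABLE enrollment (program TEXT, year INTEGER, headcount INTEGER)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [  # made-up sample rows
        ("Nursing", 2023, 400),
        ("Nursing", 2024, 440),
        ("Physics", 2023, 120),
        ("Physics", 2024, 110),
    ],
)

# LAG() looks one row back within each program, ordered by year.
YOY_SQL = """
SELECT program,
       year,
       headcount,
       headcount - LAG(headcount) OVER (
           PARTITION BY program ORDER BY year
       ) AS yoy_delta
FROM enrollment
ORDER BY program, year
"""

rows = conn.execute(YOY_SQL).fetchall()
for row in rows:
    print(row)
```

The first year of each program has no prior row, so its yoy_delta is NULL. That is the kind of detail a weaker model can get subtly wrong.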
- Claude is instructed to generate only SELECT statements
- The database connection is opened read-only
- Out-of-scope questions are flagged before any SQL runs
- Error responses from SQLite are caught and surfaced cleanly
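Three of the guardrails above can be sketched with the standard sqlite3 module: a SELECT-only check, a read-only connection, and clean error handling. The function names and the response shape are assumptions. The string check is deliberately conservative (it also rejects CTEs and any query containing a semicolon), since the read-only connection is the real backstop.

```python
import sqlite3


def is_select_only(sql: str) -> bool:
    """Accept a single statement that starts with SELECT; reject everything else."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        return False  # refuse stacked statements
    return stripped.lower().startswith("select")


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open the database so that any write attempt raises an error."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def run_query(conn: sqlite3.Connection, sql: str) -> dict:
    """Run model-generated SQL and return rows or a clean error message."""
    if not is_select_only(sql):
        return {"ok": False, "error": "Only single SELECT statements are allowed."}
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return {"ok": False, "error": f"Query failed: {exc}"}
    return {"ok": True, "rows": rows}
```

Layering the checks means a prompt failure alone cannot modify data: a statement that slips past the text check still hits a connection that refuses writes.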
Where the data comes from
No hallucinations. Every fact shown to students or administrators traces back to one of these real data sources.
program_catalog.pdf
The official Carvard/University of Utah 2025–2026 undergraduate and graduate catalog. Source-of-truth for every program.
Note: Parsed into programs.json + chroma_db/ by parser.py and db.py
data/programs.json
Structured extraction of the PDF. Fields: name, degree type, total credits, description, required courses, admission requirements, contact info.
Note: Generated by parser.py from the PDF
chroma_db/
Each program's name and description are encoded as a dense vector embedding using ChromaDB's default sentence-transformers model. This enables semantic similarity search.
Note: Built by db.py — re-run to refresh after programs.json changes
data/program_occupation_map.json
Hand-curated mapping from program name to Bureau of Labor Statistics SOC codes, job titles, median annual wages, and 10-year job growth projections.
Note: Manually maintained — update from bls.gov when new Occupational Outlook data releases
backend/enrollment/enrollment.db
Structured enrollment data: headcount by program and semester, retention rates, yield rates, demographic breakdowns. Used exclusively by the Enrollment Analyst.
Note: Schema defined in schema_context.py; seeded with synthetic data for demo
students/students.json
Mock student records for demonstration. Each record has: student_id, name, GPA, credits, enrollment status, current program, completed courses, interests.
Note: Test login: student ID u1000015
Retrieval-Augmented Generation explained
RAG means the AI model doesn't answer from memory — it first fetches relevant facts, then generates a response grounded in those facts.
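The fetch-then-generate pattern can be sketched in a few lines. Everything here is illustrative: the function names, the prompt wording, and the simple word-overlap scorer are assumptions, and generate stands in for whatever model call the backend makes.

```python
import re
from typing import Callable, List


def word_overlap(query: str, program: dict) -> int:
    """Toy relevance score: number of query words found in the program text."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    text = f"{program['name']} {program['description']}".lower()
    return len(q & set(re.findall(r"[a-z0-9]+", text)))


def retrieve(query: str, programs: List[dict],
             score_fn: Callable[[str, dict], float], k: int = 3) -> List[dict]:
    """Step 1: fetch the k most relevant program records."""
    return sorted(programs, key=lambda p: score_fn(query, p), reverse=True)[:k]


def build_prompt(query: str, retrieved: List[dict]) -> str:
    """Step 2: put the fetched facts in front of the question."""
    facts = "\n".join(f"- {p['name']}: {p['description']}" for p in retrieved)
    return (
        "Answer using only the program facts below.\n"
        f"Program facts:\n{facts}\n\n"
        f"Student question: {query}"
    )


def answer(query: str, programs: List[dict],
           score_fn: Callable[[str, dict], float],
           generate: Callable[[str], str]) -> str:
    """Step 3: let the model generate a response grounded in those facts."""
    return generate(build_prompt(query, retrieve(query, programs, score_fn)))
```

Swapping word_overlap for a hybrid semantic and keyword score changes which facts are fetched, but the structure stays the same.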
Every technology used
A deliberate stack — each choice made for a specific reason.
Trade-offs worth remembering
Things that were deliberate — not accidental — so future-you knows why.
See it in action
Reading about it is one thing. Try the tools yourself — Pathfinder's RAG, the Compare table, the ROI model, and the Enrollment Analyst are all live.