Accuracy evaluation
Offline benchmark only
Benchmark answers, not just tokens
This page shows the hackathon evaluation outputs for the LLM-only, Basic RAG, and TigerGraph GraphRAG pipelines, graded with both an LLM judge and BERTScore.
Dataset: n/a
Questions: 0
Source: offline view
Charts: 0
Leading pipeline: n/a (no data yet)
Judge pass rate: 0.0% (average across pipelines)
Rescaled BERTScore: 0.000 (semantic alignment to ground truth)
Avg token reduction: 0.0% (relative to LLM-only)
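The rescaled BERTScore shown above follows the standard baseline rescaling from the bert-score library: raw F1 clusters in a narrow band near a baseline (the score random sentence pairs achieve), so the rescaling stretches that band onto a more interpretable 0-to-1 range. A minimal sketch, with an illustrative baseline value (the real library ships model-specific baselines):

```python
def rescale_bertscore(raw_f1: float, baseline: float = 0.85) -> float:
    """Map a raw BERTScore F1 onto a more interpretable 0-1 scale.

    Raw BERTScore F1 sits near `baseline` even for unrelated text, so
    the standard rescaling is (raw - baseline) / (1 - baseline): a raw
    score at the baseline maps to 0, perfect agreement maps to 1.
    The baseline value here is illustrative, not the actual one used
    in the hackathon evaluation.
    """
    return (raw_f1 - baseline) / (1.0 - baseline)

print(round(rescale_bertscore(0.925), 3))  # prints 0.5
```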
Questions loaded: 0 (from the saved offline evaluation file)
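The summary cards above (judge pass rate averaged across pipelines, and token reduction relative to the LLM-only baseline) can be computed from per-question grading records. A minimal sketch; the record fields and values below are hypothetical, not the actual schema of the saved evaluation file:

```python
# Hypothetical per-question grading records; field names and values are
# illustrative, not the actual evaluation file schema.
results = [
    {"pipeline": "llm_only",  "judge_pass": True,  "tokens": 420},
    {"pipeline": "basic_rag", "judge_pass": True,  "tokens": 310},
    {"pipeline": "graphrag",  "judge_pass": False, "tokens": 180},
]

def judge_pass_rate(rows):
    """Fraction of answers the LLM judge marked as passing."""
    return sum(r["judge_pass"] for r in rows) / len(rows)

def avg_token_reduction(rows, baseline="llm_only"):
    """Mean token usage of non-baseline pipelines vs. the baseline.

    A value of 0.25 means the other pipelines used 25% fewer tokens
    than the LLM-only baseline on average.
    """
    base = [r["tokens"] for r in rows if r["pipeline"] == baseline]
    rest = [r["tokens"] for r in rows if r["pipeline"] != baseline]
    return 1.0 - (sum(rest) / len(rest)) / (sum(base) / len(base))
```

With these sample records, `judge_pass_rate` returns 2/3 and `avg_token_reduction` returns roughly 0.417, i.e. the RAG pipelines used about 42% fewer tokens than LLM-only.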