Accuracy evaluation
Offline benchmark only
Benchmark answers, not just tokens
This page shows the hackathon evaluation outputs for the LLM-only, Basic RAG, and TigerGraph GraphRAG pipelines, graded with both an LLM judge and BERTScore.
Dataset: n/a
Questions: 0
Source: offline view
Charts: 0
Leading pipeline: n/a (no data yet)
Judge pass rate: 0.0% (average across pipelines)
Rescaled BERTScore: 0.000 (semantic alignment to ground truth)
Avg token reduction: 0.0% (relative to LLM-only)
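The rescaled BERTScore shown above follows the standard baseline rescaling from the bert-score library: raw F1 clusters in a narrow band near a baseline (the score random sentence pairs achieve), so the rescaling stretches that band onto a more interpretable 0-to-1 range. A minimal sketch, with an illustrative baseline value (the real library ships model-specific baselines):

```python
def rescale_bertscore(raw_f1: float, baseline: float = 0.85) -> float:
    """Map a raw BERTScore F1 onto a more interpretable 0-1 scale.

    Raw BERTScore F1 sits near `baseline` even for unrelated text, so
    the standard rescaling is (raw - baseline) / (1 - baseline): a raw
    score at the baseline maps to 0, perfect agreement maps to 1.
    The baseline value here is illustrative, not the actual one used
    in the hackathon evaluation.
    """
    return (raw_f1 - baseline) / (1.0 - baseline)

print(round(rescale_bertscore(0.925), 3))  # prints 0.5
```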
Questions loaded: 0 (from the saved offline evaluation file)
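The summary cards above (judge pass rate averaged across pipelines, and token reduction relative to the LLM-only baseline) can be computed from per-question grading records. A minimal sketch; the record fields and values below are hypothetical, not the actual schema of the saved evaluation file:

```python
# Hypothetical per-question grading records; field names and values are
# illustrative, not the actual evaluation file schema.
results = [
    {"pipeline": "llm_only",  "judge_pass": True,  "tokens": 420},
    {"pipeline": "basic_rag", "judge_pass": True,  "tokens": 310},
    {"pipeline": "graphrag",  "judge_pass": False, "tokens": 180},
]

def judge_pass_rate(rows):
    """Fraction of answers the LLM judge marked as passing."""
    return sum(r["judge_pass"] for r in rows) / len(rows)

def avg_token_reduction(rows, baseline="llm_only"):
    """Mean token usage of non-baseline pipelines vs. the baseline.

    A value of 0.25 means the other pipelines used 25% fewer tokens
    than the LLM-only baseline on average.
    """
    base = [r["tokens"] for r in rows if r["pipeline"] == baseline]
    rest = [r["tokens"] for r in rows if r["pipeline"] != baseline]
    return 1.0 - (sum(rest) / len(rest)) / (sum(base) / len(base))
```

With these sample records, `judge_pass_rate` returns 2/3 and `avg_token_reduction` returns roughly 0.417, i.e. the RAG pipelines used about 42% fewer tokens than LLM-only.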