Judging Criteria
What judges look for
How entries are evaluated, what judges expect to see, and how ties are handled. Read this carefully before you submit.

Phases & What Judges Review
Approach (Online)
Judges review your 3-5 page PDF plus artifacts.
- Worked examples (≥3) with explicit reasoning paths (which classes/relations/filters were used for each track you enter).
- Prompts / DSL / pseudocode / rules — enough to reproduce the reasoning.
- Optional: ≤3-minute explainer video; repo link not required in Phase 1.
- You may enter one Approach per track (you can enter 1, 2, or all 3 tracks; each is judged independently).
Outcome: Top 5 finalists per track (15 total). See Timeline for dates.
Build (Finalists)
Judges review the running system and evidence:
- Repo (private) shared with organizers/judges or a secure private bundle link (ZIP) — must include a clear LICENSE, README, and setup steps.
- Runnable demo/notebook and evaluation results on the provided datasets.
- Video (5-7 min) + one-pager (problem → method → results).
- Any additional items specified in the finalists' brief.
Outcome: Grand-Prize winner per track.
Score Bands (applied to each criterion)
1. Insufficient
Off-scope, largely speculative, not grounded in the ontology or rules.
2. Limited
Major gaps in reasoning/grounding/safety; unclear feasibility.
3. Adequate
Reasonable idea with partial grounding; some ambiguities or missing details.
4. Good
Minor gaps only; solid approach with clear, useful examples.
5. Excellent
Thorough, correct, and well-justified; anticipates edge cases; strong grounded evidence.

How Judges Interpret Each Criterion - Phase 1
1. Ontology Reasoning Rigor (30%)
- Correct use of classes/relations/constraints; explicit, readable reasoning paths.
- Examples show which nodes/edges support answers/impacts; IDs are listed when counting or aggregating.
- Handles ambiguity, temporal filters, aggregations, and joins without ad-hoc hacks; never invents schema.
- Generalizes beyond specific instance IDs found in seed data.
2. AI/LLM Technique & Safety (25%)
- Sensible use of LLMs (prompting, tools/RAG, constrained decoding, guard rails).
- Hallucination controls; privacy & data-handling explained (where inference runs; what leaves the machine).
- Transparent disclosure of model names/versions and any providers/costs/limits.
3. Feasibility & Build Plan (25%)
- Clear architecture, milestones, risks/mitigations, and evaluation plan.
- Realistic to implement within the finalist window; minimal dependencies on unavailable infra/data.
4. Originality (10%)
- Novel insight, compact IR/DSL, elegant mapping/propagation/confidence math; useful but not gratuitous novelty.
5. Clarity (10%)
- Writing, diagrams, and structure make it easy to reproduce your reasoning.
Weighted Scoring Formula
For each submission, judges assign 1-5 on each criterion.
Final = 0.30·Rigor + 0.25·AI_Safety + 0.25·Feasibility + 0.10·Originality + 0.10·Clarity (scaled to 100).
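The weighting is easy to sanity-check in code. The sketch below is illustrative only: it assumes "scaled to 100" means dividing the weighted sum by the maximum score of 5, and the function and variable names are not part of any official tooling.

```python
# Illustrative only: reproduces the Phase 1 weighting under the assumption
# that "scaled to 100" means (weighted sum / 5) * 100.
WEIGHTS = {
    "rigor": 0.30,        # Ontology Reasoning Rigor
    "ai_safety": 0.25,    # AI/LLM Technique & Safety
    "feasibility": 0.25,  # Feasibility & Build Plan
    "originality": 0.10,  # Originality
    "clarity": 0.10,      # Clarity
}

def phase1_score(scores: dict[str, int]) -> float:
    """Weighted Phase 1 score scaled to 100 (each criterion scored 1-5)."""
    weighted = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)
    return round(weighted / 5 * 100, 1)

# Example: scores of 4, 4, 3, 5, 4 give a weighted sum of 3.85, i.e. 77.0 / 100.
print(phase1_score({"rigor": 4, "ai_safety": 4, "feasibility": 3,
                    "originality": 5, "clarity": 4}))
```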
What Judges Emphasize - Phase 2 (Finalists)
Phase-2 judging is holistic using the same 1-5 score bands:
1. Functionality
- End-to-end system runs as described; demo/notebook is reproducible.
2. Rigor
- Correctness of reasoning, well-designed evaluation, sensible trade-offs.
3. Clarity
- Clean repo, README, video, and one-pager; results are easy to verify.
Discipline checks (track-specific):
1. T1/T2 Evidence discipline
- Return IDs of nodes/edges used in counts/paths (e.g., eventIds, shipmentIds, routeIds) plus filters applied.
2. T3 Patch discipline
- Emit a valid, machine-readable patch JSON with anchors, placement, datatypes/cardinalities, rationale, confidence, and rollback notes (an illustrative sketch follows).
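To make both checks concrete, here is a hedged sketch of what an evidence payload (T1/T2) and a patch JSON (T3) could look like. All field names and values are illustrative assumptions, not a prescribed schema; follow whatever format your track brief specifies.

```python
import json

# Hypothetical T1/T2 evidence payload: IDs of the nodes/edges behind a count,
# plus the filters that were applied. Field names are illustrative.
evidence = {
    "answer": 7,
    "eventIds": ["EVT-0012", "EVT-0031", "EVT-0047"],
    "shipmentIds": ["SHP-104", "SHP-118"],
    "filters": {"region": "EMEA", "window": "2025-09-01/2025-09-30"},
}

# Hypothetical T3 patch JSON: anchors, placement, datatypes/cardinality,
# rationale, confidence, and rollback notes, mirroring the check above.
patch = {
    "op": "add_property",
    "anchor": "Shipment",                       # existing class the change attaches to
    "placement": {"parent": "Shipment", "kind": "extension"},
    "property": "customsClearanceDate",
    "datatype": "xsd:date",
    "cardinality": "0..1",
    "rationale": "Integration feed exposes a clearance date not modeled yet.",
    "confidence": 0.8,
    "rollback": "Remove property; no existing instances depend on it.",
}

print(json.dumps({"evidence": evidence, "patch": patch}, indent=2))
```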

Process, Fairness & Conflicts
- Panel composition: Researchers/practitioners in AI/ML, knowledge graphs, and reasoning.
- Reviews per entry: Each Approach is scored by ≥2 judges; large score variance may trigger a third review.
- Conflicts of interest: Mandatory recusal for employer/advisor/advisee, recent co-author (≤24 months), or close personal relationships.
- Normalization: We may normalize scores across judges to reduce leniency/strictness bias.
- Clarifications: Judges may request minor clarifications (no new material).
- Decisions: All judging decisions are final; brief feedback may be shared when feasible.
Minimum standards & disqualifiers (judging context)
Submissions may be removed from consideration if they:
1. Do not use the provided ontology in a meaningful way (e.g., purely procedural/SQL without schema grounding).
2. Omit required artifacts (e.g., no Approach PDF).
3. Include sensitive data or unauthorized third-party content/code.
4. Plagiarize or falsify results/evidence.
5. Violate the Code of Conduct or applicable law.
6. Breach the site/Discord platform terms.
What Great Submissions Look Like (Per Track)
Track 1: NL → Ontology Querying
- A crisp IR/plan or DSL that shows filters, temporal windows, joins, and handling of synonyms/aliases (see the sketch after this list).
- Answers return concise evidence (IDs of nodes/edges) with a brief explanation / reasoning path.
- Ambiguity handling: either ask for clarification or return top candidates with confidence.
- Guard-rails: never invent classes/relations; cumulative intent respected across turns.
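As one illustration, a compact query IR could look like the sketch below. The classes, relations, and field names are invented for demonstration; they are not the provided ontology or a required format.

```python
# Purely illustrative IR for an NL question such as
# "How many shipments were delayed in EMEA last quarter?"
# Class/relation names and fields are assumptions, not the provided ontology.
query_ir = {
    "target": {"class": "Shipment", "aggregate": "count"},
    "filters": [
        {"relation": "hasStatus", "op": "=", "value": "Delayed"},
        {"relation": "destinationRegion", "op": "=", "value": "EMEA"},
    ],
    "temporal": {"property": "scheduledDate", "window": "2025-07-01/2025-09-30"},
    "joins": [{"via": "assignedToRoute", "class": "Route"}],
    "aliases": {"late": "Delayed"},     # synonym handling
    "confidence": 0.85,                 # ask for clarification below a threshold
    "evidence": "return shipmentIds + routeIds used in the count",
}
```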
Track 2: Causal Reasoning (Deterministic ± Probabilistic)
- Explicit propagation rules: directionality, cut-offs, and time windows are clear and reproducible (a minimal sketch follows this list).
- A readable impact tree with the 'why' at each hop; optional confidence/probability math is transparent.
- One reverse-diagnosis example that's principled (not ad-hoc), showing plausible root causes and justification.
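A minimal propagation sketch, assuming a hand-written causal graph, a hop cut-off, and multiplicative per-hop confidence; none of this is prescribed by the rules, and the graph content is invented for demonstration.

```python
# Illustrative propagation: directional edges, a hop cut-off, and a simple
# multiplicative confidence per hop. The causal graph below is invented.
CAUSES = {  # cause -> [(effect, edge_confidence)]
    "PortStrike": [("RouteDelay", 0.9)],
    "RouteDelay": [("ShipmentDelay", 0.8), ("CostIncrease", 0.5)],
    "ShipmentDelay": [("CustomerEscalation", 0.6)],
}

def propagate(root: str, max_hops: int = 3, cutoff: float = 0.3):
    """Yield (effect, hops, confidence, path) for impacts above the cutoff."""
    frontier = [(root, 0, 1.0, [root])]
    while frontier:
        node, hops, conf, path = frontier.pop()
        for effect, edge_conf in CAUSES.get(node, []):
            new_conf = conf * edge_conf
            if hops + 1 <= max_hops and new_conf >= cutoff:
                yield effect, hops + 1, round(new_conf, 2), path + [effect]
                frontier.append((effect, hops + 1, new_conf, path + [effect]))

for impact in propagate("PortStrike"):
    print(impact)  # readable impact tree: the path gives the 'why' at each hop
```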
Track 3: Ontology Evolution from Integration Data
- Clean field-to-ontology mapping (classes/properties/relations); an illustrative mapping follows this list.
- Gap detection with a machine-usable patch JSON: anchors, placement (parent/extension), datatypes, cardinality, relation anchors, rationale + confidence, and rollback/versioning notes.
- Avoids duplicates/synonyms; proposes merges where appropriate and shows quick verification checks.
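For example, a field-to-ontology mapping can be captured as simply as the sketch below. The source fields and ontology terms are invented for illustration; gaps would then feed a patch JSON like the one sketched under Discipline checks above.

```python
# Illustrative field-to-ontology mapping for an incoming integration feed.
# Source fields and ontology terms are assumptions for demonstration only.
field_mapping = [
    {"source": "ship_id",      "maps_to": "Shipment.identifier", "status": "exact"},
    {"source": "carrier_name", "maps_to": "Carrier.name",        "status": "synonym",
     "note": "merge with the existing property rather than adding a duplicate"},
    {"source": "customs_date", "maps_to": None,                  "status": "gap",
     "proposal": "see the patch JSON sketch under Discipline checks"},
]
```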
What Judges Will Not Reward
- Hard-coded IDs, one-off SQL per example, or brittle rules that don't generalize.
- Opaque scores or confidence without explanation.
- Diagrams/architectures without executable logic or reproducible reasoning steps.
- Ignoring privacy/LLM disclosure requirements (models, where inference runs, data handling).
- Over-claiming results without evidence, ablations, or evaluation on the provided datasets.
- Methods that invent ontology classes/relations or fail to ground in the provided schema.

Judging Timeline (Summary)
1. Approach judging: October 22-24, 2025
2. Finalists announced: October 25, 2025 (online webinar + site/Discord)
3. Build Judging (Finalists): November 25 - December 1, 2025
4. Winner announced: December 2, 2025
Any change will be posted on the Timeline page and in Discord #announcements.

Example Judging Sheet (Reference)
Team / Title:
Track: [T1 / T2 / T3]
Reviewer: [ID]
Phase 1 (Approach)
Ontology Reasoning Rigor (1-5):
AI/LLM Technique & Safety (1-5):
Feasibility & Build Plan (1-5):
Originality (1-5):
Clarity (1-5):
Comments (strengths, risks, questions):
Weighted score auto-calc:
0.30*Rigor + 0.25*AI_Safety + 0.25*Feasibility + 0.10*Originality + 0.10*Clarity
Phase 2 - Build (Finalists)
Functionality & Rigor (1-5): runnable demo/notebook, eval results
Repository Quality (1-5): LICENSE, README, setup steps
Video (5-7 min) + One-pager (1-5): clarity of method/results
Final Notes: any items specified in the finalist brief
Outcome: aggregate score + panel deliberation ⇒ Grand Prize per track
Quick Facts
Team size: 1-5 members (solo allowed)
Cost: Free
Prizes: $45K in total + job opportunity
Finalists: Top 5 per track announced October 25, 2025
Build window (finalists): October 27 - November 24, 2025, 23:59
Winners announced: December 2, 2025