Judging Criteria
What judges look for
How entries are evaluated, what judges expect to see, and how ties are handled. Read this carefully before you submit.

Phases & What Judges Review
Approach (Online)
Judges review your 3-5 page PDF plus artifacts.
- Worked examples (≥3) with explicit reasoning paths (which classes/relations/filters were used for each track you enter).
- Prompts / DSL / pseudocode / rules — enough to reproduce the reasoning.
- Optional: ≤3-minute explainer video; repo link not required in Phase 1.
- You may enter one Approach per track (you can enter 1, 2, or all 3 tracks; each is judged independently).
Outcome: Top 5 finalists per track (15 total). See Timeline for dates.
Build (Finalists)
Judges review the running system and evidence:
- Repo (private) shared with organizers/judges or a secure private bundle link (ZIP) — must include a clear LICENSE, README, and setup steps.
- Runnable demo/notebook and evaluation results on the provided datasets.
- Video (5-7 min) + one-pager (problem → method → results).
- Any additional items specified in the finalists' brief.
Outcome: Grand-Prize winner per track.
Score Bands (applied to each criterion)
1. Insufficient
Off-scope, largely speculative, not grounded in the ontology or rules.
2. Limited
Major gaps in reasoning/grounding/safety; unclear feasibility.
3. Adequate
Reasonable idea with partial grounding; some ambiguities or missing details.
4. Good
Minor gaps only; solid approach with clear, useful examples.
5. Excellent
Thorough, correct, and well-justified; anticipates edge cases; strong grounded evidence.

How Judges Interpret Each Criterion - Phase 1
1. Ontology Reasoning Rigor (30%)
- Correct use of classes/relations/constraints; explicit, readable reasoning paths.
- Examples show which nodes/edges support answers/impacts; IDs are listed when counting or aggregating.
- Handles ambiguity, temporal filters, aggregations, and joins without ad-hoc hacks; never invents schema.
- Generalizes beyond specific instance IDs found in seed data.
2. AI/LLM Technique & Safety (25%)
- Sensible use of LLMs (prompting, tools/RAG, constrained decoding, guard rails).
- Hallucination controls; privacy & data-handling explained (where inference runs; what leaves the machine).
- Transparent disclosure of model names/versions and any providers/costs/limits.
3. Feasibility & Build Plan (25%)
- Clear architecture, milestones, risks/mitigations, and evaluation plan.
- Realistic to implement within the finalist window; minimal dependencies on unavailable infra/data.
4. Originality (10%)
- Novel insight, compact IR/DSL, elegant mapping/propagation/confidence math; useful but not gratuitous novelty.
5. Clarity (10%)
- Writing, diagrams, and structure make it easy to reproduce your reasoning.
Weighted Scoring Formula
For each submission, judges assign 1-5 on each criterion.
Final = 0.30·Rigor + 0.25·AI_Safety + 0.25·Feasibility + 0.10·Originality + 0.10·Clarity (scaled to 100).
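The weighting is easy to sanity-check in code. The sketch below is illustrative only: it assumes "scaled to 100" means dividing the weighted sum by the maximum score of 5, and the function and variable names are not part of any official tooling.

```python
# Illustrative only: reproduces the Phase 1 weighting under the assumption
# that "scaled to 100" means (weighted sum / 5) * 100.
WEIGHTS = {
    "rigor": 0.30,        # Ontology Reasoning Rigor
    "ai_safety": 0.25,    # AI/LLM Technique & Safety
    "feasibility": 0.25,  # Feasibility & Build Plan
    "originality": 0.10,  # Originality
    "clarity": 0.10,      # Clarity
}

def phase1_score(scores: dict[str, int]) -> float:
    """Weighted Phase 1 score scaled to 100 (each criterion scored 1-5)."""
    weighted = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)
    return round(weighted / 5 * 100, 1)

# Example: scores of 4, 4, 3, 5, 4 give a weighted sum of 3.85, i.e. 77.0 / 100.
print(phase1_score({"rigor": 4, "ai_safety": 4, "feasibility": 3,
                    "originality": 5, "clarity": 4}))
```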
What Judges Emphasize - Phase 2 (Finalists)
Phase-2 judging is holistic using the same 1-5 score bands:
1. Functionality
- End-to-end system runs as described; demo/notebook is reproducible.
2. Rigor
- Correctness of reasoning, well-designed evaluation, sensible trade-offs.
3. Clarity
- Clean repo, README, video, and one-pager; results are easy to verify.
Discipline checks (track-specific):
1. T1/T2 Evidence discipline
- Return IDs of nodes/edges used in counts/paths (e.g., eventIds, shipmentIds, routeIds) plus filters applied.
2. T3 Patch discipline
- Emit a valid, machine-readable patch JSON with anchors, placement, datatypes/cardinalities, rationale, confidence, and rollback notes (an illustrative sketch follows).
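To make both checks concrete, here is a hedged sketch of what an evidence payload (T1/T2) and a patch JSON (T3) could look like. All field names and values are illustrative assumptions, not a prescribed schema; follow whatever format your track brief specifies.

```python
import json

# Hypothetical T1/T2 evidence payload: IDs of the nodes/edges behind a count,
# plus the filters that were applied. Field names are illustrative.
evidence = {
    "answer": 7,
    "eventIds": ["EVT-0012", "EVT-0031", "EVT-0047"],
    "shipmentIds": ["SHP-104", "SHP-118"],
    "filters": {"region": "EMEA", "window": "2025-09-01/2025-09-30"},
}

# Hypothetical T3 patch JSON: anchors, placement, datatypes/cardinality,
# rationale, confidence, and rollback notes, mirroring the check above.
patch = {
    "op": "add_property",
    "anchor": "Shipment",                       # existing class the change attaches to
    "placement": {"parent": "Shipment", "kind": "extension"},
    "property": "customsClearanceDate",
    "datatype": "xsd:date",
    "cardinality": "0..1",
    "rationale": "Integration feed exposes a clearance date not modeled yet.",
    "confidence": 0.8,
    "rollback": "Remove property; no existing instances depend on it.",
}

print(json.dumps({"evidence": evidence, "patch": patch}, indent=2))
```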

Process, Fairness & Conflicts
- Panel composition: Researchers/practitioners in AI/ML, knowledge graphs, and reasoning.
- Reviews per entry: Each Approach is scored by ≥2 judges; large score variance may trigger a third review.
- Conflicts of interest: Mandatory recusal for employer/advisor/advisee, recent co-author (≤24 months), or close personal relationships.
- Normalization: We may normalize scores across judges to reduce leniency/strictness bias.
- Clarifications: Judges may request minor clarifications (no new material).
- Decisions: All judging decisions are final; brief feedback may be shared when feasible.
Minimum standards & disqualifiers (judging context)
Submissions may be removed from consideration if they:
1. Do not use the provided ontology in a meaningful way (e.g., purely procedural/SQL without schema grounding).
2. Omit required artifacts (e.g., no Approach PDF).
3. Include sensitive data or unauthorized third-party content/code.
4. Plagiarize or falsify results/evidence.
5. Violate the Code of Conduct or applicable law.
6. Breach the site/Discord platform terms.
What Great Submissions Look Like (Per Track)
Track 1: NL → Ontology Querying
- A crisp IR/plan or DSL that shows filters, temporal windows, joins, and handling of synonyms/aliases (see the sketch after this list).
- Answers return concise evidence (IDs of nodes/edges) with a brief explanation / reasoning path.
- Ambiguity handling: either ask for clarification or return top candidates with confidence.
- Guard-rails: never invent classes/relations; cumulative intent respected across turns.
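As one illustration, a compact query IR could look like the sketch below. The classes, relations, and field names are invented for demonstration; they are not the provided ontology or a required format.

```python
# Purely illustrative IR for an NL question such as
# "How many shipments were delayed in EMEA last quarter?"
# Class/relation names and fields are assumptions, not the provided ontology.
query_ir = {
    "target": {"class": "Shipment", "aggregate": "count"},
    "filters": [
        {"relation": "hasStatus", "op": "=", "value": "Delayed"},
        {"relation": "destinationRegion", "op": "=", "value": "EMEA"},
    ],
    "temporal": {"property": "scheduledDate", "window": "2025-07-01/2025-09-30"},
    "joins": [{"via": "assignedToRoute", "class": "Route"}],
    "aliases": {"late": "Delayed"},     # synonym handling
    "confidence": 0.85,                 # ask for clarification below a threshold
    "evidence": "return shipmentIds + routeIds used in the count",
}
```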
Track 2: Causal Reasoning (Deterministic ± Probabilistic)
- Explicit propagation rules: directionality, cut-offs, and time windows are clear and reproducible (a minimal sketch follows this list).
- A readable impact tree with the 'why' at each hop; optional confidence/probability math is transparent.
- One reverse-diagnosis example that's principled (not ad-hoc), showing plausible root causes and justification.
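A minimal propagation sketch, assuming a hand-written causal graph, a hop cut-off, and multiplicative per-hop confidence; none of this is prescribed by the rules, and the graph content is invented for demonstration.

```python
# Illustrative propagation: directional edges, a hop cut-off, and a simple
# multiplicative confidence per hop. The causal graph below is invented.
CAUSES = {  # cause -> [(effect, edge_confidence)]
    "PortStrike": [("RouteDelay", 0.9)],
    "RouteDelay": [("ShipmentDelay", 0.8), ("CostIncrease", 0.5)],
    "ShipmentDelay": [("CustomerEscalation", 0.6)],
}

def propagate(root: str, max_hops: int = 3, cutoff: float = 0.3):
    """Yield (effect, hops, confidence, path) for impacts above the cutoff."""
    frontier = [(root, 0, 1.0, [root])]
    while frontier:
        node, hops, conf, path = frontier.pop()
        for effect, edge_conf in CAUSES.get(node, []):
            new_conf = conf * edge_conf
            if hops + 1 <= max_hops and new_conf >= cutoff:
                yield effect, hops + 1, round(new_conf, 2), path + [effect]
                frontier.append((effect, hops + 1, new_conf, path + [effect]))

for impact in propagate("PortStrike"):
    print(impact)  # readable impact tree: the path gives the 'why' at each hop
```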
Track 3: Ontology Evolution from Integration Data
- Clean field-to-ontology mapping (classes/properties/relations); an illustrative mapping follows this list.
- Gap detection with a machine-usable patch JSON: anchors, placement (parent/extension), datatypes, cardinality, relation anchors, rationale + confidence, and rollback/versioning notes.
- Avoids duplicates/synonyms; proposes merges where appropriate and shows quick verification checks.
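For example, a field-to-ontology mapping can be captured as simply as the sketch below. The source fields and ontology terms are invented for illustration; gaps would then feed a patch JSON like the one sketched under Discipline checks above.

```python
# Illustrative field-to-ontology mapping for an incoming integration feed.
# Source fields and ontology terms are assumptions for demonstration only.
field_mapping = [
    {"source": "ship_id",      "maps_to": "Shipment.identifier", "status": "exact"},
    {"source": "carrier_name", "maps_to": "Carrier.name",        "status": "synonym",
     "note": "merge with the existing property rather than adding a duplicate"},
    {"source": "customs_date", "maps_to": None,                  "status": "gap",
     "proposal": "see the patch JSON sketch under Discipline checks"},
]
```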
What Judges Will Not Reward
- Hard-coded IDs, one-off SQL per example, or brittle rules that don't generalize.
- Opaque scores or confidence without explanation.
- Diagrams/architectures without executable logic or reproducible reasoning steps.
- Ignoring privacy/LLM disclosure requirements (models, where inference runs, data handling).
- Over-claiming results without evidence, ablations, or evaluation on the provided datasets.
- Methods that invent ontology classes/relations or fail to ground in the provided schema.

Judging Timeline (Summary)
1. Approach judging: October 22-24, 2025
2. Finalists announced: October 25, 2025 (online webinar + site/Discord)
3. Build Judging (Finalists): November 25 - December 1, 2025
4. Winner announced: December 2, 2025
Any change will be posted on the Timeline page and in Discord #announcements.

Example Judging Sheet (Reference)
Team / Title:
Track: [T1 / T2 / T3]
Reviewer: [ID]
Phase 1 (Approach)
Ontology Reasoning Rigor (1-5):
AI/LLM Technique & Safety (1-5):
Feasibility & Build Plan (1-5):
Originality (1-5):
Clarity (1-5):
Comments (strengths, risks, questions):
Weighted score auto-calc:
0.30*Rigor + 0.25*AI_Safety + 0.25*Feasibility + 0.10*Originality + 0.10*Clarity
Phase 2 - Build (Finalists)
Functionality & Rigor (1-5): runnable demo/notebook, eval results
Repository Quality (1-5): LICENSE, README, setup steps
Video (5-7 min) + One-pager (1-5): clarity of method/results
Final Notes: any items specified in the finalist brief
Outcome: aggregate score + panel deliberation ⇒ Grand Prize per track
Quick Facts
Team size: 1-5 members (solo allowed)
Cost: Free
Prizes: $45K in total + job opportunity
Finalists: Top 5 per track announced October 25, 2025
Build window (finalists): October 27 - November 24, 2025, 23:59
Winners announced: December 2, 2025