Judging Criteria

What judges look for


How entries are evaluated, what judges expect to see, and how ties are handled. Read this carefully before you submit.

Phases & What Judges Review

Phase 1

Approach (Online)

Judges review your 3-5 page PDF plus artifacts.

  • Worked examples (≥3) with explicit reasoning paths (which classes/relations/filters were used for each track you enter).
  • Prompts / DSL / pseudocode / rules — enough to reproduce the reasoning.
  • Optional: ≤3-minute explainer video; repo link not required in Phase 1.
  • You may enter one Approach per track (you can enter 1, 2, or all 3 tracks; each is judged independently).

Outcome: Top 5 finalists per track (15 total). See Timeline for dates.

Phase 2

Build (Finalists)

Judges review the running system and evidence:

  • Repo (private) shared with organizers/judges or a secure private bundle link (ZIP) — must include a clear LICENSE, README, and setup steps.
  • Runnable demo/notebook and evaluation results on the provided datasets.
  • Video (5-7 min) + one-pager (problem → method → results).
  • Any additional items specified in the finalists' brief.

Outcome: Grand-Prize winner per track.

Score Bands

Applied to Each Criterion

  • 1. Insufficient: Off-scope, largely speculative, not grounded in the ontology or rules.
  • 2. Limited: Major gaps in reasoning/grounding/safety; unclear feasibility.
  • 3. Adequate: Reasonable idea with partial grounding; some ambiguities or missing details.
  • 4. Good: Minor gaps only; solid approach with clear, useful examples.
  • 5. Excellent: Thorough, correct, and well-justified; anticipates edge cases; strong grounded evidence.

How Judges Interpret Each Criterion

Phase 1 (Approach)

1. Ontology Reasoning Rigor (30%)

  • Correct use of classes/relations/constraints; explicit, readable reasoning paths.
  • Examples show which nodes/edges support answers/impacts; IDs are listed when counting or aggregating.
  • Handles ambiguity, temporal filters, aggregations, and joins without ad-hoc hacks; never invents schema.
  • Generalizes beyond specific instance IDs found in seed data.

2. AI/LLM Technique & Safety (25%)

  • Sensible use of LLMs (prompting, tools/RAG, constrained decoding, guardrails).
  • Hallucination controls; privacy & data-handling explained (where inference runs; what leaves the machine).
  • Transparent disclosure of model names/versions and any providers/costs/limits.

3. Feasibility & Build Plan (25%)

  • Clear architecture, milestones, risks/mitigations, and evaluation plan.
  • Realistic to implement within the finalist window; minimal dependencies on unavailable infra/data.

4. Originality (10%)

  • Novel insight, compact IR/DSL, elegant mapping/propagation/confidence math; useful but not gratuitous novelty.

5. Clarity (10%)

  • Writing, diagrams, and structure make it easy to reproduce your reasoning.

Weighted Scoring Formula

For each submission, judges assign 1-5 on each criterion.
Final = 0.30·Rigor + 0.25·AI_Safety + 0.25·Feasibility + 0.10·Originality + 0.10·Clarity (scaled to 100).
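
For illustration, the arithmetic can be sketched in a few lines of Python (the criterion keys are ours, and the ×20 scaling is the natural reading of "scaled to 100", since a 1-5 weighted sum tops out at 5):

```python
# Phase-1 weighted score, scaled to 100. Weights are from the formula above;
# the x20 scaling assumes a 1-5 weighted sum mapped onto a 100-point scale.
WEIGHTS = {
    "rigor": 0.30,
    "ai_safety": 0.25,
    "feasibility": 0.25,
    "originality": 0.10,
    "clarity": 0.10,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine 1-5 criterion scores into a final score out of 100."""
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return round(weighted * 20, 2)

# Example: Rigor 4, AI/LLM Safety 5, Feasibility 3, Originality 4, Clarity 5
print(weighted_score({
    "rigor": 4, "ai_safety": 5, "feasibility": 3,
    "originality": 4, "clarity": 5,
}))  # 0.30*4 + 0.25*5 + 0.25*3 + 0.10*4 + 0.10*5 = 4.10 -> 82.0
```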

Phase 2 (Finalists): What Judges Emphasize

Phase-2 judging is holistic, using the same 1-5 score bands:

1. Functionality

  • End-to-end system runs as described; demo/notebook is reproducible.

2. Rigor

  • Correctness of reasoning, well-designed evaluation, sensible trade-offs.

3. Clarity

  • Clean repo, README, video, and one-pager; results are easy to verify.

Discipline checks (track-specific):

1. T1/T2 Evidence discipline

  • Return IDs of nodes/edges used in counts/paths (e.g., eventIds, shipmentIds, routeIds) plus the filters applied (see the sketch after this list).

2. T3 Patch discipline

  • Emit a valid, machine-readable patch JSON with anchors, placement, datatypes/cardinalities, rationale, confidence, and rollback notes (a fuller example appears under Track 3 below).
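
For example, an evidence payload for a T1/T2 count query could look like the sketch below; every field name and ID is illustrative, not a required schema:

```python
import json

# Illustrative evidence payload: the IDs behind a count, plus the filters
# that produced them. The rules require the IDs and filters, not this shape.
evidence = {
    "answer": 3,
    "eventIds": ["EV-102", "EV-117", "EV-240"],
    "shipmentIds": ["SH-88"],
    "filters": {
        "eventType": "DelayEvent",
        "timeWindow": {"from": "2025-01-01", "to": "2025-03-31"},
        "routeId": "RT-12",
    },
    "reasoningPath": "Shipment -> assignedRoute -> hasEvent -> DelayEvent",
}
print(json.dumps(evidence, indent=2))
```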

Process, Fairness & Conflicts

  • Panel composition: Researchers/practitioners in AI/ML, knowledge graphs, and reasoning.
  • Reviews per entry: Each Approach is scored by ≥2 judges; large score variance may trigger a third review.
  • Conflicts of interest: Mandatory recusal for employer/advisor/advisee, recent co-author (≤24 months), or close personal relationships.
  • Normalization: We may normalize scores across judges to reduce leniency/strictness bias (one possible method is sketched after this list).
  • Clarifications: Judges may request minor clarifications (no new material).
  • Decisions: All judging decisions are final; brief feedback may be shared when feasible.
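
The normalization method is not fixed; purely as an illustration, per-judge z-scoring is one common option:

```python
from statistics import mean, stdev

def z_normalize(judge_scores: list[float]) -> list[float]:
    """Center one judge's scores and rescale by their spread, so a
    uniformly harsh or lenient judge does not skew rankings.
    Illustrative only: the organizers do not commit to this method."""
    mu = mean(judge_scores)
    sigma = stdev(judge_scores) if len(judge_scores) > 1 else 1.0
    if sigma == 0:
        return [0.0 for _ in judge_scores]
    return [(s - mu) / sigma for s in judge_scores]

# A harsh judge (2s and 3s) and a lenient judge (4s and 5s) who rank
# entries the same way produce comparable normalized scores:
print(z_normalize([2, 3, 2, 4]))
print(z_normalize([4, 5, 4, 5]))
```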

Minimum standards & disqualifiers (judging context)

Submissions may be removed from consideration if they:

  • 1. Do not use the provided ontology in a meaningful way (e.g., purely procedural/SQL without schema grounding).
  • 2. Omit required artifacts (e.g., no Approach PDF).
  • 3. Include sensitive data or unauthorized third-party content/code.
  • 4. Plagiarize or falsify results/evidence.
  • 5. Violate the Code of Conduct or applicable law.
  • 6. Breach the site/Discord platform terms.

What Great Submissions Look Like (Per Track)

Track 1

NL → Ontology Querying

  • A crisp IR/plan or DSL that shows filters, temporal windows, joins, and handling of synonyms/aliases (see the sketch after this list).

  • Answers return concise evidence (IDs of nodes/edges) with a brief explanation / reasoning path.

  • Ambiguity handling: either ask for clarification or return top candidates with confidence.

  • Guardrails: never invent classes/relations; cumulative intent respected across turns.
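
As a hedged illustration, an IR for a query like "How many shipments on route RT-12 had non-weather delays last quarter?" might look like this; all class, relation, and field names are invented for the example, and a real plan must use only what the provided ontology defines:

```python
# Hypothetical intermediate representation (IR) for one NL query.
ir = {
    "target": "count",
    "class": "Shipment",
    "joins": [{"relation": "hasEvent", "class": "DelayEvent"}],
    "filters": [
        {"path": "assignedRoute.id", "op": "=", "value": "RT-12"},
        {"path": "DelayEvent.cause", "op": "!=", "value": "Weather"},
    ],
    "temporalWindow": {"from": "2025-07-01", "to": "2025-09-30"},
    "aliases": {"delayed": "DelayEvent", "route 12": "RT-12"},
}
```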

Track 2

Causal Reasoning (Deterministic ± Probabilistic)

  • Explicit propagation rules: directionality, cut-offs, and time windows are clear and reproducible.

  • A readable impact tree with the 'why' at each hop; optional confidence/probability math is transparent (see the sketch after this list).

  • One reverse-diagnosis example that's principled (not ad-hoc), showing plausible root causes and justification.
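
One way to make such rules explicit is sketched below: confidences multiply along directed causal edges, and propagation stops below a cut-off. The graph, weights, and cut-off are invented for the example:

```python
# Illustrative forward impact propagation over a small acyclic causal graph.
# Edges are (cause, effect, edge_confidence); all values are invented.
EDGES = [
    ("PortStrike", "ShipmentDelay", 0.9),
    ("ShipmentDelay", "StockOut", 0.6),
    ("StockOut", "LostSales", 0.7),
]
CUTOFF = 0.3  # stop propagating below this confidence

def propagate(root: str, conf: float = 1.0, depth: int = 0) -> None:
    """Print an impact tree with the 'why' (edge confidence) at each hop."""
    for cause, effect, edge_conf in EDGES:
        if cause == root:
            new_conf = conf * edge_conf  # confidences multiply along a path
            if new_conf >= CUTOFF:
                print("  " * depth + f"-> {effect} (confidence {new_conf:.2f})")
                propagate(effect, new_conf, depth + 1)

propagate("PortStrike")
# -> ShipmentDelay (confidence 0.90)
#   -> StockOut (confidence 0.54)
#     -> LostSales (confidence 0.38)
```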

Track 3

Ontology Evolution from Integration Data

  • Clean field-to-ontology mapping (classes/properties/relations).

  • Gap detection with a machine-usable patch JSON: anchors, placement (parent/extension), datatypes, cardinality, relation anchors, rationale + confidence, and rollback/versioning notes (a sketch follows this list).

  • Avoids duplicates/synonyms; proposes merges where appropriate and shows quick verification checks.
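
A sketch of one possible patch shape follows; every field name is illustrative, and your schema may differ so long as it is machine-readable and carries the items listed above:

```python
import json

# Illustrative ontology patch. All names and fields are hypothetical;
# a real patch must anchor to classes/properties in the provided ontology.
patch = {
    "op": "addProperty",
    "anchor": "Shipment",       # existing class the change attaches to
    "placement": "extension",   # extend the anchor rather than add a parent
    "property": "customsClearedAt",
    "datatype": "xsd:dateTime",
    "cardinality": "0..1",
    "rationale": "Integration feed exposes a customs timestamp "
                 "not representable in the current schema.",
    "confidence": 0.85,
    "rollback": "Remove the property; no instances migrated yet (v1.3 -> v1.2).",
}
print(json.dumps(patch, indent=2))
```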

What Judges Will Not Reward

  • Hard-coded IDs, one-off SQL per example, or brittle rules that don't generalize.
  • Opaque scores or confidence without explanation.
  • Diagrams/architectures without executable logic or reproducible reasoning steps.
  • Ignoring privacy/LLM disclosure requirements (models, where inference runs, data handling).
  • Over-claiming results without evidence, ablations, or evaluation on the provided datasets.
  • Methods that invent ontology classes/relations or fail to ground in the provided schema.

Judging Timeline (Summary)

  • 1. Approach judging: October 22-24, 2025
  • 2. Finalists announced: October 25, 2025 (online webinar + site/Discord)
  • 3. Build judging (finalists): November 25 - December 1, 2025
  • 4. Winner announced: December 2, 2025

Any change will be posted on the Timeline page and in Discord #announcements.

Example Judging Sheet (Reference)

Team / Title:
Track: [T1 / T2 / T3]
Reviewer: [ID]

Phase 1 (Approach)
Ontology Reasoning Rigor (1-5):
AI/LLM Technique & Safety (1-5):
Feasibility & Build Plan (1-5):
Originality (1-5):
Clarity (1-5):
Comments (strengths, risks, questions):

Weighted score auto-calc:
0.30*Rigor + 0.25*AI_Safety + 0.25*Feasibility + 0.10*Originality + 0.10*Clarity

Phase 2 — Build (Finalists)
Functionality & Rigor (1-5): runnable demo/notebook, eval results
Repository Quality (1-5): LICENSE, README, setup steps
Video (5-7 min) + One-pager (1-5): clarity of method/results
Final Notes: any items specified in the finalist brief

Outcome:
aggregate score + panel deliberation ⇒ Grand Prize per track

Quick Facts

Team size: 1-5 members (solo allowed)

Cost: Free

Prizes: $45K in total + job opportunity

Finalists: Top 5 per track, announced October 25, 2025

Build window (finalists): October 27 - November 24, 2025, 23:59

Winners announced: December 2, 2025