The Data · ZENOS Data

See it

Engine telemetry, alongside the frame.

Real gameplay with the captured signals playing in sync. Scrub to any frame and read the exact state, input, and buffers behind it.

Real capture session: switch render buffers on the video, scrub to any frame, and read the exact state behind it, all from the session's VTX file.

Ground truth

Not scraped. Not inferred. Captured.

Most game data is reconstructed from pixels. Ours is read directly from the running game, so every signal is exact, not estimated. The result is simulation-grade: exact enough to rebuild the moment, not just label it.

Tier 1 · Raw video

Pixels only

No state, no labels. What scraping gives you.

Tier 2 · Visual labelling

Inferred from pixels

Lossy and indirect. Estimated state, approximate labels.

Tier 3 · ZENOS ground truth

Frame-perfect game state

Direct from the running game. The (state, action, next-state) triplet, rights-cleared.

+ all Tier 1 & 2 data

[ Placeholder pull-quote from a frontier-model researcher on why ground-truth game data beats scraped video. ]

Frontier-model researcher, to be cleared ⚑

Real play

Human behaviour. Not agent rollouts.

The behaviour in this data is human: hesitation, intent, mistakes, recovery. Agent play has a ceiling, scripted or learned: a policy generating its own training data can only teach what it already knows. The distribution narrows, and the model learns the agent, not the world. We capture real play for coverage, to a brief, and QA it before delivery.

Human behaviour

Natural play across skill levels. The rare and the routine, not just the highlight reel.

Capture to a brief ⚑

Specify titles, scenarios, signals, and rates. We capture to order.

QA'd before delivery ⚑

Checked for sync, completeness, and label quality.

Synthetic capture doesn't escape licensing either: data generated from a game is still derived from that game's IP. The rights question follows the title, not the capture method. Ours arrives already answered. ⚑

What's captured

Every signal, frame-aligned.

All of it synchronised to the same clock as the rendered frame. None of it can be recovered exactly from video alone.

Video ⚑

HUD-less, up to 4K and 60 Hz. The rendered world, not screen capture.

Audio ⚑

Separate tracks: dialogue, environment, effects. Not a single mixed-down stream.

World state

Entities, transforms, velocity, physics, collisions, health, animations, events, objectives. Structured, not inferred.

Player input

Raw keyboard, mouse, controller. Exact sequences and timing. The action half of every learning pair.

Render outputs ⚑

Depth, surface normals, segmentation, motion vectors, UI / HUD masks.

Events + enrichment ⚑

Objectives, transitions, rewards, scene captions, named actions, semantic labels.

VISUAL · same frame, five ways: RGB | depth | normals | segmentation | UI mask · buffers you can't scrape

The render buffers come straight from the engine's pipeline, frame-aligned to the video and state, down to the G-buffer. The geometry behind the pixels, not estimated from them. ⚑
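As a rough illustration of what frame alignment makes possible, the sketch below pairs consecutive frames into (state, action, next-state) triplets. The `FrameRecord` and `Transition` types and their field names are assumptions made for this example, not the VTX schema.

```python
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass(frozen=True)
class FrameRecord:
    """One captured frame. Field names are illustrative, not the VTX schema."""
    frame: int              # frame index on the shared clock
    state: dict[str, Any]   # world state at this frame
    action: dict[str, Any]  # player input applied at this frame


@dataclass(frozen=True)
class Transition:
    state: dict[str, Any]
    action: dict[str, Any]
    next_state: dict[str, Any]


def transitions(records: list[FrameRecord]) -> Iterator[Transition]:
    """Yield (state, action, next-state) triplets from consecutive frames."""
    ordered = sorted(records, key=lambda r: r.frame)
    for cur, nxt in zip(ordered, ordered[1:]):
        if nxt.frame != cur.frame + 1:
            continue  # skip gaps rather than pair non-adjacent frames
        yield Transition(cur.state, cur.action, nxt.state)
```

Because every signal shares the frame clock, the pairing needs no interpolation or timestamp matching. A gap in the frame index is simply skipped.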

Normalised

One coordinate system. Every title.

Every title is normalised to a single coordinate system, scale, and convention. Engine quirks removed. Data from one game runs unchanged alongside another, so you can train and mix across the whole catalogue with zero per-title wrangling.

DIAGRAM · two titles (open-world + shooter) → one shared axis set
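A minimal sketch of what normalising two titles to one axis set could involve. The conventions shown (one title reporting centimetres with Z up, the other already in metres with Y up), the `TitleConvention` type, and the choice of shared axes are all assumptions for illustration. The actual shared convention is not specified here.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class TitleConvention:
    """How one title's engine reports positions (hypothetical fields)."""
    axes: tuple[Vec3, Vec3, Vec3]  # each row builds one shared axis from source axes
    metres_per_unit: float         # scale from engine units to metres


def to_shared(p: Vec3, conv: TitleConvention) -> Vec3:
    """Convert a source-space position into the shared coordinate system."""
    scaled = [c * conv.metres_per_unit for c in p]
    x, y, z = (sum(row[i] * scaled[i] for i in range(3)) for row in conv.axes)
    return (x, y, z)


# Assumed example: title A uses centimetres with Z up; shared space is Y up.
TITLE_A = TitleConvention(axes=((0, 1, 0), (0, 0, 1), (1, 0, 0)), metres_per_unit=0.01)
# Assumed example: title B already matches the shared convention.
TITLE_B = TitleConvention(axes=((1, 0, 0), (0, 1, 0), (0, 0, 1)), metres_per_unit=1.0)
```

With a per-title convention applied at capture time, downstream code only ever sees shared-space values, which is what lets data from different titles be mixed without per-title handling.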

Metadata ⚑

Queryable metadata.

Every clip carries structured metadata across roughly 20 categories, plus semantic labels spanning a large activity vocabulary. Filter and search across any combination. ⚑

environment · weather · time-of-day · lighting · vehicles · activities · interactions · camera · shot type · materials · audio · language

metadata record · clip 00428 ⚑

setting: urban

weather: rain

time-of-day: dusk

activity: driving · pursuit

labels: vehicles · crowd · wet surfaces

rights.scope: train
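To show how a record like clip 00428 might be filtered on any combination of fields, here is a minimal in-memory sketch. The dictionary layout, the second clip, and the `matches` and `search` helpers are hypothetical. The real query interface may look quite different.

```python
from typing import Any

# First entry mirrors the example record; the second is invented for contrast.
CLIPS: list[dict[str, Any]] = [
    {
        "clip": "00428",
        "setting": "urban",
        "weather": "rain",
        "time-of-day": "dusk",
        "activity": ["driving", "pursuit"],
        "labels": ["vehicles", "crowd", "wet surfaces"],
        "rights.scope": "train",
    },
    {
        "clip": "hypothetical-clip",
        "setting": "rural",
        "weather": "clear",
        "time-of-day": "noon",
        "activity": ["walking"],
        "labels": ["foliage"],
        "rights.scope": "train",
    },
]


def matches(record: dict[str, Any], criteria: dict[str, str]) -> bool:
    """True if every criterion equals the field or is one of its listed values."""
    for key, wanted in criteria.items():
        value = record.get(key)
        if isinstance(value, list):
            if wanted not in value:
                return False
        elif value != wanted:
            return False
    return True


def search(clips: list[dict[str, Any]], criteria: dict[str, str]) -> list[str]:
    """Return the IDs of clips that satisfy all criteria."""
    return [c["clip"] for c in clips if matches(c, criteria)]
```

For example, `search(CLIPS, {"weather": "rain", "activity": "pursuit"})` selects only the clip whose weather is rain and whose activities include pursuit.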

Rights & provenance

Cleared to train on. Auditable to the licence.

Every record is licensed at the source and traces back to it. Train without the legal exposure of scraped footage or grey-area self-capture.

Licensed, never scraped

Captured only from titles we have licensed, non-exclusively, with the rights to train.

Licence ID per session

Every session's manifest carries its source, licence, and rights scope. Full chain of custody.

Private & secure ⚑

Access controlled, delivery versioned, usage reportable.
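One way a training pipeline could act on the per-session manifest is to refuse any session whose licence or rights scope does not cover the intended use. The sketch below assumes a manifest with `source`, `licence_id`, and `rights_scope` fields. Those names, the `RightsError` type, and the sample values are illustrative, not the delivered format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionManifest:
    """Per-session provenance. Field names are assumed for illustration."""
    session_id: str
    source: str        # the licensed title the session was captured from
    licence_id: str    # identifier of the licence covering this session
    rights_scope: str  # permitted use, e.g. "train"


class RightsError(Exception):
    """Raised when a session is not cleared for the requested use."""


def require_scope(manifest: SessionManifest, scope: str) -> None:
    """Raise unless the manifest carries a licence and the requested scope."""
    if not manifest.licence_id:
        raise RightsError(f"{manifest.session_id}: no licence ID on manifest")
    if manifest.rights_scope != scope:
        raise RightsError(
            f"{manifest.session_id}: scope {manifest.rights_scope!r} "
            f"does not permit {scope!r}"
        )


# Hypothetical manifest values, for demonstration only.
session = SessionManifest("session-001", "example-title", "LIC-EXAMPLE", "train")
require_scope(session, "train")  # passes silently when the session is cleared
```

Running the check at load time keeps the audit trail in the data path: a session without a matching licence never reaches the training set.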