gpu_stack.exe

gpu_stack

A virtual AI datacenter you can interrogate. It predicts what a training run does to time, power, and money, says how sure it is, and can show you what every one of its numbers is made of.

read_graph.exe

Read it like a receipt, not a magic answer.

Start with a human question, then follow the named dependencies upstream. Every hop should tell you whether you are looking at an equation, a scenario value, or an unresolved root input.

The registry currently names 1517 variables and 950 equations.
Root debt means the 619 unresolved inputs stay visible instead of being smoothed over.

target: cost.per_token. The number a human asks about.

equations: run cost, tokens, power. The graph walks the ancestry.

root debt: 619 named inputs. Unknowns stay inspectable.

scenario: explicit assignments. Fixtures are anchors, not market claims.

Target selected: start with the question. The page keeps the output attached to the labels underneath it.

1517 registered variables

950 equations connecting them

619 root inputs, named instead of hidden

884 equations with unit checks

token_journey.exe

Follow one token from math to money.

Every station is a layer of the model. The lamp lights as the token passes through, and the trip always ends at an invoice.

Model math: attention, FFN, parameters

Kernels: instructions, tiling, occupancy

Memory: HBM traffic, caches, bandwidth

Silicon: transistors, lithography, atoms

Power: board, rack, cooling, PUE

Invoice: ~$3 per million tokens (synthetic fixture value, not a market price)

Symbolic causal backbone

The old engine becomes useful when measurements can prove it wrong.

The registry keeps equations, units, references, constraints, scenario assumptions, and unresolved boundaries attached. The research layer now adds observations, held-out splits, temporal events, interventions, uncertainty, residuals, and decision regret so graph completeness is no longer mistaken for scientific progress.

Only universal physics constants belong in Constant.
Everything else stays a Variable: clocks, voltages, tariffs, GPU counts, batch sizes, and facility assumptions.
A root input is visible modeling debt. That is much better than hidden modeling debt with a haircut.

One output, many upstream obligations. Click a layer to see what it owes.

layers.sys

Click a layer. Watch the dependency chain move.

The visible machine is a building, but the model treats it as a constraint bundle: grid interconnect, substations, cooling loops, water, occupancy, capex, operations, and uptime.

Inputs include power envelope, PUE, utilization, cooling load, and build cost.
Outputs feed cluster capacity, cost allocation, emissions, and schedule pressure.

trace_target.exe

Choose a question and follow what it depends on.

Cost per token is not a lone price. It depends on run cost, token count, facility power, throughput, utilization, hardware choices, and root assumptions that still need better evidence.

Use this as a mental model for the resolver, not as a live numerical solver.
The moving gold segment marks the direction of dependency pressure.

The graph is useful because each hop keeps its label. If a hop cannot be resolved from equations or scenario assignments, it comes back as a named missing boundary.
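
The hop-by-hop behaviour described here can be sketched in a few lines of Python. This is only a mental-model sketch, not the gpu_stack resolver: the `EQUATIONS` table, the `resolve` function, and every variable name in it are hypothetical, chosen to show a target that either resolves to a value or comes back with its missing boundaries named.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical miniature registry: output name -> (input names, function).
# These names and relations are illustrative, not taken from gpu_stack.
EQUATIONS: Dict[str, Tuple[Tuple[str, ...], Callable[..., float]]] = {
    "cost.per_token": (("run.cost", "run.tokens"), lambda cost, tokens: cost / tokens),
    "run.cost": (("run.energy_kwh", "tariff.per_kwh"), lambda kwh, tariff: kwh * tariff),
}


def resolve(
    target: str,
    scenario: Dict[str, float],
    trace: Optional[List[Tuple[str, str]]] = None,
) -> Tuple[Optional[float], List[str]]:
    """Return (value, missing_roots); value is None if any root is unresolved."""
    if trace is None:
        trace = []
    if target in scenario:
        # Explicit scenario assignment: use it and stop walking upstream.
        trace.append((target, "scenario value"))
        return scenario[target], []
    if target not in EQUATIONS:
        # No equation and no assignment: report the name instead of guessing.
        trace.append((target, "unresolved root input"))
        return None, [target]
    trace.append((target, "equation"))
    inputs, fn = EQUATIONS[target]
    values: List[float] = []
    missing: List[str] = []
    for name in inputs:
        value, gaps = resolve(name, scenario, trace)
        missing.extend(gaps)
        if value is not None:
            values.append(value)
    if missing:
        return None, missing
    return fn(*values), []


# Hypothetical scenario that forgets to assign the token count.
scenario = {"run.energy_kwh": 1000.0, "tariff.per_kwh": 0.1}
trace: List[Tuple[str, str]] = []
value, missing = resolve("cost.per_token", scenario, trace)
print(value, missing)  # None ['run.tokens']
for name, kind in trace:
    print(f"{name}: {kind}")
```

Adding a `run.tokens` entry to the scenario makes the same call return a number and an empty missing list, which is the difference between a resolved target and a named boundary.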

synthetic fixture resolves 4 of 4 targets

scenario-report status: ok

cone_browser.exe

Inspect any variable's upstream cone.

Choose a target and click any node to expand its direct dependencies. Each hop shows the unit, scope, and whether it is an equation, a root input, or a physics constant. Root inputs carry a gold badge because they are the visible modeling debt.

The tree loads from a pre-generated JSON snapshot of the registry.
Click a node button to toggle its direct dependencies open or closed.
Constants are marked separately so universal physics values are not confused with variables.
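
As a rough illustration of what an upstream cone contains, the sketch below walks a toy snapshot and groups every upstream node by kind. The snapshot shape (`kind`, `unit`, `deps`), the node names, and the `upstream_cone` helper are all assumptions made for this example; the real JSON snapshot may be structured differently.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical snapshot shape: name -> {"kind": ..., "unit": ..., "deps": [...]}.
# Node names and relations are toy values for illustration only.
SNAPSHOT = {
    "cost.per_token": {"kind": "equation", "unit": "USD/token",
                       "deps": ["run.cost", "run.tokens"]},
    "run.cost": {"kind": "equation", "unit": "USD",
                 "deps": ["run.energy", "tariff.energy"]},
    "run.energy": {"kind": "equation", "unit": "J",
                   "deps": ["chip.switching_energy", "run.tokens"]},
    "chip.switching_energy": {"kind": "equation", "unit": "J",
                              "deps": ["const.elementary_charge", "chip.voltage"]},
    "run.tokens": {"kind": "root", "unit": "token", "deps": []},
    "tariff.energy": {"kind": "root", "unit": "USD/J", "deps": []},
    "chip.voltage": {"kind": "root", "unit": "V", "deps": []},
    "const.elementary_charge": {"kind": "constant", "unit": "C", "deps": []},
}


def upstream_cone(target: str, snapshot: Dict[str, dict]) -> Dict[str, List[str]]:
    """Group every node upstream of target by kind, excluding the target itself."""
    seen = set()
    stack = list(snapshot[target]["deps"])
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # shared dependencies are visited once
        seen.add(name)
        stack.extend(snapshot[name]["deps"])
    grouped = defaultdict(list)
    for name in sorted(seen):
        grouped[snapshot[name]["kind"]].append(name)
    return dict(grouped)


cone = upstream_cone("cost.per_token", SNAPSHOT)
for kind in ("equation", "root", "constant"):
    print(kind, cone.get(kind, []))
```

The `root` group is the part of the cone that would carry the gold badge in the browser.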

root_debt.dat

Root inputs are the visible unpaid invoices.

The root-debt command ranks unresolved root inputs by downstream blast radius. The point is not to pretend the largest family is bad. The point is to know which unknowns are currently expensive.

Total roots in the observed summary: 619.
Grouped root families in the observed summary: 151.
The heaviest shown family is physical.lithography.medium with total weight 3014.
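
One way to picture a blast-radius ranking is sketched below. It assumes that a root's weight is the number of distinct nodes downstream of it and that a family is the root's name without its last segment; both are guesses made for illustration, and the actual root-debt weighting may be defined differently. The graph and root names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical graph: node -> direct upstream inputs.
DEPS: Dict[str, List[str]] = {
    "econ.cost.per_token": ["run.cost", "run.tokens"],
    "run.cost": ["run.energy", "tariff.energy"],
    "run.energy": ["chip.power", "run.duration"],
    "chip.power": ["physical.mosfet.leakage", "physical.mosfet.threshold_voltage"],
    "run.duration": ["run.tokens", "chip.throughput"],
    "chip.throughput": ["physical.mosfet.threshold_voltage"],
}


def downstream_counts(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Count distinct downstream nodes for every root (assumed weight)."""
    children = defaultdict(set)
    for node, inputs in deps.items():
        for name in inputs:
            children[name].add(node)
    # A root is used as an input somewhere but has no defining equation.
    roots = {name for inputs in deps.values() for name in inputs} - set(deps)
    counts = {}
    for root in roots:
        seen = set()
        stack = [root]
        while stack:
            for child in children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        counts[root] = len(seen)
    return counts


def family_debt(counts: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Sum weights per family and return (family, weight, roots), heaviest first."""
    totals = defaultdict(lambda: [0, 0])
    for root, weight in counts.items():
        family = root.rsplit(".", 1)[0]
        totals[family][0] += weight
        totals[family][1] += 1
    rows = [(family, w, n) for family, (w, n) in totals.items()]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


for family, weight, roots in family_debt(downstream_counts(DEPS)):
    print(f"{family}: weight {weight}, roots {roots}")
```

In this toy graph the two MOSFET roots dominate because almost every other node sits downstream of them, which is the sense in which an unknown is "currently expensive".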

physical.lithography.medium
weight 3014, roots 15

physical.lithography
weight 2185, roots 11

physical.lithography.source_plasma_drive
weight 1943, roots 8

physical.mosfet
weight 1866, roots 18

physical.process
weight 1293, roots 8

These bars normalize the five README weights against the top family. No new metric is being invented here.
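
The normalization is plain division by the largest weight. A minimal sketch using the five weights listed above; the function name, bar width, and text rendering are illustrative choices, not the page's implementation.

```python
from typing import Dict

# The five family weights as listed in the summary above.
WEIGHTS: Dict[str, int] = {
    "physical.lithography.medium": 3014,
    "physical.lithography": 2185,
    "physical.lithography.source_plasma_drive": 1943,
    "physical.mosfet": 1866,
    "physical.process": 1293,
}


def normalize_to_top(weights: Dict[str, int]) -> Dict[str, float]:
    """Scale every weight so the heaviest family equals 1.0."""
    top = max(weights.values())
    return {name: weight / top for name, weight in weights.items()}


normalized = normalize_to_top(WEIGHTS)
for name, fraction in normalized.items():
    bar = "#" * round(fraction * 40)  # 40 characters is an arbitrary bar width
    print(f"{name:<42} {bar} {fraction:.2f}")
```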

CLI.exe

Use the command line as a microscope.

The package is still closer to a research instrument than a polished app. That is useful right now. Ask it what exists, what is unresolved, and where a claim bottoms out.

python -m gpu_stack.cli stats
python -m gpu_stack.cli verify --profile fast
python -m gpu_stack.cli resolve econ.cost.per_token --preset scenarios.dense_training_cost_fixture --trace --missing
python -m gpu_stack.cli root-debt --families --limit 5
python -m gpu_stack.cli scenario-report scenarios.dense_training_cost_fixture --json
python -m gpu_stack.cli next-work
python -m gpu_stack.cli experiment-protocol E001 --json
python -m gpu_stack.cli experiment-run E001 --scenario experiments/e001-beyond-one-datacenter/screening-scenario-v1.json --output result.json --observatory-output docs/data/e001-screening-v1.json

Root input

A variable with no defining value relation yet. It might be a real scenario boundary, or it might be physics that still needs decomposition.

Dependency cone

The upstream set of variables, equations, assumptions, and constants needed to explain one target.

Scenario fixture

A named set of explicit assignments used to resolve targets reproducibly. Synthetic fixtures are test anchors, not market claims.

MFU

Model FLOPs Utilization: the fraction of the hardware's theoretical peak FLOP/s that is spent on the model's required forward and backward computation during training.
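
A common way to estimate MFU is to divide achieved model FLOP/s by the aggregate peak FLOP/s of the hardware. The sketch below assumes a dense transformer and the widely used approximation of about 6 FLOPs per parameter per training token, which ignores attention-specific terms. The function names and all numbers in the example are hypothetical.

```python
def model_flops_per_token(n_params: float) -> float:
    """Approximate training FLOPs per token for a dense transformer.

    Assumption: ~6 FLOPs per parameter per token (forward plus backward),
    ignoring attention-specific terms.
    """
    return 6.0 * n_params


def mfu(tokens_per_second: float, n_params: float,
        n_devices: int, peak_flops_per_device: float) -> float:
    """Achieved model FLOP/s divided by aggregate peak FLOP/s."""
    achieved = tokens_per_second * model_flops_per_token(n_params)
    peak = n_devices * peak_flops_per_device
    if peak <= 0:
        raise ValueError("peak FLOP/s must be positive")
    return achieved / peak


# Hypothetical run: 7e9 parameters, 8 devices at 1e15 peak FLOP/s each,
# observed throughput of 80,000 tokens per second.
example = mfu(tokens_per_second=80_000, n_params=7e9,
              n_devices=8, peak_flops_per_device=1e15)
print(f"MFU = {example:.2f}")
```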

HBM

High Bandwidth Memory: stacked DRAM mounted inside the accelerator package next to the compute die. Its bandwidth is often a ceiling for throughput.

PUE

Power Usage Effectiveness: total facility power divided by IT equipment power. Cooling and overhead show up here.
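
The ratio is easy to work in both directions. A small sketch with hypothetical numbers; the function names are illustrative and not part of the gpu_stack registry.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def facility_power_kw(it_equipment_kw: float, pue_value: float) -> float:
    """Invert the definition: facility power implied by an IT load and a PUE."""
    return it_equipment_kw * pue_value


# Hypothetical facility: 1000 kW of IT load inside a 1250 kW total draw.
ratio = pue(total_facility_kw=1250.0, it_equipment_kw=1000.0)
overhead_kw = facility_power_kw(1000.0, ratio) - 1000.0
print(ratio, overhead_kw)  # 1.25 250.0
```

The 250 kW above the IT load is where cooling and other overhead show up.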

Good for now

Tracing a result into mechanics and evidence. Replaying explicit compute, communication, checkpoint, outage, and recovery events. Keeping observations, assumptions, priors, modeled values, and unmeasured claims visibly separate.

Not finished yet

E001 now has explicit recovery mechanics and three measured learning stages. LC3 held canonical work equal: adaptive continuation preserved learning and saved attempted work and opportunity ticks, but failed its frozen device-energy bound. This is not evidence that a frontier-scale multi-site run converges, and local GPU energy is not facility energy.

next_work.exe

The frontier program is six falsifiable questions.

1. Beyond One Datacenter · energy gate failed: At equal canonical work, adaptive continuation was learning-noninferior and saved work and scheduled time, but its measured RTX energy interval exceeded the frozen bound. Scale remains blocked.

2. Shape the Power Waveform · next: Factor checkpoint cadence from survivor continuation, attribute phase-level power, and test whether dependency-safe scheduling removes the energy penalty without losing the measured learning, work, or time gains.

3. Semantic Fault Tolerance: Allocate canaries, replay, and redundancy by counterfactual learning harm instead of fault label.

4. Fluid Inference Topology: Measure interaction gains and regime crossings when serving topology changes per request.

5. Architecture as a Datacenter Variable: Co-design model modules and heterogeneous hardware under one facility power and time envelope.

6. Firm Grid-responsive Inference: Measure meter-verified demand response with quality, tail latency, rebound, and hidden work inside the same boundary.