==== 2. Cost of a Zebra-Llama-style 8B hybrid on MI450X ====
Now compare that to post-training a hybrid (1–8B) starting from an existing foundation model.

===== 2.1 GPU-hours for a hybrid =====
AMD HybridLM / Zebra-Llama prototypes report:<ref>{{cite web|title=ROCm Blog|url=https://rocm.blogs.amd.com/artificial-intelligence/hybrid-models%2C-mla%2C/README.html|publisher=ROCm Blog|access-date=2025-11-16}}</ref>
* 1B hybrid (HybridLM), 7B tokens, trained in ~17 hr on 8 × MI300, i.e. 136 GPU-hours.
* Other variants at 10B tokens are in the same order of magnitude (a few hundred GPU-hours).

If we scale that to an 8B hybrid:
* 8× parameters ⇒ ~8× compute.
* 7B → 11B tokens ⇒ ~1.6× compute.
* Rough estimate: <math>136 \times 8 \times 1.6 \approx 1{,}700</math> GPU-hours, as worked through in the sketch below.

So a Zebra-Llama-8B-class post-train is probably in the 1–3k GPU-hour range. I’ll use 1,700 GPU-hours as a mid estimate.

===== 2.2 $/GPU-hr for MI450X =====
We don’t have MI450X cloud prices yet, but we do have MI300X:
* MI300X on-demand ranges from $1.5–$3.3 / GPU-hr depending on provider (Vultr, TensorWave, RunPod, etc.).<ref>{{cite web|title=Thunder Compute|url=https://www.thundercompute.com/blog/amd-mi300x-pricing|publisher=Thunder Compute|access-date=2025-11-16}}</ref>

MI450X will be newer and denser; to be conservative I’ll assume:
* Effective range: $2.5–$3.0 / GPU-hr (similar to “nice” MI300X or discounted H100).

===== 2.3 Cost per 8B hybrid =====
Using 1,700 GPU-hours:
* Low ($2.5/hr): <math>1{,}700 \times \$2.5 \approx \$4{,}250</math>
* High ($3.0/hr): <math>1{,}700 \times \$3.0 \approx \$5{,}100</math>

So one full 8B Zebra-Llama-style post-train is on the order of $4–5k of GPU time.

Power cost: if MI450X draws ~1 kW while training, then <math>1{,}700\text{ hr} \times 1\text{ kW} = 1{,}700\text{ kWh}</math>. At $0.14/kWh that is about $238, so energy is basically noise compared to GPU cost for small models.

: Takeaway: A full 8B hybrid post-train is in the low thousands of dollars in raw compute, not millions.
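For readers who want to vary the inputs, here is a minimal Python sketch of the back-of-envelope arithmetic above. Every number in it is an assumption taken from this section (the 1B HybridLM baseline run, the 8× parameter and ~1.6× token scaling, the assumed $2.5–$3.0/GPU-hr MI450X pricing, ~1 kW per GPU, $0.14/kWh), not a measured value.

<syntaxhighlight lang="python">
# Back-of-envelope cost estimate for an 8B Zebra-Llama-style hybrid post-train.
# All inputs are the assumptions stated in the section above, not measurements.

BASE_GPU_HOURS = 17 * 8        # 1B HybridLM: ~17 hr on 8 x MI300 = 136 GPU-hours
PARAM_SCALE = 8                # 1B -> 8B parameters  => ~8x compute
TOKEN_SCALE = 1.6              # 7B -> 11B tokens     => ~1.6x compute

# ~1,700 GPU-hours (rounded to the nearest hundred, as in the text)
gpu_hours = round(BASE_GPU_HOURS * PARAM_SCALE * TOKEN_SCALE, -2)
print(f"Estimated GPU-hours: {gpu_hours:,.0f}")

for rate in (2.5, 3.0):        # assumed MI450X $/GPU-hr range
    print(f"GPU cost at ${rate}/hr: ${gpu_hours * rate:,.0f}")

energy_kwh = gpu_hours * 1.0   # assumed ~1 kW per GPU while training
print(f"Energy: {energy_kwh:,.0f} kWh -> ${energy_kwh * 0.14:,.0f} at $0.14/kWh")
</syntaxhighlight>

Running it reproduces the numbers used above (~1,700 GPU-hours, roughly $4,250–$5,100 of GPU time, and about $238 of electricity); swapping in different rates or scaling factors shows how little the conclusion depends on the exact assumptions.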