A reproducible 2026 cost model across AWS, Azure, GCP, Oracle, and a sovereign regional provider, and where the money actually goes.
Every GPU cloud comparison I read stops at the same number: dollars per GPU-hour. It is the number on the pricing page, the number in the launch blog, the number people screenshot on X. And it is, at most, half of an inference bill.
The other half is the part nobody models. Data egress, plus a handful of network line items that never show up in the headline rate. On a workload that mostly trains in place, that half rounds to nothing. On a workload that serves, an inference API shipping tokens to users all day, it can be a third of what you pay, and it grows with every new user you celebrate.
So I built a model. One identical inference workload, priced across five GPU clouds using each provider's own published rates. No invoice, no NDA numbers, nothing invented, just arithmetic you can reproduce in a spreadsheet. Here is what fell out, and why the cheapest GPU-hour does not win.
To compare anything fairly you have to pin down the workload, so here are the assumptions. Change them and the conclusion moves, which is the whole point.
Three layers make up the bill: compute, egress, and the hidden network multipliers. Let's take them one at a time.
This is the easy part, and the part every comparison already covers. On-demand, per GPU-hour, as of June 2026:

| Provider / instance | GPU | $/GPU-hr | Compute / mo (×730) |
|---|---|---|---|
| AWS P6-B200 | B200 | $12.36 | $9,023 |
| AWS P5 | H100 | $12.29 | $8,972 |
| Azure ND H100 v5 | H100 | $12.29 | $8,972 |
| GCP | H100 | $10.98 | $8,015 |
| Oracle | B200 | ~$15.00 | $10,950 |
| Orion AI Factory (Max) | B200 | $9.61 (€8.90) | $7,015 |

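
The monthly column is just the hourly rate times 730. A minimal sketch of that arithmetic, assuming one GPU running around the clock; the dictionary labels are this sketch's own shorthand, borrowed from the all-in table later in the article:

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12, the multiplier in the table

# On-demand $/GPU-hr from the table above.
RATE_PER_GPU_HOUR = {
    "AWS P6-B200": 12.36,
    "AWS P5 (H100)": 12.29,
    "Azure ND H100 v5": 12.29,
    "GCP H100": 10.98,
    "Oracle B200": 15.00,  # listed as approximate (~$15.00)
    "Orion AI Factory (Max)": 9.61,
}


def monthly_compute(rate_per_gpu_hour: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of keeping one GPU running for a full month at the on-demand rate."""
    return rate_per_gpu_hour * hours


for name, rate in RATE_PER_GPU_HOUR.items():
    print(f"{name:<24} ${monthly_compute(rate):>9,.0f}")
```

Swap in your own utilization by passing a smaller `hours` value; the 730 figure assumes the GPU never idles.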
Two honesty notes before anyone replies.
These are not all the same chip. The AWS P6, Oracle, and Orion rows are NVIDIA B200 (Blackwell). The AWS P5, Azure, and GCP rows are H100. B200 is a newer, faster, higher-memory part, so this is not silicon for silicon. I am comparing what you can rent for one GPU-hour today, not running an MLPerf bake-off. If anything, that makes the spread more interesting: B200 rows sit at both ends of the price range, with Orion the cheapest and Oracle the most expensive.
Packaging differs wildly. The hyperscaler flagship nodes are 8-GPU instances. A p5.48xlarge is eight H100s at roughly $98/hr, and you rent the whole box. Per-GPU-hour is the only fair way to line them up, but if your workload genuinely needs one GPU, "rent eight" is its own kind of tax. Some providers let you take a fraction of a GPU. The hyperscalers mostly don't at the top end, which is part of why teams over-provision and then wonder why utilization reports look so grim. As others have written on this site,
A note on spot and committed pricing. Every number above is on-demand. Spot and preemptible rates can cut compute by 50 to 70 percent if your workload tolerates eviction, and one-year committed-use discounts land somewhere between on-demand and spot. Those levers are real, but they apply roughly evenly across providers, so they shift the absolute numbers without changing the shape of the comparison. They also do nothing for egress, which is the part of the bill that has no discount lever at all.
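One consequence is worth making concrete: if a discount shrinks compute and leaves egress alone, egress becomes a larger share of the bill. A rough sketch, assuming the AWS P5 compute figure from the table and the article's 30 TB AWS egress estimate (worked out further down); the function name and discount levels are illustrative only:

```python
def egress_share(compute: float, egress: float, compute_discount: float = 0.0) -> float:
    """Egress as a fraction of the bill when only compute is discounted."""
    discounted_compute = compute * (1.0 - compute_discount)
    return egress / (discounted_compute + egress)


COMPUTE = 8_972.0  # AWS P5 (H100), one GPU, 730 hours, on-demand
EGRESS = 2_662.0   # AWS, 30 TB/month, from the article's worked example

# 0% is on-demand; 50% and 70% bracket the spot range quoted above.
for discount in (0.0, 0.5, 0.7):
    share = egress_share(COMPUTE, EGRESS, discount)
    print(f"compute discount {discount:.0%}: egress is {share:.1%} of the bill")
```

Under these assumptions the egress share climbs from about 23% on-demand to about 37% at a 50 percent discount and close to half at 70 percent.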
If compute were the whole story, you would shrug. A few dollars an hour between them, narrowed further by spot. It isn't.
Before the math, the why. Egress fees are not a cost-recovery mechanism, or not only one. Moving bytes is cheap. Egress pricing is, in large part, a switching cost: the more expensive it is to get your data out, the harder it is to leave. That is also why ingress (data going in) is almost universally free. Getting you in is the easy part. This is the same dynamic that turns optimistic cloud migrations into
Egress is what a cloud charges to move your data out, to the internet, to another region, to another cloud. It is metered by the gigabyte, and for an inference API every response you serve is egress. Tokens out, images out, audio out, all of it billed on the way to the user.
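To turn "every response is egress" into a number, multiply request volume by average response size. A back-of-envelope sketch; the traffic profile below is hypothetical, not a measurement from this article, and it uses binary gigabytes (1 TB = 1,024 GB) to match the arithmetic used here:

```python
BYTES_PER_GB = 1024 ** 3  # binary gigabytes, so 1 TB = 1,024 GB


def monthly_egress_gb(
    requests_per_day: float, avg_response_bytes: float, days: float = 30
) -> float:
    """Estimated GB leaving the cloud per month from response payloads alone."""
    return requests_per_day * avg_response_bytes * days / BYTES_PER_GB


# Hypothetical profile: 10 million responses a day averaging 100 KiB each.
example_gb = monthly_egress_gb(requests_per_day=10_000_000, avg_response_bytes=100 * 1024)
print(f"{example_gb:,.0f} GB/month ({example_gb / 1024:.1f} TB)")
```

Text-only responses are far smaller than 100 KiB, so a token API needs much more traffic to reach the same volume than one returning images or audio. Protocol overhead and retries are ignored in this estimate.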
The rates look small until you multiply. Here is the worked example for 30 TB/month, so you can check it. 30 TB is 30,720 GB. On AWS the first 10,240 GB bill at $0.09 and the rest at $0.085, which is $921.60 plus $1,740.80, so about $2,662. Azure runs slightly lower at $0.087 then $0.083, about $2,591. GCP's default Premium tier is a flat $0.12, so 30,720 GB is about $3,686. These are cross-checked against the independent

| Provider | Egress rate | Egress / mo |
|---|---|---|
| AWS | $0.09, then $0.085/GB | ~$2,662 |
| Azure | $0.087, then $0.083/GB | ~$2,591 |
| GCP (internet, Premium) | $0.12/GB | ~$3,686 |
| Oracle OCI | $0 (egress fees removed Feb 2026) | $0 |
| Orion AI Factory | €0 (zero egress) | $0 |

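
The tiered arithmetic behind the AWS, Azure, and GCP rows can be sketched as below. This models only the two tiers quoted in this article, with the second tier treated as open-ended; volumes beyond that range would need each provider's full price sheet. The `TIERS` structure and `egress_cost` helper are this sketch's own names:

```python
GB_PER_TB = 1024  # the article's convention: 30 TB = 30,720 GB

# Each tier is (size in GB, $/GB). None marks the last, open-ended tier.
TIERS = {
    "AWS": [(10 * GB_PER_TB, 0.09), (None, 0.085)],
    "Azure": [(10 * GB_PER_TB, 0.087), (None, 0.083)],
    "GCP Premium": [(None, 0.12)],
}


def egress_cost(gb: float, tiers) -> float:
    """Bill each tier in order until the monthly volume is used up."""
    cost, remaining = 0.0, gb
    for size, rate in tiers:
        billed = remaining if size is None else min(remaining, size)
        cost += billed * rate
        remaining -= billed
        if remaining <= 0:
            break
    return cost


volume_gb = 30 * GB_PER_TB
for provider, tiers in TIERS.items():
    print(f"{provider:<12} ${egress_cost(volume_gb, tiers):,.2f}")
```

Change `volume_gb` to your own monthly figure to see how the totals move.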
That is between $2,600 and $3,700 a month for the privilege of letting your own users receive your own model's output. It is recurring, it scales with success, and it is almost never in the slide that sold you the platform. Double your traffic and you double this number, with no volume relief worth mentioning until you are well into hundreds of terabytes.
Even the egress table understates it, because the sticker rate is not the bill. A few line items quietly ride along on the hyperscalers:
None of these are scandals. They are just the difference between the number you were quoted and the number you pay, and they are exactly the kind of thing a
Stack the layers. One GPU, 730 hours, 30 TB out. I am leaving the NAT gateway and inter-AZ costs out of the totals to stay conservative, which means the hyperscaler totals are, if anything, understated:

| Provider | Compute | Egress | All-in / mo | Egress as % |
|---|---|---|---|---|
| AWS P6-B200 | $9,023 | $2,662 | $11,685 | 22.8% |
| AWS P5 (H100) | $8,972 | $2,662 | $11,634 | 22.9% |
| Azure H100 | $8,972 | $2,591 | $11,563 | 22.4% |
| GCP H100 | $8,015 | $3,686 | $11,701 | 31.5% |
| Oracle B200 | $10,950 | $0 | $10,950 | 0% |
| Orion AI Factory (B200) | $7,015 | $0 | $7,015 | 0% |

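
The stacking step is simple enough to check by hand, or with a few lines of code. A minimal sketch that takes the rounded compute and egress figures from the table as given; `all_in` is a helper name made up for this example:

```python
# (monthly compute, monthly egress) in dollars, copied from the table above.
ROWS = {
    "AWS P6-B200": (9_023, 2_662),
    "AWS P5 (H100)": (8_972, 2_662),
    "Azure H100": (8_972, 2_591),
    "GCP H100": (8_015, 3_686),
    "Oracle B200": (10_950, 0),
    "Orion AI Factory (B200)": (7_015, 0),
}


def all_in(compute: float, egress: float) -> tuple[float, float]:
    """Return (monthly total, egress as a percentage of that total)."""
    total = compute + egress
    return total, 100 * egress / total


# Sort cheapest first so the ranking is visible at a glance.
for name, (compute, egress) in sorted(ROWS.items(), key=lambda kv: sum(kv[1])):
    total, pct = all_in(compute, egress)
    print(f"{name:<24} ${total:>7,.0f}  egress {pct:4.1f}%")
```

NAT gateway and inter-AZ charges are left out here too, matching the table.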
Two things jump out, and both are just the arithmetic:
Before anyone migrates a fleet over one table: the 30 TB assumption is doing a lot of work. Slide it around.

| Monthly egress | Egress as % of bill | Read |
|---|---|---|
| ~10 TB | 9 to 11% | Compute still rules; convenience tax is probably worth it |
| ~30 TB (base) | 22 to 31% | Real money; worth an architecture conversation |
| ~50 TB | 32 to 40% | Egress is the dominant variable cost; the decision often flips |

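
To slide the assumption yourself, the sweep can be sketched as follows. It reuses the per-GB rates and monthly compute figures quoted earlier, with the second tier treated as open-ended, so it is only meant for volumes up to about 50 TB; the names are this sketch's own:

```python
GB_PER_TB = 1024  # the article's convention: 30 TB = 30,720 GB

# name: (monthly compute in $, rate for the first 10 TB, rate beyond 10 TB)
PROVIDERS = {
    "AWS P5 (H100)": (8_972, 0.09, 0.085),
    "Azure H100": (8_972, 0.087, 0.083),
    "GCP H100": (8_015, 0.12, 0.12),  # flat Premium-tier rate
}


def egress_share(tb: float, compute: float, first_rate: float, rest_rate: float) -> float:
    """Egress as a fraction of the monthly bill at a given volume in TB."""
    gb = tb * GB_PER_TB
    first = min(gb, 10 * GB_PER_TB)
    egress = first * first_rate + (gb - first) * rest_rate
    return egress / (compute + egress)


for tb in (10, 30, 50):
    shares = ", ".join(
        f"{name}: {egress_share(tb, *params):.1%}" for name, params in PROVIDERS.items()
    )
    print(f"{tb:>3} TB  {shares}")
```

With these inputs the AWS and Azure shares land inside the table's ranges at every volume. GCP's flat $0.12 rate puts it a couple of points above the upper bound at 10 TB and 50 TB (roughly 13% and 43%), so read the table's ranges as approximate.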
If your workload trains in place and rarely ships large payloads, ignore most of this. If it serves, especially anything multimodal where outputs are images or video, egress belongs in your forecast next to GPU-hours. This is the same trap behind a lot of
The useful takeaway is not "provider X wins." It is that egress is an architectural decision, not a billing surprise. A few honest patterns.
Zero egress, by itself, is not rare.
So the differentiator is never one feature, it is the stack of them. Oracle proves you can get current-generation B200 plus zero egress from a major cloud, but it is a US-headquartered provider subject to the CLOUD Act, and its B200 sticker (~$15/GPU-hr) is the highest in the comparison. Neoclouds like
Hyperscaler breadth still wins plenty of the time. If you live inside a provider's managed services, need 200 other products next to your GPUs, or burst across dozens of regions on demand, the egress premium is the cost of an ecosystem you are genuinely using. The mistake is not choosing a hyperscaler. The mistake is choosing one without ever pricing the second half of the bill.
You do not have to trust my 30 TB. You have to find yours. A five-line checklist:
The GPU-hour is the number everyone argues about. The bill is decided by the number nobody models. Price both, and the comparison usually looks nothing like the pricing page.
All pricing reflects published on-demand rates as of June 2026, cited inline. This is a modeled comparison, not financial advice. Reproduce it with your own workload before making infrastructure decisions.
Disclosure: the author's agency works with companies in the cloud and AI-infrastructure space, including one referenced here.