I Priced the Same Inference Workload on 4 GPU Clouds. Egress Was the Catch

2026-7-3 05:14:11 Author: hackernoon.com

A reproducible 2026 cost model across AWS, Azure, GCP, and a sovereign regional provider, and where the money actually goes.

Every GPU cloud comparison I read stops at the same number: dollars per GPU-hour. It is the number on the pricing page, the number in the launch blog, the number people screenshot on X. And it is only part of an inference bill.

The rest is the part nobody models: data egress, plus a handful of network line items that never show up in the headline rate. On a workload that mostly trains in place, that part rounds to nothing. On a workload that serves, an inference API shipping tokens to users all day, it can be a third of what you pay, and it grows with every new user you celebrate.

So I built a model. One identical inference workload, priced across four GPU clouds using each provider's own published rates. No invoice, no NDA numbers, nothing invented, just arithmetic you can reproduce in a spreadsheet. Here is what fell out, and why the cheapest GPU-hour does not win.

The workload I priced

To compare anything fairly you have to pin down the workload, so here are the assumptions. Change them and the conclusion moves, which is the whole point.

One GPU, running 24/7 for one month, so 730 GPU-hours.
A self-hosted LLM inference API serving production traffic, not a batch training job that never phones home.
30 TB/month of outbound data (egress). That is a busy text API or a moderate multimodal one. I run a sensitivity range later, because this single input drives most of the result.
Prices normalized to per-GPU-hour, because the providers package GPUs very differently (more on that below).
FX at €1 = $1.08. Free-tier egress (the first 100 GB) ignored as rounding. Tiered egress rates applied.

Three layers make up the bill: compute, egress, and the hidden network multipliers. Let's take them one at a time.

Layer 1: Compute, the number everyone quotes

This is the easy part, and the part every comparison already covers. On-demand, per GPU-hour, as of June 2026:

Provider / instance	GPU	$/GPU-hr	Compute / mo (×730)
AWS P6-B200	B200	$12.36	$9,023
AWS P5 (p5.48xlarge)	H100	$12.29	$8,972
Azure ND H100 v5	H100	$12.29	$8,972
GCP A3 (a3-highgpu-8g)	H100	$10.98	$8,015
Oracle OCI (BM.GPU.B200)	B200	~$15.00	$10,950
Orion AI Factory (Max)	B200	$9.61 (€8.90)	$7,015
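
The last column is nothing more than the hourly rate times 730, with the euro price converted at the FX rate from the assumptions. A minimal sketch of that arithmetic, using the rates from the table; the dictionary keys and the function name are illustrative choices, not anything the providers publish:

```python
HOURS_PER_MONTH = 730   # one GPU, 24/7, one month
USD_PER_EUR = 1.08      # FX assumption from the workload section

# On-demand $/GPU-hour as listed in the table above.
USD_PER_GPU_HOUR = {
    "AWS P6-B200": 12.36,
    "AWS P5 (H100)": 12.29,
    "Azure ND H100 v5": 12.29,
    "GCP A3 (H100)": 10.98,
    "Oracle OCI (B200)": 15.00,  # listed as approximate
    # Quoted in EUR; converted and rounded to cents, as in the table.
    "Orion AI Factory (B200)": round(8.90 * USD_PER_EUR, 2),
}


def monthly_compute(rate_per_gpu_hour: float, hours: int = HOURS_PER_MONTH) -> float:
    """Monthly on-demand compute cost for a single GPU."""
    return rate_per_gpu_hour * hours


for name, rate in USD_PER_GPU_HOUR.items():
    print(f"{name:26s} ${rate:6.2f}/hr  ${monthly_compute(rate):>9,.0f}/mo")
```

Swap in your own hours if the GPU does not run around the clock; everything downstream scales from this one product.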

Two honesty notes before anyone replies.

These are not all the same chip. The AWS P6, Oracle, and Orion rows are NVIDIA B200 (Blackwell). The AWS P5, Azure, and GCP rows are H100. B200 is a newer, faster, higher-memory part, so this is not silicon for silicon. I am comparing what you can rent for one GPU-hour today, not running an MLPerf bake-off. If anything, that makes the spread more interesting: the cheapest and the most expensive rows in the table are both B200.

Packaging differs wildly. The hyperscaler flagship nodes are 8-GPU instances. A p5.48xlarge is eight H100s at roughly $98/hr, and you rent the whole box. Per-GPU-hour is the only fair way to line them up, but if your workload genuinely needs one GPU, "rent eight" is its own kind of tax. Some providers let you take a fraction of a GPU. The hyperscalers mostly don't at the top end, which is part of why teams over-provision and then wonder why utilization reports look so grim. As others have written on this site, your GPU is often lying to you about its real capacity, and paying for eight when you use one makes that worse.
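
To put a number on the "rent eight" tax, here is a small sketch. It assumes the whole-node price is simply eight times the per-GPU rate from the table, and that only some of the eight GPUs do useful work; the function name is made up for illustration.

```python
def effective_rate_per_used_gpu(per_gpu_rate: float, gpus_in_node: int, gpus_used: int) -> float:
    """Hourly cost per GPU you actually use when the whole node must be rented."""
    if not 1 <= gpus_used <= gpus_in_node:
        raise ValueError("gpus_used must be between 1 and gpus_in_node")
    node_rate = per_gpu_rate * gpus_in_node  # you pay for every GPU in the box
    return node_rate / gpus_used


# p5.48xlarge: eight H100s at $12.29 per GPU-hour, roughly $98/hr for the node.
for used in (8, 4, 1):
    rate = effective_rate_per_used_gpu(12.29, gpus_in_node=8, gpus_used=used)
    print(f"{used} of 8 GPUs busy: ${rate:.2f} per useful GPU-hour")
```

The per-GPU figure in the table is only what you pay if all eight GPUs are busy.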

A note on spot and committed pricing. Every number above is on-demand. Spot and preemptible rates can cut compute by 50 to 70 percent if your workload tolerates eviction, and one-year committed-use discounts land somewhere in between. Those levers are real, but they apply roughly evenly across providers, so they shift the absolute numbers without changing the shape of the comparison. They also do nothing for egress, which is the part of the bill that has no discount lever at all.
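
One consequence is worth spelling out: a spot discount shrinks compute but leaves egress untouched, so egress takes a larger share of a discounted bill. A rough sketch, using the AWS P5 compute figure from the table and the $2,662 AWS egress charge for 30 TB that is worked out in the egress section. The discount levels are taken from the 50 to 70 percent range above and are assumptions, not quoted spot prices:

```python
def egress_share(compute: float, egress: float) -> float:
    """Fraction of the monthly bill that is egress."""
    return egress / (compute + egress)


ON_DEMAND_COMPUTE = 8972.0  # AWS P5, one GPU, 730 hours
EGRESS = 2662.0             # AWS, 30 TB/month, unaffected by spot pricing

for discount in (0.0, 0.5, 0.6, 0.7):
    compute = ON_DEMAND_COMPUTE * (1 - discount)
    share = egress_share(compute, EGRESS)
    print(f"spot discount {discount:.0%}: compute ${compute:,.0f}, egress share {share:.1%}")
```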

If compute were the whole story, you would shrug: a few dollars an hour between them, narrowed further by spot. It is not the whole story.

Why egress even exists

Before the math, the why. Egress fees are not a cost-recovery mechanism, or not only one. Moving bytes is cheap. Egress pricing is, in large part, a switching cost: the more expensive it is to get your data out, the harder it is to leave. That is also why ingress (data going in) is almost universally free. Getting you in is the easy part. This is the same dynamic that turns optimistic cloud migrations into bill shock six months later, and it is why egress deserves a line in your forecast, not a footnote.

Layer 2: Egress, the number nobody quotes

Egress is what a cloud charges to move your data out, to the internet, to another region, to another cloud. It is metered by the gigabyte, and for an inference API every response you serve is egress. Tokens out, images out, audio out, all of it billed on the way to the user.

The rates look small until you multiply. Here is the worked example for 30 TB/month, so you can check it. 30 TB is 30,720 GB (at 1,024 GB per TB). On AWS the first 10,240 GB bill at $0.09 and the remaining 20,480 GB at $0.085, which is $921.60 plus $1,740.80, so about $2,662. Azure runs slightly lower at $0.087 then $0.083, about $2,591. GCP's default Premium tier is a flat $0.12, so 30,720 GB is about $3,686. The AWS and Azure rates are cross-checked against the independent EgressCost.com index:

Provider	Egress rate	Egress / mo
AWS	$0.09, then $0.085/GB	~$2,662
Azure	$0.087, then $0.083/GB	~$2,591
GCP (internet, Premium)	$0.12/GB	~$3,686
Oracle OCI	$0 (egress fees removed Feb 2026)	$0
Orion AI Factory	€0 (zero egress)	$0
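
The tiered calculation is easy to get wrong by hand, so here is a minimal sketch of it. It models only the tiers quoted in the text (real price sheets have further tiers at higher volumes), and the data layout and function name are my own illustration:

```python
GB_PER_TB = 1024

# Each tier is (size in GB, $/GB); None means "everything above the earlier tiers".
EGRESS_TIERS = {
    "AWS": [(10 * GB_PER_TB, 0.09), (None, 0.085)],
    "Azure": [(10 * GB_PER_TB, 0.087), (None, 0.083)],
    "GCP Premium": [(None, 0.12)],
    "Oracle OCI": [(None, 0.0)],
    "Orion AI Factory": [(None, 0.0)],
}


def egress_cost(gb: float, tiers) -> float:
    """Walk the tiers in order, billing each slice of traffic at its own rate."""
    remaining, total = gb, 0.0
    for size, rate in tiers:
        billed = remaining if size is None else min(remaining, size)
        total += billed * rate
        remaining -= billed
        if remaining <= 0:
            break
    return total


for provider, tiers in EGRESS_TIERS.items():
    print(f"{provider:18s} ${egress_cost(30 * GB_PER_TB, tiers):>9,.2f}")
```

Change the `30` to your own monthly volume to see how the gap between tiered and flat pricing moves.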

That is between $2,600 and $3,700 a month for the privilege of letting your own users receive your own model's output. It is recurring, it scales with success, and it is almost never in the slide that sold you the platform. Double your traffic and you double this number, with no volume relief worth mentioning until you are well into hundreds of terabytes.

Layer 3: The hidden network multipliers

Even the egress table understates it, because the sticker rate is not the bill. A few line items quietly ride along on the hyperscalers:

NAT Gateway data processing, $0.045/GB on AWS. If your GPU instances sit in a private subnet (they should), traffic through a NAT gateway is billed on top of egress. On 30 TB that is another ~$1,380/month, more than half the egress bill again, for a box most people forget is even in the path. EgressCost keeps a running tally of these on its NAT Gateway and hidden costs pages, and they add up fast.
Inter-AZ transfer at $0.01/GB in each direction, which is death by a thousand cross-zone calls between your services.
Public IPv4 charges, load-balancer capacity units, and managed-disk add-ons. Block storage runs roughly $80 to $100 per TB per month on a hyperscaler versus about €50 for the same terabyte on a regional provider.
Regional surcharges of 5 to 15 percent if you are not in the cheapest region, which you often cannot be for latency or compliance reasons.
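
The per-gigabyte items in that list are simple multiplications. A sketch with the two rates quoted above; routing all 30 TB through the NAT gateway matches the example in the text, while the 5 TB of cross-zone traffic is a placeholder I picked for illustration, so substitute measured values:

```python
GB_PER_TB = 1024
NAT_PROCESSING_PER_GB = 0.045     # AWS NAT Gateway data processing
INTER_AZ_PER_GB_EACH_WAY = 0.01   # charged in each direction


def nat_processing_cost(gb_through_nat: float) -> float:
    """Data-processing charge for traffic routed through a NAT gateway."""
    return gb_through_nat * NAT_PROCESSING_PER_GB


def inter_az_cost(gb_crossing_zones: float) -> float:
    """Cross-zone transfer, assuming both the sending and receiving side are billed."""
    return gb_crossing_zones * INTER_AZ_PER_GB_EACH_WAY * 2


nat = nat_processing_cost(30 * GB_PER_TB)
cross_zone = inter_az_cost(5 * GB_PER_TB)  # hypothetical volume
print(f"NAT processing on 30 TB: ${nat:,.2f}")
print(f"Inter-AZ on 5 TB:        ${cross_zone:,.2f}")
print(f"NAT as share of the $2,662 AWS egress bill: {nat / 2662:.0%}")
```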

None of these are scandals. They are just the difference between the number you were quoted and the number you pay, and they are exactly the kind of thing a serious cost comparison across the three big clouds has to account for before it means anything.

Same workload, all in

Stack the layers. One GPU, 730 hours, 30 TB out. I am leaving the NAT gateway and inter-AZ costs out of the totals to stay conservative, which means the hyperscaler rows are, if anything, understated:

Provider	Compute	Egress	All-in / mo	Egress as %
AWS P6-B200	$9,023	$2,662	$11,685	22.8%
AWS P5 (H100)	$8,972	$2,662	$11,634	22.9%
Azure H100	$8,972	$2,591	$11,563	22.4%
GCP H100	$8,015	$3,686	$11,701	31.5%
Oracle B200	$10,950	$0	$10,950	0%
Orion AI Factory (B200)	$7,015	$0	$7,015	0%

Two things jump out, and both are just the arithmetic:

Egress is roughly 22 to 31 percent of the big-three (AWS, Azure, GCP) bill in this scenario, a line item that is simply $0 on the providers that do not meter egress: Oracle, which scrapped egress fees entirely in February 2026, and Orion. Add the NAT gateway back in and several of those rows cross 30 percent.
Once you also account for lower B200 compute, Orion lands about 40 percent under the AWS B200 bill for the same single-GPU month ($7,015 against $11,685).
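
Both observations can be reproduced from two monthly figures per provider. A minimal sketch using the rounded compute and egress numbers from the table; the tuple layout and helper names are only for illustration:

```python
# provider: (monthly compute $, monthly egress $) at one GPU, 730 hours, 30 TB out
BILLS = {
    "AWS P6-B200": (9023, 2662),
    "AWS P5 (H100)": (8972, 2662),
    "Azure H100": (8972, 2591),
    "GCP H100": (8015, 3686),
    "Oracle B200": (10950, 0),
    "Orion AI Factory (B200)": (7015, 0),
}


def all_in(compute: float, egress: float) -> float:
    return compute + egress


def egress_pct(compute: float, egress: float) -> float:
    return 100 * egress / all_in(compute, egress)


def saving_vs(baseline: str, candidate: str) -> float:
    """Fraction by which the candidate's all-in bill undercuts the baseline's."""
    return 1 - all_in(*BILLS[candidate]) / all_in(*BILLS[baseline])


for name, (compute, egress) in BILLS.items():
    print(f"{name:24s} ${all_in(compute, egress):>7,}  egress {egress_pct(compute, egress):4.1f}%")

print(f"Orion vs AWS P6-B200: {saving_vs('AWS P6-B200', 'Orion AI Factory (B200)'):.0%} lower")
```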

When egress dominates, and when it doesn't

Before anyone migrates a fleet over one table: the 30 TB assumption is doing a lot of work. Slide it around.

Monthly egress	Egress as % of bill	Read
~10 TB	9 to 13%	Compute still rules; convenience tax is probably worth it
~30 TB (base)	22 to 31%	Real money; worth an architecture conversation
~50 TB	32 to 43%	Egress is the dominant variable cost; the decision often flips

If your workload trains in place and rarely ships large payloads, ignore most of this. If it serves, especially anything multimodal where outputs are images or video, egress belongs in your forecast next to GPU-hours. This is the same trap behind a lot of AI compute overspend: teams optimize the number on the pricing page and never instrument the one that actually grows with usage.
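
If you want to slide the assumption yourself, the sweep is a few lines. This sketch reuses the compute figures and egress rates quoted earlier for the three metered H100 rows, holds GCP at the flat $0.12 used above, and ignores any tiers beyond the two quoted; the names are illustrative:

```python
GB_PER_TB = 1024

# provider: (monthly compute $, $/GB for the first 10 TB, $/GB above 10 TB)
PROVIDERS = {
    "AWS P5 (H100)": (8972, 0.09, 0.085),
    "Azure H100": (8972, 0.087, 0.083),
    "GCP H100": (8015, 0.12, 0.12),  # flat Premium rate
}


def egress_cost(tb: float, first_rate: float, rest_rate: float) -> float:
    gb = tb * GB_PER_TB
    first = min(gb, 10 * GB_PER_TB)
    return first * first_rate + (gb - first) * rest_rate


def egress_share(tb: float, compute: float, first_rate: float, rest_rate: float) -> float:
    """Egress as a fraction of compute plus egress."""
    egress = egress_cost(tb, first_rate, rest_rate)
    return egress / (compute + egress)


for tb in (10, 30, 50):
    shares = [egress_share(tb, *params) for params in PROVIDERS.values()]
    print(f"{tb:3d} TB/month: egress is {min(shares):.1%} to {max(shares):.1%} of the bill")
```

The low end of each range comes from the tiered AWS and Azure rates and the high end from GCP's flat rate, which is why the spread widens as volume grows.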

What this means for architecture

The useful takeaway is not "provider X wins." It is that egress is an architectural decision, not a billing surprise. A few honest patterns.

Zero egress, by itself, is not rare. Cloudflare R2 charges nothing for egress, Hetzner bundles 20 TB per server, and Oracle went further than anyone by removing outbound data-transfer charges across all regions in February 2026. Oracle even pairs that with B200 GPUs, which is why it sits in the tables above as a genuine zero-egress major cloud. So any vendor that pitches "zero egress" as a unique selling point is hoping you have not checked.

So the differentiator is never one feature, it is the stack of them. Oracle proves you can get current-generation B200 plus zero egress from a major cloud, but it is a US-headquartered provider subject to the CLOUD Act, and its B200 sticker (~$15/GPU-hr) is the highest in the comparison. Neoclouds like Lambda go the other way, undercutting everyone on raw B200 at roughly $5 to $6 per GPU-hour, but without the residency guarantees or hands-on support some workloads need. What gets genuinely scarce is the full combination: a modern GPU, unmetered egress, low single-digit-millisecond latency to a specific region, data residency outside US CLOUD Act reach, and a price that is not the highest on the board, all at once. When you actually need all of those, the list of options collapses fast, and the per-hour sticker stops being the deciding factor.

Hyperscaler breadth still wins plenty of the time. If you live inside a provider's managed services, need 200 other products next to your GPUs, or burst across dozens of regions on demand, the egress premium is the cost of an ecosystem you are genuinely using. The mistake is not choosing a hyperscaler. The mistake is choosing one without ever pricing the second half of the bill.

Model your own bill

You do not have to trust my 30 TB. You have to find yours. A five-line checklist:

Estimate real monthly egress. Average response size times requests, plus model and data sync and log shipping. This is the number that decides everything, so measure it rather than guessing.
Add the network tax. NAT gateway data processing, inter-AZ, public IPv4. Pull them from the same pricing pages as the GPU rate, not from memory.
Normalize per GPU, and check whether you are forced to rent a whole 8-GPU node for one GPU of work.
Run the sensitivity. Recompute at half and double your egress estimate. If the ranking flips, egress is your real cost driver and belongs at the top of the decision.
Re-pull the rates. GPU prices and FX move monthly. Every figure here is sourced and current to June 2026, but verify against the live pages before you commit.
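
The first and fourth items can be sketched in a few lines. The traffic numbers below (20 million requests a day, 50 KB average response, 2,000 GB of sync and logs) are hypothetical placeholders, and the two options use the compute figures from the tables with a flat per-GB egress rate for simplicity:

```python
def monthly_egress_tb(requests_per_day: float, avg_response_kb: float,
                      other_gb_per_month: float = 0.0, days: int = 30) -> float:
    """Rough monthly egress in TB: response traffic plus sync and log shipping."""
    response_gb = requests_per_day * days * avg_response_kb / (1024 * 1024)
    return (response_gb + other_gb_per_month) / 1024


def all_in(compute: float, egress_rate_per_gb: float, egress_tb: float) -> float:
    return compute + egress_rate_per_gb * egress_tb * 1024


def ranking(options: dict, egress_tb: float) -> list:
    """Provider names ordered from cheapest to most expensive all-in bill."""
    return sorted(options, key=lambda name: all_in(*options[name], egress_tb))


# provider: (monthly compute $, flat egress $/GB)
OPTIONS = {
    "GCP H100": (8015, 0.12),
    "Oracle B200": (10950, 0.0),
}

estimate = monthly_egress_tb(20_000_000, 50, other_gb_per_month=2000)
print(f"Estimated egress: {estimate:.1f} TB/month")
for tb in (estimate / 2, estimate, estimate * 2):
    print(f"{tb:5.1f} TB: cheapest first -> {ranking(OPTIONS, tb)}")
```

If the order changes between the half and double runs, as it does for this pair, egress is the cost driver and deserves the measurement effort.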

The GPU-hour is the number everyone argues about. The bill is decided by the number nobody models. Price both, and the comparison usually looks nothing like the pricing page.

All pricing reflects published on-demand rates as of June 2026, cited inline. This is a modeled comparison, not financial advice. Reproduce it with your own workload before making infrastructure decisions.

Disclosure: the author's agency works with companies in the cloud and AI-infrastructure space, including one referenced here.

Source: https://hackernoon.com/i-priced-the-same-inference-workload-on-4-gpu-clouds-egress-was-the-catch?source=rss