Inference Costs Are Not Sustainable
2026-04-06 08:00:44 · Author: danielmiessler.com

We're about to need multi-model harnesses, much cheaper models, or both

April 5, 2026


Welp, I'm now getting through a quarter of my week's MAX subscription in a few hours of work with Claude Code.

I think Anthropic is smart, and I don't think they're trying to screw us. I think they're honestly just trying to bring inference charges in line with reality.

And that should be a wake-up call for all of us.

I think we're about to need multi-model harnesses (or FAR cheaper good models within a single platform), like 20x cheaper Haiku or whatever.

This is not sustainable.

The better the harnesses and models get, the more people will build. Which will require more and more inference.

I think the real solutions here are going to come from:

  • Technologies like Cerberus that make inference many times cheaper and faster

  • A major push by the major labs to produce higher quality in much smaller/cheaper models

  • Harnesses moving to a hybrid of paid/cloud and local/cheap models

If this continues, I'm going to have to build my own custom version of PAI using Pi that can use local models on my dual 4090s: models like Gemini-Flash, models like Gemma 4, etc.

And most importantly, a new hook infra that rates the task and properly routes to the right model.

  • Max: Opus / GPT-5.4
  • High: Sonnet
  • Medium: Haiku
  • Low: (Local) Whatever the latest best OSS model is that can run on my NVIDIA / Mac Silicon
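The tier table above can be sketched as a tiny router. Everything here is an illustrative assumption, not a real API: the model identifiers, the `Task` shape, and especially the word-count heuristic in `rate_task` (a real harness would rate difficulty with a cheap classifier model, not prompt length).

```python
# Minimal sketch of a hook that rates a task and routes it to a model tier.
# Model names and the scoring heuristic are hypothetical placeholders.

from dataclasses import dataclass

# Tiers mirror the list above: Max -> Opus / GPT-5.4, High -> Sonnet,
# Medium -> Haiku, Low -> whatever local OSS model is currently loaded.
TIERS = {
    "max": "opus",        # hardest tasks: architecture, tricky debugging
    "high": "sonnet",     # substantial coding or writing
    "medium": "haiku",    # routine edits, summaries
    "low": "local-oss",   # boilerplate, classification, cheap drafts
}

@dataclass
class Task:
    prompt: str
    needs_tools: bool = False

def rate_task(task: Task) -> str:
    """Crude stand-in heuristic: long or tool-using tasks get bigger models."""
    words = len(task.prompt.split())
    if task.needs_tools or words > 2000:
        return "max"
    if words > 500:
        return "high"
    if words > 100:
        return "medium"
    return "low"

def route(task: Task) -> str:
    """Return the model name for this task's rated tier."""
    return TIERS[rate_task(task)]
```

The point of the sketch is the shape, not the thresholds: the rating function is the only piece that needs to be smart, and it can itself be the cheapest local model in the table.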

I think we all knew this was coming; I just thought it would come sometime in 2027, and more gently.

It appears to be very close now because this much subsidization doesn't seem sustainable to Anthropic, which means it's probably not sustainable for OpenAI either.

My recommendation: Start planning your Multi / Local / Cheaper model strategy for your harness.


Source: https://danielmiessler.com/blog/inference-costs-are-not-sustainable?utm_source=rss&utm_medium=feed&utm_campaign=website