Understanding Inference and the Economics of Enterprise AI
A Framework for Investors: What Is Inference? Why Does It Matter? How Does It Impact Value Creation?
The Bottom Line for Investors
Three takeaways are worth keeping in mind as you evaluate Agentic AI investment opportunities:
- The revenue opportunity represents a paradigm shift. We believe the agentic expansion of the addressable market from software budgets into labor and services budgets presents a material opportunity for software companies that transform to deliver valuable enterprise agentic solutions.
- Inference is becoming an important P&L line item in the AI era. Inference costs can scale as companies utilize AI to enhance efficiency across their organization, and as they develop enterprise agentic solutions to expand value for their end-customers. The cost is already material.
- Inference can be strategically optimized. The same workload can cost dramatically different amounts depending on how it is architected. The capability to do this work, in our view, is becoming a real source of competitive differentiation. The companies that capture AI-driven operational efficiencies and TAM/revenue expansion from agentic enterprise solutions are poised to improve their P&Ls. Those that combine that upside with disciplined management of inference costs are poised to capture more than their fair share of the economic rent, maximizing both revenue and profitability.
A New Economic Reality for AI and Software Companies
Prior technological transitions have reshaped how value gets created and captured by software. We believe the transition to Agentic AI could be the most consequential value unlock to date.
Unlike traditional software, which automates predefined workflows, Agentic AI systems can reason, make decisions and execute multistep tasks with minimal human direction. As a result, software is evolving from a tool that supports work into a digital worker that executes, and the addressable market may expand accordingly. Third-party research estimates a $3 trillion opportunity for software as AI agents take on tasks previously performed by people.1
However, this revenue opportunity arrives with a new structural cost. AI requires compute at every stage of its lifecycle, from model training and tuning as AI labs develop new models, to the inference required to run AI models once launched. Every action taken by an AI agent requires inference, and that inference has an associated cost. For software companies delivering Agentic AI products at scale, inference is emerging as a significant line item in the P&L.
We believe the software companies that thrive in this era will need to adopt Agentic AI in ways that create and expand value for their end customers, while also managing the cost of running those agents. Inference is not a fixed input.
This paper explains what inference is, why it matters and how it can be managed.
Inference Is the Variable Cost of Running AI
Agentic AI systems are powered by AI models, the foundational technology that enables these systems to reason, generate language and take action. These models are developed by specialized companies, including Anthropic, Google and OpenAI. Just as SaaS companies pay cloud providers like Amazon Web Services for hosting, software companies that build AI products pay these providers for their models and intelligence. Understanding how models are built and run is essential to understanding the cost structure of Agentic AI software.
AI models have two cost components. The first cost is fixed and paid by the company that builds the model. The second is variable and paid by everyone who uses the model.
- Training is the one-time cost of building a model. Training the most advanced models – what the industry calls frontier models – now costs hundreds of millions of dollars, borne by the companies that build them.2
- Inference is the variable cost incurred every time a model is used. Unlike training, it is paid by whoever runs the model, whether that be an individual user or a business.
To understand why inference is becoming such an important line item for software companies, it is helpful to understand the mechanics of how inference is priced.

Source: ICONIQ Capital, State of AI: Bi-Annual Snapshot — The Execution Era of AI (January 2026). Based on survey of 202 enterprise AI leaders across product stages. “Other” includes compliance and miscellaneous costs. Model inference rises from 20% of total AI product cost at pre-launch to 23% at scaling stage in this sample; talent declines from 32% to 26% over the same progression. Findings reflect early-stage data on a rapidly evolving cost structure.
Inference Is Priced and Billed in Tokens
Model providers charge for inference based on usage, measured in units called tokens. A token is a small piece of text, roughly three-quarters of a word. The phrase “Hello, how can I help you?” is about seven tokens. Every time a model is used, it processes input tokens (such as a question and any context) and generates output tokens (the answer). The model provider charges for both.
Today, providers of the most advanced models charge between $1 and $75 per one million tokens, depending on the model and whether the tokens are input or output.3 Older or smaller models cost a fraction of that. As a point of reference, a single ChatGPT-style query currently consumes roughly half a cent of compute. Individually, the cost is small, but as consumption rises – particularly among enterprise customers – it can become consequential.
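To make the pricing arithmetic concrete, the short Python sketch below computes the cost of a single call from its input and output token counts. The prices ($3 per million input tokens, $15 per million output tokens) and the token counts are illustrative assumptions chosen from within the range cited above; they are not any provider's actual rate card.

```python
def call_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one inference call, with prices quoted per one million tokens."""
    input_cost = input_tokens * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    return (input_cost + output_cost) / 1_000_000


# Hypothetical query: 500 input tokens and 250 output tokens.
# Hypothetical prices: $3 (input) and $15 (output) per million tokens.
cost = call_cost(
    input_tokens=500,
    output_tokens=250,
    input_price_per_m=3.0,
    output_price_per_m=15.0,
)
print(f"${cost:.5f} per call")
```

Under these assumed numbers a single call costs about half a cent, and output tokens account for most of it even though there are fewer of them.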
Enterprise Inference Consumption Is Rising Rapidly
As an individual consumer, you may pay an AI model provider (like Anthropic or OpenAI) a monthly subscription fee. Enterprise pricing works differently.
When an enterprise software company embeds an AI model within its platform, the software company pays the model provider for every inference call its customers generate. The cost is variable and grows with every user, every query and every action an agent takes.
Enterprise AI workloads consume far more tokens per task than individual use. An individual request to draft an email might use a few thousand tokens. But an enterprise-grade agent that processes an insurance claim by reasoning through the steps, looking up policy details and calling specialized sub-agents can use five to 30 times more tokens to complete the task.4 Multiply that across thousands of employees running agents, and inference can become a material variable cost in the business.
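The multiplication described above can be sketched in a few lines. Every input here is a hypothetical assumption for illustration: 5,000 employees, 20 agent tasks per employee per working day, 22 working days per month, a 3,000-token baseline task and a blended price of $10 per million tokens. Only the five to 30 times multiplier comes from the text.

```python
def monthly_inference_cost(employees, tasks_per_day, workdays, tokens_per_task, price_per_m):
    """Return (monthly tokens, monthly dollar cost) for a fleet of agent users."""
    tokens = employees * tasks_per_day * workdays * tokens_per_task
    cost = tokens * price_per_m / 1_000_000
    return tokens, cost


BASE_TOKENS_PER_TASK = 3_000  # assumed size of a simple individual request

# 1x is the simple request; 5x and 30x bracket the agentic range cited above.
for multiplier in (1, 5, 30):
    tokens, cost = monthly_inference_cost(
        employees=5_000,
        tasks_per_day=20,
        workdays=22,
        tokens_per_task=BASE_TOKENS_PER_TASK * multiplier,
        price_per_m=10.0,
    )
    print(f"{multiplier:>2}x: {tokens / 1e9:.1f}B tokens, ${cost:,.0f} per month")
```

With these assumptions, the same workforce moves from single-digit billions of tokens per month to well over 100 billion as tasks shift from simple requests to multistep agents.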
According to a Deloitte survey of 550 U.S. enterprise leaders, many organizations already exceed 10 billion tokens per month, and the share expecting to surpass 100 billion is projected to triple by 2028.5 The average enterprise spent roughly $7 million on AI model usage in 2025, nearly triple the $2.5 million spent in 2024,6 and some enterprises are now reporting monthly inference bills in the tens of millions of dollars.7

Source: McKinsey & Company, The future of AI workloads (February 2026), drawing on the McKinsey Data Center Demand Model. Supporting analysis in The next big shifts in AI workloads and hyperscaler strategies (December 2025). Global data center demand projected to grow from 82.3 GW in 2025 to 219 GW by 2030 (22% CAGR). Inference workload demand grows 35% annually from 20.9 GW to 93.3 GW, overtaking non-AI workloads by 2029. Training grows 22% annually to 62.2 GW. Forecast reflects a midrange “continued momentum” scenario; McKinsey notes that the precise growth trajectory of inference workloads remains uncertain given improving hardware efficiency curves.
Managing AI Cost Is Critical to Capturing Value
Software’s Traditional Cost Structure
Software has commanded a premium for the last quarter-century because of an enviable P&L dynamic: build it once, sell it many times and incur near-zero incremental cost per customer.
A traditional SaaS company spends roughly 33 to 48% of revenue on sales and marketing, 23 to 30% on R&D (mostly engineering), 15 to 25% on cost of goods sold (COGS, primarily cloud hosting), and 10 to 15% on general and administrative costs (G&A).8 The result is gross margins of 75 to 80%, with the largest cost being people, and the largest variable cost being the cloud bill paid to hyperscalers.9
Agentic AI Changes Both Sides of the Ledger
We believe the shift to Agentic AI can impact both sides of the P&L: revenue has the potential to expand and operating costs can compress as AI drives operational efficiency. Inference is emerging as the line item that largely determines how much of the revenue upside reaches the profit line.
Operating costs compress. The functions that have historically consumed the most headcount in software companies – R&D, sales and marketing, and customer support – are where AI is already delivering measurable efficiency gains. Across Vista’s portfolio, R&D FTEs (full-time employees) per $10 million of revenue have decreased approximately 9.5% since 2023, and sales and marketing FTEs per $10 million of revenue have fallen by more than 30%.10
Revenue expands. In SaaS, revenue potential was capped by headcount, or how many licenses a corporate customer bought for its employees. Agentic AI removes that constraint. When software performs work previously done by humans, the addressable market of potential revenue extends beyond the software budget and into the labor and services budget. Bain & Company projects $5 to $7 trillion in cumulative value flowing to AI-enabled software by 2030 – a more than 4x expansion from current levels and potentially the largest addressable-market expansion in software’s history.11
A new variable cost emerges: AI COGS. Inference is the largest component of a new cost category: AI cost of goods sold. Unlike traditional cloud hosting, which scales modestly with users, AI COGS scales directly with usage intensity. Every agent action, every reasoning step and every model call adds cost. For agentic products, inference can already exceed hosting as a cost line, with one analysis finding inference costs average 23% of revenue at AI-native B2B companies.12
The Margin Question
This is the central economic tension of the agentic transition: revenue expands and operating costs compress, but a new variable cost emerges. If that cost is left unmanaged, the new revenue from agentic products is captured by model providers and hyperscalers rather than by the software company. Inference optimization is therefore one of the most consequential value creation levers in the transition from SaaS to Agentic AI, and margin advantage may accrue to companies that capture the revenue upside while treating AI COGS as a core cost discipline.
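A simple worked example shows the size of the tension. The sketch below assumes traditional COGS of 20% of revenue (the midpoint of the 15 to 25% range cited earlier) and uses the 23%-of-revenue inference figure cited above. The 80% inference reduction in the third scenario is a hypothetical input, as is holding revenue constant across scenarios.

```python
def gross_margin(hosting_cogs_pct, inference_pct):
    """Gross margin as a fraction of revenue, given COGS components as fractions of revenue."""
    return 1.0 - (hosting_cogs_pct + inference_pct)


HOSTING = 0.20     # assumed traditional COGS share of revenue
INFERENCE = 0.23   # inference share of revenue cited for AI-native B2B companies
REDUCTION = 0.80   # hypothetical reduction in inference cost from optimization

scenarios = {
    "Traditional SaaS": gross_margin(HOSTING, 0.0),
    "Agentic, unmanaged inference": gross_margin(HOSTING, INFERENCE),
    "Agentic, managed inference": gross_margin(HOSTING, INFERENCE * (1 - REDUCTION)),
}

for label, margin in scenarios.items():
    print(f"{label}: {margin:.1%}")
```

Under these assumptions, unmanaged inference takes gross margin from 80% to 57%, while the managed scenario recovers most of the gap at roughly 75%.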

Source: Vista analysis as of February 2026. Provided for illustrative purposes only. Ranges provided such as revenue, COGS, gross profit margins, operating expenses, and free cash flow ranges are illustrative to reflect mature, high-quality software companies in each respective period. The information presented herein is based upon Vista’s analysis and assumptions and reflects Vista’s beliefs. There can be no assurances that any plans, estimates or expectations noted herein will occur as described, if at all. Moreover, there can be no assurances that historical trends will continue. Statements regarding the impact of artificial intelligence represent opinions and are not statements of fact. Any correlations or relationships shown between market movements and AI-related developments are illustrative and do not imply causation. Please see the Important Disclosures for additional important information.
What Disciplined AI Cost Management Looks Like
If inference is the critical new cost line, what can actually be done about it? Our belief, drawn from the agents in production across more than 50 of our portfolio companies, is that there are three key levers for driving value. We observe that the same workload, delivered with the same accuracy, can cost dramatically different amounts depending on how these levers are pulled.
Model selection. Not every task requires the most powerful and most expensive model. Models vary in cost per token by 10x or more.13 Frontier models are appropriate for tasks that genuinely require state-of-the-art reasoning; most enterprise workflows do not. For example, a claims-processing agent that extracts data from a standard form and checks it against policy terms does not need the same model that powers a research assistant parsing thousands of documents. A well-designed enterprise AI system uses a router that classifies each incoming request and directs it to the cheapest model that can handle it effectively. In many cases, the most cost-effective option is an open-source model – a model whose underlying code is publicly available, allowing companies to run it on their own infrastructure rather than paying a model provider per token. With proper tuning and guardrails, open-source models can perform at or near frontier benchmarks for many enterprise tasks while reducing costs by 40 to 90%.14
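The routing idea can be illustrated with a minimal sketch. Everything in it is hypothetical: the model names, prices and capability tiers are invented for illustration, and the keyword-based classifier is a toy stand-in for whatever classifier a production router would actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_m_tokens: float  # hypothetical blended price in dollars
    capability: int            # hypothetical tier: 1 = basic, 3 = frontier


# Hypothetical catalog, ordered from cheapest to most expensive.
MODELS = [
    Model("small-open-source", 0.50, 1),
    Model("mid-tier", 3.00, 2),
    Model("frontier", 30.00, 3),
]


def required_capability(request: str) -> int:
    """Toy classifier: map a request to the lowest capability tier assumed to handle it."""
    text = request.lower()
    if any(keyword in text for keyword in ("research", "analyze", "synthesize")):
        return 3
    if any(keyword in text for keyword in ("compare", "check", "summarize")):
        return 2
    return 1


def route(request: str) -> Model:
    """Pick the cheapest model whose capability tier meets the request's requirement."""
    needed = required_capability(request)
    eligible = [m for m in MODELS if m.capability >= needed]
    return min(eligible, key=lambda m: m.price_per_m_tokens)


for request in (
    "Extract the claimant name from this form",
    "Check the claim against policy terms",
    "Research thousands of documents and synthesize findings",
):
    print(f"{route(request).name}: {request}")
```

The design choice worth noting is that the router selects on price only after filtering on capability, so cost savings never come at the expense of sending a task to a model assumed to be unable to handle it.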
Infrastructure. Models run on chips housed in data centers. Based on our analysis, frontier models running on general-purpose chips in a commercial cloud are among the most expensive options, because the chips require significant amounts of energy, liquid cooling and specialized physical infrastructure. For many enterprise workloads, it’s our belief that simpler infrastructure that is purpose-built for optimized, lower-cost models can often deliver comparable quality outputs while significantly reducing costs. Our research on running Agentic AI workloads has found that disciplined inference management strategies in aggregate can deliver cost reductions of 80% or more, with accuracy within one to two percent of the most expensive alternative.15
Agent design. Every action an agent takes sends a package of instructions to the model, including context, history, tools and a task description. Poorly designed agents send bloated, redundant packages, while well-designed agents strip instructions to their essentials, reuse shared context and avoid unnecessary steps. Vista’s internal research found that the way an agent reuses context can reduce inference costs by 47 to 99% depending on the workflow, with the optimization of agent prompts reducing inference costs by an additional 15 to 40%.16
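To show why context reuse matters, the sketch below compares the input-token cost of a multistep agent run with and without reuse of its shared context. It is not Vista's methodology. The 20-step run, the token counts, the $3 per million token price and the assumption that reused context is billed at 10% of the normal rate are all hypothetical inputs; actual discounts for cached or reused context vary by provider.

```python
def agent_input_cost(steps, shared_tokens, step_tokens, price_per_m, reuse_rate=None):
    """Input-token cost in dollars of one multistep agent run.

    reuse_rate is the fraction of the normal price charged for shared context
    that is reused after the first step; None means the full context is
    re-sent at full price on every step.
    """
    total = 0.0
    for step in range(steps):
        reused = reuse_rate is not None and step > 0
        shared_price = price_per_m * (reuse_rate if reused else 1.0)
        total += (shared_tokens * shared_price + step_tokens * price_per_m) / 1_000_000
    return total


# Hypothetical workflow: 20 steps, 10,000 tokens of shared context,
# 500 step-specific tokens, $3 per million input tokens.
baseline = agent_input_cost(20, 10_000, 500, 3.0)
with_reuse = agent_input_cost(20, 10_000, 500, 3.0, reuse_rate=0.10)
savings = 1 - with_reuse / baseline

print(f"No reuse:   ${baseline:.3f} per run")
print(f"With reuse: ${with_reuse:.3f} per run")
print(f"Savings:    {savings:.0%}")
```

Under these assumptions the run costs about $0.63 without reuse and about $0.12 with it, a saving of roughly 81%. The saving grows with the number of steps and with the share of each request made up of shared context.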
Vista’s Agentic Factory has built capabilities across each of these levers in partnership with our portfolio companies.