
Do you need a pricey LLM for everything? 3 scenarios when open source models make more sense

Enterprise AI costs are soaring – but not all use cases require the most expensive model. Here’s how open source models can help you cut costs without cutting corners.


You might expect the biggest worries about generative AI to be job displacement or accuracy. Those are valid – but they’re not what’s keeping AI decision-makers up at night.

According to the 2025 State of Generative AI Benchmark Report, the top-growing concern is deployment cost, which has skyrocketed a whopping 18X since 2023. Just two years ago, only 3% of surveyed enterprise leaders called AI cost a “major” or “extreme” concern. Today, it’s 55%.


That’s a wake-up call. Many businesses started experimenting with free or trial-tier tools like ChatGPT in 2023, then quickly realized that per-query pricing adds up when you’re operating at scale. A few thousand queries a day? Manageable. Hundreds of thousands? That’s a different story.
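The scale effect is easy to quantify with a one-line linear cost model. The per-query price below is hypothetical, chosen only to make the arithmetic concrete, not any vendor’s actual rate:

```python
def monthly_cost(cost_per_query: float, queries_per_day: int, days: int = 30) -> float:
    """Naive linear cost model: every query pays the same per-query rate."""
    return cost_per_query * queries_per_day * days

# A few thousand queries a day is manageable...
print(monthly_cost(0.02, 5_000))    # roughly $3,000 a month
# ...hundreds of thousands is a different story.
print(monthly_cost(0.02, 500_000))  # roughly $300,000 a month
```

The model is deliberately simplistic (real token-based billing varies with prompt and response length), but it shows why per-query pricing that feels free in a pilot dominates the budget in production.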

This trend aligns with data from payments startup Ramp, which found a slight dip in companies spending on AI products like ChatGPT between May and June 2024 – suggesting businesses are actively seeking more cost-effective solutions.

As Lucidworks CEO Mike Sinoway put it in a recent roundtable discussion: “Three years ago, we all kind of thought these things would be free. Now, cost is right behind data security as a top concern.”

AI isn’t optional – but spending smarter is a top priority

But here’s the thing: the conversation shouldn’t be about using less AI – it should be about using AI more strategically. Despite the sticker shock, pulling back isn’t really an option. The question has shifted from “should we use AI?” to “how can we afford to use it sustainably, at scale?”

The answer lies in smarter architecture: using the right models for the right tasks, orchestrating different types of models, and avoiding one-size-fits-all approaches.

3 scenarios where open source models make more sense

1. Not every query deserves a $0.02 token bill

It’s tempting to run everything through your “best” model, usually a commercial LLM like GPT-4, Claude, or Gemini. But that’s like using a luxury SUV to deliver envelopes.

Many everyday use cases (summarization, classification, document tagging) don’t require the most powerful (or expensive!) model. An open source model like LLaMA or Mistral might be just as accurate for that task, at a fraction of the cost.

Consider a typical enterprise search application with RAG. Of the common steps below, only one or two clearly benefit from an expensive generative model:

  • Document chunking: May not need AI at all
  • Embedding generation: Efficient embedding models work well
  • Document summarization: Good fit for generative models, but doesn’t need the most expensive option
  • Concept tagging: Specialized classification models are more efficient than general-purpose LLMs
  • Query processing: Can use the same embedding model as documents
  • Answer generation: This is where you might want your premium commercial model
  • Snippet extraction: Often doesn’t require AI
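One way to make that split concrete is a simple lookup that assigns each pipeline step to a model tier. The step names mirror the list above; the tier labels are illustrative placeholders, not specific product recommendations:

```python
# Illustrative mapping of RAG pipeline steps to model tiers.
# Tier names are placeholders -- swap in whatever you actually run.
PIPELINE_TIERS = {
    "document_chunking":      "no_model",                 # rule-based splitting often suffices
    "embedding_generation":   "open_source_embedder",     # efficient embedding model
    "document_summarization": "open_source_llm",          # generative, but not premium
    "concept_tagging":        "open_source_classifier",   # specialized beats general-purpose
    "query_processing":       "open_source_embedder",     # reuse the document embedder
    "answer_generation":      "commercial_llm",           # the one step worth premium spend
    "snippet_extraction":     "no_model",                 # string matching often suffices
}

def model_for(step: str) -> str:
    """Return the tier assigned to a pipeline step (defaults to open source)."""
    return PIPELINE_TIERS.get(step, "open_source_llm")
```

Written out this way, the cost asymmetry is obvious: only one of seven steps routes to the premium model.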

2. You’re probably using too few models

According to our benchmark data, most enterprises are still using only one or two models. And often, they’re the commercial ones. That means every query goes through the same expensive LLM, regardless of its complexity.


Smart AI orchestration involves using multiple model types, each optimized for a specific step in the pipeline, from indexing to query classification to generation. By distributing the workload across fit-for-purpose models, you can dramatically reduce infrastructure costs without sacrificing quality.

3. Open source models reduce vendor lock-in – and you have more commercial options than ever

Open source doesn’t just save money, it can also add flexibility. You’re not tied to one provider’s roadmap, pricing model, or service level agreement. You control when to upgrade, fine-tune, or scale.

But here’s what’s changed since 2023: while the premium models (GPT-5, Claude, Gemini) have gotten more expensive, there’s been a proliferation of cheaper commercial alternatives from those same providers – think GPT-4o Mini and Google’s Gemini Flash. And as competition heats up, more providers are offering hosted versions of open source models at competitive rates, while newer entrants launch their own models at aggressive price points.

This is the beauty of vendor competition and supply-demand economics in action. More options mean providers have to compete on price, not just performance. You’re no longer stuck choosing between “expensive but good” and “free but complicated.” There’s now a whole spectrum of commercial models at different price points, giving you leverage in negotiations and alternatives if your primary provider raises prices.

That’s especially valuable for:

  • Internal search and analytics tools
  • Overnight batch jobs like indexing
  • Use cases with tight security, compliance, or data residency requirements

“If someone’s shopping for a car or a refrigerator or diamond jewelry, yeah, spending a lot on a large language model from a commercial source is worth it,” Sinoway explained. “If somebody’s shopping for pencils or paper clips, you don’t want to spend that much money on a query.”

Open source vs. commercial models: Understanding the trade-offs

Open Source Models | Commercial Models
Examples: Meta’s LLaMA, Mistral, Falcon | Examples: OpenAI (GPT), Anthropic (Claude), Google (Gemini)
Cost: Free to use; run on your own infrastructure (some commercial restrictions) | Cost: Pay-per-token or subscription
Deployment: Self-hosted (cloud or on-premises) | Deployment: Hosted by vendor
Scalability: Custom; no per-token fees | Scalability: Vendor-managed, priced per use
Security: Full control, but harder to manage | Security: Easier, but depends on vendor practices
Performance: Often excellent with tuning | Performance: Pre-optimized, easier to use out of the box

Questions to ask before choosing a model

Before defaulting to a commercial LLM, consider these key questions:

  • What type of prediction do you need? (Classification, summarization, generation?)
  • Does the model deliver satisfactory performance for your use case?
  • Does the model need to explain its answers?
  • What’s your cost per prediction, and is it sustainable at your volume?
  • Does this use case require real-time performance, or can it run in batch?
  • Are there latency requirements?

The goal is to find the model with satisfactory performance at the lowest cost per prediction while meeting your specific requirements. This won’t always be a generative LLM, and it certainly won’t always be the most expensive one.
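That selection rule – satisfactory performance at the lowest cost per prediction – can be written down directly. The candidate models and their quality and cost numbers below are made up for illustration, not real benchmark results:

```python
def cheapest_satisfactory(models, min_quality):
    """Pick the lowest-cost model whose measured quality clears the bar.

    `models` is a list of (name, quality_score, cost_per_prediction) tuples.
    """
    acceptable = [m for m in models if m[1] >= min_quality]
    if not acceptable:
        raise ValueError("no model meets the quality bar")
    return min(acceptable, key=lambda m: m[2])[0]

# Hypothetical candidates: (name, quality on your eval set, $ per prediction).
candidates = [
    ("premium_commercial", 0.95, 0.0200),
    ("open_source_7b",     0.88, 0.0004),
    ("small_classifier",   0.80, 0.0001),
]

# For a tagging task where 0.85 quality is enough, the open source model wins.
print(cheapest_satisfactory(candidates, min_quality=0.85))  # open_source_7b
```

The point of the exercise is that the answer changes with the quality bar: raise it to 0.90 and the premium model is worth paying for; lower it to 0.75 and an even cheaper classifier suffices.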

What is AI orchestration – and why it matters

This is where AI orchestration becomes crucial. AI orchestration refers to the practice of managing multiple AI models and services efficiently within a workflow. Think of it like routing traffic on a busy highway – some vehicles (queries) need the express lane (commercial LLMs), while others are fine on local roads (open source models).

Rather than treating every query the same way, intelligent orchestration routes different tasks to the most appropriate and cost-effective model.

Good orchestration addresses several key challenges:

  • Which model should handle which query?
  • How can we scale this cost-effectively?
  • How do we track usage across models?
  • How do we host and route between models with minimal complexity?

For example, you might use an open source model for overnight document processing, a small language model running on your own servers for quick classifications, and reserve expensive commercial models only for complex natural language tasks that require real-time responses.
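A minimal router along those lines might dispatch on task type and latency needs. The tier names are placeholders and the rules are a sketch of the idea, not a production routing policy:

```python
def route(task_type: str, realtime: bool) -> str:
    """Toy orchestration rule: reserve the premium model for complex,
    real-time language tasks; everything else stays on cheaper tiers."""
    if task_type == "classification":
        return "self_hosted_small_model"   # quick classifications, own servers
    if not realtime:
        return "open_source_llm"           # e.g. overnight document processing
    return "commercial_llm"                # complex real-time conversation

print(route("classification", realtime=True))        # self_hosted_small_model
print(route("document_processing", realtime=False))  # open_source_llm
print(route("conversation", realtime=True))          # commercial_llm
```

Real orchestration layers add model health checks, fallbacks, and usage tracking on top, but the core decision is this small.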

Why AI costs are spiraling

Most companies are using expensive commercial models for everything. Our research shows that 80% of deployments rely on pay-per-token services like ChatGPT or Gemini, while only 20% incorporate open source alternatives like Meta’s LLaMA.

“Three years ago, we all kind of thought these things would be free,” said Sinoway. “ChatGPT was giving you a free account, as was Google and others, or then it was $19 a month to sign up. Now we found out these things aren’t free.”

The infrastructure demands are staggering. Large language models require significant GPU compute power, especially for real-time applications. A single high-end GPU can cost $10,000-40,000, and most applications need multiple GPUs running continuously. When you’re processing thousands of requests daily, cloud inference costs compound quickly.

But the bigger issue is architectural. Most AI implementations use multiple models: one for indexing, another for retrieval-augmented generation (RAG), another for natural conversation, and yet another for query triaging. In a search use case, for example, you wouldn’t use an LLM at all for retrieval or indexing – you’d add one only when you want a conversational experience.

Simplifying the AI stack reduces risk

There’s another often-overlooked benefit to strategic model selection: reduced operational complexity. Each additional AI services provider introduces new concerns around data residency, privacy, security, and incident response.

Each new AI vendor you integrate is another trust point: another provider whose policies around data residency, privacy, and compliance you must vet and manage. Fewer providers means fewer trust relationships to manage, simpler compliance requirements, and reduced risk exposure. This operational simplification can be just as valuable as direct cost savings.

The bottom line: Save money by matching the right model to the task

The AI cost crisis isn’t a reason to slow down adoption. It’s a signal to get smarter about implementation. You don’t need a premium LLM for every task. The key is to think beyond “just use ChatGPT” and build a system that:

  • Uses commercial models only when they’re truly needed
  • Leverages open source for tasks like embedding, tagging, and indexing
  • Orchestrates intelligently to reduce both cost and complexity

Companies that learn to orchestrate multiple models strategically, mixing open source and commercial options based on specific requirements, will gain a significant competitive advantage.

The future belongs to organizations that can do more with AI, not less. But doing more means choosing the right tool for each job, not defaulting to the most expensive option available.

Ready to explore how strategic model selection can reduce your AI costs while improving performance? Lucidworks AI helps enterprises take this more nuanced approach, with hosted AI models, orchestration tools, and advisory services tailored to your needs.
