GenAI on a budget: How CIOs can unlock value without breaking the bank

The first step in most organizations’ Generative AI journeys is a subscription to a tool like ChatGPT or Microsoft Copilot. Depending on the size of your organization, this typically costs a few hundred dollars a month, sometimes more.

As you mature, you want a custom implementation built on data you own or have acquired and tailored to your specific needs. You start slow, experimenting with existing infrastructure and small models. Once you’re comfortable, you aspire to go big, but costs skyrocket with little return to show for it!

Well, it doesn’t have to be that way. In this blog post, we break down the total cost of ownership (TCO) of GenAI and how you can minimize it strategically.

What does GenAI cost?

The total cost of ownership of a Generative AI solution is typically the sum of its expenses for inference, fine-tuning, infrastructure, and data security. Each of these depends on several variables. Here’s a quick look.

Inference cost

Inference cost is essentially what you pay for an LLM to take an input and generate a corresponding output.

With a hosted model, it’s the cost of each API call, usually billed per token. For reference, GPT-4 launched at $0.03 per 1,000 input tokens and $0.06 per 1,000 output tokens. Generating an image with DALL-E 2 costs around $0.018 (at 512×512 resolution).
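Because input and output tokens are priced separately, the cost of a single call is a weighted sum of the two. The sketch below is a minimal illustration; the function name and the example rates ($0.01 and $0.03 per 1,000 tokens) are hypothetical placeholders, so substitute your provider’s current list prices.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Return the USD cost of one API call, given per-1,000-token list prices."""
    return ((input_tokens / 1000) * input_price_per_1k
            + (output_tokens / 1000) * output_price_per_1k)


# Hypothetical rates for illustration only; check your provider's pricing page.
INPUT_PRICE_PER_1K = 0.01
OUTPUT_PRICE_PER_1K = 0.03

# A typical chat turn: a 1,500-token prompt (instructions plus context) and a 500-token answer.
cost = request_cost(1500, 500, INPUT_PRICE_PER_1K, OUTPUT_PRICE_PER_1K)
print(f"Cost per request: ${cost:.4f}")
```

Long prompts stuffed with context can cost as much as the answer itself, which is why trimming prompts is often the cheapest optimization available.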

Cost per inference is typically the headline metric for a GenAI implementation’s cost-effectiveness, because all of the costs below ultimately roll up into what each call costs you.
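To make that roll-up concrete, here is a minimal sketch that spreads every monthly cost bucket across the requests actually served. The `MonthlyCosts` class and all dollar figures are hypothetical; the buckets simply mirror the categories in this post.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCosts:
    """Hypothetical monthly cost buckets in USD, mirroring this post's categories."""
    inference_api: float   # usage-based model or API fees
    fine_tuning: float     # training runs, amortized over the months they serve
    infrastructure: float  # compute, maintenance, storage, integration
    data_security: float   # tooling, governance, VAPT
    people: float          # engineers and specialists allocated to the project

    def total(self) -> float:
        return (self.inference_api + self.fine_tuning + self.infrastructure
                + self.data_security + self.people)


def cost_per_inference(costs: MonthlyCosts, requests_per_month: int) -> float:
    """Fully loaded cost of one request: every monthly cost divided by traffic."""
    if requests_per_month <= 0:
        raise ValueError("requests_per_month must be positive")
    return costs.total() / requests_per_month


example = MonthlyCosts(inference_api=3_000, fine_tuning=1_500, infrastructure=4_000,
                       data_security=1_000, people=10_500)
print(f"${cost_per_inference(example, 200_000):.3f} per request")
```

The denominator matters as much as the numerator: the same fixed costs spread over a tenth of the traffic make every request ten times more expensive.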

Fine-tuning costs

Fine-tuning cost is the expense of adapting an existing foundation model to your needs by training it further on custom data.

These costs vary with the foundation model, the data used, the number of training epochs (full passes over the training data), and the use case you’re training for. Beyond technology costs, you might also need to hire data engineers, machine learning engineers, and AI specialists to fine-tune your models.
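A rough way to budget a fine-tuning run is GPU time multiplied by the hourly rate, plus the people cost around it. This is a simplified sketch under stated assumptions: `fine_tuning_cost` is a hypothetical helper, `hours_per_epoch` is wall-clock time for one pass on the whole GPU cluster, and every number in the example is illustrative.

```python
def fine_tuning_cost(hours_per_epoch: float, epochs: int, num_gpus: int,
                     gpu_hourly_rate: float, staff_cost: float = 0.0) -> float:
    """Estimate one fine-tuning run in USD.

    GPU-hours scale linearly with epochs and with the number of GPUs rented;
    staff_cost covers data preparation, evaluation, and engineering time.
    """
    gpu_hours = hours_per_epoch * epochs * num_gpus
    return gpu_hours * gpu_hourly_rate + staff_cost


# Hypothetical: 3 epochs of 2 hours each on 8 GPUs at $2.50 per GPU-hour,
# plus $6,000 of engineering time.
print(f"${fine_tuning_cost(2.0, 3, 8, 2.50, staff_cost=6_000):,.2f}")
```

In this illustration the people cost dwarfs the GPU bill, which is common for smaller runs; for large models the balance flips, and every extra epoch adds a proportional slice of compute.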

Infrastructure

Infrastructure costs cover the hardware and services you need to deploy your GenAI solution in a private cloud or on-prem.

Compute: On-prem implementations incur hardware costs for the GPUs and CPUs that run the models. In the cloud, you pay usage-based fees for the compute time you consume.

Maintenance: On-prem hardware also needs electricity and upkeep, including additional cooling systems.
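Power and cooling can be estimated together using PUE (power usage effectiveness), the ratio of total facility power to IT equipment power. The sketch below assumes a 30-day month; the 10 kW server load, the $0.12/kWh tariff, and the PUE of 1.5 are hypothetical inputs, not measurements.

```python
HOURS_PER_MONTH = 24 * 30  # simplifying assumption: a 30-day month


def monthly_power_cost(it_load_kw: float, price_per_kwh: float, pue: float = 1.5) -> float:
    """Monthly electricity bill for hardware plus facility overhead, in USD.

    PUE = total facility power / IT equipment power, so a PUE of 1.5 means
    cooling and other overhead add 50% on top of what the servers draw.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue * HOURS_PER_MONTH * price_per_kwh


# Hypothetical GPU server drawing 10 kW around the clock.
print(f"${monthly_power_cost(10, 0.12, pue=1.5):,.0f} per month")
```

Lowering PUE (better cooling) and raising utilization (fewer idle servers) are the two levers this formula exposes.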

Data storage: Complex use cases need massive amounts of data, and large models are big in their own right: a robust content-generation model may have 100+ billion parameters, which take roughly 200 GB to store at 16-bit precision. Storing and managing all of this can be a significant expense, to say nothing of your redundancy needs. For example, OpenAI has charged about $0.20 per GB of stored data per day.
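The weight-storage arithmetic is simple enough to sketch. This example assumes 2 bytes per parameter (16-bit weights), decimal gigabytes, and a flat per-GB-per-day rate; both helper functions are hypothetical, and the $0.20 rate is just the figure quoted above, not a general storage price. It counts weights only; training data, checkpoints, and embeddings add more.

```python
def weights_size_gb(num_parameters: float, bytes_per_parameter: float = 2.0) -> float:
    """Storage for model weights in decimal GB; 2 bytes/parameter = fp16/bf16."""
    return num_parameters * bytes_per_parameter / 1e9


def storage_cost(size_gb: float, price_per_gb_day: float,
                 days: int = 30, copies: int = 1) -> float:
    """Storage bill over `days`, counting redundant copies (backups, replicas)."""
    return size_gb * copies * price_per_gb_day * days


size = weights_size_gb(100e9)  # a 100-billion-parameter model at 16-bit precision
print(f"{size:.0f} GB of weights")
print(f"${storage_cost(size, 0.20, days=30, copies=2):,.2f} per month at $0.20/GB/day, 2 copies")
```

Redundancy multiplies the bill directly, so it pays to decide which artifacts truly need multiple copies.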

Deployment and integration: Depending on how complex the integration is, you might incur costs to deploy your models on your infrastructure. If you want to integrate legacy systems with cloud solutions, you might need middleware or custom development, which adds further cost.

If you use the model provider’s own infrastructure and exchange tokens through its APIs, most of these infrastructure costs don’t apply to you directly; they are built into the per-token price.

Data security

A good data security strategy for your GenAI implementation includes separate, hardened production and development environments, data loss prevention practices, transparent data governance policies, regular vulnerability assessment and penetration testing (VAPT), and more.

On top of all this, every step of your GenAI journey needs software developers, data scientists, AI specialists, and others to keep the vehicle moving.

How can you reduce GenAI TCO?

By making the right decisions around each of the above factors, you can reduce the total cost of ownership of your GenAI initiatives. Here’s how.

Closed-source vs. open-source LLMs: Choosing open-source LLMs gives you more control over the cost and management of your GenAI initiative. Because you pay little or nothing to use the model itself, you can focus on optimizing infrastructure, fine-tuning, and data security costs.
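With a self-hosted open-source model, the "price per token" becomes whatever your hardware costs divided by the tokens it actually generates. The sketch below is a hypothetical helper under stated assumptions: `tokens_per_second` is aggregate throughput across concurrent requests, `utilization` is the share of paid hours the server is busy, and the example figures are illustrative.

```python
def self_hosted_price_per_million_tokens(gpu_hourly_rate: float, num_gpus: int,
                                         tokens_per_second: float,
                                         utilization: float = 0.5) -> float:
    """Effective USD per million generated tokens for a model you host yourself."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    cost_per_hour = gpu_hourly_rate * num_gpus
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical: 2 GPUs at $2.50/hour each, 1,000 tokens/s aggregate, busy 50% of the time.
price = self_hosted_price_per_million_tokens(2.50, 2, 1_000, utilization=0.5)
print(f"${price:.2f} per million tokens")
```

Comparing this number with a hosted model’s per-million-token price is a quick first test of whether self-hosting pays off. Note that idle GPUs still bill, so utilization often decides the outcome.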

Model choice: Whether or not you go open source, you might not need the latest and greatest model for your use case. If you’re starting with simple needs like sentiment analysis, a smaller model like XLNet or DistilBERT might do. The model you choose has a compounding impact on the costs you incur.

Cloud vs. on-prem deployment: The cloud offers pay-as-you-go pricing and scalability, while on-prem infrastructure provides more control. You can choose the GPUs, CPUs, and AI accelerators that meet your needs. We’ve seen that, even for enterprise deployments, on-prem can cost much less in the long run, though the cloud looks cheaper up front.
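The "cheaper up front vs. cheaper in the long run" trade-off can be framed as a break-even point: the month in which cumulative on-prem spend drops to or below cumulative cloud spend. This is a simplified sketch; `breakeven_month` is a hypothetical helper, the figures are illustrative, and it ignores depreciation, hardware refresh cycles, and reserved-instance discounts.

```python
import math
from typing import Optional


def breakeven_month(onprem_upfront: float, onprem_monthly: float,
                    cloud_monthly: float) -> Optional[int]:
    """First whole month in which cumulative on-prem spend <= cumulative cloud spend.

    On-prem: upfront hardware purchase plus monthly power, maintenance, and staff.
    Cloud: usage-based fees only. Returns None if on-prem never catches up.
    """
    monthly_savings = cloud_monthly - onprem_monthly
    if monthly_savings <= 0:
        return None
    return math.ceil(onprem_upfront / monthly_savings)


# Hypothetical: a $250,000 GPU server costing $4,000/month to run,
# versus $15,000/month for equivalent cloud capacity.
print(breakeven_month(250_000, 4_000, 15_000))  # → 23
```

If your planning horizon is shorter than the break-even month, or your usage is too uncertain to keep the hardware busy, the cloud is the safer bet.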

Strategic fine-tuning: The more prompts a user needs to reach the result they want (i.e., the more API calls), the more it costs the organization. A well-tuned model can reduce the number of calls and, with it, the bill. Approaches such as transfer learning (starting from a pretrained model instead of training from scratch) and distributed training (spreading the work across machines to finish runs sooner) can also help contain the cost of the training process itself.
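To see how fewer calls per task translate into savings, here is a minimal sketch. The task volume, the calls-per-task figures, and the $0.03 per call are hypothetical assumptions, and it assumes the tuned model costs the same per call, which may not hold if you switch hosting.

```python
def monthly_inference_cost(tasks_per_month: int, calls_per_task: float,
                           cost_per_call: float) -> float:
    """Monthly API spend: tasks x average calls needed per task x cost per call."""
    return tasks_per_month * calls_per_task * cost_per_call


# Hypothetical: a generic model needs ~4 prompts per usable answer, a tuned one ~1.5.
baseline = monthly_inference_cost(50_000, 4, 0.03)
tuned = monthly_inference_cost(50_000, 1.5, 0.03)
print(f"Baseline ${baseline:,.0f}, tuned ${tuned:,.0f}, "
      f"savings ${baseline - tuned:,.0f}/month")
```

Dividing your one-off fine-tuning spend by the monthly savings gives a rough payback period for the tuning effort.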

Outsourcing model customization to a vendor, instead of hiring in-house resources, can also lower expenses.

Optimizing operations: Streamlining MLOps is another way to cut costs, keeping models performant without breaking the bank on retraining.

Let’s see how all of this adds up in practice.

Calculating the TCO of GenAI

Recently, we made TCO projections for one of our clients’ decision-making and business development use cases. Here’s the anonymized calculation.

As you can see, measuring the TCO of GenAI is not a simple back-of-the-envelope calculation. Every one of the above factors, however small it may appear, can move your bottom line.

In the above example, the closed-source model, Claude 3 Haiku, costs $1.25 per million output tokens. A larger, more capable model like Claude 3 Opus costs $75 for the same, i.e., 60x more! Where your workload falls within that range helps answer the ‘private cloud vs. hosted solution’ question.
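The 60x gap is easy to check, and to scale to a real workload. This sketch uses only the two output-token prices quoted above; the 200 million output tokens per month is a hypothetical workload, and input tokens (priced separately) are left out for simplicity.

```python
# Output-token list prices quoted above, in USD per million tokens.
PRICE_PER_MILLION_OUTPUT = {"Claude 3 Haiku": 1.25, "Claude 3 Opus": 75.00}


def output_token_cost(model: str, output_tokens: int) -> float:
    """Cost of generating `output_tokens` with the given model, output side only."""
    return output_tokens / 1_000_000 * PRICE_PER_MILLION_OUTPUT[model]


monthly_output_tokens = 200_000_000  # hypothetical: 200M output tokens per month
for model in PRICE_PER_MILLION_OUTPUT:
    print(f"{model}: ${output_token_cost(model, monthly_output_tokens):,.2f}/month")

ratio = PRICE_PER_MILLION_OUTPUT["Claude 3 Opus"] / PRICE_PER_MILLION_OUTPUT["Claude 3 Haiku"]
print(f"Opus/Haiku price ratio: {ratio:.0f}x")  # → 60x
```

At this volume the choice of model alone swings the monthly bill from hundreds to thousands of dollars, before any infrastructure decision is made.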

Projecting your total cost of ownership requires a thorough study of your business, technology landscape, and AI strategy.

Don’t sweat. Speak to Tune AI’s experts today.
