TLDR: Financial services firms, early adopters of AI, are discovering that AI inference, particularly with generative AI models, presents complexities and costs comparable to, or even exceeding, those of AI model training. This is due to the diverse requirements for model deployment, ranging from low-latency edge devices to high-compute data centers, necessitating advanced storage solutions and careful consideration of data security.
A decade ago, when traditional machine learning was being commercialized, training was the significant hurdle, while inference (the process of running new data through a model) was relatively straightforward. In 2025, however, financial services companies, including commercial and investment banks, trading firms, and insurance providers, are encountering a surprising reality: generative AI (GenAI) inference is proving to be as challenging and varied as training, if not more so.
The complexity stems from the diverse operational needs of these firms. AI models must be “fit for purpose” and adaptable to various environments. Some must fit into compact devices such as smartphones, laptops, or edge devices in bank branches to ensure low-latency processing. Others are highly complex and demand substantial compute, memory, and storage resources, so they must run in data centers, and the applications that call them must then compensate for the added latency.
Consequently, financial services institutions (FSIs) are compelled to manage a broad spectrum of inference operations across a diverse array of computing engines and their supporting storage. Storage, once an afterthought in high-performance computing, is now critical. Advanced storage systems are essential for maintaining context, thereby minimizing redundant computations and making inference more cost-effective.
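To make the point about maintaining context concrete, here is a minimal Python sketch of the idea. It assumes the common pattern in which requests share a prompt prefix and the state computed for that prefix is kept in a storage tier for reuse; the article does not say how any particular firm does this. The `ContextCache` class, its `process` method, and the token lists are hypothetical, and "computing" a token is reduced to incrementing a counter.

```python
from typing import List, Set, Tuple


class ContextCache:
    """Toy stand-in for a storage tier holding state for already-seen prefixes.

    Hypothetical illustration only: a real system would store model state
    (for example attention key/value tensors), not just the prefix itself.
    """

    def __init__(self) -> None:
        self._seen_prefixes: Set[Tuple[str, ...]] = set()
        self.tokens_computed = 0  # running total of tokens actually processed

    def process(self, tokens: List[str]) -> int:
        """Process a request and return how many tokens had to be computed."""
        # Find the longest prefix of this request that is already cached.
        reused = 0
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._seen_prefixes:
                reused = n
                break

        # Only the tokens after the cached prefix need fresh computation.
        newly_computed = len(tokens) - reused
        self.tokens_computed += newly_computed

        # Record every new prefix so later requests can reuse it.
        for n in range(reused + 1, len(tokens) + 1):
            self._seen_prefixes.add(tuple(tokens[:n]))
        return newly_computed


if __name__ == "__main__":
    cache = ContextCache()
    shared = ["You", "are", "a", "bank", "assistant"]  # shared instructions
    first = cache.process(shared + ["balance", "?"])
    second = cache.process(shared + ["fees", "?"])
    print(first, second, cache.tokens_computed)
```

In this toy run the second request recomputes only the two tokens that differ, because the five shared tokens are already cached. Without the cache, both requests would pay for all seven tokens.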
While FSIs are at the forefront of commercializing these technologies, they remain highly secretive about their AI implementations, particularly regarding reliability and affordability. This secrecy, though perhaps frustrating for those seeking to learn from their experiences, is understandable given their role in handling sensitive financial data and assets.
The current state of AI inference for GenAI models is far from simple. It often requires rack-scale systems that appear programmatically as a single, giant GPU to AI applications. Even so, many models will still run on machines equipped with two, four, or eight GPUs, especially in financial services data centers located near major metropolitan areas, where power density is limited and liquid cooling is not readily available.


