TLDR: Pliops is showcasing its FusIOnX technology at FMS 2025, demonstrating how its innovative memory solution addresses the bottlenecks in generative AI inference. The technology promises significant performance gains, including up to 5x pre-fill speedup and 7x prompt token throughput, while reducing costs and environmental impact by optimizing memory usage for large language models.
SANTA CLARA, Calif. – August 5, 2025 – Pliops, a frontrunner in high-performance AI infrastructure, is making waves at FMS: The Future of Memory and Storage 2025, held this week at the Santa Clara Convention Center. The company is highlighting its groundbreaking generative AI (GenAI) solutions, specifically demonstrating how its FusIOnX technology is poised to unlock scalable inference across diverse agents, models, and workflows.
The burgeoning field of GenAI, while transformative, faces a critical bottleneck: memory. Current infrastructure struggles to keep pace with the scale, context, and complexity demanded by modern AI applications, particularly large language model (LLM) inference. This challenge often translates into substantial computational and memory resource requirements, leading to increased operational costs, higher power consumption, and a larger carbon footprint for data centers.
Pliops’ FusIOnX is presented as a breakthrough AI infrastructure stack designed to redefine inference. It achieves this by enabling real-time memory reuse, scalable performance, and seamless deployment. A core aspect of its functionality involves offloading Key-Value (KV) caches to Pliops’ proprietary LightningAI KV-Store. This innovative approach allows AI agents to share memory efficiently, maintain context across complex operations, and scale seamlessly without requiring extensive retooling of existing systems.
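The mechanism described above, offloading prompt KV caches to an external store so that any agent encountering the same prefix can reuse them instead of recomputing, can be sketched in miniature. The class and function names below are hypothetical illustrations; Pliops' actual LightningAI KV-Store interface is proprietary and not described in this announcement.

```python
import hashlib

class ExternalKVStore:
    """Hypothetical stand-in for an off-GPU key-value store.
    A real deployment would back this with dedicated hardware,
    not an in-process dict."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value

def prefix_key(prompt_tokens):
    # Key the cache by a hash of the prompt prefix, so any agent or
    # request that shares the prefix maps to the same cached entry.
    joined = ",".join(map(str, prompt_tokens))
    return hashlib.sha256(joined.encode()).hexdigest()

def prefill(prompt_tokens, store):
    """Return a KV cache for the prompt, recomputing only on a miss."""
    key = prefix_key(prompt_tokens)
    cached = store.get(key)
    if cached is not None:
        return cached, True                      # hit: reuse, skip pre-fill
    kv_cache = [(t, t * 2) for t in prompt_tokens]  # placeholder for attention K/V tensors
    store.put(key, kv_cache)
    return kv_cache, False                       # miss: computed and stored

store = ExternalKVStore()
_, hit_first = prefill([101, 202, 303], store)   # first request pays the pre-fill cost
_, hit_second = prefill([101, 202, 303], store)  # identical prefix reuses the cache
print(hit_first, hit_second)
```

The point of the sketch is the sharing pattern: because the cache lives outside any single GPU's memory, a second agent or session with the same prompt prefix skips the expensive pre-fill step entirely, which is the effect the reported pre-fill speedups rely on.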
The performance gains attributed to FusIOnX are significant. Pliops reports up to a 5x pre-fill speedup and a 7x improvement in prompt token throughput, along with 3x end-to-end inference gains, all achieved without expensive High Bandwidth Memory (HBM) or additional high-end GPUs. Beyond raw speed, Pliops says FusIOnX makes LLM inference 80–90% more efficient, translating into over 50% improvements in Total Cost of Ownership (TCO), a 50% reduction in CO2 emissions, and 50% savings on data center cooling costs. The technology is designed to enable smarter caching architectures that accelerate inference while conserving valuable GPU resources.
Ido Bukspan, CEO of Pliops, underscored the importance of memory in the AI landscape, stating, “At Pliops, we believe memory is the missing piece in making GenAI truly intelligent and efficient. Our FusIOnX goes beyond acceleration – it enables inference to remember, collaborate and evolve. It’s the infrastructure GenAI has been waiting for to power the next wave of scalable, memory-intensive LLM innovation.” He added that FusIOnX helps GenAI systems deliver smarter, faster, and more resource-efficient results, whether for customer support agents, developer copilots, dialogue systems, or knowledge retrieval tools.
Adding to the company’s strategic investment in product innovation, Amit Golander, the newly appointed VP of Products, is joining the Pliops team onsite at FMS 2025. Pliops also announced that customers are already actively shipping FusIOnX-based solutions, with an accelerated inference instance powered by FusIOnX now available for trial, signaling immediate real-world adoption of their memory-optimized GenAI systems.