AI was supposed to make engineers' jobs easier. But progress has lagged behind expectations.
Engineers have struggled to incorporate new GPU hardware into unclear roadmaps, and teams are stuck trying to force experimental AI code onto systems that were never built to handle it.
AI workloads behave differently from traditional software. They constantly change and consume resources in unpredictable ways that standard pipelines can’t handle. In other words, teams are trying to fit the square peg of dynamic AI models into the round hole of rigid production systems. This causes infrastructure to break easily while costs spiral out of control.
This isn't a case of temporary growing pains; it's a fundamental mismatch between how we've traditionally built software and what AI actually requires today.
Drawing on findings from Platform Engineering’s annual survey, The State of AI in Platform Engineering, let’s examine where teams are getting stuck and what it takes to build truly AI-native operations.
The human hurdle: Skills gaps and silos
The biggest hurdle to going AI-native isn't actually technological – it’s human. In fact, 57% of organizations cite skill gaps as their top barrier to adoption, while over half lack the necessary expertise in everything from data science to security.
It doesn’t help that teams aren't talking to each other. Almost one-third (31%) of companies report little interaction between their platform engineers and data science teams. Sixteen percent say there is none at all. Siloing model code and production infrastructure like this results in fragile systems and operational risk.
To bridge this gap, it’s essential to apply standard DevOps principles (automation, ownership, and observability) to AI models, treating them as evolving software, not one-off experiments.
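As a rough illustration of that mindset, here is a minimal sketch of treating inference like any other service, with the observability you'd expect from production software. The predict() function and the fraud-model:3 version string are placeholders; in practice you would wire the logging into your existing metrics and tracing stack.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model-serving")

def predict(features: list[float]) -> float:
    """Placeholder model call; swap in your real model or inference endpoint."""
    return sum(features) / len(features)

def observed_predict(features: list[float], model_version: str = "fraud-model:3") -> float:
    """Wrap inference with the same observability you'd give any other service."""
    start = time.perf_counter()
    score = predict(features)
    log.info("model=%s latency_ms=%.1f score=%.3f",
             model_version, (time.perf_counter() - start) * 1000, score)
    return score

observed_predict([0.2, 0.7, 0.4])
```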
Why legacy pipelines can't keep up
Even mature platform teams are discovering that their current architectures cannot handle the demands of AI.
Integration is a top concern, with 51% of respondents reporting difficulty embedding AI into existing systems. Legacy monoliths make it complex to connect models to live apps, and while new protocols like MCP (Model Context Protocol) and A2A (Agent2Agent) are helping, the landscape remains fragmented.
Delivery is hitting a wall, too. Forty-one percent of teams haven’t adapted their CI/CD pipelines to handle model retraining or versioning. Traditional DevOps flows simply weren't designed for workloads that drift and learn continuously.
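One concrete way to adapt a pipeline is to add an explicit promotion gate for retrained models, so a new version only ships if it beats the one in production. The sketch below is illustrative: the scores, the min_improvement threshold, and the print statements are stand-ins for your evaluation harness and for whatever your CI system does to register and roll out a model.

```python
def should_promote(candidate_score: float, production_score: float,
                   min_improvement: float = 0.01) -> bool:
    """Quality gate: only ship a retrained model if it clearly beats production."""
    return candidate_score >= production_score + min_improvement

# Inside a CI/CD retraining stage (values are illustrative):
candidate_score = 0.91   # offline evaluation of the freshly retrained model
production_score = 0.89  # last recorded evaluation of the model now serving traffic

if should_promote(candidate_score, production_score):
    print("Promote: register the new version and roll it out gradually.")
else:
    print("Hold: the candidate did not clear the quality gate.")
```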
The solution lies in standardization via lightweight, composable infrastructure. Instead of being trapped in a rigid, all-in-one monolith, this approach empowers you to build a modular stack, piecing together specific components such as fractional NVIDIA GPUs, managed Kubernetes, and high-performance vector databases. This flexibility allows you to:
- Standardize GPU access so hardware becomes a utility, not a bottleneck (see the sketch after this list)
- Automate environment setup to remove manual configuration drift
- Enable inference anywhere, allowing you to move models to the edge where your data actually resides
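To make the GPU point concrete, here is a minimal sketch of requesting a fractional GPU on managed Kubernetes, assuming an A100 partitioned with NVIDIA MIG and exposed by the NVIDIA device plugin, using the official kubernetes Python client. The image name, namespace, and MIG profile are assumptions to adapt to your cluster.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

# Request a fractional GPU: one MIG slice exposed by the NVIDIA device plugin.
container = client.V1Container(
    name="inference",
    image="registry.example.com/llm-inference:latest",  # placeholder image
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/mig-1g.5gb": "1"}  # one 1g.5gb slice of an A100 (assumed profile)
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="llm-inference", labels={"app": "llm-inference"}),
    spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
)

client.CoreV1Api().create_namespaced_pod(namespace="ml-serving", body=pod)
```

Because the slice is just another schedulable resource, platform teams can quota and meter it like CPU or memory, which is what turns GPU access into a utility rather than a bottleneck.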
Fixing the infrastructure patchwork
Infrastructure is often the weakest link in an AI strategy. For many, it remains a patchwork of manual scripts and inconsistent environments that simply can't scale.
The fix? Infrastructure-as-Code (IaC). By treating infrastructure like software, you can unify your models and compute with governance and autoscaling baked right in.
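As a small example of that approach, here's a sketch using Pulumi's Python SDK with the AWS provider to declare a versioned bucket for model artifacts; the resource name and tags are illustrative, and the same pattern extends to clusters, GPU node pools, and autoscaling policies.

```python
import pulumi
import pulumi_aws as aws

# A versioned bucket for model artifacts: every model build stays traceable,
# and ownership/governance metadata lives in code alongside the resource.
artifacts = aws.s3.Bucket(
    "model-artifacts",
    versioning=aws.s3.BucketVersioningArgs(enabled=True),
    tags={"owner": "platform-team", "governance": "ml-artifacts"},
)

pulumi.export("artifacts_bucket", artifacts.bucket)
```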
This control is critical because 85% of organizations are shifting inference to the edge. While moving compute closer to the data improves speed, few platforms can reliably manage hundreds of distributed locations.
To handle this distribution, you need three key strategies:
- Silicon diversity: Mixing CPUs, GPUs, and accelerators to balance performance and cost
- Serverless inference: Abstracting away the hardware to enable elastic scaling
- Real-time data integration: Using Retrieval-Augmented Generation (RAG) to feed models live context without retraining (sketched below)
Together, these approaches create a composable architecture that enables the deployment of modular workloads across cloud, on-premises, and the edge.
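As a rough sketch of the RAG piece, the example below retrieves the closest document from a tiny in-memory index and folds it into the prompt. The embed() and generate() functions are placeholders for your embedding model and inference endpoint, and a production setup would swap the Python list for a real vector database.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; replace with a real sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def generate(prompt: str) -> str:
    """Placeholder LLM call; replace with your inference endpoint."""
    return f"[model response to: {prompt[:60]}...]"

# A tiny in-memory "vector database": pairs of (embedding, document).
documents = [
    "Edge site A reported GPU utilization at 78% this hour.",
    "Retraining pipeline for the fraud model completed at 02:00 UTC.",
]
index = [(embed(doc), doc) for doc in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents closest to the query by cosine similarity."""
    q = embed(query)
    scores = [(float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e))), doc)
              for e, doc in index]
    return [doc for _, doc in sorted(scores, key=lambda s: s[0], reverse=True)[:k]]

def answer(question: str) -> str:
    """RAG: retrieve live context, then ask the model to answer grounded in it."""
    context = "\n".join(retrieve(question))
    return generate(f"Context:\n{context}\n\nQuestion: {question}")

print(answer("How busy are the edge GPUs right now?"))
```

The point is that the model stays frozen; freshness comes from the retrieval layer rather than from retraining.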
From experiment to business capability
The roadblocks we’ve covered – skills gaps, pipeline bottlenecks, and messy infrastructure – are the deciding factors in whether your AI stays an experiment or becomes a dependable business capability.
To make that leap and gain a competitive AI advantage, you need to treat AI integration as an engineering discipline rather than a flashy initiative. That means weaving AI into the core practices of observability, automation, and scalability. When you do this, you stop patching together fragile workflows and start building infrastructure that adapts as fast as the models it supports.
Explore the full report from the Platform Engineering survey and get the key findings in our at-a-glance infographic.

