Low-Ops for Production ML Means the Difference Between a Profitable and an Unprofitable AI Program
The Paradox of Efficiency
The Jevons Paradox states that increased efficiency leads not to less consumption of a resource but to more, as the falling cost drives greater overall demand. We can see this principle at work in air travel. During the Golden Age of Travel in the 1950s and 1960s, flying was a luxury: a roundtrip ticket from Chicago to Phoenix cost around 5% of the average yearly salary, compared to around 1% today. So while the Golden Age might conjure images of glamor, global passenger volumes then were a tiny fraction of today's, and it was the introduction of the jumbo jet that made air travel faster, cheaper, and, yes, more cramped yet far more ubiquitous.
Machine Learning’s “Jumbo Jet” Moment
Enterprise AI is in the same stage as commercial aviation was in the 1950s: expensive, slow, and reserved for a small subset of the population. We believe the new paradigm of low-ops / no-ops for production machine learning (ML) will be the equivalent of the jumbo jet for air travel. That is, low-ops will lower the barriers to operationalizing machine learning from a cost, effort, and head count perspective so that it is no longer just a few companies like Amazon, Netflix, Google, and Microsoft taking advantage of ML to drive business outcomes, but rather all sorts of enterprises, whether a start-up or a Fortune 500 stalwart with decades of technical debt.
However, we sometimes joke that production is where ML models go to die. What are the most common sources of friction that raise the cost of production ML?
- How do I even deploy a model into production? What works in a training environment often fails in production. The traditional MLOps approach involves bespoke containerization solutions requiring weeks or even months to convert a data scientist’s notebook into production-ready code (a sketch of this hand-written plumbing follows this list).
- Once a model is live, how do I keep it performant? Accuracy is only one dimension of performance: production models also need to meet the cost, latency, and throughput requirements of the business and downstream systems.
- How do I optimize the performance of my live models? Model observability is important, but without the ability to quickly test, validate, and replace outdated models with new ones, observability on its own provides little value.
- How do I make production ML agile and fast to respond to the real world while also compliant and secure? Specific compliance and security requirements may not be a blocker to early adoption but can be a blocker to scale.
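To make the deployment friction above concrete, here is a minimal sketch of the kind of serving wrapper teams typically hand-write today before containerization can even begin. The model file, endpoint, and request schema are hypothetical, and this is only the visible tip of the plumbing: a real deployment also needs a Dockerfile, CI, autoscaling, logging, and monitoring wrapped around it.

```python
# Hypothetical hand-rolled serving wrapper for a scikit-learn model
# serialized with joblib; the artifact path and schema are made up.
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical artifact path

class PredictRequest(BaseModel):
    features: List[float]

@app.post("/predict")
def predict(req: PredictRequest):
    # Real deployments also need input validation, batching, metrics,
    # health checks, and a container image around this endpoint:
    # exactly the plumbing low-ops aims to abstract away.
    prediction = model.predict([req.features])
    return {"prediction": prediction.tolist()}
```

Every line of this is undifferentiated plumbing; the low-ops promise is to generate or hide it entirely.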
What is Low-Ops for Production ML?
Just like low-code/no-code broadened the segment of individuals and companies that could build applications, low-ops/no-ops for production ML will:
- Empower data scientists (who are traditionally not involved in production) to deploy and monitor models in production by removing most of the infrastructure “plumbing” required to operationalize models.
- Enable ML engineers to manage several models concurrently instead of dedicating their entire focus to a single model or, at most, a handful.
What exactly do we mean by low-ops for production ML?
- Automated and repeatable processes for deploying models into production and managing the surrounding infrastructure.
- Data scientists or ML engineers can go right from a notebook to a production environment without major engineering effort.
- Inference runs efficiently to keep cloud costs low, or to make ML possible at all in compute-constrained environments like the edge.
- Model insights are automated to detect drift as it happens or to flag when throughput starts to slip (see the drift-check sketch after this list).
- Observability, testing, and deployment are tightly integrated so data science teams can go beyond detecting drift and actually close the loop quickly and efficiently.
- Collaboration and agility are enabled while giving SysAdmins, compliance teams, and regulators full control and visibility into who has what kind of access to which model.
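As one concrete illustration of automated model insights, here is a minimal sketch of a per-feature drift check, assuming we keep a reference sample from training data and compare it against a window of recent production inputs. The two-sample Kolmogorov-Smirnov test is just one common choice; the threshold and window size below are illustrative, not prescriptive.

```python
# Minimal per-feature drift check: compare a training-time reference
# sample against recent production inputs, feature by feature.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, live: np.ndarray,
                 p_threshold: float = 0.01) -> list:
    """Return indices of features whose live distribution has drifted.

    A small p-value from the two-sample Kolmogorov-Smirnov test means
    the live data no longer looks like the training data.
    """
    drifted = []
    for i in range(reference.shape[1]):
        _, p_value = ks_2samp(reference[:, i], live[:, i])
        if p_value < p_threshold:
            drifted.append(i)
    return drifted

# Example: feature 2 shifts in production and should be flagged.
rng = np.random.default_rng(0)
reference = rng.normal(0, 1, size=(5000, 3))
live = rng.normal(0, 1, size=(1000, 3))
live[:, 2] += 1.5  # simulated drift in one feature
print(detect_drift(reference, live))  # expected: [2]
```

Closing the loop means wiring a check like this into the same pipeline that can test, validate, and promote a replacement model, rather than just firing an alert.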
Making production ML faster and cheaper through low-ops doesn't just mean that enterprise data teams will do the same work faster; it will open up entirely new use cases that weren't feasible before.