With the boom in machine learning (ML) supported services, the term MLops has become a regular part of the conversation, and rightly so. Short for ‘machine learning operations,’ MLops refers to a broad set of tools, work functions, and best practices to ensure that machine learning models are deployed and maintained reliably and efficiently in production. The practice is at the heart of production-quality models: it ensures rapid deployment, facilitates experimentation for improved performance, and helps avoid model bias or loss in prediction quality. Without it, ML at scale becomes impossible.
As with any emerging practice, it’s easy to get confused about what it actually entails. To help, we’ve outlined seven common myths about MLops to avoid so you can get on track to successfully leverage ML at scale.
Myth #1: MLops ends at launch
Reality: Launching an ML model is just one step in a continuous process.
ML is an inherently experimental practice. Even after the initial launch, it is necessary to test new hypotheses while refining signals and parameters. This allows the model to improve in accuracy and performance over time. MLops processes help engineers effectively manage the experimentation process.
For example, a core component of MLops is version control. This allows teams to track key metrics across a wide range of model variants to ensure the optimal one is selected, while making it easy to roll back to an earlier version in the event of an error.
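To make the idea concrete, here is a minimal sketch of version tracking with rollback. The `ModelRegistry` class, its method names, and the `auc` metric are hypothetical and exist only for illustration; a real team would typically rely on a dedicated model registry backed by persistent storage.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: str
    metrics: dict  # e.g. {"auc": 0.91}


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry, for illustration only."""
    versions: list = field(default_factory=list)
    live_history: list = field(default_factory=list)  # promotion order, newest last

    def register(self, version, metrics):
        self.versions.append(ModelVersion(version, metrics))

    def best(self, metric):
        # Pick the variant with the highest value of the chosen metric.
        return max(self.versions, key=lambda v: v.metrics[metric])

    def promote(self, version):
        self.live_history.append(version)

    def rollback(self):
        # Drop the current live version and return to the previous one.
        if len(self.live_history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.live_history.pop()
        return self.live_history[-1]

    @property
    def live(self):
        return self.live_history[-1] if self.live_history else None


registry = ModelRegistry()
registry.register("v1", {"auc": 0.88})
registry.register("v2", {"auc": 0.91})
registry.register("v3", {"auc": 0.90})

registry.promote("v1")
registry.promote(registry.best("auc").version)  # promotes the best variant
restored = registry.rollback()                   # returns to the prior live version
```

Keeping the promotion history separate from the list of registered variants is one possible design choice: it lets a rollback restore whatever was live before, rather than whatever was registered most recently.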
It is also important to monitor model performance over time because of the risk of data drift. Data drift occurs when the data a model sees in production shifts drastically from the data the model was originally trained on, leading to poor-quality predictions. For example, many ML models trained on pre-pandemic consumer behavior deteriorated seriously in quality after COVID-19 lockdowns changed the way we live. MLops works to address these scenarios by creating strong monitoring practices and building infrastructure to adapt quickly when major change occurs. It goes way beyond launching a model.
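One common way to put a number on data drift is the population stability index (PSI), which compares how a feature is distributed in training data versus production data. The sketch below is an assumption-laden illustration, not a prescribed method: the function name, the equal-width bucketing, the sample data, and the 0.2 alert threshold (a widely used rule of thumb) are all choices made for this example.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a training sample and a production sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the training range into the edge buckets.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [i / 100 for i in range(100)]            # feature values seen in training
production_same = list(training)                    # no shift
production_shifted = [0.5 + v for v in training]    # distribution moved upward

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb alert level

psi_same = population_stability_index(training, production_same)
psi_shifted = population_stability_index(training, production_shifted)
drift_detected = psi_shifted > DRIFT_THRESHOLD
```

In a monitoring setup, a check like this might run on a schedule for each important feature, with an alert or a retraining job triggered when the score crosses the chosen threshold.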
Myth #2: MLops is the same as model development
Reality: MLops is the bridge between model development and the successful use of ML in production.
The process used to develop a model in a test environment is usually not the same process that allows it to be successful in production. Running models in production requires robust data pipelines to source and process data and to train models, often on much larger data sets than those used in development.
Databases and compute typically need to be moved to distributed environments to manage the increased load. Much of this process needs to be automated to ensure reliable deployments and the ability to iterate quickly at scale. Monitoring also needs to be much more robust, as production environments will see data beyond what is available in testing, and therefore the potential for the unexpected is much greater. MLops consists of all these practices to bring a model from development to launch.
Myth #3: MLops is the same as devops
Reality: MLops works towards similar goals to devops, but its implementation differs in several ways.
While both MLops and devops strive to make deployment scalable and efficient, achieving this goal for ML systems requires a new set of approaches. MLops places a stronger emphasis on experimentation than devops does. Unlike standard software deployments, ML models are often deployed with many variants at once, so model monitoring is needed to compare them and select an optimal version. For each redeployment, it is not enough to just land the code; the models have to be retrained every time there is a change. This differs from standard devops deployments, as the pipeline must now include a retraining and validation phase.
For many common devops practices, MLops extends the scope to meet its specific needs. Continuous integration for MLops goes beyond code testing to include data quality checks and model validation. Continuous deployment is more than just shipping a set of software packages; it now includes a pipeline to update or roll back models.
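As a rough illustration of what an extended continuous integration step might look like, the sketch below combines a data quality check with a model validation check. Everything here is hypothetical: the function names, the 5% missing-value limit, the accuracy tolerance, and the sample rows are assumptions chosen for the example.

```python
def check_data_quality(rows, required_fields, max_missing_rate=0.05):
    """Return a list of problems found in a batch of training rows (dicts)."""
    if not rows:
        return ["batch is empty"]
    problems = []
    for name in required_fields:
        missing = sum(1 for row in rows if row.get(name) is None)
        rate = missing / len(rows)
        if rate > max_missing_rate:
            problems.append(f"{name}: {rate:.0%} missing")
    return problems


def validate_model(candidate_accuracy, baseline_accuracy, tolerance=0.01):
    """Pass only if the candidate is not meaningfully worse than the baseline."""
    return candidate_accuracy >= baseline_accuracy - tolerance


def ci_gate(rows, required_fields, candidate_accuracy, baseline_accuracy):
    problems = check_data_quality(rows, required_fields)
    if not validate_model(candidate_accuracy, baseline_accuracy):
        problems.append("candidate underperforms baseline")
    return problems  # an empty list means the pipeline may proceed


dirty_rows = [
    {"age": 31, "income": 50000},
    {"age": None, "income": 42000},
    {"age": 45, "income": 61000},
    {"age": 28, "income": 39000},
]
clean_rows = [row for row in dirty_rows if row["age"] is not None]

blocked = ci_gate(dirty_rows, ["age", "income"], 0.90, 0.92)
passed = ci_gate(clean_rows, ["age", "income"], 0.93, 0.92)
```

Returning a list of problems instead of a single pass/fail flag is a deliberate choice in this sketch, since it lets the pipeline report every failing check in one run.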
Myth #4: Fixing a mistake is just changing lines of code
Reality: Fixing production ML model errors requires advance planning and multiple fallbacks.
If a new deployment results in a performance degradation or other error, MLops teams should have a range of options at hand to resolve the issue. Simply reverting to the previous code is often not enough, as models must be retrained before being deployed. Instead, teams need to keep multiple versions of models on hand so that a production-ready version is always available in the event of a failure.
In addition, in scenarios involving data loss or a significant shift in production data distribution, teams must have simple fallback heuristics so that the system can maintain at least a certain level of performance. All of this requires significant advance planning, which is a core aspect of MLops.
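A fallback heuristic can be as simple as a wrapper around the model call. The following is a minimal sketch under stated assumptions: `predict_with_fallback`, `fragile_model`, the `recent_purchases` feature, and the fallback value of 0.2 are all hypothetical names and numbers invented for illustration.

```python
def predict_with_fallback(model_predict, features, fallback_value, required_fields):
    """Use the model when inputs are usable; otherwise return a simple heuristic."""
    # Fall back when required inputs are missing (e.g. upstream data loss).
    if any(features.get(name) is None for name in required_fields):
        return fallback_value, "fallback"
    try:
        return model_predict(features), "model"
    except Exception:
        # Any model failure degrades to the heuristic instead of an outage.
        return fallback_value, "fallback"


def fragile_model(features):
    # Hypothetical stand-in for a real model call.
    return features["recent_purchases"] * 0.1


ok = predict_with_fallback(
    fragile_model, {"recent_purchases": 5}, 0.2, ["recent_purchases"]
)
degraded = predict_with_fallback(
    fragile_model, {"recent_purchases": None}, 0.2, ["recent_purchases"]
)
```

Returning the source of each prediction alongside its value makes it possible to monitor how often the system is running on the heuristic, which is itself a useful health signal.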
Myth #5: Governance is completely different from MLops
Reality: While governance has different goals than MLops, many MLops practices can help support governance objectives.
Model governance manages regulatory compliance and the risk associated with using ML systems. This includes things like maintaining appropriate policies for protecting user data and avoiding bias or discriminatory outcomes in model predictions. While MLops is generally seen as a way to ensure that models perform well, this is a limited view of what it can deliver.
Tracking and monitoring of models in production can be supplemented with analysis to improve model explainability and find bias in results. Transparency in model training and implementation pipelines can facilitate data processing compliance goals. MLops should be seen as a practice to enable scalable ML for all business objectives, including performance, governance and model risk management.
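As one small example of how monitoring data can support bias analysis, the sketch below compares the rate of positive predictions across groups, a gap sometimes called the demographic parity difference. It is only one of many possible fairness measures, and the function names, group labels, and records here are assumptions made up for illustration.

```python
from collections import defaultdict


def positive_rate_by_group(records):
    """records: iterable of (group, prediction) pairs with prediction in {0, 1}."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, prediction in records:
        totals[group] += 1
        positives[group] += prediction
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(records):
    # Difference between the highest and lowest positive-prediction rates.
    rates = positive_rate_by_group(records)
    return max(rates.values()) - min(rates.values())


# Hypothetical logged predictions: (group label, model decision)
logged = [
    ("a", 1), ("a", 1), ("a", 0), ("a", 1),
    ("b", 1), ("b", 0), ("b", 0), ("b", 0),
]

rates = positive_rate_by_group(logged)
gap = parity_gap(logged)
```

A large gap does not prove a model is unfair on its own, but it is the kind of signal that can prompt a closer review.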
Myth #6: You can manage ML systems in silos
Reality: Successful MLops systems require collaborative teams with hybrid skills.
Deploying ML models involves many roles, including data scientists, data engineers, ML engineers, and devops engineers. Without collaboration and an understanding of each other’s work, building effective ML systems at scale can become impractical.
For example, a data scientist may develop models without much external visibility or input, which can then lead to deployment challenges due to performance and scale issues. Similarly, without understanding key ML practices, a devops team may not build the tracking needed to enable iterative modeling experiments.
That’s why it’s important across the board that all team members have a broad understanding of the model development pipeline and ML practices – with collaboration from day one.
Myth #7: Managing ML systems is risky and unsustainable
Reality: Any team can leverage ML at scale with the right tools and practices.
Since MLops is still a growing field, it can seem like there is a lot of complexity. However, the ecosystem is rapidly maturing and there are a myriad of resources and tools available to help teams succeed at every step of the MLops lifecycle.
With the right processes, you can unlock the full potential of ML at scale.
Krishnaram Kenthapadi is the chief scientist at Fiddler AI.