What Is MLOps, and Why Should You Start Using It in Your Machine Learning Projects?
What is MLOps?
MLOps refers to the combined use of machine learning and DevOps practices to build robust automation, tracking, pipeline, monitoring, and packaging systems for machine learning models throughout their life cycle.
So, let's start with machine learning: what is it exactly? Machine learning is a data analysis technique that automates the building of analytical models. It is a subfield of artificial intelligence (AI) based on the premise that systems can learn from data, spot patterns, and make decisions with little or no human intervention, e.g., book recommendations on Amazon, music recommendations on Spotify, and self-driving cars.
Next, what is DevOps? DevOps is a set of cultural philosophies, practices, and tools that combines software development (Dev) with IT operations (Ops). Its main goal is to shorten the systems development life cycle and enable continuous delivery of high-quality software faster than traditional software development and infrastructure management methods allow.
Now let's look at the machine learning life cycle.
What is the Machine Learning Life Cycle?
The machine learning life cycle can be described as a multi-component flow in which each component affects the ones that follow. (1) Prepare: the raw data is collected, transformed, and cleaned into a dataset. (2) Experiment: the dataset is explored and features are extracted from it for further investigation. (3) Train: a model is selected and trained to fine-tune its parameters; it is monitored during training and evaluated at the end. (4) Deploy: once a model meets the requirements set (e.g., precision or accuracy), it is deployed for inference and monitored.
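To make the flow concrete, the four components can be pictured as functions chained so that each one consumes the previous one's output. This is a toy sketch under stated assumptions: the tiny dataset, the threshold "model", and the 0.9 accuracy requirement are all hypothetical stand-ins, not part of any tool discussed here.

```python
def prepare(raw):
    """(1) Prepare: drop incomplete records and cast values to numbers."""
    return [(float(x), int(y)) for x, y in raw if x is not None and y is not None]


def experiment(dataset):
    """(2) Experiment: explore the data and derive a feature (midpoint between class means)."""
    pos = [x for x, y in dataset if y == 1]
    neg = [x for x, y in dataset if y == 0]
    return {"threshold": (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2}


def train(dataset, features):
    """(3) Train: fit a toy threshold model, then evaluate it at the end."""
    threshold = features["threshold"]

    def model(x):
        return int(x > threshold)

    accuracy = sum(model(x) == y for x, y in dataset) / len(dataset)
    return model, accuracy


def deploy(model, accuracy, required=0.9):
    """(4) Deploy: release the model only if it meets the requirement."""
    if accuracy < required:
        raise ValueError(f"accuracy {accuracy:.2f} is below the required {required}")
    return model  # in practice: expose it for inference and keep monitoring it


raw = [(1, 0), (2, 0), (None, 1), (8, 1), (9, 1), ("3", 0)]
dataset = prepare(raw)
features = experiment(dataset)
model, accuracy = train(dataset, features)
serving_model = deploy(model, accuracy)
```

Because each stage only depends on its inputs, a change early in the chain (for example, a new cleaning rule in `prepare`) flows through every later stage, which is exactly why MLOps tools track all four.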
What Are the Available Free MLOps Tools?
At the time of writing (end of 2021), many companies offered MLOps tools. They all share the same goal, letting users track their machine learning models throughout the life cycle, but they differ in aspects such as:
- Open Source (Cost / Support / UI / …)
- Platform and Language Support (Python / API Support / …)
- Data Storage (Local / Cloud / Hybrid)
- Custom Visualization (Accurate / Clear / …)
- Ease of Setup and Use (Requirements / UI & UX)
- Scalability for a Large Number of Experiments (Team / Experiment Volume)
- Dataset Tracking (Delta Data1 vs Data2 / …)
- Distributed Execution of Experiments (Scheduler / Containers / K8s / HPC)
This section lists some open-source MLOps tools. It focuses only on free ones, since they let you enjoy the automation and flexibility MLOps offers without spending a dollar.
1. ClearML
ClearML is an end-to-end platform that connects data science tools in a unified environment. It is a suite of open-source tools for automating the preparation, execution, and analysis of machine learning experiments. It keeps track of settings, jobs, artifacts, metrics, debug data, and metadata, and logs everything in one interface.
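To show what "keeping track of settings, metrics, and artifacts" means in practice, here is a minimal, tool-agnostic sketch of the record an experiment tracker keeps for each run. It is not ClearML's API (ClearML's own entry point is `Task.init` from the `clearml` package, which hooks into your script); the `Run` class, its methods, and the JSON file layout are hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path


class Run:
    """Hypothetical minimal tracker: one JSON record per experiment run."""

    def __init__(self, project, name, root):
        self.path = Path(root) / project / f"{name}.json"
        self.record = {"project": project, "name": name, "started": time.time(),
                       "params": {}, "metrics": {}, "artifacts": []}

    def log_params(self, **params):
        # Settings / hyperparameters used for this run
        self.record["params"].update(params)

    def log_metric(self, key, value, step):
        # Metrics are kept as [step, value] series so progress can be plotted later
        self.record["metrics"].setdefault(key, []).append([step, value])

    def log_artifact(self, file_path):
        # Only a reference is stored here; a real tool would also upload the file
        self.record["artifacts"].append(str(file_path))

    def finish(self):
        # Persist everything about the run in one place
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.record, indent=2))
        return self.path


root = tempfile.mkdtemp()
run = Run("demo", "baseline", root)
run.log_params(lr=0.01, epochs=3)
for epoch in range(3):
    run.log_metric("loss", 1.0 / (epoch + 1), step=epoch)
run.log_artifact("model.pkl")
saved = json.loads(run.finish().read_text())
```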
The ClearML stack can be self-hosted, and the official website also offers both a free and a paid hosted plan.
More about the tool https://clear.ml/docs/latest/docs/
PS: This is the one I am currently using in my projects
2. Weights & Biases
Weights & Biases is a tool for easily tracking and recording the performance of deep learning models. It lets you track experiments, improve models, and share results with collaborators while monitoring them. Hyperparameters and output measurements are stored in one place, and the logs of each experiment are saved so you can review the progress made and compare models with existing projects.
Visualization tools are also available to make large amounts of data easier to understand. As with ClearML, using Weights & Biases only takes a few added lines of code in your script.
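Once every run's hyperparameters and output measurements live in one place, comparing models becomes a simple query over those records. A minimal sketch, assuming runs are stored as plain dictionaries; the run names and numbers are made up for illustration, and this is not the Weights & Biases API (W&B scripts typically call `wandb.init` and `wandb.log`).

```python
def best_run(runs, metric, higher_is_better=True):
    """Return the run with the best value of `metric`, skipping runs that never logged it."""
    candidates = [r for r in runs if metric in r["metrics"]]
    if not candidates:
        raise ValueError(f"no run logged {metric!r}")
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda r: r["metrics"][metric])


def diff_params(run_a, run_b):
    """Show which hyperparameters differ between two runs as {name: (a, b)}."""
    keys = run_a["params"].keys() | run_b["params"].keys()
    return {k: (run_a["params"].get(k), run_b["params"].get(k))
            for k in sorted(keys)
            if run_a["params"].get(k) != run_b["params"].get(k)}


# Illustrative records, not real results
runs = [
    {"name": "run-1", "params": {"lr": 0.1, "batch": 32}, "metrics": {"val_acc": 0.81}},
    {"name": "run-2", "params": {"lr": 0.01, "batch": 32}, "metrics": {"val_acc": 0.86}},
    {"name": "run-3", "params": {"lr": 0.01, "batch": 64}, "metrics": {}},  # crashed early
]

best = best_run(runs, "val_acc")
changes = diff_params(runs[0], best)
```

Skipping runs without the metric matters in practice: crashed or still-running experiments should not silently win or break the comparison.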
More about the tool https://wandb.ai/
3. Kubeflow
Kubeflow is a full-fledged open-source MLOps tool that facilitates the orchestration and deployment of machine learning workflows. It provides dedicated services and integrations for various phases of machine learning, including training, building pipelines, and managing Jupyter notebooks.
More about the tool https://www.kubeflow.org/
4. MLflow
MLflow is an open-source platform for managing the machine learning life cycle. Its components cover experiment tracking, project packaging, model deployment, and a model registry.
MLflow integrates with many machine learning libraries, including TensorFlow and PyTorch, to streamline the training, deployment, and management of machine learning applications.
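A model registry, one of the components listed above, keeps numbered versions of each named model and records which version is currently promoted. The sketch below is a hypothetical in-memory stand-in that only illustrates the idea; it does not reproduce MLflow's client API, and the stage names are borrowed from MLflow's classic registry stages.

```python
from dataclasses import dataclass, field

STAGES = {"None", "Staging", "Production", "Archived"}


@dataclass
class ModelVersion:
    version: int
    source: str  # where the trained artifact lives (e.g. a run's output path)
    stage: str = "None"


@dataclass
class Registry:
    """Hypothetical registry: model name -> list of versions (version N at index N-1)."""
    models: dict = field(default_factory=dict)

    def register(self, name, source):
        versions = self.models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, source=source)
        versions.append(mv)
        return mv

    def transition(self, name, version, stage):
        if stage not in STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        if stage == "Production":
            # Keep a single Production version: archive whichever one held it before
            for mv in self.models[name]:
                if mv.stage == "Production":
                    mv.stage = "Archived"
        self.models[name][version - 1].stage = stage

    def latest(self, name, stage):
        matches = [mv for mv in self.models.get(name, []) if mv.stage == stage]
        return matches[-1] if matches else None


reg = Registry()
reg.register("churn", "runs/1/model")
reg.register("churn", "runs/2/model")
reg.transition("churn", 1, "Production")
reg.transition("churn", 2, "Production")  # promotes v2 and archives v1
prod = reg.latest("churn", "Production")
```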
More about the tool https://mlflow.org/
5. Data Version Control (DVC)
DVC is an open-source command-line tool, written in Python, for data science and machine learning projects. It adopts a Git-like model to manage and version datasets and machine learning models, which makes projects shareable and reproducible.
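The core of the Git-like model is content addressing: the large file goes into a cache under its hash, and only a tiny pointer file is committed to Git. DVC itself keys its cache by MD5 hashes and commits small `.dvc` metafiles; the sketch below mimics that idea with hypothetical `add`/`checkout` helpers and a JSON `.ptr` file, and is not DVC's implementation.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def add(data_file, cache_dir):
    """Hypothetical `add`: store the file under its content hash and write a small pointer file."""
    data_file, cache_dir = Path(data_file), Path(cache_dir)
    digest = hashlib.md5(data_file.read_bytes()).hexdigest()
    target = cache_dir / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(data_file, target)  # identical content always maps to the same cache entry
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"md5": digest, "path": data_file.name}))
    return pointer  # this small file is what you would commit to Git


def checkout(pointer, cache_dir):
    """Restore the data file version recorded in a pointer file."""
    pointer = Path(pointer)
    meta = json.loads(pointer.read_text())
    source = Path(cache_dir) / meta["md5"][:2] / meta["md5"][2:]
    shutil.copy2(source, pointer.with_name(meta["path"]))


work = Path(tempfile.mkdtemp())
cache = work / "cache"
data = work / "train.csv"

data.write_text("x,y\n1,0\n")
v1 = add(data, cache).read_text()
data.write_text("x,y\n1,0\n2,1\n")
v2 = add(data, cache).read_text()

# Simulate checking out the old pointer from Git, then restore the matching data
(work / "train.csv.ptr").write_text(v1)
checkout(work / "train.csv.ptr", cache)
restored = data.read_text()
```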
More about the tool https://dvc.org/
6. Pachyderm
Like DVC, Pachyderm is a version control tool for machine learning and data science. On top of that, it is built on Docker and Kubernetes, which lets it run and deploy machine learning projects on any cloud platform. Pachyderm ensures that all data ingested into a machine learning model is versioned and traceable.
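"Versioned and traceable" rests on provenance: every output remembers exactly which input versions produced it, so any model can be traced back to its raw data. The sketch below illustrates that idea with a hypothetical in-memory store; the `commit` and `lineage` helpers are invented for this example and are not Pachyderm's API.

```python
import hashlib
import json

store = {}  # commit id -> {"repo": ..., "data": ..., "provenance": [...]}


def commit(repo, data, provenance=()):
    """Record an immutable version of `repo`, remembering which commits it was derived from."""
    payload = json.dumps({"repo": repo, "data": data, "provenance": sorted(provenance)},
                         sort_keys=True)
    commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
    store[commit_id] = {"repo": repo, "data": data, "provenance": list(provenance)}
    return commit_id


def lineage(commit_id):
    """Walk provenance links from a commit back to its original inputs."""
    seen, stack = [], [commit_id]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.append(cid)
            stack.extend(store[cid]["provenance"])
    return seen


raw = commit("raw", [3, 1, 2])
clean = commit("clean", sorted(store[raw]["data"]), provenance=[raw])
model = commit("model", {"mean": sum(store[clean]["data"]) / 3}, provenance=[clean])
trace = [store[c]["repo"] for c in lineage(model)]
```

Since the commit id is derived from the content and its provenance, re-ingesting identical data yields the same id, which keeps the history deterministic.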
More about the tool https://www.pachyderm.com/
7. Metaflow
Metaflow is an open-source MLOps platform originally developed at Netflix. Aimed at Python and R users, it makes it easy to build and manage enterprise data science projects.
Metaflow integrates with Python-based machine learning, deep learning, and big data libraries to train, deploy, and manage ML models effectively.
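In Metaflow, a workflow is a class (a `FlowSpec` subclass) whose `@step` methods chain to one another with `self.next(...)`, and values assigned to `self` are persisted as artifacts after each step. The sketch below is a simplified, hypothetical re-creation of that pattern to show how per-step snapshots make runs inspectable; the `Flow` base class is invented here and is not Metaflow's code.

```python
import copy


class Flow:
    """Hypothetical mini-runner: steps are methods; each step names its successor via self.next()."""

    def __init__(self):
        self._next = None
        self.snapshots = {}  # step name -> copy of the flow's public data after that step

    def next(self, step):
        self._next = step.__name__

    def run(self, start="start"):
        current = start
        while current is not None:
            self._next = None
            getattr(self, current)()
            # Persist every public attribute as this step's artifacts
            self.snapshots[current] = {k: copy.deepcopy(v) for k, v in vars(self).items()
                                       if not k.startswith("_") and k != "snapshots"}
            current = self._next  # stays None after the final step
        return self


class TrainFlow(Flow):
    def start(self):
        self.data = [1.0, 2.0, 3.0]
        self.next(self.train)

    def train(self):
        self.model = sum(self.data) / len(self.data)
        self.next(self.end)

    def end(self):
        self.report = f"model={self.model}"


flow = TrainFlow().run()
```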
More about the tool https://metaflow.org/
8. Kedro
Kedro is an open-source Python framework for creating reproducible and maintainable data science code. It brings software engineering practices such as versioning and modularity to machine learning projects.
It offers pipeline visualization, project templates, and flexible deployment of data science projects.
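Modularity in this style comes from writing each step as a pure function that declares the named datasets it reads and writes; the framework then derives the execution order from those names. Kedro's real building blocks are `node`, `pipeline`, and a `DataCatalog`; the functions below are a simplified, hypothetical re-creation using a plain dict as the catalog, not Kedro's code.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def node(func, inputs, outputs):
    """Hypothetical node: a pure function plus the dataset names it reads and writes."""
    return {"func": func, "inputs": list(inputs), "outputs": outputs}


def run(nodes, catalog):
    """Order nodes by their data dependencies, then execute them against a dict-based catalog."""
    producers = {n["outputs"]: i for i, n in enumerate(nodes)}
    # Each node depends on the nodes that produce its inputs (raw inputs have no producer)
    graph = {i: {producers[name] for name in n["inputs"] if name in producers}
             for i, n in enumerate(nodes)}
    for i in TopologicalSorter(graph).static_order():
        n = nodes[i]
        args = [catalog[name] for name in n["inputs"]]
        catalog[n["outputs"]] = n["func"](*args)
    return catalog


# Declared out of order on purpose: the dependency graph decides what runs first
nodes = [
    node(lambda rows, n: sum(rows) / n, inputs=["clean_rows", "row_count"], outputs="mean"),
    node(lambda raw: [r for r in raw if r is not None], inputs=["raw_rows"], outputs="clean_rows"),
    node(len, inputs=["clean_rows"], outputs="row_count"),
]
result = run(nodes, {"raw_rows": [4, None, 8]})
```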
More about the tool https://kedro.readthedocs.io/en/stable/
9. Seldon Core
Seldon Core is an open-source MLOps framework designed to streamline machine learning workflows with logging, advanced metrics, testing, scaling, and conversion of models into production microservices.
It offers high-level features that make it easy to containerize ML models, test their usability and security, and make them fully auditable through integrations with multiple services.
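Turning a model into a production microservice boils down to wrapping its `predict` method behind an HTTP endpoint that speaks JSON. The stdlib sketch below shows that wrapping as a WSGI app; the stand-in model, the `/predict` route, and the `{"data": {"ndarray": ...}}` payload shape (loosely modeled on Seldon's protocol) are assumptions for illustration, not Seldon Core's actual wrapper.

```python
import io
import json


class MeanThresholdModel:
    """Stand-in model: predicts 1 when the mean of a row's features exceeds 0.5."""

    def predict(self, rows):
        return [int(sum(r) / len(r) > 0.5) for r in rows]


def make_app(model):
    """Wrap any object with a predict() method as a JSON-over-HTTP WSGI service."""
    def app(environ, start_response):
        headers = [("Content-Type", "application/json")]
        if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/predict":
            start_response("404 Not Found", headers)
            return [b'{"error": "not found"}']
        size = int(environ.get("CONTENT_LENGTH") or 0)
        try:
            body = json.loads(environ["wsgi.input"].read(size))
            preds = model.predict(body["data"]["ndarray"])
            status, payload = "200 OK", {"data": {"ndarray": preds}}
        except (ValueError, KeyError, TypeError, ZeroDivisionError) as exc:
            status, payload = "400 Bad Request", {"error": str(exc)}
        start_response(status, headers)
        return [json.dumps(payload).encode()]
    return app


app = make_app(MeanThresholdModel())
# To serve it for real:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()


def call(method, path, payload=None):
    """Exercise the app in-process (no network), the way a WSGI server would."""
    body = json.dumps(payload).encode() if payload is not None else b""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)}
    response = b"".join(app(environ, start_response))
    return captured["status"], json.loads(response)


status, result = call("POST", "/predict", {"data": {"ndarray": [[0.9, 0.8], [0.1, 0.2]]}})
```

Keeping the model behind a generic `predict()` contract is what makes the service easy to containerize: the wrapper never needs to know which library trained the model.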
More about the tool https://www.seldon.io/tech/
10. Flyte
Flyte is another open-source MLOps platform for tracking, maintaining, and automating Kubernetes-native machine learning workflows. It makes the execution of machine learning models repeatable by tracking changes to the model, versioning it, and containerizing it with its dependencies.
Flyte provides a Python SDK and is designed to support complex ML workflows written in Python, Java, and Scala.
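One ingredient of repeatable execution is caching task outputs on a key built from the task's identity, a version string, and its inputs, so an unchanged rerun is served from the cache and a deliberate version bump forces recomputation. Flyte exposes this through task-level caching options (e.g. `cache=True` with a `cache_version` on its `@task` decorator); the `cached_task` decorator below is a hypothetical stdlib sketch of the idea, not flytekit.

```python
import copy
import functools
import hashlib
import json

CACHE = {}


def cached_task(version):
    """Hypothetical decorator: reuse a task's output when its name, version, and inputs are unchanged."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**inputs):
            # Inputs must be JSON-serializable so they can be hashed deterministically
            key_source = json.dumps(
                {"task": func.__qualname__, "version": version, "inputs": inputs},
                sort_keys=True,
            )
            key = hashlib.sha256(key_source.encode()).hexdigest()
            if key not in CACHE:
                CACHE[key] = func(**inputs)
            # Hand out a copy so callers cannot mutate the cached result
            return copy.deepcopy(CACHE[key])
        return wrapper
    return decorator


executions = []


@cached_task(version="1")
def normalize(values):
    executions.append(values)  # records real executions, for the demo only
    top = max(values)
    return [v / top for v in values]


first = normalize(values=[2, 4])
second = normalize(values=[2, 4])  # same inputs and version: served from the cache
third = normalize(values=[2, 8])   # new inputs: executed again
```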
More about the tool https://flyte.org/
11. ZenML
ZenML is an extensible open-source MLOps framework that integrates ML tools such as Jupyter notebooks to deploy ML models in a consistent and easy way. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and provides interfaces and abstractions tailored to ML workflows. ZenML is used to create reproducible machine learning pipelines.
ZenML is not meant to replace the specialized tools that solve individual MLOps problems. Rather, it integrates natively with popular ML tooling and provides standard abstractions for writing your workflows.
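"Standard abstractions" means pipeline code is written against interfaces, while the concrete backend is chosen separately; ZenML organizes this around a "stack" of swappable components such as an orchestrator and an artifact store. The sketch below illustrates the pattern with a hypothetical `ArtifactStore` interface and two toy backends; none of these classes are ZenML's.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ArtifactStore(ABC):
    """Hypothetical abstraction: pipeline code only talks to this interface."""

    @abstractmethod
    def save(self, name, obj): ...

    @abstractmethod
    def load(self, name): ...


class InMemoryStore(ArtifactStore):
    def __init__(self):
        self._items = {}

    def save(self, name, obj):
        self._items[name] = json.dumps(obj)

    def load(self, name):
        return json.loads(self._items[name])


class LocalFileStore(ArtifactStore):
    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name, obj):
        (self.root / f"{name}.json").write_text(json.dumps(obj))

    def load(self, name):
        return json.loads((self.root / f"{name}.json").read_text())


def training_pipeline(store: ArtifactStore):
    """The same pipeline runs unchanged on any store implementation."""
    store.save("dataset", [1, 2, 3, 4])
    data = store.load("dataset")
    store.save("model", {"mean": sum(data) / len(data)})
    return store.load("model")


results = [training_pipeline(s) for s in (InMemoryStore(), LocalFileStore(tempfile.mkdtemp()))]
```

Moving from a laptop to a cloud bucket would then mean adding one more `ArtifactStore` subclass, with no change to `training_pipeline`.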
More about the tool https://zenml.io/
12. MLRun
MLRun offers an integrated approach to managing your machine learning pipelines, from early development through to your production environment. It brings easy tracking, automation, rapid deployment, management, and scaling of models to your machine learning pipeline, and offers a convenient abstraction layer over a wide variety of technology stacks while letting data engineers and data scientists define features and models.
More about the tool https://docs.mlrun.org/en/latest/quick-start.html
That's it for the free MLOps tools. If I have missed one, let me know and I will update the article.
And if you have had any good :) or bad :( experiences with the MLOps tools listed, feel free to share them in the comment section.