Like its competitors, Microsoft Azure offers solutions for managing data science projects. Azure Machine Learning integrates tools, languages (Python and R) and frameworks (Keras, TensorFlow, mxnet, XGBoost, ONNX, Sci-Kit Learn, etc.) into a work environment based on Azure Cloud Bricks.
Continuation of the article below
With the new capabilities of Azure Machine Learning, Microsoft wants to make the work of data scientists easier. A trend that generalizes with regard to the approaches of other providers and publishers. The first announcement is more about the internal management of the service to manage only one version. The cloud giant does without the Enterprise Edition and puts all functions in the Base Edition. The Enterprise version will be discontinued on January 1, 2021. In the meantime, customer accounts will be migrated to the basic version. Microsoft only calculates the resources used in its cloud.
Of more interest to users are three products that are generally available: Designer, AutoML UI, and ML Labeling Assist.
Tools to simplify the work of data scientists in the Azure cloud
Designer is a no-code, drag-and-drop model making service for no code machine learning. It should facilitate the connection between data sets and “modules” for data preparation and processing (algorithms, format conversion, ETL tools, provision of manual training and inference ML pipelines, etc.). . If the user interface is not identical, the parallelism to Data Science Studios Flow, the platform from Dataiku, is obvious. Only the resources are available on the left side of the user interface. In terms of algorithms, Microsoft offers regression modules (six including random forests, linear regression, or a neural network for regression), two-class and multi-class classification (11 total), and clustering (K-). means). Designer also has recommendation, anomaly detection, and computer vision modules. Algorithms can be developed in R or Python and executed via designers.
Despite its obvious simplicity, designer is not for everyone, although Microsoft does provide advice and recommendations based on use cases. It truly is a handy turnkey studio for the most seasoned data scientist and analyst. This continues to require the intervention of data engineers and data architects to best configure the integrations from the various storage services and Azure databases (Azure Blob Storage, to name just one). Training is conducted on Azure Machine Learning instances while models are deployed to infer Azure Kubernetes Service (AKS).
The AutoML UI tool (referred to as Automated ML in the documentation) is also an interface-based tool for providing regression classification and predictive machine learning models for time series data. Users have the choice between a Python SDK and the “Studio Experience”, which offers a no-code interface. You still need to know what kinds of algorithms to apply, characterize the data and prepare its formats, configure the IT resources (locally or depending on certain conditions on the many Azure options) and then start the automation process for feature engineering . AutoML then compares the parameters and models that best match the exercise that is imposed.
ML Labeling Assist is a tool for automating the labeling of data, especially images for processing by Computer Vision. One of the longest (and often tedious) operations in data science. Here, too, the studio service uses automated models. This tool prompts you to specify a GPU before starting two operations: clustering and pre-labeling. To use it, a minimum of 300 images must be manually tagged, which are contained in records of up to 500,000 images. It can then be checked whether the ML Labeling Assist is correctly following the instructions given.
Telework: Secure access to ML environments
Other features included in Azure Machine Learning are previewed. Two of them seem directly inspired by the teleworking that has been imposed due to the current health crisis. Microsoft is therefore introducing an enhanced RBAC (in English, role-based access control) to better manage access permissions and roles in work environments. Workspace Private Link enables an Azure ML environment to be used from a private IP address in a virtual network (VNet).
Microsoft Azure updates the Mlflow distribution included with Azure Machine Learning. Mlflow is the open source platform for MLOps that manages the lifecycle of end-to-end machine learning models originally developed by DataBricks. Additionally, another version of MLflow is available in Azure Databricks, the Azure version of the Spark-based analytics service (and the ACID layer over a data lake, Delta Lake).
After all, the cloud giant with Azure ML is closer to the AWS strategy with SageMaker than that of Google Cloud with AI Platform. Let’s be clear, it’s mainly the way the tools are offered that differ from each other. On the one hand it is possible to call different functions (Azure and AWS) in one environment, on the other hand you have to create your platform (GCP). But it will really be the cost and use that the CIOs decide in terms of standardizing what is offered between these three actors. Unless they contact Oracle or IBM.