We rely heavily on Microsoft’s cloud platform Azure for our analytics workloads at the Austrian Postal Service. Azure has grown rapidly over the past few years and is adding features at a very fast pace, so it is easy to lose track of which services are (still) offered and which ones one should use.
In this post, I am going to explore and compare different ways to do machine learning in Azure. In particular, I am going to focus on the following services:
- Azure Machine Learning Studio
- Azure Machine Learning Services
- Azure Container Services
For each service, I will cover the features offered, deployment options, pricing model, target users and availability regions as of February 27, 2018.
At the end of the post, I will include a short checklist of points that are (imo) important to consider before choosing a specific service.
Azure Machine Learning Studio
Azure Machine Learning Studio is perhaps the easiest option to experiment with and deploy a machine learning solution. It provides a browser-based drag-and-drop interface that allows users to create a machine learning pipeline by connecting various pre-configured elements.
More advanced users can also include R and Python scripts. The final model can be published as a web service. The service includes various templates, for example for anomaly detection and customer analysis. There is also an R package on CRAN called AzureML that allows you to:
- Connect to your Azure ML workspace
- Up/download datasets
- Publish/consume/update Azure ML web services
You can also connect to AzureML from Python: just run pip install azureml and you are ready to go :) More detailed documentation is available on GitHub.
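A published model is just a web service, so it can also be called over plain HTTP without any Azure-specific package. Below is a minimal sketch that uses only the Python standard library. The service URL, the API key and the input name "input1" are placeholders and assumptions. The JSON layout follows the request-response sample code that Studio generates for its web services, as far as I remember it, so double-check it against your service's API help page.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows, input_name="input1"):
    """Build (but do not send) a request-response call to a published Studio web service.

    Assumption: the payload layout mirrors the sample code Studio generates;
    verify it against the API help page of your own service.
    """
    payload = {
        "Inputs": {
            input_name: {
                "ColumnNames": list(column_names),
                "Values": [list(row) for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,  # the service's API key
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def score(request, timeout=30):
    """Send the request and decode the JSON response returned by the service."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like score(build_scoring_request(SERVICE_URL, API_KEY, ["age", "income"], [[42, 3000]])), where SERVICE_URL, API_KEY and the column names are hypothetical values for your own service. Building the request separately from sending it makes the payload easy to inspect before anything goes over the wire.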
The pricing model has two components:
- Development: a FREE tier and a STANDARD tier (about €8.5 per user/month + €0.85 per compute hour)
- Deployment: a free DEV/TEST tier and STANDARD tiers that charge between €85 and €8,500 per month, which give you a certain number of compute hours, transactions and web services running per month. For more details, check the Azure pricing calculator.
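To get a feeling for these numbers, here is a rough back-of-the-envelope estimate for the STANDARD development tier plus a flat deployment fee. The defaults are the approximate February 2018 prices quoted above. The sketch deliberately ignores the compute hours and transactions included in each deployment tier and any overage charges, so the pricing calculator remains the authoritative source.

```python
def studio_monthly_cost(users, compute_hours, deployment_fee=0.0,
                        per_user=8.5, per_compute_hour=0.85):
    """Rough monthly cost (EUR) of Azure ML Studio on the STANDARD development tier.

    per_user and per_compute_hour default to the approximate Feb 2018 prices;
    deployment_fee is the flat monthly fee of the chosen deployment tier
    (0 for DEV/TEST, roughly 85 to 8,500 for the STANDARD tiers).
    Included quotas and overage charges are not modelled.
    """
    if users < 0 or compute_hours < 0 or deployment_fee < 0:
        raise ValueError("inputs must be non-negative")
    development = users * per_user + compute_hours * per_compute_hour
    return round(development + deployment_fee, 2)
```

For example, two users with 100 compute hours come to about €102 per month for development. With the smallest STANDARD deployment tier (€85) added, the total is about €187.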
West Europe and North Europe are both offered as availability regions. The target users for this service are both data analysts and data scientists.
Azure Machine Learning Services
Microsoft markets Azure ML Services (not Studio!) as an ‘end-to-end, scalable, trusted platform’, which raises the question of why anyone would want to use an untrusted platform :) AzureML Services consists of the following components:
- Workbench: a desktop development environment for data science, which is used to manage the two services below
- Experimentation service
- Model Management service
AzureML Workbench runs only on Windows 10, Windows Server 2016, and macOS Sierra or High Sierra. Installing it also forces you to install Python 3.5, Miniconda, Azure ML Data Profile, Azure ML CLI and a couple of other dependencies such as Jupyter. It therefore seems that the Python interpreter cannot be changed at the moment. At the time of writing, AzureML Workbench does not support R.
To use AzureML Workbench, you first need to create an Experimentation service and a Model Management service in Azure. Workbench is then used to connect to these services.
Workbench supports:
- Data science projects
- Run history of submitted models
- Jupyter notebook integration
- Working with Python IDEs (currently VS Code and PyCharm only)
I found the following graphic buried in Microsoft’s online documentation really helpful:
The Experimentation account contains workspaces, each of which holds one or more projects (1:n). Workspaces are used for sharing and collaborating in Azure ML and, as the name implies, for experimenting with different models. You can execute experiments:
- locally in Python,
- locally in a Docker container,
- remotely in a Docker container or
- in a Spark cluster in Azure
The Model Management service is used to register models and manifests. A manifest is the recipe for building a Docker container image and is generated automatically by Model Management. Furthermore, Model Management is used to deploy models, either locally or as web services, and to monitor their performance. The service is available in West Europe.
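The code that ends up inside such a container image is a scoring script. As far as I know, Azure ML expects this script to define two functions. The first, init(), is called once when the container starts. The second, run(), is called for every scoring request. The exact argument type of run() in the preview depended on the schema you generated, so the JSON-in/JSON-out signature and the {"data": ...} payload below are assumptions. The toy linear model is purely hypothetical.

```python
import json

model = None  # populated once by init() and reused by every run() call


def init():
    """Called once at container start-up: load the model.

    Hypothetical: a real script would deserialize a trained model file here
    (e.g. with pickle); this sketch uses a tiny hard-coded linear model.
    """
    global model
    model = {"weights": [0.5, -0.25], "bias": 1.0}


def run(raw_input):
    """Called per request: parse JSON rows, apply the linear model, return JSON."""
    rows = json.loads(raw_input)["data"]
    predictions = [
        sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        for row in rows
    ]
    return json.dumps({"predictions": predictions})
```

Loading the model in init() rather than in run() means the potentially slow deserialization happens once per container instead of once per request.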
The pricing model consists of two components:
1) Experimentation account + Model Management
2) on top of that, compute, storage and other Azure resources
The pricing section of AzureML Services quotes prices for Experimentation accounts (2 users for free, afterwards €42 per user per month) and for Model Management (from free to about €2,100 per month for the largest tier). The tiers come with “available cores”, but this number is the maximum number of cores the account may have active at any given time; it does NOT include any compute hours. Running the models during experimentation and deployment incurs extra charges on top. Extra charges can also be incurred for any Azure services consumed in conjunction with Azure Machine Learning, such as compute, storage, Azure Container Services/Registry, Azure Key Vault, etc.
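The fixed part of this pricing can be written down in a few lines. This is a rough sketch based on the February 2018 list prices quoted above. The "available cores" of a tier do not enter the formula because they are a cap, not a charge. Compute, storage and the other Azure services have to be estimated separately and passed in as a lump sum.

```python
def ml_services_monthly_cost(users, model_management_fee=0.0, other_azure_costs=0.0,
                             free_users=2, per_extra_user=42.0):
    """Rough monthly cost (EUR) of AzureML Services, based on Feb 2018 list prices.

    - Experimentation account: the first `free_users` seats are free,
      every further user costs `per_extra_user` per month.
    - model_management_fee: flat fee of the chosen Model Management tier
      (from 0 up to roughly 2,100 per month).
    - other_azure_costs: compute, storage, container registry, Key Vault etc.,
      which are billed separately and must be estimated on their own.
    """
    if min(users, model_management_fee, other_azure_costs) < 0:
        raise ValueError("inputs must be non-negative")
    seats = max(0, users - free_users) * per_extra_user
    return round(seats + model_management_fee + other_azure_costs, 2)
```

A team of five data scientists on the free Model Management tier would pay about €126 per month for the Experimentation account alone (3 × €42), before any compute is consumed.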
To sum it up: AzureML Services is basically a wrapper around Docker and Kubernetes that helps data scientists deploy models without having to deal with those technologies explicitly. While I really like this idea, the fact that AzureML Workbench is still in preview shows. Installing Workbench failed three times, and a quick search through Microsoft’s online discussion boards shows that I am far from alone in having issues with the software. Update: As of today (2018-03-31) the installation works :) Overall, I like the concept, and the bugs will probably disappear sooner rather than later. Once they are fixed, AzureML Services is certainly worth a look, especially if you do not want to deal with Docker/Kubernetes yourself and are willing to pay Microsoft to do it for you.
Azure Container Services
With Azure Container Services, you can bring your own Docker container and deploy it using a variety of container orchestrators, such as (fully managed) Kubernetes, Docker Swarm or DC/OS. You can specify which container registry to use (e.g. Azure Container Registry or Docker Hub) and scale your resources as needed.
The pricing model is per resource consumed and the service is available in West Europe. This option offers the greatest flexibility and is cheaper than AzureML Services, because you do not pay Microsoft for holding your hand.
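With this option, what goes into the container is entirely up to you. Here is a minimal sketch of a scoring service that could be packaged into a Docker image, using only the Python standard library. The /score endpoint, the {"data": ...} payload and the toy predict() model are hypothetical. A production service would more likely use a proper web framework behind a WSGI server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(rows):
    """Hypothetical toy model: predicts the sum of each input row."""
    return [sum(row) for row in rows]


class ScoringHandler(BaseHTTPRequestHandler):
    """Accepts POST /score with {"data": [[...], ...]} and returns predictions."""

    def do_POST(self):
        if self.path != "/score":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            rows = json.loads(self.rfile.read(length))["data"]
            body = json.dumps({"predictions": predict(rows)}).encode("utf-8")
        except (ValueError, KeyError, TypeError) as exc:
            # malformed JSON, missing "data" key or non-numeric rows
            self.send_error(400, str(exc))
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def build_server(host="0.0.0.0", port=8080):
    """Create (but do not start) the HTTP server that exposes the model."""
    return HTTPServer((host, port), ScoringHandler)
```

Inside the container, the entry point would simply call build_server().serve_forever(). The orchestrator (e.g. Kubernetes) then takes care of running and scaling replicas of that container. Scaling, registries and networking are exactly the parts AzureML Services would otherwise handle for you.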
Which option to choose depends on multiple factors, since every platform has its pros and cons. The following section provides a few questions you might want to ask yourself before choosing any specific option:
Data science platform checklist
The following checklist is by no means complete, but may be helpful when choosing a data science platform:
- Availability region: Is my company restricted to certain regions (e.g. Germany or West Europe), and is the service available in those regions?
- Does the service offer all necessary libraries for my project and if not, is there an option to install them?
- Does the service support the R/Python runtime version I want to use?
- Does the service provide public endpoints (i.e. can it be reached via a public IP address) and if yes, does my corporate security policy allow such services?
- Is the service GDPR compliant? If in doubt, check with your company's security department and/or the platform vendor.
- How easy is it to port an existing data science model to a different platform, i.e. how strong is the lock-in effect (e.g. from Azure to AWS or back)?
- What kind of load am I expecting and can the service scale appropriately?
- What does pricing look like for different usage scenarios?
- Do we have the technical skills necessary to use a specific option?
Some more interesting resources
I found the following tools/projects interesting, but since they do not depend on any specific platform I include them here for reference only:
- Visual Studio Code Tools for AI
- AI Toolkit for Azure IoT Edge
- MMLSpark: Microsoft Machine Learning for Apache Spark
- AZTK: on-demand, Dockerized Spark jobs on Azure (powered by Azure Batch)