14.11.2017 | Lauri Lehman
Microsoft Ignite 2017 saw some big changes to Microsoft Azure's data science portfolio. These changes improve the way machine learning models are developed in Azure and allow deeper integration of machine learning with Azure applications. With the new tools, developing and experimenting with different models is faster and more efficient than before.
The most important new tools announced at Ignite are Azure ML Workbench, Azure ML Model Management and Visual Studio Code Tools for AI. Azure ML Workbench is a new desktop tool that makes developing and handling models easier than before. Azure ML Model Management is a new service for versioning and managing machine learning models in production. Visual Studio Code Tools for AI offers a toolbox for efficient development and management of ML models in Azure. Read below about my first impressions of these new tools and features.
Thus far, development of machine learning models in Azure has been based on Azure ML Studio, which is a browser-based graphical tool. At Ignite 2017, Microsoft announced "Azure ML Services", which provides a new way to develop machine learning models in the cloud. Model development in Azure ML Services is done in Azure ML Workbench, a new desktop tool for Windows and Mac. AML Workbench is a script-based tool and, unlike AML Studio, does not offer a graphical UI for model development. AML Workbench integrates seamlessly with Azure and allows efficient deployment of ML models into the cloud.
The development language in AML Workbench is Python (Microsoft says that Scala is also supported, but all the examples seem to be in Python at the moment). AML Workbench includes a dedicated Python environment that is isolated from other locally installed Python environments. Only Python version 3 is supported; at the time of writing the installation included Python 3.5.2 from Continuum Analytics. Installing new libraries with pip is very easy, and a decent variety of libraries is available, including several for deep learning.
At first sight, the best feature of AML Workbench is the ability to train and score models locally. In Azure ML Studio, the data must be uploaded as a Dataset in AML Studio or to an external data store such as Azure Blob Storage. In AML Workbench, the input data can be loaded from a local disk and the results can be saved to the local disk. This speeds up the data science process considerably, as the data can be modified instantly and the results can be visualized locally using whichever visualization tools you prefer. The script-based approach also makes it much easier to tinker with models, since the scripts can be modified in an instant and the effects of those changes can be checked immediately.
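To make the local workflow concrete, here is a minimal sketch of a script that loads data from the local disk, trains a model, scores it and writes the results back to the local disk. It is not taken from Microsoft's examples: the file names, the toy data and the helper functions (load_xy, fit_line, score, save_predictions, run_locally) are all made up for illustration, and a plain least-squares line stands in for a real model so that the script runs with nothing but the Python standard library.

```python
import csv
import tempfile
from pathlib import Path


def load_xy(path):
    """Read x and y columns from a local CSV file."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return [float(r["x"]) for r in rows], [float(r["y"]) for r in rows]


def fit_line(xs, ys):
    """Train: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score(model, xs):
    """Score: apply the fitted line to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def save_predictions(path, xs, preds):
    """Write the scored results to a local CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "prediction"])
        writer.writerows(zip(xs, preds))


def run_locally(workdir):
    """Load, train, score and save, all on the local disk."""
    workdir = Path(workdir)
    train_path = workdir / "train.csv"  # hypothetical file name
    # Hypothetical toy data following y = 2x + 1.
    train_path.write_text("x,y\n0,1\n1,3\n2,5\n3,7\n")

    xs, ys = load_xy(train_path)
    model = fit_line(xs, ys)

    new_xs = [4.0, 5.0]
    out_path = workdir / "predictions.csv"  # hypothetical file name
    save_predictions(out_path, new_xs, score(model, new_xs))
    return model, out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model, out_path = run_locally(tmp)
        print("slope, intercept:", model)
        print(out_path.read_text())
```

Because everything lives in one script and reads from the local disk, changing the data or the model and re-running takes seconds, which is the point made above.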
Another exciting feature of AML Workbench is the data wrangling experience. AML Workbench offers a graphical UI for preprocessing and simple transformations of data. Tasks such as removing empty values, filtering values, transforming data types and adding new data fields can now be done easily and intuitively. The data wrangling process is in some ways very similar to that of MS Power BI, which I don't mind at all. According to Microsoft, the data wrangling feature uses AI, but during this short test it was not clear what that means in practice.
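For readers who have not seen these operations spelled out, the four wrangling tasks mentioned above look roughly like this when written by hand in Python. This is only an illustration of the steps, not how AML Workbench implements them; the column names, the sample rows and the wrangle function are hypothetical, and plain Python is used so the sketch runs without extra libraries.

```python
from datetime import date

# Hypothetical raw data, as it might arrive from a CSV file (all strings).
RAW_ROWS = [
    {"city": "Helsinki", "date": "2017-11-14", "temp_c": "3.5"},
    {"city": "Oulu", "date": "2017-11-14", "temp_c": ""},         # empty value
    {"city": "Turku", "date": "2017-11-14", "temp_c": "-999"},    # sentinel value
    {"city": "Tampere", "date": "2017-11-14", "temp_c": "1.0"},
]


def wrangle(rows, min_temp=-50.0):
    """Apply the four wrangling steps to a list of row dictionaries."""
    cleaned = []
    for row in rows:
        # 1. Remove rows that contain empty values.
        if any(value is None or str(value).strip() == "" for value in row.values()):
            continue

        # 2. Transform data types: strings to float and date.
        temp_c = float(row["temp_c"])
        day = date.fromisoformat(row["date"])

        # 3. Filter out implausible values (assumed threshold).
        if temp_c < min_temp:
            continue

        # 4. Add a new data field derived from an existing one.
        cleaned.append({
            "city": row["city"],
            "date": day,
            "temp_c": temp_c,
            "temp_f": temp_c * 9 / 5 + 32,
        })
    return cleaned


if __name__ == "__main__":
    for row in wrangle(RAW_ROWS):
        print(row)
```

The graphical UI lets you build the same kind of step sequence by pointing and clicking instead of writing the loop yourself.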
AML Workbench models can be run locally either "natively" or inside a Docker container. The models created in AML Workbench can be published to the new Azure ML Model Management Service, which is distinct from the familiar AML Studio Web Service. AML Model Management is based on Docker containers, which are widely used in other Azure services as well. Another excellent feature of AML Workbench is the ability to train and score models in Azure HDInsight Spark clusters. This means that you can keep using the same scripts if you want to switch from the AML Experimentation service to HDInsight.
As it is, Microsoft is now offering two parallel PaaS services for machine learning: AML Studio and AML Services (HDInsight is actually a third service, but it integrates well with AML Services). The picture below shows how the new resources look in Azure at the moment. Microsoft has not given a clear signal about its future plans for AML Studio, but I do not believe that it will be completely phased out. AML Studio and AML Services offer very different user experiences, which complement each other. Most likely both will remain in the catalogue in one form or another.
AML Workbench currently integrates with two IDE tools, Visual Studio Code and PyCharm. Read below for more information about VS Code.
Visual Studio Code is Microsoft's lightweight tool for software and script development. It is particularly well suited for developing serverless Azure applications. At Ignite 2017, Microsoft announced VS Code Tools for AI, a new extension that makes developing machine learning models easier and more efficient. ML models can also be developed in AML Workbench, but its code editing features are somewhat lacking. VS Code, by contrast, is a full-blown programming tool with extensive support for Python, including the possibility to debug programs.
Machine learning models can be developed entirely within VS Code, and scripts can be run locally in VS Code just as in AML Workbench. My first impression of the VS Code user experience is that it is very smooth, and I think that I will use VS Code as my principal editor when developing ML models in the future.
One of the best features of VS Code Tools for AI is the integration with Microsoft's model templates. Creating a new project based on a model template is effortless, and the templates are well documented on the web. This is a great way to start investigating and experimenting with unfamiliar models.
If you have an Azure Data Science Virtual Machine at your disposal, it is also possible to train and score models in the cloud. Switching to the cloud only requires creating a new run configuration. Once the run configuration is created, changing execution environments is very easy using a simple dropdown menu. Remote execution uses Docker containers, meaning that the execution environment is packaged together with the script. This makes it possible to run scripts on a different platform from the one they were developed on. The AML Model Management service uses Docker as well, so the process of publishing and managing ML models in Azure is now very streamlined.
All the products discussed in this article are still in the Preview phase (the GA date has not been announced yet, but I would put my bet on late 2017 or early 2018). They are therefore not polished products yet, and getting the examples to work might require some troubleshooting at times. However, I see a lot of potential in these tools and I recommend that every Azure data scientist try them out. Microsoft develops its products based on user feedback, so it is good to be vocal about bugs and missing features. I believe that Microsoft will put significant emphasis on developing these tools in the future, so jump on the bandwagon as early as possible!