diff --git a/bootstrap/README.md b/bootstrap/README.md index bf7842db..443f926a 100644 --- a/bootstrap/README.md +++ b/bootstrap/README.md @@ -1,6 +1,6 @@ # Bootstrap from MLOpsPython repository -To use this existing project structure and scripts for your new ML project, you can quickly get started from the existing repository, bootstrap and create a template that works for your ML project. Bootstrapping will prepare a similar directory structure for your project which includes renaming files and folders, deleting and cleaning up some directories and fixing imports and absolute path based on your project name. This will enable reusing various resources like pre-built pipelines and scripts for your new project. +To use this existing project structure and scripts for your new ML project, you can quickly get started from the existing repository, bootstrap and create a template that works for your ML project. Bootstrapping will prepare a similar directory structure for your project which includes renaming files and folders, deleting and cleaning up some directories and fixing imports and absolute path based on your project name. This will enable reusing various resources like pre-built pipelines and scripts for your new project. ## Generating the project structure @@ -10,14 +10,19 @@ To bootstrap from the existing MLOpsPython repository clone this repository, ens Where `[dirpath]` is the absolute path to the root of your directory where MLOps repo is cloned and `[projectname]` is the name of your ML project. -The script renames folders, files and files' content from the base project name `diabetes` to your project name. However, you might need to manually rename variables defined in a variable group and their values. +The script renames folders, files and files' content from the base project name `diabetes` to your project name. However, you might need to manually rename variables defined in a variable group and their values. [This article](https://docs.microsoft.com/azure/machine-learning/tutorial-convert-ml-experiment-to-production#use-your-own-model-with-mlopspython-code-template) will also help you use this code template for your own ML project. +### Using an existing dataset + +The training ML pipeline uses a [sample diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) as training data. To use your own data, you need to [create a Dataset](https://docs.microsoft.com/azure/machine-learning/how-to-create-register-datasets) in your workspace and add a DATASET_NAME variable in the ***devopsforai-aml-vg*** variable group with the Dataset name. You'll also need to modify the test cases in the **ml_service/util/smoke_test_scoring_service.py** script to match the schema of the training features in your dataset. + ## Customizing the CI and AML environments In your project you will want to customize your own Docker image and Conda environment to use only the dependencies and tools required for your use case. This requires you to edit the following environment definition files: + - The Azure ML training and scoring Conda environment defined in [conda_dependencies.yml](diabetes_regression/conda_dependencies.yml). - The CI Docker image and Conda environment used by the Azure DevOps build agent. See [instructions for customizing the Azure DevOps job container](../docs/custom_container.md).
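Below is a minimal sketch (not part of the MLOpsPython scripts themselves) of how a Conda specification such as `conda_dependencies.yml` can be loaded into an Azure ML environment with the `azureml-core` SDK; the workspace lookup and the environment name are illustrative assumptions, and the repository's own pipeline-building code may wire this up differently.

```python
# Sketch only: build an Azure ML Environment from the Conda specification file.
# Assumes azureml-core is installed and a workspace config.json is available locally.
from azureml.core import Environment, Workspace

ws = Workspace.from_config()  # loads workspace details from a local config.json

# Create an Environment object from the same Conda file used for training and scoring.
training_env = Environment.from_conda_specification(
    name="my-project-training-env",  # illustrative name, adjust for your project
    file_path="diabetes_regression/conda_dependencies.yml",
)

# Optionally register it so that runs and deployments can reference it by name.
training_env.register(workspace=ws)
print(f"Registered environment: {training_env.name}")
```

Keeping the packages pinned in this file aligned with the CI image is what the note that follows about synchronizing dependency versions refers to.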
-You will want to synchronize dependency versions as appropriate between both environment definitions (for example, ML libraries used both in training and in unit tests). \ No newline at end of file +You will want to synchronize dependency versions as appropriate between both environment definitions (for example, ML libraries used both in training and in unit tests). diff --git a/docs/getting_started.md b/docs/getting_started.md index 5bbfe0bb..0638e21c 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -1,238 +1,206 @@ -# Getting Started with this Repo -## Create an Azure DevOps organization +# Getting Started with MLOpsPython -We use Azure DevOps for running our multi-stage pipeline with build(CI), ML training and scoring service release -(CD) stages. If you don't already have an Azure DevOps organization, create one by -following the instructions [here](https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/create-organization?view=azure-devops). +This guide shows how to get MLOpsPython working with a sample ML project ***diabetes_regression***. The project creates a linear regression model to predict diabetes. You can adapt this example to use with your own project. -If you already have an Azure DevOps organization, create a [new project](https://docs.microsoft.com/en-us/azure/devops/organizations/projects/create-project?view=azure-devops). +We recommend working through this guide completely to ensure everything is working in your environment. After the sample is working, follow the [bootstrap instructions](../bootstrap/README.md) to convert the ***diabetes_regression*** sample into a starting point for your project. -## Decide best option to copy repository code +- [Setting up Azure DevOps](#setting-up-azure-devops) +- [Get the code](#get-the-code) +- [Create a Variable Group for your Pipeline](#create-a-variable-group-for-your-pipeline) + - [Variable Descriptions](#variable-descriptions) +- [Provisioning resources using Azure Pipelines](#provisioning-resources-using-azure-pipelines) + - [Create an Azure DevOps Service Connection for the Azure Resource Manager](#create-an-azure-devops-service-connection-for-the-azure-resource-manager) + - [Create the IaC Pipeline](#create-the-iac-pipeline) +- [Create an Azure DevOps Service Connection for the Azure ML Workspace](#create-an-azure-devops-service-connection-for-the-azure-ml-workspace) +- [Set up Build, Release Trigger, and Release Multi-Stage Pipeline](#set-up-build-release-trigger-and-release-multi-stage-pipeline) + - [Set up the Pipeline](#set-up-the-pipeline) +- [Further Exploration](#further-exploration) + - [Deploy the model to Azure Kubernetes Service](#deploy-the-model-to-azure-kubernetes-service) + - [Deploy the model to Azure App Service (Azure Web App for containers)](#deploy-the-model-to-azure-app-service-azure-web-app-for-containers) + - [Example pipelines using R](#example-pipelines-using-r) + - [Observability and Monitoring](#observability-and-monitoring) + - [Clean up the example resources](#clean-up-the-example-resources) +- [Next Steps: Integrating your project](#next-steps-integrating-your-project) + - [Additional Variables and Configuration](#additional-variables-and-configuration) + - [More variable options](#more-variable-options) + - [Local configuration](#local-configuration) -* Fork this repository if there is a desire to contribute back to the repository else -* Use this [code template](https://github.com/microsoft/MLOpsPython/generate) which copies the entire code base to your 
own GitHub location with the git commit history restarted. This can be used for learning and following the guide. +## Setting up Azure DevOps -This repository contains a template and demonstrates how to apply it to a sample ML project ***diabetes_regression*** that creates a linear regression model to predict the diabetes. +You'll use Azure DevOps for running the multi-stage pipeline with build, model training, and scoring service release stages. If you don't already have an Azure DevOps organization, create one by following the instructions at [Quickstart: Create an organization or project collection](https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/create-organization?view=azure-devops). -If the desire is to adopt this template for your project and to use it with your machine learning code, it is recommended to go through this guide as it is first. This ensures everything is working on your environment. After the sample is working, follow the [bootstrap instructions](../bootstrap/README.md) to convert the ***diabetes_regression*** sample into your project starting point. +If you already have an Azure DevOps organization, create a new project using the guide at [Create a project in Azure DevOps and TFS](https://docs.microsoft.com/en-us/azure/devops/organizations/projects/create-project?view=azure-devops). -## Create a Variable Group for your Pipeline - -We make use of a variable group inside Azure DevOps to store variables and their -values that we want to make available across multiple pipelines or pipeline stages. You can either -store the values directly in [Azure DevOps](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=designer#create-a-variable-group) -or connect to an Azure Key Vault in your subscription. Please refer to the -documentation [here](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=designer#create-a-variable-group) to -learn more about how to create a variable group and -[link](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=designer#use-a-variable-group) it to your pipeline. -Click on **Library** in the **Pipelines** section as indicated below: +## Get the code -![library_variable groups](./images/library_variable_groups.png) +We recommend using the [repository template](https://github.com/microsoft/MLOpsPython/generate), which effectively forks the repository to your own GitHub location and squashes the history. You can use the resulting repository for this guide and for your own experimentation. -Create a variable group named **``devopsforai-aml-vg``**. The YAML pipeline definitions in this repository refer to this variable group by name. - -The variable group should contain the following required variables: +## Create a Variable Group for your Pipeline -| Variable Name | Suggested Value | -| ------------------------ | ------------------------ | -| BASE_NAME | [unique base name] | -| LOCATION | centralus | -| RESOURCE_GROUP | mlops-RG | -| WORKSPACE_NAME | mlops-AML-WS | -| AZURE_RM_SVC_CONNECTION | azure-resource-connection| -| WORKSPACE_SVC_CONNECTION | aml-workspace-connection | -| ACI_DEPLOYMENT_NAME | diabetes-aci | +MLOpsPython requires some variables to be set before you can run any pipelines. You'll need to create a *variable group* in Azure DevOps to store values that are reused across multiple pipelines or pipeline stages. 
Either store the values directly in [Azure DevOps](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=designer#create-a-variable-group) or connect to an Azure Key Vault in your subscription. Check out the [Add & use variable groups](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml#use-a-variable-group) documentation to learn more about how to create a variable group and link it to your pipeline. -**Note:** +Navigate to **Library** in the **Pipelines** section as indicated below: -The **WORKSPACE_NAME** parameter is used for the Azure Machine Learning Workspace creation. You can provide an existing AML Workspace here if you have one. +![Library Variable Groups](./images/library_variable_groups.png) -The **BASE_NAME** parameter is used throughout the solution for naming -Azure resources. When the solution is used in a shared subscription, there can -be naming collisions with resources that require unique names like azure blob -storage and registry DNS naming. Make sure to give a unique value to the -BASE_NAME variable (e.g. MyUniqueML), so that the created resources will have -unique names (e.g. MyUniqueMLamlcr, MyUniqueML-AML-KV, etc.). The length of -the BASE_NAME value should not exceed 10 characters and it should contain numbers and letters only. +Create a variable group named **``devopsforai-aml-vg``**. The YAML pipeline definitions in this repository refer to this variable group by name. -The **RESOURCE_GROUP** parameter is used as the name for the resource group that will hold the Azure resources for the solution. If providing an existing AML Workspace, set this value to the corresponding resource group name. +The variable group should contain the following required variables: -The **AZURE_RM_SVC_CONNECTION** parameter is used by the [Azure DevOps pipeline]((../environment_setup/iac-create-environment-pipeline.yml)) that creates the Azure ML workspace and associated resources through Azure Resource Manager. The pipeline requires an **Azure Resource Manager** -[service connection](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml#create-a-service-connection). +| Variable Name | Suggested Value | Short description | +| ------------------------ | ------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | +| BASE_NAME | [your project name] | Unique naming prefix for created resources - max 10 chars, letters and numbers only | +| LOCATION | centralus | Azure location | +| RESOURCE_GROUP | mlops-RG | Azure Resource Group | +| WORKSPACE_NAME | mlops-AML-WS | Azure ML Workspace name | +| AZURE_RM_SVC_CONNECTION | azure-resource-connection | [Azure Resource Manager Service Connection](#create-an-azure-devops-service-connection-for-the-azure-resource-manager) name | +| WORKSPACE_SVC_CONNECTION | aml-workspace-connection | [Azure ML Workspace Service Connection](#create-an-azure-devops-service-connection-for-the-azure-ml-workspace) name | +| ACI_DEPLOYMENT_NAME | diabetes-aci | Azure Container Instances deployment name | -![create service connection](./images/create-rm-service-connection.png) +Make sure you select the **Allow access to all pipelines** checkbox in the variable group configuration. -Leave the **``Resource Group``** field empty.
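As a rough, hypothetical illustration of how the values in the table above reach scripts at run time: Azure Pipelines exposes variable group entries to job steps as environment variables, so Python utility code can read them along the following lines. The helper function and the fallback values here are assumptions for this sketch, not the repository's actual utility.

```python
# Sketch only: read pipeline variables that Azure Pipelines exposes as environment variables.
# The variable names match the table above; the fallback values are purely illustrative.
import os


def get_pipeline_variables() -> dict:
    """Collect the variable group values this example pipeline relies on."""
    return {
        "base_name": os.environ.get("BASE_NAME", ""),
        "location": os.environ.get("LOCATION", "centralus"),
        "resource_group": os.environ.get("RESOURCE_GROUP", "mlops-RG"),
        "workspace_name": os.environ.get("WORKSPACE_NAME", "mlops-AML-WS"),
        "aci_deployment_name": os.environ.get("ACI_DEPLOYMENT_NAME", "diabetes-aci"),
    }


if __name__ == "__main__":
    variables = get_pipeline_variables()
    print(f"Workspace '{variables['workspace_name']}' in resource group '{variables['resource_group']}'")
```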
+More variables are available for further tweaking, but the above variables are all you need to get started with this example. For more information, see the [Additional Variables and Configuration](#additional-variables-and-configuration) section. -**Note:** Creating the ARM service connection scope requires 'Owner' or 'User Access Administrator' permissions on the subscription. -You must also have sufficient permissions to register an application with -your Azure AD tenant, or receive the ID and secret of a service principal -from your Azure AD Administrator. That principal must have 'Contributor' -permissions on the subscription. +### Variable Descriptions -The **WORKSPACE_SVC_CONNECTION** parameter is used to reference a service connection for the Azure ML workspace. You will create this after provisioning the workspace (we recommend using the IaC pipeline as described below), and installing the Azure ML extension in your Azure DevOps project. +**WORKSPACE_NAME** is used for creating the Azure Machine Learning Workspace. You can provide an existing Azure ML Workspace here if you've got one. -Optionally, a **DATASET_NAME** parameter can be used to reference a training dataset that you have registered in your Azure ML workspace (more details below). +**BASE_NAME** is used as a prefix for naming Azure resources. When sharing an Azure subscription, the prefix allows you to avoid naming collisions for resources that require unique names, for example, Azure Blob Storage and Registry DNS. Make sure to set BASE_NAME to a unique name so that created resources will have unique names, for example, MyUniqueMLamlcr, MyUniqueML-AML-KV, and so on. The BASE_NAME value shouldn't exceed 10 characters and must contain letters and numbers only. -Make sure to select the **Allow access to all pipelines** checkbox in the -variable group configuration. +**RESOURCE_GROUP** is used as the name for the resource group that will hold the Azure resources for the solution. If providing an existing Azure ML Workspace, set this value to the corresponding resource group name. -## More variable options +**AZURE_RM_SVC_CONNECTION** is used by the [Azure Pipeline](../environment_setup/iac-create-environment-pipeline.yml) in Azure DevOps that creates the Azure ML workspace and associated resources through Azure Resource Manager. You'll create the connection in a [step below](#create-an-azure-devops-service-connection-for-the-azure-resource-manager). -There are more variables used in the project. They're defined in two places, one for local execution and one for using Azure DevOps Pipelines. +**WORKSPACE_SVC_CONNECTION** is used to reference a [service connection for the Azure ML workspace](#create-an-azure-devops-service-connection-for-the-azure-ml-workspace). You'll create the connection after [provisioning the workspace](#provisioning-resources-using-azure-pipelines) in the [Create an Azure DevOps Service Connection for the Azure ML Workspace](#create-an-azure-devops-service-connection-for-the-azure-ml-workspace) section below. -### Local configuration +## Provisioning resources using Azure Pipelines -For instructions on how to set up a local development environment, refer to the [Development environment setup instructions](development_setup.md). +The easiest way to create all required Azure resources (Resource Group, Azure ML Workspace, Container Registry, and others) is to use the **Infrastructure as Code (IaC)** [pipeline in this repository](../environment_setup/iac-create-environment-pipeline.yml).
The pipeline takes care of setting up all required resources based on these [Azure Resource Manager templates](../environment_setup/arm-templates/cloud-environment.json). -### Azure DevOps configuration +### Create an Azure DevOps Service Connection for the Azure Resource Manager -For using Azure DevOps Pipelines all other variables are stored in the file `.pipelines/diabetes_regression-variables-template.yml`. Using the default values as a starting point, adjust the variables to suit your requirements. +The [IaC provisioning pipeline](../environment_setup/iac-create-environment-pipeline.yml) requires an **Azure Resource Manager** [service connection](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml#create-a-service-connection). -**Note:** In `diabetes_regression` folder you can find `config.json` file that we would recommend to use in order to provide parameters for training, evaluation and scoring scripts. An example of a such parameter is a hyperparameter of a training algorithm: in our case it's the ridge regression [*alpha* hyperparameter](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html). We don't provide any special serializers for this config file. So, it's up to you which template to support there. +![Create service connection](./images/create-rm-service-connection.png) -Up until now you should have: - -* Forked (or cloned) the repo -* Configured an Azure DevOps project with a service connection to Azure Resource Manager -* Set up a variable group with all configuration values - -## Create Resources with Azure Pipelines +Leave the **``Resource Group``** field empty. -The easiest way to create all required resources (Resource Group, ML Workspace, -Container Registry, Storage Account, etc.) is to leverage an -"Infrastructure as Code" [pipeline in this repository](../environment_setup/iac-create-environment-pipeline.yml). This **IaC** pipeline takes care of setting up -all required resources based on these [ARM templates](../environment_setup/arm-templates/cloud-environment.json). +**Note:** Creating the Azure Resource Manager service connection scope requires 'Owner' or 'User Access Administrator' permissions on the subscription. +You'll also need sufficient permissions to register an application with your Azure AD tenant, or you can get the ID and secret of a service principal from your Azure AD Administrator. That principal must have 'Contributor' permissions on the subscription.
-### Create a Build IaC Pipeline +### Create the IaC Pipeline In your Azure DevOps project, create a build pipeline from your forked repository: -![build connnect step](./images/build-connect.png) +![Build connect step](./images/build-connect.png) Select the **Existing Azure Pipelines YAML file** option and set the path to [/environment_setup/iac-create-environment-pipeline.yml](../environment_setup/iac-create-environment-pipeline.yml): -![configure step](./images/select-iac-pipeline.png) +![Configure step](./images/select-iac-pipeline.png) Having done that, run the pipeline: -![iac run](./images/run-iac-pipeline.png) +![IaC run](./images/run-iac-pipeline.png) -Check out the newly created resources in the [Azure Portal](https://portal.azure.com): +Check that the newly created resources appear in the [Azure Portal](https://portal.azure.com): -![created resources](./images/created-resources.png) +![Created resources](./images/created-resources.png) -(Optional) To remove the resources created for this project you can use the [/environment_setup/iac-remove-environment-pipeline.yml](../environment_setup/iac-remove-environment-pipeline.yml) definition or you can just delete the resource group in the [Azure Portal](https://portal.azure.com). +## Create an Azure DevOps Service Connection for the Azure ML Workspace -**Note:** The training ML pipeline uses a [sample diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) as training data. To use your own data, you need to [create a Dataset](https://docs.microsoft.com/azure/machine-learning/how-to-create-register-datasets) in your workspace and specify its name in a DATASET_NAME variable in the ***devopsforai-aml-vg*** variable group. You will also need to modify the test cases in the **ml_service/util/smoke_test_scoring_service.py** script to match the schema of the training features in your dataset. +At this point, you should have an Azure ML Workspace created. Similar to the Azure Resource Manager service connection, you need to create an additional one for the Azure ML Workspace. -## Create an Azure DevOps Azure ML Workspace Service Connection +Install the **Azure Machine Learning** extension to your Azure DevOps organization from the [Visual Studio Marketplace](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.vss-services-azureml). The extension is required for the service connection. -Install the **Azure Machine Learning** extension to your organization from the -[marketplace](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.vss-services-azureml), -so that you can set up a service connection to your AML workspace. +Create a new service connection to your Azure ML Workspace using the [Machine Learning Extension](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.vss-services-azureml) instructions to enable executing the Azure ML training pipeline. The connection name needs to match `WORKSPACE_SVC_CONNECTION` that you set in the variable group above. -Create a service connection to your ML workspace via the [Azure DevOps Azure ML task instructions](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.vss-services-azureml) to be able to execute the Azure ML training pipeline. The connection name specified here needs to be used for the value of the `WORKSPACE_SVC_CONNECTION` set in the variable group above. 
+![Created resources](./images/ml-ws-svc-connection.png) -![created resources](./images/ml-ws-svc-connection.png) - -**Note:** Creating service connection with Azure Machine Learning workspace scope requires 'Owner' or 'User Access Administrator' permissions on the Workspace. -You must also have sufficient permissions to register an application with -your Azure AD tenant, or receive the ID and secret of a service principal -from your Azure AD Administrator. That principal must have Contributor -permissions on the Azure ML Workspace. +**Note:** Similar to the Azure Resource Manager service connection you created earlier, creating a service connection with Azure Machine Learning workspace scope requires 'Owner' or 'User Access Administrator' permissions on the Workspace. +You'll need sufficient permissions to register an application with your Azure AD tenant, or you can get the ID and secret of a service principal from your Azure AD Administrator. That principal must have Contributor permissions on the Azure ML Workspace. ## Set up Build, Release Trigger, and Release Multi-Stage Pipeline -Now that you have all the required resources created from the IaC pipeline, -you can set up the pipeline necessary for deploying your ML model -to production. The pipeline has a sequence of stages for: +Now that you've provisioned all the required Azure resources and service connections, you can set up the pipeline for deploying your machine learning model to production. The pipeline has a sequence of stages for: -1. **Model Code Continuous Integration:** triggered on code change to master branch on GitHub, -performs linting, unit testing and publishes a training pipeline. +1. **Model Code Continuous Integration:** triggered on code changes to master branch on GitHub. Runs linting, unit tests, code coverage and publishes a training pipeline. 1. **Train Model**: invokes the Azure ML service to trigger the published training pipeline to train, evaluate, and register a model. -1. **Release Deployment:** deploys a model to ACI, AKS and Azure App Service environments. +1. **Release Deployment:** deploys a model to either [Azure Container Instances (ACI)](https://azure.microsoft.com/en-us/services/container-instances/), [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service), or [Azure App Service](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-app-service) environments. For simplicity, you're going to initially focus on Azure Container Instances. See [Further Exploration](#further-exploration) for other deployment types. + 1. **Note:** Edit the pipeline definition to remove unused stages. For example, if you're deploying to Azure Container Instances and Azure Kubernetes Service only, delete the unused `Deploy_Webapp` stage. ### Set up the Pipeline -In your [Azure DevOps](https://dev.azure.com) project create and run a new build -pipeline referring to the [diabetes_regression-ci.yml](../.pipelines/diabetes_regression-ci.yml) -pipeline definition in your forked repository: +In your Azure DevOps project, create and run a new build pipeline based on the [diabetes_regression-ci.yml](../.pipelines/diabetes_regression-ci.yml) +pipeline definition in your forked repository. 
+ +![Configure CI build pipeline](./images/ci-build-pipeline-configure.png) -![configure ci build pipeline](./images/ci-build-pipeline-configure.png) +Once the pipeline is finished, check the execution result: -Once the pipeline is finished, explore the execution result: +![Build](./images/multi-stage-aci.png) -![build](./images/multi-stage-aci.png) +Also check the published training pipeline in the **mlops-AML-WS** workspace in [Azure Portal](https://portal.azure.com/): -and check out the published training pipeline in the **mlops-AML-WS** workspace in [Azure Portal](https://portal.azure.com/): +![Training pipeline](./images/training-pipeline.png) -![training pipeline](./images/training-pipeline.png) +Great, you now have the build pipeline set up which automatically triggers every time there's a change in the master branch! -Great, you now have the build pipeline set up which automatically triggers every time there's a change in the master branch. +* The first stage of the pipeline, **Model CI**, does linting, unit testing, code coverage, building, and publishes an **ML Training Pipeline** in an **ML Workspace**. -* The first stage of the pipeline, **Model CI**, performs linting, unit testing, build and publishes an **ML Training Pipeline** in an **ML Workspace**. +* The second stage of the pipeline, **Train model**, triggers the run of the Azure ML training pipeline. The training pipeline will train, evaluate, and register a new model. The actual computation happens on an [Azure Machine Learning Compute cluster](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute). In Azure DevOps, the stage runs an agentless job that waits for the completion of the Azure ML job. This allows the pipeline to wait for training completion for hours or even days without using agent resources. - **Note:** The build pipeline also supports building and publishing ML -pipelines using R to train a model. This is enabled -by changing the `build-train-script` pipeline variable to either of: -* `diabetes_regression_build_train_pipeline_with_r.py` to train a model -with R on Azure ML Compute. You will also need to uncomment (i.e. include) the -`r-essentials` Conda packages in the environment definition -`diabetes_regression/conda_dependencies.yml`. -* `diabetes_regression_build_train_pipeline_with_r_on_dbricks.py` -to train a model with R on Databricks. You will need -to manually create a Databricks cluster and attach it to the ML Workspace as a -compute (Values DB_CLUSTER_ID and DATABRICKS_COMPUTE_NAME variables should be -specified). Example ML pipelines using R have a single step to train a model. They don't demonstrate how to evaluate and register a model. The evaluation and registering techniques are shown only in the Python implementation. +* **Note:** If the model evaluation determines that the new model doesn't perform any better than the previous one, the new model won't register and the pipeline will be **canceled**. + * In this case, you'll see a message in the 'Train Model' job under the 'Determine if evaluation succeeded and new model is registered' step saying '**Model was not registered for this run.**' + * See [evaluate_model.py](../diabetes_regression/evaluate/evaluate_model.py#L118) for the evaluation logic and [diabetes_regression_verify_train_pipeline.py](../ml_service/pipelines/diabetes_regression_verify_train_pipeline.py#L54) for the pipeline reporting logic. 
+ * [Additional Variables and Configuration](#additional-variables-and-configuration) for configuring this and other behavior. -* The second stage of the pipeline, **Train model**, triggers the run of the ML Training Pipeline. The training pipeline will train, evaluate, and register a new model. The actual computation is performed in an [Azure Machine Learning Compute cluster](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute). In Azure DevOps, this stage runs an agentless job that waits for the completion of the Azure ML job, allowing the pipeline to wait for training completion for hours or even days without using agent resources. +* The third stage of the pipeline, **Deploy to ACI**, deploys the model to the QA environment in [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/). After deployment, it runs a *smoke test* for validation. The test sends a sample query to the scoring web service and verifies that it returns the expected response. Have a look at the [smoke test code](../ml_service/util/smoke_test_scoring_service.py) for an example. -**Note:** If the model evaluation determines that the new model does not perform better than the previous one then the new model will not be registered and the pipeline will be cancelled. +The pipeline uses a Docker container on the Azure Pipelines agents to accomplish the pipeline steps. The container image ***mcr.microsoft.com/mlops/python:latest*** is built with [this Dockerfile](../environment_setup/Dockerfile) and has all the necessary dependencies installed for MLOpsPython and ***diabetes_regression***. This image is an example of a custom Docker image with a pre-baked environment. The environment is guaranteed to be the same on any building agent, VM, or local machine. In your project, you'll want to build your own Docker image that only contains the dependencies and tools required for your use case. Your image will probably be smaller and faster, and it will be maintained by your team. -* The third stage of the pipeline, **Deploy to ACI**, deploys the model to the QA environment in [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/). It then runs a *smoke test* to validate the deployment, i.e. sends a sample query to the scoring web service and verifies that it returns a response in the expected format. +After the pipeline is finished, you'll see a new model in the **ML Workspace**: -The pipeline uses a Docker container on the Azure Pipelines agents to accomplish the pipeline steps. The image of the container ***mcr.microsoft.com/mlops/python:latest*** is built with this [Dockerfile](../environment_setup/Dockerfile) and it has all necessary dependencies installed for the purposes of this repository. This image serves as an example of using a custom Docker image that provides a pre-baked environment. This environment is guaranteed to be the same on any building agent, VM or local machine. In your project you will want to build your own Docker image that only contains the dependencies and tools required for your use case. This image will be more likely smaller and therefore faster, and it will be totally maintained by your team. 
+![Trained model](./images/trained-model.png) -Wait until the pipeline finishes and verify that there is a new model in the **ML Workspace**: +To disable the automatic trigger of the training pipeline, change the `auto-trigger-training` variable as listed in the `.pipelines\diabetes_regression-ci.yml` pipeline to `false`. You can also override the variable at runtime execution of the pipeline. -![trained model](./images/trained-model.png) +To skip model training and registration, and deploy a model successfully registered by a previous build (for testing changes to the score file or inference configuration), add the variable `MODEL_BUILD_ID` when the pipeline is queued, and set the value to the ID of the previous build. -To disable the automatic trigger of the training pipeline, change the `auto-trigger-training` variable as listed in the `.pipelines\diabetes_regression-ci.yml` pipeline to `false`. This can also be overridden at runtime execution of the pipeline. +## Further Exploration -To skip model training and registration, and deploy a model successfully registered by a previous build (for testing changes to the score file or inference configuration), add the variable `MODEL_BUILD_ID` when the pipeline is queued, and set the value to the id of the previous build. +You should now have a working pipeline that can get you started with MLOpsPython. Below are some additional features offered that might suit your scenario. -### Deploy the Model to Azure Kubernetes Service +### Deploy the model to Azure Kubernetes Service -The final stage is to deploy the model to the production environment running on -[Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service). +MLOpsPython also can deploy to [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service). -**Note:** Creating a Kubernetes cluster on AKS is out of scope of this -tutorial, but you can find set up information -[here](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal#create-an-aks-cluster). +Creating a cluster on Azure Kubernetes Service is out of scope of this tutorial, but you can find set up information on the [Quickstart: Deploy an Azure Kubernetes Service (AKS) cluster using the Azure portal](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal#create-an-aks-cluster) page. -**Note:** If your target deployment environment is a K8s cluster and you want to implement Canary and/or A/B testing deployment strategies check out this [tutorial](./canary_ab_deployment.md). +**Note:** If your target deployment environment is a Kubernetes cluster and you want to implement Canary and/or A/B testing deployment strategies, check out this [tutorial](./canary_ab_deployment.md). -In the Variables tab, edit your variable group (`devopsforai-aml-vg`). In the variable group definition, add the following variables: +Keep the Azure Container Instances deployment active because it's a lightweight way to validate changes before deploying to Azure Kubernetes Service. + +In the Variables tab, edit your variable group (`devopsforai-aml-vg`). In the variable group definition, add these variables: | Variable Name | Suggested Value | | ------------------- | --------------- | | AKS_COMPUTE_NAME | aks | | AKS_DEPLOYMENT_NAME | diabetes-aks | -Set **AKS_COMPUTE_NAME** to the *Compute name* of the Inference Cluster referencing your AKS cluster in your Azure ML Workspace. 
+Set **AKS_COMPUTE_NAME** to the *Compute name* of the Inference Cluster that references the Azure Kubernetes Service cluster in your Azure ML Workspace. After successfully deploying to Azure Container Instances, the next stage will deploy the model to Kubernetes and run a smoke test. ![build](./images/multi-stage-aci-aks.png) -## Deploy the Model to Azure App Service (Azure Web App for containers) +Consider enabling [manual approvals](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/approvals) before the deployment stages. + +### Deploy the model to Azure App Service (Azure Web App for containers) -Note: This is an optional step and can be used only if you are [deploying your -scoring service on Azure App Service](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-app-service). +If you want to deploy your scoring service as an [Azure App Service](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-app-service) instead of Azure Container Instances and Azure Kubernetes Service, follow these additional steps. -In the Variables tab, edit your variable group (`devopsforai-aml-vg`). In the variable group definition, add the following variable: +In the Variables tab, edit your variable group (`devopsforai-aml-vg`) and add a variable: | Variable Name | Suggested Value | | ---------------------- | ---------------------- | @@ -242,43 +210,63 @@ Set **WEBAPP_DEPLOYMENT_NAME** to the name of your Azure Web App. This app must Delete the **ACI_DEPLOYMENT_NAME** variable. -The pipeline uses the [Create Image Script](../ml_service/util/create_scoring_image.py) -to create a scoring image. The image -created by this script will be registered under Azure Container Registry (ACR) -instance that belongs to Azure Machine Learning Service. Any dependencies that -scoring file depends on can also be packaged with the container with Image -config. -[Learn more on how to create a container with AML SDK](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.image.image.image?view=azure-ml-py#create-workspace--name--models--image-config-). +The pipeline uses the [Create Image Script](../ml_service/util/create_scoring_image.py) to create a scoring image. The image will be registered under an Azure Container Registry instance that belongs to the Azure Machine Learning Service. Any dependencies that the scoring file depends on can also be packaged with the container with an image config. Learn more about how to create a container using the Azure ML SDK with the [Image class](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.image.image.image?view=azure-ml-py#create-workspace--name--models--image-config-) API documentation. -Make sure your webapp has the credentials to pull the image from the Azure Container Registry created by the Infrastructure as Code pipeline. You could do this by following the instructions in the section [Configure registry credentials in web app](https://docs.microsoft.com/en-us/azure/devops/pipelines/targets/webapp-on-container-linux?view=azure-devops&tabs=dotnet-core%2Cyaml#configure-registry-credentials-in-web-app). Note that you must have run the pipeline once (including the Deploy to Webapp stage up to the `Create scoring image` step) so that an image is present in the registry, before you can connect the Webapp to the Azure Container Registry in the Azure Portal. 
+Make sure your webapp has the credentials to pull the image from the Azure Container Registry created by the Infrastructure as Code pipeline. Instructions can be found on the [Configure registry credentials in web app](https://docs.microsoft.com/en-us/azure/devops/pipelines/targets/webapp-on-container-linux?view=azure-devops&tabs=dotnet-core%2Cyaml#configure-registry-credentials-in-web-app) page. You'll need to run the pipeline once (including the Deploy to Webapp stage up to the `Create scoring image` step) so an image is present in the registry. After that, you can connect the Webapp to the Azure Container Registry in the Azure Portal. ![build](./images/multi-stage-webapp.png) -## Next steps - -* You may wish to follow the [bootstrap instructions](../bootstrap/README.md) to create a starting point for your project use case. -* Use the [Convert ML experimental code to production code](https://docs.microsoft.com/azure/machine-learning/tutorial-convert-ml-experiment-to-production#use-your-own-model-with-mlopspython-code-template) tutorial which explains how to bring your machine learning code on top of this template. -* The provided pipeline definition YAML file is a sample starting point, which you should tailor to your processes and environment. -* You should edit the pipeline definition to remove unused stages. For example, if you are deploying to ACI and AKS, you should delete the unused `Deploy_Webapp` stage. -* You may wish to enable [manual approvals](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/approvals) before the deployment stages. -* You may want to use [Azure DevOps self-hosted agents](https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#install) to speed up your ML pipeline execution. The Docker container image for the ML pipeline is sizable, and having it cached on the agent between runs can trim several minutes from your runs. -* You can install additional Conda or pip packages by modifying the YAML environment configurations under the `diabetes_regression` directory. Make sure to use fixed version numbers for all packages to ensure reproducibility, and use the same versions across environments. -* You can explore aspects of model observability in the solution, such as: - * **Logging**: navigate to the Application Insights instance linked to the Azure ML Portal, - then to the Logs (Analytics) pane. The following sample query correlates HTTP requests with custom logs - generated in `score.py`, and can be used for example to analyze query duration vs. scoring batch size: - - let Traceinfo=traces - | extend d=parse_json(tostring(customDimensions.Content)) - | project workspace=customDimensions.["Workspace Name"], - service=customDimensions.["Service Name"], - NumberOfPredictions=tostring(d.NumberOfPredictions), - id=tostring(d.RequestId), - TraceParent=tostring(d.TraceParent); - requests - | project timestamp, id, success, resultCode, duration - | join kind=fullouter Traceinfo on id - | project-away id1 - - * **Distributed tracing**: The smoke test client code sets an HTTP `traceparent` header (per the [W3C Trace Context proposed specification](https://www.w3.org/TR/trace-context-1)), and the `score.py` code logs this header. The query above shows how to surface this value. You can adapt this to your tracing framework. 
- * **Monitoring**: You can use [Azure Monitor for containers](https://docs.microsoft.com/en-us/azure/azure-monitor/insights/container-insights-overview) to monitor the Azure ML scoring containers' performance, just as for any other container. +### Example pipelines using R + +The build pipeline also supports building and publishing Azure ML pipelines using R to train a model. You can enable it by changing the `build-train-script` pipeline variable to either of the following values: + +* `diabetes_regression_build_train_pipeline_with_r.py` to train a model with R on Azure ML Compute. You'll also need to uncomment (include) the `r-essentials` Conda packages in the environment definition YAML `diabetes_regression/conda_dependencies.yml`. +* `diabetes_regression_build_train_pipeline_with_r_on_dbricks.py` to train a model with R on Databricks. You'll need to manually create a Databricks cluster and attach it to the Azure ML Workspace as a compute resource. Set the DB_CLUSTER_ID and DATABRICKS_COMPUTE_NAME variables in your variable group. + +Example ML pipelines using R have a single step to train a model. They don't demonstrate how to evaluate and register a model. The evaluation and registering techniques are shown only in the Python implementation. + +### Observability and Monitoring + +You can explore aspects of model observability in the solution, such as: + +* **Logging**: Navigate to the Application Insights instance linked to the Azure ML Portal, then go to the Logs (Analytics) pane. The following sample query correlates HTTP requests with custom logs generated in `score.py`. This can be used, for example, to analyze query duration vs. scoring batch size: + + ``` + let Traceinfo=traces + | extend d=parse_json(tostring(customDimensions.Content)) + | project workspace=customDimensions.["Workspace Name"], + service=customDimensions.["Service Name"], + NumberOfPredictions=tostring(d.NumberOfPredictions), + id=tostring(d.RequestId), + TraceParent=tostring(d.TraceParent); + requests + | project timestamp, id, success, resultCode, duration + | join kind=fullouter Traceinfo on id + | project-away id1 + ``` + +* **Distributed tracing**: The smoke test client code sets an HTTP `traceparent` header (per the [W3C Trace Context proposed specification](https://www.w3.org/TR/trace-context-1)), and the `score.py` code logs the header. The query above shows how to surface this value. You can adapt it to your tracing framework. +* **Monitoring**: You can use [Azure Monitor for containers](https://docs.microsoft.com/en-us/azure/azure-monitor/insights/container-insights-overview) to monitor the Azure ML scoring containers' performance. + +### Clean up the example resources + +To remove the resources created for this project, use the [/environment_setup/iac-remove-environment-pipeline.yml](../environment_setup/iac-remove-environment-pipeline.yml) definition or you can just delete the resource group in the [Azure Portal](https://portal.azure.com). + +## Next Steps: Integrating your project + +* Follow the [bootstrap instructions](../bootstrap/README.md) to create a starting point for your project use case. This guide includes information on bringing your own code to this repository template. +* Consider using [Azure Pipelines self-hosted agents](https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#install) to speed up your Azure ML pipeline execution. 
The Docker container image for the Azure ML pipeline is sizable, and having it cached on the agent between runs can trim several minutes from your runs. + +### Additional Variables and Configuration + +#### More variable options + +There are more variables used in the project. They're defined in two places: one for local execution and one for using Azure DevOps Pipelines. + +For using Azure Pipelines, all other variables are stored in the file `.pipelines/diabetes_regression-variables-template.yml`. Using the default values as a starting point, adjust the variables to suit your requirements. + +In the `diabetes_regression` folder, you'll find the `config.json` file that we recommend using to provide parameters for training, evaluation, and scoring scripts. The sample parameter that `diabetes_regression` uses is the ridge regression [*alpha* hyperparameter](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html). We don't provide any serializers for this config file. + +#### Local configuration + +For instructions on how to set up a local development environment, refer to the [Development environment setup instructions](development_setup.md). diff --git a/docs/images/ci-build-pipeline-configure.png b/docs/images/ci-build-pipeline-configure.png index d593d1dc..62953b53 100644 Binary files a/docs/images/ci-build-pipeline-configure.png and b/docs/images/ci-build-pipeline-configure.png differ diff --git a/docs/images/create-rm-service-connection.png b/docs/images/create-rm-service-connection.png index 011018d3..e677636a 100644 Binary files a/docs/images/create-rm-service-connection.png and b/docs/images/create-rm-service-connection.png differ diff --git a/docs/images/ml-ws-svc-connection.png b/docs/images/ml-ws-svc-connection.png index 66c3b3f1..baf52e1f 100644 Binary files a/docs/images/ml-ws-svc-connection.png and b/docs/images/ml-ws-svc-connection.png differ diff --git a/docs/images/run-iac-pipeline.png b/docs/images/run-iac-pipeline.png index 15771246..f2549da8 100644 Binary files a/docs/images/run-iac-pipeline.png and b/docs/images/run-iac-pipeline.png differ diff --git a/docs/images/select-iac-pipeline.png b/docs/images/select-iac-pipeline.png index e165ccc8..695b041f 100644 Binary files a/docs/images/select-iac-pipeline.png and b/docs/images/select-iac-pipeline.png differ
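Relating back to the `config.json` parameters described under "More variable options" above, the following is a minimal, hypothetical sketch of how training code might read the ridge regression *alpha* value from such a file. The key names and fallback value are assumptions for illustration; check your copy of `diabetes_regression/config.json` for the actual structure.

```python
# Sketch only: read a hyperparameter from config.json and use it to fit a ridge regression model.
# The "training"/"alpha" keys and the 0.5 fallback are assumed for illustration.
import json

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

with open("diabetes_regression/config.json") as config_file:
    config = json.load(config_file)

alpha = config.get("training", {}).get("alpha", 0.5)

X, y = load_diabetes(return_X_y=True)
model = Ridge(alpha=alpha)
model.fit(X, y)
print(f"Trained Ridge(alpha={alpha}); R^2 on the training data: {model.score(X, y):.3f}")
```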