Clabon Consulting Ltd

How to control which Azure DevOps tasks are executed based on changed files in a repository.

Introduction

A typical monorepo, or any large repository containing multiple services or applications, may require numerous jobs or tasks to run for a “full” build. Running all of these jobs on every change lengthens each CI/CD pipeline run and slows the team's ability to iterate. In this article we explore how to detect changes to files in a repository and use this to control which tasks are executed by our Azure DevOps pipelines. By reducing the number of tasks on each run we shorten the pipeline's runtime for each committed change, so our team gets feedback on the success or failure of their changes sooner.

Code Samples

Complete code samples for this article can be found on GitHub

1. Setup

To begin, we will create a new repository in Azure DevOps with a basic structure, as per this branch. Then we will create our pipeline.

Repository Setup

Firstly, go to your Azure DevOps instance and create a new repository named ChangedFileDemo. Next create the file structure and push this to your new repository using the script below:

# Create the directory structure and placeholder files
New-Item -Path . -Name "ChangedFileDemo" -ItemType "directory"
New-Item -Path ./ChangedFileDemo -Name "azure-pipelines.yml" -ItemType "file" -Value "# Some content so the pipeline can be saved"
New-Item -Path ./ChangedFileDemo -Name "Files" -ItemType "directory"
New-Item -Path ./ChangedFileDemo/Files -Name "example.txt" -ItemType "file" -Value ""
New-Item -Path ./ChangedFileDemo -Name "Services" -ItemType "directory"
New-Item -Path ./ChangedFileDemo/Services -Name "example.txt" -ItemType "file" -Value ""

# Move into the new directory so the git commands below act on it
Set-Location ./ChangedFileDemo
git init
git add -A
git commit -m "init"
git remote add origin https://<your-devops-username>@dev.azure.com/<your-devops-instance>/<your-devops-project>/_git/ChangedFileDemo
git push -u origin --all

Pipeline Setup

In your Azure DevOps project, navigate to Pipelines, then:

  1. Go to New pipeline
  2. Select Azure Repos Git
  3. Select your repository
  4. Select Existing Azure Pipelines YAML file
  5. Select the existing azure-pipelines.yml file
  6. Save the pipeline without running

2. Detecting Changed Files

Now that our basic repository is set up, we will look at how we can detect changes. To do this we will use git diff to list the files changed by the most recent commit.

git diff --name-status HEAD HEAD^

Here we pass git diff a few arguments:

  • --name-status shows only the names and status of changed files, as opposed to git diff’s full output.
  • HEAD is the current commit we are working on.
  • HEAD^ is the first parent of the current commit, i.e. the commit immediately before it.

As a test, let's modify some files to demonstrate:

# Update File contents
Set-Content Files/example.txt -Value "Update 1"
Set-Content Services/example.txt -Value "Update 1"

# Commit all changes
git commit -am "Demonstrate git diff"

# Run git diff
PS C:\ChangedFileDemo> git diff --name-status HEAD HEAD^    
M       Files/example.txt
M       Services/example.txt

The above output shows that we have modified (M) two files since the previous commit (HEAD^): Files/example.txt and Services/example.txt. We can store this output in a variable for use later:

$changedFiles = git diff --name-status HEAD HEAD^
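Each line held in $changedFiles is a status and a path separated by a tab. As a minimal sketch, assuming you would rather work with objects than raw strings, the hypothetical helper ConvertFrom-NameStatus below (not part of the article's samples) splits each line into its parts. The sample lines stand in for real git output.

```powershell
# Hypothetical helper (an illustration, not from the original samples):
# turns raw `git diff --name-status` lines into objects with Status and Path.
function ConvertFrom-NameStatus {
    param(
        [string[]]$Lines
    )

    foreach ($line in $Lines) {
        # Skip blank lines
        if ([string]::IsNullOrWhiteSpace($line)) { continue }

        # Fields are tab-separated: the status first, then one or two paths
        $fields = $line -split "`t"

        [pscustomobject]@{
            Status = $fields[0]
            # Renames and copies list two paths; take the last field
            Path   = $fields[-1]
        }
    }
}

# Sample input standing in for real git output
$sample = @("M`tFiles/example.txt", "M`tServices/example.txt")
$parsed = ConvertFrom-NameStatus -Lines $sample
$parsed | Format-Table -AutoSize
```

In a real repository you could pass the command output straight in, for example ConvertFrom-NameStatus -Lines (git diff --name-status HEAD HEAD^).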

3. Basic Pipeline

Next we want to see how we can use this in a pipeline. To do this we will update our azure-pipelines.yml with two jobs that read each of our example files and output their contents using basic PowerShell tasks. First we add the pipeline trigger and the agent pool to be used:

trigger:
- master

pool:
  vmImage: windows-latest

Next create a job to check for modified files and, if found, publish a variable indicating that files in the path have changed:

jobs:
- job: determine_changes
  displayName: Determine paths changed to understand jobs to run
  steps:
    - powershell: |
        #Get any changed files
        $changedFiles = git diff --name-status HEAD HEAD^ 
        
        # Return $true if Files/ path is in the list of changed files, otherwise $false
        $filesChanged = ((Select-String -InputObject $changedFiles -Pattern "Files/" -AllMatches).Matches.Count -gt 0)
        
        # Return $true if Services/ path is in the list of changed files, otherwise $false
        $servicesChanged = ((Select-String -InputObject $changedFiles -Pattern "Services/" -AllMatches).Matches.Count -gt 0)

        # Set Azure DevOps output variables for use in later jobs.
        # The second line of each pair omits the ## prefix, so the agent treats it
        # as plain text and simply prints the value to the job log.
        Write-Host "##vso[task.setvariable variable=FilesChanged;isOutput=true;]$filesChanged"
        Write-Host "vso[task.setvariable variable=FilesChanged;isOutput=true;]$filesChanged"
        Write-Host "##vso[task.setvariable variable=ServicesChanged;isOutput=true;]$servicesChanged"
        Write-Host "vso[task.setvariable variable=ServicesChanged;isOutput=true;]$servicesChanged"
      name: check_modified
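Note that the pattern "Files/" in the job above is a regular expression that can match anywhere in a line, so a path such as OtherFiles/readme.txt would also count as a change under Files/. If that matters in your repository, one option is to anchor the match to the start of the path. The sketch below is an assumption-laden illustration, not the article's method: Test-PathChanged is a hypothetical helper, and it relies on git separating the status from the path with a tab.

```powershell
# Hypothetical helper: returns $true when any changed file sits under $Prefix.
function Test-PathChanged {
    param(
        [string[]]$ChangedLines,   # raw lines from git diff --name-status
        [string]$Prefix            # for example "Files/"
    )

    # Escape the prefix so characters such as "." are treated literally, and
    # require it to follow the tab that separates the status from the path.
    $pattern = "`t" + [regex]::Escape($Prefix)

    # Count the lines that match; Where-Object copes with empty input
    $hits = @($ChangedLines | Where-Object { $_ -match $pattern })
    return ($hits.Count -gt 0)
}

# Sample lines standing in for real git output
$lines = @("M`tOtherFiles/readme.txt", "M`tServices/example.txt")

Write-Host ("Files changed:    " + (Test-PathChanged -ChangedLines $lines -Prefix "Files/"))
Write-Host ("Services changed: " + (Test-PathChanged -ChangedLines $lines -Prefix "Services/"))
```

With the sample lines, only the Services/ check reports a change, because OtherFiles/ does not start with Files/.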

When this job has run it will make available to subsequent jobs the variables check_modified.FilesChanged and check_modified.ServicesChanged which we can use to set a condition for those jobs to run like so:

For changes to files in the Files path:

- job: Runs_when_Files_path_changed
  dependsOn: determine_changes
  variables: 
    filesChanged: $[ dependencies.determine_changes.outputs['check_modified.FilesChanged'] ]
  condition: eq(variables.filesChanged, 'True')

and for the Services path:

- job: Runs_when_Services_path_changed
  dependsOn: determine_changes
  variables: 
    servicesChanged: $[ dependencies.determine_changes.outputs['check_modified.ServicesChanged'] ]
  condition: eq(variables.servicesChanged, 'True')

Putting this all together, with steps included to output the file content, gives us:

trigger:
- master

pool:
  vmImage: windows-latest

jobs:
- job: determine_changes
  displayName: Determine paths changed to understand jobs to run
  steps:
    - powershell: |
        #Get any changed files
        $changedFiles = git diff --name-status HEAD HEAD^ 
        
        # Return $true if Files/ path is in the list of changed files, otherwise $false
        $filesChanged = ((Select-String -InputObject $changedFiles -Pattern "Files/" -AllMatches).Matches.Count -gt 0)
        
        # Return $true if Services/ path is in the list of changed files, otherwise $false
        $servicesChanged = ((Select-String -InputObject $changedFiles -Pattern "Services/" -AllMatches).Matches.Count -gt 0)

        # Set Azure DevOps output variables for use in later jobs.
        # The second line of each pair omits the ## prefix, so the agent treats it
        # as plain text and simply prints the value to the job log.
        Write-Host "##vso[task.setvariable variable=FilesChanged;isOutput=true;]$filesChanged"
        Write-Host "vso[task.setvariable variable=FilesChanged;isOutput=true;]$filesChanged"
        Write-Host "##vso[task.setvariable variable=ServicesChanged;isOutput=true;]$servicesChanged"
        Write-Host "vso[task.setvariable variable=ServicesChanged;isOutput=true;]$servicesChanged"
      name: check_modified

- job: Runs_when_Files_path_changed
  dependsOn: determine_changes
  variables: 
    filesChanged: $[ dependencies.determine_changes.outputs['check_modified.FilesChanged'] ]
  condition: eq(variables.filesChanged, 'True')
  steps:
    - powershell: Write-Host (Get-Content Files/example.txt)

- job: Runs_when_Services_path_changed
  dependsOn: determine_changes
  variables: 
    servicesChanged: $[ dependencies.determine_changes.outputs['check_modified.ServicesChanged'] ]
  condition: eq(variables.servicesChanged, 'True')
  steps:
    - powershell: Write-Host (Get-Content Services/example.txt)

Running this now, assuming you haven't modified any other files, should show the determine_changes job completing and both the Runs_when_Files_path_changed and Runs_when_Services_path_changed jobs being skipped, as below:

Pipeline run with skipped jobs

If we add some content to our example files and run the pipeline again, we will see all three jobs complete. If you are working on your master branch, pushing the commit will trigger the run automatically:

Set-Content .\Services\example.txt -Value "Some Text"
Set-Content .\Files\example.txt -Value "Some Text"

Job Status:

Pipeline run with completed jobs

In its most basic form, that is how we can check for modified files and use this to execute only the jobs or tasks which would be affected by those modifications. However, one of the key issues this was meant to address is scalability, and it is easy to see how our determine_changes job could become cluttered with repeated variable declarations and outputs. Below I will show you how I resolved this.

Scalability

To better follow the DRY (Don’t Repeat Yourself) principle, keep our determine_changes job clean and make it easier to manage different paths and variables, I abstract this out into a config file. The file declares the job variables (such as ServicesChanged) and the path to be checked for each, and allows me to specify a default state for each of these paths.

Config File

First, our config file. For this I use JSON, as it is simple to work with in PowerShell. My file looks like this:

{
    "jobs": [
        {
            "name": "FilesChanged",
            "path": "Files/",
            "state": false
        },
        {
            "name": "ServicesChanged",
            "path": "Services/",
            "state": false
        }
    ]
}

Final Determine Changes Job

jobs:
- job: determine_changes
  displayName: Determine paths changed to understand jobs to run
  steps:
    - powershell: |
        # Get changed files
        $changedFiles = git diff --name-status HEAD HEAD^

        # Get config file to determine paths and jobs to check
        $config = Get-Content config.json | ConvertFrom-Json

        foreach ($job in $config.jobs)
        {
            $name = $job.name
            $path = $job.path

            if ($job.state) {
                # A default state of true forces the job to run
                $changed = $job.state
            } else {
                # Otherwise the job runs only if its path appears in the changed files
                $changed = ((Select-String -InputObject $changedFiles -Pattern $path -AllMatches).Matches.Count -gt 0)
            }

            # Publish the output variable, then echo it without ## so the value shows in the log
            Write-Host "##vso[task.setvariable variable=$name;isOutput=true;]$changed"
            Write-Host "vso[task.setvariable variable=$name;isOutput=true;]$changed"
        }
      name: check_modified  # must match the check_modified prefix used by the dependent jobs

Whether we run this with or without modified files, the end result in the GUI is the same as before. However, it is now much easier to add, remove or modify the paths and jobs we want to run or map dependencies on.
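The config-driven loop can also be tried locally, without a pipeline run. The following is a minimal sketch under stated assumptions: the config is inlined as a string instead of being read from config.json, $changedFiles is sample data rather than real git output, and the AlwaysRun entry is a hypothetical job added only to show the default state in action. Results are collected in a hashtable instead of being published as pipeline variables.

```powershell
# Sample config as a string; in the pipeline this is read from config.json.
# The AlwaysRun entry is hypothetical and only demonstrates "state": true.
$configJson = @'
{
    "jobs": [
        { "name": "FilesChanged",    "path": "Files/",    "state": false },
        { "name": "ServicesChanged", "path": "Services/", "state": false },
        { "name": "AlwaysRun",       "path": "Docs/",     "state": true }
    ]
}
'@

# Stand-in for the output of git diff --name-status
$changedFiles = @("M`tFiles/example.txt")

$config  = $configJson | ConvertFrom-Json
$results = @{}

foreach ($job in $config.jobs) {
    if ($job.state) {
        # A default state of true forces the job to run
        $changed = $true
    } else {
        # Otherwise check whether any changed line mentions the job's path
        $hits    = @($changedFiles | Where-Object { $_ -match $job.path })
        $changed = ($hits.Count -gt 0)
    }

    $results[$job.name] = $changed
    Write-Host "$($job.name) = $changed"
}
```

With this sample data FilesChanged and AlwaysRun come out as True and ServicesChanged as False, which mirrors the values the pipeline job would publish.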

Multiple Checks?

Finally, what if you want to run a job when one of two or more paths change? I have this requirement when I modify either my Templates or Parameters directories for ARM templates. The simple solution is below:

- job: Run_when_either_changes
  dependsOn: determine_changes
  variables: 
    filesChanged: $[ dependencies.determine_changes.outputs['check_modified.FilesChanged'] ]
    servicesChanged: $[ dependencies.determine_changes.outputs['check_modified.ServicesChanged'] ]
  condition: or(eq(variables.filesChanged, 'True'), eq(variables.servicesChanged, 'True'))
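An alternative to combining conditions in YAML is to compute a single variable in the determine_changes script. The sketch below is not the approach used in this article: it assumes a hypothetical config entry with a paths array (the config shown earlier has a single path per job), uses the Templates and Parameters directories mentioned above as examples, and works on sample data instead of real git output.

```powershell
# Hypothetical config entry with several paths feeding one output variable
$job = [pscustomobject]@{
    name  = "TemplatesOrParametersChanged"
    paths = @("Templates/", "Parameters/")
}

# Stand-in for the output of git diff --name-status
$changedFiles = @("M`tParameters/dev.json")

# Build one regex alternation from the escaped paths
$pattern = ($job.paths | ForEach-Object { [regex]::Escape($_) }) -join "|"

# The job counts as changed when any line matches any of the paths
$hits    = @($changedFiles | Where-Object { $_ -match $pattern })
$changed = ($hits.Count -gt 0)

Write-Host "$($job.name) = $changed"
```

The dependent job would then need only a single eq() condition on that one variable.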