.Net, Azure and occasionally gamedev

Mirror GitHub, GitLab and VSTS repositories

2018/08/31

Because git is a decentralized version control system, it is easy to keep multiple mirrors of your code around.

In this post I will describe an automated way to perform the mirroring using a free VSTS account.

Motivation

I recently decided to switch back to GitHub as my primary open-source platform.

I still want to keep GitLab up as a read-only code mirror.

For the last few years I have also maintained a backup of all my open-source repositories in my private VSTS account.

In the beginning I just pushed to GitLab and VSTS manually whenever I made any changes. Eventually I wrote a small script that I could fire off on the command line to push all projects to GitLab and VSTS; however, it always ran locally because it needed my account details to access VSTS and GitLab.

Now that I have to keep three separate platforms in sync, I decided to automate the whole process.

Since (in my case) GitLab and VSTS are code-only mirrors of GitHub, I can simply do an automated git push to them whenever changes are detected on GitHub.

This Stack Overflow answer gets the gist of it right, but it is a bit outdated.

Trivia: In case you didn't know, "$env:SYSTEM_ACCESSTOKEN" is an OAuth token (used much like a PAT, a Personal Access Token) that the build server generates automatically and that lets you authenticate against VSTS from inside your builds and releases. Access to it is disabled by default: to enable it, select the "agent job" inside your build or release definition and check the "Allow scripts to access the OAuth token" checkbox under "Additional options".

Using VSTS build to mirror repositories

While using the system access token mentioned above is a pretty neat trick, it authenticates you as the build identity. By default the build identity doesn't have the rights to push code changes, so you will be greeted by:

You need the Git 'GenericContribute' permission to perform this action

In my opinion this is a good default. Since I didn't feel comfortable giving the build server access to push code changes, I opted to use my own PAT inside VSTS instead.

The steps I took are therefore slightly different from what the Stack Overflow post suggests:

Create the build definition to mirror repositories

  1. Create an empty build definition with the GitHub repository as the source (this requires GitHub authentication to be added to the project under "Project settings -> Service connections")
    • Be sure to set "clean" to true and clean at least "sources" in your build definition
  2. On the "Triggers" tab, check "Enable continuous integration" and "Batch changes"
    • Also set the branch filter to the most commonly pushed branches, e.g. master/develop/etc. (you may add multiple; I wish there were an "all branches" option, but there isn't)
  3. (Optional) Under "Options", set "$(date:yyyyMMdd)$(rev:.r)" as the build number format. For normal builds I'd use the version number, but for the mirroring process this simply shows the date plus a per-day revision suffix (a lot more useful than the default ever-increasing unique number)
  4. Add a PowerShell task and paste the code below as an inline script

This script assumes that the repository has the same name on GitHub, GitLab and VSTS:

$VstsPAT = "$(VstsPAT)"
$GitlabPAT = "$(GitlabPAT)"

# VSTS projects can have multiple repositories
# I use one project to hold all my mirrored GitHub repositories
$vstsProject = "your-vsts-project-name"
$gitlabUser = "your-gitlab-name"

# Collect all branches of origin, skipping the symbolic "origin/HEAD -> origin/master" entry.
# -replace removes the literal "origin/" prefix; TrimStart("origin/") would instead strip any
# leading o, r, i, g, n and / characters (turning "origin/release" into "elease")
$branches = git branch -r |
    Where-Object { $_ -notmatch '->' -and $_ -match '^\s*origin/' } |
    ForEach-Object { $_.Trim() -replace '^origin/', '' }

# create a local tracking branch for every remote branch so it can be pushed by name
foreach ($br in $branches) { git branch --track $br "origin/$br" }

# For GitHub repositories the name is "<owner>/RepoName"; we only care about the repo name
$repoName = "$env:BUILD_REPOSITORY_NAME".Split('/')[1]
# remove "https://" (8 characters) because we need to insert the PAT
$vstsRepoUri = "$env:SYSTEM_TEAMFOUNDATIONCOLLECTIONURI".Substring(8) + "$vstsProject/_git/$repoName"
$gitlabRepoUri = "gitlab.com/$gitlabUser/$repoName"

# push all branches to the VSTS project
git remote add vsts "https://$VstsPAT@$vstsRepoUri"
foreach ($br in $branches) {
    Write-Host "Pushing $br to VSTS"
    git push -u vsts $br
}
# don't forget the tags
git push vsts --tags

# remove the vsts remote (and its vsts/* tracking refs) again so they can't leak into the next mirror,
# where GitLab's history would otherwise show "pushed vsts/master to vsts/master" without any changes
git remote remove vsts

# push all branches to GitLab
# ${gitlabUser} needs braces, otherwise PowerShell parses "$gitlabUser:" as a scoped variable
git remote add gitlab "https://${gitlabUser}:$GitlabPAT@$gitlabRepoUri"
foreach ($br in $branches) {
    Write-Host "Pushing $br to GitLab"
    git push -u gitlab $br
}
# don't forget the tags
git push gitlab --tags
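
A note on turning "git branch -r" output into branch names: in PowerShell, "TrimStart("origin/")" ends up calling .NET's "String.TrimStart" with a character array, so it strips any leading "o", "r", "i", "g", "n" or "/" characters rather than the literal prefix, and "origin/release" comes out as "elease". Python's "str.lstrip" has the same character-set semantics, so the Python sketch below (not part of the build; the function name and sample branches are made up for illustration) shows the pitfall next to a version that removes the prefix exactly once and also skips the "HEAD ->" entry and other remotes' refs.

```python
def local_branch_names(branch_output, remote="origin"):
    """Turn `git branch -r` output into the branch names to push (illustrative helper)."""
    prefix = remote + "/"
    names = []
    for line in branch_output.splitlines():
        ref = line.strip()
        # Skip the symbolic "origin/HEAD -> origin/master" entry.
        if "->" in ref:
            continue
        # Keep only this remote's branches (ignores e.g. "vsts/master").
        if not ref.startswith(prefix):
            continue
        # Slice off the literal prefix once instead of trimming a character set.
        names.append(ref[len(prefix):])
    return names


# Output in the shape `git branch -r` prints it (branch names are made up).
sample = """  origin/HEAD -> origin/master
  origin/master
  origin/release
  vsts/master"""

print("origin/release".lstrip("origin/"))  # elease  (character-set trim)
print(local_branch_names(sample))          # ['master', 'release']
```

Slicing by the prefix length (or "str.removeprefix" in Python 3.9+) is the counterpart of PowerShell's "-replace '^origin/', ''" operator.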

Fill in both your VSTS project name (the collection URL is determined automatically via $env:SYSTEM_TEAMFOUNDATIONCOLLECTIONURI) and your GitLab username.
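
For reference, here is how the two push URLs are put together, as a Python sketch rather than the PowerShell that actually runs in the build. The collection URI, project, user and token values are placeholders, and the percent-encoding of the credentials is an addition the original script does not do; it only matters if a token ever contains URL-special characters.

```python
from urllib.parse import quote


def mirror_remote_urls(collection_uri, vsts_project, repo_full_name,
                       gitlab_user, vsts_pat, gitlab_pat):
    """Build authenticated push URLs for the VSTS and GitLab mirrors (illustrative)."""
    scheme = "https://"
    if not collection_uri.startswith(scheme):
        raise ValueError("expected an https:// collection URI")
    # "someuser/MyRepo" -> "MyRepo"
    repo_name = repo_full_name.split("/")[1]
    # Drop the scheme so the credentials can go right after "https://",
    # and normalise the trailing slash that the concatenation relies on.
    host_and_path = collection_uri[len(scheme):].rstrip("/") + "/"
    # Percent-encode credentials in case they contain URL-special characters.
    vsts_url = (f"{scheme}{quote(vsts_pat, safe='')}@"
                f"{host_and_path}{vsts_project}/_git/{repo_name}")
    gitlab_url = (f"{scheme}{quote(gitlab_user, safe='')}:{quote(gitlab_pat, safe='')}"
                  f"@gitlab.com/{gitlab_user}/{repo_name}")
    return vsts_url, gitlab_url


vsts, gitlab = mirror_remote_urls("https://dev.azure.com/myorg/", "mirrors",
                                  "someuser/MyRepo", "someuser", "vstspat", "gitlabpat")
print(vsts)    # https://vstspat@dev.azure.com/myorg/mirrors/_git/MyRepo
print(gitlab)  # https://someuser:gitlabpat@gitlab.com/someuser/MyRepo
```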

Normal build variables are automatically available to all tasks as environment variables. However, secrets should (as is good practice) be marked with the "secret" flag: this prevents them from showing up in the logs (they are replaced with "***") and also keeps them from being exposed as environment variables.
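
Conceptually, log masking replaces every occurrence of a secret's value with "***" before a line is written. The Python function below is a simplified, hypothetical model of that idea, not the build agent's actual implementation; the token value is made up.

```python
def mask_secrets(log_line, secrets):
    """Simplified model: replace every occurrence of a secret value with '***'."""
    # Longest first, so a secret that contains a shorter one is hidden entirely.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:  # skip empty values, which would match everywhere
            log_line = log_line.replace(secret, "***")
    return log_line


line = "git remote add vsts https://abc123@dev.azure.com/myorg/mirrors/_git/MyRepo"
print(mask_secrets(line, ["abc123"]))
# git remote add vsts https://***@dev.azure.com/myorg/mirrors/_git/MyRepo
```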

To access the secrets I use the "$()" macro notation, which the build server replaces with the variable's value before the inline script runs.
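
The "$()" notation is a textual macro: the value is substituted into the script before PowerShell ever sees it, which is why the script wraps it in double quotes. The Python sketch below models that substitution under simplifying assumptions (plain variable names only, exact-case lookup, undefined names left as literal text); "expand_macros" is a hypothetical helper.

```python
import re

# $(name) with a plain variable name; "$(date:yyyyMMdd)"-style tokens are not matched.
MACRO = re.compile(r"\$\(([A-Za-z0-9_.]+)\)")


def expand_macros(script, variables):
    """Textually replace $(name) with its value; unknown names stay as they are."""
    return MACRO.sub(lambda m: variables.get(m.group(1), m.group(0)), script)


script = '$VstsPAT = "$(VstsPAT)"\n$GitlabPAT = "$(GitlabPAT)"'
print(expand_macros(script, {"VstsPAT": "abc123"}))
# $VstsPAT = "abc123"
# $GitlabPAT = "$(GitlabPAT)"
```

Because the value lands inside a double-quoted PowerShell string, an alphanumeric PAT is fine, but a value containing a double quote or a "$" would change how PowerShell parses that line.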

I also didn't stop at the secret flag for my PATs. Instead, I store them in an Azure Key Vault and pull them in by linking a variable group to the Key Vault.

This lets me store the PATs securely in a Key Vault, where I can easily replace them once they expire. That matters because I have to create a separate build definition for every repository I want mirrored: with the Key Vault, the PATs are stored in only one place (a single source of truth) instead of in the variables section of every single build.

Now, if you (like me) have multiple repositories to mirror, one additional step makes sense: wrap the PowerShell task into a task group by right-clicking it and selecting "Create task group".

The task group automatically creates "required variables" (parameters) for all variables referenced with the $() notation.
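
The parameter discovery can be pictured as collecting the distinct "$(name)" references in the task's script. The Python sketch below is a rough, hypothetical model of that step (the real task group editor may treat some variables differently), applied to the two PAT lines of the mirroring script.

```python
import re


def task_group_parameters(script):
    """Rough model: distinct $(name) references of a script, in order of first use."""
    names = []
    for name in re.findall(r"\$\(([A-Za-z0-9_.]+)\)", script):
        if name not in names:
            names.append(name)
    return names


inline_script = '$VstsPAT = "$(VstsPAT)"\n$GitlabPAT = "$(GitlabPAT)"\n# $(VstsPAT) again'
print(task_group_parameters(inline_script))  # ['VstsPAT', 'GitlabPAT']
```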

I set the default values of "VstsPAT" to "$(VstsPAT)" and likewise "GitlabPAT" to "$(GitlabPAT)". Every time I use the task group I can optionally override these values (or leave them prefilled). The prefilled values resolve to the same variables mentioned above (linked in via the variable group).
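
The resolution order described here (per-use override, else the prefilled default, which in turn refers to a variable from the linked variable group) can be sketched in Python as follows. The function and the secret values are hypothetical; this models the behaviour described above rather than reproducing how Azure DevOps implements it.

```python
import re

MACRO = re.compile(r"\$\(([A-Za-z0-9_.]+)\)")


def resolve_parameters(defaults, overrides, pipeline_vars):
    """Use each parameter's override if given, else its default, then expand
    any $(name) in the chosen value against the build's variables."""
    resolved = {}
    for name, default in defaults.items():
        value = overrides.get(name, default)
        resolved[name] = MACRO.sub(
            lambda m: pipeline_vars.get(m.group(1), m.group(0)), value)
    return resolved


defaults = {"VstsPAT": "$(VstsPAT)", "GitlabPAT": "$(GitlabPAT)"}
# Values that would come from the Key Vault-linked variable group (made up here).
pipeline_vars = {"VstsPAT": "v-secret", "GitlabPAT": "g-secret", "OtherPAT": "o-secret"}

print(resolve_parameters(defaults, {}, pipeline_vars))
# {'VstsPAT': 'v-secret', 'GitlabPAT': 'g-secret'}
print(resolve_parameters(defaults, {"GitlabPAT": "$(OtherPAT)"}, pipeline_vars))
# {'VstsPAT': 'v-secret', 'GitlabPAT': 'o-secret'}
```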

Task groups are perfect for shared functionality: when I clone the build, the clone links to the task group instead of duplicating the script. So if I need to change the script, I change it once inside the task group and the change applies to all builds.

For every other repository I now just clone the build definition; the only required change is the GitHub repository source (and the build definition name, duh).

I no longer have to worry about backing up to VSTS and GitLab. It just happens automagically whenever I push to GitHub!

Shortcomings

Branch filter limits

As mentioned in step 2, you have to set a branch filter in the VSTS build definition. The downside is that pushes to any other branch (new or existing) won't trigger the mirroring process.

The PowerShell script does in fact push all existing branches of the repository; however, the build is only triggered by changes to the filtered branches (you can add multiple filters, e.g. "master" and "develop", or any other branches you use often).

The same limitation applies to tags. However, with the primary branches in the filter, the mirroring process should run often enough to mirror the other branches and tags in a timely manner as well.
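
The effect of the branch filter can be simulated with a small Python model: a push only starts the mirroring build if its branch is in the filter list, but once the build runs it pushes every branch that exists at that point. Only exact branch names are modelled, and the branch names and push sequence are made up.

```python
def triggers_build(pushed_ref, branch_filters):
    """Simplified model: a push triggers the build only if its branch is listed
    in the include filters (exact names, no wildcards or exclude filters)."""
    prefix = "refs/heads/"
    if not pushed_ref.startswith(prefix):
        return False  # e.g. "refs/tags/v1.0": in this model tags never match a branch filter
    return pushed_ref[len(prefix):] in branch_filters


def mirrored_after(pushes, branch_filters):
    """Return the branches present on the mirrors after a sequence of branch pushes."""
    on_github, on_mirror = set(), set()
    for branch in pushes:
        on_github.add(branch)
        if triggers_build("refs/heads/" + branch, branch_filters):
            # The mirroring script pushes every branch that exists at this point.
            on_mirror = set(on_github)
    return on_mirror


print(triggers_build("refs/heads/master", ["master", "develop"]))     # True
print(triggers_build("refs/heads/feature/x", ["master", "develop"]))  # False
print(sorted(mirrored_after(["master", "feature/x"], ["master"])))   # ['master']
print(sorted(mirrored_after(["feature/x", "master"], ["master"])))   # ['feature/x', 'master']
```

The last call shows the "timely enough" argument: a feature branch pushed before the next push to master still reaches the mirrors, just not immediately.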

Force push isn't supported

(This could easily be changed by using "git push -f".)

Any commits that are force-pushed to GitHub and overwrite existing history (e.g. to fix quick mistakes) will break the mirroring pipeline, because git rejects a non-fast-forward push to the mirrors unless it is forced.

For now I accept this behaviour: a malicious party that force-pushes to GitHub won't be able to overwrite the history on GitLab and VSTS. I may revisit this behaviour in the future, though.

(E.g. allow force pushes to GitLab to keep the public repositories "in sync", while disallowing them on VSTS so I still have a backup in case of malicious actions.)
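
Why a force push breaks the pipeline: without "-f", git only accepts a push that fast-forwards the remote branch, i.e. the mirror's current tip must be an ancestor of the commit being pushed. The Python sketch below models that rule on a made-up commit graph (commit names are illustrative; server-side protections are not modelled), where "C2" is a rewritten version of "C".

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` by following parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def push_accepted(parents, remote_tip, new_tip, force=False):
    """Model of git's rule: a non-forced push must be a fast-forward."""
    if force or remote_tip is None:  # forced push, or the branch is new on the mirror
        return True
    return is_ancestor(parents, remote_tip, new_tip)


# Made-up history: A <- B <- C was mirrored; D builds on C, C2 rewrites C.
parents = {"B": ["A"], "C": ["B"], "D": ["C"], "C2": ["B"]}

print(push_accepted(parents, "C", "D"))               # True  (fast-forward)
print(push_accepted(parents, "C", "C2"))              # False (rejected)
print(push_accepted(parents, "C", "C2", force=True))  # True  (what "git push -f" would do)
```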

tagged as Git, Visual Studio Team Services, VSTS, Gitlab, Github, Azure DevOps