Windows AMIs for Servo on Taskcluster

Unlike Linux tasks on docker-worker, where each task is executed in a container based on a Docker image provided with the task, Windows tasks on Taskcluster are typically run by generic-worker, where tasks are executed directly in the worker's environment. So we may want to install some tools globally on the system, to make them available to tasks.

With the AWS provisioner, this means building a custom AMI. We need to boot an instance on a base Windows AMI, install what we need (including generic-worker itself), then take an image of that instance. The worker_types directory in generic-worker's repository has some scripts that automate this, in order to make it more reproducible than clicking around. The trick is that a PowerShell script to run on boot can be provided when starting a Windows instance on EC2, and of course AWS has an API.
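
As a rough illustration of that trick, the sketch below builds the two AWS CLI invocations involved: starting an instance with a PowerShell script as user data, then taking an image of it. It is not taken from build-ami.py. The function names, the instance type, and the IDs are made up for the example, and the commands are only built, never run. It relies on EC2 running Windows user data as PowerShell when it is wrapped in `<powershell>` tags.

```python
def wrap_powershell(script):
    """Wrap a PowerShell script so that EC2 runs it when a Windows instance boots."""
    return "<powershell>\n" + script.strip() + "\n</powershell>"


def run_instances_command(base_ami_id, user_data, instance_type="c4.xlarge"):
    """Build (without running) the AWS CLI command that boots the instance.

    The default instance type is an arbitrary placeholder.
    """
    return [
        "aws", "ec2", "run-instances",
        "--image-id", base_ami_id,
        "--instance-type", instance_type,
        "--user-data", user_data,
    ]


def create_image_command(instance_id, image_name):
    """Build (without running) the AWS CLI command that images the instance."""
    return [
        "aws", "ec2", "create-image",
        "--instance-id", instance_id,
        "--name", image_name,
    ]


# Placeholder values only: these IDs do not refer to real resources.
boot = run_instances_command("ami-00000000", wrap_powershell("Write-Output 'hello'"))
image = create_image_command("i-00000000", "servo-windows-example")
```

Each list could be passed to `subprocess.run` once real IDs are substituted. In practice the script also has to wait for the instance to finish installing and shut down before the image is taken, which this sketch leaves out.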

Building and deploying a new image

  • Install and configure the AWS command-line tool.
  • Make your changes to first-boot.ps1 and/or base-ami.txt.
  • Run python3 build-ami.py. Note that it can take many minutes to complete.
  • Save the administrator password together with the image ID in Servo's shared 1Password account, in the Taskcluster Windows AMIs note.
  • In the worker type definition, edit ImageId and DeploymentId.

Note that the new worker type definition will only apply to newly provisioned workers.

DeploymentId can be any string. It can for example include the image ID. Workers check it between tasks (if at least checkForNewDeploymentEverySecs seconds have passed since the last check). If it has changed, they shut down in order to leave room for new workers with the new definition.
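
The check described above can be sketched roughly as follows. This is a simplified model for illustration, not generic-worker's actual code: the class, its method, and the `fetch_id` callback are hypothetical names, and only the throttling interval corresponds to the checkForNewDeploymentEverySecs setting.

```python
import time


class DeploymentWatcher:
    """Decides, between tasks, whether a worker should shut down (hypothetical model)."""

    def __init__(self, current_id, fetch_id, check_every_secs, clock=time.monotonic):
        self.current_id = current_id              # DeploymentId the worker started with
        self.fetch_id = fetch_id                  # callable returning the ID now in the definition
        self.check_every_secs = check_every_secs  # plays the role of checkForNewDeploymentEverySecs
        self.clock = clock
        self.last_check = clock()

    def should_shut_down(self):
        """Call between tasks. Returns True if the DeploymentId has changed."""
        now = self.clock()
        if now - self.last_check < self.check_every_secs:
            return False  # too soon since the last check: keep taking tasks
        self.last_check = now
        return self.fetch_id() != self.current_id
```

One consequence of the throttling is that a worker may keep running the old image for up to that many seconds, plus the duration of its current task, after the definition is edited.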

The EC2 Resources page has a red Terminate All Instances button, but that will make any running task fail.

FIXME: possible improvements

  • Have a separate staging worker type to try new AMIs without affecting the production CI
  • Automate cleaning up old, unused AMIs and their backing EBS snapshots
  • Use multiple AWS regions
  • Use the Taskcluster API to automate updating worker type definitions?

Picking a base AMI

Amazon provides an overwhelming number of different Windows images, so it's hard to find what's relevant. Their console might show a paginated view like this:

⇤ ← 1 to 50 of 13,914 AMIs → ⇥

Let's grep through this with the API:

aws ec2 describe-images --owners amazon --filters 'Name=platform,Values=windows' \
    --query 'Images[*].[ImageId,Name,Description]' --output table > /tmp/images
< /tmp/images less -S

It turns out that these images are all based on Windows Server, but their number is explained by the presence of many (all?) combinations of:

  • Multiple OS versions
  • Many available locales
  • Full (a.k.a. with Desktop Experience), or Core
  • Base with only the OS, or multiple flavors with tools like SQL Server pre-installed

If we make some choices and filter the list:

< /tmp/images grep 2016-English-Full-Base | less -S

… we get a much more manageable handful of images with names like Windows_Server-2016-English-Full-Base-2018.09.15 or other dates.

Let's set base-ami.txt to Windows_Server-2016-English-Full-Base-*, and have build-ami.py pick the most recently created AMI whose name matches that pattern.
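
A minimal sketch of that selection step, assuming the image list has already been fetched as dictionaries with the `Name`, `ImageId` and `CreationDate` fields that `aws ec2 describe-images` returns. The function name and the sample entries (IDs and dates) are invented for the example, and build-ami.py may well do this differently.

```python
from fnmatch import fnmatchcase


def pick_latest_image(images, name_pattern):
    """Return the most recently created image whose name matches the glob pattern."""
    matching = [image for image in images if fnmatchcase(image["Name"], name_pattern)]
    if not matching:
        raise ValueError("no image matches " + name_pattern)
    # CreationDate is an ISO 8601 timestamp string, so values in the same
    # format sort chronologically when compared as plain strings.
    return max(matching, key=lambda image: image["CreationDate"])


# Made-up sample data in the shape of describe-images output.
sample_images = [
    {"ImageId": "ami-aaaa", "CreationDate": "2018-08-15T03:00:00.000Z",
     "Name": "Windows_Server-2016-English-Full-Base-2018.08.15"},
    {"ImageId": "ami-bbbb", "CreationDate": "2018-09-15T03:00:00.000Z",
     "Name": "Windows_Server-2016-English-Full-Base-2018.09.15"},
    {"ImageId": "ami-cccc", "CreationDate": "2018-10-01T03:00:00.000Z",
     "Name": "Windows_Server-2016-English-Core-Base-2018.10.01"},
]

latest = pick_latest_image(sample_images, "Windows_Server-2016-English-Full-Base-*")
```

Here the Core image is newer but does not match the pattern, so the selection falls on the most recent Full-Base entry. Keeping only a pattern in base-ami.txt means each rebuild picks up Amazon's latest patched base image without editing the file.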