mirror of
https://github.com/servo/servo.git
synced 2025-08-06 06:00:15 +01:00
Initial Windows AMI-building script
This commit is contained in:
parent
eb13ddc00c
commit
e0d6cb8a60
6 changed files with 252 additions and 1 deletions
|
@ -35,7 +35,7 @@ class DecisionTask:
|
||||||
"0a7d012ce444d62ffb9e7f06f0c52fedc24b68c2060711b313263367f7272d9d"
|
"0a7d012ce444d62ffb9e7f06f0c52fedc24b68c2060711b313263367f7272d9d"
|
||||||
|
|
||||||
def __init__(self, *, index_prefix="garbage.servo-decisionlib", task_name_template="%s",
|
def __init__(self, *, index_prefix="garbage.servo-decisionlib", task_name_template="%s",
|
||||||
worker_type="github-worker", docker_image_cache_expiry="1 year",
|
worker_type="github-worker", docker_image_cache_expiry="1 month",
|
||||||
routes_for_all_subtasks=None, scopes_for_all_subtasks=None):
|
routes_for_all_subtasks=None, scopes_for_all_subtasks=None):
|
||||||
self.task_name_template = task_name_template
|
self.task_name_template = task_name_template
|
||||||
self.index_prefix = index_prefix
|
self.index_prefix = index_prefix
|
||||||
|
|
1
etc/taskcluster/windows/.gitignore
vendored
Normal file
1
etc/taskcluster/windows/.gitignore
vendored
Normal file
|
@ -0,0 +1 @@
|
||||||
|
*.id_rsa
|
88
etc/taskcluster/windows/README.md
Normal file
88
etc/taskcluster/windows/README.md
Normal file
|
@ -0,0 +1,88 @@
|
||||||
|
# Windows AMIs for Servo on Taskcluster
|
||||||
|
|
||||||
|
Unlike Linux tasks on `docker-worker` where each tasks is executed in a container
|
||||||
|
based on a Docker image provided with the task,
|
||||||
|
Windows tasks on Taskcluster are typically run by `generic-worker`
|
||||||
|
where tasks are executed directly in the worker’s environment.
|
||||||
|
So we may want to install some tools globally on the system, to make them available to tasks.
|
||||||
|
|
||||||
|
With the [AWS provisioner], this means building a custom AMI.
|
||||||
|
We need to boot an instance on a base Windows AMI,
|
||||||
|
install what we need (including `generic-worker` itself),
|
||||||
|
then take an image of that instance.
|
||||||
|
The [`worker_types`] directory in `generic-worker`’s repository
|
||||||
|
has some scripts that automate this,
|
||||||
|
in order to make it more reproducible than clicking around.
|
||||||
|
The trick is that a PowerShell script to run on boot can be provided
|
||||||
|
when starting a Windows instance on EC2, and of course AWS has an API.
|
||||||
|
|
||||||
|
[AWS provisioner]: https://docs.taskcluster.net/docs/reference/integrations/aws-provisioner/references/api
|
||||||
|
[`worker_types`]: https://github.com/taskcluster/generic-worker/blob/master/worker_types/
|
||||||
|
|
||||||
|
|
||||||
|
## Building and deploying a new image
|
||||||
|
|
||||||
|
* Install and configure the [AWS command-line tool].
|
||||||
|
* Make your changes to `first-boot.ps1` and/or `base-ami.txt`.
|
||||||
|
* Run `python3 build-ami.py`. Note that it can take many minutes to complete.
|
||||||
|
* Save the administrator password together with the image ID
|
||||||
|
in Servo’s shared 1Password account, in the *Taskcluster Windows AMIs* note.
|
||||||
|
* In the [worker type definition], edit `ImageId` and `DeploymentId`.
|
||||||
|
|
||||||
|
Note that the new worker type definition will only apply to newly-provisionned workers.
|
||||||
|
|
||||||
|
`DeploymentId` can be any string. It can for example include the image ID.
|
||||||
|
Workers check it between tasks (if `checkForNewDeploymentEverySecs` since the last check).
|
||||||
|
If it has changed, they shut down in order to leave room for new workers with the new definition.
|
||||||
|
|
||||||
|
The [EC2 Resources] page has red *Terminate All Instances* button,
|
||||||
|
but that will make any running task fail.
|
||||||
|
|
||||||
|
[AWS command-line tool]: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html
|
||||||
|
[worker type definition]: https://tools.taskcluster.net/aws-provisioner/servo-win2016/edit
|
||||||
|
[EC2 Resources]: https://tools.taskcluster.net/aws-provisioner/servo-win2016/resources
|
||||||
|
|
||||||
|
|
||||||
|
## FIXME: possible improvement
|
||||||
|
|
||||||
|
* Have a separate staging worker type to try new AMIs without affecting the production CI
|
||||||
|
* Automate cleaning up old, unused AMIs
|
||||||
|
* Use multiple AWS regions
|
||||||
|
* Use the Taskcluster API to automate updating worker type definitions?
|
||||||
|
|
||||||
|
|
||||||
|
## Picking a base AMI
|
||||||
|
|
||||||
|
Amazon provides an ovewhelming number of different Windows images,
|
||||||
|
so it’s hard to find what’s relevant.
|
||||||
|
Their console might show a paginated view like this:
|
||||||
|
|
||||||
|
> ⇤ ← 1 to 50 of 13,914 AMIs → ⇥
|
||||||
|
|
||||||
|
Let’s grep through this with the API:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
aws ec2 describe-images --owners amazon --filters 'Name=platform,Values=windows' \
|
||||||
|
--query 'Images[*].[ImageId,Name,Description]' --output table > /tmp/images
|
||||||
|
< /tmp/images less -S
|
||||||
|
```
|
||||||
|
|
||||||
|
It turns out that these images are all based on Windows Server,
|
||||||
|
but their number is explained by the presence of many (all?) combinations of:
|
||||||
|
|
||||||
|
* Multiple OS Version
|
||||||
|
* Many available locales
|
||||||
|
* *Full* (a.k.a. *with Desktop Experience*), or *Core*
|
||||||
|
* *Base* with only the OS, or multiple flavors with tools like SQL Server pre-installed
|
||||||
|
|
||||||
|
If we make some choices and filter the list:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
< /tmp/images grep 2016-English-Full-Base | less -S
|
||||||
|
```
|
||||||
|
|
||||||
|
… we get a much more manageable handlful of images with names like
|
||||||
|
`Windows_Server-2016-English-Full-Base-2018.09.15` or other dates.
|
||||||
|
|
||||||
|
Let’s set `base-ami.txt` to `Windows_Server-2016-English-Full-Base-*`,
|
||||||
|
and have `build-ami.py` pick the most recently-created AMI whose name matches that pattern.
|
1
etc/taskcluster/windows/base-ami.txt
Normal file
1
etc/taskcluster/windows/base-ami.txt
Normal file
|
@ -0,0 +1 @@
|
||||||
|
Windows_Server-2016-English-Full-Base-*
|
116
etc/taskcluster/windows/build-ami.py
Executable file
116
etc/taskcluster/windows/build-ami.py
Executable file
|
@ -0,0 +1,116 @@
|
||||||
|
#!/usr/bin/python3
|
||||||
|
|
||||||
|
# This Source Code Form is subject to the terms of the Mozilla Public
|
||||||
|
# License, v. 2.0. If a copy of the MPL was not distributed with this
|
||||||
|
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
|
||||||
|
|
||||||
|
import os
|
||||||
|
import json
|
||||||
|
import datetime
|
||||||
|
import subprocess
|
||||||
|
|
||||||
|
|
||||||
|
REGION = "us-west-2"
|
||||||
|
WORKER_TYPE = "servo-win2016"
|
||||||
|
AWS_PROVISIONER_USER_ID = "692406183521"
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
base_ami_pattern = read_file("base-ami.txt").strip()
|
||||||
|
base_ami = most_recent_ami(base_ami_pattern)
|
||||||
|
print("Starting an instance with base image:", base_ami["ImageId"], base_ami["Name"])
|
||||||
|
|
||||||
|
key_name = "%s_%s" % (WORKER_TYPE, REGION)
|
||||||
|
key_filename = key_name + ".id_rsa"
|
||||||
|
ec2("delete-key-pair", "--key-name", key_name)
|
||||||
|
result = ec2("create-key-pair", "--key-name", key_name)
|
||||||
|
write_file(key_filename, result["KeyMaterial"].encode("utf-8"))
|
||||||
|
|
||||||
|
user_data = b"<powershell>\n%s\n</powershell>" % read_file("first-boot.ps1")
|
||||||
|
result = ec2(
|
||||||
|
"run-instances", "--image-id", base_ami["ImageId"],
|
||||||
|
"--key-name", key_name,
|
||||||
|
"--user-data", user_data,
|
||||||
|
"--instance-type", "c4.xlarge",
|
||||||
|
"--block-device-mappings",
|
||||||
|
"DeviceName=/dev/sda1,Ebs={VolumeSize=75,DeleteOnTermination=true,VolumeType=gp2}",
|
||||||
|
"--instance-initiated-shutdown-behavior", "stop"
|
||||||
|
)
|
||||||
|
assert len(result["Instances"]) == 1
|
||||||
|
instance_id = result["Instances"][0]["InstanceId"]
|
||||||
|
|
||||||
|
ec2("create-tags", "--resources", instance_id, "--tags",
|
||||||
|
"Key=Name,Value=TC %s base instance" % WORKER_TYPE)
|
||||||
|
|
||||||
|
print("Waiting for password data to be available…")
|
||||||
|
ec2_wait("password-data-available", "--instance-id", instance_id)
|
||||||
|
result = ec2("get-password-data", "--instance-id", instance_id,
|
||||||
|
"--priv-launch-key", here(key_filename))
|
||||||
|
print("Administrator password:", result["PasswordData"])
|
||||||
|
|
||||||
|
print("Waiting for the instance to finish executing first-boot.ps1 and shut down…")
|
||||||
|
ec2_wait("instance-stopped", "--instance-id", instance_id)
|
||||||
|
|
||||||
|
now = datetime.datetime.utcnow().strftime("%Y-%m-%d_%H.%M.%S")
|
||||||
|
image_id = ec2("create-image", "--instance-id", instance_id,
|
||||||
|
"--name", "TC %s %s" % (WORKER_TYPE, now))["ImageId"]
|
||||||
|
print("Started creating image with ID %s …" % image_id)
|
||||||
|
|
||||||
|
ec2_wait("image-available", "--image-ids", image_id)
|
||||||
|
ec2("modify-image-attribute", "--image-id", image_id,
|
||||||
|
"--launch-permission", "Add=[{UserId=%s}]" % AWS_PROVISIONER_USER_ID)
|
||||||
|
|
||||||
|
print("Image available. Terminating the temporary instance…")
|
||||||
|
ec2("terminate-instances", "--instance-ids", instance_id)
|
||||||
|
|
||||||
|
|
||||||
|
def most_recent_ami(name_pattern):
|
||||||
|
result = ec2(
|
||||||
|
"describe-images", "--owners", "amazon",
|
||||||
|
"--filters", "Name=platform,Values=windows", b"Name=name,Values=" + name_pattern,
|
||||||
|
)
|
||||||
|
return max(result["Images"], key=lambda x: x["CreationDate"])
|
||||||
|
|
||||||
|
|
||||||
|
def ec2_wait(*args):
|
||||||
|
# https://docs.aws.amazon.com/cli/latest/reference/ec2/wait/password-data-available.html
|
||||||
|
# “It will poll every 15 seconds until a successful state has been reached.
|
||||||
|
# This will exit with a return code of 255 after 40 failed checks.”
|
||||||
|
while True:
|
||||||
|
try:
|
||||||
|
return ec2("wait", *args)
|
||||||
|
except subprocess.CalledProcessError as err:
|
||||||
|
if err.returncode != 255:
|
||||||
|
raise
|
||||||
|
|
||||||
|
|
||||||
|
def try_ec2(*args):
|
||||||
|
try:
|
||||||
|
return ec2(*args)
|
||||||
|
except subprocess.CalledProcessError:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def ec2(*args):
|
||||||
|
args = ["aws", "ec2", "--region", REGION, "--output", "json"] + list(args)
|
||||||
|
output = subprocess.check_output(args)
|
||||||
|
if output:
|
||||||
|
return json.loads(output)
|
||||||
|
|
||||||
|
|
||||||
|
def read_file(filename):
|
||||||
|
with open(here(filename), "rb") as f:
|
||||||
|
return f.read()
|
||||||
|
|
||||||
|
|
||||||
|
def write_file(filename, contents):
|
||||||
|
with open(here(filename), "wb") as f:
|
||||||
|
f.write(contents)
|
||||||
|
|
||||||
|
|
||||||
|
def here(filename, base=os.path.dirname(__file__)):
|
||||||
|
return os.path.join(base, filename)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
45
etc/taskcluster/windows/first-boot.ps1
Normal file
45
etc/taskcluster/windows/first-boot.ps1
Normal file
|
@ -0,0 +1,45 @@
|
||||||
|
Start-Transcript -Path "C:\first_boot.txt"
|
||||||
|
|
||||||
|
Get-ChildItem Env: | Out-File "C:\install_env.txt"
|
||||||
|
|
||||||
|
# use TLS 1.2 (see bug 1443595)
|
||||||
|
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
|
||||||
|
|
||||||
|
# For making http requests
|
||||||
|
$client = New-Object system.net.WebClient
|
||||||
|
$shell = new-object -com shell.application
|
||||||
|
|
||||||
|
# Download a zip file and extract it
|
||||||
|
function Expand-ZIPFile($file, $destination, $url)
|
||||||
|
{
|
||||||
|
$client.DownloadFile($url, $file)
|
||||||
|
$zip = $shell.NameSpace($file)
|
||||||
|
foreach($item in $zip.items())
|
||||||
|
{
|
||||||
|
$shell.Namespace($destination).copyhere($item)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
# Open up firewall for livelog (both PUT and GET interfaces)
|
||||||
|
New-NetFirewallRule -DisplayName "Allow livelog PUT requests" `
|
||||||
|
-Direction Inbound -LocalPort 60022 -Protocol TCP -Action Allow
|
||||||
|
New-NetFirewallRule -DisplayName "Allow livelog GET requests" `
|
||||||
|
-Direction Inbound -LocalPort 60023 -Protocol TCP -Action Allow
|
||||||
|
|
||||||
|
# Install generic-worker and dependencies
|
||||||
|
md C:\generic-worker
|
||||||
|
$client.DownloadFile("https://github.com/taskcluster/generic-worker/releases/download" +
|
||||||
|
"/v10.11.3/generic-worker-windows-amd64.exe", "C:\generic-worker\generic-worker.exe")
|
||||||
|
$client.DownloadFile("https://github.com/taskcluster/livelog/releases/download" +
|
||||||
|
"/v1.1.0/livelog-windows-amd64.exe", "C:\generic-worker\livelog.exe")
|
||||||
|
Expand-ZIPFile -File "C:\nssm-2.24.zip" -Destination "C:\" `
|
||||||
|
-Url "http://www.nssm.cc/release/nssm-2.24.zip"
|
||||||
|
Start-Process C:\generic-worker\generic-worker.exe -ArgumentList (
|
||||||
|
"install service --nssm C:\nssm-2.24\win64\nssm.exe " +
|
||||||
|
"--config C:\generic-worker\generic-worker.config"
|
||||||
|
) -Wait -NoNewWindow -PassThru `
|
||||||
|
-RedirectStandardOutput C:\generic-worker\install.log `
|
||||||
|
-RedirectStandardError C:\generic-worker\install.err
|
||||||
|
|
||||||
|
# Now shutdown, in preparation for creating an image
|
||||||
|
shutdown -s
|
Loading…
Add table
Add a link
Reference in a new issue