OCP on AWS Local Zones - hands on steps
A quick walkthrough of installing OpenShift on AWS into an existing VPC, extending compute nodes to Local Zones.
This is a hands-on guide: install a cluster on a customized VPC (existing network), then deploy an application and expose its ingress through a dedicated Application Load Balancer controller. A quick validation is run from a source outside the cluster, in the metropolitan region where the app is deployed.
The goal is not to explain each step. If you are looking for more details, see these documents:
- Enhancement Proposal
- OpenShift + LZ documentation
- Research article - Phase-2
- Research article - Phase-0
- Research article - plugin
Table of Contents
- Creating the Network/VPC
- Setup the installer
- Create the install-config.yaml
- Create the manifests
- Setup the AWS LB Operator
- ALB Operator prerequisites
- Install the ALB Operator
- Create the ALB Controller
- Deploy a sample APP on AWS Local Zones
- Deploy sample Local Zone app with persistent storage
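Before you start: the commands below assume the AWS CLI (with credentials configured), jq, yq, oc, and the openshift-install binary are available. A minimal sanity check (a sketch, adjust as needed):
# Quick check that the required CLI tools are present
for CMD in aws jq yq oc; do
  command -v "$CMD" >/dev/null || echo "missing: $CMD"
done
# Confirm AWS credentials are usable
aws sts get-caller-identity > /dev/null && echo "AWS credentials OK"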
Creating the Network/VPC
- Create the Network Stack: VPC and Local Zone subnet
export CLUSTER_REGION=us-east-1
export CLUSTER_NAME=byon-lz415-00
WORKDIR=${PWD}
export INSTALL_DIR=${WORKDIR}/install-${CLUSTER_NAME}
mkdir -p $INSTALL_DIR
TEMPLATES_BASE=https://raw.githubusercontent.com/openshift/installer
TEMPLATES_VERSION=master
TEMPLATES_PATH=upi/aws/cloudformation
TEMPLATE_URL=${TEMPLATES_BASE}/${TEMPLATES_VERSION}/${TEMPLATES_PATH}
TEMPLATES=( "01_vpc.yaml" )
TEMPLATES+=( "01.99_net_local-zone.yaml" )
for TEMPLATE in "${TEMPLATES[@]}"; do
  echo "Updating ${TEMPLATE}"
  curl -sL "${TEMPLATE_URL}/${TEMPLATE}" > "${WORKDIR}/${TEMPLATE}"
done
export STACK_VPC=${CLUSTER_NAME}-vpc
aws cloudformation create-stack \
  --region ${CLUSTER_REGION} \
  --stack-name ${STACK_VPC} \
  --template-body file://${WORKDIR}/01_vpc.yaml \
  --parameters \
    ParameterKey=VpcCidr,ParameterValue="10.0.0.0/16" \
    ParameterKey=AvailabilityZoneCount,ParameterValue=3 \
    ParameterKey=SubnetBits,ParameterValue=12
aws --region $CLUSTER_REGION cloudformation wait stack-create-complete --stack-name ${STACK_VPC}
aws --region $CLUSTER_REGION cloudformation describe-stacks --stack-name ${STACK_VPC}
# Choose a random Local Zone available in the Region
AZ_NAME=$(aws --region $CLUSTER_REGION ec2 describe-availability-zones \
  --filters Name=opt-in-status,Values=opted-in Name=zone-type,Values=local-zone \
  | jq -r .AvailabilityZones[].ZoneName | shuf | head -n1)
AZ_NAME="us-east-1-nyc-1a"
AZ_SUFFIX=${AZ_NAME/${CLUSTER_REGION}-/}
AZ_GROUP=$(aws --region $CLUSTER_REGION ec2 describe-availability-zones \
  --filters Name=zone-name,Values=$AZ_NAME \
  | jq -r .AvailabilityZones[].GroupName)
export STACK_LZ=${CLUSTER_NAME}-lz-${AZ_SUFFIX}
export ZONE_GROUP_NAME=${AZ_GROUP}
export VPC_ID=$(aws --region $CLUSTER_REGION cloudformation describe-stacks \
  --stack-name ${STACK_VPC} \
  | jq -r '.Stacks[0].Outputs[] | select(.OutputKey=="VpcId").OutputValue' )
export VPC_RTB_PUB=$(aws --region $CLUSTER_REGION cloudformation describe-stacks \
  --stack-name ${STACK_VPC} \
  | jq -r '.Stacks[0].Outputs[] | select(.OutputKey=="PublicRouteTableId").OutputValue' )
aws --region $CLUSTER_REGION ec2 modify-availability-zone-group \
    --group-name "${ZONE_GROUP_NAME}" \
    --opt-in-status opted-in
aws --region $CLUSTER_REGION cloudformation create-stack --stack-name ${STACK_LZ} \
     --template-body file://${WORKDIR}/01.99_net_local-zone.yaml \
     --parameters \
        ParameterKey=VpcId,ParameterValue="${VPC_ID}" \
        ParameterKey=PublicRouteTableId,ParameterValue="${VPC_RTB_PUB}" \
        ParameterKey=ZoneName,ParameterValue="${AZ_NAME}" \
        ParameterKey=SubnetName,ParameterValue="${CLUSTER_NAME}-public-${AZ_NAME}" \
        ParameterKey=PublicSubnetCidr,ParameterValue="10.0.128.0/20"
aws --region $CLUSTER_REGION cloudformation wait stack-create-complete --stack-name ${STACK_LZ}
aws --region $CLUSTER_REGION cloudformation describe-stacks --stack-name ${STACK_LZ}
mapfile -t SUBNETS < <(aws --region $CLUSTER_REGION cloudformation describe-stacks \
  --stack-name "${STACK_VPC}" \
  | jq -r '.Stacks[0].Outputs[0].OutputValue' | tr ',' '\n')
mapfile -t -O "${#SUBNETS[@]}" SUBNETS < <(aws --region $CLUSTER_REGION cloudformation describe-stacks \
  --stack-name "${STACK_VPC}" \
  | jq -r '.Stacks[0].Outputs[1].OutputValue' | tr ',' '\n')
# Get the Local Zone subnet ID
export SUBNET_ID=$(aws --region $CLUSTER_REGION cloudformation describe-stacks --stack-name "${STACK_LZ}" \
  | jq -r .Stacks[0].Outputs[0].OutputValue)
echo ${SUBNETS[*]}
SUBNETS+=(${SUBNET_ID})
echo ${SUBNETS[*]}
- Append the LZ subnet to the Subnets array - available only in phase-1 of the Local Zones implementation (not covered on the HC blog).
Phase-1 means the installer should discover the Local Zone subnet by its ID, parse it, and automatically create the MachineSets for those zones.
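To double-check that the new subnet really sits in the Local Zone (a quick sanity check using the variables already set above):
aws --region $CLUSTER_REGION ec2 describe-subnets \
  --subnet-ids ${SUBNET_ID} \
  | jq -r '.Subnets[] | {SubnetId, AvailabilityZone, CidrBlock}'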
Setup the installer
Create the install-config.yaml
- Create the install-config, setting the subnet IDs
Installing into an existing VPC (phase-1).
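The install-config snippets below read the pull secret from PULL_SECRET_FILE, which is never set elsewhere in this guide; export it first (the path is only an example, adjust to your environment):
export PULL_SECRET_FILE=${HOME}/pull-secret.json  # example path; use your Red Hat pull secret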
export BASE_DOMAIN=devcluster.openshift.com
export SSH_PUB_KEY_FILE=$HOME/.ssh/id_rsa.pub
CLUSTER_NAME_VARIANT=${CLUSTER_NAME}1
INSTALL_DIR=${CLUSTER_NAME_VARIANT}
mkdir $INSTALL_DIR
cat <<EOF > ${INSTALL_DIR}/install-config.yaml
apiVersion: v1
publish: External
baseDomain: ${BASE_DOMAIN}
metadata:
  name: "${CLUSTER_NAME_VARIANT}"
platform:
  aws:
    region: ${CLUSTER_REGION}
    subnets:
$(for SB in ${SUBNETS[*]}; do echo "    - $SB"; done)
pullSecret: '$(cat ${PULL_SECRET_FILE} | awk -v ORS= -v OFS= '{$1=$1}1')'
sshKey: |
  $(cat ${SSH_PUB_KEY_FILE})
EOF
- Default (without subnets)
The installer creates the VPC and LZ subnets (phase-2):
export BASE_DOMAIN=devcluster.openshift.com
export SSH_PUB_KEY_FILE=$HOME/.ssh/id_rsa.pub
INSTALL_DIR=${CLUSTER_NAME}-7
mkdir $INSTALL_DIR
cat <<EOF > ${INSTALL_DIR}/install-config.yaml
apiVersion: v1
publish: External
baseDomain: ${BASE_DOMAIN}
metadata:
  name: "${CLUSTER_NAME}"
platform:
  aws:
    region: ${CLUSTER_REGION}
pullSecret: '$(cat ${PULL_SECRET_FILE} |awk -v ORS= -v OFS= '{$1=$1}1')'
sshKey: |
  $(cat ${SSH_PUB_KEY_FILE})
EOF
- Install-config setting hostedZone when installing into an existing VPC with LZ subnets
# 1) Manual action: create a Private Hosted Zone associated with the existing VPC
# 2) Discover its ID - or populate the hostedZone variable directly
PHZ_BASE_DOMAIN="${CLUSTER_NAME}.${BASE_DOMAIN}"
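# If the private hosted zone does not exist yet, one way to create it and associate
# it with the existing VPC (step 1) - assuming VPC_ID from the network stack is still set:
aws route53 create-hosted-zone \
  --name "${PHZ_BASE_DOMAIN}" \
  --vpc VPCRegion=${CLUSTER_REGION},VPCId=${VPC_ID} \
  --caller-reference "$(date +%s)"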
hosted_zone_id="$(aws route53 list-hosted-zones-by-name \
            --dns-name "${PHZ_BASE_DOMAIN}" \
            --query "HostedZones[? Config.PrivateZone != \`true\` && Name == \`${PHZ_BASE_DOMAIN}.\`].Id" \
            --output text)"
echo "${hosted_zone_id}"
hostedZone=$(basename "${hosted_zone_id}")
cat <<EOF > ${INSTALL_DIR}/install-config.yaml
apiVersion: v1
publish: External
baseDomain: ${BASE_DOMAIN}
metadata:
  name: "${CLUSTER_NAME}"
platform:
  aws:
    region: ${CLUSTER_REGION}
    hostedZone: $hostedZone
    subnets:
$(for SB in ${SUBNETS[*]}; do echo "    - $SB"; done)
pullSecret: '$(cat ${PULL_SECRET_FILE} |awk -v ORS= -v OFS= '{$1=$1}1')'
sshKey: |
  $(cat ${SSH_PUB_KEY_FILE})
EOF
Create the manifests
export BASE_DOMAIN=devcluster.openshift.com
export SSH_PUB_KEY_FILE=$HOME/.ssh/id_rsa.pub
CLUSTER_NAME_VARIANT=${CLUSTER_NAME}1
INSTALL_DIR=${CLUSTER_NAME_VARIANT}
mkdir -p $INSTALL_DIR
cat <<EOF > ${INSTALL_DIR}/install-config.yaml
apiVersion: v1
publish: External
baseDomain: ${BASE_DOMAIN}
metadata:
  name: "${CLUSTER_NAME_VARIANT}"
platform:
  aws:
    region: ${CLUSTER_REGION}
    subnets:
$(for SB in ${SUBNETS[*]}; do echo "    - $SB"; done)
compute:
- name: edge
  replicas: 0
pullSecret: '$(cat ${PULL_SECRET_FILE} | awk -v ORS= -v OFS= '{$1=$1}1')'
sshKey: |
  $(cat ${SSH_PUB_KEY_FILE})
EOF
# Installer version/path
export INSTALLER=./openshift-install
export RELEASE="quay.io/openshift-release-dev/ocp-release:4.15.0-ec.2-x86_64"
cp $INSTALL_DIR/install-config.yaml $INSTALL_DIR/install-config.yaml-bkp
# Process the manifests
OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="$RELEASE" $INSTALLER version
OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="$RELEASE" $INSTALLER create manifests --dir $INSTALL_DIR
# Review if MTU patch has been created
ls -ls $INSTALL_DIR/manifests/cluster-network-*
yq ea .spec.defaultNetwork $INSTALL_DIR/manifests/cluster-network-03-config.yml
# Review if MachineSet for Local Zone has been created
ls -la $INSTALL_DIR/openshift/99_openshift-cluster-api_worker-machineset*
cat $INSTALL_DIR/openshift/99_openshift-cluster-api_worker-machineset-3.yaml
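A quick way to list the zones targeted by the generated MachineSets (the providerSpec path below follows the AWS MachineSet schema; adjust if it differs in your release):
# Print the placement zone of each generated worker MachineSet
yq ea '.spec.template.spec.providerSpec.value.placement.availabilityZone' \
  $INSTALL_DIR/openshift/99_openshift-cluster-api_worker-machineset-*.yaml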
- Phase-0 only: manual patch to use lower MTU
cat <<EOF > $INSTALL_DIR/manifests/cluster-network-03-config.yml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    ovnKubernetesConfig:
      mtu: 1200
EOF
cat $INSTALL_DIR/manifests/cluster-network-03-config.yml
- Create the cluster
OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="$RELEASE" \
  $INSTALLER create cluster --dir $INSTALL_DIR --log-level=debug
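Once the installation finishes, a quick check that the Local Zone node joined the cluster as an edge node (assuming the node-role.kubernetes.io/edge label applied to the edge pool, matching the taint tolerated later in this guide):
export KUBECONFIG=$INSTALL_DIR/auth/kubeconfig
oc get nodes -l node-role.kubernetes.io/edge -o wide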
Setup the AWS LB Operator
ALB Operator prerequisites
- The ALB Operator requires the Kubernetes cluster tag on the VPC; without it, the controller manager fails as shown in the log below:
$ oc logs pod/aws-load-balancer-operator-controller-manager-56664699b4-kjdd4 -n aws-load-balancer-operator
I0207 20:32:30.393782       1 request.go:682] Waited for 1.0400816s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/ingress.operator.openshift.io/v1?timeout=32s
1.6758019517970793e+09  INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
1.6758019518810503e+09  ERROR   setup   failed to get VPC ID    {"error": "no VPC with tag \"kubernetes.io/cluster/ocp-lz-2nnns\" found"}
main.main
    /remote-source/workspace/app/main.go:133
runtime.main
    /usr/lib/golang/src/runtime/proc.go:250
Tag the VPC:
CLUSTER_ID=$(oc get infrastructures cluster -o jsonpath='{.status.infrastructureName}')
aws ec2 create-tags --resources ${VPC_ID} \
  --tags Key="kubernetes.io/cluster/${CLUSTER_ID}",Value="shared" \
  --region ${CLUSTER_REGION}
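To confirm the tag was applied to the VPC:
aws --region ${CLUSTER_REGION} ec2 describe-tags \
  --filters "Name=resource-id,Values=${VPC_ID}" "Name=key,Values=kubernetes.io/cluster/${CLUSTER_ID}"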
Install the ALB Operator
Steps to install the ALB Operator.
Create the credentials for the Operator:
oc create namespace aws-load-balancer-operator
cat << EOF| oc create -f -
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: aws-load-balancer-operator
  namespace: openshift-cloud-credential-operator
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
      - action:
          - ec2:DescribeSubnets
        effect: Allow
        resource: "*"
      - action:
          - ec2:CreateTags
          - ec2:DeleteTags
        effect: Allow
        resource: arn:aws:ec2:*:*:subnet/*
      - action:
          - ec2:DescribeVpcs
        effect: Allow
        resource: "*"
  secretRef:
    name: aws-load-balancer-operator
    namespace: aws-load-balancer-operator
  serviceAccountNames:
    - aws-load-balancer-operator-controller-manager
EOF
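The Cloud Credential Operator should mint the secret referenced above (when the cluster runs in mint mode); verify it exists before installing the operator:
oc -n aws-load-balancer-operator get secret aws-load-balancer-operator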
Install the Operator from OLM
Create the OperatorGroup:
cat <<EOF | oc create -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: aws-load-balancer-operator
  namespace: aws-load-balancer-operator
spec:
  targetNamespaces:
  - aws-load-balancer-operator
EOF
Create the Subscription:
cat <<EOF | oc create -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: aws-load-balancer-operator
  namespace: aws-load-balancer-operator
spec:
  channel: stable-v0
  installPlanApproval: Automatic 
  name: aws-load-balancer-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
Check that the install plan has been automatically approved:
$ oc get installplan -n aws-load-balancer-operator
NAME            CSV                                 APPROVAL    APPROVED
install-qlsxz   aws-load-balancer-operator.v0.2.0   Automatic   true
install-x7vwn   aws-load-balancer-operator.v0.2.0   Automatic   true
Check that the operator has been deployed correctly:
$ oc get all -n aws-load-balancer-operator
NAME                                                                 READY   STATUS    RESTARTS   AGE
pod/aws-load-balancer-operator-controller-manager-56664699b4-j77js   2/2     Running   0          100s
NAME                                                                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/aws-load-balancer-operator-controller-manager-metrics-service   ClusterIP   172.30.57.143   <none>        8443/TCP   12m
NAME                                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/aws-load-balancer-operator-controller-manager   1/1     1            1           12m
NAME                                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/aws-load-balancer-operator-controller-manager-56664699b4   1         1         1       12m
Create the ALB Controller
Steps to create the ALB Controller.
cat <<EOF | oc create -f -
apiVersion: networking.olm.openshift.io/v1alpha1
kind: AWSLoadBalancerController 
metadata:
  name: cluster 
spec:
  subnetTagging: Auto 
  ingressClass: cloud 
  config:
    replicas: 2 
  enabledAddons: 
    - AWSWAFv2
EOF
Check that the controller is installed and running:
$ oc get all -n aws-load-balancer-operator -l app.kubernetes.io/name=aws-load-balancer-operator
NAME                                                        READY   STATUS    RESTARTS   AGE
pod/aws-load-balancer-controller-cluster-67b6dd6974-6r6tp   1/1     Running   0          43s
pod/aws-load-balancer-controller-cluster-67b6dd6974-vw5vw   1/1     Running   0          43s
NAME                                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/aws-load-balancer-controller-cluster-67b6dd6974   2         2         2       44s
Deploy a sample APP on AWS Local Zones
Deploy a sample application and a service:
cat << EOF | oc create -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: lz-apps
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lz-app-nyc-1
  namespace: lz-apps
spec:
  selector:
    matchLabels:
      app: lz-app-nyc-1
  replicas: 1
  template:
    metadata:
      labels:
        app: lz-app-nyc-1
        zone_group: us-east-1-nyc-1
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      nodeSelector:
        zone_group: us-east-1-nyc-1
      tolerations:
      - key: "node-role.kubernetes.io/edge"
        operator: "Equal"
        value: ""
        effect: "NoSchedule"
      containers:
        - image: openshift/origin-node
          command:
           - "/bin/socat"
          args:
            - TCP4-LISTEN:8080,reuseaddr,fork
            - EXEC:'/bin/bash -c \"printf \\\"HTTP/1.0 200 OK\r\n\r\n\\\"; sed -e \\\"/^\r/q\\\"\"'
          imagePullPolicy: Always
          name: echoserver
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service 
metadata:
  name:  lz-app-nyc-1 
  namespace: lz-apps
spec:
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: NodePort
  selector:
    app: lz-app-nyc-1
EOF
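The Deployment's nodeSelector assumes the Local Zone node carries the zone_group label; if the pod stays Pending, check the label on the edge node (and add it manually if needed, the node name below is a placeholder):
oc get nodes -l node-role.kubernetes.io/edge -L zone_group
# Only if the label is missing:
# oc label node <edge-node-name> zone_group=us-east-1-nyc-1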
Create the Ingress with the Local Zone subnet:
cat << EOF | oc create -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-lz-nyc-1
  namespace: lz-apps
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: instance
    alb.ingress.kubernetes.io/subnets: ${SUBNET_ID}
  labels:
    zone_group: us-east-1-nyc-1
spec:
  ingressClassName: cloud
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: lz-app-nyc-1
                port:
                  number: 80
EOF
Test the endpoint (using ALB DNS):
$ HOST=$(oc get ingress -n lz-apps ingress-lz-nyc-1 --template='{{(index .status.loadBalancer.ingress 0).hostname}}')
$ echo $HOST
k8s-lzapps-ingressl-49a869b572-66443804.us-east-1.elb.amazonaws.com
$ curl $HOST
GET / HTTP/1.1
X-Forwarded-For: 179.181.81.124
X-Forwarded-Proto: http
X-Forwarded-Port: 80
Host: k8s-lzapps-ingressl-13226f2551-de.us-east-1.elb.amazonaws.com
X-Amzn-Trace-Id: Root=1-63e18147-1532a244542b04bc75ffd473
User-Agent: curl/7.61.1
Accept: */*
Test from different source locations:
export HOST=k8s-lzapps-ingressl-49a869b572-66443804.us-east-1.elb.amazonaws.com
# [1] NYC (outside AWS backbone)
$ curl -s http://ip-api.com/json/$(curl -s ifconfig.me) |jq -r '[.city, .countryCode]'
[
  "North Bergen",
  "US"
]
$ curl -sw "%{time_namelookup}   %{time_connect}     %{time_starttransfer}    %{time_total}\n" -o /dev/null $HOST
0.001452   0.004079     0.008914    0.009830
# [2] Within the Region (master nodes)
$ oc debug node/$(oc get nodes -l node-role.kubernetes.io/master -o jsonpath={'.items[0].metadata.name'}) -- chroot /host /bin/bash -c "\
hostname; \
curl -s http://ip-api.com/json/\$(curl -s ifconfig.me) |jq -r '[.city, .countryCode]';\
curl -sw \"%{time_namelookup}   %{time_connect}     %{time_starttransfer}    %{time_total}\\n\" -o /dev/null $HOST"
ip-10-0-54-118
[
  "Ashburn",
  "US"
]
0.002068   0.010196     0.019962    0.020985
# [3] London (outside AWS backbone)
$ curl -s http://ip-api.com/json/$(curl -s ifconfig.me) |jq -r '[.city, .countryCode]'
[
  "London",
  "GB"
]
$ curl -sw "%{time_namelookup}   %{time_connect}     %{time_starttransfer}    %{time_total}\n" -o /dev/null $HOST
0.003332   0.099921     0.197535    0.198802
# [4] Brazil
$ curl -s http://ip-api.com/json/$(curl -s ifconfig.me) |jq -r '[.city, .countryCode]'
[
  "Florianópolis",
  "BR"
]
$ curl -sw "%{time_namelookup}   %{time_connect}     %{time_starttransfer}    %{time_total}\n" -o /dev/null $HOST
0.022869   0.187408     0.355456    0.356435
| Server \ Client (TCP connect time, seconds) | [1] NYC/US | [2] AWS Region/use1 | [3] London/UK | [4] Brazil |
|---|---|---|---|---|
| us-east-1-nyc-1a | 0.004079 | 0.010196 | 0.099921 | 0.187408 | 
Deploy sample Local Zone app with persistent storage
To deploy an Application on Local Zone subnet, edge nodes, using persistent storage,
you must check which kind of storage is available on the location. The most locations,
when this article have been written supports only gp2 storage. Although the default
Storage Class on OCP is gp3, so you must set the gp2-csi StorageClass on the deployment
to consume EBS to your pods:
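You can list the storage classes available on the cluster before picking one:
oc get storageclass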
cat <<EOF | oc create -f -
kind: Namespace
apiVersion: v1
metadata:
  name: lz-app-ns
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: lz-pvc
  namespace: lz-app-ns
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp2-csi 
  volumeMode: Filesystem
---
apiVersion: apps/v1
kind: Deployment 
metadata:
  name: lz-app
  namespace: lz-app-ns 
spec:
  selector:
    matchLabels:
      app: lz-app
  replicas: 1
  template:
    metadata:
      labels:
        app: lz-app
        machine.openshift.io/zone-group: ${ZONE_GROUP_NAME} 
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      nodeSelector: 
        machine.openshift.io/zone-group: ${ZONE_GROUP_NAME}
      tolerations: 
      - key: "node-role.kubernetes.io/edge"
        operator: "Equal"
        value: ""
        effect: "NoSchedule"
      containers:
        - image: openshift/origin-node
          command:
           - "/bin/socat"
          args:
            - TCP4-LISTEN:8080,reuseaddr,fork
            - EXEC:'/bin/bash -c \"printf \\\"HTTP/1.0 200 OK\r\n\r\n\\\"; sed -e \\\"/^\r/q\\\"\"'
          imagePullPolicy: Always
          name: echoserver
          ports:
            - containerPort: 8080
          volumeMounts:
            - mountPath: "/mnt/storage"
              name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: lz-pvc
EOF
cat <<EOF | oc create -f -
apiVersion: v1
kind: Service 
metadata:
  name: lz-app-svc
  namespace: lz-app-ns
spec:
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: NodePort
  selector: 
    app: lz-app
EOF
# Review the zone group, machines, PVC, and workload resources
echo $ZONE_GROUP_NAME
oc get machines -n openshift-machine-api
oc get pvc -n lz-app-ns
oc get all -n lz-app-ns
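To confirm the pod, and therefore the EBS volume bound to it, landed on the Local Zone node:
oc get pods -n lz-app-ns -o wide
oc get pv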