HyperShift CCM Managed Security Groups Lab

This guide walks through hands-on steps to:

- Build the HyperShift binary
- Build (or bring your own) HyperShift control plane operator image for custom installs
- Install HyperShift using custom images with the TechPreviewNoUpgrade feature set
- Install a hosted cluster with the TechPreviewNoUpgrade feature set
- Uninstall the HyperShift operator

Building

# building the binary
make

# building the operator (control-plane-operator) image
export REGISTRY=${REGISTRY:-quay.io/mrbraga}
make docker-build IMG=${REGISTRY}/hypershift-controlplane-operator:devel

Create Layered Hosted Cluster

Prerequisites:

- A self-managed OCP cluster
- The KUBECONFIG variable exported and pointing at that self-managed cluster

The steps below create nested clusters so that each layer can be destroyed independently:

OCP Self Managed -> HCP operator -> Hosted -> HCP Operator -> e2e

# Globals
export AWS_CREDS="$AWS_SHARED_CREDENTIALS_FILE"
export AWS_DEFAULT_REGION=us-east-1
export CLUSTER_BASE_DOMAIN=splat.devcluster.openshift.com
export PULL_SECRET_FILE="${HOME}/.openshift/pull-secret-latest.json"
export SSH_PUB_KEY_FILE=$HOME/.ssh/id_rsa.pub
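
Before moving on, it can help to confirm that the globals above are set and that the files they point to are readable. The following is a minimal sketch; `require_vars` and `require_files` are hypothetical helper names, not part of HyperShift.

```shell
# Hypothetical helpers (not part of HyperShift): fail early when a
# variable is unset/empty or a referenced file is not readable.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_files() {
  missing=0
  for path in "$@"; do
    if [ ! -r "$path" ]; then
      echo "unreadable file: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage with the globals defined above:
#   require_vars AWS_CREDS AWS_DEFAULT_REGION CLUSTER_BASE_DOMAIN \
#     PULL_SECRET_FILE SSH_PUB_KEY_FILE || exit 1
#   require_files "$AWS_CREDS" "$PULL_SECRET_FILE" "$SSH_PUB_KEY_FILE" || exit 1
```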

# Create OIDC generic bucket
export OIDC_BUCKET_NAME="hcp-e2e-oidc"
# Setup Bucket for OIDC discovery documents
bucket_policy_file=${OIDC_BUCKET_NAME}-oidc-workload-clusters_policy.json
aws s3api create-bucket --bucket ${OIDC_BUCKET_NAME}
aws s3api delete-public-access-block --bucket ${OIDC_BUCKET_NAME}
echo '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${OIDC_BUCKET_NAME}/*"
    }
  ]
}' | envsubst > ${bucket_policy_file}
aws s3api put-bucket-policy --bucket ${OIDC_BUCKET_NAME} --policy file://${bucket_policy_file}
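
The policy document above can also be rendered with a here-document, which avoids the dependency on `envsubst`. This is only a sketch: the function name `render_oidc_bucket_policy` is hypothetical, and the policy body simply mirrors the one shown above.

```shell
# Sketch: print the public-read bucket policy for a given bucket name.
# The function name is hypothetical; the policy mirrors the one above.
render_oidc_bucket_policy() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Usage:
#   render_oidc_bucket_policy "${OIDC_BUCKET_NAME}" > "${bucket_policy_file}"
```

Printing the rendered policy before uploading it makes it easy to check that the bucket name was substituted into the `Resource` ARN.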


#> Install Hosted Cluster to be used as management cluster (Layered)
export CLUSTER_PREFIX=hcp-e2e-v5

# Hypershift operator must be stable (no TP, development, etc)
./bin/hypershift install \
    --oidc-storage-provider-s3-bucket-name="${OIDC_BUCKET_NAME}" \
    --oidc-storage-provider-s3-credentials="${AWS_CREDS}" \
    --oidc-storage-provider-s3-region="${AWS_DEFAULT_REGION}" \
    --tech-preview-no-upgrade=true


# Create hosted cluster to host the management cluster
OCP_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.22.0-ec.1-x86_64

./bin/hypershift create cluster aws \
  --name="${CLUSTER_PREFIX}" \
  --region="${AWS_DEFAULT_REGION}" \
  --node-pool-replicas=2 \
  --base-domain="${CLUSTER_BASE_DOMAIN}" \
  --pull-secret="${PULL_SECRET_FILE}" \
  --aws-creds="${AWS_CREDS}" \
  --ssh-key="${SSH_PUB_KEY_FILE}" \
  --release-image="${OCP_RELEASE_IMAGE}" \
  --feature-set=TechPreviewNoUpgrade

# Wait for the cluster to be installed
oc get hostedclusters -A -w

# Extract the kubeconfig for the HC/mgr
./bin/hypershift create kubeconfig --name ${CLUSTER_PREFIX} > kubeconfig-${CLUSTER_PREFIX}

export KUBECONFIG_OLD=$KUBECONFIG
export KUBECONFIG=$PWD/kubeconfig-${CLUSTER_PREFIX}

# Ensure that the cluster is stable:
oc get co -w
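
The `-w` watches above block until interrupted. For scripted runs, a polling helper may be more convenient. The sketch below is an assumption-laden example: `wait_until` is a hypothetical helper, and the usage comment assumes the HostedCluster lives in the `clusters` namespace and reports an `Available` condition.

```shell
# Hypothetical helper: re-run a command until it succeeds or the
# timeout (in seconds) elapses. Returns 0 on success, 1 on timeout.
wait_until() {
  timeout_s="$1"; interval_s="$2"; shift 2
  elapsed=0
  while ! "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 0
}

# Usage (assumes namespace "clusters" and an Available condition):
#   wait_until 1800 30 oc wait hostedcluster/"${CLUSTER_PREFIX}" -n clusters \
#     --for=condition=Available --timeout=10s
```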

BYO HyperShift Images

Step 1: Build the Control-Plane-Operator Image

export REGISTRY=${REGISTRY:-quay.io/mrbraga}
export TAG="feat-ccm-nlb-sg-$(git rev-parse --short HEAD)"
export CONTROL_PLANE_IMAGE=${REGISTRY}/hypershift-control-plane-operator:${TAG}

podman build -f Dockerfile.control-plane -t ${CONTROL_PLANE_IMAGE} .

Step 2: Push the Image to Your Registry

podman login quay.io
podman push ${CONTROL_PLANE_IMAGE}

Step 3: Install HyperShift Operator with Custom Images

./bin/hypershift install \
    --oidc-storage-provider-s3-bucket-name="${OIDC_BUCKET_NAME}" \
    --oidc-storage-provider-s3-credentials="${AWS_CREDS}" \
    --oidc-storage-provider-s3-region="${AWS_DEFAULT_REGION}" \
    --tech-preview-no-upgrade=true \
    --additional-operator-env-vars CONTROL_PLANE_OPERATOR_IMAGE=${CONTROL_PLANE_IMAGE} \
    --development

# Scale up the HyperShift operator (the --development flag does not start the operator)
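
The scale-up step has no command attached. A minimal sketch follows, assuming the operator runs as a Deployment named `operator` in the `hypershift` namespace (confirm with `oc get deployment -n hypershift` before relying on it). The wrapper function and the `OC` override are hypothetical conveniences for dry runs.

```shell
# Assumption: the operator Deployment is named "operator" in the
# "hypershift" namespace. Set OC=echo to print the command instead of
# running it.
scale_hypershift_operator() {
  replicas="${1:-1}"
  ${OC:-oc} scale deployment/operator --namespace hypershift --replicas="$replicas"
}

# Usage:
#   scale_hypershift_operator 1
```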

Step 4: Create Layered Hosted Cluster with Custom Operator

Note 1: Use --control-plane-operator-image only when you are building the image yourself.

Note 2: Use --feature-set=TechPreviewNoUpgrade only when you want to change the feature set.

HOSTED_CLUSTER_NAME="${CLUSTER_PREFIX}-hc1"

./bin/hypershift create cluster aws \
  --name="${HOSTED_CLUSTER_NAME}" \
  --region="${AWS_DEFAULT_REGION}" \
  --node-pool-replicas=2 \
  --base-domain="${CLUSTER_BASE_DOMAIN}" \
  --pull-secret="${PULL_SECRET_FILE}" \
  --aws-creds="${AWS_CREDS}" \
  --ssh-key="${SSH_PUB_KEY_FILE}" \
  --release-image="${OCP_RELEASE_IMAGE}" \
  --control-plane-operator-image="${CONTROL_PLANE_IMAGE}" \
  --feature-set=TechPreviewNoUpgrade

# Check the cluster information:
oc get --namespace clusters hostedclusters

# When completed, extract the credentials for the workload cluster:
./bin/hypershift create kubeconfig --name ${HOSTED_CLUSTER_NAME} > kubeconfig-${HOSTED_CLUSTER_NAME}

# kubeconfig for management cluster
export KUBECONFIG_MGR=$KUBECONFIG

# kubeconfig for workload cluster
export KUBECONFIG=$PWD/kubeconfig-${HOSTED_CLUSTER_NAME}

Testing

CCM Upstream Tests

cd ${PATH_TO_CCM}
make e2e.test

./e2e.test --ginkgo.v --ginkgo.focus="loadbalancer NLB internal should be reachable with hairpinning traffic"
./e2e.test --ginkgo.v --ginkgo.focus="loadbalancer NLB should be reachable with target-node-labels"

HyperShift e2e

Reference: HyperShift e2e docs

make e2e

bin/test-e2e \
    -test.v \
    -test.run=TestCCMCreateCluster/Main/When_feature_set_is_TechPreviewNoUpgrade/LoadBalancer_service_should_have_security_groups_attached \
    -test.timeout=2h \
    --e2e.aws-region=${AWS_DEFAULT_REGION} \
    --e2e.aws-credentials-file="${AWS_SHARED_CREDENTIALS_FILE}" \
    --e2e.pull-secret-file="${PULL_SECRET_FILE}" \
    --e2e.base-domain="${CLUSTER_BASE_DOMAIN}" \
    --e2e.platform=AWS \
    --e2e.latest-release-image="${OCP_RELEASE_IMAGE}" \
    --e2e.aws-oidc-s3-bucket-name="${OIDC_BUCKET_NAME}" \
    --e2e.aws-oidc-s3-region="${AWS_DEFAULT_REGION}"

Alternative test target with custom CPO image:

bin/test-e2e \
    -test.v --ginkgo.vv \
    -test.run=TestCreateCluster/Main/AWSCCMWithCustomizations \
    -test.timeout=30m \
    --e2e.platform=AWS \
    --e2e.aws-region=${AWS_DEFAULT_REGION} \
    --e2e.aws-credentials-file="${AWS_SHARED_CREDENTIALS_FILE}" \
    --e2e.pull-secret-file="${PULL_SECRET_FILE}" \
    --e2e.base-domain="${CLUSTER_BASE_DOMAIN}" \
    --e2e.aws-oidc-s3-bucket-name="${OIDC_BUCKET_NAME}" \
    --e2e.control-plane-operator-image="${CONTROL_PLANE_IMAGE}"

Manual NLB Verification

Create a test LoadBalancer Service in the guest cluster and verify security groups:

KUBECONFIG_HC=$PWD/kubeconfig-${HOSTED_CLUSTER_NAME}

oc --kubeconfig=${KUBECONFIG_HC} create namespace test-ccm-nlb-sg

oc --kubeconfig=${KUBECONFIG_HC} apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: test-nlb-service
  namespace: test-ccm-nlb-sg
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: test
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
EOF

oc --kubeconfig=${KUBECONFIG_HC} get svc -n test-ccm-nlb-sg test-nlb-service -w

Once the LoadBalancer has an external hostname, verify security groups are attached:

LB_HOSTNAME=$(oc --kubeconfig=${KUBECONFIG_HC} get svc -n test-ccm-nlb-sg test-nlb-service \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

LB_NAME=$(echo ${LB_HOSTNAME} | cut -d'.' -f1 | rev | cut -d'-' -f2- | rev)

aws elbv2 describe-load-balancers --names ${LB_NAME} --region ${AWS_DEFAULT_REGION} \
  --query 'LoadBalancers[0].SecurityGroups'
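
The `LB_NAME` pipeline above keeps the first DNS label of the hostname and drops its trailing `-<suffix>` segment. The same transformation can be sketched with shell parameter expansion, which avoids the dependency on `rev`. The function name is hypothetical, and the hostname in the usage comment is an illustrative value, not real output.

```shell
# Sketch: derive the load balancer name from its DNS hostname,
# matching the cut/rev pipeline above.
lb_name_from_hostname() {
  first_label="${1%%.*}"               # keep the text before the first dot
  printf '%s\n' "${first_label%-*}"    # drop the trailing "-<suffix>"
}

# Usage (illustrative hostname):
#   lb_name_from_hostname "a1b2c3d4e5-0123456789abcdef.elb.us-east-1.amazonaws.com"
#   LB_NAME=$(lb_name_from_hostname "${LB_HOSTNAME}")
```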

Cleanup:

oc --kubeconfig=${KUBECONFIG_HC} delete svc test-nlb-service -n test-ccm-nlb-sg
oc --kubeconfig=${KUBECONFIG_HC} delete namespace test-ccm-nlb-sg

Iteration Workflow

When iterating on CPO code changes without recreating the cluster:

# Rebuild with a new tag
export TAG="feat-ccm-nlb-sg-$(git rev-parse --short HEAD)"
export CONTROL_PLANE_IMAGE=${REGISTRY}/hypershift-control-plane-operator:${TAG}

podman build -f Dockerfile.control-plane -t ${CONTROL_PLANE_IMAGE} .
podman push ${CONTROL_PLANE_IMAGE}

# Point the running HostedCluster at the new image. The CPO image override
# is carried by an annotation on the HostedCluster, not by a spec field.
oc annotate hostedcluster ${HOSTED_CLUSTER_NAME} -n clusters --overwrite \
  hypershift.openshift.io/control-plane-operator-image="${CONTROL_PLANE_IMAGE}"

Destroy HyperShift Operator

# Render the manifests that were installed
bin/hypershift install render > hypershift-manifests.yaml

# Delete the operator namespace
oc delete ns hypershift

# Delete using the rendered manifests
oc delete -f hypershift-manifests.yaml

Troubleshooting

Verify CPO Image in Use

oc get pods -n clusters-${HOSTED_CLUSTER_NAME} -l app=control-plane-operator

oc get deployment -n clusters-${HOSTED_CLUSTER_NAME} \
  -l hypershift.openshift.io/control-plane-component \
  -o jsonpath='{.items[*].spec.template.spec.containers[*].image}'

Verify NLBSecurityGroupMode in CCM Config

oc get configmap aws-cloud-config -n clusters-${HOSTED_CLUSTER_NAME} \
  -o jsonpath='{.data.aws\.conf}' | grep NLBSecurityGroupMode
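
`grep` only shows whether the key is present. To assert on the configured value, the INI-style content can be parsed instead. The sketch below assumes `KEY = value` lines; `conf_value` is a hypothetical helper, and the `Managed` value in the usage comment is illustrative.

```shell
# Hypothetical helper: print the value of KEY from INI-style
# "KEY = value" lines read on stdin; exit non-zero if KEY is absent.
conf_value() {
  awk -F'=' -v key="$1" '
    { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
    k == key {
      v = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      print v; found = 1
    }
    END { exit !found }
  '
}

# Usage (the expected value is illustrative):
#   oc get configmap aws-cloud-config -n clusters-${HOSTED_CLUSTER_NAME} \
#     -o jsonpath='{.data.aws\.conf}' | conf_value NLBSecurityGroupMode
```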

IAM Permissions for Control Plane Components

Extract credentials from the CCM pod to validate IAM permissions:

CCM_POD=$(oc get pod -l app=cloud-controller-manager \
  -n clusters-${HOSTED_CLUSTER_NAME} \
  -o jsonpath='{.items[0].metadata.name}')

COMPONENT_IAM_ROLE=$(oc exec ${CCM_POD} \
  -c cloud-controller-manager \
  -n clusters-${HOSTED_CLUSTER_NAME} \
  -- cat /etc/kubernetes/secrets/cloud-provider/credentials \
  | grep role_arn | awk -F"= " '{print$2}')

TOKEN_PATH=$(oc exec ${CCM_POD} \
  -c cloud-controller-manager \
  -n clusters-${HOSTED_CLUSTER_NAME} \
  -- cat /etc/kubernetes/secrets/cloud-provider/credentials \
  | grep web_identity | awk -F"= " '{print$2}')

COMPONENT_TOKEN=$(oc exec ${CCM_POD} \
  -c cloud-controller-manager \
  -n clusters-${HOSTED_CLUSTER_NAME} \
  -- cat ${TOKEN_PATH})
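
The commands above read the same credentials file twice. One possible variation reads it once and extracts each field locally. This is a sketch: `cred_field` is a hypothetical helper, and it assumes the file uses the standard `role_arn` and `web_identity_token_file` keys.

```shell
# Hypothetical helper: print the value of KEY from "KEY = value" lines
# of an AWS shared credentials file read on stdin (first match only).
cred_field() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" | head -n 1
}

# Usage:
#   CREDS=$(oc exec "${CCM_POD}" -c cloud-controller-manager \
#     -n "clusters-${HOSTED_CLUSTER_NAME}" \
#     -- cat /etc/kubernetes/secrets/cloud-provider/credentials)
#   COMPONENT_IAM_ROLE=$(printf '%s\n' "$CREDS" | cred_field role_arn)
#   TOKEN_PATH=$(printf '%s\n' "$CREDS" | cred_field web_identity_token_file)
```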

Assume the role and get temporary credentials:

aws sts assume-role-with-web-identity \
    --role-arn "${COMPONENT_IAM_ROLE}" \
    --web-identity-token "${COMPONENT_TOKEN}" \
    --role-session-name my-session \
    --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
    --output text \
    | awk '{print "[default]\naws_region = us-east-1\naws_access_key_id = "$1"\naws_secret_access_key = "$2"\naws_session_token = "$3}' \
    | tee ./workload-creds.conf

Verify identity and simulate permissions:

AWS_SHARED_CREDENTIALS_FILE=$PWD/workload-creds.conf \
  aws sts get-caller-identity

aws iam simulate-principal-policy \
    --policy-source-arn ${COMPONENT_IAM_ROLE} \
    --action-names \
      "elasticloadbalancing:DescribeLoadBalancers" \
      "elasticloadbalancing:DescribeTargetGroupAttributes" \
    | jq -r '(["EVAL_ACTION","EVAL_DECISION"] | (., map(length*"-"))),
             (.EvaluationResults[] | [.EvalActionName, .EvalDecision ])
             | @tsv' | column -t

Example output:

EVAL_ACTION                                         EVAL_DECISION
-----------                                         -------------
elasticloadbalancing:DescribeLoadBalancers          allowed
elasticloadbalancing:DescribeTargetGroupAttributes  implicitDeny
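
To turn a table like the one above into a pass/fail check, the rows can be filtered for any decision other than `allowed`. The following is a sketch that assumes the two-line header shown in the example output; `denied_actions` is a hypothetical helper.

```shell
# Hypothetical helper: read "ACTION DECISION" rows on stdin (after a
# two-line header), print actions that are not "allowed", and exit
# non-zero if any were found.
denied_actions() {
  awk 'NR > 2 && $2 != "allowed" { print $1; bad = 1 } END { exit bad + 0 }'
}

# Usage: append "| denied_actions" to the "column -t" pipeline above.
```

With the example output above, the helper would print `elasticloadbalancing:DescribeTargetGroupAttributes` and return a non-zero status.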

CCM OTE and HyperShift

This change requires the following PR to be merged to work as expected: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/464

Creating a HyperShift hosted cluster with NLB ingress

Some tests (e.g., "should have security groups attached to default ingress controller NLB") require the default ingress controller to use the NLB type. By default, HyperShift creates hosted clusters with a Classic Load Balancer (CLB) for ingress.

To enable NLB, patch the HostedCluster's spec.configuration.ingress after creation:

# Create the hosted cluster with TechPreview (required for AWSServiceLBNetworkSecurityGroup)
hypershift create cluster aws \
  --name="${HOSTED_CLUSTER_NAME}" \
  --region="${AWS_DEFAULT_REGION}" \
  --node-pool-replicas=3 \
  --base-domain="${CLUSTER_BASE_DOMAIN}" \
  --pull-secret="${PULL_SECRET_FILE}" \
  --aws-creds="${AWS_CREDS}" \
  --ssh-key="${SSH_PUB_KEY_FILE}" \
  --release-image="${OCP_RELEASE_IMAGE}" \
  --feature-set=TechPreviewNoUpgrade

# Patch the HostedCluster to use NLB for the default ingress controller
oc --kubeconfig "$HYPERSHIFT_MANAGEMENT_CLUSTER_KUBECONFIG" patch hostedcluster "${HOSTED_CLUSTER_NAME}" -n clusters --type=merge -p '
{
  "spec": {
    "configuration": {
      "ingress": {
        "loadBalancer": {
          "platform": {
            "type": "AWS",
            "aws": {
              "type": "NLB"
            }
          }
        }
      }
    }
  }
}'

Note: The ingress controller configuration is only applied during initial creation of the IngressController resource on the guest cluster. If the default ingress controller already exists, you may need to delete it first for the NLB configuration to take effect, or apply this patch before the cluster finishes installation.
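
To check whether the NLB setting reached the guest cluster, the effective load balancer type can be read from the default IngressController. This is a sketch under assumptions: the wrapper function and the `OC` override are hypothetical, and it assumes the guest cluster kubeconfig is the active one.

```shell
# Sketch: print the load balancer type reported by the default
# IngressController. Set OC=echo to print the command instead of
# running it.
ingress_lb_type() {
  ${OC:-oc} get ingresscontroller default \
    --namespace openshift-ingress-operator \
    -o jsonpath='{.status.endpointPublishingStrategy.loadBalancer.providerParameters.aws.type}'
}

# Usage (against the guest cluster):
#   KUBECONFIG=$PWD/kubeconfig-${HOSTED_CLUSTER_NAME} ingress_lb_type
```

If the command prints a type other than `NLB`, the note above about deleting and recreating the default ingress controller likely applies.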