HyperShift CCM Managed Security Groups Lab
This guide walks through hands-on steps to explore HyperShift:
- Build HyperShift binary
- Build/BYO HyperShift control plane operator image for custom installs
- Install HyperShift using custom images with feature set TechPreviewNoUpgrade
- Install hosted cluster with feature set TechPreviewNoUpgrade
- Uninstall HyperShift operator
Building
# building the binary
make
# building the operator (control-plane-operator) image
export REGISTRY=${REGISTRY:-quay.io/mrbraga}
make docker-build IMG=${REGISTRY}/hypershift-controlplane-operator:devel
Create Layered Hosted Cluster
Prerequisites:
- A self-managed OCP cluster
- The KUBECONFIG variable exported for the self-managed cluster
Steps to create a nested cluster, which allows flexibility when destroying layers:
OCP Self Managed -> HCP operator -> Hosted -> HCP Operator -> e2e
# Globals
export AWS_CREDS="$AWS_SHARED_CREDENTIALS_FILE"
export AWS_DEFAULT_REGION=us-east-1
export CLUSTER_BASE_DOMAIN=splat.devcluster.openshift.com
export PULL_SECRET_FILE="${HOME}/.openshift/pull-secret-latest.json"
export SSH_PUB_KEY_FILE=$HOME/.ssh/id_rsa.pub
# Create OIDC generic bucket
export OIDC_BUCKET_NAME="hcp-e2e-oidc"
# Setup Bucket for OIDC discovery documents
bucket_policy_file=${OIDC_BUCKET_NAME}-oidc-workload-clusters_policy.json
aws s3api create-bucket --bucket ${OIDC_BUCKET_NAME}
aws s3api delete-public-access-block --bucket ${OIDC_BUCKET_NAME}
echo '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::${OIDC_BUCKET_NAME}/*"
}
]
}' | envsubst > ${bucket_policy_file}
aws s3api put-bucket-policy --bucket ${OIDC_BUCKET_NAME} --policy file://${bucket_policy_file}
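The policy above relies on `envsubst` being installed and on `OIDC_BUCKET_NAME` being exported. As an alternative, here is a minimal sketch that renders the same policy with a heredoc. The function name `render_bucket_policy` is hypothetical and not part of HyperShift or the AWS CLI.

```shell
# Hypothetical helper: print the public-read bucket policy for the bucket
# named in $1. The unquoted heredoc lets the shell expand ${bucket} itself.
render_bucket_policy() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}
```

Usage would look like `render_bucket_policy "${OIDC_BUCKET_NAME}" > "${bucket_policy_file}"`, followed by the same `put-bucket-policy` call.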
#> Install Hosted Cluster to be used as management cluster (Layered)
export CLUSTER_PREFIX=hcp-e2e-v5
# Hypershift operator must be stable (no TP, development, etc)
./bin/hypershift install \
--oidc-storage-provider-s3-bucket-name="${OIDC_BUCKET_NAME}" \
--oidc-storage-provider-s3-credentials="${AWS_CREDS}" \
--oidc-storage-provider-s3-region="${AWS_DEFAULT_REGION}" \
--tech-preview-no-upgrade=true
# Create hosted cluster to host the management cluster
OCP_RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.22.0-ec.1-x86_64
./bin/hypershift create cluster aws \
--name="${CLUSTER_PREFIX}" \
--region="${AWS_DEFAULT_REGION}" \
--node-pool-replicas=2 \
--base-domain="${CLUSTER_BASE_DOMAIN}" \
--pull-secret="${PULL_SECRET_FILE}" \
--aws-creds="${AWS_CREDS}" \
--ssh-key="${SSH_PUB_KEY_FILE}" \
--release-image="${OCP_RELEASE_IMAGE}" \
--feature-set=TechPreviewNoUpgrade
# Wait for the cluster to be installed
oc get hostedclusters -A -w
# Extract the kubeconfig for the HC/mgr
./bin/hypershift create kubeconfig --name ${CLUSTER_PREFIX} > kubeconfig-${CLUSTER_PREFIX}
export KUBECONFIG_OLD=$KUBECONFIG
export KUBECONFIG=$PWD/kubeconfig-${CLUSTER_PREFIX}
# Ensure that the cluster is stable:
oc get co -w
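The `-w` watches above need someone to watch the terminal. For scripted runs, a blocking wait may be more convenient. The sketch below assumes the HostedCluster lives in the `clusters` namespace (as elsewhere in this guide) and reports an `Available` condition. The function names and the `OC` override variable are hypothetical.

```shell
# OC can be overridden to point at a different client binary (assumption).
OC="${OC:-oc}"

# Block until the HostedCluster $1 reports Available.
# $2: namespace (default: clusters), $3: timeout (default: 30m).
wait_for_hosted_cluster() {
  "${OC}" wait "hostedcluster/$1" --namespace "${2:-clusters}" \
    --for=condition=Available --timeout="${3:-30m}"
}

# Block until every ClusterOperator is Available and no longer Progressing.
# Run this with KUBECONFIG pointing at the hosted cluster. $1: timeout.
wait_for_cluster_operators() {
  "${OC}" wait clusteroperators --all --for=condition=Available=True --timeout="${1:-30m}" &&
    "${OC}" wait clusteroperators --all --for=condition=Progressing=False --timeout="${1:-30m}"
}
```

For example, `wait_for_hosted_cluster "${CLUSTER_PREFIX}"` on the management cluster, then `wait_for_cluster_operators` after switching KUBECONFIG.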
BYO HyperShift Images
Step 1: Build the Control-Plane-Operator Image
export REGISTRY=${REGISTRY:-quay.io/mrbraga}
export TAG="feat-ccm-nlb-sg-$(git rev-parse --short HEAD)"
export CONTROL_PLANE_IMAGE=${REGISTRY}/hypershift-control-plane-operator:${TAG}
podman build -f Dockerfile.control-plane -t ${CONTROL_PLANE_IMAGE} .
Step 2: Push the Image to Your Registry
podman push ${CONTROL_PLANE_IMAGE}
Step 3: Install HyperShift Operator with Custom Images
./bin/hypershift install \
--oidc-storage-provider-s3-bucket-name="${OIDC_BUCKET_NAME}" \
--oidc-storage-provider-s3-credentials="${AWS_CREDS}" \
--oidc-storage-provider-s3-region="${AWS_DEFAULT_REGION}" \
--tech-preview-no-upgrade=true \
--additional-operator-env-vars CONTROL_PLANE_OPERATOR_IMAGE=${CONTROL_PLANE_IMAGE} \
--development
# Scale up hypershift (--development flag does not start operator)
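The comment above mentions scaling the operator up but gives no command. A minimal sketch follows, assuming the operator Deployment is named `operator` and runs in the `hypershift` namespace (the namespace used in the uninstall steps of this guide). Confirm the name with `oc get deployments -n hypershift` before relying on it. The function name and the `OC` override are hypothetical.

```shell
# OC can be overridden to point at a different client binary (assumption).
OC="${OC:-oc}"

# Scale the HyperShift operator Deployment to $1 replicas (default: 1).
# Assumption: the Deployment is named "operator" in namespace "hypershift".
scale_hypershift_operator() {
  "${OC}" scale deployment/operator --namespace hypershift --replicas="${1:-1}"
}
```

Calling `scale_hypershift_operator` with no arguments starts a single operator replica.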
Step 4: Create Layered Hosted Cluster with Custom Operator
Note 1: Use --control-plane-operator-image only when you are building it.
Note 2: Use --feature-set=TechPreviewNoUpgrade only when you want to change the feature set.
HOSTED_CLUSTER_NAME="${CLUSTER_PREFIX}-hc1"
./bin/hypershift create cluster aws \
--name="${HOSTED_CLUSTER_NAME}" \
--region="${AWS_DEFAULT_REGION}" \
--node-pool-replicas=2 \
--base-domain="${CLUSTER_BASE_DOMAIN}" \
--pull-secret="${PULL_SECRET_FILE}" \
--aws-creds="${AWS_CREDS}" \
--ssh-key="${SSH_PUB_KEY_FILE}" \
--release-image="${OCP_RELEASE_IMAGE}" \
--control-plane-operator-image="${CONTROL_PLANE_IMAGE}" \
--feature-set=TechPreviewNoUpgrade
# Check the cluster information:
oc get --namespace clusters hostedclusters
# When completed, extract the credentials for workload cluster:
./bin/hypershift create kubeconfig --name ${HOSTED_CLUSTER_NAME} > kubeconfig-${HOSTED_CLUSTER_NAME}
# kubeconfig for management cluster
export KUBECONFIG_MGR=$KUBECONFIG
# kubeconfig for workload cluster
export KUBECONFIG=$PWD/kubeconfig-${HOSTED_CLUSTER_NAME}
Testing
CCM Upstream Tests
cd ${PATH_TO_CCM}
make e2e.test
./e2e.test --ginkgo.v --ginkgo.focus="loadbalancer NLB internal should be reachable with hairpinning traffic"
./e2e.test --ginkgo.v --ginkgo.focus="loadbalancer NLB should be reachable with target-node-labels"
HyperShift e2e
Reference: HyperShift e2e docs
make e2e
bin/test-e2e \
-test.v \
-test.run=TestCCMCreateCluster/Main/When_feature_set_is_TechPreviewNoUpgrade/LoadBalancer_service_should_have_security_groups_attached \
-test.timeout=2h \
--e2e.aws-region=${AWS_DEFAULT_REGION} \
--e2e.aws-credentials-file="${AWS_SHARED_CREDENTIALS_FILE}" \
--e2e.pull-secret-file="${PULL_SECRET_FILE}" \
--e2e.base-domain="${CLUSTER_BASE_DOMAIN}" \
--e2e.platform=AWS \
--e2e.latest-release-image="${OCP_RELEASE_IMAGE}" \
--e2e.aws-oidc-s3-bucket-name="${OIDC_BUCKET_NAME}" \
--e2e.aws-oidc-s3-region="${AWS_DEFAULT_REGION}"
Alternative test target with custom CPO image:
bin/test-e2e \
-test.v --ginkgo.vv \
-test.run=TestCreateCluster/Main/AWSCCMWithCustomizations \
-test.timeout=30m \
--e2e.platform=AWS \
--e2e.aws-region=${AWS_DEFAULT_REGION} \
--e2e.aws-credentials-file="${AWS_SHARED_CREDENTIALS_FILE}" \
--e2e.pull-secret-file="${PULL_SECRET_FILE}" \
--e2e.base-domain="${CLUSTER_BASE_DOMAIN}" \
--e2e.aws-oidc-s3-bucket-name="${OIDC_BUCKET_NAME}" \
--e2e.control-plane-operator-image="${CONTROL_PLANE_IMAGE}"
Manual NLB Verification
Create a test LoadBalancer Service in the guest cluster and verify security groups:
KUBECONFIG_HC=$PWD/kubeconfig-${HOSTED_CLUSTER_NAME}
oc --kubeconfig=${KUBECONFIG_HC} create namespace test-ccm-nlb-sg
oc --kubeconfig=${KUBECONFIG_HC} apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
name: test-nlb-service
namespace: test-ccm-nlb-sg
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
type: LoadBalancer
selector:
app: test
ports:
- port: 80
targetPort: 8080
protocol: TCP
EOF
oc --kubeconfig=${KUBECONFIG_HC} get svc -n test-ccm-nlb-sg test-nlb-service -w
Once the LoadBalancer has an external hostname, verify security groups are attached:
LB_HOSTNAME=$(oc --kubeconfig=${KUBECONFIG_HC} get svc -n test-ccm-nlb-sg test-nlb-service \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
LB_NAME=$(echo ${LB_HOSTNAME} | cut -d'.' -f1 | rev | cut -d'-' -f2- | rev)
aws elbv2 describe-load-balancers --names ${LB_NAME} --region ${AWS_DEFAULT_REGION} \
--query 'LoadBalancers[0].SecurityGroups'
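The `cut | rev | cut | rev` pipeline above takes the first DNS label of the hostname and drops its last dash-separated field. The same derivation can be sketched with shell parameter expansion only, which avoids depending on `rev`. The function name and the sample hostname in the comment are hypothetical.

```shell
# Derive the load balancer name from its DNS hostname.
# Example (made-up hostname):
#   a1b2c3d4-0123456789abcdef.elb.us-east-1.amazonaws.com -> a1b2c3d4
lb_name_from_hostname() {
  first_label="${1%%.*}"             # keep only the text before the first dot
  printf '%s\n' "${first_label%-*}"  # drop the final "-<suffix>" field
}
```

With this helper, `LB_NAME=$(lb_name_from_hostname "${LB_HOSTNAME}")` would replace the pipeline.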
Cleanup:
oc --kubeconfig=${KUBECONFIG_HC} delete svc test-nlb-service -n test-ccm-nlb-sg
oc --kubeconfig=${KUBECONFIG_HC} delete namespace test-ccm-nlb-sg
Iteration Workflow
When iterating on CPO code changes without recreating the cluster:
# Rebuild with a new tag
export TAG="feat-ccm-nlb-sg-$(git rev-parse --short HEAD)"
export CONTROL_PLANE_IMAGE=${REGISTRY}/hypershift-control-plane-operator:${TAG}
podman build -f Dockerfile.control-plane -t ${CONTROL_PLANE_IMAGE} .
podman push ${CONTROL_PLANE_IMAGE}
# Point the running HostedCluster at the new image (the --control-plane-operator-image flag sets this annotation)
oc annotate hostedcluster ${HOSTED_CLUSTER_NAME} -n clusters --overwrite \
hypershift.openshift.io/control-plane-operator-image="${CONTROL_PLANE_IMAGE}"
Destroy HyperShift Operator
# Render the manifests that were installed
bin/hypershift install render > hypershift-manifests.yaml
# delete the controller
oc delete ns hypershift
# Delete using the rendered manifests
oc delete -f hypershift-manifests.yaml
Troubleshooting
Verify CPO Image in Use
oc get pods -n clusters-${HOSTED_CLUSTER_NAME} -l app=control-plane-operator
oc get deployment -n clusters-${HOSTED_CLUSTER_NAME} \
-l hypershift.openshift.io/control-plane-component \
-o jsonpath='{.items[*].spec.template.spec.containers[*].image}'
Verify NLBSecurityGroupMode in CCM Config
oc get configmap aws-cloud-config -n clusters-${HOSTED_CLUSTER_NAME} \
-o jsonpath='{.data.aws\.conf}' | grep NLBSecurityGroupMode
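A plain `grep` prints the whole matching line and needs a visual check. For scripted checks, the value can be extracted and a missing key turned into a non-zero exit status. This is a sketch that assumes the config uses INI-style `key = value` lines. The function name is hypothetical, and the value `Managed` in the test data is an assumed example, not taken from this guide.

```shell
# Read a cloud config from stdin and print the NLBSecurityGroupMode value.
# Exits 1 when the key is not present.
nlb_sg_mode() {
  awk -F'=' '
    /^[ \t]*NLBSecurityGroupMode[ \t]*=/ {
      gsub(/[ \t\r]/, "", $2)   # strip whitespace around the value
      print $2
      found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```

It can be fed from the same command: `oc get configmap aws-cloud-config -n clusters-${HOSTED_CLUSTER_NAME} -o jsonpath='{.data.aws\.conf}' | nlb_sg_mode`.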
IAM Permissions for Control Plane Components
Extract credentials from the CCM pod to validate IAM permissions:
CCM_POD=$(oc get pod -l app=cloud-controller-manager \
-n clusters-${HOSTED_CLUSTER_NAME} \
-o jsonpath='{.items[0].metadata.name}')
COMPONENT_IAM_ROLE=$(oc exec ${CCM_POD} \
-c cloud-controller-manager \
-n clusters-${HOSTED_CLUSTER_NAME} \
-- cat /etc/kubernetes/secrets/cloud-provider/credentials \
| grep role_arn | awk -F"= " '{print$2}')
TOKEN_PATH=$(oc exec ${CCM_POD} \
-c cloud-controller-manager \
-n clusters-${HOSTED_CLUSTER_NAME} \
-- cat /etc/kubernetes/secrets/cloud-provider/credentials \
| grep web_identity | awk -F"= " '{print$2}')
COMPONENT_TOKEN=$(oc exec ${CCM_POD} \
-c cloud-controller-manager \
-n clusters-${HOSTED_CLUSTER_NAME} \
-- cat ${TOKEN_PATH})
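The commands above read the same credentials file from the pod twice and filter it with `grep | awk`. A small parser lets the file be fetched once and queried by exact key. This sketch assumes the file uses `key = value` lines, as the `awk -F"= "` calls above imply. The function name and the sample values in the test are hypothetical.

```shell
# Read an AWS credentials file from stdin and print the value of key $1.
# Only the first exact key match is printed.
cred_field() {
  awk -F'[ \t]*=[ \t]*' -v key="$1" '$1 == key { print $2; exit }'
}
```

For example, after `CREDS=$(oc exec ${CCM_POD} -c cloud-controller-manager -n clusters-${HOSTED_CLUSTER_NAME} -- cat /etc/kubernetes/secrets/cloud-provider/credentials)`, the role is `printf '%s\n' "${CREDS}" | cred_field role_arn` and the token path is `printf '%s\n' "${CREDS}" | cred_field web_identity_token_file`.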
Assume the role and get temporary credentials:
aws sts assume-role-with-web-identity \
--role-arn "${COMPONENT_IAM_ROLE}" \
--web-identity-token "${COMPONENT_TOKEN}" \
--role-session-name my-session \
--query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
--output text \
| awk '{print "[default]\naws_region = us-east-1\naws_access_key_id = "$1"\naws_secret_access_key = "$2"\naws_session_token = "$3}' \
| tee ./workload-creds.conf
Verify identity and simulate permissions:
AWS_SHARED_CREDENTIALS_FILE=$PWD/workload-creds.conf \
aws sts get-caller-identity
aws iam simulate-principal-policy \
--policy-source-arn ${COMPONENT_IAM_ROLE} \
--action-names \
"elasticloadbalancing:DescribeLoadBalancers" \
"elasticloadbalancing:DescribeTargetGroupAttributes" \
| jq -r '(["EVAL_ACTION","EVAL_DECISION"] | (., map(length*"-"))),
(.EvaluationResults[] | [.EvalActionName, .EvalDecision ])
| @tsv' | column -t
Example output:
EVAL_ACTION EVAL_DECISION
----------- -------------
elasticloadbalancing:DescribeLoadBalancers allowed
elasticloadbalancing:DescribeTargetGroupAttributes implicitDeny
CCM OTE and Hypershift
This change works as expected only after the following PR is merged: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/464
Creating a HyperShift hosted cluster with NLB ingress
Some tests (e.g., "should have security groups attached to default ingress
controller NLB") require the default ingress controller to use NLB type. By
default, HyperShift creates hosted clusters with a Classic Load Balancer (CLB)
for ingress.
To enable NLB, patch the HostedCluster's spec.configuration.ingress after
creation:
# Create the hosted cluster with TechPreview (required for AWSServiceLBNetworkSecurityGroup)
hypershift create cluster aws \
--name="${HOSTED_CLUSTER_NAME}" \
--region="${AWS_DEFAULT_REGION}" \
--node-pool-replicas=3 \
--base-domain="${CLUSTER_BASE_DOMAIN}" \
--pull-secret="${PULL_SECRET_FILE}" \
--aws-creds="${AWS_CREDS}" \
--ssh-key="${SSH_PUB_KEY_FILE}" \
--release-image="${OCP_RELEASE_IMAGE}" \
--feature-set=TechPreviewNoUpgrade
# Patch the HostedCluster to use NLB for the default ingress controller
oc --kubeconfig "$HYPERSHIFT_MANAGEMENT_CLUSTER_KUBECONFIG" patch hostedcluster "${HOSTED_CLUSTER_NAME}" -n clusters --type=merge -p '
{
"spec": {
"configuration": {
"ingress": {
"loadBalancer": {
"platform": {
"type": "AWS",
"aws": {
"type": "NLB"
}
}
}
}
}
}
}'
Note: The ingress controller configuration is only applied during initial creation of the
IngressController resource on the guest cluster. If the default ingress controller already exists, you may need to delete it first for the NLB configuration to take effect, or apply this patch before the cluster finishes installation.
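To confirm whether the NLB setting reached the guest cluster, the effective load balancer type can be read from the default IngressController. This is a sketch that assumes the standard OpenShift location of that resource (`default` in `openshift-ingress-operator`). The function name and the `OC` override are hypothetical.

```shell
# OC can be overridden to point at a different client binary (assumption).
OC="${OC:-oc}"

# Print the AWS load balancer type reported in the status of the default
# IngressController. $1: path to the hosted (guest) cluster kubeconfig.
ingress_lb_type() {
  "${OC}" --kubeconfig "$1" get ingresscontroller default \
    --namespace openshift-ingress-operator \
    -o jsonpath='{.status.endpointPublishingStrategy.loadBalancer.providerParameters.aws.type}'
}
```

Running `ingress_lb_type "$PWD/kubeconfig-${HOSTED_CLUSTER_NAME}"` should print `NLB` once the configuration has taken effect. Any other value suggests the IngressController was created before the patch was applied.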