OCP on AWS - Troubleshooting RunInstance Error
KCS URL: https://access.redhat.com/solutions/6969704
Red Hat OpenShift Container Platform Machine Failed with error launching instance: You are not authorized to perform this operation
Issue
- The Machine creation is stuck in Failed phase with
Error Message
:error launching instance: You are not authorized to perform this operation
(without an encoded error message when running the cluster using service endpoints)
$ oc describe machines -n openshift-machine-api ${MACHINE_NAME} |grep 'Error Message:'
Error Message: error launching instance: You are not authorized to perform this operation.
- The Machine creation is stuck in Failed phase with
Error Message
:error launching instance: You are not authorized to perform this operation
(with an encoded error message)
$ ENCODED_ERROR_MESSAGE=$(oc describe machines -n openshift-machine-api ${MACHINE_NAME} |grep 'Error Message:' | awk -F'failure message: ' '{print$2}')
$ oc describe machines -n openshift-machine-api ${MACHINE_NAME} |grep 'Error Message:'
Error Message: error launching instance: You are not authorized to perform this operation. Encoded authorization failure message: ${ENCODED_ERROR_MESSAGE}
- The decoded error message shows that the permissions used by Machine API are not allowed to run the action
ec2:RunInstances
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4.x
- Amazon Web Services (AWS)
- IAM service
- Authentication Mode: manual with STS
Resolution
Check if there isn't any policy blocking or missing any required action defined on the CredentialsRequests
. The following AWS services should be checked:
1) When using Service Endpoints (install config field: platform.aws.serviceEndpoints
): check if the existing VPC Endpoints for EC2 has the correct actions on the policy document, otherwise, you will have connectivity issues to reach the API services as it's blocked by Service Endpoint, thus the encoded message will not be provided on the Error Message
when troubleshooting the pod of Machine API Controllers
2) Check the Policies attached to the IAM Role ${cluster_name}-openshift-machine-api-aws-cloud-credentials
has all the required permissions defined on the CredentialsRequests
3) Check if the IAM Role ${cluster_name}-openshift-machine-api-aws-cloud-credentials
hasn't permissions boundary policies denying any required permissions defined on the CredentialsRequests
, or any expression matching it
4) Check if the AWS Account hasn't any Service Control Policies attached to the Organizations' structure, where the account belongs, that could deny any actions defined on the CredentialsRequests
Root Cause
Some possibilities could block the actions required for Machine API:
1) The EC2 endpoint policy for the VPC Endpoint is missing permissions
When the cluster is installed in a private, restricted or customized environment using the AWS VPC PrivateLink with VPC endpoints, the ec2 endpoint policies should allow the API calls to the EC2 service and dependencies, like IAM, otherwise, the requests will not reach the required API and fail on the client side. The error message should look like this - without encoded error message as the service API can't be reached:
$ oc describe machine -n openshift-machine-api $MACHINE_NAME |grep 'Error Message:'
Error Message: error launching instance: You are not authorized to perform this operation.
2) Missing permissions on the IAM Role, in general named as ${cluster_name}-openshift-machine-api-aws-cloud-credentials
:
The Policy document attached to the IAM Role used by Machine API has incorrect permissions, not allowing to perform ec2:RunInstances
, or equivalent, action to Machine API creates the Machine on the cloud provider.
The error message should look like this:
$ oc describe machines -n openshift-machine-api ${MACHINE_NAME} |grep 'Error Message:'
Error Message: error launching instance: You are not authorized to perform this operation. Encoded authorization failure message: ${ENCODED_ERROR_MESSAGE}
The fields below on the decoded error message should be checked:
allowed
should be set tofalse
explicitDeny
should be set tofalse
, as the error is missing the explicit "Allow" forec2:RunInstances
on the policy document
The decoded error message looks like this:
{
"allowed": false,
"explicitDeny": false,
"matchedStatements": {
"items": []
},
"failures": {
"items": []
},
"context": {
"principal": {
"id": "AROAT[redacted]:[redacted:session_name]",
"arn": "arn:aws:sts::[redacted:account_id]:assumed-role/[redacted:cluster_id]-openshift-machine-api-aws-cloud-credentials/[redacted:session_name]"
},
"action": "ec2:RunInstances",
"resource": "arn:aws:ec2:us-east-1:[redacted:account_id]:instance/*",
"conditions": {
(...)
}
}
}
3) Permissions Boundary policies for the IAM Role denying required actions:
The permissions boundary is used to set the maximum permissions that an identity-based policy can grant to an IAM entity, or not have. It's not a common configuration, for example, when the ec2:RunInstances
is explicitly denied on the permissions boundary policies, the Machine API will not be able to run the RunInstances EC2 API operation to create the machines.
The fields below on the decoded error message should be checked:
explicitDeny
should be set totrue
matchedStatements
should have the policy definition that matched the action defined on thecontext.action
with the denied statement
The decoded error message looks like this:
{
"allowed": false,
"explicitDeny": true,
"matchedStatements": {
"items": [
{
"statementId": "",
"effect": "DENY",
"principals": {
"items": [
{
"value": "SCOPE_POLICY_ISSUER"
}
]
},
"principalGroups": {
"items": []
},
"actions": {
"items": [
{
"value": "ec2:RunInstances"
}
]
},
"resources": {
"items": [
{
"value": "*"
}
]
},
"conditions": {
"items": []
}
}
]
},
"failures": {
"items": []
},
"context": {
"principal": {
"id": "AROAT[redacted]:[redacted:session_name]",
"arn": "arn:aws:sts::[redacted:account_id]:assumed-role/[redacted:cluster_id]-openshift-machine-api-aws-cloud-credentials/[redacted:session_name]"
},
"action": "ec2:RunInstances",
"resource": "arn:aws:ec2:us-east-1:[redacted:account_id]:instance/*",
"conditions": {
(...)
}
}
}
4) Explicitly Deny with Service Control Policies on the AWS Organizations structure
Service control policies (SCPs) are a type of organization policy that you can use to manage permissions in your organization. SCPs offer central control over the maximum available permissions for all accounts in your organization.
When the SCP is applied to the AWS Account running OCP with an explicitly deny of ec2:RunInstances
, or any expression matching required permissions, the Machine API will not be able to run the RunInstances EC2 API operation to create the machines.
The decoded error message looks like this:
{
"allowed": false,
"explicitDeny": true,
"matchedStatements": {
"items": [
{
"statementId": "Statement1",
"effect": "DENY",
"principals": {
"items": [
{
"value": "[redacted]"
}
]
},
"principalGroups": {
"items": []
},
"actions": {
"items": [
{
"value": "ec2:RunInstances"
}
]
},
"resources": {
"items": [
{
"value": "*"
}
]
},
"conditions": {
"items": []
}
}
]
},
"failures": {
"items": []
},
"context": {
"principal": {
"id": "[redacted]:[redacted:user]",
"arn": "arn:aws:sts::[redacted]:assumed-role/[redacted:sso_user_role_name]/[redacted:user]"
},
"action": "ec2:RunInstances",
"resource": "arn:aws:ec2:us-east-1:[redacted:account_id]:instance/*",
"conditions": {
(...)
}
}
}
Diagnostic Steps
- Check the Machines and note the name on the failed phase:
- Check the
Error Message
of the Failed machine:
$ oc describe machine $MACHINE_NAME -n openshift-machine-api
# or filter directly the error field
$ oc describe machine $MACHINE_NAME -n openshift-machine-api |grep 'Error Message:'
- When the Error message has the encoded failed message provided by AWS, get the encoded message and decode it:
$ ENCODED_ERROR_MESSAGE=$(oc describe machines -n openshift-machine-api $MACHINE_NAME |grep 'Error Message:' | awk -F'failure message: ' '{print$2}')
$ echo $(aws sts decode-authorization-message --encoded-message $ENCODED_ERROR_MESSAGE | jq -r .DecodedMessage) |jq .
- Check the required permissions for MachineAPI defined on the
CredentialsRequests
:
- Run the Policy Simulator using AWS CLI or AWS Console to check if the IAM Role has enough permissions as required on the
CredentialsRequests
:
$ ACTIONS=$(oc get -n openshift-cloud-credential-operator -o json \
credentialsrequests openshift-machine-api-aws \
| jq -r '.spec.providerSpec.statementEntries[].action[]' \
| sort -u | tr '\n' ' ')
$ MACHINE_API_ROLE_ARN=$(oc get secrets aws-cloud-credentials \
-n openshift-machine-api \
-o jsonpath='{.data.credentials}' |\
base64 -d |\
grep ^role_arn |\
awk '{print$3}')
$ aws iam simulate-principal-policy \
--policy-source-arn ${MACHINE_API_ROLE_ARN} \
--action-names ${ACTIONS} \
| jq -r '(["EVAL_ACTION","EVAL_DECISION"] | (., map(length*"-"))),
(.EvaluationResults[] | [.EvalActionName, .EvalDecision ])
| @tsv' | column -t
EVAL_ACTION EVAL_DECISION
----------- -------------
ec2:CreateTags allowed
ec2:DescribeAvailabilityZones allowed
ec2:DescribeDhcpOptions allowed
ec2:DescribeImages allowed
ec2:DescribeInstances allowed
ec2:DescribeInternetGateways allowed
ec2:DescribeSecurityGroups allowed
ec2:DescribeSubnets allowed
ec2:DescribeVpcs allowed
ec2:RunInstances allowed
ec2:TerminateInstances allowed
elasticloadbalancing:DeregisterTargets allowed
elasticloadbalancing:DescribeLoadBalancers allowed
elasticloadbalancing:DescribeTargetGroups allowed
elasticloadbalancing:DescribeTargetHealth allowed
elasticloadbalancing:RegisterInstancesWithLoadBalancer allowed
elasticloadbalancing:RegisterTargets allowed
iam:CreateServiceLinkedRole allowed
iam:PassRole allowed
kms:CreateGrant implicitDeny
kms:Decrypt allowed
kms:DescribeKey allowed
kms:Encrypt allowed
kms:GenerateDataKey allowed
kms:GenerateDataKeyWithoutPlainText allowed
kms:ListGrants implicitDeny
kms:RevokeGrant implicitDeny
# Fix the KMS actions that require parameters:
$ ACTIONS="kms:CreateGrant kms:RevokeGrant kms:ListGrants"
$ aws iam simulate-principal-policy \
--policy-source-arn ${MACHINE_API_ROLE_ARN} \
--action-names ${ACTIONS} \
--context-entries "ContextKeyName='kms:GrantIsForAWSResource',ContextKeyValues=true,ContextKeyType=boolean" \
| jq -r '(["EVAL_ACTION","EVAL_DECISION"] | (., map(length*"-"))),
(.EvaluationResults[] | [.EvalActionName, .EvalDecision ])
| @tsv' | column -t
EVAL_ACTION EVAL_DECISION
----------- -------------
kms:CreateGrant allowed
kms:RevokeGrant allowed
kms:ListGrants allowed