OCP on AWS - Using Instance Disks for Ephemeral Storage

This document describes the steps used to evaluate the performance of different disk types on EC2 instances in AWS. The disk types include ephemeral (local instance storage) and the EBS block storage types gp2, gp3, io1, and io2.

The tool used is FIO. The intention is to stress the disks, using the gp2 baseline burst I/O balance to define the total test duration. For example, if the stress tests on a 200 GiB gp2 EBS volume take 20 minutes to consume the entire burst balance, we run the same duration (plus 5 minutes) on the other disk types, which do not have that limitation.
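To calibrate the test window, you can watch the gp2 burst balance drain during the stress run. A minimal sketch with the AWS CLI, assuming it is installed and configured; the volume ID below is a placeholder:

# BurstBalance reports the percentage of gp2 burst credits remaining.
# Replace VOLUME_ID with the volume under test.
export VOLUME_ID=vol-0123456789abcdef0
aws cloudwatch get-metric-statistics \
    --namespace AWS/EBS \
    --metric-name BurstBalance \
    --dimensions Name=VolumeId,Value=${VOLUME_ID} \
    --start-time "$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 60 \
    --statistics Average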

Table Of Contents:

  • Create the environment
    • Create the MachineConfig
    • Create the MachineSet using an instance type with ephemeral storage
    • Create the MachineSet with an extra EBS volume of type gp2
    • Create the MachineSet with an extra EBS volume of type gp3
    • Create the MachineSet with an extra EBS volume of type io1
    • Create the MachineSet with an extra EBS volume of type io2
  • Run the Benchmark
  • Analyze the results

Create the environment

Create the MachineConfig

The MachineConfig should create the systemd units to:

  • Create the filesystem on the new device
  • Mount the device on the path /var/lib/containers
  • Restore the SELinux context

Steps:

  • Export the device name presented to your instance for the ephemeral device (generally nvme1n1, exposed as /dev/nvme1n1):
export DEVICE_NAME=nvme1n1
  • Create the MachineConfig manifest:
cat <<EOF | envsubst | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 98-var-lib-containers
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Make Filesystem on /dev/${DEVICE_NAME}
          DefaultDependencies=no
          BindsTo=dev-${DEVICE_NAME}.device
          After=dev-${DEVICE_NAME}.device var.mount
          Before=systemd-fsck@dev-${DEVICE_NAME}.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=-/bin/bash -c "/bin/rm -rf /var/lib/containers/*"
          ExecStart=/usr/lib/systemd/systemd-makefs xfs /dev/${DEVICE_NAME}
          TimeoutSec=0

          [Install]
          WantedBy=var-lib-containers.mount
        enabled: true
        name: systemd-mkfs@dev-${DEVICE_NAME}.service
      - contents: |
          [Unit]
          Description=Mount /dev/${DEVICE_NAME} to /var/lib/containers
          Before=local-fs.target
          Requires=systemd-mkfs@dev-${DEVICE_NAME}.service
          After=systemd-mkfs@dev-${DEVICE_NAME}.service

          [Mount]
          What=/dev/${DEVICE_NAME}
          Where=/var/lib/containers
          Type=xfs
          Options=defaults,prjquota

          [Install]
          WantedBy=local-fs.target
        enabled: true
        name: var-lib-containers.mount
      - contents: |
          [Unit]
          Description=Restore recursive SELinux security contexts
          DefaultDependencies=no
          After=var-lib-containers.mount
          Before=crio.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/sbin/restorecon -R /var/lib/containers/
          TimeoutSec=0

          [Install]
          WantedBy=multi-user.target graphical.target
        enabled: true
        name: restorecon-var-lib-containers.service
EOF
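Applying the MachineConfig triggers a rolling update of the worker pool. To watch the rollout, and later verify the mount unit on a node that has the ephemeral device (the node name below is a placeholder):

oc get machineconfigpool worker -w

# After a node with the ephemeral device joins (see the MachineSet steps below):
oc debug node/<node-name> -- chroot /host systemctl status var-lib-containers.mount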

Create the MachineSet

The next step is to create the MachineSet to launch an instance with ephemeral disks available. In general, instance families with ephemeral disks end with the letter "d". For example, the compute-optimized family (C), in its 6th generation (6), with Intel processors (i) and instance storage (d), is the c6id family.

In this example, the instance type c6id.xlarge provides 237 GB of NVMe SSD instance storage.

export INSTANCE_TYPE=c6id.xlarge
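To confirm the instance storage available on a given instance type, the AWS CLI can query it directly (assuming the CLI is installed and configured):

aws ec2 describe-instance-types \
    --instance-types ${INSTANCE_TYPE} \
    --query "InstanceTypes[].InstanceStorageInfo" \
    --output json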

Get the CLUSTER_ID:

export CLUSTER_ID="$(oc get infrastructure cluster \
    -o jsonpath='{.status.infrastructureName}')"

Create the MachineSet:

create_machineset() {
  # Required environment variables:
  ## DISK_TYPE           : Used to create the node label and the name suffix of the MachineSet
  ## CLUSTER_ID          : Can be read from the infrastructure object
  ## INSTANCE_TYPE       : InstanceType
  # Optional environment variables:
  ## EXTRA_BLOCK_DEVICES : Extra EBS device definition to be appended (default: '')
  ## AWS_REGION          : AWS Region (default: us-east-1)
  ## AWS_ZONE            : Availability Zone within AWS_REGION (default: us-east-1a)
  # envsubst does not expand ${VAR:-default}, so resolve the defaults here:
  export EXTRA_BLOCK_DEVICES="${EXTRA_BLOCK_DEVICES:-}"
  export AWS_REGION="${AWS_REGION:-us-east-1}"
  export AWS_ZONE="${AWS_ZONE:-us-east-1a}"
  cat <<EOF | envsubst | oc create -f -
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
  name: ${CLUSTER_ID}-worker-${DISK_TYPE}
  namespace: openshift-machine-api
spec:
  replicas: 0
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
      machine.openshift.io/cluster-api-machineset: ${CLUSTER_ID}-worker-${DISK_TYPE}
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: ${CLUSTER_ID}-worker-${DISK_TYPE}
    spec:
      metadata:
        labels:
          disk_type: "${DISK_TYPE}"
      providerSpec:
        value:
          ami:
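            # The AMI ID is cluster- and region-specific; copy it from an existing worker MachineSet.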
            id: ami-0722eb0819717090f
          apiVersion: machine.openshift.io/v1beta1
          blockDevices:
          - ebs:
              encrypted: true
              iops: 0
              kmsKey:
                arn: ""
              volumeSize: 120
              volumeType: gp3
${EXTRA_BLOCK_DEVICES}
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: ${CLUSTER_ID}-worker-profile
          instanceType: ${INSTANCE_TYPE}
          kind: AWSMachineProviderConfig
          placement:
            availabilityZone: ${AWS_ZONE}
            region: ${AWS_REGION}
          securityGroups:
          - filters:
            - name: tag:Name
              values:
              - ${CLUSTER_ID}-worker-sg
          subnet:
            filters:
            - name: tag:Name
              values:
              - ${CLUSTER_ID}-private-${AWS_ZONE}
          tags:
          - name: kubernetes.io/cluster/${CLUSTER_ID}
            value: owned
          userDataSecret:
            name: worker-user-data
EOF
}

Choose the disk type, instance type, and extra block device to create the MachineSet for each EBS type under test. Each example exports DISK_TYPE, which the function uses for the MachineSet name suffix and node label. Because the MachineSets are created with replicas: 0, scale them up afterwards (see the sketch after this list):

  • Create MachineSet for Ephemeral Disk Instance
export DISK_TYPE="ephemeral"
export INSTANCE_TYPE="m6id.xlarge"
export EXTRA_BLOCK_DEVICES=""
create_machineset
  • Create MachineSet for gp2 Disk Instance
export DISK_TYPE="gp2"
export INSTANCE_TYPE="m6i.xlarge"
export EXTRA_BLOCK_DEVICES="
      - deviceName: /dev/xvdb
        ebs:
          volumeType: gp2
          volumeSize: 230
"
create_machineset
  • Create MachineSet for gp3 Disk Instance
export DISK_TYPE="gp3"
export INSTANCE_TYPE="m6i.xlarge"
export EXTRA_BLOCK_DEVICES="
      - deviceName: /dev/xvdb
        ebs:
          volumeType: gp3
          volumeSize: 230
"
create_machineset
  • Create MachineSet for io1 Disk Instance
export DISK_TYPE="io1"
export INSTANCE_TYPE="m6i.xlarge"
export EXTRA_BLOCK_DEVICES="
      - deviceName: /dev/xvdb
        ebs:
          volumeType: io1
          volumeSize: 230
          iops: 3000
"
create_machineset
  • Create MachineSet for io2 Disk Instance
export DISK_TYPE="io2"
export INSTANCE_TYPE="m6i.xlarge"
export EXTRA_BLOCK_DEVICES="
      - deviceName: /dev/xvdb
        ebs:
          volumeType: io2
          volumeSize: 230
          iops: 3000
"
create_machineset
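The MachineSets are created with replicas: 0; scale up the one for the disk type under test, reusing the variables exported above:

oc scale machineset ${CLUSTER_ID}-worker-${DISK_TYPE} \
    -n openshift-machine-api --replicas=1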

Wait for the node to be created:

oc get node -l disk_type=${DISK_TYPE} -w

Make sure the device has been mounted correctly to the mount path /var/lib/containers:

oc debug node/$(oc get nodes -l disk_type=${DISK_TYPE} -o jsonpath='{.items[0].metadata.name}') -- chroot /host /bin/bash -c "df -h /var/lib/containers"

Run the Benchmark

Running fio-etcd

We will use the quick FIO test commonly used to evaluate disks for etcd, as referenced in the OpenShift documentation. Export the variables used by the tests:

export label_disk=ephemeral
export node_name=$(oc get nodes -l disk_type=${label_disk} -o jsonpath='{.items[0].metadata.name}')
export base_path="/var/lib/containers/_benchmark_fio"

Run quick FIO test (used for etcd):

  • Running on ephemeral device
export disk_type=ephemeral
export base_path="/var/lib/containers/_benchmark_fio"

oc debug node/${node_name} -- chroot /host /bin/bash -c \
    "mkdir -p ${base_path}; podman run --volume ${base_path}:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf" > ./results-${disk_type}-fio_etcd.txt
  • Running on the root volume (EBS):
export disk_type=ebs
export base_path="/var/lib/misc/_benchmark_fio"

oc debug node/${node_name} -- chroot /host /bin/bash -c \
    "mkdir -p ${base_path}; podman run --volume ${base_path}:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf" > ./results-${disk_type}-fio_etcd.txt

Running stress test with FIO

Run the stress FIO test using the parameters recommended in the AWS documentation for General Purpose (gp) volumes. First define the log file the command appends to:

export log_stdout="./results-fio_stress-${disk_type}-${node_name}.log"

oc debug node/${node_name} -- chroot /host /bin/bash -c \
    "echo \"[0] <=> \$(hostname) <=> \$(date) <=> \$(uptime) \"; \
    lsblk; \
    mkdir -p ${base_path}; \
    for offset in {1..2} ; do \
        echo \"Running [\$offset]\"; \
        podman run --rm \
            -v ${base_path}:/benchmark:Z \
            ljishen/fio \
                --ioengine=psync \
                --rw=randwrite \
                --direct=1 \
                --bs=16k \
                --size=1G \
                --numjobs=5 \
                --time_based \
                --runtime=60 \
                --group_reporting \
                --norandommap \
                --directory=/benchmark \
                --name=data_${disk_type}_\$offset \
                --output-format=json \
                --output=/benchmark/result_\$(hostname)-${disk_type}-\$offset.json ;\
        sleep 10; \
        rm -f ${base_path}/data_${disk_type}_* ||true ; \
        echo \"[\$offset] <=> \$(hostname) <=> \$(date) <=> \$(uptime) \"; \
    done; \
    tar cfz /tmp/benchmark-${disk_type}.tar.gz ${base_path}/*.json" \
    2>/dev/null | tee -a ${log_stdout}

oc debug node/${node_name} -- chroot /host /bin/bash -c \
    "cat /tmp/benchmark-${disk_type}.tar.gz" \
    2>/dev/null > ./results-fio_stress-${disk_type}-${node_name}.tar.gz
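Analyze the results

A minimal sketch for extracting the write IOPS and the 99th-percentile completion latency from the fio JSON results, assuming fio 3.x output (the clat_ns percentile keys) and that jq is installed locally:

# Extract the tarball fetched from the node, then summarize each fio result.
mkdir -p ./fio-results/${disk_type}
tar xfz ./results-fio_stress-${disk_type}-${node_name}.tar.gz -C ./fio-results/${disk_type}
for f in $(find ./fio-results/${disk_type} -name 'result_*.json'); do
    echo "== ${f}"
    jq -r '.jobs[0].write
        | "iops=\(.iops) clat_p99_ms=\(.clat_ns.percentile."99.000000" / 1000000)"' "${f}"
done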
