Configure the Storage Node Platform

The following sections describe the settings that must be applied on each machine hosting a storage node. These settings have been tested on CentOS 7.9 and CentOS 8.3 with kernel 5.10.61.

Disable UEFI Secure Boot

If the boot mode in your system BIOS is set to UEFI, you may need to disable Secure Boot in the BIOS to ensure that the system boots with the customized kernel modules.
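To check whether Secure Boot is currently enabled before changing BIOS settings, you can query the firmware state with the mokutil utility, if it is installed. The sketch below assumes that `mokutil --sb-state` prints a line such as "SecureBoot enabled" or "SecureBoot disabled"; the helper name secure_boot_enabled is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `mokutil --sb-state` output on stdin and
# succeeds (exit 0) only if that output reports Secure Boot as enabled.
secure_boot_enabled() {
    grep -q '^SecureBoot enabled'
}

# Usage (requires mokutil on a UEFI system):
#   if mokutil --sb-state | secure_boot_enabled; then
#       echo "Secure Boot is enabled; consider disabling it in the BIOS"
#   fi
```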

IP Settings for KumoScale Management and Data Traffic

The administrator should configure persistent IP addresses for KumoScale management and for the NVMe-oF™ portal. The management IP is used to manage the node, while the portal IP carries NVMe-oF data traffic. Firewall rules may be needed to allow this traffic.
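On CentOS 7.9 and 8.3, NetworkManager can read persistent interface settings from ifcfg files in /etc/sysconfig/network-scripts/. The fragment below is a minimal sketch of a static portal address; the interface name ens1f0 and the address 192.168.100.10/24 are hypothetical placeholders, and a file of the same form could cover the management interface.

```shell
# /etc/sysconfig/network-scripts/ifcfg-ens1f0 (hypothetical interface name)
# Static, persistent address for the NVMe-oF portal interface.
TYPE=Ethernet
NAME=ens1f0
DEVICE=ens1f0
# Static addressing, no DHCP.
BOOTPROTO=none
# Bring the interface up at every boot.
ONBOOT=yes
# Hypothetical portal address and prefix length.
IPADDR=192.168.100.10
PREFIX=24
```

After saving the file, `nmcli connection reload` followed by `nmcli connection up ens1f0` should apply it without a reboot.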

Configure the Firewall for NVMe-oF/TCP Traffic

Before creating the NVMe-oF portal, confirm that the firewall allows connections through the chosen port. This is needed only when the system firewall is running and you plan to use NVMe-oF/TCP; it is not required for NVMe-oF/RDMA. To check whether the firewall is running:

firewall-cmd --state

For example, to allow connections through TCP port 4420 and apply the change:

firewall-cmd --permanent --add-port=4420/tcp

firewall-cmd --reload
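The check and the port change can be combined so that the port is opened only when firewalld is running and is not already allowed. This is a sketch built on the firewall-cmd options shown above plus `--query-port`; the function name and the FIREWALL_CMD override (which defaults to firewall-cmd and exists only so the logic can be dry-run) are hypothetical.

```shell
#!/bin/sh
# Sketch: allow NVMe-oF/TCP connections on a port only when firewalld is
# running. The port defaults to 4420. FIREWALL_CMD is a hypothetical
# override (defaults to firewall-cmd) so the logic can be dry-run.
open_nvmeof_tcp_port() {
    port="${1:-4420}"
    fw="${FIREWALL_CMD:-firewall-cmd}"

    # `firewall-cmd --state` exits 0 only while firewalld is running.
    if ! "$fw" --state >/dev/null 2>&1; then
        echo "firewalld is not running; no change needed"
        return 0
    fi

    # Nothing to do if the permanent configuration already allows the port.
    if "$fw" --permanent --query-port="${port}/tcp" >/dev/null 2>&1; then
        echo "port ${port}/tcp is already allowed"
        return 0
    fi

    # Add the port permanently, then reload so the running firewall uses it.
    "$fw" --permanent --add-port="${port}/tcp" && "$fw" --reload
}

# Usage (as root):
#   open_nvmeof_tcp_port 4420
```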

NVMe-oF with RDMA

Follow the instructions in this section if the NICs used for NVMe-oF data traffic support RDMA and you want to use the RDMA protocol with NVMe-oF.

Note: We highly recommend using the RDMA protocol if your network interface cards support it.

Load RDMA Drivers

You may need to configure your system to load the required RDMA drivers. For example, CentOS 7.9 does not load mlx5_ib automatically (unlike CentOS 8.3), so if you are using a Mellanox™ NIC that requires this module, you must load it and make sure it is loaded at every boot. One way to do this is:

modprobe mlx5_ib

echo mlx5_ib >> /etc/modules-load.d/mlx.conf
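Running the echo command above more than once appends duplicate lines to mlx.conf. The sketch below, using a hypothetical helper named ensure_module_at_boot, adds the module to a modules-load.d file only if it is not already listed; the file path is an optional argument so the logic can be tried against a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: register a kernel module for loading at boot via
# systemd-modules-load without adding duplicate entries.
#   $1 = module name (for example mlx5_ib)
#   $2 = modules-load.d file (defaults to /etc/modules-load.d/mlx.conf)
ensure_module_at_boot() {
    module="$1"
    conf="${2:-/etc/modules-load.d/mlx.conf}"

    # Append the module name only if an identical line is not present yet.
    if ! grep -qxF "$module" "$conf" 2>/dev/null; then
        printf '%s\n' "$module" >> "$conf"
    fi
}

# Usage (as root):
#   modprobe mlx5_ib
#   ensure_module_at_boot mlx5_ib
```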

Set RoCE v2 as the Default Version

To properly serve hosts running RDMA over Converged Ethernet (RoCE) v1 and RoCE v2, we recommend setting RoCE v2 as the default version on the KumoScale software machine. For more information, see:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_infiniband_and_rdma_networks/configuring-roce_configuring-and-managing-networking

1.  Make sure the RDMA-related drivers are loaded, and load rdma_cm:

modprobe rdma_cm

2.  List the available devices and ports with the ibstat command:

ibstat

3.  Set shell variables for the device and port to configure. For example:

DEVICE=mlx5_0

PORT=1

4.  Create the directory /sys/kernel/config/rdma_cm/$DEVICE/:

mkdir -p /sys/kernel/config/rdma_cm/$DEVICE/

5.  Display the current default RoCE mode for port $PORT:

cat /sys/kernel/config/rdma_cm/$DEVICE/ports/$PORT/default_roce_mode

6.  Change the default RoCE mode to version 2:

echo "RoCE v2" > /sys/kernel/config/rdma_cm/$DEVICE/ports/$PORT/default_roce_mode

Alternatively, use the cma_roce_mode script from the Mellanox mlnx-tools repository:

https://github.com/Mellanox/mlnx-tools/blob/master/ofed_scripts/cma_roce_mode

cma_roce_mode -d $DEVICE -p $PORT -m 2

Note: This configuration is not persistent and must be reapplied after every boot.
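Because the default RoCE mode must be reapplied after each boot, it can help to wrap steps 4 to 6 in a script that a boot-time service runs. The sketch below is one way to do this under stated assumptions: it walks every RDMA device listed under /sys/class/infiniband, considers only ports whose link_layer reads Ethernet (that is, RoCE ports), and writes "RoCE v2" to each port's default_roce_mode file. The function name and the SYSFS_IB and RDMA_CM_ROOT overrides are hypothetical, added so the logic can be exercised against a scratch directory.

```shell
#!/bin/sh
# Sketch: set RoCE v2 as the default RoCE mode on every Ethernet (RoCE) port
# of every RDMA device. Intended to run at each boot, after rdma_cm is loaded.
# SYSFS_IB and RDMA_CM_ROOT are hypothetical overrides for testing; they
# default to the real sysfs and configfs locations.
set_default_roce_v2() {
    sysfs_ib="${SYSFS_IB:-/sys/class/infiniband}"
    cm_root="${RDMA_CM_ROOT:-/sys/kernel/config/rdma_cm}"

    for dev_path in "$sysfs_ib"/*; do
        [ -d "$dev_path" ] || continue
        dev=$(basename "$dev_path")

        for port_path in "$dev_path"/ports/*; do
            [ -d "$port_path" ] || continue
            port=$(basename "$port_path")

            # Only RoCE ports report an Ethernet link layer.
            [ "$(cat "$port_path/link_layer" 2>/dev/null)" = "Ethernet" ] || continue

            # Creating the per-device configfs directory exposes its ports.
            mkdir -p "$cm_root/$dev" || return 1

            mode_file="$cm_root/$dev/ports/$port/default_roce_mode"
            echo "RoCE v2" > "$mode_file" || return 1
            echo "$dev port $port: $(cat "$mode_file")"
        done
    done
}

# Usage (as root, at every boot):
#   modprobe rdma_cm
#   set_default_roce_v2
```

How the function is invoked at boot (for example, from a oneshot systemd service) depends on your environment and is not shown here.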

IRQ Balancer Configuration

For best performance with KumoScale software, we recommend the following irqbalance settings:

Stop and disable the irqbalance daemon:

systemctl stop irqbalance

systemctl disable irqbalance

Then run irqbalance once with the --oneshot option, and add a udev rule so that it also runs once whenever the nvmeoft_fab module is loaded, before any connection is made, including after a reboot:

irqbalance --oneshot

echo 'ACTION=="add", SUBSYSTEM=="module", KERNEL=="nvmeoft_fab", RUN+="/usr/sbin/irqbalance --oneshot"' > /etc/udev/rules.d/10-ks-mod-fab-load.rules

udevadm control --reload-rules

Disable Hyper-Threading

For best performance with KumoScale software, we recommend disabling Hyper-Threading (HT) in the BIOS.
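After changing the BIOS setting, you can confirm from Linux that simultaneous multithreading is off. On x86 kernels with SMT control support, such as 5.10, /sys/devices/system/cpu/smt/active typically reads 0 when SMT is inactive and 1 when it is active; `lscpu` also reports "Thread(s) per core". The helper name and its optional path argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if SMT (Hyper-Threading) is inactive.
# $1 optionally overrides the sysfs file to read. Returns 2 if the file
# cannot be read (for example, on kernels without SMT control support).
smt_disabled() {
    smt_file="${1:-/sys/devices/system/cpu/smt/active}"
    [ -r "$smt_file" ] || { echo "cannot read $smt_file" >&2; return 2; }
    [ "$(cat "$smt_file")" = "0" ]
}

# Usage:
#   if smt_disabled; then echo "Hyper-Threading is off"; fi
```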

Resolve IO_PAGE_FAULT Errors on Some AMD Platforms

On some AMD platforms that include a hardware input-output memory management unit (IOMMU), you may see many IO_PAGE_FAULT errors in the kernel log.

To resolve the issue, do one of the following:

  1. Disable CPU virtualization in the BIOS.
  2. Avoid using the hardware IOMMU and use SWIOTLB instead (see the note below) by adding iommu=soft to the Linux™ kernel boot command line in GRUB.

SWIOTLB (software input/output translation lookaside buffer) is a Linux kernel mechanism that handles DMA address mapping in software instead of in the hardware IOMMU. The kernel reserves a pool of memory that devices can always address; when a device cannot reach a DMA buffer directly, the kernel copies the data between that buffer and the reserved pool. Because I/O data is "bounced" through this intermediate memory, SWIOTLB is also referred to as a bounce buffer.
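On CentOS 7.9 and 8.3, grubby can add the iommu=soft parameter from option 2 to the boot entries without editing GRUB files by hand; a reboot is needed for it to take effect. The sketch below, with a hypothetical helper name, then checks that the running kernel was actually booted with the parameter by looking for it in /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: succeed if iommu=soft is on the kernel command line.
# $1 optionally overrides the file to inspect (defaults to /proc/cmdline).
#
# To add the parameter to all boot entries first (as root), then reboot:
#   grubby --update-kernel=ALL --args="iommu=soft"
iommu_soft_active() {
    cmdline_file="${1:-/proc/cmdline}"
    # Split the command line into one parameter per line and match exactly.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'iommu=soft'
}

# Usage (after reboot):
#   iommu_soft_active && echo "iommu=soft is set on the running kernel"
```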
