KumoScale

Storage Nodes


Storage Nodes are the workhorses of a KumoScale™ deployment. Storage nodes are standard one- or two-socket servers based on Intel® or AMD CPUs, typically 1U or 2U, configured with 12 to 24 NVMe™ drive bays and two or more 100 Gbps network cards. KIOXIA tests hardware platforms to ensure compatibility and maintains a Hardware Certification List (HCL) of servers, SSDs, and NICs that are approved for use with KumoScale software.

KumoScale storage nodes provide NVMe storage to clients with high performance and low latency at data center scale. This is accomplished by federating storage nodes into a single cluster, managed by the KumoScale Control Plane so that it appears to users as a single system accessed through a central API service. KumoScale storage nodes work in bare-metal, virtualized, and containerized environments, with any NVMe-compliant SSD and any NVMe-oF™-compliant Linux® client.

KumoScale Storage Node Architecture

KumoScale storage nodes terminate NVMe-oF commands directed to logical volumes. These commands are parsed and, with the aid of a mapping matrix, used to construct equivalent NVMe commands that are forwarded over the PCIe® interface to local NVMe drives. Because all NVMe-oF processing is done by KumoScale software, the SSDs need not have any native NVMe-oF capability. Any SSD model that is compliant with the NVMe specification (version 1.3 or 1.4) can be certified for use with KumoScale software.

Logical (Client) View

KumoScale storage nodes present a virtualized storage container, called a volume, to clients. NVMe-oF storage appears to the client operating system as if it were implemented via locally attached NVMe devices. As shown in the figure below, NVMe connections are terminated by one or more controllers within each SSD. Once connected, any namespaces attached to each controller are visible to clients, and under the Linux OS are presented as NVMe devices with sequentially numbered namespaces, e.g., "/dev/nvme0n1".
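
For example, once a volume has been attached over the fabric, it can be listed on a Linux client with the standard NVMe CLI alongside any local NVMe devices. The device name shown here is illustrative and will vary by system:

    # List all NVMe namespaces visible to this client, local and fabric-attached
    nvme list

    # Inspect the block device backing one attached volume (name is an example)
    lsblk /dev/nvme0n1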

Operating systems discover locally attached NVMe devices via a protocol which enumerates all devices attached to a PCIe bus hierarchy. As is typical for I/O channels, the OS assumes that it owns any devices so attached.

To maintain maximum compatibility with local NVMe devices, the NVMe-oF specification defines a networked version of this discovery protocol. KumoScale storage nodes implement an NVMe-oF "discovery server," which can be interrogated by a host to obtain a list of namespaces (volumes) to which that host may connect.
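
A client host typically queries the discovery server using the standard NVMe CLI, as in the sketch below. The transport and IP address are placeholders; 4420 is the IANA-assigned NVMe-oF port:

    # Ask the storage node's discovery service which subsystems (volumes)
    # this host may connect to
    nvme discover -t tcp -a 192.0.2.10 -s 4420

The command returns one discovery log entry per available subsystem, including the NQN used in a subsequent connect.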

For security reasons, KumoScale software allows this discovery mechanism to be disabled. When discovery is set to OFF, storage resources cannot be discovered over the fabric using the NVMe Discovery command. In this case, a connection is accepted only if the Connect command contains the globally unique NVMe Qualified Name (NQN) of the targeted resource.
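
In that mode, the client connects directly, supplying the volume's NQN with the Connect command. The NQN and address below are placeholders:

    # Connect to a specific volume without using discovery; the connection is
    # accepted only if the NQN matches a volume this host may access
    nvme connect -t tcp -n nqn.2019-08.com.example:kumoscale:vol1 -a 192.0.2.10 -s 4420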

Volume Mapping - Logical View from Clients

Physical Implementation

The figure below shows the implementation of the namespaces represented in the logical view above. The controllers are virtualized, using the NVMe-oF dynamic controller construct. Access control from controllers to volumes is fully general and is specified when each volume is created. KumoScale storage nodes map client volumes/namespaces to physical SSDs. While there are exceptions, this mapping is usually static, i.e., it does not change with every I/O write. This keeps the data path simple and fast. The details of the mapping depend on the performance requirements specified for the volume and on the workload associated with other volumes sharing the SSD. In the example below, Volumes 1 and 2 are dedicated to Client A, Volume 4 is dedicated to Client B, and Volumes 3.1 and 3.2 are shared between Clients A and B.

Volume Mapping - Logical and Physical Views (from Storage Node)

KumoScale storage nodes support RDMA over Converged Ethernet v2 (RoCE v2) and Transmission Control Protocol (TCP/IP) transports. RoCE v2 transport offers very low latency and high performance, but depends on NIC hardware offload and a fully "lossless" network design. TCP/IP transport is routable, offers great flexibility, works over both "lossless" and congested networks, and is backward compatible with existing (and legacy) infrastructures. It does not require an RDMA-enabled NIC or any special configuration, but the latest generation of network interfaces includes hardware acceleration for NVMe-oF TCP/IP traffic that delivers performance comparable to RoCE v2.
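
On the client side, the choice of transport corresponds to a Linux kernel module and to the transport argument of the NVMe CLI. A minimal sketch, with placeholder NQN and address values:

    # RoCE v2 transport - requires an RDMA-capable NIC
    modprobe nvme-rdma
    nvme connect -t rdma -n <volume NQN> -a 192.0.2.10 -s 4420

    # TCP transport - works with any NIC (kernel 5.x or newer)
    modprobe nvme-tcp
    nvme connect -t tcp -n <volume NQN> -a 192.0.2.10 -s 4420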

KumoScale storage nodes are fully compatible with the NVMe-oF specification (version 1.0 and above), and are interoperable with any client initiator that is similarly compliant. For best results, KIOXIA recommends:

  • For NVMe-oF RoCE v2: Linux with kernel 4.9.64 (x86) or newer
  • For NVMe-oF TCP: Linux with kernel 5.x (x86) or newer
  • For NVMe CLI (Command Line Interface): version 1.6 or newer
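
A quick way to check that a client meets these recommendations, assuming the NVMe CLI is installed:

    uname -r             # kernel version: 4.9.64+ for RoCE v2, 5.x+ for TCP
    nvme version         # NVMe CLI version: 1.6 or newer
    lsmod | grep nvme    # confirm the nvme-rdma / nvme-tcp transport modules are loaded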