KumoScale™ storage provides data protection via a technique called Cross Domain Data Replication (CDDR). CDDR creates multiple replicas of a logical volume, and maps them to storage nodes located in different failure domains. This ensures that the volumes will not only remain available through failure of an SSD or storage node, but also operate normally in the presence of network partitions, power/cooling failures, or configuration errors that can take entire clusters or availability zones offline.
KumoScale technology takes a data center-level approach to resilience against hardware failure or network outage. KumoScale users can assign arbitrary labels, corresponding with failure domains, to cluster resources such as storage nodes. These labels provide the KumoScale provisioner with an awareness of failure topology. Then, when requesting a volume, users can describe mapping constraints and goals simply and concisely in terms of their own topology label scheme. For example, a user could create a label called "availability_zone" and apply it to all storage nodes. Then, a replicated volume can be constrained to place the replicas in storage nodes having different values for that label.
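The placement behavior described above can be sketched in a few lines of Python. This is an illustrative model only: the node list, label dictionaries, and function name are hypothetical, not the KumoScale provisioner's actual data structures or API. The "availability_zone" label follows the example in the text.

```python
# Sketch of failure-domain-aware replica placement: no two replicas may
# share the same value of the spread label (here, "availability_zone").
# Data structures and names are illustrative assumptions.

def place_replicas(nodes, replica_count, spread_label="availability_zone"):
    """Pick storage nodes so each replica lands in a distinct failure domain."""
    chosen, used_domains = [], set()
    for node in nodes:
        domain = node["labels"].get(spread_label)
        if domain is None or domain in used_domains:
            continue                      # same domain already holds a replica
        chosen.append(node["name"])
        used_domains.add(domain)
        if len(chosen) == replica_count:
            return chosen
    raise RuntimeError("not enough distinct failure domains for this request")

nodes = [
    {"name": "ks-node-1", "labels": {"availability_zone": "az-a"}},
    {"name": "ks-node-2", "labels": {"availability_zone": "az-a"}},
    {"name": "ks-node-3", "labels": {"availability_zone": "az-b"}},
    {"name": "ks-node-4", "labels": {"availability_zone": "az-c"}},
]
print(place_replicas(nodes, replica_count=3))
# → ['ks-node-1', 'ks-node-3', 'ks-node-4']
```

Note that ks-node-2 is skipped even though it has capacity, because az-a already hosts a replica; the constraint is expressed purely in terms of the user's own label scheme.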
Replication is optional and can be configured on a per-volume basis, so replication cost is borne only by those applications (Storage Classes) that require it. This can represent significant savings relative to a conventional array, which implements data protection locally and in a uniform way for all users.
KumoScale software implements replication via the Linux® “md” layer in the client. By replicating from the client, this approach takes advantage of the enormous aggregate network bandwidth of data center compute nodes, and avoids creating unnecessary bottlenecks at the storage node. The result is a tremendous improvement in IOPS/$ and bandwidth/$ vs. target-side replication. For more detail see Working with KumoScale Software.
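As a rough illustration of what client-side md replication looks like, the sketch below composes the standard mdadm invocation that assembles a RAID1 mirror from two replica block devices. The device paths and md device name are hypothetical examples, not values KumoScale actually uses, and a real client agent manages this internally.

```python
# Sketch only: the shape of a client-side mirror over two NVMe-oF
# namespaces, as the Linux "md" layer sees it. Paths are hypothetical.

def mirror_command(md_device, legs):
    """Compose an mdadm argv assembling a RAID1 mirror from replica devices."""
    return [
        "mdadm", "--create", md_device,
        "--level=1",                      # RAID1: every leg holds a full copy
        f"--raid-devices={len(legs)}",
        *legs,
    ]

cmd = mirror_command("/dev/md/vol0", ["/dev/nvme1n1", "/dev/nvme2n1"])
print(" ".join(cmd))
# → mdadm --create /dev/md/vol0 --level=1 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
```

Because the mirror lives on the client, writes fan out from each compute node directly to the storage nodes, which is what spreads replication traffic across the data center's aggregate network bandwidth.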
Kubernetes® and OpenStack® orchestrators both deploy a small executable on each client node to assist in managing networked storage connections. In a Kubernetes environment, this is called the CSI (Container Storage Interface) driver, and in the OpenStack environment, "os-brick". In addition to these options, KumoScale storage provides an analogous standalone function for bare metal deployments. In the following discussion, the term "Client Agent" is used generically to refer to any of these three scenarios.
The Client Agent is not part of the data path, so it has no effect on performance and consumes no significant CPU or memory resources. The agent has two primary functions:
When a volume is provisioned, the Client Agent connects the volume to the client, makes sure it is recognized by the OS, and optionally mounts a local file system. If a volume must be disconnected and reconnected (e.g. due to the relocation of the client VM), the agent ensures that the new connection is presented to the OS with the same logical device ID as before.
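The stable-device-ID behavior can be sketched as follows. The registry class, alias scheme, and `/dev/ks/...` naming are hypothetical illustrations; a real agent would typically rely on persistent identifiers such as `/dev/disk/by-id` symlinks keyed by the volume's unique identifier.

```python
# Minimal sketch of keeping a stable logical device ID across
# disconnect/reconnect. Names and the alias scheme are hypothetical.

class DeviceRegistry:
    def __init__(self):
        self._by_volume = {}   # volume UUID -> stable alias

    def attach(self, volume_uuid, kernel_device):
        """Return the stable alias for this volume: created on first
        attach, reused verbatim on every reconnect."""
        alias = self._by_volume.setdefault(
            volume_uuid, f"/dev/ks/{volume_uuid[:8]}")
        # A real agent would (re)point a symlink from `alias` to the
        # freshly enumerated kernel device (e.g. /dev/nvme7n1) here.
        return alias

reg = DeviceRegistry()
first = reg.attach("a1b2c3d4-0000-4000-8000-000000000001", "/dev/nvme1n1")
# ... volume disconnected, client VM relocated, volume reconnected ...
second = reg.attach("a1b2c3d4-0000-4000-8000-000000000001", "/dev/nvme7n1")
print(first == second)
# → True
```

The key point is that the application-visible name is derived from the volume's identity, not from the kernel's enumeration order, which can change across reconnects.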
The Client Agent plays an important role in allowing the KumoScale control plane to monitor the health of storage connections. It periodically checks the connection state and health of each volume replica, and reports any abnormal condition to the KumoScale control plane. By observing volume connections from both the client and the storage node points of view, the control plane can maintain a much more complete picture of the system state, and can distinguish more reliably between endpoint failure and network outage.
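The agent's monitoring duty reduces to a simple pattern: poll each replica leg, forward only abnormal states. The state names and report callback below are illustrative assumptions, not the actual agent interface.

```python
# Sketch of the client-side health check: report only unhealthy legs,
# so the control plane can merge this with the storage-node view.
# State strings and the callback signature are hypothetical.

def check_replicas(replica_states, report):
    """replica_states maps replica ID -> state string; `report` is
    invoked once per replica that is not healthy."""
    for replica_id, state in replica_states.items():
        if state != "healthy":
            report(replica_id, state)

alerts = []
check_replicas(
    {"vol0/leg0": "healthy", "vol0/leg1": "disconnected"},
    report=lambda rid, s: alerts.append((rid, s)),
)
print(alerts)
# → [('vol0/leg1', 'disconnected')]
```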
For more details, see the CSI Driver User Manual.
KumoScale storage provides the option to manually add and remove replicas, either directly via the REST API or by running an Ansible® playbook. This is useful when preparing for a planned shutdown, such as for maintenance: a storage node can easily be cleared of replicas before it is shut down, preserving the volumes' resiliency level throughout the process. It may also be used to free capacity by temporarily reducing the resiliency level.
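The maintenance workflow above can be sketched as an "add before remove" drain loop. The `add_replica`/`remove_replica` callables stand in for REST API calls or Ansible tasks and are assumptions for illustration, not the actual KumoScale interface.

```python
# Hypothetical sketch of clearing a storage node before maintenance:
# for each volume with a leg on the draining node, add a replacement
# replica elsewhere first, then remove the draining leg, so the
# resiliency level never drops below its configured value.

def drain_node(node, volumes, add_replica, remove_replica):
    for vol in volumes:
        if node in vol["replicas"]:
            add_replica(vol["name"])           # grow to N+1 replicas first
            remove_replica(vol["name"], node)  # then drop the draining leg

log = []
drain_node(
    "ks-node-2",
    [{"name": "vol0", "replicas": ["ks-node-1", "ks-node-2"]},
     {"name": "vol1", "replicas": ["ks-node-3", "ks-node-4"]}],
    add_replica=lambda v: log.append(("add", v)),
    remove_replica=lambda v, n: log.append(("remove", v, n)),
)
print(log)
# → [('add', 'vol0'), ('remove', 'vol0', 'ks-node-2')]
```

Ordering matters here: adding the replacement before removing the old leg is what keeps the volume at full resiliency for the whole procedure.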
When one of the "legs" of a replicated volume becomes unavailable, KumoScale technology can replace it. The replacement process is initiated and coordinated by the provisioner service.
Many network problems are transient in nature, so initiating a replica replacement too soon can waste bandwidth and create unnecessary repair activity. To account for this, KumoScale software allows replicated volumes to be configured with a timeout value representing the time for which a replica may be unavailable before the healing sequence is initiated. When a replica fails to respond, the replication layer tracks which data blocks have been written. If the replica becomes available again before the timeout period has elapsed, only the changed blocks will be synchronized, by copying data from a healthy replica to the newly reconnected one. This makes synchronization after transient network problems fast.
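The timeout-and-dirty-block logic can be modeled as below. The block set here is a plain Python set standing in for the write-intent tracking the replication layer performs (the Linux md layer uses a write-intent bitmap for this purpose); the class, method names, and timeout value are illustrative assumptions.

```python
# Model of the transient-failure window: track blocks written while a
# leg is down; on reconnect within the timeout, resync only those
# blocks, otherwise fall back to full replacement. Names are hypothetical.

class ReplicaTracker:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.failed_at = None
        self.dirty_blocks = set()     # blocks written while the leg was down

    def mark_failed(self, now):
        self.failed_at = now
        self.dirty_blocks.clear()

    def record_write(self, block):
        if self.failed_at is not None:
            self.dirty_blocks.add(block)

    def on_reconnect(self, now):
        """Return the blocks to copy from a healthy replica, or None to
        signal that the timeout elapsed and a full rebuild is needed."""
        if now - self.failed_at <= self.timeout_s:
            return self.dirty_blocks  # fast partial resync
        return None                   # healing sequence / full replacement

t = ReplicaTracker(timeout_s=300)
t.mark_failed(now=0)
t.record_write(17)
t.record_write(42)
print(sorted(t.on_reconnect(now=120)))
# → [17, 42]
```

If the same reconnect happened at `now=1000`, past the 300-second window, `on_reconnect` would return `None` and the provisioner-driven replacement would proceed instead.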
It is important for a data protection system to be able to correctly recover from multiple failure scenarios, including situations where any of the constituent machines suffers a sudden power loss or network disconnection during recovery from another error. This means that the state of the system must be persisted at all times, in a way that allows a newly restarted component to resume the error recovery process correctly.
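One common way to get this crash-safe persistence is atomic state replacement: write the new state to a temporary file, flush it to stable storage, then rename it over the old file, so a restarting component always reads either the complete old state or the complete new one, never a torn mixture. The file name and state fields below are illustrative; this is a general technique, not KumoScale's specific implementation.

```python
import json
import os
import tempfile

# Sketch of crash-safe recovery-state persistence via atomic replace.
# Field names ("volume", "step", "next_block") are hypothetical.

def save_state(path, state):
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())   # durable before it becomes visible
    os.replace(tmp, path)      # atomic swap: readers see old or new, never both

def load_state(path):
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.gettempdir(), "rebuild_state.json")
save_state(path, {"volume": "vol0", "step": "resync", "next_block": 1024})
print(load_state(path)["step"])
# → resync
```

On restart, a component reloads this state and resumes the recovery process from the recorded step rather than starting over or, worse, applying a half-remembered plan.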
For more information, see Self Healing.