Data Protection


Cross Domain Data Replication

KumoScale™ storage provides data protection via a technique called Cross Domain Data Replication (CDDR).  CDDR creates multiple replicas of a logical volume, and maps them to storage nodes located in different failure domains. This ensures that the volumes will not only remain available through failure of an SSD or storage node, but also operate normally in the presence of network partitions, power/cooling failures, or configuration errors that can take entire clusters or availability zones offline.

Figure: Replicated Volume

Topology Awareness

KumoScale technology takes a data center-level approach to resilience against hardware failure or network outage.  KumoScale users can assign arbitrary labels, corresponding with failure domains, to cluster resources such as storage nodes. These labels provide the KumoScale provisioner with an awareness of failure topology.  Then, when requesting a volume, users can describe mapping constraints and goals simply and concisely in terms of their own topology label scheme.  For example, a user could create a label called "availability_zone" and apply it to all storage nodes.  Then, a replicated volume can be constrained to place the replicas in storage nodes having different values for that label. 
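The placement constraint described above can be sketched as follows.  This is an illustrative model only, not the KumoScale provisioner's actual code; the node names, data model, and the "availability_zone" label follow the example in the text.

```python
# Illustrative sketch: choose storage nodes for replicas such that no two
# replicas share a value for the "availability_zone" label.
# (Hypothetical data model; not the actual KumoScale provisioner.)

def place_replicas(nodes, replica_count, label="availability_zone"):
    """Return nodes whose values for `label` are pairwise distinct."""
    chosen, used_values = [], set()
    for node in nodes:
        value = node["labels"].get(label)
        if value is not None and value not in used_values:
            chosen.append(node)
            used_values.add(value)
        if len(chosen) == replica_count:
            return chosen
    raise RuntimeError("not enough distinct failure domains for placement")

nodes = [
    {"name": "ks-node-1", "labels": {"availability_zone": "az-1"}},
    {"name": "ks-node-2", "labels": {"availability_zone": "az-1"}},
    {"name": "ks-node-3", "labels": {"availability_zone": "az-2"}},
]
placement = place_replicas(nodes, replica_count=2)
```

With this constraint, ks-node-2 is skipped because it shares "az-1" with ks-node-1, so the two replicas land in different availability zones.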

Selective Protection

Replication is optional and can be configured on a per-volume basis, so replication cost is borne only by those applications (Storage Classes) that require it.  This can represent significant savings relative to a conventional array, which implements data protection locally and uniformly for all users. 

Client-side Replication

KumoScale software implements replication via the Linux® “md” layer in the client.  By replicating from the client, this approach takes advantage of the enormous aggregate network bandwidth of data center compute nodes, and avoids creating unnecessary bottlenecks at the storage node. The result is a tremendous improvement in IOPS/$ and bandwidth/$ vs. target-side replication. For more detail see Resiliency.
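Conceptually, client-side mirroring behaves like the sketch below: every write is fanned out from the client to all replicas, while a read can be served by any one of them.  This is a simplified in-memory model of RAID-1 semantics for illustration only; the real data path is the Linux md driver operating over the client's NVMe-oF connections.

```python
# Simplified model of client-side (RAID-1 style) mirroring.
# Each "replica" is a byte array standing in for a remote volume.

class MirroredVolume:
    def __init__(self, size, replica_count):
        self.replicas = [bytearray(size) for _ in range(replica_count)]

    def write(self, offset, data):
        # The client fans each write out to every replica, so all copies
        # stay identical without a storage-node-side replication hop.
        for replica in self.replicas:
            replica[offset:offset + len(data)] = data

    def read(self, offset, length, replica_index=0):
        # A read can be served by any single healthy replica.
        return bytes(self.replicas[replica_index][offset:offset + length])

vol = MirroredVolume(size=16, replica_count=2)
vol.write(0, b"hello")
```

Because each client mirrors only its own volumes, the replication traffic is spread across the aggregate bandwidth of all compute nodes rather than funneled through a storage target.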

CSI Driver/Client Agent

Kubernetes® and OpenStack® orchestrators both deploy a small executable on each client node to assist in managing networked storage connections.  In a Kubernetes environment, this is called the CSI (Container Storage Interface) driver, and in the OpenStack environment, "os-brick".  In addition to these options, KumoScale storage provides an analogous standalone function for bare metal deployments.  In the following discussion, the term "Client Agent" is used to refer generally to any of these three scenarios.

The Client Agent is not part of the data path, so it has no effect on performance and consumes no significant CPU or memory resources.  The agent has two primary functions:

Setup and Tear-down of Volume Connections

When a volume is provisioned, the Client Agent connects the volume to the client, makes sure it is recognized by the OS, and optionally mounts a local file system.  If a volume must be disconnected and reconnected (e.g. due to the relocation of the client VM), the agent ensures that the new connection is presented to the OS with the same logical device ID as before.
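The stable-device-identity behavior can be sketched as a persistent mapping from volume UUID to logical device ID, so that a reconnect reuses the same name.  The naming scheme below is hypothetical and serves only to illustrate the idea, not the Client Agent's actual implementation.

```python
# Illustrative sketch: keep a persistent map from volume UUID to logical
# device ID, so a disconnect/reconnect presents the same device as before.
# (Hypothetical naming scheme, not the actual Client Agent.)

class DeviceMap:
    def __init__(self):
        self._by_uuid = {}   # volume UUID -> logical device ID
        self._next = 0

    def attach(self, volume_uuid):
        # Reuse the previously assigned ID if this volume was seen before.
        if volume_uuid not in self._by_uuid:
            self._by_uuid[volume_uuid] = f"/dev/kumoscale{self._next}"
            self._next += 1
        return self._by_uuid[volume_uuid]

devmap = DeviceMap()
first = devmap.attach("vol-1234")
devmap.attach("vol-5678")
after_reconnect = devmap.attach("vol-1234")  # same ID as before
```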

Health Monitoring

The Client Agent plays an important role in helping the KumoScale control plane monitor the health of storage connections.  It periodically checks the connection state and health of each volume replica, and reports any abnormal condition to the KumoScale control plane.  By observing volume connections from both the client and the storage node points of view, KumoScale software maintains a much more complete picture of the system state, and can distinguish more reliably between endpoint failure and network outage.
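A minimal sketch of the agent's periodic health check: scan each replica's connection state and collect only abnormal conditions for reporting to the control plane.  The state names and report format here are assumptions made for illustration.

```python
# Illustrative sketch of the Client Agent's health check: scan replica
# states and collect anything abnormal to report to the control plane.
# (State names and report fields are hypothetical.)

HEALTHY_STATES = {"connected"}

def check_replicas(replicas):
    """Return one abnormal-condition report per unhealthy replica."""
    reports = []
    for replica in replicas:
        if replica["state"] not in HEALTHY_STATES:
            reports.append({"volume": replica["volume"],
                            "replica": replica["id"],
                            "condition": replica["state"]})
    return reports

replicas = [
    {"volume": "vol-1", "id": "r0", "state": "connected"},
    {"volume": "vol-1", "id": "r1", "state": "disconnected"},
]
reports = check_replicas(replicas)
```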

For more details, see the CSI Driver User Manual.

Adding and Removing Replicas

KumoScale enabled storage provides the option to manually add and remove replicas via the CLI, by calling the REST API directly, or by running an Ansible® playbook.  This is useful when preparing for a planned shutdown, such as for maintenance: a storage node can easily be cleared of replicas before it is shut down, preserving the volumes' resiliency level throughout the process.  It may also be used to free capacity by temporarily reducing the resiliency level.
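When driving this through the REST API, the requests might be built along the lines of the sketch below.  Note that the endpoint paths and field names here are invented for illustration only; consult the KumoScale REST API documentation for the actual interface.

```python
# Illustrative sketch: build requests for manually adding and removing a
# replica.  NOTE: endpoint paths and field names are invented for
# illustration; the real KumoScale REST API may differ.

def add_replica_request(volume_id, storage_node):
    return {"method": "POST",
            "path": f"/volumes/{volume_id}/replicas",        # hypothetical
            "body": {"storageNode": storage_node}}

def remove_replica_request(volume_id, replica_id):
    return {"method": "DELETE",                              # hypothetical
            "path": f"/volumes/{volume_id}/replicas/{replica_id}"}

req = add_replica_request("vol-42", "ks-node-3")
```

Clearing a node before maintenance would then amount to one remove-replica call per replica hosted on that node, typically after adding replacement replicas elsewhere.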

Volume Self-Healing

When one of the "legs" of a replicated volume becomes unavailable, KumoScale technology can replace it.  This process consists of the following steps, which are initiated and coordinated by the provisioner service:

  • Instruct the client agent to disconnect the failed replica
  • Create a new, empty volume on a different storage node
  • Instruct the client agent to connect the new volume as a replica
  • Initiate a synchronization process, which copies the volume contents from a healthy replica to the new replica.  The volume is fully available for I/O during sync, and all new writes are sent to all replicas, including the new one
  • Should the old replica become available again at any later time, it is considered to be an "orphan."  The provisioner service will erase any data contained on the orphan volume, and delete it
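The steps above can be sketched as a simple orchestration sequence.  All function names below are placeholders for the provisioner's internal operations; here they merely record the order in which the steps run.

```python
# Illustrative sketch of the self-healing sequence coordinated by the
# provisioner.  Each step is a placeholder that records the operation
# order rather than performing real storage actions.

def heal_volume(volume, failed_replica, new_node, log):
    log.append(("disconnect", failed_replica))   # 1. drop the failed leg
    new_replica = f"replica-on-{new_node}"
    log.append(("create", new_replica))          # 2. empty volume on another node
    log.append(("connect", new_replica))         # 3. attach it as a new leg
    log.append(("sync", new_replica))            # 4. copy from a healthy replica
    return new_replica

log = []
heal_volume("vol-7", "replica-on-ks-node-2", "ks-node-5", log)
```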

Many network problems are transient, and initiating a replica replacement too soon can waste bandwidth and create unnecessary repair activity.  To accommodate this, KumoScale software allows replicated volumes to be configured with a timeout value representing how long a replica may be unavailable before the healing sequence is initiated.  When a replica fails to respond, the replication layer tracks which data blocks have been written.  If the replica becomes available again before the timeout period has elapsed, only the changed blocks are synchronized, by copying data from a healthy replica to the newly reconnected one.  This makes synchronization after transient network problems fast.
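The changed-block tracking described above can be modeled with a dirty-block set: while a replica is unreachable, the client records which blocks were written, and on reconnect only those blocks are copied.  A minimal sketch, assuming fixed-size blocks and in-memory stand-ins for the replicas:

```python
# Illustrative sketch of delta resync: while a replica is offline,
# remember which blocks changed; on reconnect, copy only those blocks
# from a healthy replica.

class DirtyTracker:
    def __init__(self):
        self.dirty = set()

    def record_write(self, block):
        self.dirty.add(block)

    def resync(self, healthy, stale):
        # Copy only the changed blocks, then clear the tracking state.
        for block in sorted(self.dirty):
            stale[block] = healthy[block]
        copied = len(self.dirty)
        self.dirty.clear()
        return copied

healthy = {0: b"a", 1: b"b", 2: b"c"}
stale = dict(healthy)                        # replica drops off the network here
tracker = DirtyTracker()
healthy[1] = b"B"
tracker.record_write(1)                      # write while the replica is away
copied = tracker.resync(healthy, stale)      # only block 1 is copied back
```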

It is important for a data protection system to be able to correctly recover from multiple failure scenarios, including situations where any of the constituent machines suffers a sudden power loss or network disconnection during recovery from another error.  This means that the state of the system must be persisted at all times, in a way that allows a newly restarted component to resume the error recovery process correctly.
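One common way to meet this requirement is to persist the recovery state before each step, so that a restarted component can resume where it left off instead of starting over.  A minimal sketch using a JSON state file; the step names and on-disk format are assumptions for illustration, not KumoScale's actual mechanism.

```python
# Illustrative sketch: persist the next recovery step before executing,
# so a restarted process resumes the sequence mid-way.
# (The on-disk format is hypothetical.)

import json
import os
import tempfile

STEPS = ["disconnect", "create", "connect", "sync", "done"]

def run_recovery(state_path, until_step=None):
    """Execute steps in order, persisting progress; stop early if asked."""
    state = {"next": 0}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)        # resume from persisted progress
    while state["next"] < len(STEPS):
        step = STEPS[state["next"]]
        state["next"] += 1
        with open(state_path, "w") as f:
            json.dump(state, f)         # persist before moving on
        if step == until_step:
            return step                 # simulate a crash at this step
    return "done"

path = os.path.join(tempfile.mkdtemp(), "recovery.json")
crashed_at = run_recovery(path, until_step="connect")  # "crash" mid-sequence
resumed = run_recovery(path)                           # restart and resume
```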

For more information, see Failure Recovery Process.