CSI System Management

This chapter describes the self-healing behavior of the KumoScale CSI driver, CSI event forwarding, and how to collect the CSI driver logs.

Self-Healing

Note - The CSI driver performs this operation automatically as part of its normal operation. It is described here for informational purposes only.

The KumoScale CSI driver monitors the mdraid[1] (MD) of each volume every 30 seconds and repairs inconsistencies and incomplete states.
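
The driver's monitoring logic is internal and is not reproduced here. As a rough operator-side illustration of what an incomplete MD state looks like, the following sketch reads text in the Linux /proc/mdstat format and prints the name of every array whose member-status field (for example [_U]) contains a missing member. The function name degraded_md_arrays is hypothetical and is not part of KumoScale software.

```shell
# Hypothetical helper (not part of the KumoScale CSI driver).
# Reads /proc/mdstat-formatted text on stdin and prints the name of each
# MD array whose member-status field, such as [_U], shows a missing member.
degraded_md_arrays() {
  awk '
    # An array stanza starts with a line such as "md1 : active raid1 ...".
    /^md[^ ]* :/ { name = $1; next }
    # Inside a stanza, a status field containing "_" marks a missing member.
    name != "" && /\[[U_]*_[U_]*\]/ { print name; name = "" }
    # A blank line ends the current stanza.
    /^$/ { name = "" }
  '
}
```

On a host that has MD devices, it could be run as: degraded_md_arrays < /proc/mdstat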

Several scenarios can cause these states. The driver handles the scenarios described in the following sections.

Restoring a Replica that is Down

Replicas may go down temporarily as a result of a network failure or a maintenance operation on the rack. In this case, the orchestrator migrates the container to a node on another rack that contains one of the other replicas of the volume and reconnects to it.

When the appliance of the first replica is back up, the replica is no longer connected and is no longer part of the MD. The agent discovers that the replica belongs to the volume, adds it to the target as a namespace, and adds its block device to the MD. The MD then synchronizes the changes to the restored replica.

Completing Deletion of a Replica

A replica may be in a state where it was deleted, but not removed from the MD, due to a temporary disconnection or failure.

In this case, the provisioner service reports that the replica was deleted but is still connected to the MD. The driver removes it from the MD, attempts to disconnect it from the target, and marks it as ‘removed.’

Autonomous Self-Healing

Configure the maxReplicaDownTime parameter (see Storage Class Parameters in CSI Storage Provisioning) to set the number of minutes that the CSI driver waits after it detects a degraded replicated volume before autonomously initiating a healing process (provided the volume has fewer than four replicas).

The self-healing process allocates a new replica in the most appropriate location, according to the volume’s storage class parameters. It then connects to the new replica and synchronizes it. The missing replica is removed from the volume configuration.
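
The trigger condition described above can be sketched as a small shell predicate. This is an illustrative sketch only, not the driver's code: the function name should_self_heal and its three arguments are hypothetical, and it assumes that healing starts once the degraded time reaches maxReplicaDownTime (the text does not say whether the boundary is inclusive).

```shell
# Hypothetical predicate (not part of the KumoScale CSI driver).
# Arguments:
#   $1  minutes the replicated volume has been degraded
#   $2  maxReplicaDownTime value from the storage class, in minutes
#   $3  number of replicas the volume currently has
# Succeeds (exit status 0) when autonomous healing would be initiated.
should_self_heal() {
  degraded_minutes=$1
  max_replica_down_time=$2
  replica_count=$3
  # Healing applies only to volumes with fewer than four replicas.
  # Assumption: the wait is treated as elapsed when the two values are equal.
  [ "$replica_count" -lt 4 ] && [ "$degraded_minutes" -ge "$max_replica_down_time" ]
}
```

For example, should_self_heal 30 20 3 succeeds, while should_self_heal 10 20 3 (still waiting) and should_self_heal 30 20 4 (four replicas) both fail.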

CSI Event Forwarding

KumoScale software forwards CSI events and configuration commands from the Kubernetes master and node to the KumoScale Provisioner service.

Configure a syslog server on a KumoScale appliance if you want to log these events and commands.

The following table lists the events reported by the node to the syslog server:

Table 2. CSI Events Reported to the Syslog Server

Event                            Level    Description
-------------------------------  -------  -----------------------------------------------------------------
Host Disconnected                Fatal    The host NVMe-oF™ connection to the backend was disconnected.
Host Reconnected                 Info     The host NVMe-oF connection to the backend was reconnected.
Replicated Volume Degraded       Fatal    A replicated volume is in a degraded state.
Replicated Volume Healed         Info     A replicated volume healed.
Replicated Volume Synch Started  Warning  The synchronization of a non-synchronized replica began.
Replicated Volume Synch Ended    Warning  The synchronization of a non-synchronized replica has completed.
Report Auditing                  Info     The node reports these auditing commands to the KumoScale Provisioner service:
                                          1. Session Established: a host connected to KumoScale software via the NVMe-oF standard.
                                          2. Session Closed: the NVMe-oF connection between a host and KumoScale software was disconnected.

Collecting KumoScale CSI Driver Logs

Use kubectl logs to collect the KumoScale CSI driver logs from each of the containers (ks-csi-plugin, ks-provisioner).

For example, run the following commands to collect the logs:

  • KumoScale Provisioner service logs:
    kubectl logs -n kube-system csi-kumoscale-controller-0 ks-provisioner > provisioner.log
  • Controller logs:
    kubectl logs -n kube-system csi-kumoscale-controller-0 ks-csi-plugin > controller.log
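
To collect both logs in one step, the two commands can be wrapped in a small shell function. This is a convenience sketch, not a KumoScale tool: the function name collect_csi_logs is hypothetical, and unlike the commands above it names each output file after its container (ks-provisioner.log, ks-csi-plugin.log).

```shell
# Hypothetical convenience wrapper around the kubectl logs commands above.
# Arguments:
#   $1  controller pod name (default: csi-kumoscale-controller-0)
#   $2  output directory (default: current directory)
collect_csi_logs() {
  pod=${1:-csi-kumoscale-controller-0}
  outdir=${2:-.}
  for container in ks-provisioner ks-csi-plugin; do
    # Write each container's log to <outdir>/<container>.log.
    kubectl logs -n kube-system "$pod" "$container" > "$outdir/$container.log" || return 1
  done
}
```

Running collect_csi_logs with no arguments writes both log files to the current directory.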

[1] ‘mdraid’ is a Linux OS component that controls storage devices. It is referred to as Linux software RAID as it makes RAID use possible without a hardware RAID controller.