Telemetry

KumoScale software collects physical SSD data and volume workload statistics. This data stream can be sent to an external time series database (TSDB) to support system health monitoring and budget planning. KumoScale software also sends event logs to a Syslog server. Telemetry information is collected and stored using open-source components, in both push and pull modes. The following figure describes the data flow between these open-source components:

userman-fig5

What to do next:
See Deploying Telemetry for details on how to deploy the telemetry service in a KumoScale cluster.

See Volume Mapping Parameters for the attributes that can be filtered on.

See the Grafana Dashboard Guide for how to deploy and view reports on these metrics via a Grafana dashboard deployed in a KumoScale cluster.

See Automating Metrics Collection for details on how to directly integrate these metrics into your telemetry infrastructure.

Deploying Telemetry

To set up the telemetry service on the storage nodes, you need to define and deploy information using the telemetry CRD provided with KumoScale software as follows:

Create the Telemetry Custom Resource Definition

KumoScale software provides a telemetry CRD file you can use to define your own CRD.

  • Make a copy of kioxia.com_v1_telemetry_cr.yaml for editing, and save to a separate directory (e.g., deploy/crds/myapp_telemetry_cr.yaml).
  • Update the YAML file with values for the parameters listed in Telemetry Parameters.

Telemetry Parameters

This section defines all the parameters used with the telemetry CRD.

| Telemetry parameter name | Description | Optional/Required |
|---|---|---|
| name | The telemetry configuration name; must comply with the name field of KumoScale Field Types. | Required |
| ip | IP address of the time series database to which telemetry data is pushed. | Required |
| port | Telemetry server port. The default value is 2003. | Optional |
| intervalMin | The time interval between consecutive telemetry push events. Maximum is 86,400. | Optional |
| dataTypes | Volume telemetry (VOLUME), SSD telemetry (DRIVE), or both (default). | Optional |
| pushState | Initial sending state: true - the telemetry is pushed; false - the telemetry is not pushed. | Optional |
| prefix | The prefix for the namespace in the time series database, from which the telemetry data structure is created. The default value is kumoscale. | Optional |
| telemetryTsdbType | Telemetry server type. Default is GRAPHITE. | Optional |
| transportType | Transport type; either TCP_IP or UDP. | Optional |

Example Telemetry CR

The IP address used in the example is for documentation purposes.

apiVersion: kumoscale.kioxia.com/v1
kind: Telemetry
metadata:
  name: telemetry1
spec:
  telemetry:
    ip: 192.0.2.0
    port: 2003
    pushState: true
    dataTypes:
    - DRIVE
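
For teams that generate manifests programmatically, the example CR can be produced from code. The following is a minimal sketch only: `build_telemetry_cr` is a hypothetical helper, not part of KumoScale software. It assumes the field nesting shown in the example (fields under `spec.telemetry`), and placing `transportType` at that same level is an assumption. The output is JSON, which `kubectl create -f` accepts in place of YAML.

```python
import json

# Allowed values taken from the Telemetry Parameters table.
DATA_TYPES = {"VOLUME", "DRIVE"}
TRANSPORT_TYPES = {"TCP_IP", "UDP"}


def build_telemetry_cr(name, ip, port=2003, push_state=True,
                       data_types=("VOLUME", "DRIVE"), transport_type=None):
    """Return a Telemetry CR as a plain dict (hypothetical helper)."""
    unknown = set(data_types) - DATA_TYPES
    if unknown:
        raise ValueError(f"unsupported dataTypes: {sorted(unknown)}")
    if transport_type is not None and transport_type not in TRANSPORT_TYPES:
        raise ValueError(f"unsupported transportType: {transport_type}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")

    telemetry = {
        "ip": ip,
        "port": port,
        "pushState": push_state,
        "dataTypes": list(data_types),
    }
    # Assumption: transportType sits next to the other telemetry fields.
    if transport_type is not None:
        telemetry["transportType"] = transport_type

    return {
        "apiVersion": "kumoscale.kioxia.com/v1",
        "kind": "Telemetry",
        "metadata": {"name": name},
        "spec": {"telemetry": telemetry},
    }


if __name__ == "__main__":
    cr = build_telemetry_cr("telemetry1", "192.0.2.0", data_types=["DRIVE"])
    print(json.dumps(cr, indent=2))
```

Validating the values before the manifest is written catches mistakes such as a lowercase `drive` before they reach the cluster.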

Create the Telemetry Service

To create the telemetry service with name = telemetry1 defined in the CRD file myapp_telemetry_cr.yaml, enter the following:

kubectl create -f myapp_telemetry_cr.yaml

To verify the telemetry service, use the following command:

kubectl get svc -A -o wide

Telemetry Data

The following sections specify the parameters sent in the KumoScale software telemetry feed.

SSD Telemetry

SSD telemetry consists of SMART (Self-Monitoring, Analysis and Reporting Technology) parameters:

| Parameter | Description | Comments |
|---|---|---|
| Temperature | Device temperature | Reported in kelvin |
| Available Spare | Margin rate (%) | |
| Percentage Used | Endurance in percentage | |
| Data Units Read | The number of data units the initiator has read from the controller | Reported in 4KB units |
| Data Units Written | The number of data units the initiator has written to the controller | Reported in 4KB units |
| Host Read Commands | The number of read commands completed by the controller | Includes compare commands |
| Host Write Commands | The number of write commands completed by the controller | |
| ks_volume_ssd_used_capacity_bytes | The sum of the capacity of all sub-volumes that belong to a volume on the same SSD. | Reported in bytes |

Workload Statistics

| Parameter | Description | Comments |
|---|---|---|
| IOs | Total commands issued to the volume | Read/Write |
| Bandwidth | Total bytes issued to the volume | Read/Write |
| IOPS | Read/Write, command size less than or equal to: 4KB/8KB/16KB/64KB/256KB/512KB/1MB/>1MB | Histogram |
| Latency | Total latency in µs | Read/Write |

Volume Telemetry

The volume-based data that is collected by KumoScale software consists of the following workload statistics:

| Parameter | Description | Comments |
|---|---|---|
| IOs | Total commands issued to the volume | Read/Write |
| Bandwidth | Total bytes issued to the volume | Read/Write |
| IOPS | Read/Write, command size less than or equal to: 4KB/8KB/16KB/64KB/256KB/512KB/1MB/>1MB | Histogram |
| Latency | Total latency in µs | Read/Write |
| ks_volume_utilized_capacity_bytes | The number of written bytes on the volume. | |
| ks_volume_used_capacity_bytes | The space allocated to the volume. For thick volumes, the capacity of the volume; for thin volumes, the currently allocated space of the volume. | |

KumoScale Provisioner Telemetry Detail

The Get Metrics REST API command presents all Prometheus metrics collected from the Provisioner. The available feeds are listed below, followed by details on each.

  • ks_node_used_capacity_bytes
  • ks_node_free_capacity_bytes
  • ks_node_state
  • ks_connected_volumes_state
  • ks_node_used_iops
  • ks_node_used_bw_bytes_per_sec
  • ks_node_free_iops
  • ks_node_free_bw_bytes_per_sec
  • ks_connected_replica_state
  • ks_vol_capacity_bytes

ks_node_used_capacity_bytes

  • KS node used capacity per Storage Node
  • Units: Used Capacity in bytes
  • TYPE: gauge
  • Labels:
  • node - KS persistent ID.
  • name - KS name.
  • Example:
ks_node_used_capacity_bytes{node="00:0c:29:0f:60:62",name="ks-node6-000c290f6062",} 2.01326592E8
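
The samples in this section use the Prometheus text exposition format: a metric name, a brace-enclosed label set, and a floating-point value. As a rough sketch for ad-hoc scripting (production scrapers would normally use Prometheus itself or a maintained client library), the hypothetical `parse_sample` function below splits one such line into its parts. It tolerates the trailing comma inside the braces and returns `None` for the value when a line carries none.

```python
import re

# metric_name{label="value",...} sample_value
_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s*(?P<value>\S+)?$'
)
_LABEL = re.compile(r'(?P<key>[A-Za-z_][A-Za-z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one labelled sample line into (name, labels, value)."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample: {line!r}")
    labels = {
        m.group("key"): m.group("val")
        for m in _LABEL.finditer(match.group("labels"))
    }
    raw = match.group("value")
    value = float(raw) if raw is not None else None
    return match.group("name"), labels, value


if __name__ == "__main__":
    sample = ('ks_node_used_capacity_bytes{node="00:0c:29:0f:60:62",'
              'name="ks-node6-000c290f6062",} 2.01326592E8')
    print(parse_sample(sample))
```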

 

ks_node_free_capacity_bytes

  • KS node free capacity bytes per Storage Node
  • Units: Free Capacity in Bytes
  • TYPE: gauge
  • Labels:
  • node - KS persistent ID.
  • name - KS name.
  • Example:
ks_node_free_capacity_bytes{node="00:0c:29:0f:60:62",name="ks-node6-000c290f6062",} 2.19882192896E12
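
The used and free capacity gauges can be combined into a utilization figure for capacity planning. The sketch below assumes that a node's total capacity equals used plus free; that relationship is an assumption made for illustration and is not stated by the metrics themselves. `used_fraction` is a hypothetical helper.

```python
def used_fraction(used_bytes, free_bytes):
    """Fraction of node capacity in use, assuming total = used + free."""
    total = used_bytes + free_bytes
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return used_bytes / total


if __name__ == "__main__":
    # Values from the two examples above for node 00:0c:29:0f:60:62.
    fraction = used_fraction(2.01326592E8, 2.19882192896E12)
    print(f"{fraction:.6%} of capacity in use")
```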

 

ks_node_state

  • Node state per Storage Node
  • TYPE: gauge
  • State:
  • 1=Available
  • 2=Unavailable
  • Labels:
  • node - KS persistent ID.
  • name - KS name.
  • rack - KS location rack ID.
  • zone - KS location zone ID.
  • region - KS location region ID.
  • Example:
ks_node_state{node="00:0c:29:4b:b3:b7",name="ks-node7-000c294bb3b7",rack="null",zone="null",region="null",}

 

ks_connected_volumes_state

  • KS connected volumes state per volume
  • The state is considered “Available” if at least one replica has the “Available” ReplicaState.
  • State:
  • 0=Available
  • 1=Unavailable
  • Only reported for NVMeoF volumes that are published
  • TYPE: gauge
  • Labels:
  • uuid - UUID of the parent volume for a replicated volume, or of the volume itself for a simple volume.
  • alias - Volume's alias.
  • hostID - Connected host UUID.
  • hostName - The initiator name.
  • nqn - NQN of the connected host.
  • version - Software version of the initiator's agent.
  • Example:

Simple (non-replicated) volume:

ks_connected_volumes_state{uuid="fd27da56-d3fb-45a3-b439-ed4e7c5f31c8",alias="pvc-9eb60ca3-9d6b-4586-9a50-22d7a842f85b",hostID="4c5abe4c-b5e8-41ed-b0f7-17db196b2dae",hostName="ks-node2-000c2955f853",nqn="nqn.2014-08.org.nvmexpress:NVMf:nvme:ks-node2-000c2955f853",version="3.18-14153",} 2.0

Replicated volume:

ks_connected_volumes_state{uuid="f39e2e95-2587-468c-92ce-7120ab4c08b0",alias="pvc-04b97c09-96da-4171-8f7a-832a16c9fdf0",hostID="4c5abe4c-b5e8-41ed-b0f7-17db196b2dae",hostName="ks-node2-000c2955f853",nqn="nqn.2014-08.org.nvmexpress:NVMf:nvme:ks-node2-000c2955f853",version="3.18-14153",} 2.0

 

ks_node_used_iops

  • KS node used IOPS per Storage Node
  • Units: I/O operations per second
  • TYPE: gauge
  • Labels:
  • node - KS persistent ID.
  • name - KS name.
  • Example:
ks_node_used_iops{node="00:0c:29:27:86:09",name="ks-node3-000c29278609",} 13670.0

 

ks_node_used_bw_bytes_per_sec

  • KS node used bandwidth per Storage Node
  • Units: bytes per second
  • TYPE: gauge
  • Labels:
  • node - KS persistent ID.
  • name - KS name.
  • Example:
ks_node_used_bw_bytes_per_sec{node="00:0c:29:27:86:09",name="ks-node3-000c29278609",} 1.7747392E8

 

ks_node_free_iops

  • KS node free IOPS per Storage Node
  • Units: I/O operations per second
  • TYPE: gauge
  • Labels:
  • node - KS persistent ID.
  • name - KS name.
  • Example:
ks_node_free_iops{node="00:0c:29:0f:60:62",name="ks-node6-000c290f6062",} 522900.0

 

ks_node_free_bw_bytes_per_sec

  • KS node free bandwidth per Storage Node
  • Units: bytes per second
  • TYPE: gauge
  • Labels:
  • node - KS persistent ID.
  • name - KS name.
  • Example:
ks_node_free_bw_bytes_per_sec{node="00:0c:29:0f:60:62",name="ks-node6-000c290f6062",} 6.788481024E9

 

ks_connected_replica_state

  • KS connected replica state per volume
  • TYPE: gauge
  • Only reported for NVMeoF volumes that are published
  • State:
  • 0=Available
  • 1=Terminating
  • 2=Missing
  • 3=Unknown
  • 4=Synchronizing
  • Labels:
  • uuid - UUID of the parent volume.
  • alias - Alias of the parent volume.
  • repUUID - UUID of the replica.
  • hostID - Connected host UUID.
  • hostName - The initiator name.
  • nqn - NQN of the connected initiator.
  • Example:
ks_connected_replica_state{uuid="fd27da56-d3fb-45a3-b439-ed4e7c5f31c8",alias="pvc-9eb60ca3-9d6b-4586-9a50-22d7a842f85b",repUUID="ddd381fa-a48b-4610-84f5-c28f992ab92e",hostID="408e3ab3-2894-4240-ac2b-c5b7912bc014",hostName="ks-node1-000c29e60e77",nqn="nqn.2014-08.org.nvmexpress:NVMf:nvme:ks-node1-000c29e60e77",} 3.0
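
Because the gauge reports the state as a number, dashboards and scripts usually translate it back to a name. A minimal sketch of that lookup, using the state codes listed above (`replica_state_name` is a hypothetical helper):

```python
# State codes as listed for ks_connected_replica_state.
REPLICA_STATES = {
    0: "Available",
    1: "Terminating",
    2: "Missing",
    3: "Unknown",
    4: "Synchronizing",
}


def replica_state_name(value):
    """Map a gauge value such as 3.0 to its state name."""
    code = int(value)
    if code != value:
        raise ValueError(f"state must be a whole number, got {value!r}")
    # Codes outside the documented list are reported rather than guessed.
    return REPLICA_STATES.get(code, f"Unrecognized({code})")


if __name__ == "__main__":
    print(replica_state_name(3.0))
```

Applied to the example sample above, the value 3.0 corresponds to the Unknown state.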

 

ks_vol_capacity_bytes

  • KS volume capacity per volume
  • Units: Capacity in bytes
  • TYPE: gauge
  • Labels:
  • uuid - UUID of the parent volume in case of replicated Volume or volume in case of simple volume.
  • alias - Volume's alias
  • numReplicas - Number of replicas
  • tenantID - tenant ID (0 for default tenant)
  • storageClassName - Storage Class name or "unknown" if not provided
  • provisioningType - Thin, thick, "Snapshot/Clone"
  • protocol - NVMeoF or Local
  • repUUIDX - For a replicated volume, the UUID of replica X (blank for a simple volume).
  • nodeIDX - For a replicated volume, the node ID (persistent ID) of replica X. For a simple volume, the node ID of the volume.
  • nodeNameX - For a replicated volume, the node name of replica X. For a simple volume, the node name of the volume.
  • Example:
ks_vol_capacity_bytes{uuid="71fd3724-431f-49fc-83b6-b625f47f5a3b",alias="pvc-7f07b3c8-be9c-4dbf-b1a6-1f2d72a9fe4b",numReplicas="1",tenantID="0",storageClassName="default",provisioningType="THIN",protocol="Local",repUUID1="",nodeID1="00:0c:29:27:86:09",nodeName1="ks-node3-000c29278609",repUUID2="",nodeID2="",nodeName2="",repUUID3="",nodeID3="",nodeName3="",} 1.073741824E11

 

Volume Mapping Parameters

The available feeds are listed below followed by details on each:

  • ks_volume_ssd_used_capacity_bytes
  • ks_volume_utilized_capacity_bytes
  • ks_volume_used_capacity_bytes

 

ks_volume_ssd_used_capacity_bytes

  • The sum of the capacity in bytes of all sub-volumes that belong to a volume on the same SSD.
  • Units: Capacity (in bytes)
  • Type: gauge
  • Labels:
  • volID - UUID of the parent volume in case of replicated volume or volume UUID in case of simple volume.
  • replicaID – replica UUID.
  • nodeID – the storageNode ID.
  • driveID – the SSD ID.
  • nodeName – the KS name (same as the k8s name of the ks node)
  • Example:
ks_volume_ssd_used_capacity_bytes{volID="7d64023d-a410-4530-8071-f793ca288240",replicaID="3fbaa9ea-48e5-477d-9c09-179f598f7ed7",nodeID="00:0c:29:e6:0e:77",nodeName="ks-node2-00:0c:29:e6:0e:77",driveID="VMWare NVME-0001",}

 

 

ks_volume_utilized_capacity_bytes

  • The number of written bytes on the volume
  • Type: gauge
  • Units: Capacity (in bytes)
  • Labels:
  • volID - UUID of the parent volume in case of replicated volume or volume UUID in case of simple volume.
  • replicaID – replica UUID.
  • nodeID – the storageNode ID.
  • nodeName – the KS name (same as the k8s name of the ks node)

 

ks_volume_used_capacity_bytes

  • For a thick volume, its capacity; for a thin volume, snapshot, or snapshot volume, its reserved space.
  • Type: gauge
  • Units: Capacity (in bytes)
  • Labels:
  • volID - UUID of the parent volume in case of replicated volume or volume UUID in case of simple volume.
  • replicaID – replica UUID.
  • nodeID – the storageNode ID.
  • nodeName – the KS name (same as the k8s name of the ks node)

KumoScale Storage Nodes Telemetry Detail

The Get Prometheus Metrics REST API command presents the Prometheus metrics from the storage node.

The available feeds are listed below followed by details on each:

  • totalBytesRead
  • Data_Units_Read
  • totalWriteLatency
  • totalReadCommands
  • IOWriteCnt
  • Media_and_Data_Integrity_Errors
  • totalWriteCommands
  • Percentage_Used
  • Available_Spare
  • Data_Units_Written
  • Latency
  • Host_Read_Commands
  • IOReadCnt
  • totalBytesWrite
  • totalReadLatency
  • Host_Write_Commands
  • Composite_Temperature

totalBytesRead

  • Total bytes read per volume
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
totalBytesRead{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 3.4318336E8

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format.

 

Data_Units_Read

  • SSD Data Units Read
  • The number of 512-byte data units the initiator has read from the controller. For the NVM command set, logical blocks read as part of Compare and Read operations are included in this value.
  • Units: 4K.
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
Data_Units_Read{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 0.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format.

 

totalWriteLatency

  • Total write latency per volume
  • Reported in microseconds
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
totalWriteLatency{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 89694.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

totalReadCommands

  • Total reads per volume
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
totalReadCommands{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 520.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

IOWriteCnt

  • Total write commands by size.
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • size - Sizes = {le4K, le8K, le16K, le64K, le256K, le1MB, inf}
  • For example: le16K is the total write commands with 8K < Size ≤ 16K.
  • Example:
IOWriteCnt{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",size="le1M",} 17.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

Media_and_Data_Integrity_Errors

  • SSD Media and Data Integrity Errors
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
Media_and_Data_Integrity_Errors{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 0.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

totalWriteCommands

  • Total write commands per volume.
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
totalWriteCommands{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 339.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

Percentage_Used

  • SSD Percentage Used
  • SSD endurance, normalized from the range 0-255 to the range 0-100 (i.e., value × 100 / 255).
  • TYPE: gauge
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
Percentage_Used{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 0.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format
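
The normalization described for Percentage_Used is a linear rescaling. As a short sketch of the arithmetic (`normalize_percentage_used` is a hypothetical helper, and rejecting out-of-range input is an added assumption):

```python
def normalize_percentage_used(raw):
    """Rescale a raw endurance value from 0-255 to 0-100 (raw x 100 / 255)."""
    if not 0 <= raw <= 255:
        raise ValueError(f"raw value out of range 0-255: {raw}")
    return raw * 100 / 255


if __name__ == "__main__":
    for raw in (0, 51, 255):
        print(raw, "->", normalize_percentage_used(raw))
```

For instance, a raw value of 51 maps to 20 and a raw value of 255 maps to 100.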

 

Available_Spare

  • SSD Available Spare
  • Units: margin rate (%).
  • TYPE: gauge
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
Available_Spare{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 1.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

Data_Units_Written

  • SSD Data Units Written
  • For the NVM command set, logical blocks written as part of Write operations are included in this value. Write Uncorrectable commands do not impact this value.
  • Units: 4K
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
Data_Units_Written{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 0.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

Latency

  • Latency of commands of a specific size and type at a given percentile.
  • For example, Latency with percentile="50", size="4K", rw="rd" is the 50th-percentile latency of 4K read commands.
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • percentile = {Avg, 50, 90, 95, 99}
  • size = {4K, 8K, 16K, 64K, 256K}
  • rw = {"rd", "wr"}
  • TYPE: counter
  • Example:
Latency{DRIVE="",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",percentile="99",size="64K",rw="rd",} 0.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

Host_Read_Commands

  • SSD Host Read Commands
  • Contains the number of read commands completed by the controller. For the NVM command set, this is the number of Compare and Read commands.
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
Host_Read_Commands{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 0.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

IOReadCnt

  • Total read commands by size.
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • size – Sizes = {le4K, le8K, le16K, le64K, le256K, le1MB, inf}
  • le4K - total read commands with 0 < Size ≤ 4K
  • le8K - total read commands with 4K < Size ≤ 8K
  • le16K - total read commands with 8K < Size ≤ 16K
  • le64K - total read commands with 16K < Size ≤ 64K
  • le256K - total read commands with 64K < Size ≤ 256K
  • le1MB - total read commands with 256K < Size ≤ 1MB
  • inf - total read commands with Size > 1MB.
  • Example:
IOReadCnt{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",size="le1M",} 331.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format
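
The size buckets for IOReadCnt and IOWriteCnt are half-open ranges, so each command falls into exactly one bucket. The sketch below illustrates how a command size maps to a bucket label. It assumes K means 1024 bytes and uses the label spelling from the list above; note that the example samples spell the largest bounded bucket `le1M`. `size_bucket` is a hypothetical helper.

```python
KIB = 1024  # assumption: K denotes 1024 bytes

# (label, inclusive upper bound in bytes), in ascending order.
BUCKETS = [
    ("le4K", 4 * KIB),
    ("le8K", 8 * KIB),
    ("le16K", 16 * KIB),
    ("le64K", 64 * KIB),
    ("le256K", 256 * KIB),
    ("le1MB", 1024 * KIB),
]


def size_bucket(size_bytes):
    """Return the bucket label whose range contains size_bytes."""
    if size_bytes <= 0:
        raise ValueError("command size must be positive")
    for label, upper in BUCKETS:
        if size_bytes <= upper:
            return label
    return "inf"  # anything larger than 1MB


if __name__ == "__main__":
    for size in (4096, 4097, 12 * KIB, 2 * 1024 * KIB):
        print(size, "->", size_bucket(size))
```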

 

totalBytesWrite

  • Total bytes written per volume
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
totalBytesWrite{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 9416704.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

totalReadLatency

  • Total read latency per volume
  • Reported in microseconds
  • TYPE: counter
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
totalReadLatency{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 1105927.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format
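
The latency and command counters are cumulative, so a mean latency is obtained by dividing one by the other. The sketch below assumes that totalReadLatency is the sum of the latencies of the commands counted by totalReadCommands; that interpretation is an assumption, and both function names are hypothetical. The second function works on the difference between two scrapes, which is usually more useful than the lifetime average.

```python
def mean_latency_us(total_latency_us, total_commands):
    """Lifetime mean latency in microseconds, or None if no commands."""
    if total_commands <= 0:
        return None
    return total_latency_us / total_commands


def interval_mean_latency_us(earlier, later):
    """Mean latency between two (total_latency_us, total_commands) scrapes.

    Returns None if no commands completed in the interval or if either
    counter went backwards (for example after a restart).
    """
    lat0, n0 = earlier
    lat1, n1 = later
    if lat1 < lat0 or n1 <= n0:
        return None
    return (lat1 - lat0) / (n1 - n0)


if __name__ == "__main__":
    # Values from the totalReadLatency and totalReadCommands examples.
    print(mean_latency_us(1105927.0, 520.0))
```

With the example values for volume be3f5f29-e45e-4bcb-a03e-56578921a19e, the lifetime mean read latency works out to roughly 2,127 µs per command under that assumption.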

 

Host_Write_Commands

  • SSD Host Write Commands
  • Contains the number of write commands completed by the controller. For the NVM command set, this is the number of write commands.
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • TYPE: counter
  • Example:
Host_Write_Commands{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 0.0

Note: While using a virtual SSD, the SSD persistent id will not appear in a Serial Number format

 

Composite_Temperature

  • Device temperature; SSD Composite Temperature
  • Reported in Celsius
  • TYPE: gauge
  • Labels:
  • DRIVE – SSD Persistent Id
  • VOLUME – Volume UUID
  • Example:
Composite_Temperature{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 11759.0

Kubernetes Telemetry Detail

Prometheus Node Exporter

Prometheus Node Exporter is automatically installed as part of KumoScale’s Install operator and exports system-level statistics of bare-metal nodes or virtual machines to Prometheus.

Kube State Metrics

Kube-state-metrics is a service that listens to the Kubernetes API server and generates metrics about the state of the objects. It focuses on the health of the various objects, such as deployments, nodes and pods.

The following services are sampled via the default ServiceMonitor configuration every 30 seconds:

  • alertmanager
  • grafana
  • kubeApiServer
  • kubelet
  • kubeControllerManager
  • coreDns
  • kubeDns
  • kubeEtcd
  • kubeScheduler
  • kubeProxy
  • kubeStateMetrics
  • nodeExporter
  • prometheusOperator
  • prometheus (instance)

The Prometheus Stack CRD has dedicated fields to enable / disable the prometheus-node-exporter and kube-state-metrics services:

userman-fig6

Additional Statistics and Information

KumoScale software provides Application Programming Interface (API) functions for retrieving system information (via REST APIs or CLI), such as:

  • Available Network Interface Cards (NICs)
  • CPU and memory utilization
  • Status of the SSDs
  • System alerts
  • Network and storage performance statistics
  • Ongoing sessions
  • Allocated volumes

     
