Telemetry

KumoScale software collects physical SSD data and volume workload statistics. This data stream can be sent to an external time series database (TSDB) to provide system health monitoring and budget planning information. KumoScale software also sends event logs to a Syslog server. Telemetry is collected and stored using open source components, in both push and pull modes. The following figure describes the data flow between these open source components:

These telemetry streams are collected from both Kubernetes and KumoScale telemetry feeds. Details on the exact telemetry for each source are available below:

    1. Telemetry data collected from KumoScale Provisioner.
    2. Telemetry data collected from KumoScale Storage Nodes.
    3. Telemetry data collected from Kubernetes.

    KumoScale Provisioner Telemetry Detail

    The Get Metrics REST API command returns all Prometheus metrics that were collected from the Provisioner. The available feeds are:

    ks_node_used_capacity_bytes

    • KS node used capacity per Storage Node
    • Units: Used Capacity in bytes
    • TYPE: gauge
    • Labels:
      • node - KS persistent ID.
      • name - KS name.
    • Example:
    ks_node_used_capacity_bytes{node="00:0c:29:0f:60:62",name="ks-node6-000c290f6062",} 2.01326592E8

    ks_node_free_capacity_bytes

    • KS node free capacity bytes per Storage Node
    • Units: Free Capacity in Bytes
    • TYPE: gauge
    • Labels:
      • node - KS persistent ID.
      • name - KS name.
    • Example:
    ks_node_free_capacity_bytes{node="00:0c:29:0f:60:62",name="ks-node6-000c290f6062",} 2.19882192896E12
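
The two capacity gauges above can be combined into a per-node utilization figure. The following is a minimal sketch, not part of KumoScale software: it parses sample lines in the Prometheus text format shown in the examples and assumes that a node's total capacity equals used plus free capacity. The helper names `parse_sample` and `used_fraction` are hypothetical.

```python
import re

# Matches one labeled Prometheus text-format sample: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Return (metric_name, labels_dict, float_value) for one sample line."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labeled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def used_fraction(used_line, free_line):
    """Used capacity as a fraction of used + free (assumed total) for one node."""
    _, used_labels, used = parse_sample(used_line)
    _, free_labels, free = parse_sample(free_line)
    if used_labels["node"] != free_labels["node"]:
        raise ValueError("samples belong to different nodes")
    return used / (used + free)


# Sample lines copied from the examples in this section.
used_line = ('ks_node_used_capacity_bytes{node="00:0c:29:0f:60:62",'
             'name="ks-node6-000c290f6062",} 2.01326592E8')
free_line = ('ks_node_free_capacity_bytes{node="00:0c:29:0f:60:62",'
             'name="ks-node6-000c290f6062",} 2.19882192896E12')
fraction = used_fraction(used_line, free_line)
print(f"{fraction:.6%}")
```

In practice the same ratio would more likely be computed in the TSDB query layer; the script form is only meant to make the arithmetic explicit.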

    ks_node_state

    • KS node state per Storage Node
    • TYPE: gauge
    • State:
      • 1=Available
      • 2=Unavailable
    • Labels:
      • node - KS persistent ID.
      • name - KS name.
      • rack - KS location rack ID.
      • zone - KS location zone ID.
      • region - KS location region ID.
    • Example:
    ks_node_state{node="00:0c:29:4b:b3:b7",name="ks-node7-000c294bb3b7",rack="null",zone="null",region="null",}  

    ks_connected_volumes_state

    • KS connected volumes state per volume
    • The state is considered “Available” if at least one replica has the “Available” ReplicaState.
    • State:
      • 0=Available
      • 1=Unavailable
    • Only reported for nvmeof volumes which are published
    • TYPE: gauge
    • Labels:
      • uuid - UUID of the parent volume for a replicated volume, or of the volume itself for a simple volume.
      • alias - Volume's alias.
      • hostID - Connected host UUID.
      • hostName - The host name.
      • nqn - NQN of the connected host.
      • version - Software version of the host's agent.
    • Example:

    Simple (non-replicated) volume:

    ks_connected_volumes_state{uuid="fd27da56-d3fb-45a3-b439-ed4e7c5f31c8",alias="pvc-9eb60ca3-9d6b-4586-9a50-22d7a842f85b",hostID="4c5abe4c-b5e8-41ed-b0f7-17db196b2dae",hostName="ks-node2-000c2955f853",nqn="nqn.2014-08.org.nvmexpress:NVMf:nvme:ks-node2-000c2955f853",version="3.18-14153",} 2.0

    Replicated volume:

    ks_connected_volumes_state{uuid="f39e2e95-2587-468c-92ce-7120ab4c08b0",alias="pvc-04b97c09-96da-4171-8f7a-832a16c9fdf0",hostID="4c5abe4c-b5e8-41ed-b0f7-17db196b2dae",hostName="ks-node2-000c2955f853",nqn="nqn.2014-08.org.nvmexpress:NVMf:nvme:ks-node2-000c2955f853",version="3.18-14153",} 2.0

    ks_node_used_iops

    • KS node used IOPS per Storage Node
    • Units: I/O operations per second
    • TYPE: gauge
    • Labels:
      • node - KS persistent ID.
      • name - KS name.
    • Example:
    ks_node_used_iops{node="00:0c:29:27:86:09",name="ks-node3-000c29278609",} 13670.0

    ks_node_used_bw_bytes_per_sec

    • KS node used bandwidth per Storage Node
    • Units: bytes per second
    • TYPE: gauge
    • Labels:
      • node - KS persistent ID.
      • name - KS name.
    • Example:
    ks_node_used_bw_bytes_per_sec{node="00:0c:29:27:86:09",name="ks-node3-000c29278609",} 1.7747392E8

    ks_node_free_iops

    • KS node free IOPS per Storage Node
    • Units: I/O operations per second
    • TYPE: gauge
    • Labels:
      • node - KS persistent ID.
      • name - KS name.
    • Example:
    ks_node_free_iops{node="00:0c:29:0f:60:62",name="ks-node6-000c290f6062",} 522900.0

    ks_node_free_bw_bytes_per_sec

    • KS node free bandwidth per Storage Node
    • Units: bytes per second
    • TYPE: gauge
    • Labels:
      • node - KS persistent ID.
      • name - KS name.
    • Example:
    ks_node_free_bw_bytes_per_sec{node="00:0c:29:0f:60:62",name="ks-node6-000c290f6062",} 6.788481024E9

    ks_connected_replica_state

    • KS connected replica state per volume
    • TYPE: gauge
    • Only reported for nvmeof volumes which are published
    • State:
      • 0=Available
      • 1=Terminating
      • 2=Missing
      • 3=Unknown
      • 4=Synchronizing
    • Labels:
      • uuid - UUID of the parent volume.
      • alias - Alias of the parent volume.
      • repUUID - UUID of the replica.
      • hostID - Connected host UUID.
      • hostName - The host name.
      • nqn - NQN of the connected host.
    • Example:
    ks_connected_replica_state{uuid="fd27da56-d3fb-45a3-b439-ed4e7c5f31c8",alias="pvc-9eb60ca3-9d6b-4586-9a50-22d7a842f85b",repUUID="ddd381fa-a48b-4610-84f5-c28f992ab92e",hostID="408e3ab3-2894-4240-ac2b-c5b7912bc014",hostName="ks-node1-000c29e60e77",nqn="nqn.2014-08.org.nvmexpress:NVMf:nvme:ks-node1-000c29e60e77",} 3.0
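
The numeric state codes listed for ks_connected_replica_state can be translated back into names when building alerts or reports. A minimal sketch follows; the `needs_attention` policy is a hypothetical illustration and is not defined by KumoScale software.

```python
# State codes as listed for ks_connected_replica_state in this section.
REPLICA_STATES = {
    0: "Available",
    1: "Terminating",
    2: "Missing",
    3: "Unknown",
    4: "Synchronizing",
}


def replica_state_name(value):
    """Map a sample value such as 3.0 to its state name."""
    code = int(value)
    if code != value or code not in REPLICA_STATES:
        raise ValueError(f"unexpected replica state value: {value!r}")
    return REPLICA_STATES[code]


def needs_attention(value):
    """Hypothetical policy: flag anything other than Available or Synchronizing."""
    return replica_state_name(value) not in ("Available", "Synchronizing")


# The example sample in this section reports 3.0.
print(replica_state_name(3.0))
```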

    ks_vol_capacity_bytes

    • KS volume capacity per volume
    • Units: Capacity in bytes
    • TYPE: gauge
    • Labels:
      • uuid - UUID of the parent volume for a replicated volume, or of the volume itself for a simple volume.
      • alias - Volume's alias
      • numReplicas - Number of replicas
      • tenantID - Tenant ID (0 for the default tenant)
      • storageClassName - Storage Class name, or "unknown" if not provided
      • provisioningType - Thin, thick, "Snapshot/Clone"
      • protocol - NVMeoF or Local
      • repUUIDX - For a replicated volume, the UUID of replica X (blank for a simple volume)
      • nodeIDX - For a replicated volume, the node ID (persistent ID) of replica X. For a simple volume, the node ID of the volume.
      • nodeNameX - For a replicated volume, the node name of replica X. For a simple volume, the node name of the volume.
    • Example:
    ks_vol_capacity_bytes{uuid="71fd3724-431f-49fc-83b6-b625f47f5a3b",alias="pvc-7f07b3c8-be9c-4dbf-b1a6-1f2d72a9fe4b",numReplicas="1",tenantID="0",storageClassName="default",provisioningType="THIN",protocol="Local",repUUID1="",nodeID1="00:0c:29:27:86:09",nodeName1="ks-node3-000c29278609",repUUID2="",nodeID2="",nodeName2="",repUUID3="",nodeID3="",nodeName3="",} 1.073741824E11
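
The indexed labels (repUUIDX, nodeIDX, nodeNameX) can be folded into a list of replica placements. The sketch below is an illustration under assumptions: the helper name `replica_placement` is hypothetical, and the upper bound of three slots is taken only from the example above, which carries slots 1 through 3.

```python
def replica_placement(labels, max_replicas=3):
    """Collect (repUUID, nodeID, nodeName) for each populated replica slot.

    max_replicas=3 is an assumption based on the example sample, which
    carries label slots 1 through 3.
    """
    placement = []
    for slot in range(1, max_replicas + 1):
        node_id = labels.get(f"nodeID{slot}", "")
        if not node_id:
            continue  # empty slot: nothing is placed at this index
        placement.append((
            labels.get(f"repUUID{slot}", ""),
            node_id,
            labels.get(f"nodeName{slot}", ""),
        ))
    return placement


# Labels taken from the example sample (a simple, non-replicated volume).
labels = {
    "uuid": "71fd3724-431f-49fc-83b6-b625f47f5a3b",
    "numReplicas": "1",
    "protocol": "Local",
    "repUUID1": "", "nodeID1": "00:0c:29:27:86:09",
    "nodeName1": "ks-node3-000c29278609",
    "repUUID2": "", "nodeID2": "", "nodeName2": "",
    "repUUID3": "", "nodeID3": "", "nodeName3": "",
}
placement = replica_placement(labels)
print(placement)
```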

    Volume Mapping Parameters

    ks_volume_ssd_used_capacity_bytes

    • The sum of the capacity in bytes of all sub-volumes that belong to a volume on the same SSD.
    • Units: Capacity (in bytes)
    • Type: gauge
    • Labels:
      • volID - UUID of the parent volume in case of replicated volume or volume UUID in case of simple volume.
      • replicaID – replica UUID.
      • nodeID – the storageNode ID.
      • driveID – the SSD ID.
      • nodeName – the KS name (same as the k8s name of the ks node)
    • Example:
    ks_volume_ssd_used_capacity_bytes{volID="7d64023d-a410-4530-8071-f793ca288240",replicaID="3fbaa9ea-48e5-477d-9c09-179f598f7ed7",nodeID="00:0c:29:e6:0e:77",nodeName="ks-node2-00:0c:29:e6:0e:77",driveID="VMWare NVME-0001",}

    ks_volume_utilized_capacity_bytes

    • The number of written bytes on the volume
    • Type: gauge
    • Units: Capacity (in bytes)
    • Labels:
      • volID - UUID of the parent volume in case of replicated volume or volume UUID in case of simple volume.
      • replicaID – replica UUID.
      • nodeID – the storageNode ID.
      • nodeName – the KS name (same as the k8s name of the ks node)

    ks_volume_used_capacity_bytes

    • For a thick volume, its capacity; for a thin volume, snapshot, or snapshot volume, its reserved space.
    • Type: gauge
    • Units: Capacity (in bytes)
    • Labels:
      • volID - UUID of the parent volume in case of replicated volume or volume UUID in case of simple volume.
      • replicaID – replica UUID.
      • nodeID – the storageNode ID.
      • nodeName – the KS name (same as the k8s name of the ks node)

    KumoScale Storage Nodes Telemetry Detail

    The Get Prometheus Metrics REST API command returns the Prometheus metrics from the storage node.

    The available feeds are:

    totalBytesRead

    • Total bytes read per volume
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    totalBytesRead{DRIVE=" VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 3.4318336E8

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format.

    Data_Units_Read

    • SSD Data Units Read
    • The number of 512-byte data units the host has read from the controller. For the NVM command set, logical blocks read as part of Compare and Read operations are included in this value.
    • Units: 4K.
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    Data_Units_Read{DRIVE="VMWare_NVME-0000",VOLUME=" be3f5f29-e45e-4bcb-a03e-56578921a19e ",} 0.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format.

    totalWriteLatency

    • Total write latency per volume
    • Reported in microseconds
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    totalWriteLatency{DRIVE=" VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 89694.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    totalReadCommands

    • Total reads per volume
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    totalReadCommands{DRIVE=" VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 520.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    IOWriteCnt

    • Total write commands by size.
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
      • size – one of {le4K, le8K, le16K, le64K, le256K, le1MB, inf}
      • For example, le16K is the total write commands with 8K < Size ≤ 16K.
    • Example:
    IOWriteCnt{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",size="le1M",} 17.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    Media_and_Data_Integrity_Errors

    • SSD Media and Data Integrity Errors
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    Media_and_Data_Integrity_Errors{DRIVE="VMWare_NVME-0000",VOLUME=" be3f5f29-e45e-4bcb-a03e-56578921a19e ",} 0.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    totalWriteCommands

    • Total write commands per volume
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    totalWriteCommands{DRIVE=" VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 339.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    Percentage_Used

    • SSD Percentage Used
    • SSD endurance, normalized from the range 0-255 to the range 0-100 (that is, value × 100 / 255).
    • TYPE: gauge
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    Percentage_Used{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 0.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format
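
The normalization described for Percentage_Used is simple arithmetic. The sketch below only illustrates the stated formula (raw × 100 / 255); the description suggests the exported metric is already normalized, so this conversion should not be applied a second time. The function name is hypothetical.

```python
def normalize_percentage_used(raw):
    """Scale a raw 0-255 endurance value to 0-100 (raw * 100 / 255)."""
    if not 0 <= raw <= 255:
        raise ValueError(f"raw value out of range 0-255: {raw}")
    return raw * 100 / 255


for raw in (0, 51, 255):
    print(raw, normalize_percentage_used(raw))
```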

    Available_Spare

    • SSD Available Spare
    • Units: margin rate (%).
    • TYPE: gauge
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    Available_Spare{DRIVE="VMWare_NVME-0000",VOLUME=" be3f5f29-e45e-4bcb-a03e-56578921a19e ",} 1.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    Data_Units_Written

    • SSD Data Units Written
    • For the NVM command set, logical blocks written as part of Write operations are included in this value. Write Uncorrectable commands do not impact this value.
    • Units: 4K
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    Data_Units_Written{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 0.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    Latency

    • Latency of commands of a specific size and type at a given percentile.
    • For example, Latency with percentile="50", size="4K", rw="rd" is the 50th-percentile latency of 4K read commands.
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
      • percentile – one of {Avg, 50, 90, 95, 99}
      • size – one of {4K, 8K, 16K, 64K, 256K}
      • rw – one of {"rd", "wr"}
    • TYPE: counter
    • Example:
    Latency{DRIVE="",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",percentile="99",size="64K",rw="rd",} 0.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    Host_Read_Commands

    • SSD Host Read Commands
    • Contains the number of read commands completed by the controller. For the NVM command set, this is the number of Compare and Read commands.
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    Host_Read_Commands{DRIVE="VMWare_NVME-0000",VOLUME=" be3f5f29-e45e-4bcb-a03e-56578921a19e ",} 0.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    IOReadCnt

    • Total read commands by size.
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
      • size – one of {le4K, le8K, le16K, le64K, le256K, le1MB, inf}
        • le4K - total read commands with 0 < Size ≤ 4K
        • le8K - total read commands with 4K < Size ≤ 8K
        • le16K - total read commands with 8K < Size ≤ 16K
        • le64K - total read commands with 16K < Size ≤ 64K
        • le256K - total read commands with 64K < Size ≤ 256K
        • le1MB - total read commands with 256K < Size ≤ 1MB
        • inf - total read commands with Size > 1MB
    • Example:
    IOReadCnt{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",size="le1M",} 331.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format
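
The size buckets used by IOReadCnt and IOWriteCnt are half-open ranges with an inclusive upper bound. The following sketch shows how a command size maps to a bucket label. It assumes 1K = 1024 bytes and uses the label names from the Sizes list; note that the example samples show the value "le1M" rather than "le1MB", so the exact label string should be confirmed against real output.

```python
KIB = 1024  # assumption: K means 1024 bytes

# (label, inclusive upper bound in bytes), smallest bucket first.
SIZE_BUCKETS = [
    ("le4K", 4 * KIB),
    ("le8K", 8 * KIB),
    ("le16K", 16 * KIB),
    ("le64K", 64 * KIB),
    ("le256K", 256 * KIB),
    ("le1MB", 1024 * KIB),
]


def size_bucket(size_bytes):
    """Return the bucket label whose range contains size_bytes."""
    if size_bytes <= 0:
        raise ValueError("command size must be positive")
    for label, upper in SIZE_BUCKETS:
        if size_bytes <= upper:
            return label
    return "inf"  # anything larger than 1MB


for size in (4096, 4097, 12 * KIB, 2 * 1024 * KIB):
    print(size, size_bucket(size))
```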

    totalBytesWrite

    • Total bytes written per volume
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    totalBytesWrite{DRIVE=" VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 9416704.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    totalReadLatency

    • Total read latency per volume
    • Reported in microseconds
    • TYPE: counter
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • Example:
    totalReadLatency{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 1105927.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format
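
Because totalReadLatency and totalReadCommands are both counters, a mean per-command latency can be estimated from the difference between two scrapes. This is a sketch under an assumption that the source does not state explicitly: that totalReadLatency accumulates the latency of every completed read command. The function name is hypothetical.

```python
def mean_latency_us(latency_prev, latency_now, commands_prev, commands_now):
    """Mean per-command latency in microseconds between two scrapes."""
    delta_commands = commands_now - commands_prev
    delta_latency = latency_now - latency_prev
    if delta_commands < 0 or delta_latency < 0:
        raise ValueError("counter went backwards (possible reset)")
    if delta_commands == 0:
        return None  # no commands completed in the interval
    return delta_latency / delta_commands


# First scrape assumed to be zero; the second scrape uses the example values
# shown for totalReadLatency and totalReadCommands in this section.
lifetime_mean = mean_latency_us(0.0, 1105927.0, 0.0, 520.0)
print(round(lifetime_mean, 2))
```

The same calculation applies to totalWriteLatency and totalWriteCommands. Taking deltas over a recent interval, rather than lifetime totals, gives a figure that reflects current behavior.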

    Host_Write_Commands

    • SSD Host Write Commands
    • Contains the number of write commands completed by the controller. For the NVM command set, this is the number of write commands.
    • Labels:
      • DRIVE – SSD Persistent Id
      • VOLUME – Volume UUID
    • TYPE: counter
    • Example:
    Host_Write_Commands{DRIVE="VMWare_NVME-0000",VOLUME="be3f5f29-e45e-4bcb-a03e-56578921a19e",} 0.0

    Note: While using a virtual SSD,  the SSD persistent id will not appear in a Serial Number format

    Composite_Temperature

    • SSD Composite Temperature (device temperature)
    • Reported in Celsius
    • TYPE: gauge
    • Example:
    Composite_Temperature{DRIVE="VMWare_NVME-0000",VOLUME=" be3f5f29-e45e-4bcb-a03e-56578921a19e",} 11759.0

    Kubernetes Telemetry Detail

    Prometheus Node Exporter

    Prometheus Node Exporter is automatically installed as part of KumoScale’s Install operator and exports system-level statistics of bare-metal nodes or virtual machines to Prometheus.

    Kube State Metrics

    Kube-state-metrics is a service that listens to the Kubernetes API server and generates metrics about the state of the objects. It focuses on the health of the various objects, such as deployments, nodes and pods.

    The following services are sampled via the default ServiceMonitor configuration every 30 seconds:

      • alertmanager
      • grafana
      • kubeApiServer
      • kubelet
      • kubeControllerManager
      • coreDns
      • kubeDns
      • kubeEtcd
      • kubeScheduler
      • kubeProxy
      • kubeStateMetrics
      • nodeExporter
      • prometheusOperator
      • prometheus (instance)

    The Prometheus Stack CR has dedicated fields to enable or disable the prometheus-node-exporter and kube-state-metrics services.

    Reference: https://github.com/prometheus/node_exporter

    Deploying Telemetry

    To set up the telemetry service on the storage nodes, define and deploy the telemetry custom resource (CR) provided with KumoScale software as follows:

    • Create the telemetry CR.
    • Create the telemetry service.

    Create the Telemetry Custom Resource

    KumoScale software provides a telemetry CR file you can use to define your own CR.

    • Make a copy of kioxia.com_v1_telemetry_cr.yaml for editing, and save it to a separate directory (e.g., deploy/crds/myapp_telemetry_cr.yaml).
    • Update the YAML file with values for the parameters listed in Telemetry Parameters.

    Telemetry Parameters

    This section defines all the parameters used with the telemetry CR.

    | Telemetry parameter name | Description | Optional/Required |
    |---|---|---|
    | name | The telemetry configuration name; must comply with the name field of KumoScale Field Types. | Required |
    | ip | IP for the time series database to which telemetry data is pushed. | Required |
    | port | Telemetry server port. The default value is 2003. | Optional |
    | intervalMin | The time interval between consecutive telemetry push events. Maximum is 86,400. | Optional |
    | dataTypes | Volume telemetry (VOLUME), SSD telemetry (DRIVE), or both (default). | Optional |
    | pushState | Initial sending state: true - the telemetry is pushed; false - the telemetry is not pushed. | Optional |
    | prefix | The prefix for the namespace in the time series database, from which the telemetry data structure is created. The default value is kumoscale. | Optional |
    | telemetryTsdbType | Telemetry server type. Default is GRAPHITE. | Optional |
    | transportType | Transport type; either TCP_IP or UDP. | Optional |
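
The constraints listed for the telemetry parameters can be checked before a CR is applied. The following is a minimal pre-flight sketch and not a KumoScale tool: the function name `validate_telemetry_params` is hypothetical, the lower bound on intervalMin is an assumption, and the authoritative validation is whatever the operator itself performs.

```python
import ipaddress

REQUIRED = ("name", "ip")
TRANSPORT_TYPES = {"TCP_IP", "UDP"}
MAX_INTERVAL = 86400  # documented maximum for intervalMin


def validate_telemetry_params(params):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing required parameter: {key}"
                for key in REQUIRED if not params.get(key)]
    if params.get("ip"):
        try:
            ipaddress.ip_address(params["ip"])
        except ValueError:
            problems.append(f"ip is not a valid address: {params['ip']}")
    port = params.get("port", 2003)  # documented default
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"port out of range: {port}")
    interval = params.get("intervalMin")
    # Assumption: the interval must be positive; only the maximum is documented.
    if interval is not None and not 0 < interval <= MAX_INTERVAL:
        problems.append(f"intervalMin must be in (0, {MAX_INTERVAL}]: {interval}")
    transport = params.get("transportType")
    if transport is not None and transport not in TRANSPORT_TYPES:
        problems.append(f"transportType must be TCP_IP or UDP: {transport}")
    return problems


ok = validate_telemetry_params({"name": "telemetry1", "ip": "192.0.2.0", "port": 2003})
bad = validate_telemetry_params({"ip": "not-an-ip", "transportType": "SCTP"})
print(ok)
print(bad)
```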

    Example Telemetry CR

    The IP address used in the example is for documentation purposes.

    apiVersion: kumoscale.kioxia.com/v1
    kind: Telemetry
    metadata:
      name: telemetry1
    spec:
      telemetry:
        ip: 192.0.2.0
        port: 2003
        pushState: true
        dataTypes:
        - DRIVE
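
Before enabling push mode, it can be useful to confirm that the time series database endpoint accepts data. The sketch below assumes a Graphite server listening for its plaintext protocol (one `<path> <value> <timestamp>` line per sample) on the configured port. The metric path `connectivity_test` is hypothetical and does not reflect the path layout that KumoScale software actually emits.

```python
import socket
import time


def graphite_line(prefix, path, value, timestamp=None):
    """Format one sample in Graphite's plaintext form: '<path> <value> <ts>'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{prefix}.{path} {value} {int(timestamp)}\n"


def send_test_sample(ip, port, line, timeout=5.0):
    """Send one line over TCP to confirm the endpoint is reachable."""
    with socket.create_connection((ip, port), timeout=timeout) as conn:
        conn.sendall(line.encode("ascii"))


# Fixed timestamp so the formatted line is reproducible.
line = graphite_line("kumoscale", "connectivity_test", 1, timestamp=1700000000)
print(line, end="")
# 192.0.2.0 is a documentation address; replace it before sending.
# send_test_sample("192.0.2.0", 2003, line)
```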

    Create the Telemetry Service

    To create the telemetry service with name = telemetry1 defined in the CR file myapp_telemetry_cr.yaml, enter the following:

    kubectl create -f myapp_telemetry_cr.yaml

    To verify the telemetry service, use the following command:

    kubectl get svc -A -o wide

    Telemetry Data

    The following sections specify the parameters sent in the KumoScale software telemetry feed.

    SSD Telemetry

    SSD telemetry consists of SMART (Self-Monitoring, Analysis and Reporting Technology) parameters:

    | Parameter | Description | Comments |
    |---|---|---|
    | Temperature | Device temperature | Reported in kelvin |
    | Available Spare | Margin rate (%) | |
    | Percentage Used | Endurance in percentage | |
    | Data Units Read | The number of data units the host has read from the controller | Reported in 4KB units |
    | Data Units Written | The number of data units the host has written to the controller | Reported in 4KB units |
    | Host Read Commands | The number of read commands completed by the controller | Includes compare commands |
    | Host Write Commands | The number of write commands completed by the controller | |
    | ks_volume_ssd_used_capacity_bytes | The sum of the capacity of all sub-volumes that belong to a volume on the same SSD. | Reported in bytes |

    Workload Statistics

    | Parameter | Description | Comments |
    |---|---|---|
    | IOs | Total commands issued to the volume | Read/Write |
    | Bandwidth | Total bytes issued to the volume | Read/Write |
    | IOPS | Read/Write, command size less than or equal to: 4KB/8KB/16KB/64KB/256KB/512KB/1MB/>1MB | Histogram |
    | Latency | Total latency in µs | Read/Write |

    Volume Telemetry

    The volume-based data that is collected by KumoScale software consists of the following workload statistics:

    | Parameter | Description | Comments |
    |---|---|---|
    | IOs | Total commands issued to the volume | Read/Write |
    | Bandwidth | Total bytes issued to the volume | Read/Write |
    | IOPS | Read/Write, command size less than or equal to: 4KB/8KB/16KB/64KB/256KB/512KB/1MB/>1MB | Histogram |
    | Latency | Total latency in µs | Read/Write |
    | ks_volume_utilized_capacity_bytes | The number of written bytes on the volume. | |
    | ks_volume_used_capacity_bytes | The space allocated to the volume. For thick volumes, the capacity of the volume; for thin volumes, the currently allocated space of the volume. | |

    Additional Statistics and Information

    KumoScale software provides Application Programming Interface (API) functions for retrieving system information (via REST APIs or CLI), such as:

    • Available Network Interface Cards (NICs)
    • CPU and memory utilization
    • Status of the SSDs
    • System alerts
    • Network and storage performance statistics
    • Ongoing sessions
    • Allocated volumes