4.1. Monitoring General Storage Cluster Parameters

By monitoring general parameters, you can get detailed information about all components of a storage cluster, its overall status and health. To display this information, use the vstorage -c <cluster_name> top command, for example:

[Figure: sample vstorage top output for the stor1 cluster, with the general parameters highlighted in red]

The command above shows detailed information about the stor1 cluster. The general parameters (highlighted in red) are explained in the table below.

Parameter Description
Cluster

Overall status of the cluster:

  • healthy. All chunk servers in the cluster are active.
  • unknown. There is not enough information about the cluster state (e.g., because the master MDS server was elected only recently).
  • degraded. Some of the chunk servers in the cluster are inactive.
  • failure. The cluster has too many inactive chunk servers; automatic replication is disabled.
  • SMART warning. One or more physical disks attached to cluster nodes are in a pre-failure condition. For details, see Monitoring Physical Disks.
Space

Amount of disk space in the cluster:

  • free. Free physical disk space in the cluster.
  • allocatable. Amount of logical disk space available to clients. Allocatable disk space is calculated from the current replication parameters and the free disk space on chunk servers. It may also be limited by the license.

Note

For more information on monitoring and understanding disk space usage in clusters, see Understanding Disk Space Usage.

MDS nodes

Number of active MDS servers compared to the total number of MDS servers configured for the cluster.
epoch time

Time elapsed since the master MDS server was elected.
CS nodes

Number of active chunk servers as compared to the total number of chunk servers configured for the cluster.

The figures in parentheses show the number of:

  • Active chunk servers (avail.) that are currently up and running in the cluster.
  • Inactive chunk servers (inactive) that are temporarily unavailable. A chunk server is marked as inactive during its first 5 minutes of inactivity.
  • Offline chunk servers (offline) that have been inactive for more than 5 minutes. Once a chunk server goes offline, the cluster starts replicating data to restore the chunks that were stored on it.
License

Key number under which the license is registered on the Key Authentication server, and the license state.
Replication

Replication settings: the normal number of chunk replicas and the limit, that is, the minimum number of replicas below which a chunk is blocked until it is recovered.
IO

Disk I/O activity in the cluster:

  • Speed of read and write I/O operations, in bytes per second.
  • Number of read and write I/O operations per second.
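The allocatable value in the Space row depends on free disk space, the replication parameters, and the license. The sketch below is a rough illustration only and is not the cluster's actual formula.

It makes two simplifying assumptions:

  • Every client byte is stored as many times as the normal number of replicas.
  • The license, if any, caps capacity at a fixed number of bytes.

The real calculation may also account for how free space is distributed across chunk servers. `allocatable_estimate` and `license_limit_bytes` are hypothetical names.

```python
from typing import Optional


def allocatable_estimate(free_bytes: int, normal_replicas: int,
                         license_limit_bytes: Optional[int] = None) -> int:
    """Approximate logical space available to clients (hypothetical helper).

    Assumption: each client byte consumes normal_replicas bytes of physical
    space, so free physical space is divided by the replica count.
    """
    if normal_replicas < 1:
        raise ValueError("normal_replicas must be at least 1")
    estimate = free_bytes // normal_replicas
    # Assumption: a license may cap the usable capacity at a fixed size.
    if license_limit_bytes is not None:
        estimate = min(estimate, license_limit_bytes)
    return estimate
```

For example, with 3 TiB of free space and 3 normal replicas, the estimate is 1 TiB. If a hypothetical license cap of 0.5 TiB applies, the estimate is 0.5 TiB.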
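The CS nodes row distinguishes inactive chunk servers from offline ones by how long they have been unavailable. That rule can be modeled as a small classification function.

This is a minimal sketch and is not part of vstorage. The function names and the `seconds_inactive` input are hypothetical. The only value taken from this section is the 5-minute boundary between the inactive and offline states.

```python
from typing import Optional

# Boundary between "inactive" and "offline", as described for CS nodes.
OFFLINE_AFTER_SECONDS = 5 * 60


def chunk_server_state(seconds_inactive: Optional[float]) -> str:
    """Classify a chunk server (hypothetical helper, not a vstorage API).

    seconds_inactive is None while the chunk server is up and running;
    otherwise it is how long the chunk server has been unavailable.
    """
    if seconds_inactive is None:
        return "avail."
    if seconds_inactive <= OFFLINE_AFTER_SECONDS:
        # First 5 minutes: temporarily unavailable, no re-replication yet.
        return "inactive"
    # More than 5 minutes: the cluster re-replicates this server's chunks.
    return "offline"


def triggers_replication(state: str) -> bool:
    """Only offline chunk servers cause their chunks to be re-replicated."""
    return state == "offline"
```

With this model, `chunk_server_state(120)` returns "inactive", and `chunk_server_state(600)` returns "offline".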
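The Replication row pairs a normal replica count with a lower limit. The sketch below shows how the state of an individual chunk could be classified under two assumptions:

  • The limit is the minimum number of available replicas a chunk needs in order to stay unblocked.
  • A chunk between the limit and the normal count stays accessible but needs re-replication.

`chunk_replication_state` and the "under-replicated" label are illustrative only. They are not values reported by vstorage.

```python
def chunk_replication_state(replicas: int, norm: int, limit: int) -> str:
    """Classify one chunk by its available replicas (hypothetical helper).

    Assumed semantics: fewer than `limit` replicas blocks the chunk until
    it is recovered; fewer than `norm` means it needs re-replication.
    """
    if not 1 <= limit <= norm:
        raise ValueError("expected 1 <= limit <= norm")
    if replicas < limit:
        return "blocked"
    if replicas < norm:
        return "under-replicated"
    return "ok"
```

With a normal count of 3 and a limit of 2, a chunk is classified as follows:

  • 3 replicas: "ok"
  • 2 replicas: "under-replicated"
  • 1 replica: "blocked"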
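The IO row reports throughput (bytes per second) and operation rate (operations per second) separately. Dividing one by the other for the same direction, read or write, gives the average size of a single request. This helps distinguish many small operations from a few large ones.

This is a minimal sketch. `average_request_size` is a hypothetical helper, and its inputs are values you would read from the IO line yourself.

```python
def average_request_size(bytes_per_second: float, ops_per_second: float) -> float:
    """Average bytes per I/O request for one direction (read or write).

    Returns 0.0 when there is no activity, to avoid dividing by zero.
    """
    if ops_per_second <= 0:
        return 0.0
    return bytes_per_second / ops_per_second
```

For example, 4,194,304 bytes per second at 1,024 operations per second works out to an average request size of 4,096 bytes.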