6.1. Monitoring General Storage Cluster Parameters

By monitoring general parameters, you can get detailed information about all components of the storage cluster, its overall status and health. To display this information, use the vstorage -c <cluster_name> top command. For example:

[Figure: sample output of the vstorage -c stor1 top command]

The command above shows detailed information about the stor1 cluster. The general parameters (highlighted in red) are as follows.

Cluster

Overall status of the cluster:

Healthy
All chunk servers in the cluster are active.
Unknown
There is not enough information about the cluster state (e.g., because the master MDS server was elected a while ago).
Degraded
Some of the chunk servers in the cluster are inactive.
Failure
The cluster has too many inactive chunk servers; automatic replication is disabled.
SMART warning
One or more physical disks attached to cluster nodes are in a pre-failure condition. For details, see Monitoring Physical Disks.
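If you feed the cluster status into your own monitoring, the values listed above can be modeled as a small enumeration. The following is a minimal sketch: the status strings come from this section, while the `ClusterStatus` and `alert_level` names and the alert levels themselves are hypothetical and not part of the vstorage tooling.

```python
from enum import Enum


class ClusterStatus(Enum):
    """Overall cluster statuses as listed in this section."""
    HEALTHY = "Healthy"
    UNKNOWN = "Unknown"
    DEGRADED = "Degraded"
    FAILURE = "Failure"
    SMART_WARNING = "SMART warning"


# Hypothetical mapping to alert levels; adjust it to your own policy.
ALERT_LEVEL = {
    ClusterStatus.HEALTHY: "ok",
    ClusterStatus.UNKNOWN: "warning",
    ClusterStatus.DEGRADED: "warning",
    ClusterStatus.SMART_WARNING: "warning",
    ClusterStatus.FAILURE: "critical",
}


def alert_level(status_text: str) -> str:
    """Map a status string to an alert level.

    Raises ValueError if the string is not one of the known statuses.
    """
    status = ClusterStatus(status_text.strip())
    return ALERT_LEVEL[status]
```

For example, `alert_level("Failure")` yields the level assigned to the Failure status in the mapping above.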
Space

Amount of disk space in the cluster:

Free
Free physical disk space in the cluster.
Allocatable
Amount of logical disk space available to clients. Allocatable disk space is calculated on the basis of the current replication parameters and free disk space on chunk servers. It may also be limited by the license.

Note

For more information on monitoring and understanding disk space usage in clusters, see Understanding Disk Space Usage.
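To illustrate how allocatable space relates to free space and replication, the sketch below assumes the simplest possible model: every chunk is stored in a fixed number of replicas, so the logical space is the free physical space divided by the replica count, optionally capped by a license limit. The actual calculation performed by the cluster may differ (for example, it may depend on how free space is distributed across chunk servers). The function and parameter names are hypothetical.

```python
from typing import Optional


def estimate_allocatable(free_bytes: int,
                         replicas: int,
                         license_limit_bytes: Optional[int] = None) -> int:
    """Roughly estimate logical space available to clients.

    Assumption: each byte of client data consumes `replicas` bytes of
    physical space. This is a simplification, not the cluster's formula.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    estimate = free_bytes // replicas
    # Assumption: the license, if any, acts as a simple upper bound.
    if license_limit_bytes is not None:
        estimate = min(estimate, license_limit_bytes)
    return estimate
```

Under this model, 3000 bytes of free physical space with 3 replicas gives an estimate of 1000 allocatable bytes.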

MDS nodes
Number of active MDS servers as compared to the total number of MDS servers configured for the cluster.
Epoch time
Time elapsed since the MDS master server election.
CS nodes

Number of active chunk servers as compared to the total number of chunk servers configured for the cluster.

In parentheses, you can see additional information on these chunk servers:

  • Active chunk servers (avail.) that are currently up and running in the cluster.
  • Inactive chunk servers (inactive) that are temporarily unavailable. A chunk server is marked as inactive during its first 5 minutes of inactivity.
  • Offline chunk servers (offline) that have been inactive for more than 5 minutes. Once a chunk server goes offline, the cluster starts replicating data to restore the chunks that were stored on it.
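The three chunk server states above can be summarized as a function of how long a server has been unreachable. This is a minimal sketch: the 5-minute threshold comes from the text, while the function name and the treatment of the exact 5-minute boundary are assumptions.

```python
from datetime import timedelta

# From the text: a chunk server becomes offline after 5 minutes of inactivity.
OFFLINE_AFTER = timedelta(minutes=5)


def cs_state(inactive_for: timedelta) -> str:
    """Classify a chunk server as 'avail.', 'inactive', or 'offline'.

    `inactive_for` is how long the server has been unreachable;
    timedelta(0) means it is currently up and running.
    """
    if inactive_for <= timedelta(0):
        return "avail."
    # Assumption: exactly 5 minutes still counts as inactive.
    if inactive_for <= OFFLINE_AFTER:
        return "inactive"
    return "offline"
```

A server that has been unreachable for 2 minutes is classified as inactive, and one that has been unreachable for 6 minutes as offline, at which point the cluster starts restoring its chunks.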
License
Key number under which the license is registered on the Key Authentication server, and the license state.
Replication
Replication settings: the normal number of chunk replicas and the limit after which a chunk gets blocked until recovered.
IO

Disk IO activity in the cluster:

  • Speed of read and write I/O operations, in bytes per second.
  • Number of read and write I/O operations per second.