.. _Monitoring the Entire Cluster:

Monitoring the Entire Cluster
-----------------------------

The overall storage cluster statistics are available on the **MONITORING** > **Dashboard** screen. Pay attention to the storage cluster status, which can be one of the following:

.. raw:: latex

   \setlist[description]{leftmargin=!,labelindent=0pt,labelwidth=1em+\widthof{DEGRADED}}

**HEALTHY**
   All cluster components are active and operate normally.

**UNKNOWN**
   There is not enough information about the cluster state (e.g., because the cluster is inaccessible).

**DEGRADED**
   Some of the cluster components are inactive or inaccessible. The cluster is trying to heal itself; data replication is scheduled or in progress.

**FAILURE**
   The cluster has too many inactive services; automatic replication is disabled. If the cluster enters this state, troubleshoot the nodes or contact the support team.

.. raw:: latex

   \setlist[description]{leftmargin=!,labelindent=0pt,labelwidth=1em+\widthof{ }}

To view the storage cluster statistics in full screen, click **Fullscreen mode**. To exit fullscreen mode, press **Esc** or click **Exit fullscreen mode**.

For advanced monitoring, click **Grafana dashboard**. A separate browser tab opens with preconfigured Grafana dashboards, where you can manage existing dashboards, create new ones, share them between users, configure alerting, etc. For more information, refer to `Grafana documentation `__.

.. image:: /images/stor_image137.png
   :align: center
   :class: align-center

.. include:: /includes/monitoring-storage-cluster-part1.inc

.. _I/O Activity Charts:

I/O Activity Charts
~~~~~~~~~~~~~~~~~~~

The **Read** and **Write** charts show the history of the cluster I/O activity as the speed of read and write I/O operations in megabytes per second and the number of read and write I/O operations per second (IOPS). For example:

.. only:: ac

   .. image:: /images/stor_image28_ac.png
      :align: center
      :class: align-center

.. only:: vz

   .. image:: /images/stor_image28_vz.png
      :align: center
      :class: align-center

.. _Services Chart:

Services Chart
~~~~~~~~~~~~~~

On the **Services** chart, you can monitor two types of services:

- Metadata services (MDS): the number of all disks with the metadata role. Ensure that at least three MDSes are running at all times.
- Chunk services (CS): the number of all disks with the storage role.

Typical statistics may look like this:

.. only:: ac

   .. image:: /images/stor_image27_ac.png
      :align: center
      :class: align-center

.. only:: vz

   .. image:: /images/stor_image27_vz.png
      :align: center
      :class: align-center

If some of the services were not healthy for a period of time, these periods are highlighted in red on the chart.

.. _Chunks Chart:

Chunks Chart
~~~~~~~~~~~~

You can monitor the state of all chunks in the cluster on the **Chunks** chart. Chunks can be in the following states:

.. raw:: latex

   \setlist[description]{leftmargin=!,labelindent=0pt,labelwidth=1em+\widthof{degraded}}

.. include:: /includes/monitoring-storage-cluster-part2.inc

.. raw:: latex

   \setlist[description]{leftmargin=!,labelindent=0pt,labelwidth=1em+\widthof{ }}

Healthy chunks are highlighted on the scale in green, offline in red, blocked in yellow, and degraded in grey. For example:

.. image:: /images/stor_image27_1.png
   :align: center
   :class: align-center

The **Replication** section shows information about replication activity in the cluster.

.. _Physical Space Chart:

Physical Space Chart
~~~~~~~~~~~~~~~~~~~~

The **Physical space** chart shows the current usage of physical space in the entire storage cluster and on each particular tier. The used space includes the space occupied by all data chunks and their replicas plus the space occupied by any other data.

.. image:: /images/stor_image27_2.png
   :align: center
   :class: align-center

.. _Logical Space Chart:

Logical Space Chart
~~~~~~~~~~~~~~~~~~~

The **Logical space** chart represents all the space allocated to different services for storing user data. It includes only the space occupied by user data; replicas and erasure coding metadata are not taken into account.

.. image:: /images/stor_image27_3.png
   :align: center
   :class: align-center

.. _Understanding Logical Space:

Understanding Logical Space
***************************

When monitoring disk space information in the cluster, keep in mind that logical space is the amount of free disk space that can be used for storing user data in the form of data chunks and all their replicas. Once this space runs out, no data can be written to the cluster.

To better understand how logical disk space is calculated, consider the following example:

- The cluster has three disks with the storage role. The first disk has 200 GB of space, the second one has 500 GB, and the third one has 1 TB.
- If the redundancy mode is set to three replicas, each data chunk must be stored as three replicas on three different disks with the storage role.

In this example, the available logical disk space is 200 GB, that is, equal to the capacity of the smallest disk with the storage role. The reason is that each replica must be stored on a different disk, so every chunk has one replica on each of the three disks. Once the space on the smallest disk (i.e., 200 GB) runs out, no new chunk replicas can be created unless a new disk with the storage role is added or the redundancy mode is changed to two replicas.

With the two replicas redundancy mode, the available logical disk space would be 700 GB. Each chunk can place at most one of its two replicas on the 1 TB disk, so its other replica must go to the 200 GB or the 500 GB disk, and these two disks combined can hold 700 GB of data.
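The arithmetic in the example above can be generalized. The following Python sketch is a simplified model for illustration only, not the product's actual capacity calculation: it assumes that every replica of a chunk is placed on a different disk with the storage role, and it ignores host-level failure domains, reserved space, and erasure coding. Under these assumptions, a data volume *t* fits with *k* replicas if, for every *j* from 0 to *k* - 1, the disks other than the *j* largest can hold (*k* - *j*) × *t*, because each chunk can have at most one replica on each of the *j* largest disks. The function name ``logical_space`` is hypothetical, and 1 TB is taken as 1000 GB (the results below are the same with 1024 GB).

```python
def logical_space(capacities_gb, replicas):
    """Estimate usable logical space (GB) in a simplified placement model.

    Hypothetical helper: assumes each of the `replicas` copies of a chunk is
    placed on a different disk and that chunks are small enough to fill disks
    completely. Failure domains, reserved space, and erasure coding are not
    modeled.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    disks = sorted(capacities_gb)  # smallest disk first
    if len(disks) < replicas:
        return 0  # not enough distinct disks to hold all replicas of a chunk
    limits = []
    for j in range(replicas):
        # A chunk has at most one replica on each of the j largest disks,
        # so its remaining (replicas - j) replicas must fit on the other disks.
        remaining = sum(disks[: len(disks) - j])
        limits.append(remaining / (replicas - j))
    return min(limits)


disks = [200, 500, 1000]  # the three disks from the example
print(logical_space(disks, 3))  # 200.0
print(logical_space(disks, 2))  # 700.0
```

In this model, ``logical_space([200, 500, 1000, 1000], 3)`` returns 700.0, which illustrates why adding a disk with the storage role raises the limit imposed by the smallest disk.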