4.2. Monitoring nodes

Nodes added to the infrastructure are listed on the Infrastructure > Nodes screen with their status displayed. If the storage cluster has not been created yet, you will only see nodes with the Unassigned status. If the storage cluster exists, its nodes will be listed on the screen.

A node can have one of the following statuses:

UNASSIGNED
The node is not assigned to a cluster.
HEALTHY
All the storage services on the node are running.
ENTERING MAINTENANCE…
The node is entering maintenance. The services it hosts are either being evacuated or stopped.
ENTERING MAINTENANCE HALTED
The node cannot enter maintenance, because some of its services cannot be evacuated.
IN MAINTENANCE
The node is in maintenance mode. It does not participate in new chunk allocation.
EXITING MAINTENANCE…
The node is exiting maintenance. Nodes exiting maintenance cannot be managed.
FAILED
One or more storage services on the node have failed.
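
If node statuses are processed in a script, for example after copying them from the Nodes screen, the list above can be modeled as an enumeration. The following is a minimal sketch in Python. The names NodeStatus, parse_status, and needs_attention are hypothetical and are not part of the product. Treating FAILED and ENTERING MAINTENANCE HALTED as the statuses that need operator attention is an assumption based on the descriptions above.

```python
from enum import Enum


class NodeStatus(Enum):
    """Node statuses as described for the Infrastructure > Nodes screen."""
    UNASSIGNED = "UNASSIGNED"
    HEALTHY = "HEALTHY"
    ENTERING_MAINTENANCE = "ENTERING MAINTENANCE"
    ENTERING_MAINTENANCE_HALTED = "ENTERING MAINTENANCE HALTED"
    IN_MAINTENANCE = "IN MAINTENANCE"
    EXITING_MAINTENANCE = "EXITING MAINTENANCE"
    FAILED = "FAILED"


# Assumption: these two statuses call for operator action.
ATTENTION = {NodeStatus.FAILED, NodeStatus.ENTERING_MAINTENANCE_HALTED}


def parse_status(label: str) -> NodeStatus:
    """Convert a status label to NodeStatus, ignoring case and a trailing ellipsis."""
    cleaned = label.strip().rstrip("…").rstrip(".").strip().upper()
    return NodeStatus(cleaned)


def needs_attention(status: NodeStatus) -> bool:
    """Return True if the status is one of the assumed attention statuses."""
    return status in ATTENTION
```

For example, needs_attention(parse_status("Failed")) returns True, while a HEALTHY node returns False.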

The default time interval for the charts is twelve hours. To zoom into a particular time interval, select the interval with the mouse; to reset the zoom, double-click any chart.

4.2.1. Monitoring node performance

To monitor the performance of a cluster node, open the Nodes screen, and then click the required node line. On the right pane, the Overview tab displays the performance statistics. The statistics include:

  • CPU/RAM: CPU usage, in percent, and RAM usage, in GiB, over time
  • Network: the amount of transmitted (TX) and received (RX) traffic over time
  • Disks read: node read activity over time
  • Disks write: node write activity over time

The following sections provide more information on disk and network usage.

4.2.2. Monitoring node disks

To monitor the usage and status of node disks, click a node’s name. On the Disks tab, you will see a list of all disks on the node and their status icons.

A disk status icon shows the combined status of S.M.A.R.T. and the service corresponding to the disk role. It can be one of the following:

OK
The disk and service are healthy.
Failed
The service has failed or S.M.A.R.T. reported an error.
Releasing
The service is being released. When the process finishes, the disk status will change to OK.

To monitor the performance of a particular disk, select it, and then click Performance. The Drive performance panel will display the I/O activity of the disk.

To view information about the disk, including its S.M.A.R.T. status, click Details.

To have the disk blink its activity LED, select the disk, and then click Blink. To have the disk stop blinking, click Unblink.

4.2.2.1. Monitoring the S.M.A.R.T. status of node disks

The S.M.A.R.T. status of all disks is monitored by a tool installed along with Acronis Cyber Infrastructure. The tool runs every 10 minutes, polls all disks attached to nodes, including journaling SSDs and system disks, and reports the results to the management node.

For the tool to work, make sure the S.M.A.R.T. functionality is enabled in the node’s BIOS.

If a S.M.A.R.T. warning message is shown in the node status, one of that node’s disks is in a pre-failure condition and should be replaced. If you continue using the disk, keep in mind that it may fail or cause performance issues.

A pre-failure condition means that at least one of these S.M.A.R.T. counters is not zero:

  • Reallocated Sector Count
  • Reallocated Event Count
  • Current Pending Sector Count
  • Offline Uncorrectable
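
The pre-failure rule above can be sketched as a simple check. This is an illustration only, not the monitoring tool's actual implementation: the function names and the dictionary input (counter name mapped to its raw value) are hypothetical, and counters missing from the input are assumed to be zero.

```python
from typing import List, Mapping

# Counter names as given in the list above.
PRE_FAILURE_COUNTERS = (
    "Reallocated Sector Count",
    "Reallocated Event Count",
    "Current Pending Sector Count",
    "Offline Uncorrectable",
)


def nonzero_counters(counters: Mapping[str, int]) -> List[str]:
    """Return the watched counters whose value is not zero.

    Assumption: a counter absent from the input is treated as zero.
    """
    return [name for name in PRE_FAILURE_COUNTERS if counters.get(name, 0) != 0]


def is_pre_failure(counters: Mapping[str, int]) -> bool:
    """Return True if at least one watched counter is not zero."""
    return bool(nonzero_counters(counters))
```

For example, a disk that reports a Reallocated Sector Count of 8 is classified as pre-failure, while a disk whose four counters are all zero is not. Counters outside the list above do not affect the result.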

4.2.3. Monitoring node network

To monitor a node’s network usage, click the name of the node on the Infrastructure > Nodes screen, and then go to the Network tab.

To display the performance charts of a specific network interface, select it in the list, and then click Performance. When monitoring network performance, keep in mind that if the Receive and transmit errors chart is not empty, the network is experiencing issues and requires attention.
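
A similar check can be made from the node's command line. The sketch below assumes the node runs Linux and exposes the standard per-interface counters rx_errors and tx_errors under /sys/class/net; the function names are hypothetical, and this is not how the admin panel builds its charts. Because these counters are cumulative, the sketch compares two readings instead of testing for a non-zero value.

```python
from pathlib import Path
from typing import Dict

ERROR_COUNTERS = ("rx_errors", "tx_errors")


def read_error_counters(interface: str, sysfs_root: str = "/sys/class/net") -> Dict[str, int]:
    """Read the cumulative receive and transmit error counters of an interface."""
    stats = Path(sysfs_root) / interface / "statistics"
    return {name: int((stats / name).read_text().strip()) for name in ERROR_COUNTERS}


def errors_increased(before: Dict[str, int], after: Dict[str, int]) -> bool:
    """Return True if any error counter grew between the two readings."""
    return any(after[name] > before[name] for name in ERROR_COUNTERS)
```

Take one reading, wait for the interval of interest, take a second reading, and pass both to errors_increased. The sysfs_root parameter exists only so that the functions can be exercised against a test directory.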

To display the details of a network interface, click Details. The Network details pane shows the interface state, bandwidth, MTU, MAC address, and all IP addresses.