6.2. Checking Data Flushing
Before creating the cluster, we recommend checking that all storage devices (hard disk drives, solid-state drives, RAIDs, etc.) you plan to include in your cluster can successfully flush data to disk when the server power goes off unexpectedly. Doing so will help you detect devices that may lose data stored in their cache in the event of a power failure.
Acronis Storage ships a special tool, vstorage-hwflush-check, for checking how a storage device flushes data to disk in an emergency such as a power outage. The tool is implemented as a client/server utility:
- Client. The client continuously writes blocks of data to the storage device. When a data block is written, the client increments a special counter and sends it to the server, which stores it.
- Server. The server keeps track of the incoming counters from the client so that it always knows the counter number the client will send next. If the server receives a counter that is lower than the one already stored on the server (e.g., because the power was turned off and the storage device did not flush the cached data to disk), the server reports an error.
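The server-side bookkeeping described above can be sketched in a few lines of POSIX shell. This is only an illustrative model, not the tool's actual code: the function name track_counters and the message format are assumptions made for the example.

```shell
# Hypothetical model of the server-side check; not the tool's real code.
# Reads one counter per line from stdin, remembers the highest value
# accepted so far, and reports any counter that goes backwards.
track_counters() {
    last=0
    while read -r counter; do
        if [ "$counter" -lt "$last" ]; then
            # The client came back with an older counter than the one
            # it had already reported: acknowledged data is missing.
            echo "error: got $counter, expected at least $last"
        else
            last=$counter
        fi
    done
    echo "last=$last"
}

# Counters 1..4 arrive, then the client restarts and reports 2.
printf '%s\n' 1 2 3 4 2 | track_counters
```

In this model, a regression from 4 to 2 stands for a device that acknowledged writes 3 and 4 but lost them when the power was cut.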
To check that a storage device can successfully flush data to disk when the power fails, follow the procedure below:
On the server part:
On a different Acronis Storage server than the one with the storage device to check, install the vstorage-hwflush-check tool, which is a part of the vstorage-ctl package:
# yum install vstorage-ctl
Run the vstorage-hwflush-check server:
# vstorage-hwflush-check -l
On the client part:
On the Acronis Storage server with the storage device you want to check, install the vstorage-hwflush-check tool:
# yum install vstorage-ctl
Run the vstorage-hwflush-check client, for example:
# vstorage-hwflush-check -s vstorage1.example.com -d /vstorage/stor1-ssd/test -t 50
where
- -s vstorage1.example.com is the hostname of the computer where the vstorage-hwflush-check server is running.
- -d /vstorage/stor1-ssd/test defines the directory to use for testing data flushing. During its execution, the client creates a file in this directory and writes data blocks to it.
- -t 50 sets the number of threads for the client to write data to disk. Each thread has its own file and counter. You can increase the number of threads (max. 200) to test your system in more stressful conditions.
You can also specify other options when running the client. For more information on available options, see the vstorage-hwflush-check man page.
Wait at least 10-15 seconds, power off the computer where the client is running, and then turn it on again.
Note
The Reset button does not turn off the power so you need to press the Power button or pull out the power cord to switch off the computer.
Restart the client by executing the same command you used to run it for the first time:
# vstorage-hwflush-check -s vstorage1.example.com -d /vstorage/stor1-ssd/test -t 50
Once launched, the client reads all written data, determines the version of data on the disk, and then restarts the test from the last valid counter. It then sends this valid counter to the server, and the server compares it with the latest counter it has. You may see output like:
id<N>:<counter_on_disk> -> <counter_on_server>
which means one of the following:
- If the counter on disk is lower than the counter on server and a “cache error detected” message is returned, it means that the storage device has failed to flush the data to disk. Avoid using this storage device in production, especially for CS or journals, as you risk losing data.
- If the counter on disk is higher than the counter on server, it means that the storage device has flushed the data to disk but the client has failed to report it to the server. The network may be too slow or the storage device may be too fast for the set number of load threads, so consider increasing the number of threads. This storage device can be used in production.
- If both counters are equal, it means the storage device has flushed the data to disk and the client has reported it to the server. This storage device can be used in production.
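The three cases above can be sketched as a small POSIX shell helper that classifies one output line. It assumes the line has exactly the form shown earlier, with plain integers substituted for the placeholders (for example, id3:120 -> 125). The function name classify_flush_result and the result keywords are hypothetical, and the tool's own “cache error detected” message remains the authoritative signal.

```shell
# Hypothetical helper; assumes input like "id3:120 -> 125",
# i.e. id<N>:<counter_on_disk> -> <counter_on_server>.
classify_flush_result() {
    line=$1
    rest=${line#id*:}        # drop the "id<N>:" prefix
    disk=${rest%% -> *}      # counter found on disk after the restart
    server=${rest##* -> }    # latest counter stored on the server

    if [ "$disk" -lt "$server" ]; then
        echo "lost"          # device did not flush: avoid in production
    elif [ "$disk" -gt "$server" ]; then
        echo "unreported"    # flushed, but the report to the server lagged
    else
        echo "ok"            # flushed and reported
    fi
}

classify_flush_result 'id3:120 -> 125'
```

Only the "lost" result indicates a device that should be kept out of production. The other two results mean the data reached the disk.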
To be on the safe side, repeat the procedure several times. Once you check your first storage device, continue with all remaining devices you plan to use in the cluster.
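When many devices need checking, it may help to prepare the client commands in advance. The sketch below only prints the commands, because each run still requires a manual power cut. It uses only the -s, -d, and -t options described above. The function name print_hwflush_commands and the second directory, /vstorage/stor2-hdd/test, are assumptions made for the example.

```shell
# Hypothetical helper: prints (does not run) one client command per
# test directory. Usage: print_hwflush_commands SERVER THREADS DIR...
print_hwflush_commands() {
    server=$1
    threads=$2
    shift 2
    # The documented upper limit for -t is 200 threads.
    if [ "$threads" -gt 200 ]; then
        echo "thread count must not exceed 200" >&2
        return 1
    fi
    for dir in "$@"; do
        echo "vstorage-hwflush-check -s $server -d $dir -t $threads"
    done
}

# The second directory is an illustrative placeholder.
print_hwflush_commands vstorage1.example.com 50 \
    /vstorage/stor1-ssd/test /vstorage/stor2-hdd/test
```

Run each printed command on the server that hosts the corresponding device, and repeat the power-off and restart steps for every device.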