How the Data Integrity Check works

October 17, 2020

As a reminder, we license Ahsay’s software for our cloud backup service. This is a copy of their Data Integrity Check tech doc and we have added some relevant content. Ahsay are the world’s largest supplier of cloud backup software to MSPs.

In this tech doc AhsayOBM refers to our Server product and AhsayACB refers to our Desktop product.

Cloud data backup is essential, and it has to work when you need to restore from it.

Data can become corrupted in transit and at rest. Our software checks for this during backup, and periodically while the data is stored. If you are using the latest version of our software, we can always guarantee your data will be restored to the same state it was in when you backed it up.

Whether you are backing up large or small files, data corruption can still occur during a backup job, or at rest if you are using your own hardware.

Some of the possible causes of data corruption are:

1. Backups terminate without warning when an active backup job is in progress. The most common causes are when a device is unplugged or power is lost.

We can’t prevent these situations. However, our software will detect the failed backup and recover from it next time.

2. Technical problems on the cloud storage service.

If you are using your own hardware to store your customer data, it will need to be monitored and maintained to prevent any hardware failure and subsequent data loss.

If you are using our storage locations, or have your own account with Microsoft’s or Wasabi’s clouds, data corruption due to faulty hardware won’t be a concern. Both guarantee 11 nines (99.999999999%) of data durability at all times.

Because data corruption is always a possibility, identifying and removing corrupted files from the backup destination(s) is mission-critical: it verifies the integrity of the backup data and whether it can be restored.

The primary role of the Data Integrity Check is to identify and remove corrupted files from the backup destination(s). This will allow the next backup job to back up these files again.

Key Features

  • Identify and remove the files and/or folders in the backup destination(s) which do not appear in the index.
  • Identify and remove the files and/or folders which appear in the index but do not actually exist in the backup destinations (i.e. Cloud storage, or Local storage).
  • Identify and remove corrupted files from the backup destination(s) when the Run Cyclic Redundancy Check (CRC) During Data Integrity Check setting is enabled.
  • Identify and remove partially uploaded (orphan) files from the backup destination(s), and free up storage space.
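The first two checks above amount to comparing two lists of paths. The following is a minimal sketch of that idea only, not Ahsay's implementation; the function name `compare_index_with_destination` and the sample paths are hypothetical.

```python
def compare_index_with_destination(index_paths, destination_paths):
    """Compare the paths recorded in an index with the paths actually stored.

    Hypothetical helper for illustration; the real index format is not
    described in this document.
    """
    index_set = set(index_paths)
    destination_set = set(destination_paths)
    # Stored in the destination but not recorded in the index.
    not_in_index = sorted(destination_set - index_set)
    # Recorded in the index but absent from the destination.
    not_in_destination = sorted(index_set - destination_set)
    return not_in_index, not_in_destination


# Made-up sample data: one partially uploaded file, one missing file.
index = ["docs/a.txt", "docs/b.txt", "db/dump.bak"]
destination = ["docs/a.txt", "db/dump.bak", "db/dump.bak.part"]

not_in_index, not_in_destination = compare_index_with_destination(index, destination)
print(not_in_index)        # ['db/dump.bak.part']
print(not_in_destination)  # ['docs/b.txt']
```

In this sketch both kinds of mismatch are only reported. Removing them, as the Data Integrity Check does, would be a separate step.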

Initiating a Data Integrity Check (DIC)

Data Integrity Checks are run from the server or desktop backup software on your device. We expect Ahsay to make this feature available on the backup clusters in early 2021.

Data Integrity Check (DIC) Modes

There are two (2) data integrity check modes:

  • With Run Cyclic Redundancy Check (CRC) disabled (Default mode)
  • With Run Cyclic Redundancy Check (CRC) enabled

With Run Cyclic Redundancy Check (CRC) disabled (Default mode)

This is the default setting for the Data Integrity Check. Running a Data Integrity Check using this mode allows the AhsayOBM/AhsayACB client to perform a comparison between the files and/or folders on the backup destination(s), and the list of the files and/or folders recorded in the current index file.

The following images show a detailed flow for each data integrity check mode.

Run Cyclic Redundancy Check (CRC) disabled (Default mode)

[Image: Ahsay data integrity check without CRC]

You should run a Data Integrity Check in default mode when:

  • You encounter index issues on your backup/restore job.
  • You know or suspect the backup set storage statistics are not updated or incorrect and cannot wait for the next weekly Periodic Data Integrity Check (PDIC) job.
  • You need to remove partially uploaded (orphan) files from the backup destination(s) to free up space. Partially uploaded (orphan) files will remain in the backup destination(s) when backup jobs with large files (e.g. database, VMware/Hyper-V, or Windows System backups) are terminated unexpectedly or crash.

With Run Cyclic Redundancy Check (CRC) enabled

Running a data integrity check in this mode checks the integrity of the files in the backup destination(s) against the checksum file generated at the time of the backup job.

If there is a discrepancy, this indicates that the file(s) on the backup destination(s) are corrupted. The AhsayOBM/AhsayACB client will remove these files from the backup destination(s). If these files still exist on the client machine or backup server on the next backup job, the AhsayOBM/AhsayACB client will upload the latest copy.
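The checksum comparison can be sketched as follows. This is an illustration only: it assumes CRC-32 as computed by Python's standard `zlib.crc32`, and the names `crc32_of` and `find_corrupted` and the in-memory sample data are hypothetical. The document does not describe the actual layout of the checksum file.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def find_corrupted(stored_files, recorded_checksums):
    """List files whose current CRC-32 differs from the recorded one.

    A file with no recorded checksum is also reported (an assumption
    made for this sketch).
    """
    corrupted = []
    for name, data in stored_files.items():
        if crc32_of(data) != recorded_checksums.get(name):
            corrupted.append(name)
    return sorted(corrupted)


# Checksums recorded at "backup time" for made-up file contents.
original = {"a.txt": b"hello", "b.txt": b"world"}
checksums = {name: crc32_of(data) for name, data in original.items()}

# Simulate corruption at rest: one byte of b.txt has changed.
stored = dict(original)
stored["b.txt"] = b"w0rld"

corrupted = find_corrupted(stored, checksums)
print(corrupted)  # ['b.txt']
```

Note that every stored byte has to be read to recompute the checksum, which is why this mode is much slower than the default mode.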

The following images show a detailed flow for each data integrity check mode.

Run Cyclic Redundancy Check (CRC) enabled

A Data Integrity Check with the CRC box ticked should only be run after you have contacted Support and we have recommended it. When the CRC check is enabled, all your data is downloaded and streamed through the backup software on your device, and a new index file is created. Depending on how large your data set is, this could take hours or days, and you might incur data download (egress) charges from your storage provider.

We haven’t needed to run a full Data Integrity Check with CRC enabled since v6 was last used in 2017.

[Image: Ahsay data integrity check with CRC]

For large file sizes, a percentage of progress will be displayed throughout the data integrity check job when this setting is enabled: 

[Image: Ahsay data integrity check running]

Limitations

The Data Integrity Check has to be started manually from the AhsayOBM/AhsayACB client UI. It cannot be remotely started from the AhsayCBS web console. We expect Ahsay to support this in early 2021. 

The only exception is Run on Server (Office 365 or Cloud File) backup sets, where a data integrity check can be started from the AhsayCBS web console.

  • When a Data Integrity Check has identified issues on the backup set, it may require the end-user to confirm the changes before it takes the corrective actions.

  • A backup or restore job cannot run while a data integrity check is running, and a data integrity check cannot run while a backup or restore job is active.

Test Mode Confirmation Screen

As part of the data integrity job, a (TEST MODE) confirmation screen can be displayed when the data integrity check completes. This gives a summary report of the corrupted files, invalid indexes, or storage statistics issues for each backup destination. The (TEST MODE) confirmation screen allows the end-user to review the results of the data integrity check, and to decide whether they would like to proceed with the corrective actions. To streamline the data integrity check process and improve user experience, the (TEST MODE) confirmation screen will ONLY prompt if any of the criteria below is met during the data integrity check operation:

  • The number of backup files to be deleted is over 1,000
  • The total size of the backup files to be deleted is over 512MB
  • The number of backup files to be deleted is over 10% of the total backup files

Otherwise, the data integrity check job will automatically take corrective actions.
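The prompt rule above can be written as a small decision function. This is a sketch under stated assumptions, not Ahsay's code: the name `needs_confirmation` is hypothetical, 512MB is taken to mean 512 × 1024 × 1024 bytes, and meeting any one of the three criteria is treated as enough to prompt.

```python
FILE_COUNT_LIMIT = 1000                   # more than 1,000 files
TOTAL_SIZE_LIMIT_BYTES = 512 * 1024 ** 2  # more than 512MB (assumed binary MB)


def needs_confirmation(deleted_count, deleted_bytes, total_count):
    """Return True if the (TEST MODE) screen should prompt the user."""
    if deleted_count > FILE_COUNT_LIMIT:
        return True
    if deleted_bytes > TOTAL_SIZE_LIMIT_BYTES:
        return True
    # "Over 10% of the total backup files", using integer arithmetic
    # so that exactly 10% does not trigger the prompt.
    if deleted_count * 10 > total_count:
        return True
    return False


print(needs_confirmation(5, 1024, 1000))             # False: fixed automatically
print(needs_confirmation(150, 1024, 1000))           # True: 15% of the files
print(needs_confirmation(1, 600 * 1024 ** 2, 1000))  # True: over 512MB
```

When the function returns False, the corrective actions would go ahead without prompting, which matches the automatic behaviour described above.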

The (TEST MODE) screen includes five (5) summary reports for the following items found per backup destination:

[Image: test mode]

Even if you select ALL backup sets before starting the data integrity check, the (TEST MODE) confirmation screen will prompt for one backup set at a time. For example, if the data integrity check has run on three (3) backup sets and all of them meet the criteria for the (TEST MODE) confirmation screen, the screen will prompt three times, once per backup set, for the end-user to confirm the corrective actions.

Below is an example of a (TEST MODE) confirmation screen with the following scenario:

  • Multiple backup destinations, corrupted items and index-related issues found with correct and incorrect storage statistics.

How does Data Integrity Check (DIC) compare with Periodic Data Integrity Check (PDIC)?

The periodic data integrity check is performed at the beginning of a backup job, which provides an additional regular data integrity check of the backup data and updates the storage statistics for each backup set. The PDIC feature is enabled in v8.3.2.11 (March 2020). 

Unlike the Data Integrity Check (DIC), the PDIC starts automatically in the background and performs a quick check of all the backup destination(s) without the end-user’s intervention.

The PDIC will be initiated automatically once EITHER of the following conditions is met:

  • It is triggered on a weekly basis, usually on the first backup job that runs on a Friday, Saturday, or Sunday
  • If no backup job runs on Friday, Saturday, or Sunday, the PDIC is triggered on the next available backup job

E.g. If the last PDIC job was run more than seven (7) days ago, then the subsequent PDIC job(s) will run seven days from that day onwards.

Comparisons of DIC and PDIC

[Image: DIC comparison]

One response to “How the Data Integrity Check works”

  1. Edward1984 says:

    Ahsay update this every few releases. Is it now 100% reliable?

BOBcloud.net
The Old Sorting Office, Corsham, Wiltshire SN13 9AA
Tel: 0800 907 8238