DBackup accelerates backup of massive numbers of small files
Author: scutech Published time: 2017-04-11 16:52:59 Views: 4796
As informatization marches on, the range and volume of unstructured data have grown rapidly: archives, pictures, videos, and bills pile up on servers, taking up more space than ever and slowing systems down. It is estimated that global data will reach the zettabyte scale by 2020. While a single such file may be only a few KB in size, the number of such files is another story, reaching millions or even billions. How to back up such files efficiently is an issue that troubles most enterprises. Well, this is where we can help. Our product DBackup integrates several cutting-edge technologies to optimize the backup process and improve efficiency, making backup ‘Accurate’ (accurate deduplication), ‘Fast’ (fast transfer), and ‘Less’ (less resource occupation).
‘Accurate’
Variable-length partitioning and accurate deduplication
Scutech applies variable-length partitioning to unstructured data, especially for the backup of massive numbers of small files. Because small files change frequently, fixed-length partitioning often forces the whole backup to be repartitioned: inserting even a few bytes shifts every subsequent block boundary, so unchanged data no longer matches previously stored blocks. Variable-length partitioning sets boundaries by content instead, so it targets only the changed data, consuming fewer computing resources while achieving a better deduplication result.
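DBackup's exact partitioning algorithm is not disclosed here; the sketch below shows the general idea behind content-defined (variable-length) chunking using a simple polynomial rolling hash. The window size, boundary mask, and chunk-size bounds are illustrative assumptions, not DBackup's actual parameters.

```python
import hashlib
import os

WINDOW = 48                 # rolling-window size in bytes (assumed)
AVG_MASK = (1 << 13) - 1    # boundary when hash & mask == 0 -> ~8 KiB average chunk
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024
PRIME, MOD = 31, 1 << 32
POW_W = pow(PRIME, WINDOW - 1, MOD)  # coefficient of the byte rolling out

def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of variable-length chunks of `data`."""
    start, h = 0, 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * POW_W) % MOD  # drop the oldest byte
        h = (h * PRIME + b) % MOD                     # add the newest byte
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & AVG_MASK) == 0) or size >= MAX_CHUNK:
            yield (start, i + 1)          # content-defined boundary found
            start, h = i + 1, 0
    if start < len(data):
        yield (start, len(data))          # final partial chunk

# Deduplication: fingerprint each chunk and store only unseen chunks.
data = os.urandom(200_000)                # stand-in for a real file's bytes
store = {}                                # fingerprint -> chunk (dedup store)
for s, e in chunk_boundaries(data):
    fp = hashlib.sha256(data[s:e]).digest()
    store.setdefault(fp, data[s:e])       # each unique chunk is stored once
```

Because boundaries are chosen by content rather than by offset, an edit to a file only changes the chunks around the edit; all other chunks keep their fingerprints and deduplicate against the previous backup.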
‘Fast’
Multi-channel parallel backup improves efficiency
Scutech applies parallel processing of file indexing and data backup, together with parallel data acquisition across multiple channels, to significantly improve efficiency. Take a reservoir as an example: draining it through several water pipes is undoubtedly much faster than draining it through a single pipe.
Parallel processing of file indexing and data backup
The traditional approach to GB-scale file backup is serial processing in a single process: traverse the files to be backed up, create a file index, then perform the backup. For massive file sets, however, the huge number of small files and the complex directory tree mean that traversing and indexing alone can consume much of the backup window while the actual data transfer sits waiting, which is not efficient.
Scutech separates file indexing and data backup into two concurrent processes: while the system traverses the directory tree and indexes files, the files already indexed are backed up at the same time, which greatly shortens the backup window and improves efficiency.
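As a rough illustration of this pipelining, here is a minimal producer–consumer sketch; the queue size, the `/data` root, and the `backup_file()` routine are placeholders for illustration, not DBackup internals.

```python
import os
import queue
import threading

def backup_file(path: str) -> None:
    """Placeholder for the real data-transfer routine (hypothetical)."""
    print("backing up", path)

def index_files(root: str, q: "queue.Queue") -> None:
    """Producer: traverse the directory tree, indexing files as they are found."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            q.put(os.path.join(dirpath, name))  # hand off to the backup worker
    q.put(None)                                 # sentinel: traversal finished

def backup_worker(q: "queue.Queue") -> None:
    """Consumer: back up already-indexed files while traversal continues."""
    while (path := q.get()) is not None:
        backup_file(path)

q: "queue.Queue" = queue.Queue(maxsize=1024)    # bounded hand-off buffer
worker = threading.Thread(target=backup_worker, args=(q,))
worker.start()
index_files("/data", q)                         # traversal and backup overlap
worker.join()
```

The key point is that traversal and transfer overlap: the backup window is bounded by the slower of the two stages rather than by their sum.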
Parallel backup across multiple channels
To acquire and process files before backup, DBackup adopts multi-channel parallel technology: the system first traverses the files in a pipelined fashion, creates a file index, and shards the file information, then transfers the backup data to the storage server through multiple backup channels in parallel.
The difficulty lies in determining a proper strategy for distributing the sharded data across multiple channels, and in reintegrating the backup sets during recovery. Scutech developed an algorithm that automatically monitors channel usage and distributes backup data across unoccupied channels; during recovery, it restores data to its original directory in an efficient and secure manner.
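Scutech has not published its scheduling algorithm; the sketch below illustrates one plausible approach, a greedy heuristic that always assigns the next shard to the currently least-loaded channel. The shard model, sizes, and channel count are assumptions for illustration.

```python
import heapq

def distribute(shards, num_channels: int):
    """Assign each (shard_id, size_bytes) shard to the least-loaded channel."""
    # Min-heap of (bytes_assigned, channel_id): the root is the idlest channel.
    load = [(0, ch) for ch in range(num_channels)]
    heapq.heapify(load)
    assignment = {ch: [] for ch in range(num_channels)}
    for shard_id, size in sorted(shards, key=lambda s: -s[1]):  # largest first
        bytes_assigned, ch = heapq.heappop(load)
        assignment[ch].append(shard_id)
        heapq.heappush(load, (bytes_assigned + size, ch))
    return assignment

# Example: six shards spread across three backup channels.
print(distribute([("s1", 900), ("s2", 400), ("s3", 700),
                  ("s4", 100), ("s5", 300), ("s6", 500)], 3))
```

At recovery time, the mapping from shards back to their source paths (recorded in the file index) is what allows data arriving from different channels to be reassembled into the original directory tree.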
‘Less’
Automated synthesis on the server side, less resource occupation
Synthetic backup is an optimal strategy for massive files, especially small files. The traditional approach periodically performs a lengthy ‘full + incremental’ backup cycle, which consumes considerable computing, I/O, and network resources and interferes with key operations. DBackup instead synthesizes the initial full backup with subsequent incremental backups on the server side to generate a new full backup; that new full backup is then synthesized with the next incremental backup to generate a newer one, and the cycle repeats. Our synthetic file backup supports all major platforms and environments, including file backup through a disk mounted over NFS or CIFS (volume-level CDP technology does not support this feature yet).
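To make the synthesis step concrete, here is a minimal sketch that models backup sets as path-to-content maps, with deletions recorded as None; this data model is an illustrative assumption, not DBackup's actual backup-set format.

```python
def synthesize(full: dict, incremental: dict) -> dict:
    """Merge an incremental backup into a full backup, yielding a new full.

    Runs entirely on the backup server, so the production host only ever
    sends small incrementals after the initial full backup.
    """
    new_full = dict(full)
    for path, content in incremental.items():
        if content is None:
            new_full.pop(path, None)   # file deleted since the last backup
        else:
            new_full[path] = content   # file added or modified
    return new_full

full_0 = {"/a.txt": "v1", "/b.txt": "v1"}        # initial full backup
inc_1 = {"/a.txt": "v2", "/c.txt": "v1", "/b.txt": None}
full_1 = synthesize(full_0, inc_1)               # new full: a.txt v2, c.txt v1
full_2 = synthesize(full_1, {"/a.txt": "v3"})    # and the cycle repeats
print(full_2)
```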
DBackup integrates multiple technologies to deliver accurate, fast, and efficient backup of massive numbers of small files. For more information, scan or long-press the QR code below and follow us on WeChat!