FAQs - Archiving To The Cloud
Overview:
Cloud storage is great as a concept, but as with so many things, the devil is in the details. Recent advances in internet connectivity and provider offerings have made it feasible to move your old files to cloud storage, and if done properly, with sufficient planning, such a move can prove highly cost effective in the long term.
ArchiverFS is capable of archiving old files to most cloud storage offerings. If the cloud storage can be presented to the local network via a UNC path then ArchiverFS can probably archive to it. This includes offerings from Amazon, Microsoft, Oracle and others.
In addition to testing against on-premise storage, we test every ArchiverFS release against an Amazon Storage Gateway with an S3 based file share as an archive destination. If you would like more information about this specific setup then please feel free to email us via [email protected] and we will be happy to assist. We chose this setup to test against as it gives access to Amazon Glacier as a long term storage medium, and Glacier is priced extremely aggressively for cloud storage.
Before deciding to set up cloud archiving, there are a few items that have to be considered:
- How much data do you want to migrate?
- How fast is your internet connectivity?
- How large are the largest files that users may want to access via seamless links?
All three of these items are crucial to the performance of the cloud archiving system. First, files will need to be migrated to the cloud-based archive storage across the local internet connection, and then, when users want to access those files, they will need to be recalled over it.
If you are able to migrate everything that hasn't been used in 3 years to second-line storage, then as a guide we recommend you allow for 75% of your total file system to be moved. The exact percentage will vary, and you can always turn the age settings up or down as required.
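To make the guideline above concrete, here is a minimal sketch of that estimate. The file system size is a made-up example figure, and 75% is the rough guide from the text, not a measured value:

```python
# Rough estimate of archivable data using the ~75% guideline above.
# The 4 TB total file-system size is an illustrative assumption.
total_fs_gb = 4096          # total live file system size in GB (example)
archive_fraction = 0.75     # guideline: ~75% not touched in 3 years

estimated_archive_gb = total_fs_gb * archive_fraction
print(f"Estimated data to archive: {estimated_archive_gb:.0f} GB")
```

The Pre-Scan option described below will give you the real number for your environment; this arithmetic is only for initial planning.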
To get an exact figure you can install ArchiverFS, configure an archive job, select the 'Pre-Scan' option then run the job. Once the job completes you'll be able to open the log file and see exactly how much data would have been archived if the job had been run without the 'Pre-Scan' option selected.
Now that you have some idea of how much data you are likely to be archiving, we recommend that you calculate how long it will take to migrate this data to cloud storage over the local internet connection. This is a simple calculation, but remember not to mix up MB (megabytes) and Mb (megabits). File system sizes will normally be quoted in MB, and line speeds will normally be quoted in Mb. Divide Mb by 8 to convert to MB.
Next you should establish the largest file size that your users will typically need to access, and work out how long it will take a user to open a file of that size over the internet connection. You can set size limits on the files to be migrated in an archive job's options, which may help if there are a small number of files that are significantly larger than the average.
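Both calculations above can be sketched in a few lines. The line speed, data volume and largest-file size below are all example figures you would replace with your own numbers:

```python
# Migration-time and recall-time arithmetic from the text above.
# Line speeds are quoted in megabits/s (Mb), file sizes in megabytes (MB);
# divide megabits by 8 to get megabytes. All figures are examples.

line_speed_mbps = 100                      # internet line speed in Mb/s (example)
line_speed_mb_per_s = line_speed_mbps / 8  # convert Mb/s -> MB/s

data_to_migrate_mb = 3_000_000             # ~3 TB to archive (example)
migration_seconds = data_to_migrate_mb / line_speed_mb_per_s
print(f"Full migration: {migration_seconds / 86400:.1f} days")

largest_file_mb = 250                      # largest file users open (example)
recall_seconds = largest_file_mb / line_speed_mb_per_s
print(f"Largest file recall: {recall_seconds:.0f} seconds")
```

With these example numbers the full migration takes roughly 2.8 days of continuous transfer, and the largest file takes about 20 seconds to open, which is the kind of figure you need to judge against your users' expectations.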
If your files are particularly large, if the internet connection isn't fast enough, or if it will take a very long time to migrate all your files to cloud storage, then you may want to consider upgrading your internet connection or using standalone on-site storage such as a NAS device instead of cloud storage.
If you do decide that you want to archive to cloud-based storage, then we recommend working towards your desired retention timescales step by step. For example, if you are aiming to archive anything over 3 years old, start by archiving everything over 10 years old. Once you have archived everything over 10 years old, run the job again set to archive everything over 9 years old, and repeat until you reach your target retention age.
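The roll-down above amounts to a simple descending schedule of age thresholds, one per archive run. A minimal sketch (the start and target ages are the example values from the text; actual thresholds are set in each ArchiverFS job's options):

```python
# Stepwise roll-down from the text: start at a 10-year threshold and
# lower it one year per run until the 3-year target is reached.
def retention_schedule(start_years=10, target_years=3):
    """Yield the age thresholds for successive archive runs."""
    for age in range(start_years, target_years - 1, -1):
        yield age

for age in retention_schedule():
    print(f"Run archive job with age threshold: {age} years")
```

This keeps each individual run small, so the first migrations finish quickly and you can verify throughput before committing the bulk of the data.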
As long as the numbers above add up and you are happy to proceed, setting up archiving is quite simple.
Archiving to Amazon Glacier:
Amazon Glacier offers almost unparalleled per-GB storage costs: roughly on a par with, or better than, a locally deployed NAS device (depending heavily on the specific device, service life, etc.). Amazon Glacier is a fantastic product, but there is one problem.
It is difficult to integrate Amazon Glacier with on-premise services. At the time of writing, you can only interact with Glacier via the Amazon console or via the HTTP API.
There is no way to directly share Amazon Glacier storage to the network, or mount Glacier storage to a VM as a volume. Luckily there are ways round this.
The Amazon Storage Gateway (https://aws.amazon.com/storagegateway/) allows you to present Amazon S3 storage volumes to your local network as if they were hosted on your local on-site servers.
This allows you to deploy an Amazon Storage Gateway on site and use it to present an Amazon S3 storage volume to the local network. This S3 volume can then be used as storage for one or more archive jobs.
But you don't have to stop there: you can set up policies to move files from the S3 volume to Glacier automatically. Your users will still retain seamless access to archived files via the shortcuts ArchiverFS creates in the live file system, but you'll now be able to take advantage of the extremely low cost per GB of Amazon Glacier storage for your old files.
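One common way to express that automatic move is an S3 lifecycle rule on the bucket behind the file share. The sketch below builds such a rule in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the rule ID, the empty prefix (match everything), and the 30-day delay are illustrative assumptions, not values ArchiverFS requires:

```python
import json

# S3 lifecycle rule that transitions objects to the GLACIER storage
# class after 30 days. The rule ID, empty prefix and 30-day delay are
# example values; this dict matches the structure boto3's
# put_bucket_lifecycle_configuration(LifecycleConfiguration=...) expects.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-to-glacier",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every object in the bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "GLACIER"}
            ],
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

Because the transition happens inside S3, the UNC path presented by the Storage Gateway is unchanged and the ArchiverFS shortcuts keep working; only the retrieval latency of the underlying objects changes.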
Getting further help:
If you have any further questions then please don't hesitate to contact us via [email protected].