Vault Storage
Vault storage is a tape-based product designed for long-term data retention. It is best suited to data that is accessed infrequently and to backups of data stored locally. Vault storage can be presented as a desktop share or network drive (SMB), mounted on a server (NFS), or accessed via SFTP or rsync. Access to SMB shares is controlled through secure access groups, which can be managed with the University's GroupAdmin Tool. NFS shares are made available to specific machines identified by static IP address or hostname.
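For the rsync route, a transfer is an ordinary rsync-over-SSH copy that authenticates with the local account used for SFTP and rsync access. The sketch below only assembles such a command. The host name, user name and destination path are hypothetical placeholders, and the real values for an allocation should be taken from the Vault Storage User Guide.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def build_rsync_command(source: Path, user: str, host: str, dest: str) -> list[str]:
    """Assemble an rsync-over-SSH command that copies `source` to Vault.

    `host` and `dest` are placeholders (assumptions), not real Vault endpoints.
    """
    return [
        "rsync",
        "--archive",   # recurse and preserve permissions, timestamps and symlinks
        "--verbose",   # list each file as it is transferred
        "--partial",   # keep partly sent files so a rerun can resume them
        str(source),
        f"{user}@{host}:{dest}",
    ]


def upload(cmd: list[str]) -> None:
    """Run the transfer; raises CalledProcessError if rsync fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    cmd = build_rsync_command(Path("project-2024.tar"), "jbloggs",
                              "vault-host.example.edu", "/vault/my-collection/")
    print(shlex.join(cmd))
```

Sending one archive rather than a directory tree of small files also keeps later recalls from tape fast.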
Vault storage combines a front-end disk cache with substantial tape capacity to store large amounts of data efficiently. The disk cache allows recently written data to be accessed quickly, while magnetic tapes hold older files that have not been used recently. By default, data that has not been accessed for seven days is migrated to tape; this policy can be adjusted in special cases (contact the RDS team for more information). Files migrated to tape are still displayed in their directory, and their data is automatically recalled to disk when they are accessed. Recalling data from tape typically takes up to five minutes, but can take much longer depending on the file size and the number of files.
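To make the default seven-day rule concrete, the sketch below lists the files under a directory whose last-access time is more than seven days old. These are the files the default policy would consider for migration. It is only an approximation. The actual migration is carried out by Vault's own storage-management software, and access times can be unreliable on filesystems mounted with `noatime`. The function name and threshold constant are illustrative.

```python
from __future__ import annotations

import os
import time
from pathlib import Path

# Default migration threshold described above: seven days, in seconds.
SEVEN_DAYS_S = 7 * 24 * 60 * 60


def migration_candidates(root: Path, threshold_s: float = SEVEN_DAYS_S,
                         now: float | None = None) -> list[Path]:
    """List files under `root` whose last access is older than `threshold_s`.

    This only mimics the default rule; it does not move anything to tape.
    """
    now = time.time() if now is None else now
    stale: list[Path] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # lstat() inspects the entry itself, so broken symlinks do not raise
            if now - path.lstat().st_atime > threshold_s:
                stale.append(path)
    return sorted(stale)
```

Calling `migration_candidates(Path("my-collection"))` on a local copy of a collection gives a rough idea of which files would already be on tape and therefore need a recall.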
Because of the way data is stored on magnetic tape¹, recalling a collection that contains a large number of files can be a lengthy process. For each file that needs to be recalled, the tape robot must:
- Look up the file in the database,
- Locate the specific tape that contains the file (large files may span multiple tapes),
- Collect the tape and load it into an available tape drive,
- Move to the position on the tape where the data is stored, and
- Copy the data to disk.
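The steps above can be turned into a rough cost model. In this model, each distinct tape is loaded once, each file needs its own positioning step, and the data is then copied at a fixed rate. The database lookup is ignored. The model assumes files on the same tape are read in a single mount, and the mount time, seek time and throughput defaults are illustrative guesses rather than measured Vault figures. The point is the shape of the cost, not the numbers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TapeFile:
    tape_id: str      # hypothetical: identifier of the tape holding the file
    size_bytes: int


def estimate_recall_seconds(files: list[TapeFile],
                            mount_s: float = 90.0,
                            seek_s: float = 30.0,
                            throughput_bps: float = 300e6) -> float:
    """Rough recall time: one mount per tape, one seek per file, plus copy time.

    All three parameters are illustrative assumptions, not Vault specifications.
    """
    tapes = {f.tape_id for f in files}          # each tape is loaded once
    total_bytes = sum(f.size_bytes for f in files)
    return (len(tapes) * mount_s                # collect and load each tape
            + len(files) * seek_s               # position to each file
            + total_bytes / throughput_bps)     # copy the data to disk


# 1 GB as 10,000 files of 100 kB spread over three tapes, versus one 1 GB archive.
many = [TapeFile(f"T{i % 3}", 100_000) for i in range(10_000)]
one = [TapeFile("T0", 1_000_000_000)]
```

With these made-up parameters, the per-file positioning step dominates. The 10,000-file case comes out more than two thousand times slower than the single archive, even though the amount of data is identical.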
As a result, recalling collections with large numbers of files may take a long time, especially if the files were added over a long period and are therefore spread across multiple tapes. To ensure that data can be recalled quickly and efficiently, users are strongly encouraged to store data in an orderly folder structure that reflects experiments, projects, instruments or people, and to bundle large numbers of small files into archives using a tool such as ZIP, TAR or SquashFS. A single archive containing many files can be recalled far more quickly and efficiently than the same data stored as many individual files. If a large number of files must be accessed within a short timeframe, a "bulk recall" can be requested by contacting the RDS team (bulk recalls are assessed on a case-by-case basis).
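One way to follow the archiving recommendation is to use Python's standard `tarfile` module to pack a project folder into a single TAR file before copying it to Vault. The `tar` and `zip` command-line tools do the same job, and SquashFS images need the separate `mksquashfs` tool, which is not shown here. The function name is illustrative. Empty directories are not stored by this sketch.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def bundle_directory(src_dir: Path, archive_path: Path, compress: bool = False) -> int:
    """Pack every file under `src_dir` into one TAR archive; return the file count.

    With compress=True the archive is gzip-compressed ("w:gz"); data that is
    already compressed (e.g. most image formats) will shrink very little.
    """
    mode = "w:gz" if compress else "w"
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    with tarfile.open(archive_path, mode) as tar:
        for path in files:
            # Store paths relative to the parent so the archive unpacks into one folder
            tar.add(path, arcname=str(path.relative_to(src_dir.parent)))
    return len(files)
```

For example, `bundle_directory(Path("experiment-01"), Path("experiment-01.tar"))` produces one file to recall instead of one per measurement. Keeping one archive per experiment or project preserves the orderly folder structure recommended above.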
External collaborators can access Vault storage through the Monash VPN or via Aspera, a web-based file transfer service. Substantial Vault capacity is available, and allocations can be requested via the Data Dashboard.
Resources
- Vault Storage User Guide
- Data Dashboard User Guide
- Guidelines for managing research data
- Electronic Information Security - Information Classification Procedure
- Slide Presentation on Better Use of Vault
Technical Information
| Feature | Description |
|---|---|
| Protocols | SMB, NFS, SFTP, rsync |
| Supported Operating Systems | Windows², macOS, Linux |
| Security | Secure access groups (SMB), specific machines (NFS) via hostname or IP, local accounts (SFTP, rsync) |
| Security Classification | Public, Restricted |
| Backup Schedule | Daily, 30-day backup retention period |
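The backup schedule has a practical consequence. A file that is deleted or overwritten by mistake can only be restored from a daily backup taken before the change, and that backup is kept for 30 days. The helper below assumes each daily backup is retained for exactly 30 days and that the boundary day is inclusive. The exact timing is set by the RDS team, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Retention period from the table above; the inclusive boundary is an assumption.
RETENTION = timedelta(days=30)


def restore_deadline(last_good_backup: date) -> date:
    """Last day on which the backup taken on `last_good_backup` should still exist."""
    return last_good_backup + RETENTION


def still_restorable(last_good_backup: date, today: date) -> bool:
    """True if the backup from `last_good_backup` is still within retention."""
    return today - last_good_backup <= RETENTION
```

In practice, this means mistakes should be reported to the RDS team well within a month of the last good version of the file.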