Object Storage

Modified on Wed, 22 Jul at 3:07 PM

Object Storage is not a traditional file system or a real-time data storage system. It is designed for mostly static data that can be retrieved, used, and then updated if necessary. It is independent of any particular instance, so it can be updated and used without having an instance running. It is designed to be redundant and scalable.

This article contains information about Swift and how to request an allocation of Pawsey S3 Object Storage Service.

Concept

Think about a dataset made up of 2 GB files that you read in and analyse many times but that rarely changes, or the images you want to use on the cloud. Those are a couple of examples of what's perfect for object storage. Objects are written to multiple hardware devices in the data centre to ensure both integrity and great performance.

In general, the object store is great for data you write once and read many times, but it is not suitable for applications like databases. It's the safest place to put your data on the ARDC Nectar Research Cloud, because multiple redundant copies of your data are made. You can access the object store from anywhere on the Internet, and data can be transferred between object storage and your virtual machine with a variety of HTTP-capable tools.

Object storage has the following features, which set it apart from traditional file systems:

Access via an API at the application level, rather than via the operating system at the file-system level. This means that byte-level interaction is not possible, and that all interaction can occur through a single API endpoint.
Access via HTTPS optionally allows data to be externally accessible.
No directory tree: object storage uses a flat structure and objects are stored in containers.
Metadata lives directly with the object.
Scalability: object storage systems can scale very well when data reaches hundreds of TB and moves into the PB range and beyond.
Durability: object storage systems have mechanisms to check file consistency, and handle failed drives, bit-rot, server and cabinet failures, etc. These features allow the system to automatically replicate data as needed to retain the desired number of replicas, which results in extremely high durability and availability of data.
Cost: object storage systems are designed to run on commodity hardware, so they are cheaper than block or file storage.
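
Although the structure is flat, object names can still be browsed like folders, because listing APIs commonly filter names by a prefix and group them at a delimiter. The sketch below imitates that behaviour in plain Python over an in-memory list of names. The object names and the function list_level are hypothetical and exist only for illustration; a real object store performs this filtering on the server side.

```python
def list_level(object_names, prefix="", delimiter="/"):
    """Return (objects, pseudo_folders) found directly under `prefix`.

    Names are plain strings in a flat namespace; "folders" are only a
    naming convention derived from the delimiter.
    """
    objects = []
    folders = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The remainder contains the delimiter, so the name sits
            # inside a pseudo-folder one level below the prefix.
            folders.add(prefix + head + delimiter)
        else:
            objects.append(name)
    return sorted(objects), sorted(folders)


# Hypothetical object names stored in a single container.
names = [
    "readme.txt",
    "images/cat.png",
    "images/dog.png",
    "images/raw/cat.tiff",
    "results/run1.csv",
]

top_objects, top_folders = list_level(names)
image_objects, image_folders = list_level(names, prefix="images/")
```

Nothing is nested in storage here: "images/raw/cat.tiff" is one object with one name, and the folder view is computed from that name each time it is listed.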

Swift

Swift is the component that provides object storage for OpenStack. With your credentials and via a URL, you can request Swift to reserve and create storage (called containers or buckets). Files (known as objects when stored in Swift) can then be uploaded, and accessed in the same way by your running virtual machines.
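
As a rough illustration of the "credentials and a URL" workflow, the sketch below builds the HTTP requests that Swift's REST API uses for creating a container and for uploading and downloading an object, using only the Python standard library. The storage URL, account name and token are placeholders, not Nectar values: substitute the ones issued to you. The helper names are hypothetical as well. Nothing is sent over the network until a request is passed to urllib.request.urlopen.

```python
import urllib.parse
import urllib.request

# Placeholder values (assumptions): use your own storage URL and token.
STORAGE_URL = "https://swift.example.org/v1/AUTH_0123456789abcdef"
TOKEN = "example-token"


def object_url(storage_url, container, object_name=None):
    """Build the URL of a container, or of an object inside it."""
    parts = [storage_url.rstrip("/"), urllib.parse.quote(container, safe="")]
    if object_name is not None:
        # Slashes are kept: they are ordinary characters in an object name.
        parts.append(urllib.parse.quote(object_name, safe="/"))
    return "/".join(parts)


def create_container_request(container):
    """A PUT on a container URL asks Swift to create that container."""
    return urllib.request.Request(
        object_url(STORAGE_URL, container),
        method="PUT",
        headers={"X-Auth-Token": TOKEN},
    )


def upload_request(container, object_name, data):
    """A PUT on an object URL uploads `data` (bytes) as the object content."""
    return urllib.request.Request(
        object_url(STORAGE_URL, container, object_name),
        data=data,
        method="PUT",
        headers={"X-Auth-Token": TOKEN},
    )


def download_request(container, object_name):
    """A GET on an object URL retrieves the object content."""
    return urllib.request.Request(
        object_url(STORAGE_URL, container, object_name),
        method="GET",
        headers={"X-Auth-Token": TOKEN},
    )


make_req = create_container_request("my data")
put_req = upload_request("my data", "images/cat 1.png", b"example bytes")
get_req = download_request("my data", "images/cat 1.png")
```

To actually perform a request, pass it to urllib.request.urlopen, for example urllib.request.urlopen(put_req). In practice a dedicated Swift client is usually more convenient, since it also handles obtaining the token.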

The Nectar implementation of Swift is geodistributed across the Nectar Nodes so that availability is not reliant on any one data centre or network infrastructure. Each collection of Swift nodes/hardware is known as a region, which may or may not include a Swift proxy server (the internet-facing and serving component of Swift). With some Swift clients/APIs, users can explicitly choose which proxy to connect to; this can be useful, for example, for speeding up writes to object storage by choosing the nearest proxy. Because Nectar's Swift has multiple regions (some of which are Node private), some clients/APIs require explicit configuration of a default region, which should be "Melbourne" for most users.

Swift does not provide encryption of the data it stores. If you have sensitive data that requires encryption you must encrypt the data files before upload.

Pawsey S3 Object Storage Service

In addition to the object storage included in the Resource Bundles, Pawsey Supercomputing Research Centre offers a large S3 Object Storage Service. This additional object storage service is provided and approved by the Pawsey Supercomputing Research Centre, and the storage is located in Western Australia.

Ceph is a distributed storage system that supports object storage by providing an S3-compatible interface through RADOS Gateway (RGW), allowing you to interact with buckets and objects using standard APIs and tools.

With your credentials and via a URL, you can request Ceph to reserve and create storage (called buckets). Files (known as objects when stored in Ceph RGW) can then be uploaded, and accessed in the same way by your running virtual machines or from any other external source.

To request an allocation of Pawsey object storage, you should enter it in the Allocation request form under STEP 2: Cloud resources, 2. Location Specific Resources on the Nectar Dashboard. For example, if you require 10 TB of storage (equivalent to 10 000 GB), toggle On the S3 Object Storage Service and enter 10000, as shown in the screenshot:

The toggle on the left is highlighted in yellow, turned On next to S3 Object Storage Service. Below, in a box with the subheading Pawsey (WA), the text 10000 has been entered in the field labelled Storage, where the units are GB.

Adhere to the Pawsey Supercomputing Research Centre Data Storage and Management Policy. Pawsey object storage does not provide encryption of the data it stores. If you have sensitive data that requires encryption, you must encrypt the data files before uploading them to Pawsey object storage. Refer to Nectar’s guidance for sensitive data and conform to your institution’s data governance framework, policies and procedures.