This page is currently incomplete and it is being updated following recent developments.
S3
CSCS offers a public cloud object storage service, based on the Ceph Object Gateway. The service can be accessed from S3-compatible clients.
General information
Endpoint: https://rgw.cscs.ch
URL: path-style in the format https://rgw.cscs.ch/%(bucket)s/key-name
Publicly accessible object links (after setting proper bucket policy): https://rgw.cscs.ch/<tenant>:<bucket-name>/key-name
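As a minimal sketch of the two URL formats above, the following Python helpers assemble a path-style object URL and a public link. The function names are hypothetical, and only the object key is percent-encoded; tenant and bucket names are assumed to contain URL-safe characters only.

```python
from urllib.parse import quote

ENDPOINT = "https://rgw.cscs.ch"  # endpoint listed in the general information


def path_style_url(bucket: str, key: str) -> str:
    """Path-style URL, as used by authenticated S3 clients."""
    return f"{ENDPOINT}/{bucket}/{quote(key)}"


def public_object_url(tenant: str, bucket: str, key: str) -> str:
    """Public link: the tenant is placed in front of the bucket name."""
    return f"{ENDPOINT}/{tenant}:{bucket}/{quote(key)}"
```

For example, public_object_url("test_tenant", "test-public-bucket", "file.txt") yields the kind of link that becomes reachable once a suitable bucket policy is in place.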
Usage examples
AWS CLI
Configuration
The first step is to configure the profile:
> aws configure --profile naret-testuser
AWS Access Key ID [None]: [REDACTED]
AWS Secret Access Key [None]: [REDACTED]
Default region name [None]: cscs-zonegroup
Default output format [None]:
Then, settings such as the default endpoint and the path-style URLs can be placed in the configuration file:
[profile naret-testuser]
endpoint_url = https://rgw.cscs.ch
region = cscs-zonegroup
s3 =
    addressing_style = path
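To double-check a profile before using it, the settings can be read back with Python's standard configparser module. The sketch below is only an illustration: load_profile is a hypothetical helper, SAMPLE mirrors the profile above, and the nested s3 block is assumed to be written as indented key = value lines.

```python
import configparser

# Sample text mirroring the profile above; in practice this would be the
# content of the AWS CLI configuration file.
SAMPLE = """\
[profile naret-testuser]
endpoint_url = https://rgw.cscs.ch
region = cscs-zonegroup
s3 =
    addressing_style = path
"""


def load_profile(text: str, profile: str) -> dict:
    """Return the settings of one profile, expanding nested blocks into dicts."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    settings = {}
    for key, value in parser[f"profile {profile}"].items():
        if "\n" in value:
            # Indented continuation lines (the "s3 =" block) arrive as one
            # multi-line value; split them into key/value pairs.
            nested = {}
            for line in value.splitlines():
                if "=" in line:
                    name, nested_value = line.split("=", 1)
                    nested[name.strip()] = nested_value.strip()
            settings[key] = nested
        else:
            settings[key] = value
    return settings
```

A profile is ready for this service when endpoint_url points to https://rgw.cscs.ch and the nested addressing_style is path.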
Creating a pre-signed URL
> aws --profile=naret-testuser s3 presign s3://test-bucket/file.txt --expires-in 300
https://rgw.cscs.ch/test-bucket/file.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=IA6AOCNMKPDXQ0YNA3DP%2F20241209%2Fcscs-zonegroup%2Fs3%2Faws4_request&X-Amz-Date=20241209T080748Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=f2e2adb457f6fd43401124e4ea2650fba528e614ab661f9c05e2fa2e77691b5d
Notice that the tenant part is missing from the URL: this is because S3 doesn't natively deal with multitenancy. The correct object is retrieved based on the access key. A more thorough explanation can be found in the RGW documentation.
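The structure of a pre-signed URL can be inspected with the Python standard library. The sketch below takes such a URL apart to show the path (without tenant), the region in the credential scope, and the time at which the link expires. The function name is hypothetical, and SAMPLE_URL uses a placeholder access key and signature rather than real credentials.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# Placeholder URL with the same shape as the one generated above.
SAMPLE_URL = (
    "https://rgw.cscs.ch/test-bucket/file.txt"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=EXAMPLEKEY%2F20241209%2Fcscs-zonegroup%2Fs3%2Faws4_request"
    "&X-Amz-Date=20241209T080748Z&X-Amz-Expires=300"
    "&X-Amz-SignedHeaders=host&X-Amz-Signature=0000"
)


def describe_presigned_url(url: str) -> dict:
    """Extract path, credential scope and expiry time from a pre-signed URL."""
    parts = urlsplit(url)
    query = {name: values[0] for name, values in parse_qs(parts.query).items()}
    # Credential scope: <access-key>/<date>/<region>/<service>/aws4_request
    _key, _date, region, service, _terminator = query["X-Amz-Credential"].split("/")
    signed_at = datetime.strptime(query["X-Amz-Date"], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    expires_at = signed_at + timedelta(seconds=int(query["X-Amz-Expires"]))
    return {
        "path": parts.path,
        "region": region,
        "service": service,
        "expires_at": expires_at,
    }
```

With --expires-in 300, the link stops working five minutes after the X-Amz-Date timestamp.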
Making a bucket's contents anonymously accessible from the Internet
First, a bucket policy needs to be written:
> cat test-public-bucket-anon-from-internet.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": [
        "arn:aws:s3:::test-public-bucket/*",
        "arn:aws:s3:::test-public-bucket"
      ]
    }
  ]
}
Then, it can be applied to the bucket:
> aws --profile=naret-testuser s3api put-bucket-policy --bucket test-public-bucket --policy file://test-public-bucket-anon-from-internet.json
At this point, the objects in test-public-bucket are accessible via direct links:
> curl https://rgw.cscs.ch/test_tenant:test-public-bucket/file.txt
This is a test.
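When the same policy has to be written for several buckets, the JSON document can be generated instead of edited by hand. This is a minimal sketch that reproduces the policy shown above for an arbitrary bucket name; the function names and the choice to write the file from Python are illustrative assumptions.

```python
import json


def anonymous_read_policy(bucket: str) -> dict:
    """Build the anonymous read policy shown above for the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": [
                    f"arn:aws:s3:::{bucket}/*",
                    f"arn:aws:s3:::{bucket}",
                ],
            }
        ],
    }


def write_policy(bucket: str, path: str) -> None:
    """Write the policy to a JSON file that put-bucket-policy can read."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(anonymous_read_policy(bucket), handle, indent=2)
```

The resulting file can then be passed to aws s3api put-bucket-policy with the file:// prefix, as in the command above.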
s3cmd
Configuration
The first step is to configure the profile:
> s3cmd --configure
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: [REDACTED]
Secret Key: [REDACTED]
Default Region [US]: cscs-zonegroup
Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: rgw.cscs.ch
Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: rgw.cscs.ch/%(bucket)s
Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program:
When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: Yes
On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:
New settings:
  Access Key: [REDACTED]
  Secret Key: [REDACTED]
  Default Region: cscs-zonegroup
  S3 Endpoint: rgw.cscs.ch
  DNS-style bucket+hostname:port template for accessing a bucket: rgw.cscs.ch/%(bucket)s
  Encryption password:
  Path to GPG program: None
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0
And then confirm.
IMPORTANT: The configuration is not complete yet.
> s3cmd ls s3://test-bucket
ERROR: S3 error: 403 (SignatureDoesNotMatch)
To fix this, it is necessary to edit the .s3cfg file, normally located in the user's home directory, and change the signature_v2 setting to true.
~ > cat .s3cfg | grep signature_v2
signature_v2 = True
> s3cmd ls s3://test-bucket
2024-12-09 08:05           15  s3://test-bucket/file.txt
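The same edit can be scripted. The sketch below rewrites the signature_v2 line in the text of a .s3cfg file, or appends it if it is missing. The helper name is hypothetical, and appending assumes the file contains a single [default] section.

```python
import re

SETTING = "signature_v2 = True"


def enable_signature_v2(cfg_text: str) -> str:
    """Return the .s3cfg text with signature_v2 switched on."""
    pattern = re.compile(r"^signature_v2\s*=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        # Replace the existing line, whatever its current value is.
        return pattern.sub(SETTING, cfg_text)
    # Setting not present: append it at the end of the file
    # (assumes a single [default] section).
    separator = "\n" if cfg_text and not cfg_text.endswith("\n") else ""
    return f"{cfg_text}{separator}{SETTING}\n"
```

To apply it, read the file from the home directory (for example with pathlib.Path.home() / ".s3cfg"), pass its text through the function, and write the result back after making a backup copy.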
Cyberduck
Configuration
In order to be able to connect to the S3 endpoint using Cyberduck, a profile supporting path-style requests must be downloaded from here.
Swift - DEPRECATED
CSCS offers a public cloud object storage service, based on OpenStack Swift. The service can be accessed from REST APIs compatible with the OpenStack Swift protocol.
Access
There are several ways of accessing the object storage service, from user-friendly graphical tools to software libraries which can be integrated with custom applications.
Web frontend
Users with the SwiftOperator role can use the OpenStack dashboard to access object storage by selecting the Object Store tab on the top menu bar of the Horizon GUI. Users who don't have the SwiftOperator role cannot access the object store from this web interface and must use other means, such as the CLI, Cyberduck, or the REST API.
Command line interface
A generic guide on how to connect to our OpenStack system Castor using the command line is available at this URL. Detailed documentation on how to use the Swift command line client can be found in the official OpenStack documentation. Please note that there are two ways of accessing Swift from the command line. The older Swift client, Object Storage service (swift) command-line client, has the most features but is no longer actively developed. The newer OpenStack unified CLI, OpenStackClient, does not cover all Swift features yet but is actively developed.
Cyberduck
A guide for connecting to our object storage service using the graphical file browsing client Cyberduck can be found here.
Swift REST API
The object storage is accessible from a REST API, defined in this official documentation page. You can access the REST API with curl, or alternatively you can use a software library such as the Python swiftclient module, which can be downloaded from GitHub or from PyPI.
Examples
Below is a list of how-tos for typical use cases. They do not cover all the possible operations of OpenStack Swift; for a complete list of functionalities, please refer to the official documentation of the CLI and the REST API.
Known issues and workarounds
Below is a list of known issues with their workarounds.
Access Control Lists
Users could be granted two different roles within a project:
- SwiftOperator: can list, create and modify all containers and objects within a project, can configure read or write Access Control Lists (ACL)
- member: can access objects in specific containers only if a read or write ACL was granted to them
In Swift, ACLs can be assigned to containers, not to objects. Our authentication system involves Keystone V3 and domains, which means that user names and project names might not be unique. Because of this, when creating an ACL rule users need to specify project IDs and user IDs instead of project names and user names. To facilitate this, we regularly populate an object called user_ids inside a container called project_info in each project. The object user_ids contains the IDs of the members of the current project.
This is an example of how an operator can add a read ACL to a container:
swift post mycontainer --read-acl {PROJECT1_ID}:{USER1_ID},{PROJECT2_ID}:{USER2_ID}
The option --write-acl is used to configure write permissions. Please note that PROJECT_ID and USER_ID are long alphanumerical strings, so the command in reality will look like the following:
swift post testcontainer --read-acl 62f7feebbfb94f3bbb501b0a060nfn2r:3bb7feebbfb94f3bb5mdob0a060b30eb
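Because the IDs are long and easy to mistype, the ACL value can be assembled programmatically. The following sketch builds the comma-separated list of project ID and user ID pairs and the argument list for the swift post command shown above. The helper names are hypothetical and the command is only constructed, not executed.

```python
from typing import Iterable, List, Tuple


def build_acl(grants: Iterable[Tuple[str, str]]) -> str:
    """Join (project_id, user_id) pairs into the value expected by --read-acl."""
    entries = []
    for project_id, user_id in grants:
        if not project_id or not user_id:
            raise ValueError("project ID and user ID must both be non-empty")
        entries.append(f"{project_id}:{user_id}")
    return ",".join(entries)


def read_acl_command(container: str, grants: Iterable[Tuple[str, str]]) -> List[str]:
    """Argument list for 'swift post <container> --read-acl <acl>'."""
    return ["swift", "post", container, "--read-acl", build_acl(grants)]
```

The returned list can be passed to subprocess.run in an authenticated environment; replacing --read-acl with --write-acl gives the equivalent command for write permissions.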
It is also possible to use the wildcard *, as described in the official ACL documentation.
Operators can use the swift stat command to list existing ACLs on a container.
A user who wants to use the CLI to access an object in a project they are a member of can, after authenticating, run a command like the following: swift list {container_name}.
Only operators can list all the containers in a project. Normal users cannot list which containers they have access to. However, once operators tell them which containers they can access, they can list the contents of those containers. A user who wants to access an object in a project they are not a member of, and who was granted ACL access, should instead use the following command:
swift --os-storage-url https://object.cscs.ch/v1/AUTH_{CHOSEN_PROJECT_ID} list {CONTAINER_NAME}
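The storage URL follows a fixed pattern, so it can be derived from the project ID. This sketch builds the URL and the corresponding swift list argument list; the function names are hypothetical and the command is only constructed, not run.

```python
from typing import List

SWIFT_ENDPOINT = "https://object.cscs.ch/v1"  # base URL from the command above


def storage_url(project_id: str) -> str:
    """Storage URL of a project, in the AUTH_<project ID> form."""
    return f"{SWIFT_ENDPOINT}/AUTH_{project_id}"


def list_command(project_id: str, container: str) -> List[str]:
    """Argument list for listing a container that belongs to another project."""
    return ["swift", "--os-storage-url", storage_url(project_id), "list", container]
```

The project ID is the same long alphanumerical string that appears in the ACL granted by the operator.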
You can find more information about Swift ACLs in the official documentation.
Roles and project memberships have to be requested by contacting CSCS staff.
Data protection
Container versioning
Our object storage system automatically saves up to 3 versions of objects whenever they are modified or deleted. Object versions are automatically stored into the {your_container_name}_versions container and they are kept for 90 days. The user can recover an object by copying (CLI: swift copy) the desired version from the {your_container_name}_versions container into {your_container_name} or to a different one. The {your_container_name}_versions containers are automatically created by a daily cron job.
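As a rough sketch of the recovery step, the helpers below derive the name of the versions container and build a swift copy command that restores one version into the original container. The helper names are hypothetical, the name of the version object must be taken from a listing of the versions container, and the --destination option in the /container/object form is an assumption to verify with swift copy --help.

```python
from typing import List


def versions_container(container: str) -> str:
    """Name of the container that holds older versions of the objects."""
    return f"{container}_versions"


def restore_command(container: str, version_object: str, target_object: str) -> List[str]:
    """Argument list that copies one stored version back into the container.

    version_object is the object name as listed in the versions container.
    """
    return [
        "swift", "copy",
        "--destination", f"/{container}/{target_object}",
        versions_container(container),
        version_object,
    ]
```

Changing the container in the destination restores the version to a different container instead.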
Backup
In addition to versioning, all data written to the object store is backed up to tape. This allows the recovery of the entire object store in case of major hardware or file system failures. The backup is taken once a day, with a retention policy of 3 months. Whenever an object changes, a new copy is created in the backend, with a maximum of three copies stored on tape. In case of a major outage, the entire storage will be restored within a few days.
Swift S3 API
In addition to the standard Swift API, the object storage service also exposes an S3 API, using the same service endpoint. The compatibility matrix between the Swift S3 API and the Amazon S3 API is described in this table.
These two APIs allow access to the same object storage service, so no matter which one you decide to use, you will always access the same data. The S3 ACLs are disabled in order to allow both APIs to operate seamlessly. In order to set ACLs on your containers/buckets and objects, please use the Swift API.
To use the Swift S3 API, you first have to create a set of EC2 credentials. For this you need to obtain a standard Keystone token and then use the OpenStack CLI as follows:
openstack ec2 credentials create
The output contains a pair of access and secret keys, which can be used with any S3 client. Below is a short list of the most common ones:
- boto
- s3cmd
- s3curl
- GUIs: Cyberduck, S3 Browser, Bucket Explorer, ...