Rucio

Rucio is a distributed data management (DDM) service developed by CERN for WLCG. It maintains a database of where data is located across the associated storage sites, known as Rucio Storage Elements (RSEs), and handles data placement: operators upload, move and delete data (replicas) by creating “rules”.

Prerequisites

IAM Configuration

For OIDC access, two clients must be set up on IAM. The first, hereafter the “auth” client, enables the OIDC authorization_code flow used to authorise users against the Rucio server. The second, hereafter the “admin” client, allows the Rucio server and daemon processes to perform authenticated actions against storage.

In the default “admin token” workflow that Rucio utilises, the token the user retrieves at authentication via the auth client is not the token passed onward to perform operations on the datalake. Instead, the Rucio server obtains its own token via the admin client using the OAuth2 client_credentials grant. As this grant is intended for service-to-service communication, it does not require the offline_access scope: that scope exists to issue refresh tokens, which are unnecessary when the service already holds the credentials it needs to request a new access token whenever it wants. The openid and groups scopes are likewise not meaningful here, as they relate to users, and this grant does not involve one.
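To make the client_credentials grant concrete, the following is a minimal sketch of the kind of token request a service could build for the admin client. It is not Rucio's own implementation. The token endpoint URL, client ID and client secret are hypothetical placeholders, and the request is only constructed, not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_client_credentials_request(token_endpoint, client_id, client_secret, scopes):
    """Build (but do not send) an OAuth2 client_credentials token request."""
    # The grant carries no user context: only the grant type and requested scopes.
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),
    }).encode("ascii")

    # HTTP Basic client authentication: URL-encode each part, join with ":", base64.
    credentials = "{}:{}".format(
        urllib.parse.quote(client_id, safe=""),
        urllib.parse.quote(client_secret, safe=""),
    )
    basic = base64.b64encode(credentials.encode("ascii")).decode("ascii")

    return urllib.request.Request(
        token_endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + basic,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# All values below are hypothetical placeholders, not real endpoints or secrets.
request = build_client_credentials_request(
    "https://iam.example.org/token",
    "admin-client-id",
    "admin-client-secret",
    ["storage.read:/", "storage.modify:/", "storage.create:/"],
)
```

Because the service can repeat this request whenever its access token expires, no refresh token is needed for this grant on its own.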

Note

As of the time of writing, the offline_access scope must nevertheless be enabled on the admin client, because the token exchange performed by FTS exchanges an access token for a refresh token. Quite what this means is ambiguous: as the access token was originally obtained via a client_credentials grant, one could argue that it doesn’t make sense to have a refresh token.
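For illustration only, the sketch below builds the form body of a generic token-exchange request (RFC 8693) that asks for a refresh token in return for an access token. The parameter names and token-type URNs come from the RFC; the exact parameters FTS sends are not stated here, so the scope list and token value are assumptions.

```python
import urllib.parse

# Identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"
REFRESH_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:refresh_token"


def build_token_exchange_body(access_token, scopes):
    """Return the form-encoded body for exchanging an access token for a refresh token."""
    return urllib.parse.urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": access_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": REFRESH_TOKEN_TYPE,
        "scope": " ".join(scopes),
    })


# Hypothetical token value and scope list, chosen only to show the shape of the request.
body = build_token_exchange_body("example-access-token", ["offline_access", "storage.read:/"])
```

Refresh tokens are issued under the offline_access scope, which is why the exchange depends on that scope being enabled on the client.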

Using the IAM web portal, register two new clients: an auth client and an admin client.

On both clients, set up the redirect_uris to include:

For the auth client, enable the following grants:

  • urn:ietf:params:oauth:grant-type:token-exchange (may require IAM administrator privilege)

  • refresh_token

  • authorization_code

and enable (at least) the following scopes:

  • openid

  • profile

  • offline_access

  • wlcg.groups

  • fts

  • rucio

On the admin client, enable the following grants:

  • client_credentials

  • refresh_token

  • urn:ietf:params:oauth:grant-type:token-exchange

and enable (at least) the following scopes:

  • openid

  • profile

  • offline_access

  • storage.read:/

  • storage.modify:/

  • storage.create:/

  • wlcg.groups

  • fts

  • rucio

  • scim:read
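The grant and scope lists above can be checked mechanically once the clients are registered. The following is a minimal sketch, assuming each client is described by a dict shaped like OAuth2 dynamic client registration metadata (RFC 7591), with a grant_types list and a space-separated scope string. That layout, the REQUIRED table and the missing_settings helper are illustrative assumptions, not part of IAM or Rucio.

```python
TOKEN_EXCHANGE = "urn:ietf:params:oauth:grant-type:token-exchange"

# Minimum grants and scopes per client, copied from the lists above.
REQUIRED = {
    "auth": {
        "grant_types": {TOKEN_EXCHANGE, "refresh_token", "authorization_code"},
        "scope": {"openid", "profile", "offline_access", "wlcg.groups", "fts", "rucio"},
    },
    "admin": {
        "grant_types": {"client_credentials", "refresh_token", TOKEN_EXCHANGE},
        "scope": {
            "openid", "profile", "offline_access",
            "storage.read:/", "storage.modify:/", "storage.create:/",
            "wlcg.groups", "fts", "rucio", "scim:read",
        },
    },
}


def missing_settings(role, client):
    """Return the required grants and scopes that the client does not have enabled."""
    required = REQUIRED[role]
    enabled = {
        "grant_types": set(client.get("grant_types", [])),
        "scope": set(client.get("scope", "").split()),
    }
    return {
        key: sorted(required[key] - enabled[key])
        for key in required
        if required[key] - enabled[key]
    }


# Hypothetical admin client with one grant and one scope left out.
example_admin = {
    "grant_types": ["client_credentials", TOKEN_EXCHANGE],
    "scope": "openid profile offline_access storage.read:/ storage.modify:/ "
             "storage.create:/ wlcg.groups fts rucio",
}
print(missing_settings("admin", example_admin))
```

An empty dict means the client meets the minimum; additional grants or scopes on the client are not reported, in keeping with the “at least” wording above.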

Databases

Two databases are required to operate Rucio for SRCNet. The first is an application database used by the various Rucio processes. The second is a “standalone” database that is used to store metadata.

The charts for deploying two instances of PostgreSQL for this purpose can be found in the linked repository at the top of this page.

Monitoring

For monitoring, the preferred option is one instance each of Elasticsearch and Grafana.

Deployment (k8s/helm)

The processes that constitute the Rucio service (server, daemons) are all containerised and can easily be deployed onto Kubernetes clusters via a series of Helm charts. These charts can be found in the linked repository at the top of this page.

Guides

Notes