Rucio
Note
Code Repository: https://gitlab.com/ska-telescope/src/deployments/skaosrc/ska-src-skaosrc-services-cd/-/tree/main/rucio (private)
Rucio is a distributed data management (DDM) service developed by CERN for WLCG. It maintains a database of where data is located across associated storage sites, known as Rucio Storage Elements (RSEs), and manages data placement, allowing operators to upload, move and delete data (replicas) by creating “rules”.
Prerequisites
IAM Configuration
For OIDC access, two clients must be set up on IAM. The first, hereafter the “auth” client, enables the OIDC
authorization_code flow used to authorise users against the Rucio server. The second, hereafter the “admin” client,
enables the Rucio server and daemon processes to perform authenticated actions against storage.
In the default “admin token” workflow that Rucio uses, the token the user retrieves at authentication via the auth
client is not the token passed onward to perform operations on the datalake. Instead, the Rucio server gets a token
via the admin client using the OAuth2 client_credentials grant. As this type of grant is for service-to-service
communication, it doesn’t require the offline_access scope: that scope is needed to generate refresh tokens, and
refresh tokens aren’t required when the service holds all the credentials it needs to generate access tokens whenever
it wants. Nor does it make sense to request the openid or groups scopes, as these relate to users, which this type of
grant does not involve.
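As an illustration of the client_credentials grant described above, the following is a minimal sketch of the token request the admin client would make. The token endpoint URL, client ID and client secret are placeholders, and the use of HTTP Basic client authentication is an assumption; check how the client is registered on your IAM instance. The request is only constructed here, not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_client_credentials_request(token_endpoint, client_id, client_secret, scopes):
    """Build (but do not send) an OAuth2 client_credentials token request."""
    # The grant needs no user context: only the grant type and requested scopes.
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),
    }).encode("ascii")
    # Assumption: the client authenticates with HTTP Basic (client_secret_basic).
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_client_credentials_request(
    "https://iam.example.org/token",
    "admin-client-id",
    "admin-client-secret",
    ["storage.read:/", "storage.modify:/", "storage.create:/"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the request to urllib.request.urlopen would perform the call; the JSON response from a standard OAuth2 token endpoint carries the access token in its access_token field.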
Note
At the time of writing, the offline_access scope must be enabled on the admin client, because the token exchange
performed by FTS exchanges an access token for a refresh token. Quite what this means is ambiguous: as the token was
originally obtained through a client_credentials grant, one could argue that it doesn’t make sense to have a refresh
token.
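For orientation, an access-token-for-refresh-token exchange can be sketched with the standard OAuth2 token exchange parameters (RFC 8693). The exact parameters FTS sends are not given on this page, so the form below is an assumption based on the specification, and the token value is a placeholder.

```python
import urllib.parse

# URNs defined by RFC 8693 (OAuth 2.0 Token Exchange).
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"
REFRESH_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:refresh_token"


def build_token_exchange_form(access_token, scopes):
    """Return form fields for exchanging an access token for a refresh token."""
    return {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": access_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": REFRESH_TOKEN_TYPE,
        # offline_access is the scope associated with issuing refresh tokens.
        "scope": " ".join(scopes),
    }


# Placeholder token for illustration only.
form = build_token_exchange_form("example-access-token", ["offline_access"])
encoded = urllib.parse.urlencode(form)
print(encoded)
```

The encoded form would be sent as the body of a POST to the IAM token endpoint, authenticated as a client that has the token-exchange grant enabled.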
Using the IAM web portal, register two new clients: an auth client and an admin client.
On both clients, set up the redirect_uris to include:
For the auth client, enable the following grants:
urn:ietf:params:oauth:grant-type:token-exchange (may require IAM administrator privilege)
refresh_token
authorization_code
and enable (at least) the following scopes:
openid
profile
offline_access
wlcg.groups
fts
rucio
On the admin client, enable the following grants:
client_credentials
refresh_token
urn:ietf:params:oauth:grant-type:token-exchange
and enable (at least) the following scopes:
openid
profile
offline_access
storage.read:/
storage.modify:/
storage.create:/
wlcg.groups
fts
rucio
scim:read
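A quick way to confirm that the admin client carries every scope listed above is to compare its registration against that list. The sketch below assumes the registration is available as a dictionary with a space-separated scope field, as in standard OAuth2 client metadata; the example client and its name are hypothetical.

```python
# Scopes listed above for the admin client.
REQUIRED_ADMIN_SCOPES = frozenset({
    "openid",
    "profile",
    "offline_access",
    "storage.read:/",
    "storage.modify:/",
    "storage.create:/",
    "wlcg.groups",
    "fts",
    "rucio",
    "scim:read",
})


def missing_scopes(client_registration, required=REQUIRED_ADMIN_SCOPES):
    """Return the required scopes absent from a client registration, sorted."""
    # Assumption: "scope" is a single space-separated string.
    configured = set(client_registration.get("scope", "").split())
    return sorted(required - configured)


# Hypothetical registration with three scopes left out.
example_client = {
    "client_name": "rucio-admin",
    "scope": "openid profile offline_access storage.read:/ wlcg.groups fts rucio",
}
print(missing_scopes(example_client))
```

An empty list means the client has at least the scopes named in this section; additional scopes on the client are ignored.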
Databases
Two databases are required to operate Rucio for SRCNet. The first is an application database used by the various Rucio processes. The second is a “standalone” database that is used to store metadata.
The charts for deploying two instances of postgres for this purpose can be found in the linked repository at the top of this page.
Monitoring
For monitoring, the preferred option is an instance each of Elasticsearch and Grafana.
Deployment (k8s/helm)
The processes that constitute the Rucio service (server, daemons) are all containerised and can easily be deployed onto Kubernetes clusters via a series of Helm charts. These charts can be found in the linked repository at the top of this page.