SUSE Enterprise Storage 6 Installation and Basic Operation ¶

1. Installation ¶

1.1. Environment Setup ¶

This demo uses the environment below, including the VM settings and the installed software.

All VMs were built on the physical host 10.58.121.68.

Host Server:
    10.58.121.68  root / rootroot

Account
    root / root123

Gateway:
    10.58.120.1

Network Mask:
    255.255.254.0

Nameserver:
    10.58.32.32
    10.33.50.20

Domain Search
    sha.me.corp
    dhcp.sha.me.corp
    me.corp
    ind.me.corp
    bgr.me.corp

The following SUSE Linux Enterprise Server 15 SP1 extensions and modules were installed:

[x] SUSE Enterprise Storage 6 
[x] Basesystem Module 15 SP1 x86_64
[x] Server Applications Module 15 SP1 x86_64

Disabled services:

AppArmor
Firewall

Enabled services:

SSH

Register SLES 15 SP1 with the local SMT server.

SUSEConnect --url https://smtproxy.ind.me.corp

The demo environment is summarized below.

Alias	Host Name	Memory	Disk	eth0	eth0 mac address
sles01	admin (salt-master)	16GB	Disk1: 20G	10.58.121.181/23	52:54:00:23:7d:cd
sles02	data1	16GB	Disk1: 20G	10.58.121.182/23	52:54:00:5f:ce:6f
			Disk2: 8G
			Disk3: 8G
			Disk4: 8G
sles03	data2	16GB	Disk1: 20G	10.58.121.183/23	52:54:00:6f:f2:23
			Disk2: 8G
			Disk3: 8G
			Disk4: 8G
sles04	data3	16GB	Disk1: 20G	10.58.121.184/23	52:54:00:93:4c:67
			Disk2: 8G
			Disk3: 8G
			Disk4: 8G
sles05	data4	16GB	Disk1: 20G	10.58.121.185/23	52:54:00:90:b0:b0
			Disk2: 8G
			Disk3: 8G
			Disk4: 8G
sles06	mon1	16GB	Disk1: 20G	10.58.121.186/23	52:54:00:46:43:7a
sles07	mon2	16GB	Disk1: 20G	10.58.121.187/23	52:54:00:00:fe:6b
sles08	mon3	16GB	Disk1: 20G	10.58.121.188/23	52:54:00:60:a3:92

Add hostname to file /etc/hosts (all nodes)

If you do not specify a cluster network during Ceph deployment, it assumes a single public network environment.
Make sure that the fully qualified domain name (FQDN) of each node can be resolved to the public network IP address by all other nodes.

$ vi /etc/hosts
10.58.121.181   admin.sha.me.corp admin salt
10.58.121.182   data1.sha.me.corp data1
10.58.121.183   data2.sha.me.corp data2
10.58.121.184   data3.sha.me.corp data3
10.58.121.185   data4.sha.me.corp data4
10.58.121.186   mon1.sha.me.corp mon1
10.58.121.187   mon2.sha.me.corp mon2
10.58.121.188   mon3.sha.me.corp mon3
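
To confirm that every FQDN maps to the expected public address, the entries can be checked mechanically. The sketch below is only an illustration: the function name `check_hosts` is hypothetical, and it reads a hosts-format file directly instead of going through the resolver.

```shell
#!/bin/sh
# check_hosts FILE IP NAME
# Succeeds when FILE (hosts format) contains a line whose first field is
# IP and which lists NAME among its host names.
# Hypothetical helper for this demo, not part of SES or DeepSea.
check_hosts() {
    awk -v ip="$2" -v name="$3" '
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

On a node, a call such as `check_hosts /etc/hosts 10.58.121.186 mon1.sha.me.corp || echo "mon1 entry missing"` reports a missing or wrong entry. `getent hosts mon1.sha.me.corp` shows what the resolver actually returns.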

Set up trusted (key-based) SSH access between all nodes (root account)

cd ~
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@admin
ssh-copy-id -i ~/.ssh/id_rsa.pub root@data1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@data2
ssh-copy-id -i ~/.ssh/id_rsa.pub root@data3
ssh-copy-id -i ~/.ssh/id_rsa.pub root@data4
ssh-copy-id -i ~/.ssh/id_rsa.pub root@mon1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@mon2
ssh-copy-id -i ~/.ssh/id_rsa.pub root@mon3

ssh admin.sha.me.corp
ssh data1.sha.me.corp
ssh data2.sha.me.corp
ssh data3.sha.me.corp
ssh data4.sha.me.corp
ssh mon1.sha.me.corp
ssh mon2.sha.me.corp
ssh mon3.sha.me.corp
ssh salt
ssh admin
ssh data1
ssh data2
ssh data3
ssh data4
ssh mon1
ssh mon2
ssh mon3
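
The repeated ssh-copy-id calls above can also be written as a loop. This is a minimal sketch, assuming the eight short host names from this demo. The `copy_keys` function and the `RUN` switch are hypothetical, and by default the commands are only printed so they can be reviewed first.

```shell
#!/bin/sh
# Short host names of the nodes used in this demo.
NODES="admin data1 data2 data3 data4 mon1 mon2 mon3"

# copy_keys prints one ssh-copy-id command per node (dry run).
# Set RUN=1 in the environment to execute the commands instead.
# Hypothetical helper, shown for illustration only.
copy_keys() {
    for node in $NODES; do
        if [ "${RUN:-0}" = "1" ]; then
            ssh-copy-id -i ~/.ssh/id_rsa.pub "root@$node"
        else
            echo "ssh-copy-id -i ~/.ssh/id_rsa.pub root@$node"
        fi
    done
}

copy_keys
```

Running the script as `RUN=1 sh script.sh` performs the copy and prompts for each root password once.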

Disable firewall (all nodes)

$ sudo /sbin/SuSEfirewall2 off
$ firewall-cmd --state
    not running

$ systemctl stop firewalld.service
$ systemctl status firewalld.service
    firewalld.service - firewalld - dynamic firewall daemon
    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: disabled)
    Active: inactive (dead)
  Docs: man:firewalld(1)

Disable IPv6 and set the kernel PID limit to its maximum value (all nodes)

$ vi /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
kernel.pid_max = 4194303
$ sysctl -p

Set DEV_ENV=true in /etc/profile.local on all nodes

Install basic software (all nodes)

$ zypper in -y -t pattern yast2_basis base
$ zypper in -y net-tools vim man sudo tuned irqbalance
$ zypper in -y ethtool rsyslog iputils less supportutils-plugin-ses
$ zypper in -y net-tools-deprecated tree wget

Configure the NTP service (all nodes) via YaST2 and add the server cn.pool.ntp.org.

The resulting /etc/chrony.conf file looks like this:

admin:~ # cat /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
pool cn.pool.ntp.org iburst
! pool pool.ntp.org iburst

# Record the rate at which the system clock gains/losses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC).
rtcsync

# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *

# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2

# Allow NTP client access from local network.
#allow 192.168.0.0/16

# Serve time even if not synchronized to a time source.
#local stratum 10

# Specify file containing keys for NTP authentication.
#keyfile /etc/chrony.keys

# Get TAI-UTC offset and leap seconds from the system tz database.
#leapsectz right/UTC

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
#log measurements statistics tracking

# Also include any directives found in configuration files in /etc/chrony.d
include /etc/chrony.d/*.conf

Apply the /etc/chrony.conf changes.

# systemctl enable chronyd.service
# systemctl restart chronyd.service
# systemctl status chronyd.service

# chronyc sources

1.2. Install Packages ¶

Install salt-minion on all nodes and start the service.

The minion ID (the node's host name) is stored in the file /etc/salt/minion_id

zypper in -y salt-minion

Uncomment the following line so that every node knows which host is the master

$ vi /etc/salt/minion
master: salt 

$ systemctl enable salt-minion.service
$ systemctl start salt-minion.service
$ systemctl status salt-minion.service

Install salt-master on the admin node. Logs are written to /var/log/salt

admin:~ # zypper in -y salt-master
admin:~ # systemctl enable salt-master.service
admin:~ # systemctl start salt-master.service
admin:~ # systemctl status salt-master.service

Note: ganesha will be installed on mon1, not admin node.

admin:~ # zypper se ganesha
admin:~ # zypper in nfs-ganesha
admin:~ # systemctl enable nfs-ganesha
admin:~ # systemctl start nfs-ganesha
admin:~ # systemctl status nfs-ganesha

admin:~ # cd /var/log/salt

List fingerprints of all unaccepted minion keys on the Salt master.

admin:~ # salt-key -F
Local Keys:
master.pem:  c0:e5:***:04:c7
master.pub:  43:73:***:6a:34
Unaccepted Keys:
admin.sha.me.corp:  fe:51:***:b9:48
mon1.sha.me.corp:  94:13:***:91:63
mon2.sha.me.corp:  c0:fd:***:39:3f
mon3.sha.me.corp:  38:fc:***:2e:05
data1.sha.me.corp:  b6:6c:***:63:4f
data2.sha.me.corp:  ab:14:***:c8:ac
data3.sha.me.corp:  90:3f:***:76:3b
data4.sha.me.corp:  d8:12:***:f1:20

If the minions' fingerprints match, accept them

admin:~ # salt-key --accept-all
The following keys are going to be accepted:
Unaccepted Keys:
admin.sha.me.corp
mon1.sha.me.corp
mon2.sha.me.corp
mon3.sha.me.corp
data1.sha.me.corp
data2.sha.me.corp
data3.sha.me.corp
data4.sha.me.corp
Proceed? [n/Y] Y
Key for minion admin.sha.me.corp accepted.
Key for minion mon1.sha.me.corp accepted.
Key for minion mon2.sha.me.corp accepted.
Key for minion mon3.sha.me.corp accepted.
Key for minion data1.sha.me.corp accepted.
Key for minion data2.sha.me.corp accepted.
Key for minion data3.sha.me.corp accepted.
Key for minion data4.sha.me.corp accepted.

Verify that the keys have been accepted

admin:~ # salt-key -F
admin:~ # salt-key --list-all
Accepted Keys:
admin.sha.me.corp
data1.sha.me.corp
data2.sha.me.corp
data3.sha.me.corp
data4.sha.me.corp
mon1.sha.me.corp
mon2.sha.me.corp
mon3.sha.me.corp
Denied Keys:
Unaccepted Keys:
Rejected Keys:

Zero out all drives that will be used as OSDs (optional)

data1:~ # lsblk

data1:~ # for i in {b,c,d}; do dd if=/dev/zero of=/dev/sd$i bs=512 count=40 oflag=direct; done

data2:~ # for i in {b,c,d}; do dd if=/dev/zero of=/dev/sd$i bs=512 count=40 oflag=direct; done

data3:~ # for i in {b,c,d}; do dd if=/dev/zero of=/dev/sd$i bs=512 count=40 oflag=direct; done

Install DeepSea

admin:~ # zypper in -y deepsea

Edit the /srv/pillar/ceph/deepsea_minions.sls file on the Salt master (admin node) and add or replace the following line:

admin:~ # vi /srv/pillar/ceph/deepsea_minions.sls
# Choose minions with a deepsea grain
deepsea_minions: 'G@deepsea:*'  #Match all minions with the 'deepsea' grain
# Choose all minions
# deepsea_minions: '*'  #Match all Salt minions in the cluster
# Choose custom Salt targeting
# deepsea_minions: 'ses*'
# deepsea_minions: 'ceph* or salt'

Target the Minions

Verify that the salt-master (admin node) can communicate with the minions, then deploy the grains from the admin node to all minions.

admin:~ # salt '*' test.ping
mon1.sha.me.corp:
    True
data4.sha.me.corp:
    True
data3.sha.me.corp:
    True
data2.sha.me.corp:
    True
data1.sha.me.corp:
    True
mon3.sha.me.corp:
    True
admin.sha.me.corp:
    True
mon2.sha.me.corp:
    True

Apply the 'deepsea' grain to a group of minions, and target with a DeepSea Grain

admin:~ # salt '*' grains.append deepsea default
data3.sha.me.corp:
    The val default was already in the list deepsea
mon2.sha.me.corp:
    The val default was already in the list deepsea
data1.sha.me.corp:
    The val default was already in the list deepsea
data4.sha.me.corp:
    The val default was already in the list deepsea
data2.sha.me.corp:
    The val default was already in the list deepsea
mon3.sha.me.corp:
    The val default was already in the list deepsea
admin.sha.me.corp:
    The val default was already in the list deepsea
mon1.sha.me.corp:
    The val default was already in the list deepsea

admin:~ # salt -G 'deepsea:*' test.ping  (The following command is an equivalent)
admin:~ # salt -C 'G@deepsea:*' test.ping
admin.sha.me.corp:
    True
data3.sha.me.corp:
    True
mon1.sha.me.corp:
    True
mon2.sha.me.corp:
    True
data2.sha.me.corp:
    True
data4.sha.me.corp:
    True
mon3.sha.me.corp:
    True
data1.sha.me.corp:
    True

1.3. Stage 0 — the preparation ¶

Run Stage 0—the preparation

During this stage, all required updates are applied and your system may be rebooted.
If there are errors, re-run the stage.

admin:~ # deepsea stage run ceph.stage.0 (The following commands are equivalents)
admin:~ # salt-run state.orch ceph.stage.0
admin:~ # salt-run state.orch ceph.stage.prep

Run Stage 1—the discovery

Here all hardware in your cluster is being detected and necessary information for the Ceph configuration is being collected.
The discovery stage collects data from all minions and creates configuration fragments that are stored in the directory /srv/pillar/ceph/proposals. The data is stored in YAML format in *.sls or *.yml files.

admin:~ # deepsea stage run ceph.stage.1 (The following commands are equivalents)
admin:~ # salt-run state.orch ceph.stage.1
admin:~ # salt-run state.orch ceph.stage.discovery

1.4. Stage 2 — the configuration ¶

Run Stage 2 — the configuration — you need to prepare configuration data in a particular format.

The assignment follows this pattern:
- role-ROLE_NAME/PATH/FILES_TO_INCLUDE (NOTE: PATH is relative to the directory /srv/pillar/ceph/proposals/)
To avoid trouble with performance and the upgrade procedure, do not deploy the Ceph OSD, Metadata Server, or Ceph Monitor role to the Admin Node.
Monitors, Metadata Server, and gateways can be co-located on the OSD nodes.
If you are using CephFS, S3/Swift, iSCSI, at least two instances of the respective roles (Metadata Server, Object Gateway, iSCSI) are required for redundancy and availability.

admin:~ # cp /usr/share/doc/packages/deepsea/examples/policy.cfg-rolebased /srv/pillar/ceph/proposals/policy.cfg
admin:~ # vi /srv/pillar/ceph/proposals/policy.cfg
## Cluster Assignment
# Add all nodes into Ceph cluster
cluster-ceph/cluster/*.sls

## Roles
# The Admin node fills the “master” and “admin” roles for DeepSea
# The master role is mandatory, always add a similar line to the following
role-master/cluster/admin*.sls
role-admin/cluster/admin*.sls

# Monitoring
# Cluster monitoring and data graphs, most commonly they run on Admin node
# NFS Ganesha is configured via the file /etc/ganesha/ganesha.conf
# As additional configuration is required to install NFS Ganesha, you can install NFS Ganesha later. 
# The following requirements need to be met before DeepSea stages 2 and 4 can be executed to install NFS Ganesha:
#  a)At least one node needs to be assigned the role-ganesha.
#  b)You can define only one role-ganesha per minion.
#  c)NFS Ganesha needs either an Object Gateway or CephFS to work, otherwise the validation will fail in Stage 3.
#  d)The kernel based NFS needs to be disabled on minions with the role-ganesha role. 
role-prometheus/cluster/admin*.sls
role-grafana/cluster/mon1*.sls

# MON
# The minion will provide the monitor service to the Ceph cluster
role-mon/cluster/mon*.sls

# MGR
# The Ceph manager daemon which collects all the state information from the whole cluster
# Deploy it on all minions where you plan to deploy the Ceph monitor role
role-mgr/cluster/mon1*.sls

# MDS
# The minion will provide the metadata service to support CephFS
role-mds/cluster/mon*.sls

# IGW
# The minion will act as an iSCSI Gateway
role-igw/cluster/mon2*.sls

# RGW
# The minion will act as an Object Gateway
role-rgw/cluster/mon3*.sls

# Storage
# Use this role to specify storage nodes
# It points to data1~4 nodes with a wildcard.
role-storage/cluster/data*.sls

# COMMON
# It includes configuration files generated during the discovery (Stage 1)
# Accept the default values for common configuration parameters such as fsid and public_network
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
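
Before running Stage 2 it can help to list which role is assigned to which file pattern. The following is a minimal sketch, assuming the role-ROLE_NAME/PATH/FILES_TO_INCLUDE layout shown above. The function name `list_roles` is hypothetical and DeepSea does not provide it.

```shell
#!/bin/sh
# list_roles FILE
# Prints "ROLE_NAME<TAB>PATH/FILES_TO_INCLUDE" for every line of a
# policy.cfg that starts with "role-". Comments, blank lines, and the
# cluster-*/config lines are skipped.
# Hypothetical review helper, not part of DeepSea.
list_roles() {
    awk '/^role-/ {
        slash = index($0, "/")
        printf "%s\t%s\n", substr($0, 6, slash - 6), substr($0, slash + 1)
    }' "$1"
}
```

For example, `list_roles /srv/pillar/ceph/proposals/policy.cfg` prints one line per role assignment, which makes a missing role-mon or role-storage line easy to spot.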

admin:~ # deepsea stage run ceph.stage.2  (The following commands are equivalents)
admin:~ # salt-run state.orch ceph.stage.2
admin:~ # salt-run state.orch ceph.stage.configure

After the command succeeds, run the following commands to view the pillar data for the specified minions and to refresh the pillar on all nodes

admin:~ # salt 'mon*' pillar.items
admin:~ # salt '*' saltutil.pillar_refresh

Check the time server (admin node). The directory /srv/pillar/ceph/stack is initialized when DeepSea is installed, but the global.yml file is not created until Stage 2.

By default, DeepSea uses the Admin Node as the time server for other cluster nodes.

admin:~ # cat /srv/pillar/ceph/stack/default/global.yml  (this file will be generated after stage 2)
monitoring:
  prometheus:
    metric_relabel_config:
      ceph: []
      grafana: []
      node_exporter: []
      prometheus: []
    relabel_config:
      ceph: []
      grafana: []
      node_exporter: []
      prometheus: []
    rule_files: []
    scrape_interval:
      ceph: 10s
      grafana: 10s
      node_exporter: 10s
      prometheus: 10s
    target_partition:
      ceph: 1/1
      grafana: 1/1
      node_exporter: 1/1
      prometheus: 1/1
time_server: admin.sha.me.corp

Verify the network. The directory /srv/pillar/ceph/stack is initialized when DeepSea is installed, but the cluster.yml file is not created until Stage 2.

admin:~ # cat /srv/pillar/ceph/stack/ceph/cluster.yml  (prints nothing)
admin:~ # cat /srv/pillar/ceph/stack/default/ceph/cluster.yml
available_roles:
- storage
- admin
- mon
- mds
- mgr
- igw
- grafana
- prometheus
- storage
- rgw
- ganesha
- client-cephfs
- client-radosgw
- client-iscsi
- client-nfs
- benchmark-rbd
- benchmark-blockdev
- benchmark-fs
- master
cluster_network: 10.58.120.0/23
fsid: 343ee7d3-232f-4c71-8216-1edbc55ac6e0
public_network: 10.58.120.0/23

Note: settings in the customized file override those in the default file.

Default file:/srv/pillar/ceph/stack/default/ceph/cluster.yml

Customized file:/srv/pillar/ceph/stack/ceph/cluster.yml
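
The public_network value 10.58.120.0/23 in cluster.yml follows from the node addresses and the 255.255.254.0 netmask (23 mask bits) listed in the environment setup. The sketch below shows the arithmetic. The function name `network_of` is hypothetical, and it only handles dotted-quad IPv4 input.

```shell
#!/bin/sh
# network_of IP MASK
# Prints the IPv4 network address obtained by AND-ing each octet of IP
# with the matching octet of MASK. The body runs in a subshell so the
# changed IFS does not leak into the caller.
# Hypothetical helper, not part of DeepSea.
network_of() (
    IFS=.
    set -- $1 $2          # split both arguments into eight octets
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

network_of 10.58.121.181 255.255.254.0   # prints 10.58.120.0
```

All eight nodes (10.58.121.181 to 10.58.121.188) give the same result, so they share the single public network that DeepSea detected.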

Check DriveGroup

DriveGroups specify the layouts of OSDs in the Ceph cluster.
They are defined in a single file /srv/salt/ceph/configuration/files/drive_groups.yml

admin:~ # cat /srv/salt/ceph/configuration/files/drive_groups.yml
    default_drive_group_name:
      target: 'data*'    <--original: 'I@role:storage'
      data_devices:
        all: true

admin:~ # salt 'data*' pillar.items | grep -B5 storage

1.5. Stage 3 — the deployment ¶

Run Stage 3 — the deployment — creates a basic Ceph cluster with mandatory Ceph services.

This Deployment stage has more than 60 automated steps. Be patient and make sure the stage completes successfully before proceeding.

Set dev environment and disable subvolume:

admin:~ # vi /srv/pillar/ceph/stack/global.yml
admin:~ # vi /srv/pillar/ceph/stack/default/global.yml
monitoring:
  prometheus:
    metric_relabel_config:
      ceph: []
      grafana: []
      node_exporter: []
      prometheus: []
    relabel_config:
      ceph: []
      grafana: []
      node_exporter: []
      prometheus: []
    rule_files: []
    scrape_interval:
      ceph: 10s
      grafana: 10s
      node_exporter: 10s
      prometheus: 10s
    target_partition:
      ceph: 1/1
      grafana: 1/1
      node_exporter: 1/1
      prometheus: 1/1
time_server: admin.sha.me.corp
DEV_ENV: True
subvolume_init: disabled

admin:~ # salt '*' saltutil.pillar_refresh

Note: settings in the customized file override those in the default file.

Default file:/srv/pillar/ceph/stack/default/global.yml

Customized file:/srv/pillar/ceph/stack/global.yml

admin:~ # deepsea stage run ceph.stage.3  (The following commands are equivalents)
admin:~ # salt-run state.orch ceph.stage.3
admin:~ # salt-run state.orch ceph.stage.deploy

After the command succeeds, run the following to check the status:

admin:~ # ceph -s
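
Instead of re-running the status command by hand, the check can be polled. This is a minimal sketch, assuming the `ceph` CLI is available on the admin node. The function name `wait_for_health` and its default of 30 tries at 10-second intervals are hypothetical choices.

```shell
#!/bin/sh
# wait_for_health [TRIES] [DELAY_SECONDS]
# Polls `ceph health` until its output starts with HEALTH_OK or the
# number of tries is used up. Returns 0 on HEALTH_OK, 1 otherwise.
# Hypothetical helper, not part of Ceph or DeepSea.
wait_for_health() {
    tries=${1:-30}
    delay=${2:-10}
    while [ "$tries" -gt 0 ]; do
        case "$(ceph health 2>/dev/null)" in
            HEALTH_OK*) return 0 ;;
        esac
        tries=$((tries - 1))
        # Sleep only when another attempt follows.
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

A typical call after Stage 3 would be `wait_for_health 60 10 && echo "cluster is healthy"`.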

The commands below return a structure of matching disks based on your DriveGroups (the information becomes available after Stage 3).

admin:~ # salt-run disks.report
admin:~ # salt-run disks.list
admin:~ # salt-run disks.details

1.6. Stage 4 — the services ¶

Run Stage 4 — the services — additional features of Ceph like iSCSI, Object Gateway and CephFS can be installed in this stage. Each is optional.

admin:~ # deepsea stage run ceph.stage.4  (The following commands are equivalents)
admin:~ # salt-run state.orch ceph.stage.4
admin:~ # salt-run state.orch ceph.stage.services

admin:~ # ceph osd lspools
1 iscsi-images
2 cephfs_data
3 cephfs_metadata
4 .rgw.root
5 default.rgw.control
6 default.rgw.meta
7 default.rgw.log

Before logging in to the dashboard via its URL, retrieve the credentials first

admin:~ # salt-call grains.get dashboard_creds
local:
    ----------
    admin:
        <your password>   --> the password was changed to mypassword to log on to dashboard

admin:~ # ceph mgr services
{
    "dashboard": "https://mon1.sha.me.corp:8443/",
    "prometheus": "http://mon1.sha.me.corp:9283/"
}

https://10.58.121.186:8443
http://10.58.121.186:9283
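
The dashboard URL can be pulled out of the `ceph mgr services` output for use in scripts. The sketch below uses only sed and therefore assumes the one-key-per-line JSON layout shown above. The function name `dashboard_url` is hypothetical.

```shell
#!/bin/sh
# dashboard_url
# Reads the JSON printed by `ceph mgr services` on stdin and prints the
# value of the "dashboard" key. This is plain text matching, not a JSON
# parser, so it relies on each key being on its own line.
# Hypothetical helper, not part of Ceph.
dashboard_url() {
    sed -n 's/.*"dashboard": *"\([^"]*\)".*/\1/p'
}

# Usage on the admin node:
#   ceph mgr services | dashboard_url
```

The host name in the printed URL must be resolvable from the browser's machine. Otherwise use the IP address form listed above.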



admin:~ # watch ceph -s
Every 2.0s: ceph -s                  admin: Mon Oct  5 14:41:51 2020
  cluster:
    id:     <id>
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 87m)
    mgr: mon1(active, since 82m)
    mds: cephfs:1 {0=mon3=up:active} 2 up:standby
    osd: 12 osds: 12 up (since 85m), 12 in (since 85m)
    rgw: 1 daemon active (mon3)

  task status:
    scrub status:
        mds.mon3: idle

  data:
    pools:   7 pools, 576 pgs
    objects: 213 objects, 4.2 KiB
    usage:   12 GiB used, 84 GiB / 96 GiB avail
    pgs:     576 active+clean

  io:
    client:   852 B/s rd, 0 op/s rd, 0 op/s wr

1.7. Stage 5 — the removal stage ¶

Run Stage 5 — the removal stage. This stage is not mandatory and during the initial setup it is usually not needed. In this stage the roles of minions and also the cluster configuration are removed. You need to run this stage when you need to remove a storage node from your cluster.

admin:~ # deepsea stage run ceph.stage.

1.8. Installation Guide ¶

Deployment Guide (EN)

Deployment Guide (ZH)

1.9. Issues during installation ¶

[ERROR]: The Salt Master has cached the public key for this node

[SOLUTION]: Restart the salt-minion service

[ERROR]: This server_id is computed nor by Adler32 neither by CRC32

SOLUTION

[QUESTION]: How to replace a minion's Salt key

Stop the salt-minion service
# systemctl stop salt-minion

Delete the salt-minion key pair (public and private key)
# rm /etc/salt/pki/minion/minion.pub
# rm /etc/salt/pki/minion/minion.pem

Set the new minion_id
admin:~ # echo admin.sha.me.corp > /etc/salt/minion_id
data1:~ # echo data1.sha.me.corp > /etc/salt/minion_id
data2:~ # echo data2.sha.me.corp > /etc/salt/minion_id
data3:~ # echo data3.sha.me.corp > /etc/salt/minion_id
data4:~ # echo data4.sha.me.corp > /etc/salt/minion_id
mon1:~ # echo mon1.sha.me.corp > /etc/salt/minion_id
mon2:~ # echo mon2.sha.me.corp > /etc/salt/minion_id
mon3:~ # echo mon3.sha.me.corp > /etc/salt/minion_id

Delete the old keys on the admin node (-D deletes all keys)
# salt-key -D

Restart salt-minion service
# systemctl restart salt-minion

Accept all new keys on the admin node
admin:~ # salt-key -L
admin:~ # salt-key -A 
or accept them one by one (also on the admin node)
admin:~ # salt-key -a admin.sha.me.corp
admin:~ # salt-key -a data1.sha.me.corp
admin:~ # salt-key -a data2.sha.me.corp
admin:~ # salt-key -a data3.sha.me.corp
admin:~ # salt-key -a data4.sha.me.corp
admin:~ # salt-key -a mon1.sha.me.corp
admin:~ # salt-key -a mon2.sha.me.corp
admin:~ # salt-key -a mon3.sha.me.corp

[ERROR] ['/var/lib/ceph subvolume missing on mon3.sha.me.corp', '/var/lib/ceph subvolume missing on mon1.sha.me.corp', '/var/lib/ceph subvolume missing on mon2.sha.me.corp', 'See /srv/salt/ceph/subvolume/README.md']

[SOLUTION]

Edit /srv/pillar/ceph/stack/global.yml and add the following line: subvolume_init: disabled

Then refresh the Salt pillar and re-run DeepSea Stage 3:

admin:~ # salt '*' saltutil.refresh_pillar
admin:~ # salt-run state.orch ceph.stage.3
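
The edit to global.yml can be made repeatable so that re-running the fix does not add the line twice. This is a minimal sketch. The function name `ensure_line` is hypothetical, and it assumes the file, if it exists, ends with a newline.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Appends LINE to FILE unless an identical line is already present.
# A missing FILE is created. Hypothetical helper, not part of DeepSea.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage on the admin node:
#   ensure_line /srv/pillar/ceph/stack/global.yml 'subvolume_init: disabled'
```

The match is on the whole line, so a differently spelled or indented entry would not be detected and should be checked by hand.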

After DeepSea successfully finished stage.3, the Ceph Dashboard will be running. Refer to Book “Administration Guide”, Chapter 20 “Ceph Dashboard” for a detailed overview of Ceph Dashboard features.

To list the nodes running the dashboard, run:

admin:~ # ceph mgr services | grep dashboard

To list the admin credentials, run:

admin:~ # salt-call grains.get dashboard_creds

[ERROR] module function cephprocesses.wait executed on nodes mon1~3 and data1~4 in Stage 0

[SOLUTION]

Run the following check on all nodes
# salt-call cephprocesses.check
    ERROR: process ceph-mds for role mds is not running
    ERROR: process radosgw for role rgw is not running

admin:~ # ceph -s
    Clock skew detected on mon ceph (mon.mon2, mon.mon3)
Set time server to public server (China)
# chronyc sources

1.10. Shutting Down the Whole Ceph Cluster ¶

Shut down or disconnect any clients accessing the cluster.

To prevent CRUSH from automatically rebalancing the cluster, set the cluster to noout:

    # ceph osd set noout
        Other flags you can set per osd:
            nodown
            noup
            noin
            noout

Disable safety measures and run the ceph.shutdown runner:

    admin:~ # salt-run disengage.safety
        safety is now disabled for cluster ceph

    admin:~ # salt-run state.orch ceph.shutdown
        admin.sha.me.corp_master:
          Name: set noout - Function: salt.state - Result: Changed Started: - 14:32:14.398022 Duration: 2266.75 ms
          Name: Shutting down radosgw for rgw - Function: salt.state - Result: Changed Started: - 14:32:16.665452 Duration: 1461.23 ms
          Name: Shutting down cephfs - Function: salt.state - Result: Changed Started: - 14:32:18.127353 Duration: 30326.193 ms
          Name: Shutting down iscsi - Function: salt.state - Result: Clean Started: - 14:32:48.454187 Duration: 30142.468 ms
          Name: Shutting down storage - Function: salt.state - Result: Changed Started: - 14:33:18.597321 Duration: 10841.45 ms
          Name: Shutting down mgr - Function: salt.state - Result: Changed Started: - 14:33:29.439442 Duration: 29209.141 ms
          Name: Shutting down mon - Function: salt.state - Result: Changed Started: - 14:33:58.649221 Duration: 30519.97 ms

        Summary for admin.sha.me.corp_master
        ------------
        Succeeded: 7 (changed=6)
        Failed:    0
        ------------
        Total states run:     7
        Total run time: 134.767 s

Power off all cluster nodes:

    admin:~ # salt -C 'G@deepsea:*' cmd.run "shutdown -h"
        Broadcast message from root@admin (Sat 2021-03-06 14:40:37 CST):
        The system is going down for poweroff at Sat 2021-03-06 14:41:37 CST!
        admin.sha.me.corp:
            Shutdown scheduled for Sat 2021-03-06 14:41:37 CST, use 'shutdown -c' to cancel.
        mon2.sha.me.corp:
            Shutdown scheduled for Sat 2021-03-06 14:41:37 CST, use 'shutdown -c' to cancel.
        data2.sha.me.corp:
            Shutdown scheduled for Sat 2021-03-06 14:41:37 CST, use 'shutdown -c' to cancel.
        mon3.sha.me.corp:
            Shutdown scheduled for Sat 2021-03-06 14:41:37 CST, use 'shutdown -c' to cancel.
        data3.sha.me.corp:
            Shutdown scheduled for Sat 2021-03-06 14:41:37 CST, use 'shutdown -c' to cancel.
        data4.sha.me.corp:
            Shutdown scheduled for Sat 2021-03-06 14:41:37 CST, use 'shutdown -c' to cancel.
        mon1.sha.me.corp:
            Shutdown scheduled for Sat 2021-03-06 14:41:37 CST, use 'shutdown -c' to cancel.
        data1.sha.me.corp:
            Shutdown scheduled for Sat 2021-03-06 14:41:37 CST, use 'shutdown -c' to cancel.

1.11. Starting, Stopping, and Restarting Services Using Targets ¶

# ls /usr/lib/systemd/system/ceph*.target
ceph.target
ceph-osd.target
ceph-mon.target
ceph-mgr.target
ceph-mds.target
ceph-radosgw.target
ceph-rbd-mirror.target

To start/stop/restart all Ceph services on the node, run:

# systemctl start ceph.target
# systemctl stop ceph.target
# systemctl restart ceph.target

To start/stop/restart all OSDs on the node, run:

# systemctl start ceph-osd.target
# systemctl stop ceph-osd.target
# systemctl restart ceph-osd.target

Starting, Stopping, and Restarting Individual Services

# systemctl list-unit-files --all --type=service ceph*
    ceph-osd@.service
    ceph-mon@.service
    ceph-mds@.service
    ceph-mgr@.service
    ceph-radosgw@.service
    ceph-rbd-mirror@.service

Example :

# systemctl status ceph-mon@HOSTNAME.service (e.g., ceph-mon@mon1.service)

# systemctl start ceph-osd@1.service
# systemctl stop ceph-osd@1.service
# systemctl restart ceph-osd@1.service
# systemctl status ceph-osd@1.service

1.12. Restarting All Services ¶

# salt-run state.orch ceph.restart

1.13. Restarting Specific Services ¶

Example: salt-run state.orch ceph.restart.service_name

# salt-run state.orch ceph.restart.mon
# salt-run state.orch ceph.restart.mgr
# salt-run state.orch ceph.restart.osd
# salt-run state.orch ceph.restart.mds
# salt-run state.orch ceph.restart.rgw
# salt-run state.orch ceph.restart.igw
# salt-run state.orch ceph.restart.ganesha

Default log file path of salt-run: /var/log/salt/master
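
The restart commands above all follow the pattern ceph.restart.service_name. The sketch below builds the command for a given service and rejects names that are not in the list above. The function name `restart_cmd` is hypothetical, and it only prints the command rather than running it.

```shell
#!/bin/sh
# restart_cmd SERVICE
# Prints the salt-run command that restarts SERVICE, for the service
# names listed in this section. Returns 1 for any other name.
# Hypothetical helper; it builds the command string and does not run it.
restart_cmd() {
    case "$1" in
        mon|mgr|osd|mds|rgw|igw|ganesha)
            echo "salt-run state.orch ceph.restart.$1" ;;
        *)
            echo "unknown service: $1" >&2
            return 1 ;;
    esac
}

restart_cmd mds   # prints: salt-run state.orch ceph.restart.mds
```

Printing first and running afterwards keeps a typo in the service name from reaching the Salt master.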

2. Basic Operation ¶

2.1. Pools and Data Placement ¶

2.1.1. Enable the PG Autoscaler and Balancer Modules ¶

Task 1: View the state of all the Manager Modules ¶

List all the existing Manager Modules

admin:~ # ceph mgr module ls | less

Task 2: List the Existing Pools ¶

List the pools that already exist in the cluster

admin:~ # ceph osd lspools
1 iscsi-images
2 cephfs_data
3 cephfs_metadata
4 .rgw.root
5 default.rgw.control
6 default.rgw.meta
7 default.rgw.log

List the pools again, but this time using the rados command:

admin:~ # rados lspools
iscsi-images
cephfs_data
cephfs_metadata
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log

View the output of placement group autoscale-status command for the pools

admin:~ # ceph osd pool autoscale-status
Error ENOTSUP: Module 'pg_autoscaler' is not enabled (required by command 'osd pool autoscale-status'): use `ceph mgr module enable pg_autoscaler` to enable it

Task 3: Enable the pg_autoscaler module ¶

Enable the pg_autoscaler module

admin:~ # ceph mgr module enable pg_autoscaler

admin:~ # ceph osd pool autoscale-status
POOL                  SIZE TARGET SIZE RATE RAW CAPACITY  RATIO TARGET RATIO EFFECTIVE RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
iscsi-images          389               3.0       98256M 0.0000                               1.0    128         32 warn
cephfs_data             0               3.0       98256M 0.0000                               1.0    256         32 warn
cephfs_metadata      7285               3.0       98256M 0.0000                               4.0     64         16 warn
.rgw.root            1245               3.0       98256M 0.0000                               1.0     32            warn
default.rgw.control     0               3.0       98256M 0.0000                               1.0     32            warn
default.rgw.meta      381               3.0       98256M 0.0000                               1.0     32            warn
default.rgw.log     18078               3.0       98256M 0.0000                               1.0     32            warn

Note that for the iscsi-images pool the PG_NUM value is 128. And note that the NEW PG_NUM value is 32.

The PGs won’t be adjusted automatically because the default setting for the autoscaler is “warn”.

Note the last column (mode) that shows status “warn” for all the pools.
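
To see at a glance which pools the autoscaler wants to change, the rows that carry a NEW PG_NUM value can be filtered out. This is a rough sketch, assuming the TARGET SIZE, TARGET RATIO and EFFECTIVE RATIO columns are empty as in the output above, so that such rows have exactly nine whitespace-separated fields. The function name `pending_pg_changes` is hypothetical.

```shell
#!/bin/sh
# pending_pg_changes
# Reads `ceph osd pool autoscale-status` output on stdin and prints
# "POOL CURRENT -> NEW" for every row that has a NEW PG_NUM value.
# Assumption: the three TARGET/EFFECTIVE columns are empty, so a row
# with a NEW PG_NUM has 9 fields and a row without one has 8.
# Hypothetical helper, not part of Ceph.
pending_pg_changes() {
    awk 'NR > 1 && NF == 9 { print $1, $7, "->", $8 }'
}

# Usage on the admin node:
#   ceph osd pool autoscale-status | pending_pg_changes
```

If target sizes or ratios are set on a pool, the field count changes and this filter no longer applies.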

Check the current status. The warning “have too many placement groups” is exactly what we want the pg_autoscaler to tell us.

admin:~ # ceph health
HEALTH_WARN 3 pools have too many placement groups

Turn off the pg_autoscaler feature for CephFS pools

admin:~ # ceph osd pool set cephfs_data pg_autoscale_mode off
set pool 2 pg_autoscale_mode to off

admin:~ # ceph osd pool set cephfs_metadata pg_autoscale_mode off
set pool 3 pg_autoscale_mode to off

admin:~ # ceph health
HEALTH_WARN 1 pools have too many placement groups

Set the pg_autoscaler mode to “on” for the iscsi-images pool:

admin:~ # ceph osd pool set iscsi-images pg_autoscale_mode on
set pool 1 pg_autoscale_mode to on

admin:~ # ceph osd pool autoscale-status
POOL                  SIZE TARGET SIZE RATE RAW CAPACITY  RATIO TARGET RATIO EFFECTIVE RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
iscsi-images          389               3.0       98256M 0.0000                               1.0    128         32 on
cephfs_data             0               3.0       98256M 0.0000                               1.0    256         32 off
cephfs_metadata      7412               3.0       98256M 0.0000                               4.0     64         16 off
.rgw.root            1245               3.0       98256M 0.0000                               1.0     32            warn
default.rgw.control     0               3.0       98256M 0.0000                               1.0     32            warn
default.rgw.meta      381               3.0       98256M 0.0000                               1.0     32            warn
default.rgw.log     18078               3.0       98256M 0.0000                               1.0     32            warn

Turn on the pg_autoscaler feature for CephFS pools

admin:~ # ceph osd pool set cephfs_data pg_autoscale_mode on
set pool 2 pg_autoscale_mode to on

admin:~ # ceph osd pool set cephfs_metadata pg_autoscale_mode on
set pool 3 pg_autoscale_mode to on

PG numbers should always be a power of 2

admin:~ # ceph osd pool autoscale-status
POOL                  SIZE TARGET SIZE RATE RAW CAPACITY  RATIO TARGET RATIO EFFECTIVE RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
iscsi-images          389               3.0       98256M 0.0000                               1.0     32            on
cephfs_data             0               3.0       98256M 0.0000                               1.0     32            off
cephfs_metadata      7412               3.0       98256M 0.0000                               4.0     16            off
.rgw.root            1245               3.0       98256M 0.0000                               1.0     32            warn
default.rgw.control     0               3.0       98256M 0.0000                               1.0     32            warn
default.rgw.meta      381               3.0       98256M 0.0000                               1.0     32            warn
default.rgw.log     35900               3.0       98256M 0.0000                               1.0     32            warn
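
The power-of-2 rule for PG counts is easy to check before setting a value by hand. The sketch below uses the usual bit trick: a positive number is a power of two exactly when it shares no bits with the number one below it. The function name `is_power_of_two` is hypothetical.

```shell
#!/bin/sh
# is_power_of_two N
# Succeeds when N is a positive power of two (1, 2, 4, 8, ...).
# Hypothetical helper, not part of Ceph.
is_power_of_two() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

for n in 16 32 100 128; do
    if is_power_of_two "$n"; then
        echo "$n: power of two"
    else
        echo "$n: not a power of two"
    fi
done
```

The PG_NUM values in the table above (16 and 32) pass this check.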

Show the cluster health

admin:~ # ceph -s
  cluster:
    id:     <id>
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 4w)
    mgr: mon1(active, since 46h)
    mds: cephfs:1 {0=mon3=up:active} 2 up:standby
    osd: 12 osds: 12 up (since 8w), 12 in (since 8w)
    rgw: 1 daemon active (mon3)

  task status:
    scrub status:
        mds.mon3: idle

  data:
    pools:   7 pools, 433 pgs
    objects: 246 objects, 4.7 KiB
    usage:   13 GiB used, 83 GiB / 96 GiB avail
    pgs:     0.462% pgs not active
             431 active+clean
             2   peering

  io:
    client:   45 KiB/s rd, 0 B/s wr, 44 op/s rd, 28 op/s wr
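
The "0.462% pgs not active" figure above can be reproduced from the per-state PG counts. The following is a minimal Python sketch; the function name is hypothetical, and it assumes that a PG counts as active only when "active" appears as one of the "+"-separated tokens of its state string.

```python
def percent_not_active(pg_states: dict) -> float:
    """Return the percentage of PGs whose state does not include 'active'."""
    total = sum(pg_states.values())
    if total == 0:
        return 0.0
    # Assumption: a state such as "active+clean" is active, "peering" is not.
    inactive = sum(count for state, count in pg_states.items()
                   if "active" not in state.split("+"))
    return 100.0 * inactive / total


# PG state counts copied from the "ceph -s" output above (433 PGs in total)
states = {"active+clean": 431, "peering": 2}
print(f"{percent_not_active(states):.3f}% pgs not active")
```

With 2 of 433 PGs still peering, this prints the same 0.462% that ceph -s reports.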

Task 4: Turn on the Placement Group balancer feature ¶

1). Show the “status” of the balancer:

admin:~ # ceph balancer status
{
    "plans": [],
    "active": false,
    "last_optimize_started": "",
    "last_optimize_duration": "",
    "optimize_result": "",
    "mode": "none"
}

admin:~ # ceph balancer on

admin:~ # ceph balancer status
{
    "plans": [],
    "active": true,
    "last_optimize_started": "Mon Jan  4 20:22:57 2021",
    "last_optimize_duration": "0:00:00.001379",
    "optimize_result": "Please do \"ceph balancer mode\" to choose a valid mode first",
    "mode": "none"
}

2). Set the mode for the balancer to “upmap”:

admin:~ # ceph balancer mode upmap
Error EPERM: min_compat_client "jewel" < "luminous", which is required for pg-upmap. 
Try "ceph osd set-require-min-compat-client luminous" before enabling this mode

admin:~ # ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
set require_min_compat_client to luminous

admin:~ # ceph balancer mode upmap

admin:~ # ceph balancer status
{
    "plans": [],
    "active": true,
    "last_optimize_started": "Mon Jan  4 20:23:57 2021",
    "last_optimize_duration": "0:00:00.001807",
    "optimize_result": "Please do \"ceph balancer mode\" to choose a valid mode first",
    "mode": "upmap"
}

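3). Create a balancer optimization plan called basic-plan. Ceph won’t let you do this here: as the error below shows, the balancer is already enabled in automatic mode, and a manual plan can only be created while it is disabled. In addition, because you just recently enabled the pg_autoscaler, Ceph is moving objects around, and the PGs are quite busy with re-peering.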

admin:~ # ceph balancer optimize basic-plan
Error EINVAL: Balancer enabled, disable to optimize manually
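
The check behind this error can be sketched in Python by inspecting the JSON that "ceph balancer status" prints. This is only an illustration based on the error messages seen in this demo; the function name is hypothetical, and the sample JSON is a shortened copy of the status output above.

```python
import json

# Shortened copy of the "ceph balancer status" output shown in this demo
STATUS_ACTIVE = '{"plans": [], "active": true, "mode": "upmap"}'
STATUS_IDLE = '{"plans": [], "active": false, "mode": "none"}'


def can_optimize_manually(status_json: str) -> bool:
    """Return True if a manual plan could be created.

    Assumption (taken from the EINVAL message in this demo): manual
    "optimize" and "execute" are refused while the balancer is active.
    """
    status = json.loads(status_json)
    return not status["active"]


for label, raw in (("active", STATUS_ACTIVE), ("idle", STATUS_IDLE)):
    print(label, "-> manual plan allowed:", can_optimize_manually(raw))
```

In practice this means running "ceph balancer off" first if you want to create and execute a named plan by hand.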

4). Show the details of the plan. This shows what executing the plan will do, itemizing which PGs will be affected. In this demo the command fails, because basic-plan was never created in the previous step:

admin:~ # ceph balancer show basic-plan
Error ENOENT: plan basic-plan not found   <--- failed here

5). Show the effectiveness of the plan by comparing the current score for the pre-planned balancing and the score for the planned balancing:

admin:~ # ceph balancer eval
current cluster score 0.118731 (lower is better)

admin:~ # ceph balancer eval basic-plan
Error EINVAL: option "basic-plan" not a plan or a pool

6). Show the status of the balancer, now with all of these settings having been set, but before putting them into effect:

The pg_autoscaler has already optimized the balance of PGs sufficiently. That’s because this cluster is very small and has no significant content stored in it yet.

If that’s the case, you would see a message like “Error EALREADY: Unable to find further optimization, or pool(s)' pg_num is decreasing, or distribution is already perfect.” If you receive this message, then you will not be able to complete this task. At some later time in the course you may choose to revisit this task to complete it.

admin:~ # ceph balancer status
{
    "plans": [],
    "active": true,
    "last_optimize_started": "Mon Jan  4 20:32:59 2021",
    "last_optimize_duration": "0:00:00.004170",
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
    "mode": "upmap"
}

7). Put the basic-plan into effect. In this demo this also fails, because the balancer is running in automatic mode:

admin:~ # ceph balancer execute basic-plan
Error EINVAL: Balancer enabled, disable to execute a plan

8). Now re-show the current score for the balanced cluster:

admin:~ # ceph balancer eval
current cluster score 0.118731 (lower is better)

2.1.2. Manipulate Erasure Code Profiles ¶

Task 1: Display a list of the current Erasure Code profiles ¶

admin:~ # ceph osd erasure-code-profile
no valid command found; 4 closest matches:
osd erasure-code-profile set <name> {<profile> [<profile>...]} {--force}
osd erasure-code-profile get <name>
osd erasure-code-profile rm <name>
osd erasure-code-profile ls
Error EINVAL: invalid command

admin:~ # ceph osd erasure-code-profile ls
default

Task 2: Examine the details of the default EC profile ¶

admin:~ # ceph osd erasure-code-profile get default
k=2
m=1
plugin=jerasure
technique=reed_sol_van

Task 3: Create and remove a new EC profile ¶

1. Create a new EC profile from the command line. 
This is going to be a “bad” profile that will be removed in a moment:
admin:~ # ceph osd erasure-code-profile set bad_profile k=2 m=4 plugin=jerasure

admin:~ # ceph osd erasure-code-profile ls
bad_profile
default

admin:~ # ceph osd erasure-code-profile get bad_profile
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=4
plugin=jerasure
technique=reed_sol_van
w=8

admin:~ # ceph osd erasure-code-profile rm bad_profile

admin:~ # ceph osd erasure-code-profile ls
default

Task 4: Create a better EC profile ¶

admin:~ # ceph osd erasure-code-profile set usable_profile k=2 m=1 plugin=jerasure technique=reed_sol_van stripe_unit=4K crush-failure-domain=host

admin:~ # ceph osd erasure-code-profile get usable_profile
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=1
plugin=jerasure
stripe_unit=4K
technique=reed_sol_van
w=8
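
To see why bad_profile (k=2, m=4) is a poor fit for this cluster while usable_profile (k=2, m=1) works, it helps to do the arithmetic. An erasure-coded object is split into k data chunks plus m coding chunks, and with crush-failure-domain=host each chunk needs its own host. The Python sketch below is illustrative only; the class name is my own, and the host count of 4 comes from the data1 to data4 nodes of this demo environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECProfile:
    name: str
    k: int  # number of data chunks
    m: int  # number of coding chunks

    @property
    def chunks(self) -> int:
        """Total chunks per object, each placed in its own failure domain."""
        return self.k + self.m

    @property
    def overhead(self) -> float:
        """Raw space used per unit of user data."""
        return self.chunks / self.k

    def fits(self, failure_domains: int) -> bool:
        """True if there are enough failure domains to place every chunk."""
        return failure_domains >= self.chunks


DATA_HOSTS = 4  # data1..data4 in this demo environment

for p in (ECProfile("bad_profile", 2, 4), ECProfile("usable_profile", 2, 1)):
    print(f"{p.name}: chunks={p.chunks}, overhead={p.overhead:.1f}x, "
          f"tolerates {p.m} lost chunk(s), fits on {DATA_HOSTS} hosts: "
          f"{p.fits(DATA_HOSTS)}")
```

bad_profile needs 6 hosts and triples the raw space used, while usable_profile needs only 3 hosts at 1.5x overhead.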

2.1.3. Manipulate CRUSH Map Rulesets ¶

Task 1: Display a list of the current CRUSH Map rules ¶

admin:~ # ceph osd crush rule ls
replicated_rule

admin:~ # ceph osd crush
                osd crush rule ls
                osd crush rule ls-by-class <class>
                osd crush rule dump {<name>}
                osd crush dump
                osd crush set {<int>}
                osd crush add-bucket <name> <type> {<args> [<args>...]}
                osd crush rename-bucket <srcname> <dstname>
                osd crush set <osdname (id|osd.id)> <float[0.0-]> <args> [<args>...]
                osd crush add <osdname (id|osd.id)> <float[0.0-]> <args> [<args>...]
                osd crush set-all-straw-buckets-to-straw2

admin:~ # ceph osd crush rule
                osd crush rule ls
                osd crush rule ls-by-class <class>
                osd crush rule dump {<name>}
                osd crush rule create-simple <name> <root> <type> {firstn|indep}
                osd crush rule create-replicated <name> <root> <type> {<class>}
                osd crush rule create-erasure <name> {<profile>}
                osd crush rule rm <name>
                osd crush rule rename <srcname> <dstname>

List the existing CRUSH Map rulesets that have been defined according to a particular device class:

admin:~ # ceph osd crush rule ls-by-class hdd

admin:~ # ceph osd crush rule ls-by-class ssd
Error ENOENT: failed to get rules by class 'ssd'

admin:~ # ceph osd crush rule ls-by-class nvme
Error ENOENT: failed to get rules by class 'nvme'

Task 2: Examine the details of the default CRUSH Map rule ¶

Show the details of the default CRUSH Map rule with the dump sub-command:

The “rule_id” and “ruleset” values are just numbers used to keep track of rules, similar to a DB key id. “min_size” and “max_size” are related to how CRUSH behaves when a certain number of replicas are created.

The “steps” section is the most functional portion of the rule, providing an ordered set of rules for how CRUSH should behave.

Note that there are three “op” parts, one each for “take”, “chooseleaf_firstn”, and “emit”.

“take” in a replicated rule is always the first step, and “emit” is always the last step.

The “item_name” in the “take” step is the crush_root value, and the “type” value (“host”) in the “chooseleaf_firstn” step is the failure domain.

admin:~ # ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
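
The fields discussed above can also be pulled out of the dump programmatically. Below is a minimal Python sketch that reads the JSON printed by "ceph osd crush rule dump" and reports the root, the failure domain and the selection mode. The function name and the summary keys are my own; the JSON is the replicated_rule dump from this demo.

```python
import json

RULE_DUMP = """
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {"op": "take", "item": -1, "item_name": "default"},
        {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
        {"op": "emit"}
    ]
}
"""


def summarize_rule(rule: dict) -> dict:
    """Extract CRUSH root, failure domain and selection mode from a rule."""
    summary = {"name": rule["rule_name"], "root": None,
               "failure_domain": None, "mode": None}
    for step in rule["steps"]:
        op = step["op"]
        if op == "take":
            summary["root"] = step["item_name"]
        elif op.startswith(("chooseleaf_", "choose_")):
            summary["failure_domain"] = step["type"]
            summary["mode"] = op.rsplit("_", 1)[1]  # "firstn" or "indep"
    return summary


print(summarize_rule(json.loads(RULE_DUMP)))
```

The "set_..." tuning steps that appear in erasure-coded rules are ignored by this sketch, since they do not start with "choose".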

Task 3: Create and remove a new CRUSH Map rule ¶

1). Create a new CRUSH ruleset from the command line. We made two mistakes here. First, we named it “bud” instead of “bad”.

admin:~ # ceph osd crush rule create-replicated bud_ruleset default host

admin:~ # ceph osd crush rule ls
replicated_rule
bud_ruleset

2). Rename the ruleset:

admin:~ # ceph osd crush rule rename bud_ruleset bad_ruleset

admin:~ # ceph osd crush rule ls
replicated_rule
bad_ruleset

3). The second mistake was that we specified the failure-domain at the host-bucket level.

This is technically not a bad thing to do, in fact it would be a common use case.

But for this demo we want to set the failure domain at the rack-bucket level. We can’t change a defined CRUSH Map ruleset, so delete the bad one:

admin:~ # ceph osd crush rule rm bad_ruleset

admin:~ # ceph osd crush rule ls
replicated_rule

Task 4: Create a better CRUSH Map rule ¶

Create a more appropriate CRUSH Map rule from the CLI, that will survive the failure of a rack:

admin:~ # ceph osd crush rule create-replicated better_ruleset default rack

admin:~ # ceph osd crush rule dump better_ruleset
{
    "rule_id": 1,
    "rule_name": "better_ruleset",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "rack"
        },
        {
            "op": "emit"
        }
    ]
}

Task 5: Create CRUSH Map rules for different classes of devices ¶

1). Create two different CRUSH Map rules from the CLI that will accommodate a slow set of devices (HDDs) and a fast set of devices (SSDs):

The second command fails because the cluster does not have any SSD devices. Note also that the transcript mistypes the device class as “sdd”; the intended class name is “ssd”.

admin:~ # ceph osd crush rule create-replicated slow_devices default host hdd

admin:~ # ceph osd crush rule create-replicated fast_devices default host sdd
Error EINVAL: device class sdd does not exist

2). Display the details of the new “slow” rule:

admin:~ # ceph osd crush rule dump slow_devices
{
    "rule_id": 2,
    "rule_name": "slow_devices",
    "ruleset": 2,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
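
Note the “item_name” of the take step: for a class-specific rule, CRUSH takes from a shadow tree named after the root and the device class, joined with “~”. A tiny Python sketch for splitting such a name (the function name is hypothetical):

```python
def split_take_target(item_name: str):
    """Split a take target such as 'default~hdd' into (root, device_class).

    Returns None as the device class when the name has no '~' part.
    """
    root, sep, device_class = item_name.partition("~")
    return root, (device_class if sep else None)


print(split_take_target("default~hdd"))  # slow_devices rule
print(split_take_target("default"))      # replicated_rule
```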

Task 6: Change the ruleset used by a pool ¶

1). Show which CRUSH Map Ruleset is being used by the cephfs_data pool: The rule should be listed as replicated_rule.

admin:~ # ceph osd pool get cephfs_data crush_rule
crush_rule: replicated_rule

2). Change the cephfs_data pool to use the better_ruleset CRUSH Map rule that you created in Task 4.

admin:~ # ceph osd pool set cephfs_data crush_rule better_ruleset
set pool 2 crush_rule to better_ruleset

3). Verify that the rule has been changed by re-running the earlier command:

admin:~ # ceph osd pool get cephfs_data crush_rule
crush_rule: better_ruleset

4). In this demo cluster, making the cephfs_data pool use the “better_ruleset” will result in problems. (There’s no rack in the CRUSH Map, and not enough nodes to accommodate the requirement for a large number of PGs.)

So change the setting back to the replicated_rule.

admin:~ # ceph osd pool set cephfs_data crush_rule replicated_rule
set pool 2 crush_rule to replicated_rule

admin:~ # ceph osd pool get cephfs_data crush_rule
crush_rule: replicated_rule

Task 7: Create a CRUSH Map rule enhanced with an EC profile

1). Combine the benefits of Erasure Coding with a CRUSH Map rule: This will only work if you have already created an appropriate EC profile called usable_profile.

In this demo you would have done this in an earlier exercise.

Also, in this demo you need to tie this ec_rule to the usable_profile, not the better_profile. Otherwise any pool that you create using the ec_rule will fail due to insufficient resources.

admin:~ # ceph osd crush rule create-erasure ec_rule usable_profile  # Link the CRUSH Map rule (ec_rule) to the EC profile (usable_profile)
created rule ec_rule at 3

P.S. The usable_profile was created by:

admin:~ # ceph osd erasure-code-profile set usable_profile k=2 m=1 plugin=jerasure technique=reed_sol_van stripe_unit=4K crush-failure-domain=host

2). Display the details of the EC-enhanced CRUSH Map rule:

See the added, extra “op” steps. You might also notice the different values for “type,” “min_size,” and “max_size” than what you saw in the standard replicated rules.

admin:~ # ceph osd crush rule dump ec_rule
{
    "rule_id": 3,
    "rule_name": "ec_rule",
    "ruleset": 3,
    "type": 3,
    "min_size": 3,
    "max_size": 3,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_indep",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}



Recap of the CRUSH Map rules defined so far, followed by the commands that created them:

admin:~ # ceph osd crush rule ls
replicated_rule
better_ruleset
slow_devices
ec_rule

admin:~ # ceph osd crush rule create-replicated better_ruleset default rack
admin:~ # ceph osd crush rule create-replicated slow_devices default host hdd
admin:~ # ceph osd crush rule create-erasure ec_rule usable_profile

admin:~ # ceph osd pool
                osd pool stats {<poolname>}
                osd pool scrub <poolname> [<poolname>...]
                osd pool deep-scrub <poolname> [<poolname>...]
                osd pool repair <poolname> [<poolname>...]
                osd pool force-recovery <poolname> [<poolname>...]
                osd pool force-backfill <poolname> [<poolname>...]
                osd pool cancel-force-recovery <poolname> [<poolname>...]
                osd pool cancel-force-backfill <poolname> [<poolname>...]
                osd pool autoscale-status
                osd pool mksnap <poolname> <snap>

admin:~ # ceph osd pool get <poolname> size
                                        min_size
                                        pg_num
                                        pgp_num
                                        crush_rule
                                        hashpspool
                                        nodelete
                                        nopgchange
                                        nosizechange
                                        write_fadvise_dontneed
                                        noscrub
                                        nodeep-scrub
                                        hit_set_type
                                        hit_set_period
                                        hit_set_count
                                        hit_set_fpp
                                        use_gmt_hitset
                                        target_max_objects
                                        target_max_bytes
                                        cache_target_dirty_ratio
                                        cache_target_dirty_high_ratio
                                        cache_target_full_ratio
                                        cache_min_flush_age
                                        cache_min_evict_age
                                        erasure_code_profile
                                        min_read_recency_for_promote
                                        all
                                        min_write_recency_for_promote
                                        fast_read
                                        hit_set_grade_decay_rate
                                        hit_set_search_last_n
                                        scrub_min_interval
                                        scrub_max_interval
                                        deep_scrub_interval
                                        recovery_priority
                                        recovery_op_priority
                                        scrub_priority
                                        compression_mode
                                        compression_algorithm
                                        compression_required_ratio
                                        compression_max_blob_size
                                        compression_min_blob_size
                                        csum_type
                                        csum_min_block
                                        csum_max_block
                                        allow_ec_overwrites
                                        fingerprint_algorithm
                                        pg_autoscale_mode
                                        pg_autoscale_bias
                                        pg_num_min
                                        target_size_bytes
                                        target_size_ratio

2.1.4. Investigate BlueStore ¶

Task 1: Explore the drive_groups.yml configuration ¶

After deployment, the drive_groups.yml file is where the storage administrator defines the configuration of the cluster’s storage devices.

Note the “data_devices” parameter. In this demo, “all” storage devices are data devices for BlueStore.

Note that there are no definitions for “wal_devices” or “db_devices.” That’s because in this demo environment we don’t have any other “fast” devices that would be appropriate for these roles.

Since BlueStore is the default, there is no definition of a “format” for the devices. Otherwise, a “format: bluestore” key-value pair might exist to ensure that BlueStore is used.

admin:~ # cd /srv/salt/ceph/configuration/files

admin:/srv/salt/ceph/configuration/files # cat drive_groups.yml
# default:  <- just a name - can be anything
#   target: 'data*' <- must be resolvable by salt's targeting processor
#   data_devices:
#     size: 20G
#   db_devices:
#     size: 10G
#     rotational: 1
# allflash:
#   target: 'fast_nodes*'
#   data_devices:
#     size: 100G
#   db_devices:
#     size: 50G
#     rotational: 0
# This is the default configuration and
# will create an OSD on all available drives
default:
  target: 'data*'
  data_devices:
    all: true
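
The target value 'data*' is a glob pattern that Salt matches against minion IDs. The Python sketch below mimics that matching with the standard fnmatch module to show which nodes of this demo would receive OSDs. It assumes the short host names are used as minion IDs (in a real deployment they may be FQDNs, which 'data*' would match as well); the function name is my own.

```python
from fnmatch import fnmatchcase

# Host names from the demo environment table
HOSTS = ["admin", "data1", "data2", "data3", "data4", "mon1", "mon2", "mon3"]


def matching_hosts(target: str, hosts: list) -> list:
    """Return the hosts whose name matches the shell-style glob target."""
    return [host for host in hosts if fnmatchcase(host, target)]


print(matching_hosts("data*", HOSTS))
# ['data1', 'data2', 'data3', 'data4']
```

With the commented-out 'fast_nodes*' target from the file above, the same call would return an empty list in this environment.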

Task 2: Examine a storage host’s storage devices ¶

admin:~ # ssh data1
Last login: Tue Jan  5 18:06:40 2021 from 10.58.121.181

You should see 3 disks (sda, sdb, sdc), each carrying an LVM volume whose name starts with “ceph”:

data1:~ # lsblk
NAME                                                                                                 MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                                                                                                    8:0    0    8G  0 disk
└─ceph--14c886af--269d--475f--8ee3--f5e4abbb222d-osd--data--38911b2d--f30a--4b09--9010--8dd6fad2fcc6 254:0    0    8G  0 lvm
sdb                                                                                                    8:16   0    8G  0 disk
└─ceph--9ec4a77a--5d67--4b21--be53--d7e9221082de-osd--data--00cb3dc6--c28b--41ae--95de--efb86da254da 254:1    0    8G  0 lvm
sdc                                                                                                    8:32   0    8G  0 disk
└─ceph--5eaea8a8--bb68--49dd--a1e3--b82c5464ab1f-osd--data--a4a05f70--53d9--41d4--a273--4f47a088968a 254:2    0    8G  0 lvm
sr0                                                                                                   11:0    1  672M  0 rom
vda                                                                                                  253:0    0   20G  0 disk
├─vda1                                                                                               253:1    0    8M  0 part
├─vda2                                                                                               253:2    0 18.4G  0 part /
└─vda3                                                                                               253:3    0  1.7G  0 part [SWAP]

See the raw ceph devices

data1:~ # ls -lad /dev/ceph*
drwxr-xr-x 2 root root 60 Oct  5 13:15 /dev/ceph-14c886af-269d-475f-8ee3-f5e4abbb222d
drwxr-xr-x 2 root root 60 Oct  5 13:16 /dev/ceph-5eaea8a8-bb68-49dd-a1e3-b82c5464ab1f
drwxr-xr-x 2 root root 60 Oct  5 13:15 /dev/ceph-9ec4a77a-5d67-4b21-be53-d7e9221082de

Dig down even further by examining the content of one of the directories. You will see a symlink to an LVM device-mapper device.

All the devices are tied together with LVM. Note that the name of the symlink starts with osd-data-.

data1:~ # ls -l /dev/ceph-14c886af-269d-475f-8ee3-f5e4abbb222d
lrwxrwxrwx 1 ceph ceph 7 Oct  5 13:15 osd-data-38911b2d-f30a-4b09-9010-8dd6fad2fcc6 -> ../dm-0

data1:~ # l /dev/dm*
brw-rw---- 1 ceph ceph 254, 0 Jan  5 18:10 /dev/dm-0
brw-rw---- 1 ceph ceph 254, 1 Jan  5 18:10 /dev/dm-1
brw-rw---- 1 ceph ceph 254, 2 Jan  5 18:10 /dev/dm-2
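
The long names in the lsblk output follow from the way device-mapper names LVM volumes: every hyphen inside the volume group name and the logical volume name is doubled, and the two are then joined with a single hyphen. A minimal Python sketch (the function name is my own) that reproduces the name of the first device:

```python
def dm_name(vg: str, lv: str) -> str:
    """Build the device-mapper name for an LVM logical volume.

    Hyphens inside the VG and LV names are doubled so that the single
    hyphen separating VG from LV stays unambiguous.
    """
    def escape(name: str) -> str:
        return name.replace("-", "--")

    return f"{escape(vg)}-{escape(lv)}"


# VG directory and LV symlink name taken from the listings above
vg = "ceph-14c886af-269d-475f-8ee3-f5e4abbb222d"
lv = "osd-data-38911b2d-f30a-4b09-9010-8dd6fad2fcc6"
print(dm_name(vg, lv))
```

The printed name is the one lsblk shows underneath sda.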

Task 3: Examine a storage host’s OSD details ¶

data1:~ # cd /var/lib/ceph/
data1:/var/lib/ceph # ls -l
drwxr-x--- 1 ceph ceph  0 Aug 24 22:03 bootstrap-mds
drwxr-x--- 1 ceph ceph  0 Aug 24 22:03 bootstrap-mgr
drwxr-x--- 1 ceph ceph 24 Oct  5 13:15 bootstrap-osd
drwxr-x--- 1 ceph ceph  0 Aug 24 22:03 bootstrap-rbd
drwxr-x--- 1 ceph ceph  0 Aug 24 22:03 bootstrap-rbd-mirror
drwxr-x--- 1 ceph ceph  0 Aug 24 22:03 bootstrap-rgw
drwxr-x--- 1 ceph ceph 12 Oct  5 09:04 crash
drwxr-x--- 1 ceph ceph  0 Aug 24 22:03 mds
drwxr-x--- 1 ceph ceph  0 Aug 24 22:03 mgr
drwxr-x--- 1 ceph ceph  0 Aug 24 22:03 mon
drwxr-x--- 1 ceph ceph 38 Oct  5 13:16 osd
drwxr-x--- 1 ceph ceph  0 Aug 24 22:03 tmp

You will see 3 sub-directories, one for each of the 3 OSDs (ceph-2, ceph-6, ceph-10) that are running on this storage server

data1:/var/lib/ceph # cd osd/

data1:/var/lib/ceph/osd # ls -l  
drwxrwxrwt 2 ceph ceph 320 Oct  5 13:16 ceph-10
drwxrwxrwt 2 ceph ceph 320 Oct  5 13:15 ceph-2
drwxrwxrwt 2 ceph ceph 320 Oct  5 13:16 ceph-6

Here you will see some functional files associated with the OSD and BlueStore, including a block file. It is a symlink to one of the ceph devices, which stores the raw objects for the OSD.

data1:/var/lib/ceph/osd # cd ceph-2

data1:/var/lib/ceph/osd/ceph-2 # ls -l  
-rw-r--r-- 1 ceph ceph 400 Oct  5 13:15 activate.monmap
lrwxrwxrwx 1 ceph ceph  92 Oct  5 13:15 block -> /dev/ceph-14c886af-269d-475f-8ee3-f5e4abbb222d/osd-data-38911b2d-f30a-4b09-9010-8dd6fad2fcc6
-rw------- 1 ceph ceph   2 Oct  5 13:15 bluefs
-rw------- 1 ceph ceph  37 Oct  5 13:15 ceph_fsid
-rw-r--r-- 1 ceph ceph  37 Oct  5 13:15 fsid
-rw------- 1 ceph ceph  55 Oct  5 13:15 keyring
-rw------- 1 ceph ceph   8 Oct  5 13:15 kv_backend
-rw------- 1 ceph ceph  21 Oct  5 13:15 magic
-rw------- 1 ceph ceph   4 Oct  5 13:15 mkfs_done
-rw------- 1 ceph ceph  41 Oct  5 13:15 osd_key
-rw------- 1 ceph ceph   6 Oct  5 13:15 ready
-rw------- 1 ceph ceph   3 Oct  5 13:15 require_osd_release
-rw------- 1 ceph ceph  10 Oct  5 13:15 type
-rw------- 1 ceph ceph   2 Oct  5 13:15 whoami


data1:/var/lib/ceph/osd/ceph-2 # cat ceph_fsid  # The unique ID of this Ceph cluster
343ee7d3-232f-4c71-8216-1edbc55ac6e0  

data1:/var/lib/ceph/osd/ceph-2 # cat fsid  # The unique ID of this OSD
6df58ebc-dbfe-4822-9714-90212c06ea05

data1:/var/lib/ceph/osd/ceph-2 # cat keyring  # The Ceph key for this OSD
[osd.2]
key = <your key>

data1:/var/lib/ceph/osd/ceph-2 # cat ready  # Indication of the readiness of this OSD
ready

data1:/var/lib/ceph/osd/ceph-2 # cat type  # filestore or bluestore (in this case: bluestore)
bluestore

data1:/var/lib/ceph/osd/ceph-2 # cat whoami  # The integer id of this OSD (in this case: 2)
2
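
Instead of running cat on each file, the same small metadata files can be collected in one go. The Python sketch below is only an illustration: the function name and the list of files are my own choice, and the demo runs against a throwaway directory that mimics /var/lib/ceph/osd/ceph-2 so it can be tried on any machine.

```python
import tempfile
from pathlib import Path

# Small single-line metadata files seen in the OSD directory above
META_FILES = ("ceph_fsid", "fsid", "ready", "type", "whoami")


def read_osd_meta(osd_dir) -> dict:
    """Read the one-line metadata files of an OSD directory into a dict."""
    meta = {}
    for name in META_FILES:
        path = Path(osd_dir) / name
        if path.is_file():
            meta[name] = path.read_text().strip()
    return meta


# Demo against a temporary directory (not a real OSD)
with tempfile.TemporaryDirectory() as tmp:
    for name, value in {"type": "bluestore", "whoami": "2", "ready": "ready"}.items():
        (Path(tmp) / name).write_text(value + "\n")
    demo_meta = read_osd_meta(tmp)

print(demo_meta)
```

On a storage node you would call read_osd_meta("/var/lib/ceph/osd/ceph-2") as root or as the ceph user, since most of the files are readable only by their owner.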

Task 4: Display BlueStore information using ceph-bluestore-tool ¶

Show BlueStore metadata for osd.2:

data1:/var/lib/ceph/osd/ceph-2 # ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-2
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-2/block": {
        "osd_uuid": "6df58ebc-dbfe-4822-9714-90212c06ea05",
        "size": 8585740288,
        "btime": "2020-10-05 13:15:51.227799",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "343ee7d3-232f-4c71-8216-1edbc55ac6e0",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": <your key>,
        "ready": "ready",
        "require_osd_release": "14",
        "whoami": "2"
    }
}

Run a manual consistency check (“fsck”) on osd.2 using ceph-bluestore-tool. This returns an error, because the tool won’t allow you to do this while the OSD is running.

data1:/var/lib/ceph/osd/ceph-2 # ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2
error from fsck: (11) Resource temporarily unavailable
2021-01-05 18:32:25.528 7f4abad6e180 -1 bluestore(/var/lib/ceph/osd/ceph-2) _lock_fsid failed to lock /var/lib/ceph/osd/ceph-2/fsid (is another ceph-osd still running?)(11) Resource temporarily unavailable

Simulate the OSD being down by stopping it:

data1:/var/lib/ceph/osd/ceph-2 # systemctl stop ceph-osd@2.service

Now run the “fsck” command again. This time the “fsck” has worked, with the output showing: “fsck success”

data1:/var/lib/ceph/osd/ceph-2 # ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2
fsck success

Restart the OSD:

data1:/var/lib/ceph/osd/ceph-2 # systemctl start ceph-osd@2.service

data1:/var/lib/ceph/osd/ceph-2 # ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-2
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-2/block": {
        "osd_uuid": "6df58ebc-dbfe-4822-9714-90212c06ea05",
        "size": 8585740288,
        "btime": "2020-10-05 13:15:51.227799",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "343ee7d3-232f-4c71-8216-1edbc55ac6e0",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": <your key>,
        "ready": "ready",
        "require_osd_release": "14",
        "whoami": "2"
    }
}
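
The label output is plain JSON, so it is easy to post-process. As a minimal Python sketch (hypothetical function name, and a shortened copy of the label above without the key), the following converts the size to MiB and picks out the OSD id and key-value backend:

```python
import json

# Shortened copy of the show-label output above (osd_key omitted)
LABEL = """
{
    "/var/lib/ceph/osd/ceph-2/block": {
        "osd_uuid": "6df58ebc-dbfe-4822-9714-90212c06ea05",
        "size": 8585740288,
        "description": "main",
        "kv_backend": "rocksdb",
        "require_osd_release": "14",
        "whoami": "2"
    }
}
"""


def summarize_labels(label_json: str) -> list:
    """Return one summary row per labelled BlueStore device."""
    rows = []
    for device, label in json.loads(label_json).items():
        rows.append({
            "device": device,
            "osd_id": int(label["whoami"]),
            "size_mib": label["size"] / 2**20,  # bytes -> MiB
            "kv_backend": label["kv_backend"],
        })
    return rows


for row in summarize_labels(LABEL):
    print(row)
```

The size works out to 8188 MiB, that is, the 8 GiB disk minus 4 MiB.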

2.2. Common Day 1 Tasks Using the CLI ¶

This section covers the following topics in relation to the command line:

Users and Ceph Configuration
Health commands
Erasure Code Profiles
CRUSH Map rules
Pools
Scrubbing OSDs and Placement Groups
Manager modules
The tell commands

2.2.1. Ceph Users and Configuration ¶

Task 1: View the current user keyrings ¶

Ceph keyrings are stored in the directory below:

admin:~ # cd /etc/ceph/

admin:/etc/ceph # ls -l
-rw------- 1 root root 151 Oct  5 13:13 ceph.client.admin.keyring
-rw-r--r-- 1 root root 980 Oct  5 13:13 ceph.conf
-rw-r--r-- 1 root root  92 Aug 24 22:03 rbdmap

The value of “key” is the key that’s on the keyring. As expected, the admin keyring is allowed all capabilities (permissions) on all services in the cluster. Note that there are more than just client keys.

admin:/etc/ceph # cat ceph.client.admin.keyring
[client.admin]
        key = <your key>
        caps mds = "allow *"
        caps mon = "allow *"
        caps osd = "allow *"
        caps mgr = "allow *"

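Display the existing users with the “auth” command. The two commands below are intended to be equivalent. In this transcript the first one fails, probably because the option is mistyped as -keyring (the long option is spelled --keyring).

admin:/etc/ceph # ceph -n client.admin -keyring=/etc/ceph/ceph.client.admin.keyring auth ls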
no valid command found

admin:/etc/ceph # ceph auth ls
installed auth entries:

mds.mon1
        key: <your key>
        caps: [mds] allow *
        caps: [mgr] allow profile mds
        caps: [mon] allow profile mds
        caps: [osd] allow rwx
mds.mon2
        key: <your key>
        caps: [mds] allow *
        caps: [mgr] allow profile mds
        caps: [mon] allow profile mds
        caps: [osd] allow rwx
mds.mon3
        key: <your key>
        caps: [mds] allow *
        caps: [mgr] allow profile mds
        caps: [mon] allow profile mds
        caps: [osd] allow rwx
osd.0
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.10
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.11
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.3
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.4
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.5
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.6
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.7
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.8
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.9
        key: <your key>
        caps: [mgr] allow profile osd
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: <your key>
        caps: [mds] allow *
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: <your key>
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
        key: <your key>
        caps: [mon] allow profile bootstrap-mgr
client.bootstrap-osd
        key: <your key>
        caps: [mgr] allow r
        caps: [mon] allow profile bootstrap-osd
client.bootstrap-rbd
        key: <your key>
        caps: [mon] allow profile bootstrap-rbd
client.bootstrap-rbd-mirror
        key: <your key>
        caps: [mon] allow profile bootstrap-rbd-mirror
client.bootstrap-rgw
        key: <your key>
        caps: [mon] allow profile bootstrap-rgw
client.igw.mon2
        key: <your key>
        caps: [mgr] allow r
        caps: [mon] allow *
        caps: [osd] allow *
client.rgw.mon3
        key: <your key>
        caps: [mgr] allow r
        caps: [mon] allow rwx
        caps: [osd] allow rwx
client.storage
        key: <your key>
        caps: [mon] allow rw
mgr.mon1
        key: <your key>
        caps: [mds] allow *
        caps: [mon] allow profile mgr
        caps: [osd] allow *

Task 2: Create a new keyring and associated user ¶

1). There are several ways to create a new keyring and user; this is just one of them. Create a new keyring and an associated user named client.james. Remember that a new user typically needs read rights for the mon capability and read/write rights for the osd capability, with those rights scoped to a pool.

admin:/etc/ceph # ceph-authtool -g -n client.james --cap mon 'allow r' --cap osd 'allow rw pool=iscsi-images' -C /etc/ceph/ceph.client.james.keyring
creating /etc/ceph/ceph.client.james.keyring

admin:/etc/ceph # l
total 16
drwxr-xr-x 1 root root  130 Jan  5 19:31 ./
drwxr-xr-x 1 root root 4392 Oct  5 13:03 ../
-rw------- 1 root root  151 Oct  5 13:13 ceph.client.admin.keyring
-rw------- 1 root root  126 Jan  5 19:31 ceph.client.james.keyring
-rw-r--r-- 1 root root  980 Oct  5 13:13 ceph.conf
-rw-r--r-- 1 root root   92 Aug 24 22:03 rbdmap

2). Show the content of the newly created keyring:

admin:/etc/ceph # cat ceph.client.james.keyring
[client.james]
        key = <your key>
        caps mon = "allow r"
        caps osd = "allow rw pool=iscsi-images"

3). Register the new keyring with the Ceph cluster:

admin:/etc/ceph # ceph auth add client.james -i /etc/ceph/ceph.client.james.keyring
added key for client.james

4). Show the key information using the “ceph auth get” command:

admin:/etc/ceph # ceph auth get client.james
exported keyring for client.james
[client.james]
        key = <your key>
        caps mon = "allow r"
        caps osd = "allow rw pool=iscsi-images"
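
To check which capabilities a keyring file grants without reading it by eye, the keyring layout shown above can be parsed with awk. This is only a minimal sketch: keyring_caps is a helper name I made up for this demo, and the sample file uses a placeholder key instead of a real one.

```shell
# keyring_caps FILE (hypothetical helper, not part of Ceph)
# Print one line per capability: "<entity> <daemon type> <capability>".
keyring_caps() {
    awk '
        /^\[.*\]$/ {                                  # section header, e.g. [client.james]
            entity = substr($0, 2, length($0) - 2)
            next
        }
        $1 == "caps" {                                # e.g. caps mon = "allow r"
            daemon = $2
            value = $0
            sub(/^[^=]*=[ \t]*/, "", value)           # drop everything up to the first "="
            gsub(/"/, "", value)                      # strip the surrounding quotes
            print entity, daemon, value
        }
    ' "$1"
}

# Sample data in the same layout as the keyring above (placeholder key).
sample=$(mktemp)
cat > "$sample" <<'EOF'
[client.james]
        key = PLACEHOLDERKEY==
        caps mon = "allow r"
        caps osd = "allow rw pool=iscsi-images"
EOF

keyring_caps "$sample"
# client.james mon allow r
# client.james osd allow rw pool=iscsi-images
rm -f "$sample"
```

On the admin node the same function could be pointed at /etc/ceph/ceph.client.james.keyring.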

Task 3: Create a client key for RBD ¶

1). Change to the directory that contains the ceph keyrings.

admin:~ # cd /etc/ceph/

2). List the content of the directory. Although you see the admin user’s keyring, ceph.client.admin.keyring, there is not yet a file that a specific application can use. Also note that the permissions on the keyring file are quite restrictive: 0600.

admin:/etc/ceph # ls -l
-rw------- 1 root root 151 Oct  5 13:13 ceph.client.admin.keyring
-rw------- 1 root root 126 Jan  5 19:31 ceph.client.james.keyring
-rw-r--r-- 1 root root 980 Oct  5 13:13 ceph.conf
-rw-r--r-- 1 root root  92 Aug 24 22:03 rbdmap

3). Show the content of the admin user’s keyring. You will use the value of the “key” entry to create a new file. Copy that value using your favorite method.

admin:/etc/ceph # cat ceph.client.admin.keyring
[client.admin]
        key = <your key>
        caps mds = "allow *"
        caps mon = "allow *"
        caps osd = "allow *"
        caps mgr = "allow *"

4). Open a new file called admin.secret in your favorite editor (such as vi). The name of the file isn’t very important, but naming it this way helps to identify its purpose: it holds the secret key of the admin user. Note that there are many ways to do this. The tip below shows an alternative that does it in one step using grep and awk.

admin:/etc/ceph # vi admin.secret

5). Paste the “key” value into the new file. It must be the only content of the file. It will look like this (if you are using the provided demo environment, it is probably identical):

admin:/etc/ceph # cat admin.secret
<your key>

6). Save the file and exit the editor.

7). Change the permissions of the file so that no other user on the host can see the content of the file:

admin:/etc/ceph # chmod 0600 admin.secret

admin:/etc/ceph # l
drwxr-xr-x 1 root root  154 Jan  5 20:03 ./
drwxr-xr-x 1 root root 4392 Oct  5 13:03 ../
-rw------- 1 root root   41 Jan  5 20:03 admin.secret
-rw------- 1 root root  151 Oct  5 13:13 ceph.client.admin.keyring
-rw------- 1 root root  126 Jan  5 19:31 ceph.client.james.keyring
-rw-r--r-- 1 root root  980 Oct  5 13:13 ceph.conf
-rw-r--r-- 1 root root   92 Aug 24 22:03 rbdmap

Tip:

An alternative way to create this key file is to simply use grep/awk together in one bash command, like this:

admin:/etc/ceph # grep "key =" ceph.client.admin.keyring | awk -F" = " '{ print $2 }'
<your key>

admin:/etc/ceph # grep "key =" ceph.client.admin.keyring | awk -F" = " '{ print $2 }' > admin.secret

admin:/etc/ceph # cat admin.secret
<your key>
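
Note that a file created by shell redirection gets the default permissions of the shell, so the chmod from step 7 still applies. A possible variant, sketched below, does the extraction with awk alone and sets a restrictive umask first, so a newly created secret file starts out as 0600. extract_key is a hypothetical helper name and the key in the sample is a placeholder.

```shell
# extract_key KEYRING SECRET (hypothetical helper)
# Write only the key value from KEYRING into SECRET.
extract_key() {
    (
        # The umask only affects files that are newly created in this subshell;
        # an already existing SECRET file keeps its current mode.
        umask 077
        awk '$1 == "key" && $2 == "=" { print $3 }' "$1" > "$2"
    )
}

# Demo on a throwaway copy with a placeholder key.
workdir=$(mktemp -d)
cat > "$workdir/ceph.client.admin.keyring" <<'EOF'
[client.admin]
        key = PLACEHOLDERKEY==
        caps mds = "allow *"
        caps mon = "allow *"
EOF

extract_key "$workdir/ceph.client.admin.keyring" "$workdir/admin.secret"
cat "$workdir/admin.secret"      # PLACEHOLDERKEY==
ls -l "$workdir/admin.secret"    # mode should read -rw-------
rm -rf "$workdir"
```

On the admin node this would be called as extract_key /etc/ceph/ceph.client.admin.keyring /etc/ceph/admin.secret.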

Task 4: View the Ceph master configuration file ¶

View the content of the file. The file is managed by DeepSea, and the comment at the top refers to the control files in the /srv/salt/ceph/configuration/ directory hierarchy. This is a very simple storage cluster; a more diverse and sophisticated Ceph cluster may define more configuration settings. This exercise doesn’t call out anything more specific about the configuration file, but take a moment to review its content before finishing the task.

admin:/etc/ceph # cat ceph.conf
# DeepSea default configuration. Changes in this file will be overwritten on
# package update. Include custom configuration fragments in
# /srv/salt/ceph/configuration/files/ceph.conf.d/[global,osd,mon,mgr,mds,client].conf
[global]
fsid = 343ee7d3-232f-4c71-8216-1edbc55ac6e0
mon_initial_members = mon1, mon2, mon3
mon_host = 10.58.121.186, 10.58.121.187, 10.58.121.188
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 10.58.120.0/23
cluster_network = 10.58.120.0/23
ms_bind_msgr2 = false
# enable old ceph health format in the json output. This fixes the
# ceph_exporter. This option will only stay until the prometheus plugin takes
# over
mon_health_preluminous_compat = true
mon health preluminous compat warning = false
rbd default features = 3
[client.rgw.mon3]
rgw frontends = "beast port=80"
rgw dns name = mon3.sha.me.corp
rgw enable usage log = true
[osd]
[mon]
[mgr]
[mds]
[client]


admin:/etc/ceph # ls -l /srv/salt/ceph/configuration/
drwxr-xr-x 1 salt salt  18 Oct  5 13:13 cache
drwxr-xr-x 1 root root  38 Oct  5 09:04 check
drwxr-xr-x 1 root root  74 Oct  5 09:04 create
-rw-r--r-- 1 root root 217 May 14  2020 default-import.sls
-rw-r--r-- 1 root root 222 May 14  2020 default.sls
drwxr-xr-x 1 root root 276 Oct  5 12:55 files
-rw-r--r-- 1 root root  74 May 14  2020 init.sls

2.2.2. Run the Ceph Health Commands ¶

Get overall health status

admin:~ # ceph health
HEALTH_OK

admin:~ # ceph -s
admin:~ # ceph status
  cluster:
    id:     343ee7d3-232f-4c71-8216-1edbc55ac6e0
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 9w)
    mgr: mon1(active, since 5w)
    mds: cephfs:1 {0=mon3=up:active} 2 up:standby
    osd: 12 osds: 12 up (since 98m), 12 in (since 3M)
    rgw: 1 daemon active (mon3)

  task status:
    scrub status:
        mds.mon3: idle

  data:
    pools:   7 pools, 208 pgs
    objects: 246 objects, 4.7 KiB
    usage:   14 GiB used, 82 GiB / 96 GiB avail
    pgs:     208 active+clean

  io:
    client:   852 B/s rd, 0 op/s rd, 0 op/s wr
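
For a cron job or a monitoring check, the first word of the ceph health output can be turned into an exit code. The sketch below assumes the output starts with HEALTH_OK, HEALTH_WARN or HEALTH_ERR; the function name and the exit code mapping (0, 1, 2, and 3 for anything unknown) are my own choice, not something Ceph defines.

```shell
# health_exit_code "STATUS TEXT" (hypothetical helper)
# Map the leading health keyword to an exit code.
health_exit_code() {
    case "$1" in
        HEALTH_OK*)   return 0 ;;
        HEALTH_WARN*) return 1 ;;
        HEALTH_ERR*)  return 2 ;;
        *)            return 3 ;;   # empty or unexpected output
    esac
}

# Demo with fixed strings; the detail text after HEALTH_WARN is made up.
for status in "HEALTH_OK" "HEALTH_WARN (made-up detail text)" "HEALTH_ERR"; do
    rc=0
    health_exit_code "$status" || rc=$?
    echo "$status -> exit code $rc"
done
```

On the admin node the call would be health_exit_code "$(ceph health)".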

Run the “status” command for the monitors:

admin:~ # ceph mon stat
e1: 3 mons at {
                    mon1=[v2:10.58.121.186:3300/0,v1:10.58.121.186:6789/0],
                    mon2=[v2:10.58.121.187:3300/0,v1:10.58.121.187:6789/0],
                    mon3=[v2:10.58.121.188:3300/0,v1:10.58.121.188:6789/0]
                }, 
                election epoch 22, 
                leader 0 mon1, 
                quorum 0,1,2 mon1,mon2,mon3

Run the “status” command for the placement groups:

admin:~ # ceph pg stat
208 pgs: 208 active+clean; 4.7 KiB data, 2.1 GiB used, 82 GiB / 96 GiB avail; 852 B/s rd, 0 op/s
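
The pg stat line can also be checked from a script. The sketch below compares the total number of placement groups with the number reported as active+clean. It assumes the layout seen above, "N pgs: X state, Y state; ...", with several states separated by commas when the cluster is not fully clean; pgs_all_clean is a hypothetical name.

```shell
# pgs_all_clean (hypothetical helper)
# Read one "ceph pg stat" line on stdin, print "<clean>/<total> active+clean",
# and return 0 only if every PG is active+clean (2 if there was no input).
pgs_all_clean() {
    awk '
        NR == 1 {
            seen = 1
            total = $1 + 0                       # "208 pgs: ..."
            clean = 0
            states = $0
            sub(/^[^:]*:[ \t]*/, "", states)     # drop the "208 pgs: " prefix
            sub(/;.*$/, "", states)              # drop usage and I/O figures
            n = split(states, parts, /,[ \t]*/)
            for (i = 1; i <= n; i++) {
                split(parts[i], f, " ")          # f[1] = count, f[2] = state
                if (f[2] == "active+clean") clean = f[1] + 0
            }
        }
        END {
            if (!seen) exit 2
            printf "%d/%d active+clean\n", clean, total
            exit (clean == total ? 0 : 1)
        }
    '
}

# Demo with the line shown above.
echo "208 pgs: 208 active+clean; 4.7 KiB data, 2.1 GiB used, 82 GiB / 96 GiB avail" | pgs_all_clean
# 208/208 active+clean
```

On the admin node this would be used as ceph pg stat | pgs_all_clean.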

Run the ceph “status” command while watching for changes to the status:

admin:~ # ceph -s --watch-debug
  cluster:
    id:     343ee7d3-232f-4c71-8216-1edbc55ac6e0
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 9w)
    mgr: mon1(active, since 5w)
    mds: cephfs:1 {0=mon3=up:active} 2 up:standby
    osd: 12 osds: 12 up (since 104m), 12 in (since 3M)
    rgw: 1 daemon active (mon3)

  task status:
    scrub status:
        mds.mon3: idle

  data:
    pools:   7 pools, 208 pgs
    objects: 246 objects, 4.7 KiB
    usage:   14 GiB used, 82 GiB / 96 GiB avail
    pgs:     208 active+clean

  io:
    client:   1.2 KiB/s rd, 1 op/s rd, 0 op/s wr


2021-01-05 20:20:53.947298 mgr.mon1 [DBG] pgmap v1597415: 208 pgs: 208 active+clean; 4.7 KiB data, 2.1 GiB used, 82 GiB / 96 GiB avail; 852 B/s rd, 0 op/s
2021-01-05 20:20:55.949294 mgr.mon1 [DBG] pgmap v1597416: 208 pgs: 208 active+clean; 4.7 KiB data, 2.1 GiB used, 82 GiB / 96 GiB avail; 1.2 KiB/s rd, 1 op/s
.......

2.2.3. Manipulate Pools ¶

Task 1: Display a list of the current pools ¶

admin:~ # ceph osd pool ls
iscsi-images
cephfs_data
cephfs_metadata
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log

admin:~ # ceph osd pool ls detail
pool 1 'iscsi-images' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 448 lfor 0/448/446 flags hashpspool stripe_width 0 application rbd
pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 1395 lfor 0/1374/1372 flags hashpspool stripe_width 0 application cephfs
pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 1385 lfor 0/975/973 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 31 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 33 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 35 flags hashpspool stripe_width 0 application rgw
pool 7 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 37 flags hashpspool stripe_width 0 application rgw

List pools with their index number. Note how the index number matches the index number of the detail listing above.

admin:~ # ceph osd lspools
1 iscsi-images
2 cephfs_data
3 cephfs_metadata
4 .rgw.root
5 default.rgw.control
6 default.rgw.meta
7 default.rgw.log
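
The lines of ceph osd pool ls detail are long, so a short summary per pool can be easier to read. The awk sketch below pulls out the pool name, the replica size, pg_num and the autoscale mode. It relies on the "keyword value" layout of the detail lines shown earlier; pool_summary is a hypothetical name, and pools whose line carries no autoscale_mode field are reported as "unset".

```shell
# pool_summary (hypothetical helper)
# Read "ceph osd pool ls detail" lines on stdin and print a short summary.
pool_summary() {
    awk '
        $1 == "pool" {
            name = $3
            gsub(/\047/, "", name)               # strip the single quotes around the name
            size = "-"; pgnum = "-"; mode = "unset"
            for (i = 4; i < NF; i++) {           # scan the "keyword value" pairs
                if ($i == "size") size = $(i + 1)
                else if ($i == "pg_num") pgnum = $(i + 1)
                else if ($i == "autoscale_mode") mode = $(i + 1)
            }
            print name, "size=" size, "pg_num=" pgnum, "autoscale=" mode
        }
    '
}

# Demo with two lines copied from the detail listing.
pool_summary <<'EOF'
pool 1 'iscsi-images' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 448 lfor 0/448/446 flags hashpspool stripe_width 0 application rbd
pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 1395 lfor 0/1374/1372 flags hashpspool stripe_width 0 application cephfs
EOF
# iscsi-images size=3 pg_num=32 autoscale=on
# cephfs_data size=3 pg_num=32 autoscale=unset
```

On the admin node this would be used as ceph osd pool ls detail | pool_summary.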

Task 2: Display the usage data and stats of the current pools ¶

Display pool usage:

RAW STORAGE:
    CLASS     SIZE
    hdd       96 GiB
    TOTAL     96 GiB

POOLS:
    POOL
    iscsi-images
    cephfs_data
    cephfs_metadata
    .rgw.root
    default.rgw.control
    default.rgw.meta
    default.rgw.log

admin:~ # ceph df detail
RAW STORAGE:
    CLASS     SIZE
    hdd       96 GiB
    TOTAL     96 GiB

POOLS:
    POOL
    iscsi-images
    cephfs_data
    cephfs_metadata
    .rgw.root
    default.rgw.control
    default.rgw.meta
    default.rgw.log
