First, get a Docker Swarm cluster running.
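If you don't already have one, initializing a swarm looks roughly like this (the address is just an example; use your manager node's IP):
docker swarm init --advertise-addr 192.168.1.10
# then run the "docker swarm join --token ..." command it prints on each additional node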
Create an overlay network named ceph_network:
docker network create -d overlay ceph_network
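You can confirm it was created with:
docker network ls --filter name=ceph_network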
Deploy etcd, which will be used to store the Ceph configuration.
version: '3.5'
services:
  etcd0:
    build: etcd
    image: quay.io/coreos/etcd
    ports:
      - 2379:2379
      - 2380:2380
    volumes:
      - etcd0:/etcd_data
    command:
      - /usr/local/bin/etcd
      - --name
      - etcd0
      - --data-dir
      - /etcd_data
      - --advertise-client-urls
      - http://etcd0:2379
      - --listen-client-urls
      - http://0.0.0.0:2379
      - --initial-advertise-peer-urls
      - http://etcd0:2380
      - --listen-peer-urls
      - http://0.0.0.0:2380
      - --initial-cluster
      - etcd0=http://etcd0:2380,etcd1=http://etcd1:2380,etcd2=http://etcd2:2380
  etcd1:
    build: etcd
    image: quay.io/coreos/etcd
    ports:
      - 2379
      - 2380
    volumes:
      - etcd1:/etcd_data
    command:
      - /usr/local/bin/etcd
      - --name
      - etcd1
      - --data-dir
      - /etcd_data
      - --advertise-client-urls
      - http://etcd1:2379
      - --listen-client-urls
      - http://0.0.0.0:2379
      - --initial-advertise-peer-urls
      - http://etcd1:2380
      - --listen-peer-urls
      - http://0.0.0.0:2380
      - --initial-cluster
      - etcd0=http://etcd0:2380,etcd1=http://etcd1:2380,etcd2=http://etcd2:2380
  etcd2:
    build: etcd
    image: quay.io/coreos/etcd
    ports:
      - 2379
      - 2380
    volumes:
      - etcd2:/etcd_data
    command:
      - /usr/local/bin/etcd
      - --name
      - etcd2
      - --data-dir
      - /etcd_data
      - --advertise-client-urls
      - http://etcd2:2379
      - --listen-client-urls
      - http://0.0.0.0:2379
      - --initial-advertise-peer-urls
      - http://etcd2:2380
      - --listen-peer-urls
      - http://0.0.0.0:2380
      - --initial-cluster
      - etcd0=http://etcd0:2380,etcd1=http://etcd1:2380,etcd2=http://etcd2:2380
volumes:
  etcd0:
  etcd1:
  etcd2:
networks:
  default:
    external:
      name: ceph_network
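Save this as a compose file and deploy it as a stack (the file and stack names below are just examples), then check that the three members formed a cluster:
docker stack deploy -c etcd.yml etcd
# run this on the node where the etcd0 task landed; the container name prefix comes from the stack name
docker exec -it $(docker ps -qf name=etcd_etcd0) etcdctl member list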
Run the container below to initialize the Ceph configuration in etcd.
version: '3.5'
services:
  populate_kvstore:
    image: ceph/daemon
    command: populate_kvstore
    environment:
      KV_TYPE: etcd
      KV_IP: etcd0
      KV_PORT: 2379
    deploy:
      # the job runs once and exits; stop swarm from restarting it endlessly
      restart_policy:
        condition: none
networks:
  default:
    external:
      name: ceph_network
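Deploy it as a stack (again, the file and stack names are just examples) and check the job's output:
docker stack deploy -c populate-kvstore.yml ceph-init
# the task writes the default Ceph configuration keys into etcd and then exits
docker service logs ceph-init_populate_kvstore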
This initialization step has a bug: it misses one directory. Enter the etcd0 console and create it manually:
etcdctl mkdir /ceph-config/ceph/client_host
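For example, via docker exec on the node where the etcd0 task is running (etcdctl mkdir is an etcd v2 command; the name filter is an assumption based on the stack naming used above):
docker exec -it $(docker ps -qf name=etcd0) etcdctl mkdir /ceph-config/ceph/client_host
# verify it now exists alongside the directories populate_kvstore created
docker exec -it $(docker ps -qf name=etcd0) etcdctl ls /ceph-config/ceph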
Deploy the stack below to run the Ceph mon, mgr, and mds services.
version: '3.5'
services:
  mon:
    image: ceph/daemon
    command: mon
    privileged: true
    extra_hosts:
      - "etcd0:127.0.0.1"
      - "etcd1:127.0.0.1"
      - "etcd2:127.0.0.1"
    volumes:
      - /var/lib/ceph:/var/lib/ceph
    deploy:
      placement:
        constraints:
          - node.role == manager
    networks:
      hostnet: {}
    environment:
      NETWORK_AUTO_DETECT: 4
      KV_TYPE: etcd
      KV_IP: etcd0
      KV_PORT: 2379
  mgr:
    image: ceph/daemon
    command: mgr
    privileged: true
    extra_hosts:
      - "etcd0:127.0.0.1"
      - "etcd1:127.0.0.1"
      - "etcd2:127.0.0.1"
    volumes:
      - /var/lib/ceph:/var/lib/ceph
    deploy:
      placement:
        constraints:
          - node.role == manager
    networks:
      hostnet: {}
    environment:
      KV_TYPE: etcd
      KV_IP: etcd0
      KV_PORT: 2379
  mds:
    image: ceph/daemon
    command: mds
    privileged: true
    extra_hosts:
      - "etcd0:127.0.0.1"
      - "etcd1:127.0.0.1"
      - "etcd2:127.0.0.1"
    volumes:
      - /var/lib/ceph:/var/lib/ceph
    deploy:
      placement:
        constraints:
          - node.role == manager
    networks:
      hostnet: {}
    environment:
      CEPHFS_CREATE: 1
      KV_TYPE: etcd
      KV_IP: etcd0
      KV_PORT: 2379
networks:
  hostnet:
    external: true
    name: host
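Deploy it and wait for the services to come up (file and stack names are, again, just examples):
docker stack deploy -c ceph-stack.yml ceph
# mon, mgr and mds should each reach 1/1 replicas
docker service ls --filter name=ceph_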
Run the osd service on the machine that has the data disk (sdb) attached. (If that machine is not part of the Docker Swarm cluster, you will have to configure how it reaches etcd yourself.)
docker run -d --net=host \
--privileged=true \
--pid=host \
--add-host=etcd0:127.0.0.1 \
--add-host=etcd1:127.0.0.1 \
--add-host=etcd2:127.0.0.1 \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/sdb \
-e OSD_TYPE=disk \
-e KV_TYPE=etcd \
-e KV_IP=etcd0 \
-e OSD_DMCRYPT=1 \
ceph/daemon osd
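The OSD runs as a plain container rather than a swarm service, so check it with docker directly on that machine:
docker ps --filter ancestor=ceph/daemon
docker logs -f <container_id>   # use the id printed by the docker run above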
Enter the mon console and run ceph -s to check the cluster status.
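For example (the name filter assumes the stack was deployed as "ceph" as above):
docker exec -it $(docker ps -qf name=ceph_mon) ceph -s
# the OSD added in the previous step should show up here too
docker exec -it $(docker ps -qf name=ceph_mon) ceph osd tree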
You can use zap_device to wipe a storage device that Ceph has previously used.
docker run -d --privileged=true \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/sdb \
ceph/daemon zap_device
After fiddling with this for a few days, I found that deploying Ceph in containers is still a pretty rough ride.
I ran into a lot of problems; after digging through the official docs and the GitHub issues, my impression is that containerized Ceph is simply not mature yet, with pitfalls everywhere.
In particular, storing the configuration in etcd is not reliable: with the cluster deployed and working, restarting the mgr service left it unable to come back up no matter how many times I restarted it, and I still don't know why.
For now I have dropped the etcd-backed configuration and switched back to traditional config storage, and will keep working through it step by step. I don't know how many more pitfalls are ahead, so I'm worried and don't dare to put this into production; I'm even tempted to roll back and deploy Ceph with ceph-deploy instead. What a headache.
References:
https://ceph.com/planet/%E5%9F%BA%E4%BA%8Edocker%E9%83%A8%E7%BD%B2ceph%E4%BB%A5%E5%8F%8A%E4%BF%AE%E6%94%B9docker-image/
https://hub.docker.com/r/ceph/daemon/
https://geek-cookbook.funkypenguin.co.nz/ha-docker-swarm/shared-storage-ceph/
https://github.com/sepich/ceph-swarm