Using Alibaba Cloud Distributed Storage with Self-Built K8s Clusters
Introduction
This article, written on 2024-06-14, explains how to use Alibaba Cloud distributed storage from a self-built Kubernetes cluster running on Alibaba Cloud. Links to the source documents are collected at the end. The official Alibaba Cloud documentation is in Chinese, while the storage plugin repository on GitHub currently provides English docs only; readers who can are encouraged to consult the originals.
Storage Plugin Installation
- Create a custom RAM permission policy from the policy document at: https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/docs/ram-policies/disk.json
- Create a RAM user, attach the custom policy to it, create an AccessKey pair for the user, and store the AccessKey ID and secret in a Kubernetes Secret:
kubectl create secret -n kube-system generic csi-access-key --from-literal=id='{id}' --from-literal=secret='{secret}'
- Install the CSI driver. There is no published Helm chart repository (as of 2024-06-13), so clone the repository and install the chart from the local checkout:
git clone https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver.git
cd alibaba-cloud-csi-driver/deploy
- If your self-built cluster runs on Alibaba Cloud ECS, run the following command as-is; otherwise, read the installation notes carefully first: https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/docs/install.md
helm upgrade --install alibaba-cloud-csi-driver ./chart --values chart/values-ecs.yaml --namespace kube-system
- Confirm that the csi-plugin Pods are running:
watch kubectl get pods -n kube-system -l app=csi-plugin
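For reference, the kubectl create secret command in the installation steps corresponds to the declarative manifest below. This is a minimal sketch: the {id} and {secret} placeholders are the same as in the command and must be replaced with the RAM user's AccessKey ID and secret.

```yaml
# Declarative equivalent of:
#   kubectl create secret -n kube-system generic csi-access-key \
#     --from-literal=id='{id}' --from-literal=secret='{secret}'
apiVersion: v1
kind: Secret
metadata:
  name: csi-access-key
  namespace: kube-system
type: Opaque               # "kubectl create secret generic" also produces an Opaque Secret
stringData:
  id: "{id}"               # placeholder: AccessKey ID
  secret: "{secret}"       # placeholder: AccessKey secret
```

Apply it with kubectl apply -f <file>. Because the values are stored in plain text in the file, avoid committing a filled-in copy to version control.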
Storage Type Selection Guide
- The minimum size for an ECS cloud disk is 20 GB with 3,000 IOPS; this is quite large and not particularly cost-effective.
  - Dynamic cloud-disk volumes
    - According to the official docs (linked in References):
      - Cloud disks cannot span availability zones (AZs). They are non-shared and can be mounted by only one Pod at a time (in my tests, however, multiple Pods of the same Deployment could mount the same disk).
      - The disk category must be supported by the ECS instance type, or mounting fails. See "Instance Families" in the ECS documentation for compatibility details.
      - During deployment, the StorageClass provisions the PV automatically, which purchases a new cloud disk. If you have already purchased a disk, use a static volume instead.
      - The requested size must lie within the size range allowed for a single disk of that category.
      - When a Pod is recreated, it re-attaches its original cloud disk. If scheduling constraints prevent the Pod from being placed in the disk's AZ, the Pod stays in the Pending state.
      - Dynamically created disks are billed pay-as-you-go.
    - Additional notes from my own tests:
      - Although multiple Pods can mount the disk, only one of them can read and write; the others are read-only. The PVC must therefore set accessModes to ReadWriteOnce; choosing a different mode does not change this behavior.
      - If the StorageClass reclaimPolicy is Delete, deleting the PVC also deletes the cloud disk automatically.
      - If the StorageClass reclaimPolicy is Retain, the cloud disk is not deleted automatically; you must remove it manually, both from the cluster and from the Alibaba Cloud console.
    - Overall, a suitable use case for dynamic disks is hard to find.
  - Static cloud-disk volumes
    - According to the official docs:
      - You create the PV and PVC manually.
      - Cloud disks cannot span availability zones; they are non-shared and can be mounted by only one Pod at a time.
      - The disk category must be supported by the ECS instance type, or mounting fails.
      - You can use any disk that is in the same region and AZ as the cluster and in the "Available" state.
- NAS has comparatively high latency: about 2 ms in the best case and about 10 ms for deep storage. It is billed pay-as-you-go and offers better read/write performance than OSS object storage.
- OSS volumes: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/oss-volume-overview-1?spm=a2c4g.11186623.0.0.43166a351NbtvU
  - OSS is shared storage and can serve multiple Pods simultaneously.
  - As of 2024-06-13, OSS volumes are supported only on CentOS, Alibaba Cloud Linux, ContainerOS, and Anolis OS.
  - Each application should use its own PV, with a distinct PV name.
  - OSS volumes are mounted through ossfs, a FUSE-based file system.
  - They suit read-only workloads, such as reading configuration files, videos, and images.
  - They are not suitable for writes; use the OSS SDK for writing, or switch to NAS.
  - ossfs behavior (caching, permissions, etc.) can be tuned through configuration parameters.
  - ossfs limitations:
    - Random writes and appends cause the entire file to be rewritten.
    - Listing directories and other metadata operations are slow because they require remote calls.
    - Renaming files and folders is not atomic.
    - When multiple clients mount the same bucket, users must coordinate their behavior themselves (e.g., avoid concurrent writes to the same file).
    - Hard links are not supported.
    - With CSI plugin versions earlier than v1.20.7, only local changes are detected; modifications made by other clients or tools are not seen.
    - Do not use OSS volumes for high-concurrency reads and writes, to avoid overloading the system.
- In a hybrid cluster (with some nodes outside Alibaba Cloud) only NAS and OSS static volumes can be used.
- Cloud disks, NAS, and OSS have region restrictions.
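As a sketch of the dynamic cloud-disk setup described above, the manifests below define a StorageClass and a PVC that uses it. The StorageClass and PVC names and the cloud_essd disk category are assumptions for illustration; pick a category your ECS instance family supports, and check the driver's disk.md (see References) for the parameters your driver version accepts.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk-essd-example        # hypothetical name
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_essd                        # assumed disk category; must be supported by the ECS instance family
reclaimPolicy: Delete                     # Delete removes the disk with the PVC; Retain keeps it for manual cleanup
volumeBindingMode: WaitForFirstConsumer   # provision only after a Pod using the PVC is scheduled
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-pvc-example                  # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce                       # disks are used as ReadWriteOnce, per the notes above
  storageClassName: alicloud-disk-essd-example
  resources:
    requests:
      storage: 20Gi                       # at the 20 GB minimum mentioned above
```

With volumeBindingMode set to WaitForFirstConsumer, the disk is created only after a Pod using the PVC has been scheduled, in the availability zone of the node chosen for that Pod.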
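For the static cloud-disk case, a PV wrapping an existing disk and a PVC bound to it might look like the sketch below. The disk ID, zone ID, size, and names are hypothetical placeholders. The zone label key topology.diskplugin.csi.alibabacloud.com/zone is assumed from the ACK examples; confirm it against the labels the driver actually sets on your nodes (kubectl get nodes --show-labels).

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: d-xxxxxxxxxxxxxxxxxxxx             # hypothetical; naming the PV after the disk ID keeps them easy to match
  labels:
    alicloud-pvname: d-xxxxxxxxxxxxxxxxxxxx
spec:
  capacity:
    storage: 20Gi                          # should match the existing disk's size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain    # keep the purchased disk when the PVC is deleted
  csi:
    driver: diskplugin.csi.alibabacloud.com
    volumeHandle: d-xxxxxxxxxxxxxxxxxxxx   # ID of an existing disk in the "Available" state
    fsType: ext4
  nodeAffinity:                            # disks cannot cross AZs, so pin consumers to the disk's zone
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.diskplugin.csi.alibabacloud.com/zone   # assumed label key
              operator: In
              values:
                - cn-hangzhou-b            # hypothetical zone ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-static-pvc-example            # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""                     # bind to a pre-created PV, no dynamic provisioning
  resources:
    requests:
      storage: 20Gi
  selector:
    matchLabels:
      alicloud-pvname: d-xxxxxxxxxxxxxxxxxxxx
```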
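A static OSS volume can be sketched as a Secret holding the AccessKey pair, a PV backed by the OSS CSI driver, and a read-oriented PVC. The bucket, endpoint, Secret key names (akId, akSecret), and the ossfs options in otherOpts are assumptions based on the ACK OSS static-volume examples and may differ across driver versions. Here volumeHandle is set to the PV name, and each application would get its own PV.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
  namespace: default
type: Opaque
stringData:
  akId: "<AccessKey ID>"            # placeholder; key name assumed from ACK examples
  akSecret: "<AccessKey Secret>"    # placeholder; key name assumed from ACK examples
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-oss                      # one PV name per application
  labels:
    alicloud-pvname: pv-oss
spec:
  capacity:
    storage: 5Gi                    # nominal value; the field is required by Kubernetes
  accessModes:
    - ReadOnlyMany                  # read-oriented use, as recommended above
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: pv-oss            # set to the PV name
    nodePublishSecretRef:
      name: oss-secret
      namespace: default
    volumeAttributes:
      bucket: "example-bucket"                       # hypothetical bucket
      url: "oss-cn-hangzhou-internal.aliyuncs.com"   # hypothetical internal endpoint in the cluster's region
      path: "/"                                      # mount the bucket root
      otherOpts: "-o umask=022 -o max_stat_cache_size=0 -o allow_other"  # ossfs mount options
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-oss
  namespace: default
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""              # bind only to the pre-created PV
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      alicloud-pvname: pv-oss
```

Setting max_stat_cache_size=0 turns off the ossfs metadata cache, trading some listing speed for fresher metadata; tune it to your workload.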
In summary: cloud disks are provisioned and mounted as whole disks, which makes sharing inconvenient. OSS works at file granularity; it performs poorly under high-concurrency reads and writes, and the supported operating systems are limited.
- Cloud disks suit databases or other scenarios demanding large space and high performance.
- For scenarios with lower performance needs, NAS is a good choice.
- OSS is unsuitable for high-concurrency writes on Alibaba Cloud clusters, though it may suit concurrent-read workloads.
The official documentation contains inconsistencies and contradictions. Check the date of each page and run your own tests, because a feature documented as unsupported may have become supported since.
Operation Steps
Deployment follows the official Alibaba Cloud guide: once the storage plugin is installed as described above, continue with "Use NAS static volumes" (linked in References).
Note: k3s users may run into problems caused by k3s's default local-path storage, with errors such as:
- failed to provision volume with StorageClass “local-path”: claim.Spec.Selector is not supported
- Waiting for a volume to be created either by the external provisioner ’localplugin.csi.alibabacloud.com’ or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
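To keep k3s's default local-path StorageClass out of the way, set storageClassName in the PersistentVolumeClaim to an empty string. When the field is omitted, Kubernetes assigns the cluster's default StorageClass (local-path on k3s); an explicit "" disables dynamic provisioning, so the claim binds only to an existing PV that has no StorageClass: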
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-nas
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  selector:
    matchLabels:
      alicloud-pvname: pv-nas
  storageClassName: ""
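The PVC above selects its PV through the alicloud-pvname: pv-nas label, so a matching PV has to exist. A minimal sketch of such a NAS PV, plus a test Pod that mounts the claim, follows. The NAS mount-target address, export path, Pod name, and image are hypothetical placeholders, and the NFSv3 mount options are assumed from the ACK NAS static-volume examples; verify them for your NAS file-system type.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nas
  labels:
    alicloud-pvname: pv-nas              # matched by the PVC's selector
spec:
  capacity:
    storage: 5Gi                         # must be >= the PVC's 2Gi request
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: nasplugin.csi.alibabacloud.com
    volumeHandle: pv-nas                 # set to the PV name
    volumeAttributes:
      server: "xxxxxxxxxx-xxxxx.cn-hangzhou.nas.aliyuncs.com"  # hypothetical mount target
      path: "/"                          # hypothetical export path
  mountOptions:
    - nolock,tcp,noresvport
    - vers=3
---
apiVersion: v1
kind: Pod
metadata:
  name: nas-test                         # hypothetical test Pod
spec:
  containers:
    - name: app
      image: nginx:1.25                  # any image works for a mount test
      volumeMounts:
        - name: nas
          mountPath: /data
  volumes:
    - name: nas
      persistentVolumeClaim:
        claimName: pvc-nas
```

Because the PV declares no storageClassName, it belongs to the empty class and can bind to the PVC above, whose storageClassName is also empty.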
References
- https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver
- https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/docs/disk.md
- https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/docs/install.md
- https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/docs/ram-policies/disk.json
- https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/deploy/chart/values.yaml
- https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/use-dynamically-provisioned-disk-volumes?#6d16e8a415nie
- https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/mount-statically-provisioned-nas-volumes?spm=a2c4g.11186623.0.0.125672b9VnrKw6