Understanding storage in Kubernetes
How to make external storage resources available to pods using volumes. Understand dynamic storage resource provisioning through StorageClass, PersistentVolume and PersistentVolumeClaim resources.
Overview
The filesystem inside containers is ephemeral. Anything stored inside a container during runtime will disappear when the container restarts. To work around this and make the data of a Kubernetes Pod's containers persist across restarts, we can use Volumes.
When we look at the Kubernetes Pod resource specification, there is a field called volumes that can be used to declare a list of volumes belonging to a pod. Each container inside the pod can then mount one or many of the declared volumes using the containers.volumeMounts field. Here is an example pod.spec manifest for illustration :
(...)
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: websites
      mountPath: /websites
  volumes:
  - name: websites
    emptyDir: # A temporary directory that shares a pod's lifetime
      medium: "" # Use the node's default storage medium to back this dir
                 # Value can also be 'Memory' to use the node's RAM as backend
Volumes offer the possibility to make external storage resources available for use by a pod's containers. Kubernetes supports several different volume types. In the above example we used emptyDir, which is actually not a persistent storage type : it can be used to share temporary data between a pod's containers. For a complete list of volume types we can use, have a look at volumes.
Also, for a list of available options we can use when mounting volumes inside a pod's containers, we can have a look at containers.volumeMounts.
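As an illustration, here is a small sketch (container and volume names are hypothetical) showing two commonly used volumeMounts options : readOnly, and subPath, which mounts only a sub-directory of the volume :

```yaml
# Sketch only : names are hypothetical
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: shared
      mountPath: /data
      readOnly: true  # Mount the volume read-only in this container
    - name: shared
      mountPath: /logs
      subPath: logs   # Mount only the 'logs' sub-directory of the volume
  volumes:
  - name: shared
    emptyDir: {}
```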
It is also possible to make pods use an existing PersistentVolume (PV) resource by referencing the associated PersistentVolumeClaim (PVC) inside pod.spec.volumes as follows :
(...)
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: websites
      mountPath: /websites
  volumes:
  - name: websites
    persistentVolumeClaim:
      claimName: websites
      readOnly: false
We will talk about PVs and PVCs in the next sections.
Dynamic storage resource provisioning
Storage resources (represented by the PersistentVolume object) can be dynamically provisioned thanks to StorageClass and PersistentVolumeClaim resources. Here is a diagram describing how that dynamic provisioning occurs. The next paragraphs explain all of this in more detail.
StorageClass
StorageClass resources are created by a Kubernetes cluster administrator and define the type of storage resources (for instance NFS, GCE persistent disks...) that can be automatically provisioned once requested by users.
The underlying storage resource is created by a volume provisioner whose name is specified inside the StorageClass. The provisioner actually makes a call to a volume plugin API to create the underlying storage resource.
Here is an example StorageClass resource manifest :
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
provisioner: example.com/nfs # External provisioner
parameters:
  server: nfs-server.example.com
  path: /share
  readOnly: "false"
# Allow expansion of the volumes created from this StorageClass
allowVolumeExpansion: true
# Reclaim policy of the volumes created from this StorageClass.
# Determines what happens to a volume when it is released.
# Possible values :
# - Delete : delete the volume (default)
# - Retain : preserve the volume and its data; the volume
#            won't be available for use by another claim
# (The old Recycle policy is deprecated and not allowed here)
reclaimPolicy: Retain
Provisioners that are natively supported by Kubernetes are called internal provisioners; the others are external provisioners.
The Kubernetes volume provisioners page lists some of the Kubernetes volume provisioners and their associated volume plugins. The page also tells whether each provisioner is internal, and contains links to example StorageClass resource manifests for some of them.
PersistentVolumeClaim (PVC)
A user request for provisioning storage resources from a specific StorageClass is expressed through a PersistentVolumeClaim (PVC) resource.
Here is an example PVC manifest :
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: nfs
  resources:
    requests:
      storage: 3Gi # requested volume size
If pvc.spec.storageClassName is not specified or empty, the Kubernetes cluster's default StorageClass is used.
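For reference, a cluster administrator marks a StorageClass as the cluster default through the storageclass.kubernetes.io/is-default-class annotation. A minimal sketch (the class name and provisioner are hypothetical) :

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: default-sc # hypothetical name
  annotations:
    # Makes this StorageClass the cluster default, used by PVCs
    # that don't set spec.storageClassName
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: example.com/nfs # hypothetical provisioner
```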
Each volume provisioner available inside the cluster has a controller that periodically watches PersistentVolumeClaim resources.
Here is a simplified overview of how the PVC request is satisfied by the provisioner :
- For each PVC created inside the Kubernetes cluster, the controller looks at the spec.storageClassName field and makes sure the specified StorageClass resource exists. If it doesn't exist, nothing is done
- If it exists and StorageClass.provisioner corresponds to the provisioner's name, the provisioner tries to find a volume (PersistentVolume resource) satisfying the request (same StorageClass, access mode, storage size greater than or equal to what's inside the PVC...) that is not already associated (or bound) to another PVC
- If such a volume is found, it is bound to the PVC and the request is satisfied. If not, the provisioner tries to create the volume using the parameters specified inside the StorageClass
- The provisioner makes a call to the appropriate volume plugin in order to create the underlying storage resource. If the volume plugin succeeds in creating it (a GCE persistent disk for instance), the provisioner creates the associated PersistentVolume (PV) resource and binds it to the PVC
- If the volume plugin fails to create the underlying storage resource, the provisioner returns an error to the user
As we can see, once a PersistentVolumeClaim (PVC) request is satisfied, a binding between the PVC and the PersistentVolume (PV) satisfying the request is established. This is a bi-directional binding that is achieved as follows :
- The PV references the PVC :
  - pv.spec.claimRef.name contains the name of the PVC
  - pv.spec.claimRef.namespace contains the namespace where the PVC resides
- The PVC references the PV :
  - pvc.spec.volumeName contains the name of the PV
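To make the binding concrete, here is an abridged sketch (resource names are hypothetical) of what the two sides look like once bound :

```yaml
# Excerpt of a bound PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-0001 # hypothetical name
spec:
  claimRef: # filled in at binding time
    name: websites     # name of the bound PVC
    namespace: default # namespace of the bound PVC
---
# Excerpt of the bound PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: websites
  namespace: default
spec:
  volumeName: pv-0001 # name of the bound PV
```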
PersistentVolume (PV)
A PersistentVolume (PV) is an object representing the underlying storage resource that will actually be used by pods to store data.
Here is how the Kubernetes PV documentation defines it :
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system
Here is an example PV manifest :
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  # Name of the StorageClass to which this PV belongs
  storageClassName: nfs
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  nfs:
    server: nfs-server.example.com
    path: /share
    readOnly: false
  volumeMode: Filesystem # or Block. Default: Filesystem
  persistentVolumeReclaimPolicy: Delete # Default for dynamically created PVs
When using dynamic storage provisioning through a StorageClass as seen before (read the StorageClass and PersistentVolumeClaim sections), the PV resource is automatically created to reflect the underlying storage resource that was provisioned, and is then bound to the user's PVC.
Static storage resource provisioning
PV resources can also be manually provisioned by administrators. Here is an example manifest that can be used to create the PV :
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  nfs:
    server: nfs-server.example.com
    path: /share
    readOnly: false
  volumeMode: Filesystem # or Block. Default: Filesystem
  persistentVolumeReclaimPolicy: Retain # Default for manually created PVs
If it's done that way and a user wants to use the pre-provisioned PV, they have to create a PVC that looks like this :
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 2Gi # requested volume size
  volumeName: nfs # Directly specify the name of the PV to use
Optionally, we could have used the spec.storageClassName field in both the PV and PVC resource manifests to reference a StorageClass whose provisioner doesn't actually do dynamic provisioning. Using this option allows us to enable volume expansion through the StorageClass allowVolumeExpansion field, as there is no equivalent field on the PV resource. Here is the sample manifest for creating that StorageClass :
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local
provisioner: kubernetes.io/no-provisioner
allowVolumeExpansion: true
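A statically provisioned PV/PVC pair using that StorageClass could then look like the following sketch (the local path and node name are hypothetical; a local volume additionally requires nodeAffinity) :

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv # hypothetical name
spec:
  storageClassName: local # ties the PV to the StorageClass above
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  local:
    path: /mnt/disks/ssd1 # hypothetical path on the node
  nodeAffinity: # required for local volumes
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-1 # hypothetical node name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-claim # hypothetical name
spec:
  storageClassName: local # same StorageClass on both sides
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```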
Examples
Dynamic provisioning of GCE persistent disks for GKE pods
Here is an example of a StorageClass object that can be used to dynamically provision persistent disks in GCP. See gce-pd-storageclass for details.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd # Internal provisioner
parameters:
  type: pd-standard
  fstype: ext4
  replication-type: none
Once the preceding StorageClass object is created inside a GKE cluster, the creation of a PersistentVolumeClaim referencing that StorageClass name (standard) in its spec.storageClassName parameter will automatically create a GCE (Google Compute Engine) persistent disk with the characteristics defined inside the StorageClass. Once the GCE persistent disk is successfully provisioned, the associated PersistentVolume resource is created and bound to the PersistentVolumeClaim resource.
The size we want for the storage resource is specified inside the PersistentVolumeClaim resource's spec.resources.requests.storage field. Here is a sample manifest of a PersistentVolumeClaim resource using the preceding StorageClass :
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gce-pd
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 5Gi # requested volume size
When using StatefulSets for managing pods with persistence, a PersistentVolumeClaim resource can be dynamically created for each pod of the StatefulSet using the spec.volumeClaimTemplates field as follows :
(...)
volumeClaimTemplates:
- metadata:
    name: pvc
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "standard"
    resources:
      requests:
        storage: 2Gi # size of the disk
A new GCE persistent disk (with the characteristics defined inside the standard StorageClass) will automatically be provisioned for each pod of the StatefulSet.
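For context, here is a minimal StatefulSet sketch (names are hypothetical) showing how the claim template is consumed by the pod template through volumeMounts :

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web # hypothetical name
spec:
  serviceName: web
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: pvc # must match the claim template name below
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: pvc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 2Gi
```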
Dynamic NFS storage provisioning
The nfs-ganesha-server-and-external-provisioner project can be used to easily deploy an NFS server and its associated external provisioner. Once deployed, creating PVCs referring to the NFS server's StorageClass will automatically do the following :
- create a dedicated NFS export for the PVC
- create a PV using the dedicated NFS export
- bind the PV and the PVC
Deploy the NFS server and external provisioner
- Pre-requisite : Helm
- Also note that for this example, we are using the standard managed Kubernetes service from Google Cloud Platform (Google Kubernetes Engine)
- On Google Kubernetes Engine (GKE), the standard StorageClass can be used to dynamically provision HDD disks on Google Cloud Platform. The NFS server will use that StorageClass to provision a disk for data persistence
# Add the Helm charts repository
$ helm repo add nfs-ganesha-server-and-external-provisioner https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
"nfs-ganesha-server-and-external-provisioner" has been added to your repositories
# Update the Helm charts repository
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nfs-ganesha-server-and-external-provisioner" chart repository
Update Complete. ⎈Happy Helming!⎈
Here is the content of the values.yml file we used for the deployment :
replicaCount: 1
persistence:
  enabled: true
  accessMode: ReadWriteOnce
  storageClass: standard
  size: 5Gi
storageClass:
  create: true
  defaultClass: false
  name: nfs
  allowVolumeExpansion: true
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi
Feel free to adjust the configuration according to your needs, using the nfs-server-provisioner-config-params page. Now let's run the installation command in order to deploy the NFS server and its associated external provisioner :
$ helm upgrade --install testnfs-nfs-provisionner -n testnfs nfs-ganesha-server-and-external-provisioner/nfs-server-provisioner -f values.yml --create-namespace
The Helm release name is testnfs-nfs-provisionner and its associated resources will be created inside the testnfs namespace. The namespace will be created if it doesn't exist. The above installation command is idempotent and can also be used to update the Helm release after a configuration change.
A look at some of the created resources
Here are the created resources after the deployment :
$ kubectl get all -n testnfs
NAME READY STATUS RESTARTS AGE
pod/testnfs-nfs-provisionner-nfs-server-provisioner-0 1/1 Running 0 3m36s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/testnfs-nfs-provisionner-nfs-server-provisioner ClusterIP 10.200.113.168 <none> 2049/TCP,2049/UDP,32803/TCP,32803/UDP,20048/TCP,20048/UDP,875/TCP,875/UDP,111/TCP,111/UDP,662/TCP,662/UDP 3m37s
NAME READY AGE
statefulset.apps/testnfs-nfs-provisionner-nfs-server-provisioner 1/1 3m37s
$ kubectl describe sc/nfs -n testnfs
Name: nfs
IsDefaultClass: No
Annotations: meta.helm.sh/release-name=nfs-provisionner,meta.helm.sh/release-namespace=testnfs
Provisioner: cluster.local/nfs-provisionner-nfs-server-provisioner
Parameters: <none>
AllowVolumeExpansion: True
MountOptions:
vers=3
retrans=2
timeo=30
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
$ kubectl get pvc -n testnfs
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-testnfs-nfs-provisionner-nfs-server-provisioner-0 Bound pvc-ca361607-af1d-4125-9d76-2235669e0eb0 5Gi RWO standard 108s
$ kubectl get pv/pvc-ca361607-af1d-4125-9d76-2235669e0eb0 -n testnfs
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-ca361607-af1d-4125-9d76-2235669e0eb0 5Gi RWO Delete Bound testnfs/data-testnfs-nfs-provisionner-nfs-server-provisioner-0 standard 109s
$ kubectl describe pv/pvc-ca361607-af1d-4125-9d76-2235669e0eb0 -n testnfs
Name: pvc-ca361607-af1d-4125-9d76-2235669e0eb0
Labels: topology.kubernetes.io/region=europe-west1
topology.kubernetes.io/zone=europe-west1-c
Annotations: pv.kubernetes.io/migrated-to: pd.csi.storage.gke.io
pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
volume.kubernetes.io/provisioner-deletion-secret-name:
volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers: [kubernetes.io/pv-protection external-attacher/pd-csi-storage-gke-io]
StorageClass: standard
Status: Bound
Claim: testnfs/data-testnfs-nfs-provisionner-nfs-server-provisioner-0
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 5Gi
Node Affinity:
Required Terms:
Term 0: topology.kubernetes.io/zone in [europe-west1-c]
topology.kubernetes.io/region in [europe-west1]
Message:
Source:
Type: GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
PDName: pvc-ca361607-af1d-4125-9d76-2235669e0eb0
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
Using the NFS server
Now let's test dynamic storage provisioning from the NFS server to make sure things are working properly.
For that, we start by making an NFS storage request : we create a PVC with nfs as the StorageClass and 100Mi as the storage size. We also set the requested access mode to ReadWriteMany since we are using NFS storage and want our workloads to be able to read and write the filesystem concurrently. Here is the PVC manifest :
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfstest-pvc
  namespace: testnfs
spec:
  storageClassName: "nfs"
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
Then, we create a Deployment with 2 replicas and make each replica pod use the same NFS server export through the previously created PVC. The NFS export will be mounted at the /nfs path inside the pod's containers. Here is the Deployment manifest :
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: testnfs
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
      - name: nfs
        persistentVolumeClaim:
          claimName: nfstest-pvc
          readOnly: false
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nfs
          mountPath: /nfs
Now let's verify that things are working properly after applying the previous PVC and Deployment manifests :
# PVC properly created and bound
$ kubectl get pvc/nfstest-pvc -n testnfs
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfstest-pvc Bound pvc-e5c89579-8e17-492a-953d-eb1643a32538 100Mi RWX testnfs 78s
# Get created PV details
$ kubectl get pv/pvc-e5c89579-8e17-492a-953d-eb1643a32538 -n testnfs -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    EXPORT_block: "\nEXPORT\n{\n\tExport_Id = 1;\n\tPath = /export/pvc-e5c89579-8e17-492a-953d-eb1643a32538;\n\tPseudo
      = /export/pvc-e5c89579-8e17-492a-953d-eb1643a32538;\n\tAccess_Type = RW;\n\tSquash
      = no_root_squash;\n\tSecType = sys;\n\tFilesystem_id = 1.1;\n\tFSAL {\n\t\tName
      = VFS;\n\t}\n}\n"
    Export_Id: "1"
    Project_Id: "0"
    Project_block: ""
    Provisioner_Id: 7de05b4f-5d9e-494e-9c03-f67efe86efd0
    kubernetes.io/createdby: nfs-dynamic-provisioner
    pv.kubernetes.io/provisioned-by: cluster.local/testnfs-nfs-provisionner-nfs-server-provisioner
  creationTimestamp: "*****"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-e5c89579-8e17-492a-953d-eb1643a32538
  resourceVersion: "809393219"
  uid: fd3e20be-c31d-49a0-b268-12eadd390169
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 100Mi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: nfstest-pvc
    namespace: testnfs
    resourceVersion: "800467992"
    uid: e5c89579-8e17-492a-953d-eb1643a32538
  mountOptions:
  - vers=3
  - retrans=2
  - timeo=30
  nfs:
    path: /export/pvc-e5c89579-8e17-492a-953d-eb1643a32538
    server: 10.200.114.71
  persistentVolumeReclaimPolicy: Delete
  storageClassName: testnfs
  volumeMode: Filesystem
status:
  phase: Released
# Deployment pods running
$ kubectl get pods -n testnfs
NAME READY STATUS RESTARTS AGE
nginx-deployment-8568f6d5df-2fvq6 1/1 Running 0 108s
nginx-deployment-8568f6d5df-vjb2g 1/1 Running 0 108s
(...)
The NFS export is properly mounted inside each of the containers. We can write a file in one of them and verify that it is also present inside the other container's filesystem :
$ kubectl exec -it pods/nginx-deployment-8568f6d5df-2fvq6 -n testnfs -- /bin/bash
root@nginx-deployment-8568f6d5df-2fvq6:/# df -h
Filesystem Size Used Avail Use% Mounted on
(...)
10.200.114.71:/export/pvc-e5c89579-8e17-492a-953d-eb1643a32538 4.9G 0 4.9G 0% /nfs
(...)
root@nginx-deployment-8568f6d5df-2fvq6:/# touch /nfs/test
root@nginx-deployment-8568f6d5df-2fvq6:/# echo "test" > /nfs/test
root@nginx-deployment-8568f6d5df-2fvq6:/# cat /nfs/test
test
$ kubectl exec -it pods/nginx-deployment-8568f6d5df-vjb2g -n testnfs -- /bin/bash
root@nginx-deployment-8568f6d5df-vjb2g:/# df -h
Filesystem Size Used Avail Use% Mounted on
(...)
10.200.114.71:/export/pvc-e5c89579-8e17-492a-953d-eb1643a32538 4.9G 0 4.9G 0% /nfs
(...)
root@nginx-deployment-8568f6d5df-vjb2g:/# cat /nfs/test
test