Understanding storage in Kubernetes

How to make external storage resources available to pods using volumes. Understand dynamic storage resource provisioning through StorageClass, PersistentVolume and PersistentVolumeClaim resources.

Overview

The filesystem inside containers is ephemeral. Anything stored inside a container at runtime disappears when the container restarts. To work around that and make the data of a Kubernetes Pod's containers persist across restarts, we can use 'Volumes'.

When we look at the Kubernetes Pod resource specification, there is a field called 'volumes' that can be used to declare a list of volumes belonging to a pod. Each container inside the pod can then mount one or more of the declared volumes using the 'containers.volumeMounts' field. Here is an example 'pod.spec' manifest for illustration:

(...)
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
          - name: websites
            mountPath: /websites
      volumes:
        - name: websites
          emptyDir:     # A temporary directory that shares a pod's lifetime
            medium: ""  # Use the nodes default storage medium to back this dir
                        # Value can also be 'Memory' to use nodes RAM as backend

Volumes offer the possibility to make external storage resources available for use by our pods' containers. There are different types of volumes supported by Kubernetes that we can use. In the above example we have used 'emptyDir', which is actually not a persistent storage type... it can be used to share temporary data between a pod's containers. For a complete list of volume types we can use, have a look at volumes.

Also, for a list of the available options we can use when mounting volumes inside a pod's containers, we can have a look at containers.volumeMounts.
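
As an illustration, here is a small sketch (a hypothetical variation of the manifest above) showing two commonly used mount options, 'readOnly' and 'subPath':

(...)
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
          - name: websites
            mountPath: /websites
            readOnly: true  # Mount the volume read-only inside this container
            subPath: site1  # Mount only the 'site1' sub-directory of the
                            # volume instead of its root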

It is also possible to make pods use an existing 'PersistentVolume (PV)' resource by referencing the associated 'PersistentVolumeClaim (PVC)' inside 'pod.spec.volumes' as follows:

(...)
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
          - name: websites
            mountPath: /websites
      volumes:
        - name: websites
          persistentVolumeClaim:
            claimName: websites
            readOnly: false

We will talk about 'PV' and 'PVC' in the next sections.

Dynamic storage resource provisioning

Storage resources (represented by the 'PersistentVolume' object) can be dynamically provisioned thanks to 'StorageClass' and 'PersistentVolumeClaim'. Here is a diagram describing how that dynamic provisioning occurs. The next paragraphs give more explanations about all of this.

[Diagram: Kubernetes dynamic storage resource provisioning]

StorageClass

StorageClass resources are created by a Kubernetes cluster administrator and define the type of storage resources (for instance NFS, GCE persistent disks...) that can be automatically provisioned once requested by users.

The underlying storage resource is created by a volume 'provisioner' whose name is specified inside the 'StorageClass'. The 'provisioner' actually makes a call to a 'volume plugin' API to create the underlying storage resource.

Here is an example 'StorageClass' resource manifest:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
provisioner: example.com/nfs # External provisioner
parameters:
  server: nfs-server.example.com
  path: /share
  readOnly: "false"
# Allow expansion of the volumes created from this StorageClass
allowVolumeExpansion: true  
# Reclaim policy of the volumes created from this StorageClass
# Determines what happens to the volume when its claim is released
reclaimPolicy: Delete # Delete the volume and its underlying storage
                      # resource (Default)
                      # Other possible value:
                      #  - Retain: preserve the volume and its data. The volume
                      #            won't be available for use by another claim

Provisioners that are natively supported by Kubernetes (shipped with it) are called internal provisioners, and the others are external provisioners.

The Kubernetes volume provisioners page shows some of the Kubernetes volume 'provisioners' and their associated 'volume plugins'. The page also tells whether the 'provisioners' are 'internal', and contains links to example 'StorageClass' manifests for some of them.
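
To see which 'StorageClass' resources and 'provisioners' are available inside a running cluster, the following commands can be used (the 'nfs' name refers to the example above):

# List the StorageClass resources and their provisioners
$ kubectl get storageclass
# Show the full definition of a given StorageClass
$ kubectl get storageclass nfs -o yaml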

PersistentVolumeClaim (PVC)

A user request for provisioning storage resources from a specific 'StorageClass' is expressed through a PersistentVolumeClaim ('PVC') resource.

Here is an example 'PVC' manifest:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs
  resources:
    requests:
      storage: 3Gi # requested volume size

If 'pvc.spec.storageClassName' is not specified, the Kubernetes cluster's default 'StorageClass' is used. Setting it to an empty string ("") explicitly requests a 'PV' with no class and disables dynamic provisioning for that claim.
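
The default 'StorageClass' is the one carrying the 'storageclass.kubernetes.io/is-default-class' annotation set to "true". Here is a quick sketch for identifying or changing it (the 'standard' class name is just an example):

# The default StorageClass is flagged with '(default)' next to its name
$ kubectl get storageclass
# Mark the 'standard' StorageClass as the cluster default
$ kubectl patch storageclass standard \
    -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'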

Each volume 'provisioner' available inside the cluster has a 'controller' that periodically watches 'PersistentVolumeClaim' resources.

Here is a simplified overview of how the 'PVC' request is satisfied by the 'provisioner':

  • For each 'PVC' created inside the Kubernetes cluster, the 'controller' looks at the 'spec.storageClassName' field. After that, the 'controller' ensures the specified 'StorageClass' resource exists. If it doesn't exist, nothing is done
  • If it exists, and 'StorageClass.provisioner' corresponds to the provisioner name, the 'provisioner' tries to find a volume (PersistentVolume resource) satisfying the request (same 'StorageClass', access mode, storage size greater than or equal to what's inside the 'PVC' ...) that is not already associated (or bound) to a 'PVC'
  • If found, the volume is bound to the 'PVC' and the request is satisfied. If not found, the 'provisioner' tries to create the volume using the parameters specified inside the 'StorageClass'
  • The 'provisioner' makes a call to the appropriate 'volume plugin' in order to create the underlying storage resource. If the 'volume plugin' succeeds in creating the underlying storage resource (a GCE persistent disk for instance), the 'provisioner' creates the associated 'PersistentVolume (PV)' resource and binds it to the 'PVC'
  • If the 'volume plugin' fails in creating the underlying storage resource, the 'provisioner' returns an error to the user. The whole process can be followed through the 'PVC' events, as shown right after this list
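
To follow that process on a live cluster, we can simply watch the 'PVC' status and events (here for the example 'nfs' claim created above):

# Status is 'Pending' while provisioning and 'Bound' once the request is satisfied
$ kubectl get pvc nfs
# The Events section shows the provisioning steps reported by the provisioner
$ kubectl describe pvc nfs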

As we can see, once a 'PersistentVolumeClaim (PVC)' request is satisfied, a 'binding' between the 'PVC' and the 'PersistentVolume (PV)' satisfying the request is established. This is a bi-directional binding that is achieved as follows (the sketch after the list shows how to inspect those fields):

  • The 'PV' references the 'PVC' :
    • 'pv.spec.claimRef.name' contains the name of the 'PVC'
    • 'pv.spec.claimRef.namespace' contains the namespace where the 'PVC' resides
  • The 'PVC' references the 'PV' :
    • 'pvc.spec.volumeName' contains the name of the 'PV'
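
Those binding fields can be checked directly with 'jsonpath' queries, for instance (using the example resource names from above):

# Name of the PV bound to the 'nfs' PVC
$ kubectl get pvc nfs -o jsonpath='{.spec.volumeName}'
# PVC referenced by a given PV (replace <pv-name> by the name returned above)
$ kubectl get pv <pv-name> -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}'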

PersistentVolume (PV)

PersistentVolume ('PV') is an object representing the underlying storage resource that will actually be used by pods to store data.

Here is how the Kubernetes PV documentation defines it:

A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system

Here is an example 'PV' manifest:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  # Name of the StorageClass to which this PV belongs
  storageClassName: nfs
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.example.com
    path: /share
    readOnly: "false"
  volumeMode: Filesystem # or Block. Default: Filesystem
  persistentVolumeReclaimPolicy: Delete # Default for dynamically created PVs

When using dynamic storage provisioning through 'StorageClass' as seen before (read the StorageClass and PersistentVolumeClaim sections), the 'PV' resource is automatically created to reflect the underlying storage resource that was provisioned, and is then bound to the user's 'PVC' request.

Static storage resource provisioning

'PV' resources can also be manually provisioned by administrators. Here is an example manifest that can be used to create the 'PV':

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.example.com
    path: /share
    readOnly: "false"
  volumeMode: Filesystem # or Block. Default: Filesystem
  persistentVolumeReclaimPolicy: Retain # Default for manually created PVs

When a 'PV' is pre-provisioned that way, a user who wants to use it has to create a 'PVC' that looks like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi # requested volume size
  volumeName: nfs # Directly specify the name of the PV to use

Optionally, we could have used the 'spec.storageClassName' field in both the 'PV' and 'PVC' manifests to reference a 'StorageClass' resource whose 'provisioner' is special: it actually doesn't do any dynamic provisioning.

Using this option allows us to enable volume expansion through the 'StorageClass' 'allowVolumeExpansion' field, as there is no equivalent field on the 'PV' resource. Here is the sample manifest for creating that 'StorageClass':

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local
provisioner: kubernetes.io/no-provisioner
allowVolumeExpansion: true  
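
With that option, both the manually created 'PV' and the 'PVC' would simply reference the 'local' class name. Here are illustrative excerpts (derived from the static provisioning manifests above) showing where that class name is referenced:

# Inside the PV manifest
spec:
  storageClassName: local
  capacity:
    storage: 10Gi
(...)

# Inside the PVC manifest
spec:
  storageClassName: local
  resources:
    requests:
      storage: 2Gi
(...)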

Examples

Dynamic provisioning of GCE persistent disks for GKE pods

Here is an example of a 'StorageClass' object that can be used to dynamically provision persistent disks in GCP. See gce-pd-storageclass for details.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd # Internal provisioner
parameters:
  type: pd-standard
  fstype: ext4
  replication-type: none

Once the preceding 'StorageClass' object is created inside a GKE cluster, creating a 'PersistentVolumeClaim' referencing that 'StorageClass' name ('standard') in its 'spec.storageClassName' parameter will automatically create a GCE (Google Compute Engine) persistent disk with the characteristics defined inside the 'StorageClass'. Once the GCE persistent disk is successfully provisioned, the associated 'PersistentVolume' resource is created and bound to the 'PersistentVolumeClaim' resource.

The size we want for the storage resource is specified inside the 'PersistentVolumeClaim' resource's 'spec.resources.requests.storage' field. Here is a sample manifest of a 'PersistentVolumeClaim' resource using the preceding 'StorageClass':

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gce-pd
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 5Gi # requested volume size

When using StatefulSets to manage pods with persistence, a 'PersistentVolumeClaim' resource can be dynamically created for each pod of the 'StatefulSet' using the 'spec.volumeClaimTemplates' field as follows:

(...)
  volumeClaimTemplates:
  - metadata:
      name: pvc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 2Gi # size of the disk

A new GCE persistent disk (with the characteristics defined inside the 'standard' 'StorageClass') will automatically be provisioned for each pod of the 'StatefulSet'.
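
For completeness, here is a minimal (hypothetical) 'StatefulSet' manifest putting the 'volumeClaimTemplates' field in context; names, image and sizes are illustrative:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web # Headless Service governing the StatefulSet's network identity
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
          - name: pvc
            mountPath: /data # Each pod mounts its own dedicated disk here
  volumeClaimTemplates:
  - metadata:
      name: pvc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 2Gi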

Dynamic NFS storage provisioning

The nfs-ganesha-server-and-external-provisioner project can be used to easily deploy an NFS server and its associated external provisioner. Once deployed, creating 'PVCs' referring to the NFS server's 'StorageClass' will automatically do the following:

  • create a dedicated NFS export for the 'PVC'
  • create a 'PV' using the dedicated NFS export
  • bind the 'PV' and the 'PVC'

Deploy the NFS server and external provisioner
  • Pre-requisite: Helm
  • Also note that for this example, we are using the standard managed Kubernetes service from Google Cloud Platform (Google Kubernetes Engine)
  • On Google Kubernetes Engine (GKE), the 'standard' 'StorageClass' can be used to dynamically provision HDD disks on Google Cloud Platform. The NFS server will use that 'StorageClass' to provision a disk for data persistence

# Add the Helm charts repository
$ helm repo add nfs-ganesha-server-and-external-provisioner https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
"nfs-ganesha-server-and-external-provisioner" has been added to your repositories

# Update the Helm charts repository
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nfs-ganesha-server-and-external-provisioner" chart repository
Update Complete. ⎈Happy Helming!⎈

Here is the content of the 'values.yml' file we used for the deployment:

replicaCount: 1

persistence:
  enabled: true
  accessMode: ReadWriteOnce
  storageClass: standard
  size: 5Gi

storageClass:
  create: true
  defaultClass: false
  name: nfs
  allowVolumeExpansion: true

resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi

Feel free to adjust the configuration according to your needs, using the nfs-server-provisioner-config-params page. Now let's run the installation command in order to deploy the NFS server and its associated external provisioner:

$ helm upgrade --install testnfs-nfs-provisionner -n testnfs nfs-ganesha-server-and-external-provisioner/nfs-server-provisioner -f values.yml --create-namespace

The 'Helm' release name is 'testnfs-nfs-provisionner' and its associated resources will be created inside the 'testnfs' namespace. The namespace will be created if it doesn't exist. The above installation command is idempotent and can also be used to update the 'Helm' release after a configuration change.
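
The release and its configuration can then be checked with the usual 'Helm' commands:

# List the releases deployed inside the 'testnfs' namespace
$ helm list -n testnfs
# Show the status of the release
$ helm status testnfs-nfs-provisionner -n testnfs
# Show the values used for the release
$ helm get values testnfs-nfs-provisionner -n testnfs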

A look at some of the created resources

Here are the created resources after the deployment:

$ kubectl get all -n testnfs
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/testnfs-nfs-provisionner-nfs-server-provisioner-0   1/1     Running   0          3m36s

NAME                                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                                                                     AGE
service/testnfs-nfs-provisionner-nfs-server-provisioner   ClusterIP   10.200.113.168   <none>        2049/TCP,2049/UDP,32803/TCP,32803/UDP,20048/TCP,20048/UDP,875/TCP,875/UDP,111/TCP,111/UDP,662/TCP,662/UDP   3m37s

NAME                                                               READY   AGE
statefulset.apps/testnfs-nfs-provisionner-nfs-server-provisioner   1/1     3m37s
$ kubectl describe sc/nfs -n testnfs
Name:                  nfs
IsDefaultClass:        No
Annotations:           meta.helm.sh/release-name=nfs-provisionner,meta.helm.sh/release-namespace=testnfs
Provisioner:           cluster.local/nfs-provisionner-nfs-server-provisioner
Parameters:            <none>
AllowVolumeExpansion:  True
MountOptions:
  vers=3
  retrans=2
  timeo=30
ReclaimPolicy:      Delete
VolumeBindingMode:  Immediate
Events:             <none>
$ kubectl get pvc -n testnfs
NAME                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-testnfs-nfs-provisionner-nfs-server-provisioner-0   Bound    pvc-ca361607-af1d-4125-9d76-2235669e0eb0   5Gi        RWO            standard       108s
$ kubectl get pv/pvc-ca361607-af1d-4125-9d76-2235669e0eb0 -n testnfs
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                            STORAGECLASS   REASON   AGE
pvc-ca361607-af1d-4125-9d76-2235669e0eb0   5Gi        RWO            Delete           Bound    testnfs/data-testnfs-nfs-provisionner-nfs-server-provisioner-0   standard                109s

$ kubectl describe pv/pvc-ca361607-af1d-4125-9d76-2235669e0eb0 -n testnfs
Name:              pvc-ca361607-af1d-4125-9d76-2235669e0eb0
Labels:            topology.kubernetes.io/region=europe-west1
                   topology.kubernetes.io/zone=europe-west1-c
Annotations:       pv.kubernetes.io/migrated-to: pd.csi.storage.gke.io
                   pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
                   volume.kubernetes.io/provisioner-deletion-secret-name:
                   volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers:        [kubernetes.io/pv-protection external-attacher/pd-csi-storage-gke-io]
StorageClass:      standard
Status:            Bound
Claim:             testnfs/data-testnfs-nfs-provisionner-nfs-server-provisioner-0
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          5Gi
Node Affinity:
  Required Terms:
    Term 0:        topology.kubernetes.io/zone in [europe-west1-c]
                   topology.kubernetes.io/region in [europe-west1]
Message:
Source:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     pvc-ca361607-af1d-4125-9d76-2235669e0eb0
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:         <none>

Using the NFS server

Now let's test dynamic storage provisioning from the NFS server to make sure things are working properly.

For that, we start by making an NFS storage request: we create a 'PVC' with 'nfs' as the 'StorageClass' and '100Mi' as the storage size. We also set the requested access mode to 'ReadWriteMany', as we are using NFS storage and want several workloads to be able to read and write the filesystem at the same time. Here is the 'PVC' manifest:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfstest-pvc
  namespace: testnfs
spec:
  storageClassName: "nfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi

Then, we create a 'Deployment' with '2 replicas' and make each replica pod use the same NFS export through the previously created 'PVC'. The NFS export will be mounted at the '/nfs' path inside the pods' containers. Here is the 'Deployment' manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: testnfs
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
        - name: nfs
          persistentVolumeClaim:
            claimName: nfstest-pvc
            readOnly: false
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
          - name: nfs
            mountPath: /nfs

Now let's verify that things are working properly after applying the previous 'PVC' and 'Deployment' manifests:

# PVC properly created and bound
$ kubectl get pvc/nfstest-pvc -n testnfs
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfstest-pvc   Bound    pvc-e5c89579-8e17-492a-953d-eb1643a32538   100Mi      RWX            testnfs        78s
# Get created PV details
$ kubectl get pv/pvc-e5c89579-8e17-492a-953d-eb1643a32538 -n testnfs -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    EXPORT_block: "\nEXPORT\n{\n\tExport_Id = 1;\n\tPath = /export/pvc-e5c89579-8e17-492a-953d-eb1643a32538;\n\tPseudo
      = /export/pvc-e5c89579-8e17-492a-953d-eb1643a32538;\n\tAccess_Type = RW;\n\tSquash
      = no_root_squash;\n\tSecType = sys;\n\tFilesystem_id = 1.1;\n\tFSAL {\n\t\tName
      = VFS;\n\t}\n}\n"
    Export_Id: "1"
    Project_Id: "0"
    Project_block: ""
    Provisioner_Id: 7de05b4f-5d9e-494e-9c03-f67efe86efd0
    kubernetes.io/createdby: nfs-dynamic-provisioner
    pv.kubernetes.io/provisioned-by: cluster.local/testnfs-nfs-provisionner-nfs-server-provisioner
  creationTimestamp: "*****"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-e5c89579-8e17-492a-953d-eb1643a32538
  resourceVersion: "809393219"
  uid: fd3e20be-c31d-49a0-b268-12eadd390169
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 100Mi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: nfstest-pvc
    namespace: testnfs
    resourceVersion: "800467992"
    uid: e5c89579-8e17-492a-953d-eb1643a32538
  mountOptions:
  - vers=3
  - retrans=2
  - timeo=30
  nfs:
    path: /export/pvc-e5c89579-8e17-492a-953d-eb1643a32538
    server: 10.200.114.71
  persistentVolumeReclaimPolicy: Delete
  storageClassName: testnfs
  volumeMode: Filesystem
status:
  phase: Released
# Deployment pods running
$ kubectl get pods -n testnfs
NAME                                                READY   STATUS    RESTARTS   AGE 
nginx-deployment-8568f6d5df-2fvq6                   1/1     Running   0          108s
nginx-deployment-8568f6d5df-vjb2g                   1/1     Running   0          108s
(...)

The NFS export is properly mounted inside each of the containers. We can write a file in one of them and verify that it is also present inside the other's filesystem:

$ kubectl exec -it pods/nginx-deployment-8568f6d5df-2fvq6 -n testnfs -- /bin/bash
root@nginx-deployment-8568f6d5df-2fvq6:/# df -h
Filesystem                                                      Size  Used Avail Use% Mounted on
(...)
10.200.114.71:/export/pvc-e5c89579-8e17-492a-953d-eb1643a32538  4.9G     0  4.9G   0% /nfs
(...)
root@nginx-deployment-8568f6d5df-2fvq6:/# touch /nfs/test
root@nginx-deployment-8568f6d5df-2fvq6:/# echo "test" > /nfs/test
root@nginx-deployment-8568f6d5df-2fvq6:/# cat /nfs/test 
test

$ kubectl exec -it pods/nginx-deployment-8568f6d5df-vjb2g -n testnfs -- /bin/bash
root@nginx-deployment-8568f6d5df-vjb2g:/# df -h
Filesystem                                                      Size  Used Avail Use% Mounted on
(...)
10.200.114.71:/export/pvc-e5c89579-8e17-492a-953d-eb1643a32538  4.9G     0  4.9G   0% /nfs
(...)
root@nginx-deployment-8568f6d5df-vjb2g:/# cat /nfs/test 
test