Understanding storage in Kubernetes

How to make external storage resources available to pods using volumes. Understand dynamic storage resource provisioning through StorageClass, PersistentVolume and PersistentVolumeClaim resources.

Overview

The filesystem inside containers is ephemeral: anything written inside a container at runtime disappears when the container restarts. To work around this and make the data of a Pod's containers persist across restarts, we can use Volumes.

When we look at the Kubernetes Pod resource specification, there is a field called volumes that can be used to declare a list of volumes belonging to a pod. Each container inside the pod can then mount one or many of the declared volumes using the containers.volumeMounts field. Here is an example pod.spec manifest for illustration:

(...)
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
          - name: websites
            mountPath: /websites
      volumes:
        - name: websites
          emptyDir:     # A temporary directory that shares a pod's lifetime
            medium: ""  # Use the nodes default storage medium to back this dir
                        # Value can also be 'Memory' to use nodes RAM as backend

Volumes make external storage resources available for use by a pod's containers. Kubernetes supports many different volume types. In the above example we used emptyDir, which is actually not a persistent storage type: it can be used to share temporary data between a pod's containers. For a complete list of volume types we can use, have a look at volumes.

Also, for a list of the options available when mounting volumes inside a pod's containers, have a look at containers.volumeMounts.
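
For instance, a volume can be mounted read-only, or only a sub-path of it can be exposed to a container. Here is a minimal sketch illustrating the readOnly and subPath mount options (the pod name and the nginx-config ConfigMap are made up for the example):

apiVersion: v1
kind: Pod
metadata:
  name: volume-mounts-demo        # hypothetical pod name
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
      - name: config
        mountPath: /etc/nginx/conf.d/default.conf
        subPath: default.conf     # only expose this file from the volume
        readOnly: true            # mount the volume read-only in this container
  volumes:
    - name: config
      configMap:
        name: nginx-config        # hypothetical ConfigMap containing a default.conf key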

It is also possible to make pods use an existing PersistentVolume (PV) resource by referencing the associated PersistentVolumeClaim (PVC) inside pod.spec.volumes, as follows:

(...)
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
          - name: websites
            mountPath: /websites
      volumes:
        - name: websites
          persistentVolumeClaim:
            claimName: websites
            readOnly: false

We will talk about PVs and PVCs in the next sections.

Dynamic storage resource provisioning

Storage resources (represented by the PersistentVolume object) can be dynamically provisioned thanks to StorageClass and PersistentVolumeClaim resources. Here is a diagram describing how that dynamic provisioning occurs. The next paragraphs explain all of this in more detail.

[Diagram: dynamic storage resource provisioning with StorageClass, PVC and PV]

StorageClass

StorageClass resources are created by a Kubernetes cluster administrator and define the type of storage resources (for instance NFS, GCE persistent disks...) that can be automatically provisioned once requested by users.

The underlying storage resource is created by a volume provisioner whose name is specified inside the StorageClass. The provisioner actually makes a call to a volume plugin API to create the underlying storage resource.

Here is an example StorageClass resource manifest:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
provisioner: example.com/nfs # External provisioner
parameters:
  server: nfs-server.example.com
  path: /share
  readOnly: "false"
# Allow expansion of the volumes created from this StorageClass
allowVolumeExpansion: true
# Reclaim policy of the volumes created from this StorageClass
# Determines what happens to the volume when it is released
reclaimPolicy: Retain # Preserve the volume and its data. The volume
                      # won't be available for use by another claim.
                      # The other possible value is Delete (the default):
                      # delete the volume when the claim is released

Provisioners that are natively supported by Kubernetes are called internal provisioners, while the others are external provisioners.

The Kubernetes volume provisioners page lists some of the Kubernetes volume provisioners and their associated volume plugins. The page also tells whether each provisioner is internal and contains links to example StorageClass manifests for some of them.
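
In practice, we can list the StorageClass resources of a cluster to see which provisioner backs each class. The output below is only illustrative (the column layout may vary with the kubectl version); the default class is flagged with (default) next to its name:

$ kubectl get storageclass
NAME                 PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
standard (default)   kubernetes.io/gce-pd   Delete          Immediate           true                   30d
nfs                  example.com/nfs        Delete          Immediate           true                   2d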

PersistentVolumeClaim (PVC)

A user request for provisioning storage resources from a specific StorageClass is expressed through a PersistentVolumeClaim (PVC) resource.

Here is an example PVC manifest:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs
  resources:
    requests:
      storage: 3Gi # requested volume size

If pvc.spec.storageClassName is not specified, the Kubernetes cluster's default StorageClass is used. Setting it explicitly to an empty string ("") disables dynamic provisioning for that claim: only pre-provisioned PVs with no class can then be bound to it.
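
The default StorageClass is simply the class carrying the storageclass.kubernetes.io/is-default-class annotation. As a sketch, a cluster administrator could mark an existing class (named standard here, as an assumption) as the default like this:

# Mark the hypothetical "standard" StorageClass as the cluster default
$ kubectl patch storageclass standard \
    -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'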

Each volume provisioner available inside the cluster has a controller that periodically watches PersistentVolumeClaim resources.

Here is a simplified overview of how the PVC request is satisfied by the provisioner (the commands after the list can be used to follow this process on a live cluster):

  • For each PVC created inside the Kubernetes cluster, the controller looks at the spec.storageClassName field and makes sure the specified StorageClass resource exists. If it doesn't exist, nothing is done
  • If it exists and StorageClass.provisioner matches the provisioner's name, the provisioner tries to find an existing volume (PersistentVolume resource) satisfying the request (same StorageClass, compatible access modes, storage size greater than or equal to what is requested in the PVC...) that is not already bound to a PVC
  • If such a volume is found, it is bound to the PVC and the request is satisfied. If not, the provisioner tries to create the volume using the parameters specified inside the StorageClass
  • The provisioner makes a call to the appropriate volume plugin in order to create the underlying storage resource. If the volume plugin succeeds in creating the underlying storage resource (a GCE persistent disk for instance), the provisioner creates the associated PersistentVolume (PV) resource and binds it to the PVC
  • If the volume plugin fails to create the underlying storage resource, the provisioner returns an error to the user
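
On a live cluster, the outcome of these steps can be observed on the PVC itself. A minimal sketch, assuming a claim named nfs in the current namespace:

# Events at the bottom of the describe output show provisioning progress or errors
$ kubectl describe pvc nfs
# Watch the claim until its STATUS column becomes Bound
$ kubectl get pvc nfs -w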

As we can see, once a PersistentVolumeClaim (PVC) request is satisfied, a binding is established between the PVC and the PersistentVolume (PV) satisfying the request. This is a bi-directional binding, achieved as follows (a short sketch of the corresponding fields follows the list):

  • The PV references the PVC :
    • pv.spec.claimRef.name contains the name of the PVC
    • pv.spec.claimRef.namespace contains the namespace where the PVC resides
  • The PVC references the PV :
    • pvc.spec.volumeName contains the name of the PV
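
For illustration, here is what those fields could look like on a bound pair (all names and values are made up):

# Extract of a PV bound to a PVC named "nfs" living in the "default" namespace
spec:
  claimRef:
    name: nfs
    namespace: default
---
# Extract of the corresponding PVC, referencing the PV by its name
spec:
  volumeName: pvc-0a1b2c3d-4e5f-6789-abcd-0123456789ab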

PersistentVolume (PV)

PersistentVolume (PV) is an object representing the underlying storage resource that will actually be used by pods to store data.

Here is how the Kubernetes PV documentation defines it:

A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system

Here is an example PV manifest:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  # Name of the StorageClass to which this PV belongs
  storageClassName: nfs
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.example.com
    path: /share
    readOnly: "false"
  volumeMode: Filesystem # or Block. Default: Filesystem
  persistentVolumeReclaimPolicy: Delete # Default for dynamically created PVs

When using dynamic storage provisioning with a StorageClass as seen before (read the StorageClass and PersistentVolumeClaim sections), the PV resource is automatically created to reflect the provisioned underlying storage resource and is then bound to the user's PVC.

Static storage resource provisioning

PV resources can also be manually provisioned by administrators. Here is an example manifest that can be used to create the PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.example.com
    path: /share
    readOnly: "false"
  volumeMode: Filesystem # or Block. Default: Filesystem
  persistentVolumeReclaimPolicy: Retain # Default for manually created PVs

If it is done that way and a user wants to use the pre-provisioned PV, they have to create a PVC that looks like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi # requested volume size
  volumeName: nfs # Directly specify the name of the PV to use
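
Assuming the two manifests above are saved as pv.yml and pvc.yml (hypothetical file names), the static binding can be checked as follows:

$ kubectl apply -f pv.yml -f pvc.yml
# Both resources should end up with a Bound status
$ kubectl get pv nfs
$ kubectl get pvc nfs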

Optionally, we could have used the spec.storageClassName field in both the PV and PVC resource manifests to reference a StorageClass whose special provisioner (kubernetes.io/no-provisioner) doesn't actually do any dynamic provisioning.

Using this option allows us to enable volume expansion through the StorageClass allowVolumeExpansion field, as there is no equivalent field on the PV resource. Here is the sample manifest for creating that StorageClass, followed by a sketch of the PV and PVC referencing it:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local
provisioner: kubernetes.io/no-provisioner
allowVolumeExpansion: true  
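
And here is a sketch of the PV and PVC from the previous example, both now referencing that StorageClass through spec.storageClassName (all other values are the same illustrative ones as before):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  storageClassName: local # Ties the PV to the StorageClass above
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.example.com
    path: /share
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  storageClassName: local # Must match the PV's StorageClass for the binding to happen
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi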

Examples

Dynamic provisioning of GCE persistent disks for GKE pods

Here is an example of a StorageClass object that can be used to dynamically provision persistent disks in GCP. See gce-pd-storageclass for details.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd # Internal provisioner
parameters:
  type: pd-standard
  fstype: ext4
  replication-type: none

Once the preceding StorageClass object is created inside a GKE cluster, creating a PersistentVolumeClaim that references that StorageClass name (standard) in its spec.storageClassName field will automatically create a GCE (Google Compute Engine) persistent disk with the characteristics defined inside the StorageClass. Once the GCE persistent disk is successfully provisioned, the associated PersistentVolume resource is created and bound to the PersistentVolumeClaim resource.

The size we want for the storage resource is specified in the PersistentVolumeClaim's spec.resources.requests.storage field. Here is a sample manifest of a PersistentVolumeClaim resource using the preceding StorageClass:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gce-pd
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 5Gi # requested volume size

When using StatefulSets to manage pods with persistence, a PersistentVolumeClaim resource can be dynamically created for each pod of the StatefulSet using the spec.volumeClaimTemplates field, as follows:

(...)
  volumeClaimTemplates:
  - metadata:
      name: pvc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 2Gi # size of the disk

A new GCE persistent disk (with the characteristics defined inside the standard StorageClass) will automatically be provisioned for each pod of the StatefulSet.
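
For context, here is a trimmed-down StatefulSet sketch (names are illustrative) showing how volumeClaimTemplates relates to the pod template: the volumeMounts name must match the claim template name, and each replica gets its own PVC derived from the template name and the pod name:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                       # illustrative name
spec:
  serviceName: web
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
          - name: pvc             # must match the volumeClaimTemplate name below
            mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: pvc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 2Gi # size of the disk provisioned for each pod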

Dynamic NFS storage provisioning

The nfs-ganesha-server-and-external-provisioner project can be used to easily deploy an NFS server and its associated external provisioner. Once deployed, creating PVCs referring to the NFS server's StorageClass will automatically do the following:

  • create a dedicated NFS export for the PVC
  • create a PV using the dedicated NFS export
  • bind the PV and the PVC

Deploy the NFS server and external provisioner
  • Prerequisite: Helm
  • Also note that for this example, we are using the standard managed Kubernetes service from Google Cloud Platform (Google Kubernetes Engine)
  • On Google Kubernetes Engine (GKE), the standard StorageClass can be used to dynamically provision HDD disks on Google Cloud Platform. The NFS server will use that StorageClass to provision a disk for data persistence
# Add the Helm charts repository
$ helm repo add nfs-ganesha-server-and-external-provisioner https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
"nfs-ganesha-server-and-external-provisioner" has been added to your repositories

# Update the Helm charts repository
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nfs-ganesha-server-and-external-provisioner" chart repository
Update Complete. ⎈Happy Helming!⎈

Here is the content of the values.yml file we used for the deployment:

replicaCount: 1

persistence:
  enabled: true
  accessMode: ReadWriteOnce
  storageClass: standard
  size: 5Gi

storageClass:
  create: true
  defaultClass: false
  name: nfs
  allowVolumeExpansion: true

resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi

Feel free to adjust the configuration to your needs, using the nfs-server-provisioner-config-params page. Now let's run the installation command in order to deploy the NFS server and its associated external provisioner:

$ helm upgrade --install testnfs-nfs-provisionner -n testnfs nfs-ganesha-server-and-external-provisioner/nfs-server-provisioner -f values.yml --create-namespace

The Helm release name is testnfs-nfs-provisionner and its associated resources will be created inside the testnfs namespace. The namespace will be created if it doesn't exist. The above installation command is idempotent and can also be used to update the Helm release after a configuration change.
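
Before looking at the Kubernetes resources themselves, the state of the release can be checked with Helm:

# The release should be reported as "deployed"
$ helm list -n testnfs
$ helm status testnfs-nfs-provisionner -n testnfs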

A look at some of the created resources

Here are the created resources after the deployment:

$ kubectl get all -n testnfs
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/testnfs-nfs-provisionner-nfs-server-provisioner-0   1/1     Running   0          3m36s

NAME                                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                                                                     AGE
service/testnfs-nfs-provisionner-nfs-server-provisioner   ClusterIP   10.200.113.168   <none>        2049/TCP,2049/UDP,32803/TCP,32803/UDP,20048/TCP,20048/UDP,875/TCP,875/UDP,111/TCP,111/UDP,662/TCP,662/UDP   3m37s

NAME                                                               READY   AGE
statefulset.apps/testnfs-nfs-provisionner-nfs-server-provisioner   1/1     3m37s
$ kubectl describe sc/nfs -n testnfs
Name:                  nfs
IsDefaultClass:        No
Annotations:           meta.helm.sh/release-name=nfs-provisionner,meta.helm.sh/release-namespace=testnfs
Provisioner:           cluster.local/nfs-provisionner-nfs-server-provisioner
Parameters:            <none>
AllowVolumeExpansion:  True
MountOptions:
  vers=3
  retrans=2
  timeo=30
ReclaimPolicy:      Delete
VolumeBindingMode:  Immediate
Events:             <none>
$ kubectl get pvc -n testnfs
NAME                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-testnfs-nfs-provisionner-nfs-server-provisioner-0   Bound    pvc-ca361607-af1d-4125-9d76-2235669e0eb0   5Gi        RWO            standard       108s
$ kubectl get pv/pvc-ca361607-af1d-4125-9d76-2235669e0eb0 -n testnfs
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                            STORAGECLASS   REASON   AGE
pvc-ca361607-af1d-4125-9d76-2235669e0eb0   5Gi        RWO            Delete           Bound    testnfs/data-testnfs-nfs-provisionner-nfs-server-provisioner-0   standard                109s

$ kubectl describe pv/pvc-ca361607-af1d-4125-9d76-2235669e0eb0 -n testnfs
Name:              pvc-ca361607-af1d-4125-9d76-2235669e0eb0
Labels:            topology.kubernetes.io/region=europe-west1
                   topology.kubernetes.io/zone=europe-west1-c
Annotations:       pv.kubernetes.io/migrated-to: pd.csi.storage.gke.io
                   pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
                   volume.kubernetes.io/provisioner-deletion-secret-name:
                   volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers:        [kubernetes.io/pv-protection external-attacher/pd-csi-storage-gke-io]
StorageClass:      standard
Status:            Bound
Claim:             testnfs/data-testnfs-nfs-provisionner-nfs-server-provisioner-0
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          5Gi
Node Affinity:
  Required Terms:
    Term 0:        topology.kubernetes.io/zone in [europe-west1-c]
                   topology.kubernetes.io/region in [europe-west1]
Message:
Source:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     pvc-ca361607-af1d-4125-9d76-2235669e0eb0
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:         <none>

Using the NFS server

Now let's test dynamic storage provisioning from the NFS server to make sure things are working properly.

To do that, we start by making an NFS storage request: we create a PVC with nfs as the StorageClass and 100Mi as the storage size. We also set the requested access mode to ReadWriteMany since we are using NFS storage and want our workloads to be able to read and write the filesystem concurrently. Here is the PVC manifest:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfstest-pvc
  namespace: testnfs
spec:
  storageClassName: "nfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi

Then, we create a Deployment with 2 replicas and make each replica pod use the same NFS export through the previously created PVC. The NFS export will be mounted at the /nfs path inside the pods' containers. Here is the Deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: testnfs
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
        - name: nfs
          persistentVolumeClaim:
            claimName: nfstest-pvc
            readOnly: false
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
          - name: nfs
            mountPath: /nfs
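
Assuming the PVC and Deployment manifests above are saved as nfstest-pvc.yml and nginx-deployment.yml (hypothetical file names), we apply them like this:

$ kubectl apply -f nfstest-pvc.yml
$ kubectl apply -f nginx-deployment.yml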

Now let's verify that things are working properly after applying the previous PVC and Deployment manifests:

# PVC properly created and bound
$ kubectl get pvc/nfstest-pvc -n testnfs
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfstest-pvc   Bound    pvc-e5c89579-8e17-492a-953d-eb1643a32538   100Mi      RWX            testnfs        78s
# Get created PV details
$ kubectl get pv/pvc-e5c89579-8e17-492a-953d-eb1643a32538 -n testnfs -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    EXPORT_block: "\nEXPORT\n{\n\tExport_Id = 1;\n\tPath = /export/pvc-e5c89579-8e17-492a-953d-eb1643a32538;\n\tPseudo
      = /export/pvc-e5c89579-8e17-492a-953d-eb1643a32538;\n\tAccess_Type = RW;\n\tSquash
      = no_root_squash;\n\tSecType = sys;\n\tFilesystem_id = 1.1;\n\tFSAL {\n\t\tName
      = VFS;\n\t}\n}\n"
    Export_Id: "1"
    Project_Id: "0"
    Project_block: ""
    Provisioner_Id: 7de05b4f-5d9e-494e-9c03-f67efe86efd0
    kubernetes.io/createdby: nfs-dynamic-provisioner
    pv.kubernetes.io/provisioned-by: cluster.local/testnfs-nfs-provisionner-nfs-server-provisioner
  creationTimestamp: "*****"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-e5c89579-8e17-492a-953d-eb1643a32538
  resourceVersion: "809393219"
  uid: fd3e20be-c31d-49a0-b268-12eadd390169
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 100Mi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: nfstest-pvc
    namespace: testnfs
    resourceVersion: "800467992"
    uid: e5c89579-8e17-492a-953d-eb1643a32538
  mountOptions:
  - vers=3
  - retrans=2
  - timeo=30
  nfs:
    path: /export/pvc-e5c89579-8e17-492a-953d-eb1643a32538
    server: 10.200.114.71
  persistentVolumeReclaimPolicy: Delete
  storageClassName: testnfs
  volumeMode: Filesystem
status:
  phase: Released
# Deployment pods running
$ kubectl get pods -n testnfs
NAME                                                READY   STATUS    RESTARTS   AGE 
nginx-deployment-8568f6d5df-2fvq6                   1/1     Running   0          108s
nginx-deployment-8568f6d5df-vjb2g                   1/1     Running   0          108s
(...)

The NFS export is properly mounted inside each of the containers. We can write a file in one of them and verify that it is also present inside the other container's filesystem:

$ kubectl exec -it pods/nginx-deployment-8568f6d5df-2fvq6 -n testnfs -- /bin/bash
root@nginx-deployment-8568f6d5df-2fvq6:/# df -h
Filesystem                                                      Size  Used Avail Use% Mounted on
(...)
10.200.114.71:/export/pvc-e5c89579-8e17-492a-953d-eb1643a32538  4.9G     0  4.9G   0% /nfs
(...)
root@nginx-deployment-8568f6d5df-2fvq6:/# touch /nfs/test
root@nginx-deployment-8568f6d5df-2fvq6:/# echo "test" > /nfs/test
root@nginx-deployment-8568f6d5df-2fvq6:/# cat /nfs/test 
test

$ kubectl exec -it pods/nginx-deployment-8568f6d5df-vjb2g -n testnfs -- /bin/bash
root@nginx-deployment-8568f6d5df-vjb2g:/# df -h
Filesystem                                                      Size  Used Avail Use% Mounted on
(...)
10.200.114.71:/export/pvc-e5c89579-8e17-492a-953d-eb1643a32538  4.9G     0  4.9G   0% /nfs
(...)
root@nginx-deployment-8568f6d5df-vjb2g:/# cat /nfs/test 
test