Get started with Google Cloud Platform #1

In this first part, we introduce Google Cloud Platform through a simple overview providing essential info for starting our GCP journey with ease.


Connecting to Google Cloud Platform

  • We can connect to Google Cloud Platform and start using the provided services through the Google Cloud console or the gcloud command-line utility (see the example at the end of this list)
  • We first have to create an account by using one of our own email addresses (not necessarily a Gmail one) or by creating a new Gmail account
  • We also need to associate a valid credit card with that account so that the paid services we eventually use can be billed
  • There is a one-time $300 credit offer for new Google Cloud accounts
  • Google Cloud Platform users also get a free (limited to 50 hours/week) development Linux virtual machine accessible from the web browser, inside the Google Cloud console, through the Cloud Shell service:
    • maintenance free, the OS and the installed tools are updated automatically
    • useful tools like gcloud, kubectl, docker, git, mysql, minikube... are already installed
    • we get 5GB persistent storage for the $HOME directory
    • we get a fully featured IDE preconfigured for development in many programming languages, powered by Google AI and providing live previews for web apps
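
Outside of Cloud Shell (where gcloud is already authenticated), a first local setup with the gcloud CLI typically looks like this; a minimal sketch where 'my-project-id' is a placeholder:

# Authenticate the gcloud CLI with your Google account (opens a browser)
gcloud auth login

# Set the default project for subsequent commands
gcloud config set project my-project-id

# Verify the active account and project
gcloud config list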

Physical infrastructure and resource locations

  • Resources created in GCP reside in Google data centers
  • The data centers are physically located in different regions
  • A GCP region is a geographical area within a specific country on a specific continent:
    • Johannesburg = africa-south1
    • Paris = europe-west9
  • Each region has at least 3 zones, representing physically separated data center buildings
  • For instance, the africa-south1 region has the following zones:
    • africa-south1-a
    • africa-south1-b
    • africa-south1-c
  • At the time of writing, Iowa (us-central1) is the only region that exceptionally has 4 zones
  • Here are the different availability types a GCP resource can have:
    • zonal: a resource that is available in a specific zone (ex: compute instances or zonal persistent disks)
    • regional: a resource that is available in all zones of a specific region (ex: VPC subnets or regional persistent disks)
    • multi-regional: a resource that is available in multiple zones of multiple regions (ex: multi-regional GCS buckets)
    • global: a resource that is not tied to a specific region or zone (ex: VPC and global load balancers)
  • A map representing GCP private network coverage around the world (Cables, Edges / Cloud CDN / Media CDN Point of Presence...): GCP private network
  • A tool that helps in choosing GCP regions based on carbon footprint, price and latency: Region picker
  • Make sure a specific GCP product is available in a specific region or multi-region location: GCP products availability by location
  • A list of GCP products that are globally available: Global products
  • GCP security, compliance, data privacy, transparency: Here
  • GCP infrastructure security: Here
  • GCP services statuses: Here
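
The regions and zones above can also be listed from the CLI; a minimal sketch (the africa-south1 filter is just an illustration):

# List all available regions
gcloud compute regions list

# List the zones of a specific region
gcloud compute zones list --filter="region:africa-south1"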

Resource hierarchy

  • Organization -> Folder -> Projects -> Resources
    • GCP resources (virtual machines, databases...) reside inside Projects
    • Projects are created inside Folders that are part of an Organization
  • If the user accounts accessing GCP have Google Workspace activated, the projects they create automatically belong to the organization associated with their Google Workspace. Otherwise, a new organization must be created
  • Organizations, projects and folders can be created using Google Cloud console as follows:
    • Organization: IAM & admin -> Identity and organization
    • Project and folders: in the Resource Manager (IAM & admin -> Manage Resources)
  • Projects have 3 attributes:
    • Name: unique in the folder. Can be changed at any time
    • Number: unique in the world. Can't be changed
    • ID: unique in the world. Can only be set during project creation and cannot be changed afterwards
  • Projects inherit the policies of the organization or folders they belong to
  • Policies define who can do what on which resources
  • Policies allowing project creation or organization policy administration can only be set at the organization level
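
Besides the console paths above, folders and projects can also be created with gcloud; a small sketch (the organization ID, folder ID and project ID are placeholders):

# Create a folder inside an organization
gcloud resource-manager folders create --display-name="my-folder" --organization=123456789

# Create a project inside that folder (the project ID must be globally unique)
gcloud projects create my-unique-project-id --name="My project" --folder=987654321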

Billing, budget and alerts

  • A GCP project must be associated with a billing account to work properly
  • The billing account contains the customer's payment details and is used to bill the project's resource consumption
  • A billing account can be used by multiple projects, but each project can be associated with only one billing account
  • Budgets can be configured for projects, and alerts triggered when project resources consumption reaches X percent of the budget
  • Budget alerts are by default sent to billing administrators
  • When creating a budget alert, we can also choose to receive consumption alerts in a Pub/Sub topic that will be automatically created
  • Billing info can also be exported to files inside GCS buckets or into BigQuery datasets for analysis
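
A small sketch of linking a project to a billing account and creating a budget from the CLI (the billing account ID, project ID and amounts are placeholders; budget commands may require a recent gcloud version):

# List available billing accounts
gcloud billing accounts list

# Link a project to a billing account
gcloud billing projects link my-project-id --billing-account=0X0X0X-0X0X0X-0X0X0X

# Create a budget with an alert threshold at 90% of the budgeted amount
gcloud billing budgets create --billing-account=0X0X0X-0X0X0X-0X0X0X \
  --display-name="my-budget" --budget-amount=100USD --threshold-rule=percent=0.9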

Labels, tags and quotas

Labels
  • Labels are user defined strings in key-value format that can be added to GCP resources
  • They are used for organizing resources. They propagate through billing and are useful for filtering
Tags
  • Tags are user-defined strings that can only be added to virtual machines
  • They are used to make network firewall rules and routing policies apply only to specific virtual machines
  • A tag is associated to a network firewall rule or routing policy during creation. Then, virtual machines using the tag inherit the network firewall rules and routing policies
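
As a quick sketch (instance name, zone, labels and tags are placeholders), labels and network tags can be added to an existing VM like this:

# Add labels to a VM (used for organizing resources and filtering billing)
gcloud compute instances add-labels myvm --zone=europe-west9-a --labels=env=dev,team=platform

# Add network tags to a VM (used by firewall rules and routes)
gcloud compute instances add-tags myvm --zone=europe-west9-a --tags=web-server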
Quotas
  • All resources in GCP are subject to quotas or limits
  • Quotas define:
    • how many resources can be created per project (ex: 15 VPC networks per project)
    • how quickly can API requests be made in a project: rate limits (ex: 5 admins actions per second on Cloud Spanner)
    • how many resources can be created per region (ex: 24 CPUs per region)
  • Some quotas increase automatically based on usage. Others require users to make quota increase requests
  • To see quota values or request quota increase for a given project, go to the quota page in Cloud console
  • Here are 3 reasons why quotas are used:
    • prevent runaway consumption in case of error or malicious attack
    • prevent billing spikes or surprises
    • force sizing considerations and periodic review
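
Quota values can also be inspected from the CLI; a small sketch (project and region names are placeholders):

# Project-wide quotas (networks, firewalls, images...)
gcloud compute project-info describe --project=my-project-id

# Per-region quotas (CPUs, disks, addresses...) for a given region
gcloud compute regions describe europe-west9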

Identity and Access Management (IAM)

IAM overview

  • 'IAM' allows admins to apply policies that define 'who' can do 'what' on 'which resources'
  • The 'who' could be a Google account, a Google group, a service account or a Cloud Identity domain
  • The 'what' is defined by an 'IAM role', which is a resource that contains a set of permissions
  • Assigning a role to the 'who' entity grants it all the permissions included in the role
  • There are many predefined roles for different types of GCP resources/services that simplify authorization management
  • IAM policies are inherited according to the GCP resource hierarchy:
    • policies created at the organization level are inherited by all folders inside the organization and therefore by all projects inside those folders
    • policies created at the folder level are inherited by all projects inside the folder
    • policies created inside a project apply only to that project
  • A deny rule always takes precedence over any allow rule, regardless of which IAM roles have been granted. IAM always checks deny rules before allow ones
  • Deny and allow rules are inherited according to the resources hierarchy

IAM roles

  • IAM roles contain a set of permissions that can be used to authorize principals (users, groups, service accounts...) to perform specific actions in GCP
  • There are 3 types of IAM roles:
    • Basic roles: grant read, write or administrator access to all resources of a project
      • Viewer: grant read access to all resources of a project
      • Editor: grant read and write access to resources of a project
      • Owner: grant administrator access to all resources of a project (read + write + permissions to grant permissions + billing admin permissions)
      • Billing Admin: for billing management
    • Predefined roles: roles that are specific to some GCP resources. Ex: Compute Instance Admin. That role grants administration privileges on Google Compute Engine
    • Custom roles: contain a set of permissions defined by GCP users. Can only be applied at organization or project level, not on folders
  • The following roles are required to manage IAM roles:
    • roles/iam.organizationRoleAdmin: grants roles administration privileges at the organization level
    • roles/iam.roleAdmin: grants roles administration privileges at the folder or project level
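
As a quick sketch of granting and inspecting roles (project ID, user email and role are placeholder assumptions):

# Grant a predefined role to a user on a project
gcloud projects add-iam-policy-binding my-project-id \
  --member="user:jane@example.com" --role="roles/compute.instanceAdmin.v1"

# List the roles granted to that user on the project
gcloud projects get-iam-policy my-project-id \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:jane@example.com" \
  --format="value(bindings.role)"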

Service accounts

  • A 'service account' is an identity (and also a resource) that is meant to be used by other GCP resources, applications or external systems like CI/CD pipelines to authenticate with GCP and perform specific actions
  • Many resources in GCP use default service accounts that are automatically created and managed by Google
  • Attaching a service account to a GCP resource gives that resource the identity of the service account
  • Service account 'keys' can be generated and exported for use by applications or systems outside of GCP for authentication
  • Actions that can be performed by entities using a service account are determined by the service account permissions
  • Service accounts are identified by unique email addresses that can be in the following forms:
    • PROJECT_NUMBER-compute@developer.gserviceaccount.com
    • PROJECT_ID@appspot.gserviceaccount.com
    • PROJECT_NUMBER@cloudservices.gserviceaccount.com
    • ...
  • Granting users the 'ServiceAccountUser' role grants them the privilege to perform specific actions as specific service accounts. Ex: a user performing a stop action on a GCE virtual machine (which they are not otherwise allowed to do) will succeed if they have been granted the 'ServiceAccountUser' role and the service account attached to the GCE instance is allowed to stop the machine
  • For more on service accounts, have a look at service-account-overview
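
A minimal sketch of creating a service account, letting a user act as it, and exporting a key (names, project ID and user email are placeholders; keys should only be exported when no other authentication method is possible):

# Create a service account
gcloud iam service-accounts create my-app-sa --display-name="My app service account"

# Allow a user to act as ('use') that service account
gcloud iam service-accounts add-iam-policy-binding \
  my-app-sa@my-project-id.iam.gserviceaccount.com \
  --member="user:jane@example.com" --role="roles/iam.serviceAccountUser"

# Generate a key for use by systems outside of GCP (handle with care)
gcloud iam service-accounts keys create key.json \
  --iam-account=my-app-sa@my-project-id.iam.gserviceaccount.com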

Products and services overview

Networking and security

VPC sketches

VPC and subnets
  • A VPC (Virtual Private Cloud) represents a network in GCP
  • The VPC resource is global, and subnets inside the VPC are regional
  • For all GCP projects, a VPC named 'default' is automatically created
  • Inside that 'default' VPC, there is a subnet in each available GCP region
  • There are two modes of VPC creation:
    • auto: subnets are automatically created in all available regions
    • custom: the VPC is created without subnets. Subnets should be created manually
  • The predefined IAM role for network administration is:
    • Network Admin: for managing VPC networks only. Gives the following permissions: compute.networks.*
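
A small sketch of the 'custom' creation mode described above (network name, subnet name, region and range are placeholders):

# Create a custom-mode VPC (no subnets yet)
gcloud compute networks create my-vpc --subnet-mode=custom

# Manually create a regional subnet inside it
gcloud compute networks subnets create my-subnet \
  --network=my-vpc --region=europe-west9 --range=10.10.0.0/24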
Firewall rules
  • Firewall rules are used to allow or deny incoming or outgoing traffic on a specific network (VPC)
  • When we create a firewall rule, we specify the VPC for which we want the rule to apply
  • The rule applies by default to all subnets of that VPC, and therefore to all compute instances using at least one of the VPC's subnets
  • To make the rule apply only to specific compute instances, source/target service accounts or tags can be specified in the rule
  • That way, the rule will only apply to compute instances using the source/target service account or tagged with the specified network tag
  • The predefined IAM role for security administration is:
    • Security Admin: for managing firewall rules only. Gives the following permissions: compute.firewalls.*
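
For example (a sketch with placeholder names and an assumed corporate source range), a firewall rule restricted to VMs carrying a specific network tag could look like this:

# Allow SSH from a corporate range only to VMs tagged 'ssh-allowed' on my-vpc
gcloud compute firewall-rules create allow-ssh-corp \
  --network=my-vpc --direction=INGRESS --action=ALLOW \
  --rules=tcp:22 --source-ranges=203.0.113.0/24 --target-tags=ssh-allowed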
Load Balancers and WAF (Cloud Armor)

For a good overview about Load Balancing in GCP, have a look at Understanding GCP Load Balancers.

For Cloud Armor, here are useful links:

Virtual machines

GCE sketches | VMs hosts maintenance

Google Compute Engine (GCE)
  • Google Compute Engine (GCE) is the GCP service that can be used to provision Virtual Machines (VM or compute instances)
  • 2 machine types: 'predefined' and 'custom'
  • 3 machine type families: 'general purpose', 'compute optimized', 'memory optimized'
  • Network bandwidth (outbound/egress) of virtual machines increases when the number of vCPUs increases
  • The maximum bandwidth is 32 Gbps on standard virtual machines
  • To increase the bandwidth, 'Tier 1' VM networking performance can be enabled. The 'gVNIC' network interface card can then be used to increase the maximum bandwidth to 100 Gbps
  • Virtual machine logical disks also consume network bandwidth (up to 60% of the network egress bandwidth) because they are network attached
  • Increasing VM disk sizes also increases their I/O performance
  • Multiple network interfaces (from different network subnets) can be used to allow communication with other GCEs on those subnets
  • The number of additional network interfaces that can be added to a GCE depends on the GCEs machine size
  • A GCE VM's authorizations are those of the service account attached to it, combined with the configured 'access scopes'
  • A default service account is automatically created and attached to every VM that does not have a custom service account attached. That default compute service account ID has the following form:
    • PROJECT_NUMBER-compute@developer.gserviceaccount.com
  • When creating a compute instance with 'gcloud', the '--scopes=[SCOPE,...]' and '--service-account=SERVICE_ACCOUNT' options can be used to set access scopes and a custom service account for the VM
  • Access scopes values can be specified as an URI or an alias. Ex:
Alias                  URI
compute-ro             https://www.googleapis.com/auth/compute.readonly
compute-rw             https://www.googleapis.com/auth/compute
logging-write          https://www.googleapis.com/auth/logging.write
monitoring             https://www.googleapis.com/auth/monitoring
monitoring-read        https://www.googleapis.com/auth/monitoring.read
monitoring-write       https://www.googleapis.com/auth/monitoring.write
sql-admin              https://www.googleapis.com/auth/sqlservice.admin
storage-full           https://www.googleapis.com/auth/devstorage.full_control
storage-ro             https://www.googleapis.com/auth/devstorage.read_only
storage-rw             https://www.googleapis.com/auth/devstorage.read_write
(...)
  • Use 'gcloud compute instances create --help' and search for '--scopes' or '--service-account' for more (detailed descriptions, available scopes aliases...)
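
Putting the '--scopes' and '--service-account' options together, here is a sketch of a VM creation command (zone, machine type, scopes and service account are placeholder choices):

# Create a VM with a custom service account and restricted access scopes
gcloud compute instances create myvm \
  --zone=europe-west9-a --machine-type=e2-medium \
  --service-account=my-app-sa@my-project-id.iam.gserviceaccount.com \
  --scopes=storage-ro,logging-write,monitoring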
The metadata server
  • A server that can be queried from compute instances (without additional authentication)
  • Stores info about the GCE instance or project (VM name, project ID, IPs, service account...)
  • Communication with the metadata server is encrypted and never leaves the physical host on which the VM is running
  • Here are the root URLs that can be used to query the metadata server:
    • http://metadata.google.internal/computeMetadata/v1
    • http://169.254.169.254/computeMetadata/v1
    • http://metadata.goog/computeMetadata/v1
  • When querying the metadata server, the root URLs should be completed with queries URIs for project or instance metadata
  • The root URI for querying 'project metadata' is '/project':
    • the list of available URIs for project metadata can be found here
  • The root URI for querying 'instance metadata' is '/instance':
    • the list of available URIs for instance metadata can be found here
  • When querying the metadata server, we need to set the 'Metadata-Flavor: Google' request header to tell the server that we need to retrieve metadata. Otherwise the request will be refused
  • Example query:
curl "http://metadata.google.internal/computeMetadata/v1/$metadata-uri" -H "Metadata-Flavor: Google"
  • $metadata-uri:
    • URI of a metadata key returning a single value, ex:
      • instance/image
      • instance/tags
    • URI of a metadata directory returning a list of other available URIs, ex:
      • instance/disks
  • The 'alt' query parameter can be used to format the data. Possible values are 'text' or 'json'. Ex:
    • ${metadata_server_root_url}/instance/tags?alt=text
  • Metadata server entries' ETags are present in response headers. A different ETag value means a different metadata value version (the value has been updated)
  • The metadata server responses status codes and meanings can be found here
  • The metadata server also supports custom metadata that can be set during VMs creation or for an already existing VM
  • Getting metadata info from the CLI:
# Project metadata
gcloud compute project-info describe --flatten="commonInstanceMetadata[]"

# Instance metadata
gcloud compute instances describe $vm_name --flatten="metadata[]"
Spot or preemptible GCE

Preemptible GCE doc

  • A GCE instance that is not guaranteed to be available when needed. Google can reclaim the compute resources for other purposes at any time
  • Made for batch, fault tolerant and high throughput computing
  • Super low cost, short term instances (up to 91% cost saving compared to standard instances)
Startup & shutdown scripts

Startup scripts doc

  • Scripts that are executed at GCE instances startup or shutdown
  • The scripts are executed on a best-effort basis only
  • They are run by the 'root' user on Linux and the 'System' user on Windows
  • Can be configured at the VM or project level. Project-level scripts trigger for every VM in the project
  • VMs level scripts take precedence over project level ones
  • Shutdown scripts have timeouts:
    • '90s' for 'standard' VMs
    • '30s' for 'preemptible' VMs
  • Passed as 'metadata' to VMs, either 'directly' or from a 'file':
# Script passed at command line
gcloud compute instances create myvm --metadata=startup-script='#!/bin/bash
      the rest
      of the script'

# Script passed from a file
# The file path can also be a GCS bucket URL

# Startup
gcloud compute instances create myvm --metadata-from-file=startup-script=<file_path>

# Shutdown
gcloud compute instances create myvm --metadata-from-file=shutdown-script=<file_path>
Sole-Tenant Nodes

Sole-tenant nodes doc | Sole-tenant nodes blog post

  • A service that can be used to reserve dedicated physical machines that will be used as hypervisors to create VMs
  • Different VM sizes can be mixed and matched on a node to consume the host's resources
Managed Instance Groups (MIG)

Instance groups doc

  • A service that can be used to run VMs at scale (up to thousands of VMs)
  • Supports auto scaling, healing and updating
  • Groups can be stateless or stateful, and zonal (single-zone) or regional (multi-zone)
  • MIG autoscaling could be based for example on:
    • the CPU utilization
    • the number of HTTP(s) requests
    • the standard or custom metrics coming from the Cloud Monitoring service
  • MIG also has a feature for scheduling auto scaling by specifying for example:
    • the min and max compute instances
    • the duration
    • the start time
    • the recurrence
    • (...)
  • MIG supports multiple rollout strategies like 'rolling updates' or 'canary'
  • Here are some important keywords for rolling updates:
    • maxSurge: how many extra instances to temporarily over provision (above the desired number)
    • minReadySeconds: the minimum amount of time to wait for instances before they are actually considered healthy
    • maxUnavailable: the maximum number of instances that can be unavailable during the update
    • targetSize: how many instances to update
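
A sketch of a rolling update driven by those keywords (group, template and zone names are placeholders):

# Roll out a new instance template with at most 3 extra and 0 unavailable instances
gcloud compute instance-groups managed rolling-action start-update my-mig \
  --zone=europe-west9-a \
  --version=template=my-new-template \
  --max-surge=3 --max-unavailable=0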
Pricing and cost optimization

Price calculator | GCE discounts | Custom machines types

  • Use 'Preemptible VMs' for fault tolerant workloads
  • Sustained Use Discounts (SUD):
    • up to 30% saving on GCE and Cloud SQL
  • Committed Use Discounts (CUD):
    • up to 70% saving without upfront fees or instance type lock-in
  • Per second billing:
    • up to 38% savings by paying per second, not per hour
  • Rightsizing:
    • choose optimal GCE families and custom machine types
  • Network service tiers:
    • Premium tier (performance): 70% more bandwidth
    • Standard tier (cost saving): 9% saving compared to other clouds

Google Cloud Storage (GCS)

GCS sketches

  • A durable and highly available object storage
  • Object storage is a computer data storage architecture that manages data as objects, not as a file and folder hierarchy (file storage) or chunks of a disk (block storage)
  • Objects are stored in a packaged format:
    • binary form of the data
    • relevant associated metadata (creation date, author, type, permissions...)
    • a globally unique identifier (in the form of a URL => good interaction with web technologies)
  • The maximum object size is 5 TB
  • Commonly used as Binary Large Object (BLOB) storage for online content (videos, pictures, audio), backups and archiving, and storage of intermediate results in processing workflows
  • Could also be used for:
    • serving website content
    • archival and disaster recovery
    • direct download
  • To use GCS we need to create buckets for storing the data
  • GCS buckets names are unique in the world
  • When creating a GCS bucket, we have the choice between regional, dual-region or multi-region availability
  • Here are some interesting features of GCS buckets:
    • 'unlimited' storage
    • fully managed
    • scalable
    • low latency
    • high durability
    • worldwide accessibility
    • geo redundancy
    • TLS, HTTPS...
  • Objects stored inside GCS buckets are immutable... new versions are created with every change made
  • New object versions overwrite the old ones by default. By enabling 'object versioning', older object versions are kept and the history can be used to restore a specific object version
  • IAM roles and 'ACLs (Access Control List)' can be used to give or restrict access to GCS buckets. ACLs are used for finer control
  • Cloud storage offers 'lifecycle policies' that can be configured to perform automated actions. Ex:
    • delete objects older than 365 days
    • delete objects created before MM/DD/YY
    • keep only the 3 most recent versions
  • Cloud storage also offers 'retention policies' that can be used to prevent modification or deletion of objects for a specific period of time
  • 'Retention policies' and 'object versioning' are mutually exclusive with each other (when one is active, the other is deactivated)
  • It is also possible to lock the 'retention policy'. In that case, no one will be able to change the 'retention policy' settings until the retention period expires
  • Cloud storage also offers 'storage classes':
    • Standard: better for frequently accessed or hot data and data that are stored for only a brief period of time => high access frequency + low retention
    • Nearline: better for data accessed once in a month
    • Coldline: better for data accessed once every 3 months
    • Archive: better for data accessed once a year (lowest cost option)
    • Autoclass: automatically moves objects between the previous four storage classes to optimize cost and data access based on each object's access pattern. Less frequently accessed data is moved to colder storage to optimize storage cost, and frequently accessed data in colder storage is moved back to Standard storage to optimize future accesses
  • GCS buckets pricing is as follows:
    • there is no minimum fee, you pay what you use:
      • price for the volume of data stored (vary per storage class)
      • price for the volume of network egress traffic
  • For transferring data to GCS buckets we can use:
    • the 'gsutil' command line utility
    • the 'cloud console'
    • the 'Storage Transfer Service' for large data transfer (TB or PB):
      • schedules and manages batch transfers from other cloud providers, from a different GCS region or from an HTTPS endpoint
    • the 'Transfer Appliance', which is a rackable, high-capacity storage server (up to petabytes of data) leased from Google Cloud; it is connected to the internal network to load the data, then shipped to an upload facility where the data is uploaded to GCS
  • GCS also has an integration with other Google Cloud products like CloudSQL, BigQuery... For example, we can:
    • import/export tables to/from BigQuery or CloudSQL
    • store App Engine logs
    • store objects used by App Engine apps (images...)
    • store Firestore backups, CloudSQL backups....
    • store GCE startup scripts, GCE images...
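
A few common bucket operations with 'gsutil', as a sketch (the bucket name, location and the lifecycle.json file are placeholder assumptions; lifecycle.json would contain rules like the ones listed above):

# Create a regional Standard-class bucket (bucket names are globally unique)
gsutil mb -l europe-west9 -c standard gs://my-unique-bucket-name

# Upload and download objects
gsutil cp ./backup.tar.gz gs://my-unique-bucket-name/
gsutil cp gs://my-unique-bucket-name/backup.tar.gz ./restore.tar.gz

# Enable object versioning and apply a lifecycle policy from a JSON file
gsutil versioning set on gs://my-unique-bucket-name
gsutil lifecycle set lifecycle.json gs://my-unique-bucket-name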

Database resources

Databases sketches

Cloud SQL

Cloud SQL sketches

  • Cloud SQL is a fully managed relational database service for MySQL, PostgreSQL and SQL Server
  • Scales up to 128 processor cores, 864 GB of RAM and 64 TB of storage
  • Supports fully automated backups and restores
  • The cost of an instance covers 7 backups
  • Data is encrypted across the GCP internal network and at rest
  • Supports read replicas, regional and multi regional instances for high availability
AlloyDB

AlloyDB overview

  • Fully managed PostgreSQL compatible database service
  • For workloads such as hybrid transactional and analytical processing
  • Fast transactional processing, highly available
  • Automated administrative tasks: backups, replication, patching, capacity management
Cloud Spanner

Spanner sketches

  • Fully managed relational database service
  • Highly available, scales horizontally, strongly consistent, speaks SQL, automatic replication => best of relational and non relational databases
  • ~ Petabyte capacity
  • Used by some of Google mission critical applications
Cloud Firestore

Firestore sketches

  • Flexible, horizontally scalable, NoSQL database
  • ACID transactions => if any operation in the transaction fails and cannot be retried, the whole transaction will fail
  • Automatic multi-region replication and strong consistency
  • Run complex queries without performance degradation
  • Data stored in documents that are organized into collections
  • Documents contain a set of key/value pairs
  • Max unit size: 1 MB per entity
  • Firestore NoSQL queries can be used to retrieve individual documents or all documents in a collection. They can include multiple chained filters or combine filtering and sorting options
  • Queries are indexed by default => performance proportional to the size of the result set, not the data set
  • Firestore also provides:
    • automatic multi-region data replication
    • strong consistency guarantees
    • atomic batch operations
    • real time transaction support
    • offline queries
  • Pricing:
    • charged for each document read, write and delete
    • queries charged as one document read per query
    • the amount of database storage and network bandwidth used (minus up to 10 GB free egress per month between regions in the US)
  • Has free daily quotas:
    • 50,000 document reads
    • 20,000 document writes
    • 20,000 document deletes
    • 1 GB of stored data
Cloud Bigtable

Bigtable sketches

  • NoSQL Big data database service
  • Not ACID => good for use cases where transactional consistency is not required
  • Petabyte scale, high read and write throughput and low latency
  • Powers Google Search, Analytics, Maps and Gmail
  • Ideal for adtech, fintech, IoT and ML applications
  • Easy integration with open source Big Data tools like Hadoop and Apache HBase, but also GCP services like Cloud Dataflow and Cloud Dataproc
  • Max unit size: 10 MB per cell, 100 MB per row
  • Good for:
    • storing large amounts of structured objects
    • analytical data with heavy read and write events
  • The smallest Bigtable cluster:
    • 3 nodes, up to 30,000 operations per second
    • remember, you pay for those nodes whether apps are using them or not
  • Use Bigtable in these cases:
    • Your data size is at least 1 TB
    • You have a high volume of writes
    • You need read/write latency < 10 ms
    • You need strong consistency
    • You need a HBase compatible API
  • Otherwise, use Cloud Firestore

Container resources

Cloud Run

Cloud Run sketches

  • A managed compute platform that can run stateless containers via web requests or pub/sub events
  • Serverless => remove the need for infrastructure management
  • Built on Knative, an API and runtime environment built on Kubernetes
  • Fast, can scale up and down from zero almost instantaneously, charging only for the resources used
  • Cloud run developer workflow:
    • write app code. App must start a server that listens to web requests
    • build + package app into a container image and push image into Artifact Registry
    • deploy app container image into Cloud Run
  • Once deployed, we get a unique HTTPS URL back for accessing the app
  • Cloud Run then starts the app containers on demand to handle requests and ensures that all incoming requests are handled by dynamically adding and removing containers
  • Using the 'container-based workflow' provides transparency and flexibility
  • With Cloud Run we can as well use the 'source-based workflow' to deploy our apps
  • The 'source-based workflow' deploys the source code instead of a container image. Cloud Run automatically builds and deploys a container image containing the app source code by using 'buildpacks' (an open source project)
  • Pricing model:
    • pay only for system resources used while containers handle web requests, with a granularity of 100ms, and during containers startup and shutdown... don't pay if container doesn't handle requests
    • there are also small fees for every 1 Million requests served
    • the price of container time increases with cpu and memory... A container with more vCPU and memory is more expensive
  • Cloud Run can run any binary compiled for 64-bit Linux => popular languages like Java, Python, Node.js, PHP, Go and C++ can be used to run web applications in Cloud Run. Less popular languages like Cobol, Haskell or Perl can also be used
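
To make the container-based and source-based workflows described above concrete, here is a sketch (the Artifact Registry path, region and service name are placeholder assumptions):

# Container-based workflow: build, push, then deploy the image
gcloud builds submit --tag europe-west9-docker.pkg.dev/my-project-id/my-repo/my-app
gcloud run deploy my-app \
  --image=europe-west9-docker.pkg.dev/my-project-id/my-repo/my-app \
  --region=europe-west9 --allow-unauthenticated

# Source-based workflow: let Cloud Run build the image with buildpacks
gcloud run deploy my-app --source=. --region=europe-west9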
Google Kubernetes Engine (GKE)

GKE sketches | GKE docs | GKE versions and features

Here are some important things we need to know when creating a GKE 'standard' cluster.

Cluster network
  • The main private subnetwork range associated with the cluster will be used to assign IP addresses to the cluster 'nodes' and services of type 'LoadBalancer'. That subnetwork is also called the primary subnetwork
  • In addition to the primary subnetwork range, a GCP network (VPC) can also have secondary subnetwork ranges associated with it
  • Secondary subnetwork ranges will be used to assign IP addresses to the GKE cluster's 'pods' and 'services' with type other than 'LoadBalancer'
  • You choose which secondary subnetwork to use for 'pods' or 'services' when creating a nodepool in the cluster, by specifying the name of the secondary subnetwork
  • You can create as many secondary subnetwork ranges for use by the 'pods' and 'services' of your nodepools as you want. You are only limited by the number of secondary ranges you can add to a GCP network (VPC)
Number of pods per nodes
  • The number of pods per node you choose when creating a nodepool is very important because it also defines how many nodes you will be able to create in that nodepool
  • Suppose the '10.100.0.0/20' network range is the one reserved for pods on the main nodepool. The smallest and largest subnet ranges that can be reserved for pods on a single node in that nodepool are respectively '/28' and '/24', taken from the '10.100.0.0/20' network
  • The '/28' subnet range will be reserved for pods on nodes of the main nodepool when the pods-per-node value is set to '8', and the '/24' reserved when that value is between '65' and '110'
  • If a '/28' is reserved, and because the '10.100.0.0/20' range contains '256' subnets of size '/28', the number of nodes you will be able to create in the main nodepool will be '256'
  • If a '/24' is reserved, and because the '10.100.0.0/20' range contains '16' subnets of size '/24', the number of nodes you will be able to create in the main nodepool will be '16'
  • The relation between the pods-per-node value on a given nodepool and the size of the subnet that is reserved for pods on that nodepool is described on this page: gcp flexible pod cidr
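
A sketch tying these networking settings together (cluster, network, subnet and secondary range names are placeholders that must already exist in the VPC):

# Create a standard VPC-native cluster using existing secondary ranges
gcloud container clusters create my-cluster \
  --region=europe-west9 \
  --network=my-vpc --subnetwork=my-subnet \
  --enable-ip-alias \
  --cluster-secondary-range-name=pods-range \
  --services-secondary-range-name=services-range \
  --default-max-pods-per-node=32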

Serverless resources

Cloud Functions

Cloud Functions sketches

  • Lightweight, event-based, asynchronous compute solution
  • Allows you to create small, single-purpose functions that respond to cloud events without the need to manage servers or runtime environments
  • Can be used to construct application workflows from individual business logic tasks and to connect and extend cloud services
  • Billed to the nearest 100 ms, and only while code is running
  • Supported languages include: Node.js, Python, Go, Java, .Net Core, Ruby and PHP
  • GCS and Pub/Sub events can trigger 'Cloud Functions' asynchronously
  • HTTP invocation can also be used to execute Cloud Functions synchronously
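
A sketch of deploying one function per trigger type (function names, runtime, region and topic are placeholders; available runtimes depend on the gcloud version):

# HTTP-triggered function (synchronous invocation)
gcloud functions deploy my-http-func \
  --runtime=python311 --trigger-http --entry-point=handler --region=europe-west9

# Pub/Sub-triggered function (asynchronous invocation)
gcloud functions deploy my-pubsub-func \
  --runtime=python311 --trigger-topic=my-topic --entry-point=handler --region=europe-west9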

Data analytics resources

Data analytics sketches | Data analytics pipeline sketches

  • Dataflow: managed data processing service
  • Dataproc: managed Spark and Hadoop service
  • BigQuery service: edge between data storage and data processing. Big data analysis and interactive query capabilities
  • Data Studio / Looker Studio: turn data into informative dashboards that are fully customizable, easy to read and share

Other data store

Filestore

Filestore product page | Filestore docs

  • Managed NAS for GCE and GKE
  • Predictable performance
  • Full NFSv3 support
  • Native file locking
  • Up to ~ 100 TB capacity
Memorystore

Memorystore product page | Memorystore docs

  • In-memory data store service
  • High availability, failover, patching and monitoring
  • Up to 300 GB storage space, sub-millisecond latency
  • 13 Gbps network throughput
  • Fully compatible with Redis protocol => lift and shift your apps without any code changes
Artifact registry

Artifact registry docs

Stores container images and application packages... can act as a proxy for official package repositories for many programming languages (npm, PyPI, Maven...).

Monitoring and logging

Monitoring

  • GCP's monitoring service is called Cloud Monitoring (formerly Stackdriver)
  • Monitoring is the base of Site Reliability Engineering (SRE)
  • SRE is a discipline that was born at Google
  • Google uses SRE principles to build, monitor and maintain some of the largest software systems in the world
  • SRE applies some software engineering principles to operations for creating software systems that are very reliable and easy to evolve
  • SRE principles include 'monitoring', 'incident response', 'post-mortems' or 'root cause analysis', 'testing and release procedures', 'capacity planning', etc.
  • GCP's Cloud Monitoring service dynamically configures monitoring after a resource is deployed, with intelligent defaults
  • This enables platform monitoring through:

Logging

  • GCP's logging service is called Cloud Logging (formerly Stackdriver)
  • All logs of resources inside a GCP project are stored inside Cloud Logging log buckets (not to be confused with GCS buckets)
  • To see the log buckets used for storing a project's logs, use:
$ gcloud logging buckets list
LOCATION  BUCKET_ID  RETENTION_DAYS  CMEK  RESTRICTED_FIELDS  INDEX_CONFIGS  LIFECYCLE_STATE  LOCKED  CREATE_TIME  UPDATE_TIME
global    _Default   30                                                      ACTIVE
global    _Required  400                                                     ACTIVE           True
  • Project logs are by default stored inside the _Default and _Required buckets
  • Inside 'logging buckets', logs are organized into 'logs containers' (depending on their types)
  • To list logs containers for a project, use:
$ gcloud logging logs list
projects/myproject/logs/GCEGuestAgent
projects/myproject/logs/OSConfigAgent
projects/myproject/logs/cloudaudit.googleapis.com%2Faccess_transparency
projects/myproject/logs/cloudaudit.googleapis.com%2Factivity
projects/myproject/logs/cloudaudit.googleapis.com%2Fdata_access        
projects/myproject/logs/cloudaudit.googleapis.com%2Fsystem_event       
projects/myproject/logs/clouderrorreporting.googleapis.com%2Finsights  
projects/myproject/logs/cloudscheduler.googleapis.com%2Fexecutions     
projects/myproject/logs/cloudsql.googleapis.com%2Fmysql.err
projects/myproject/logs/cloudsql.googleapis.com%2Fpostgres.log
projects/myproject/logs/compute.googleapis.com%2Fhealthchecks
projects/myproject/logs/compute.googleapis.com%2Fshielded_vm_integrity
projects/myproject/logs/container-runtime
projects/myproject/logs/container.googleapis.com%2Fcluster-autoscaler-visibility
projects/myproject/logs/diagnostic-log
projects/myproject/logs/docker
projects/myproject/logs/events
(...)
  • To read logs from any logs containers, use:
gcloud logging read "LOG_FILTER"
  • We can optionally use the following options:
    • --order=ORDER: desc (default) or asc to change the order in which logs are shown
    • --organization=ORGANIZATION_ID, --project=PROJECT_ID, --folder=FOLDER_ID: select the GCP organization, project or folder to read logs from
    • --bucket=BUCKET, --location=LOCATION: select the bucket and bucket location from where to read logs
    • --limit=LIMIT: limit the number of returned entries
  • LOG_FILTER is a filter expression that is used to retrieve specific log entries. For details, see gcp-logging-query-language
  • Here is an example:
# Activity logs of a pod in specific time interval

$ gcloud logging read 'protoPayload.resourceName:mypod AND protoPayload.resourceNamespace:mynamespace AND logName:projects/mygcpproject/logs/cloudaudit.googleapis.com%2Factivity AND timestamp>="2024-05-14T06:05:54.238629139Z" AND timestamp<="2024-05-14T13:16:54.238629139Z"' --limit 1
---
insertId: c9bfcdbf-9be3-4401-8fc0-b9bbdaaca1dc
labels:
  authorization.k8s.io/decision: allow
  authorization.k8s.io/reason: 'RBAC: allowed by ClusterRoleBinding "system:controller:generic-garbage-collector"
    of ClusterRole "system:controller:generic-garbage-collector" to ServiceAccount
    "generic-garbage-collector/kube-system"'
logName: projects/mygcpproject/logs/cloudaudit.googleapis.com%2Factivity
operation:
  first: true
  id: c9bfcdbf-9be3-4401-8fc0-b9bbdaaca1dc
  last: true
  producer: k8s.io
protoPayload:
  '@type': type.googleapis.com/google.cloud.audit.AuditLog
  authenticationInfo:
    principalEmail: system:serviceaccount:kube-system:generic-garbage-collector
  authorizationInfo:
  - granted: true
    permission: com.victoriametrics.operator.v1beta1.vmservicescrapes.delete
    resource: operator.victoriametrics.com/v1beta1/namespaces/mynamespace/vmservicescrapes/mypod
  methodName: com.victoriametrics.operator.v1beta1.vmservicescrapes.delete
  requestMetadata:
    callerIp: 10.201.10.42
    callerSuppliedUserAgent: kube-controller-manager/v1.27.11 (linux/amd64) kubernetes/2cefead/system:serviceaccount:kube-system:generic-garbage-collector
  resourceName: operator.victoriametrics.com/v1beta1/namespaces/mynamespace/vmservicescrapes/mypod
  serviceName: k8s.io
  status:
    code: 0
receiveTimestamp: '2024-05-14T08:55:59.938400352Z'
resource:
  labels:
    cluster_name: mygkecluster
    location: europe-west1
    project_id: mygcpproject
  type: k8s_cluster
timestamp: '2024-05-14T08:55:58.827020Z'
Audit logs

GCP audit logs | GKE audit logs

  • 'Audit logs' are records of principals' (users, service accounts...) activities across GCP, regarding resource and resource data access and modification
  • Required roles for viewing audit logs:
    • roles/logging.viewer: read logs other than 'Data Access' audit logs that are stored inside the _Default bucket
    • roles/logging.privateLogViewer: access to all logs inside the _Required and _Default buckets including 'Data Access' logs
  • Types of audit logs:
    • Admin Activity audit logs: API calls or actions that modify resources configurations or metadata. Always written. Can't be disabled. Audit logs 'sub-type' for this type of logs is ADMIN_WRITE.
      • Example actions written as 'Admin Activity' audit log: create or modify a Cloud storage bucket resource
    • Data Access audit logs: API calls or actions that read resource configurations or metadata + user-driven API calls that create, modify or read user-provided resource data. Disabled by default. Should be activated per GCP service using the following audit log 'sub-types':
      • DATA_READ: logs data read actions only
      • DATA_WRITE: logs data creation and modification actions
      • ADMIN_READ: logs resources configurations or metadata read actions
      • Example actions written as 'Data Access' audit logs: read a Cloud storage bucket resource metadata, create, read or delete a Cloud storage bucket object
      • To enable 'Data Access' audit logs use this
    • System Event audit logs: log entries for Google Cloud actions (not direct user actions) that modify resource configurations. Always written. Can't be disabled.
    • Policy Denied audit logs: log entries generated when a Google Cloud service denies access to a user or service account because of a security policy violation. Written by default. The project is charged for storing those logs. Exclusion filters can be used to prevent those logs from being stored in Cloud Logging (to limit log volume fees, for instance)
  • Here are the logs names for each audit logs type:
    • activity: projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity
    • data access: projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access
    • system event: projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fsystem_event
    • policy denied: projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fpolicy
  • Other useful documentation links: