Get started with Google Cloud Platform #1

In this first part, we introduce Google Cloud Platform through a simple overview providing essential info for starting our GCP journey with ease.


Connecting to Google Cloud Platform

  • We can connect to Google Cloud Platform and start using the provided services through the Google Cloud console or the gcloud command-line utility (see the example at the end of this list)
  • We first have to create an account by using one of our own email addresses (not necessarily a Gmail one) or by creating a new Gmail account
  • We also need to associate a valid credit card with that account so that the paid services we eventually use can be billed
  • There is a one-time $300 credit offer for new Google Cloud accounts
  • Google Cloud Platform users also get a free (limited to 50 hours/week) development Linux virtual machine accessible from the web browser, inside the Google Cloud console, through the Cloud Shell service:
    • maintenance free, the OS and the installed tools are updated automatically
    • useful tools like gcloud, kubectl, docker, git, mysql, minikube... are already installed
    • we get 5GB persistent storage for the $HOME directory
    • we get a fully featured IDE preconfigured for development in many programming languages, powered by Google AI and providing live previews for web apps
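
Outside of Cloud Shell (where gcloud is already authenticated), a first local setup with the gcloud CLI typically looks like this; a minimal sketch where 'my-project-id' is a placeholder:

# Authenticate the gcloud CLI with your Google account (opens a browser)
gcloud auth login

# Set the default project for subsequent commands
gcloud config set project my-project-id

# Verify the active account and project
gcloud config list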

Physical infrastructure and resource locations

  • Resources created in GCP reside in Google data centers
  • The data centers are physically located in different regions
  • A GCP region is a geographical area within a specific country on a specific continent:
    • Johannesburg = africa-south1
    • Paris = europe-west9
  • Each region has at least 3 zones, representing physically separated data center buildings
  • For instance, the africa-south1 region has the following zones:
    • africa-south1-a
    • africa-south1-b
    • africa-south1-c
  • At the time of writing, Iowa (us-central1) is the only region that exceptionally has 4 zones
  • Here are the different availability types a GCP resource can have:
    • zonal: a resource that is available in a specific zone (ex: compute instances or zonal persistent disks)
    • regional: a resource that is available in all zones of a specific region (ex: VPC subnets or regional persistent disks)
    • multi-regional: a resource that is available in multiple zones of multiple regions (ex: multi-regional GCS buckets)
    • global: a resource that is not tied to a specific region or zone (ex: VPC and global load balancers)
  • A map representing GCP private network coverage around the world (Cables, Edges / Cloud CDN / Media CDN Point of Presence...): GCP private network
  • A tool that helps in choosing GCP regions based on carbon footprint, price and latency: Region picker
  • Make sure a specific GCP product is available in a specific region or multi-region location: GCP products availability by location
  • A list of GCP products that are globally available: Global products
  • GCP security, compliance, data privacy, transparency: Here
  • GCP infrastructure security: Here
  • GCP services statuses: Here
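
The regions and zones above can also be listed from the CLI; a minimal sketch (the africa-south1 filter is just an illustration):

# List all available regions
gcloud compute regions list

# List the zones of a specific region
gcloud compute zones list --filter="region:africa-south1"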

Resource hierarchy

  • Organization -> Folder -> Projects -> Resources
    • GCP resources (virtual machines, databases...) reside inside Projects
    • Projects are created inside Folders that are part of an Organization
  • If the user accounts accessing GCP have Google Workspace activated, the projects they create automatically belong to the organization associated with their Google Workspace. Otherwise, a new organization must be created
  • Organizations, projects and folders can be created using Google Cloud console as follows:
    • Organization: IAM & admin -> Identity and organization
    • Project and folders: in the Resource Manager (IAM & admin -> Manage Resources)
  • Projects have 3 attributes:
    • Name: unique in the folder. Can be changed at any time
    • Number: unique in the world. Can't be changed
    • ID: unique in the world. Can only be set during project creation and cannot be changed afterwards
  • Projects inherit the policies of the organization or folders they belong to
  • Policies define who can do what on which resources
  • Policies allowing project creation or organization policy administration can only be set at the organization level
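
Besides the console paths above, folders and projects can also be created with gcloud; a small sketch (the organization ID, folder ID and project ID are placeholders):

# Create a folder inside an organization
gcloud resource-manager folders create --display-name="my-folder" --organization=123456789

# Create a project inside that folder (the project ID must be globally unique)
gcloud projects create my-unique-project-id --name="My project" --folder=987654321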

Billing, budget and alerts

  • A GCP project must be associated with a billing account to work properly
  • The billing account contains the customer's payment details and is used to bill the project's resource consumption
  • A billing account can be used by multiple projects, but each project can be associated with only one billing account
  • Budgets can be configured for projects, and alerts triggered when project resources consumption reaches X percent of the budget
  • Budget alerts are by default sent to billing administrators
  • When creating a budget alert, we can also choose to receive consumption alerts in a Pub/Sub topic that will be automatically created
  • Billing info can also be exported to files inside GCS buckets or into BigQuery datasets for analysis
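
A small sketch of linking a project to a billing account and creating a budget from the CLI (the billing account ID, project ID and amounts are placeholders; budget commands may require a recent gcloud version):

# List available billing accounts
gcloud billing accounts list

# Link a project to a billing account
gcloud billing projects link my-project-id --billing-account=0X0X0X-0X0X0X-0X0X0X

# Create a budget with an alert threshold at 90% of the budgeted amount
gcloud billing budgets create --billing-account=0X0X0X-0X0X0X-0X0X0X \
  --display-name="my-budget" --budget-amount=100USD --threshold-rule=percent=0.9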

Labels, tags and quotas

Labels
  • Labels are user defined strings in key-value format that can be added to GCP resources
  • They are used for organizing resources. They propagate through billing and are useful for filtering
Tags
  • Tags are user-defined strings that can only be added to virtual machines
  • They are used to make network firewall rules and routing policies apply only to specific virtual machines
  • A tag is associated to a network firewall rule or routing policy during creation. Then, virtual machines using the tag inherit the network firewall rules and routing policies
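
As a quick sketch (instance name, zone, labels and tags are placeholders), labels and network tags can be added to an existing VM like this:

# Add labels to a VM (used for organizing resources and filtering billing)
gcloud compute instances add-labels myvm --zone=europe-west9-a --labels=env=dev,team=platform

# Add network tags to a VM (used by firewall rules and routes)
gcloud compute instances add-tags myvm --zone=europe-west9-a --tags=web-server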
Quotas
  • All resources in GCP are subject to quotas or limits
  • Quotas define:
    • how many resources can be created per project (ex: 15 VPC networks per project)
    • how quickly can API requests be made in a project: rate limits (ex: 5 admins actions per second on Cloud Spanner)
    • how many resources can be created per region (ex: 24 CPUs per region)
  • Some quotas increase automatically based on usage. Others require users to make quota increase requests
  • To see quota values or request quota increase for a given project, go to the quota page in Cloud console
  • Here are 3 reasons why quotas are used:
    • prevent runaway consumption in case of error or malicious attack
    • prevent billing spikes or surprises
    • force sizing considerations and periodic review
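
Quota values can also be inspected from the CLI; a small sketch (project and region names are placeholders):

# Project-wide quotas (networks, firewalls, images...)
gcloud compute project-info describe --project=my-project-id

# Per-region quotas (CPUs, disks, addresses...) for a given region
gcloud compute regions describe europe-west9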

Identity and Access Management (IAM)

IAM overview

  • 'IAM' allows admins to apply policies that define 'who' can do 'what' on 'which resources'
  • The 'who' could be a Google account, a Google group, a service account or a Cloud Identity domain
  • The 'what' is defined by an 'IAM role', which is a resource that contains a set of permissions
  • Assigning a role to the 'who' entity grants it all the permissions included in the role
  • There are many predefined roles for different types of GCP resources/services that simplify authorization management
  • IAM policies are inherited according to the GCP resource hierarchy:
    • policies created at the organization level are inherited by all folders inside the organization and therefore by all projects inside those folders
    • policies created at the folder level are inherited by all projects inside the folder
    • policies created inside a project apply only to that project
  • A deny rule always takes precedence over any allow rule, regardless of which IAM roles have been granted. IAM always checks deny rules before allow ones
  • Deny and allow rules are inherited according to the resources hierarchy

IAM roles

  • IAM roles contain a set of permissions that can be used to authorize principals (users, groups, service accounts...) to perform specific actions in GCP
  • There are 3 types of IAM roles:
    • Basic roles: grant read, write or administrator access to all resources of a project
      • Viewer: grant read access to all resources of a project
      • Editor: grant read and write access to resources of a project
      • Owner: grant administrator access to all resources of a project (read + write + permissions to grant permissions + billing admin permissions)
      • Billing Admin: for billing management
    • Predefined roles: roles that are specific to some GCP resources. Ex: Compute Instance Admin. That role grants administration privileges on Google Compute Engine
    • Custom roles: contain a set of permissions defined by GCP users. Can only be applied at organization or project level, not on folders
  • The following roles are required to manage IAM roles:
    • roles/iam.organizationRoleAdmin: grants roles administration privileges at the organization level
    • roles/iam.roleAdmin: grants roles administration privileges at the folder or project level
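
As a quick sketch of granting and inspecting roles (project ID, user email and role are placeholder assumptions):

# Grant a predefined role to a user on a project
gcloud projects add-iam-policy-binding my-project-id \
  --member="user:jane@example.com" --role="roles/compute.instanceAdmin.v1"

# List the roles granted to that user on the project
gcloud projects get-iam-policy my-project-id \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:jane@example.com" \
  --format="value(bindings.role)"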

Service accounts

  • A 'service account' is an identity (and also a resource) that is meant to be used by other GCP resources, applications or external systems like CI/CD pipelines to authenticate with GCP and perform specific actions
  • Many resources in GCP use default service accounts that are automatically created and managed by Google
  • Attaching a service account to a GCP resource gives that resource the identity of the service account
  • Service account 'keys' can be generated and exported for use by applications or systems outside of GCP for authentication
  • Actions that can be performed by entities using a service account are determined by the service account permissions
  • Service accounts are identified by unique email addresses that can be in the following forms:
    • PROJECT_NUMBER-compute@developer.gserviceaccount.com
    • PROJECT_ID@appspot.gserviceaccount.com
    • PROJECT_NUMBER@cloudservices.gserviceaccount.com
    • ...
  • Granting users the 'ServiceAccountUser' role grants them the privilege to perform specific actions as specific service accounts. Ex: a user performing a stop action on a GCE virtual machine (which they are not otherwise allowed to do) will succeed if they have been granted the 'ServiceAccountUser' role and the service account attached to the GCE instance is allowed to stop the machine
  • For more on service accounts, have a look at service-account-overview
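
A minimal sketch of creating a service account, letting a user act as it, and exporting a key (names, project ID and user email are placeholders; keys should only be exported when no other authentication method is possible):

# Create a service account
gcloud iam service-accounts create my-app-sa --display-name="My app service account"

# Allow a user to act as ('use') that service account
gcloud iam service-accounts add-iam-policy-binding \
  my-app-sa@my-project-id.iam.gserviceaccount.com \
  --member="user:jane@example.com" --role="roles/iam.serviceAccountUser"

# Generate a key for use by systems outside of GCP (handle with care)
gcloud iam service-accounts keys create key.json \
  --iam-account=my-app-sa@my-project-id.iam.gserviceaccount.com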

Products and services overview

Networking and security

VPC sketches

VPC and subnets
  • A VPC (Virtual Private Cloud) represents a network in GCP
  • The VPC resource is global, and subnets inside the VPC are regional
  • For all GCP projects, a VPC named 'default' is automatically created
  • Inside that 'default' VPC, there is a subnet in each available GCP region
  • There are two modes of VPC creation:
    • auto: subnets are automatically created in all available regions
    • custom: the VPC is created without subnets. Subnets should be created manually
  • The predefined IAM role for network administration is:
    • Network Admin: for managing VPC networks only. Gives the following permissions: compute.networks.*
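
A small sketch of the 'custom' creation mode described above (network name, subnet name, region and range are placeholders):

# Create a custom-mode VPC (no subnets yet)
gcloud compute networks create my-vpc --subnet-mode=custom

# Manually create a regional subnet inside it
gcloud compute networks subnets create my-subnet \
  --network=my-vpc --region=europe-west9 --range=10.10.0.0/24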
Firewall rules
  • Firewall rules are used to allow or deny incoming or outgoing traffic on a specific network (VPC)
  • When we create a firewall rule, we specify the VPC for which we want the rule to apply
  • The rule applies by default to all subnets of that VPC, and therefore to all compute instances using at least one of the VPC's subnets
  • To make the rule apply only to specific compute instances, source/target service accounts or tags can be specified in the rule
  • That way, the rule will only apply to compute instances using the source/target service account or tagged with the specified network tag
  • The predefined IAM role for security administration is:
    • Security Admin: for managing firewall rules only. Gives the following permissions: compute.firewalls.*
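
For example (a sketch with placeholder names and an assumed corporate source range), a firewall rule restricted to VMs carrying a specific network tag could look like this:

# Allow SSH from a corporate range only to VMs tagged 'ssh-allowed' on my-vpc
gcloud compute firewall-rules create allow-ssh-corp \
  --network=my-vpc --direction=INGRESS --action=ALLOW \
  --rules=tcp:22 --source-ranges=203.0.113.0/24 --target-tags=ssh-allowed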
Load Balancers and WAF (Cloud Armor)

For a good overview about Load Balancing in GCP, have a look at Understanding GCP Load Balancers.

For Cloud Armor, here are useful links:

Virtual machines

GCE sketches | VMs hosts maintenance

Google Compute Engine (GCE)
  • Google Compute Engine (GCE) is the GCP service that can be used to provision Virtual Machines (VM or compute instances)
  • 2 machine types: 'predefined' and 'custom'
  • 3 machine type families: 'general purpose', 'compute optimized', 'memory optimized'
  • Network bandwidth (outbound/egress) of virtual machines increases when the number of vCPUs increases
  • The maximum bandwidth is 32 Gbps on standard virtual machines
  • To increase the bandwidth, 'Tier 1' VM networking performance can be enabled. The 'gVNIC' network interface card can then be used to increase the maximum bandwidth to 100 Gbps
  • Virtual machine logical disks also consume network bandwidth (up to 60% of the network egress bandwidth) because they are network attached
  • Increasing VM disk sizes also increases their I/O performance
  • Multiple network interfaces (from different network subnets) can be used to allow communication with other GCEs on those subnets
  • The number of additional network interfaces that can be added to a GCE depends on the GCEs machine size
  • A GCE VM's authorizations are those of the service account attached to it, combined with the configured 'access scopes'
  • A default service account is automatically created and attached to every VM that does not have a custom service account attached. That default compute service account ID has the following form:
    • PROJECT_NUMBER-compute@developer.gserviceaccount.com
  • When creating a compute instance with 'gcloud', the '--scopes=[SCOPE,...]' and '--service-account=SERVICE_ACCOUNT' options can be used to set access scopes and a custom service account for the VM
  • Access scopes values can be specified as an URI or an alias. Ex:
Alias                  URI
compute-ro             https://www.googleapis.com/auth/compute.readonly
compute-rw             https://www.googleapis.com/auth/compute
logging-write          https://www.googleapis.com/auth/logging.write
monitoring             https://www.googleapis.com/auth/monitoring
monitoring-read        https://www.googleapis.com/auth/monitoring.read
monitoring-write       https://www.googleapis.com/auth/monitoring.write
sql-admin              https://www.googleapis.com/auth/sqlservice.admin
storage-full           https://www.googleapis.com/auth/devstorage.full_control
storage-ro             https://www.googleapis.com/auth/devstorage.read_only
storage-rw             https://www.googleapis.com/auth/devstorage.read_write
(...)
  • Use 'gcloud compute instances create --help' and search for '--scopes' or '--service-account' for more (detailed descriptions, available scopes aliases...)
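
Putting the '--scopes' and '--service-account' options together, here is a sketch of a VM creation command (zone, machine type, scopes and service account are placeholder choices):

# Create a VM with a custom service account and restricted access scopes
gcloud compute instances create myvm \
  --zone=europe-west9-a --machine-type=e2-medium \
  --service-account=my-app-sa@my-project-id.iam.gserviceaccount.com \
  --scopes=storage-ro,logging-write,monitoring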
The metadata server
  • A server that can be queried from compute instances (without additional authentication)
  • Stores info about the GCE instance or project (VM name, project ID, IPs, service account...)
  • Communication with the metadata server is encrypted and never leaves the physical host on which the VM is running
  • Here are the root URLs that can be used to query the metadata server:
    • http://metadata.google.internal/computeMetadata/v1
    • http://169.254.169.254/computeMetadata/v1
    • http://metadata.goog/computeMetadata/v1
  • When querying the metadata server, the root URLs should be completed with queries URIs for project or instance metadata
  • The root URI for querying 'project metadata' is '/project':
    • the list of available URIs for project metadata can be found here
  • The root URI for querying 'instance metadata' is '/instance':
    • the list of available URIs for instance metadata can be found here
  • When querying the metadata server, we need to set the 'Metadata-Flavor: Google' request header to tell the server that we need to retrieve metadata. Otherwise the request will be refused
  • Example query:
curl "http://metadata.google.internal/computeMetadata/v1/$metadata-uri" -H "Metadata-Flavor: Google"
  • $metadata-uri:
    • URI of a metadata key returning a single value, ex:
      • instance/image
      • instance/tags
    • URI of a metadata directory returning a list of other available URIs, ex:
      • instance/disks
  • The 'alt' query parameter can be used to format the data. Possible values are 'text' or 'json'. Ex:
    • ${metadata_server_root_url}/instance/tags?alt=text
  • Metadata server entries' ETags are present in response headers. A different ETag value means a different metadata value version (the value has been updated)
  • The metadata server responses status codes and meanings can be found here
  • The metadata server also supports custom metadata that can be set during VMs creation or for an already existing VM
  • Getting metadata info from the CLI:
# Project metadata
gcloud compute project-info describe --flatten="commonInstanceMetadata[]"

# Instance metadata
gcloud compute instances describe $vm_name --flatten="metadata[]"
Spot or preemptible GCE

Preemptible GCE doc

  • A GCE instance that is not guaranteed to be available when needed. Google can reclaim the compute resources for other purposes at any time
  • Made for batch, fault tolerant and high throughput computing
  • Super low cost, short term instances (up to 91% cost saving compared to standard instances)
Startup & shutdown scripts

Startup scripts doc

  • Scripts that are executed at GCE instances startup or shutdown
  • The scripts are executed on a best-effort basis only
  • They are run by the 'root' user on Linux and the 'System' user on Windows
  • Can be configured at the VM or project level. Project-level scripts trigger for every VM in the project
  • VMs level scripts take precedence over project level ones
  • Shutdown scripts have timeouts:
    • '90s' for 'standard' VMs
    • '30s' for 'preemptible' VMs
  • Passed as 'metadata' to VMs, either 'directly' or from a 'file':
# Script passed at command line
gcloud compute instances create myvm --metadata=startup-script='#!/bin/bash
      the rest
      of the script'

# Script passed from a file
# The file path can also be a GCS bucket URL

# Startup
gcloud compute instances create myvm --metadata-from-file=startup-script=<file_path>

# Shutdown
gcloud compute instances create myvm --metadata-from-file=shutdown-script=<file_path>
Sole-Tenant Nodes

Sole-tenant nodes doc | Sole-tenant nodes blog post

  • A service that can be used to reserve dedicated physical machines that will be used as hypervisors to create VMs
  • Different VM sizes can be mixed and matched on a node to consume the host's resources
Managed Instance Groups (MIG)

Instance groups doc

  • A service that can be used to run VMs at scale (up to thousands of VMs)
  • Supports auto scaling, healing and updating
  • Groups can be stateless or stateful, and zonal (single-zone) or regional (multi-zone)
  • MIG autoscaling could be based for example on:
    • the CPU utilization
    • the number of HTTP(s) requests
    • the standard or custom metrics coming from the Cloud Monitoring service
  • MIG also has a feature for scheduling auto scaling by specifying for example:
    • the min and max compute instances
    • the duration
    • the start time
    • the recurrence
    • (...)
  • MIG supports multiple rollout strategies like 'rolling updates' or 'canary'
  • Here are some important keywords for rolling updates:
    • maxSurge: how many extra instances to temporarily over provision (above the desired number)
    • minReadySeconds: the minimum amount of time to wait for instances before they are actually considered healthy
    • maxUnavailable: the maximum number of instances that can be unavailable during the update
    • targetSize: how many instances to update
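
A sketch of a rolling update driven by those keywords (group, template and zone names are placeholders):

# Roll out a new instance template with at most 3 extra and 0 unavailable instances
gcloud compute instance-groups managed rolling-action start-update my-mig \
  --zone=europe-west9-a \
  --version=template=my-new-template \
  --max-surge=3 --max-unavailable=0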
Pricing and cost optimization

Price calculator | GCE discounts | Custom machines types

  • Use 'Preemptible VMs' for fault tolerant workloads
  • Sustained Use Discounts (SUD):
    • up to 30% saving on GCE and Cloud SQL
  • Committed Use Discounts (CUD):
    • up to 70% saving without upfront fees or instance type lock-in
  • Per second billing:
    • up to 38% savings by paying per second, not per hour
  • Rightsizing:
    • choose optimal GCE families and custom machine types
  • Network service tiers:
    • Premium tier (performance): 70% more bandwidth
    • Standard tier (cost saving): 9% saving compared to other clouds

Google Cloud Storage (GCS)

GCS sketches

  • A durable and highly available object storage
  • Object storage is a computer data storage architecture that manages data as objects, not as a file and folder hierarchy (file storage) or chunks of a disk (block storage)
  • Objects are stored in a packaged format:
    • binary form of the data
    • relevant associated metadata (creation date, author, type, permissions...)
    • a globally unique identifier (in the form of a URL => good interaction with web technologies)
  • The maximum object size is 5 TB
  • Commonly used as Binary Large Object (BLOB) storage for online content (videos, pictures, audio), backups and archiving, and storage of intermediate results in processing workflows
  • Could also be used for:
    • serving website content
    • archival and disaster recovery
    • direct download
  • To use GCS we need to create buckets for storing the data
  • GCS buckets names are unique in the world
  • When creating a GCS bucket, we have the choice between regional, dual-region or multi-region availability
  • Here are some interesting features of GCS buckets:
    • 'unlimited' storage
    • fully managed
    • scalable
    • low latency
    • high durability
    • worldwide accessibility
    • geo redundancy
    • TLS, HTTPS...
  • Objects stored inside GCS buckets are immutable... new versions are created with every change made
  • New object versions overwrite the old ones by default. By enabling 'object versioning', older object versions are kept and the history can be used to restore a specific object version
  • IAM roles and 'ACLs (Access Control List)' can be used to give or restrict access to GCS buckets. ACLs are used for finer control
  • Cloud storage offers 'lifecycle policies' that can be configured to perform automated actions. Ex:
    • delete objects older than 365 days
    • delete objects created before MM/DD/YY
    • keep only the 3 most recent versions
  • Cloud storage also offers 'retention policies' that can be used to prevent modification or deletion of objects for a specific period of time
  • 'Retention policies' and 'object versioning' are mutually exclusive with each other (when one is active, the other is deactivated)
  • It is also possible to lock the 'retention policy'. In that case, no one will be able to change the 'retention policy' settings until the retention period expires
  • Cloud storage also offers 'storage classes':
    • Standard: better for frequently accessed or hot data and data that are stored for only a brief period of time => high access frequency + low retention
    • Nearline: better for data accessed once in a month
    • Coldline: better for data accessed once every 3 months
    • Archive: better for data accessed once a year (lowest cost option)
    • Autoclass: automatically moves objects between the previous four storage classes to optimize cost and data access based on each object's access pattern. Less frequently accessed data is moved to colder storage to optimize storage cost, and frequently accessed data in colder storage is moved back to Standard storage to optimize future accesses
  • GCS buckets pricing is as follows:
    • there is no minimum fee, you pay what you use:
      • price for the volume of data stored (vary per storage class)
      • price for the volume of network egress traffic
  • For transferring data to GCS buckets we can use:
    • the 'gsutil' command line utility
    • the 'cloud console'
    • the 'Storage Transfer Service' for large data transfer (TB or PB):
      • schedules and manages batch transfers from other cloud providers, from a different GCS region or from an HTTPS endpoint
    • the 'Transfer Appliance', which is a rackable, high-capacity storage server (up to petabytes of data) leased from Google Cloud; it is connected to the internal network to load the data, then shipped to an upload facility where the data is uploaded to GCS
  • GCS also has an integration with other Google Cloud products like CloudSQL, BigQuery... For example, we can:
    • import/export tables to/from BigQuery or CloudSQL
    • store App Engine logs
    • store objects used by App Engine apps (images...)
    • store Firestore backups, CloudSQL backups....
    • store GCE startup scripts, GCE images...
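
A few common bucket operations with 'gsutil', as a sketch (the bucket name, location and the lifecycle.json file are placeholder assumptions; lifecycle.json would contain rules like the ones listed above):

# Create a regional Standard-class bucket (bucket names are globally unique)
gsutil mb -l europe-west9 -c standard gs://my-unique-bucket-name

# Upload and download objects
gsutil cp ./backup.tar.gz gs://my-unique-bucket-name/
gsutil cp gs://my-unique-bucket-name/backup.tar.gz ./restore.tar.gz

# Enable object versioning and apply a lifecycle policy from a JSON file
gsutil versioning set on gs://my-unique-bucket-name
gsutil lifecycle set lifecycle.json gs://my-unique-bucket-name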

Database resources

Databases sketches

Cloud SQL

Cloud SQL sketches

  • Cloud SQL is a fully managed relational database service for MySQL, PostgreSQL and SQL Server
  • Scales up to 128 processor cores, 864 GB of RAM and 64 TB of storage
  • Supports fully automated backups and restores
  • The cost of an instance covers 7 backups
  • Data is encrypted across the GCP internal network and at rest
  • Supports read replicas, regional and multi regional instances for high availability
AlloyDB

AlloyDB overview

  • Fully managed PostgreSQL compatible database service
  • For workloads such as hybrid transactional and analytical processing
  • Fast transactional processing, highly available
  • Automated administrative tasks: backups, replication, patching, capacity management
Cloud Spanner

Spanner sketches

  • Fully managed relational database service
  • Highly available, scales horizontally, strongly consistent, speaks SQL, automatic replication => best of relational and non relational databases
  • ~ Petabyte capacity
  • Used by some of Google mission critical applications
Cloud Firestore

Firestore sketches

  • Flexible, horizontally scalable, NoSQL database
  • ACID transactions => if any operation in the transaction fails and cannot be retried, the whole transaction will fail
  • Automatic multi-region replication and strong consistency
  • Run complex queries without performance degradation
  • Data stored in documents that are organized into collections
  • Documents contain a set of key/value pairs
  • Max unit size: 1 MB per entity
  • Firestore NoSQL queries can be used to retrieve individual documents or all documents in a collection. They can include multiple chained filters or combine filtering and sorting options
  • Queries are indexed by default => performance proportional to the size of the result set, not the data set
  • Firestore also provides:
    • automatic multi-region data replication
    • strong consistency guarantees
    • atomic batch operations
    • real time transaction support
    • offline queries
  • Pricing:
    • charged for each document read, write and delete
    • queries charged as one document read per query
    • the amount of database storage and network bandwidth used (minus up to 10 GB free egress per month between regions in the US)
  • Has free daily quotas:
    • 50,000 document reads
    • 20,000 document writes
    • 20,000 document deletes
    • 1 GB of stored data
Cloud Bigtable

Bigtable sketches

  • NoSQL Big data database service
  • Not ACID => good for use cases where transactional consistency is not required
  • Petabyte scale, high read and write throughput and low latency
  • Powers Google Search, Analytics, Maps and Gmail
  • Ideal for adtech, fintech, IoT and ML applications
  • Easy integration with open source Big Data tools like Hadoop and Apache HBase, but also GCP services like Cloud Dataflow and Cloud Dataproc
  • Max unit size: 10 MB per cell, 100 MB per row
  • Good for:
    • storing large amounts of structured objects
    • analytical data with heavy read and write events
  • The smallest Bigtable cluster:
    • 3 nodes, up to 30,000 operations per second
    • remember, you pay for those nodes whether apps are using them or not
  • Use Bigtable in these cases:
    • Your data size is at least 1 TB
    • You have a high volume of writes
    • You need read/write latency < 10 ms
    • You need strong consistency
    • You need a HBase compatible API
  • Otherwise, use Cloud Firestore

Container resources

Cloud Run

Cloud Run sketches

  • A managed compute platform that can run stateless containers via web requests or pub/sub events
  • Serverless => remove the need for infrastructure management
  • Built on Knative, an API and runtime environment built on Kubernetes
  • Fast, can scale up and down from zero almost instantaneously, charging only for the resources used
  • Cloud run developer workflow:
    • write app code. App must start a server that listens to web requests
    • build + package app into a container image and push image into Artifact Registry
    • deploy app container image into Cloud Run
  • Once deployed, we get a unique HTTPS URL back for accessing the app
  • Cloud Run then starts the app containers on demand to handle requests and ensures that all incoming requests are handled by dynamically adding and removing containers
  • Using the 'container-based workflow' provides transparency and flexibility
  • With Cloud Run we can as well use the 'source-based workflow' to deploy our apps
  • The 'source-based workflow' deploys the source code instead of a container image. Cloud Run automatically builds and deploys a container image containing the app source code by using 'buildpacks' (an open source project)
  • Pricing model:
    • pay only for system resources used while containers handle web requests, with a granularity of 100ms, and during containers startup and shutdown... don't pay if container doesn't handle requests
    • there are also small fees for every 1 Million requests served
    • the price of container time increases with cpu and memory... A container with more vCPU and memory is more expensive
  • Cloud Run can run any binary compiled for 64-bit Linux => popular languages like Java, Python, Node.js, PHP, Go and C++ can be used to run web applications in Cloud Run. Less popular languages like Cobol, Haskell or Perl can also be used
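
To make the container-based and source-based workflows described above concrete, here is a sketch (the Artifact Registry path, region and service name are placeholder assumptions):

# Container-based workflow: build, push, then deploy the image
gcloud builds submit --tag europe-west9-docker.pkg.dev/my-project-id/my-repo/my-app
gcloud run deploy my-app \
  --image=europe-west9-docker.pkg.dev/my-project-id/my-repo/my-app \
  --region=europe-west9 --allow-unauthenticated

# Source-based workflow: let Cloud Run build the image with buildpacks
gcloud run deploy my-app --source=. --region=europe-west9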
Google Kubernetes Engine (GKE)

GKE sketches | GKE docs | GKE versions and features

Here are some important things we need to know when creating a GKE 'standard' cluster.

Cluster network
  • The main private subnetwork range associated with the cluster will be used to assign IP addresses to the cluster 'nodes' and services of type 'LoadBalancer'. That subnetwork is also called the primary subnetwork
  • In addition to the primary subnetwork range, a GCP network (VPC) can also have secondary subnetwork ranges associated with it
  • Secondary subnetwork ranges will be used to assign IP addresses to the GKE cluster's 'pods' and 'services' with type other than 'LoadBalancer'
  • You choose which secondary subnetwork to use for 'pods' or 'services' when creating a nodepool in the cluster, by specifying the name of the secondary subnetwork
  • You can create as many secondary subnetwork ranges for use by the 'pods' and 'services' of your nodepools as you want. You are only limited by the number of secondary ranges you can add to a GCP network (VPC)
Number of pods per nodes
  • The number of pods per node you choose when creating a nodepool is very important because it also defines how many nodes you will be able to create in that nodepool
  • Suppose the '10.100.0.0/20' network range is the one reserved for pods on the main nodepool. The smallest and largest subnet ranges that can be reserved for pods on a single node in that nodepool are respectively '/28' and '/24', taken from the '10.100.0.0/20' network
  • The '/28' subnet range will be reserved for pods on nodes of the main nodepool when the pods-per-node value is set to '8', and the '/24' reserved when that value is between '65' and '110'
  • If a '/28' is reserved, and because the '10.100.0.0/20' range contains '256' subnets of size '/28', the number of nodes you will be able to create in the main nodepool will be '256'
  • If a '/24' is reserved, and because the '10.100.0.0/20' range contains '16' subnets of size '/24', the number of nodes you will be able to create in the main nodepool will be '16'
  • The relation between the pods-per-node value on a given nodepool and the size of the subnet that is reserved for pods on that nodepool is described on this page: gcp flexible pod cidr
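
A sketch tying these networking settings together (cluster, network, subnet and secondary range names are placeholders that must already exist in the VPC):

# Create a standard VPC-native cluster using existing secondary ranges
gcloud container clusters create my-cluster \
  --region=europe-west9 \
  --network=my-vpc --subnetwork=my-subnet \
  --enable-ip-alias \
  --cluster-secondary-range-name=pods-range \
  --services-secondary-range-name=services-range \
  --default-max-pods-per-node=32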

Serverless resources

Cloud Functions

Cloud Functions sketches

  • Lightweight, event-based, asynchronous compute solution
  • Allows you to create small, single-purpose functions that respond to cloud events without the need to manage servers or runtime environments
  • Can be used to construct application workflows from individual business logic tasks and to connect and extend cloud services
  • Billed to the nearest 100 ms, and only while code is running
  • Supported languages include: Node.js, Python, Go, Java, .Net Core, Ruby and PHP
  • GCS and Pub/Sub events can trigger 'Cloud Functions' asynchronously
  • HTTP invocation can also be used to execute Cloud Functions synchronously
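
A sketch of deploying one function per trigger type (function names, runtime, region and topic are placeholders; available runtimes depend on the gcloud version):

# HTTP-triggered function (synchronous invocation)
gcloud functions deploy my-http-func \
  --runtime=python311 --trigger-http --entry-point=handler --region=europe-west9

# Pub/Sub-triggered function (asynchronous invocation)
gcloud functions deploy my-pubsub-func \
  --runtime=python311 --trigger-topic=my-topic --entry-point=handler --region=europe-west9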

Data analytics resources

Data analytics sketches | Data analytics pipeline sketches

  • Dataflow: managed data processing service
  • Dataproc: managed Spark and Hadoop service
  • BigQuery service: edge between data storage and data processing. Big data analysis and interactive query capabilities
  • Data Studio / Looker Studio: turn data into informative dashboards that are fully customizable, easy to read and share

Other data store

Filestore

Filestore product page | Filestore docs

  • Managed NAS for GCE and GKE
  • Predictable performance
  • Full NFSv3 support
  • Native file locking
  • Up to ~ 100 TB capacity
Memorystore

Memorystore product page | Memorystore docs

  • In-memory data store service
  • High availability, failover, patching and monitoring
  • Up to 300 GB storage space, sub-millisecond latency
  • 13 Gbps network throughput
  • Fully compatible with Redis protocol => lift and shift your apps without any code changes
Artifact registry

Artifact registry docs

Stores container images and application packages... can act as a proxy for official package repositories for many programming languages (npm, PyPI, Maven...).

Monitoring and logging

Monitoring

  • GCP's monitoring service is called Cloud Monitoring (formerly Stackdriver)
  • Monitoring is the base of Site Reliability Engineering (SRE)
  • SRE is a discipline that was born at Google
  • Google uses SRE principles to build, monitor and maintain some of the largest software systems in the world
  • SRE applies some software engineering principles to operations for creating software systems that are very reliable and easy to evolve
  • SRE principles include 'monitoring', 'incident response', 'post-mortems' or 'root cause analysis', 'testing and release procedures', 'capacity planning', etc.
  • GCP's Cloud Monitoring service dynamically configures monitoring after a resource is deployed, with intelligent defaults
  • This enables platform monitoring through:

Logging

  • GCP's logging service is called Cloud Logging (formerly Stackdriver)
  • All logs of resources inside a GCP project are stored inside Cloud Logging log buckets (not to be confused with GCS buckets)
  • To see the log buckets used for storing a project's logs, use:
$ gcloud logging buckets list
LOCATION  BUCKET_ID  RETENTION_DAYS  CMEK  RESTRICTED_FIELDS  INDEX_CONFIGS  LIFECYCLE_STATE  LOCKED  CREATE_TIME  UPDATE_TIME
global    _Default   30                                                      ACTIVE
global    _Required  400                                                     ACTIVE           True
  • Project logs are by default stored inside the _Default and _Required buckets
  • Inside 'logging buckets', logs are organized into 'logs containers' (depending on their types)
  • To list logs containers for a project, use:
$ gcloud logging logs list
projects/myproject/logs/GCEGuestAgent
projects/myproject/logs/OSConfigAgent
projects/myproject/logs/cloudaudit.googleapis.com%2Faccess_transparency
projects/myproject/logs/cloudaudit.googleapis.com%2Factivity
projects/myproject/logs/cloudaudit.googleapis.com%2Fdata_access        
projects/myproject/logs/cloudaudit.googleapis.com%2Fsystem_event       
projects/myproject/logs/clouderrorreporting.googleapis.com%2Finsights  
projects/myproject/logs/cloudscheduler.googleapis.com%2Fexecutions     
projects/myproject/logs/cloudsql.googleapis.com%2Fmysql.err
projects/myproject/logs/cloudsql.googleapis.com%2Fpostgres.log
projects/myproject/logs/compute.googleapis.com%2Fhealthchecks
projects/myproject/logs/compute.googleapis.com%2Fshielded_vm_integrity
projects/myproject/logs/container-runtime
projects/myproject/logs/container.googleapis.com%2Fcluster-autoscaler-visibility
projects/myproject/logs/diagnostic-log
projects/myproject/logs/docker
projects/myproject/logs/events
(...)
  • To read logs from any logs containers, use:
gcloud logging read "LOG_FILTER"
  • We can optionally use the following options:
    • --order=ORDER: desc (default) or asc to change the order in which logs are shown
    • --organization=ORGANIZATION_ID, --project=PROJECT_ID, --folder=FOLDER_ID: select the GCP organization, project or folder to read logs from
    • --bucket=BUCKET, --location=LOCATION: select the bucket and bucket location from where to read logs
    • --limit=LIMIT: limit the number of returned entries
  • LOG_FILTER is a filter expression that is used to retrieve specific log entries. For details, see gcp-logging-query-language
  • Here is an example:
# Activity logs of a pod in specific time interval

$ gcloud logging read 'protoPayload.resourceName:mypod AND protoPayload.resourceNamespace:mynamespace AND logName:projects/mygcpproject/logs/cloudaudit.googleapis.com%2Factivity AND timestamp>="2024-05-14T06:05:54.238629139Z" AND timestamp<="2024-05-14T13:16:54.238629139Z"' --limit 1
---
insertId: c9bfcdbf-9be3-4401-8fc0-b9bbdaaca1dc
labels:
  authorization.k8s.io/decision: allow
  authorization.k8s.io/reason: 'RBAC: allowed by ClusterRoleBinding "system:controller:generic-garbage-collector"
    of ClusterRole "system:controller:generic-garbage-collector" to ServiceAccount
    "generic-garbage-collector/kube-system"'
logName: projects/mygcpproject/logs/cloudaudit.googleapis.com%2Factivity
operation:
  first: true
  id: c9bfcdbf-9be3-4401-8fc0-b9bbdaaca1dc
  last: true
  producer: k8s.io
protoPayload:
  '@type': type.googleapis.com/google.cloud.audit.AuditLog
  authenticationInfo:
    principalEmail: system:serviceaccount:kube-system:generic-garbage-collector
  authorizationInfo:
  - granted: true
    permission: com.victoriametrics.operator.v1beta1.vmservicescrapes.delete
    resource: operator.victoriametrics.com/v1beta1/namespaces/mynamespace/vmservicescrapes/mypod
  methodName: com.victoriametrics.operator.v1beta1.vmservicescrapes.delete
  requestMetadata:
    callerIp: 10.201.10.42
    callerSuppliedUserAgent: kube-controller-manager/v1.27.11 (linux/amd64) kubernetes/2cefead/system:serviceaccount:kube-system:generic-garbage-collector
  resourceName: operator.victoriametrics.com/v1beta1/namespaces/mynamespace/vmservicescrapes/mypod
  serviceName: k8s.io
  status:
    code: 0
receiveTimestamp: '2024-05-14T08:55:59.938400352Z'
resource:
  labels:
    cluster_name: mygkecluster
    location: europe-west1
    project_id: mygcpproject
  type: k8s_cluster
timestamp: '2024-05-14T08:55:58.827020Z'
Audit logs

GCP audit logs | GKE audit logs

  • 'Audit logs' are records of principals' (users, service accounts...) activities across GCP, regarding resource and resource data access and modification
  • Required roles for viewing audit logs:
    • roles/logging.viewer: read logs other than 'Data Access' audit logs that are stored inside the _Default bucket
    • roles/logging.privateLogViewer: access to all logs inside the _Required and _Default buckets including 'Data Access' logs
  • Types of audit logs:
    • Admin Activity audit logs: API calls or actions that modify resources configurations or metadata. Always written. Can't be disabled. Audit logs 'sub-type' for this type of logs is ADMIN_WRITE.
      • Example actions written as 'Admin Activity' audit log: create or modify a Cloud storage bucket resource
    • Data Access audit logs: API calls or actions that read resource configurations or metadata + user-driven API calls that create, modify or read user-provided resource data. Disabled by default. Should be activated per GCP service using the following audit log 'sub-types':
      • DATA_READ: logs data read actions only
      • DATA_WRITE: logs data creation and modification actions
      • ADMIN_READ: logs resources configurations or metadata read actions
      • Example actions written as 'Data Access' audit logs: read a Cloud storage bucket resource metadata, create, read or delete a Cloud storage bucket object
      • To enable 'Data Access' audit logs use this
    • System Event audit logs: log entries for Google Cloud actions (not direct user actions) that modify resource configurations. Always written. Can't be disabled.
    • Policy Denied audit logs: log entries generated when a Google Cloud service denies access to a user or service account because of a security policy violation. Written by default. The project is charged for storing those logs. Exclusion filters can be used to prevent those logs from being stored in Cloud Logging (to limit log volume fees, for instance)
  • Here are the logs names for each audit logs type:
    • activity: projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity
    • data access: projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access
    • system event: projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fsystem_event
    • policy denied: projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fpolicy
  • Other useful documentation links: