High Availability and Scaling

Regional and Global AWS Architecture

R53 (DNS Service)
Global Service Location and Discovery
- how does your machine discover where to point at
Global Health Checks and Failover
- detecting if infrastructure is healthy or not in a one location and moving customers to another Country as required
CloudFront
Content Delivery (CDN) and optimization
- how does content get to users globally : from distributed or central location
- cache content globally as close to end user as possible to improve performance

Architecture

web tier
provide entry point to customer
abstracts internals away from customer
compute tier
provide functionality for the customer
storage tier
provide storage for compute infrastructure
cache tier
faster data access by caching data
reduce db reads, to improve performance and reduce costs
db tier
data storage
app service
provide functionality to applications like queues, data streaming etc.
Regional Scaling and Resilience

Elastic Load Balancer Evolution

3 types of load balancers (ELB) available
split between between v1 (avoid) and v2(prefer)
CLB (v1), lacking features, more expensive
Application Load Balancer (ALB)
HTTP/S/WebSocket
Network Load Balancer (NLB)
TCP/TLS/UDP
email, ssh
v2 : faster, cheaper, support target groups and rules

Elastic Load Balancer Architecture

accept connections from customers and distribute them across any registered backend compute
abstracts user away from physical infrastructure
means the amount of infrastructure can change without affecting customers
infrastructure can fail and be repaired, hidden from customers
pick if using ipv4 only or ipv4 and ipv6
pick which AZs the load balancer will use
2 or more AZs
one subnet per AZ
one load balancer is made up of many nodes
product places one or more load balancer nodes into the subnets
load balancer created with a DNS record
A record
points to all of the ELB nodes created with the product
incoming requests are distributed equally across all the nodes
nodes scale within the AZ
- HA : if one fails it is replaced
- if load increases, then addition nodes are provisioned
Internet facing: have private and public IPs
Internal : have only privates IPs
generally used to separate different tiers of applications
- like web,app,db etc.
allows tiers to scale independently of each other
Nodes are configured with listeners
controls what the load balancer is listening to
accept traffic on a part and protocol
communicate with targets on a port and protocol
nodes can make connections with instances that are registered with the load balancer
can connect to both public and private instances
need 8+ freeIPs per subnet and a /27 or larger subnet to allow for scale
Cross-Zone Load Balancing
allows node to distribute connections equally across all registered instances across all AZs
allows for more even load balancing : when different AZs have unequal compute infrastructure

Application vs Network Load Balancers (ALB vs NLB)

Load Balancer Consolidation - Classic Load Balancer - clb has an attached ssl certificate and autoscaling group - clb distributes incoming connections to instances - doesn't scale; every unique https application name requires an individual clb because SNI isn't supported

V2 Load Balancers
can use one load balancer for multiple applications
- allows consolidation
listener based rules
- can hold multiple ssl certificates
host based rules
- using SNI
- direct incoming connections at multiple target groups
target groups forward connections to multiple scaling autoscaling groups
Application Load Balancer (ALB)
Layer 7 load balancer
- listens on HTTP or HTTPS only
- must have SSL certs if HTTPS is used
connection are terminated on the ALB
- no unbroken SSL connection
- a new connection is made to the application
slower than NLB : since more levels of network stack to process
application aware health checks
rules
- direct connections which arrive at a listener
- processed in priority order
- default rule = catchall
- rule conditions
- content type, cookies, custom headers, user location and app behavior
- host-header, http-header, http-request-method
- path pattern, query string
- source ip
- actions
- forward, redirect, fixed-response, authenticate-oidc, authenticate-cognito
Network Load Balancer (NLB)
layer 4 : TCP, TLS, UDP, TCP_UDP
- no HTTP or HTTPS
really fast (millions of rps, 25% of ALB latency)
health checks are not application aware
can have static IP's : useful for whitelisting
can forward TCP to instances
- unbroken end to end encryption
used with private link to provide services to other VPCs

Deciding on One - NLB - unbroken encryption - static IP for whitelisting - fastest performance - protocols not HTTP or HTTPS - private link - ALB - anything else

Launch Configuration and Templates

define the configuration of an ec2 instance in advance
AMI, Instance Type, Storage, Key Pair
Network and Security Groups
Userdata and IAM role
not editable
LT has versions
LT has newer features
recommended over LC
Placement Groups
Capacity Reservations
Elastic Graphics
T2/T3 Unlimited
LC are used for autoscaling groups
LT
used for autoscaling groups
used to launch ec2 instances directly

Auto Scaling Groups

configure ec2 to scale automatically depending on demand
When and Where to launch to
self healing
uses ec2 health checks
terminates bad ones, and provisions a new one in its place
uses launch template/configuration to know what to launch
minium, desired, max
x:y:z
keep running instances at the desired capacity byb provisioning or terminating instances
scaling policies
update desired based on metrics ie) cpu load
manual scaling : manually adjust the desired capacity
scheduled scaling : time based adjustment
- for periods of low or high usage
dynamic scaling
- simple
- rule based on metric to provision or remove instances
- stepped scaling
- bigger +/- based on difference
- allow you to act quicker to extreme changes
- preferred over simple
- target tracking
- define a metric to maintain
run within a vpc
subnets within the vpc are configured on the autoscaling groups
configured subnets will be used to provision instances
there will be an attempt to keep the number of instances in each subnet equal
cooldown period
how long to wait after a scaling action before doing another one
Integration with Load Balancers
used with ALB for elasticity
ASH instances can automatically be added to or removed from a load balancers target group
can use load balancers health checks
Scaling Processes
launch and terminate
- suspend and resume
AddToLoadBalancer
- if instances is added to LB on launch
AlarmNotification
- accept notifications from CW
AZRebalance
- balance instances evenly across all of the AZs
HealthCheck
- on/off
ReplaceUnhealthy
ScheduledActions
- on/off
Standby
- protect instance from ASG, when doing maintenance
Cost
ASH are Free
only resources created are billed
use cooldowns to avoid rapid scaling
think about more smaller instances

ASG Scaling Policies

ASG can be created without scaling policies
in this case min, max, desired capacity are static
manual scaling
manual change scaling
for test or urgent situations
Dynamic scaling
automatically scaling based on a criteria
Simple scaling
- define actions which occur when an alarm moves into an alarm state
- ex) cpu utilization less than 40%
- not flexible, or efficient
- same amount added or removed irrespective of size of load change
Step Scaling
- more flexible, more conditions possible
- adjustments vary based on the size of alarm breach
- larger load changes can be configured to add or remove more than a lesser load change
Target Tracking
- define an ideal value for a metric
- autoscaling groups makes adjustments to get close to the ideal metric value
Scaling based on SQS
- ApproximateNumberOfMessagesVisible
- scale based on the number of messages in the queue

ASG Lifecycle Hooks

custom actions on instances during ASG actions
instance launch or terminate transitions
instances are paused
until a timeout (then either continue or abandon)
you complete the lifecycle action withCompleteLifecycleAction
can be integrated with EventBridge or SNS Notifications
Simple flow:
Scale out
- Pending -> Pending Wait -> Pending : Proceed -> inService
Scale in
- Terminating -> Terminating Wait -> Terminating Proceed -> Terminated
send messages to SNS or EventBridge

ASG HealthChecks

ASG assess health of instances within a group using health checks
if instance fails a health check, it is replaced
EC2 Checks
Stopping, Stopped, Terminated, Shutting Down or impaired (not 2/2 status) => UNHEALTHY
ELB Check
running and passing ELB health check => HEALTHY
can be more application aware
Custom Check
instances marked healthy or unhealthy by an external tool
health check grace period (Default 300s)
delay before starting checks
allows system launch, bootstrapping and application start

SSL Offload & Session Stickiness

three different ways ELB can handle SSL
SSL Bridging
- default
- listener is configured for HTTPS
- connection is terminated on the ELB, and needs a certificate for the domain name
- LB initiates a new SSL connection to backend instances
- pros : ELB gets to see unencrypted HTTP an can take actions
- cons : ELB and instances require SSL certificates and instance need compute required for cryptographic operations
SSL Pass Through
- NLB passes clients connection directly to instances
- Listener listens on TCP
- pros : no certificate exposure to aws
- cons : no load balancing based on HTTP since it is never decrypted, instances still need SSL certs and compute for cryptography
SSL Offloading
- Listener is configured for HTTPS, connections are terminated and then backend connections use HTTP
- pros : certificate not required on instances
- cons : data is in plaintext format in aws network
Session Stickiness
with no stickiness, connections are distributed equally across all in-service backend instances
generates a cookie which licks the device toa single backend instance for a duration
- 1s to 7 days
allow an application to function if the state of the user session is stored on an individual server
can cause uneven load on servers, hurst load balancing

Gateway Load Balancer

network security at scale
help you run and scale 3rd party appliances
like firewalls, intrusion detection, and prevention systems
inbound and outbound transparent traffic inspection and protection
GWLB endpoints : where traffic enters or leaves from
GLWB balances across multiple backend appliances
traffic and metadata is tunneled using Geneve protocol