EC2 High Availability and Scalability - Load Balancers and Auto Scaling Groups
Scalability and High Availability
Scalability: an application or system can handle greater loads by adapting
There are two kinds of scalability:
Vertical scalability: we need to increase the size of the instance. Used for non-distributed systems such as databases
Horizontal scalability (elasticity): we need to increase the number of instances/systems for the application. Implies having distributed systems, such as modern web applications
High Availability
High availability means running an application/system in at least 2 data centers (usually 2 AZs)
The goal of high availability is to survive the loss of a data center
High availability can be passive (RDS Multi AZ deployment) or active (horizontal scaling)
Load Balancing Introduction
Load balancers are servers that forward internet traffic to multiple servers (EC2 instances)
Reasons to use a load balancer:
Spread the load across multiple downstream instances
Expose a single point of access (DNS) to the application
Seamlessly handle failures of downstream instances
Do regular health checks to the instances, redirect the traffic in case of failure to healthy instances
Provide SSL termination (HTTPS) for our website
Enforce stickiness with cookies
Provide high availability across availability zones
Separate public traffic from private traffic
Elastic Load Balancer (ELB)
An ELB is a managed load balancer, meaning:
AWS guarantees that it will be working
AWS takes care of upgrades, maintenance and high availability
AWS provides only a few configuration knobs
It costs less to set up our own, non-managed load balancer, but using a managed one is a lot less effort
Types of Load Balancers
AWS offers 3 types of load balancers:
Classic Load Balancer (v1 - old generation)
Application Load Balancer (v2 - new generation)
Network Load Balancer (v2 - new generation)
Overall it is recommended to use the newer v2 generation load balancers as they provide more features
We can set up external or internal load balancers
Health Checks
Health Checks are crucial for load balancers
They enable the load balancer to know whether the instances it forwards traffic to are available to reply to requests
The health check is done on a port and a route (/health is common)
If the response is not 200 (OK), then the instance is considered to be unhealthy
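As an illustration of the health check setup above, here is a minimal AWS CLI sketch; the target group name, VPC ID and thresholds are placeholder values, not prescribed settings:
  # Create an ALB target group that health-checks /health and expects HTTP 200
  aws elbv2 create-target-group \
      --name app-tg \
      --protocol HTTP --port 80 \
      --vpc-id vpc-1234567890abcdef0 \
      --health-check-protocol HTTP \
      --health-check-path /health \
      --healthy-threshold-count 2 \
      --unhealthy-threshold-count 2 \
      --matcher HttpCode=200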
Application Load Balancers (v2)
Application Load Balancers (Layer 7) support:
Load balancing to multiple HTTP applications across machines (target groups)
Load balancing to multiple applications on the same machine (ex: containers)
Load balancing based on the path in the URL (path-based routing)
Load balancing based on the hostname (host-based routing)
Basically, they are perfect for microservices and container-based applications
ALB has a port mapping feature to redirect to a dynamic port
Stickiness can be enabled at the target group level
Same request goes to the same instance
Stickiness is directly generated by the ALB (not the application)
ALB supports HTTP/HTTPS and WebSockets protocols
The application servers don’t see the IP of the client directly
The IP of the client that initiated the request is encoded in the X-Forwarded-For header
We can also get the port (X-Forwarded-Port) and the protocol (X-Forwarded-Proto)
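A minimal sketch of the hostname- and path-based routing described above, using the AWS CLI (the listener/target group ARNs, the domain and the priorities are placeholders):
  # Route requests for api.example.com to a dedicated target group
  aws elbv2 create-rule \
      --listener-arn <listener-arn> \
      --priority 10 \
      --conditions Field=host-header,Values=api.example.com \
      --actions Type=forward,TargetGroupArn=<api-target-group-arn>
  # Route /users/* to another target group (path-based routing)
  aws elbv2 create-rule \
      --listener-arn <listener-arn> \
      --priority 20 \
      --conditions Field=path-pattern,Values='/users/*' \
      --actions Type=forward,TargetGroupArn=<users-target-group-arn>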
Network Load Balancers (v2)
Network Load Balancers (Layer 4) allow us to:
Forward TCP and UDP traffic to the instances
Handle millions of requests per second
Support for static IP and elastic IP
NLBs have less latency (~100ms) compared to ALB (~400ms)
NLBs are mostly used for extreme performance and should not be the default load balancer of choice
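As a sketch of the static/Elastic IP support mentioned above, an NLB can be created with one Elastic IP per subnet (the subnet and allocation IDs below are placeholders):
  # Network Load Balancer with an Elastic IP attached in each subnet
  aws elbv2 create-load-balancer \
      --name my-nlb \
      --type network \
      --subnet-mappings SubnetId=subnet-aaa111,AllocationId=eipalloc-bbb222 \
                        SubnetId=subnet-ccc333,AllocationId=eipalloc-ddd444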
Good to Know
Classic Load Balancers (CLB) are deprecated, we should use ALB or NLB instead
CLB and ALB support SSL certificates and provide SSL termination
All load balancers have health check capability
ALB can route based on hostname and path
All load balancers have a static host name; we should not resolve the URL to get the IP
LBs can scale but not instantly - contact AWS for warm-up
In case of NLB, we can directly see the requester IP address
4xx errors are client induced errors
5xx errors are application induced errors
LB error 503 means the LB is at capacity or there are no registered targets
If the LB cannot connect to an application, we should check the security groups
Load Balancer Stickiness
It is possible to implement stickiness meaning the same client is always redirected to the same instance behind a load balancer
Stickiness works for both CLB and ALB
Stickiness is achieved by setting a cookie on the request. This cookie has an expiration date
Use case for stickiness: make sure the same user does not lose their session data
Enabling stickiness may bring imbalance to the load over the backend EC2 instances
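A minimal sketch of enabling stickiness on an ALB target group with the AWS CLI (the target group ARN and the cookie duration are placeholder values):
  # Enable load-balancer-generated cookie stickiness for 1 day
  aws elbv2 modify-target-group-attributes \
      --target-group-arn <target-group-arn> \
      --attributes Key=stickiness.enabled,Value=true \
                   Key=stickiness.type,Value=lb_cookie \
                   Key=stickiness.lb_cookie.duration_seconds,Value=86400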
Load Balancer for SysOps
Application Load Balancer (ALB):
Layer 7 (HTTP, HTTPS, WebSocket)
URL based routing (hostname or path)
Does not support static IP, but has a fixed DNS
Provides SSL termination
Network Load Balancer (NLB):
Layer 4 (TCP)
No pre-warming needed
It provides a static IP per subnet
Does not provide SSL termination (SSL must be enabled by the application itself)
Fixed IP for an ALB: we have to chain an NLB in front of an ALB to have a static, fixed IP address for the ALB
Load Balancer Pre-Warming
ELB scales gradually to the actual traffic
ELB may fail in case of sudden spike of traffic (10x traffic)
If we expect high traffic, we have to open a support ticket with AWS to pre-warm the ELB. We have to answer the following questions:
Duration of traffic
Expected requests per second
Size of a request (in KB)
Load Balancer Error Codes
Successful request: 200
Unsuccessful at client side: 4xx
400: Bad Request
401: Unauthorized
403: Forbidden
460: Client closed connection
463: the X-Forwarded-For header has over 30 IP addresses (similar to a malformed request)
Unsuccessful at server side: 5xx
500: Internal server error, would mean some error happened on the ELB itself
502: Bad Gateway
503: Service Unavailable (server overloaded)
504: Gateway Timeout
561: Unauthorized Request
SSL for Older Browser
Common question: how do we support legacy browsers that only support an old TLS version (TLS 1.0)?
Answer: change the SSL policy to allow a weaker cipher (example: DES-CBC3-SHA for TLS 1.0)
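A sketch of applying such a policy on an ALB HTTPS listener; ELBSecurityPolicy-TLS-1-0-2015-04 is a predefined policy that still includes TLS 1.0 ciphers such as DES-CBC3-SHA (the listener ARN is a placeholder):
  # Switch the HTTPS listener to a policy that accepts TLS 1.0 clients
  aws elbv2 modify-listener \
      --listener-arn <listener-arn> \
      --ssl-policy ELBSecurityPolicy-TLS-1-0-2015-04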
LB Common Troubleshooting
We have to check the security groups
We have to check the health checks (maybe some application is down)
Sticky sessions may bring imbalance on the load balancing side
For Multi-AZ, we have to make sure cross zone load balancing is enabled
We should use internal load balancer for private applications that don’t need a public access
We should enable deletion protection for production load balancers
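As a sketch of the last point, deletion protection is a load balancer attribute (the ARN is a placeholder):
  # Protect a production load balancer from accidental deletion
  aws elbv2 modify-load-balancer-attributes \
      --load-balancer-arn <load-balancer-arn> \
      --attributes Key=deletion_protection.enabled,Value=true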
Load Balancing Monitoring
All LB metrics are directly pushed to CloudWatch metrics
Metrics:
BackendConnectionErrors
HealthyHostCount/UnhealthyHostCount
HTTPCode_Backend_2XX: successful requests
HTTPCode_Backend_3XX: redirects
HTTPCode_Backend_4XX: client errors
HTTPCode_Backend_5XX: server errors
Latency
RequestCount
SurgeQueueLength: the total number of requests or connections that are pending routing to a healthy instance. Can be used for scale out. Max value is 1024
SpilloverCount: the total number of requests that were rejected because the surge queue is full
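A sketch of pulling one of these metrics from CloudWatch for a Classic Load Balancer (the load balancer name and time window are placeholders); the Backend*/SurgeQueueLength metrics live in the AWS/ELB namespace:
  # Check the maximum surge queue length over the last hour, minute by minute
  aws cloudwatch get-metric-statistics \
      --namespace AWS/ELB \
      --metric-name SurgeQueueLength \
      --dimensions Name=LoadBalancerName,Value=my-clb \
      --statistics Maximum \
      --period 60 \
      --start-time 2024-01-01T00:00:00Z \
      --end-time 2024-01-01T01:00:00Z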
Load Balancers Access Logs
Access logs are disabled by default, we can enable them
Access logs from load balancers can be stored in S3 and can contain:
Time
Client IP address
Latencies
Request path
Server response
Trace ID
We pay only for the S3 storage in case of LB logs
LB logs are useful for compliance reasons
Helpful for keeping access data even after ELB and EC2 instances are terminated
Access Logs are automatically encrypted
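A minimal sketch of enabling ALB access logs to S3 with the AWS CLI (the ARN, bucket and prefix are placeholders; the bucket also needs a policy that lets the ELB service write to it):
  # Turn on access logging and point it at an S3 bucket
  aws elbv2 modify-load-balancer-attributes \
      --load-balancer-arn <load-balancer-arn> \
      --attributes Key=access_logs.s3.enabled,Value=true \
                   Key=access_logs.s3.bucket,Value=my-lb-logs-bucket \
                   Key=access_logs.s3.prefix,Value=prod-alb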
Application Load Balancer Request Tracing
Request tracing: each HTTP request has an added custom header X-Amzn-Trace-Id
This is useful in logs/distributed tracing platform to track a single request
Application Load Balancer is not (yet) integrated with AWS X-Ray
LB Troubleshooting Using Metrics
HTTP 400 - BAD REQUEST: the client sent a malformed request
HTTP 503 - Service Unavailable: ensure we have a healthy instance in every AZ that our LB is configured to respond in. Look for HealthyHostCount in CloudWatch
HTTP 504 - Gateway Timeout: check if keep-alive settings on the EC2 instances are enabled and make sure that the keep-alive timeout is greater than the idle timeout settings of the LB
Auto Scaling Groups
The goal of an Auto Scaling Group (ASG) is to:
Scale out (add EC2 instances) to match an increasing load
Scale in (remove EC2 instances) to match a decreasing load
Ensure we have a minimum and a maximum number of machines running
Automatically register new instances to LB
ASG attributes:
A launch configuration:
AMI + Instance Type
EC2 User Data
EBS Volumes
Security Groups
SSH Key Pair
Min size/max size/initial capacity
Network + Subnet information
Load balancer information
Scaling Policies
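A minimal sketch of creating an ASG with the attributes listed above, using the AWS CLI (the AMI ID, security group, key pair, subnets and target group ARN are placeholders):
  # Launch configuration: AMI + instance type + security group + key pair
  aws autoscaling create-launch-configuration \
      --launch-configuration-name my-lc \
      --image-id ami-0abcdef1234567890 \
      --instance-type t2.micro \
      --security-groups sg-0123456789abcdef0 \
      --key-name my-key-pair
  # ASG: min/max/desired capacity, subnets, and the target group to register instances with
  aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name my-asg \
      --launch-configuration-name my-lc \
      --min-size 2 --max-size 6 --desired-capacity 2 \
      --vpc-zone-identifier "subnet-aaa111,subnet-bbb222" \
      --target-group-arns <target-group-arn>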
ASG alarms:
It is possible to scale an ASG based on CloudWatch alarms
An alarm monitors a metric (such as Average CPU)
Metrics are computed for the overall ASG instances (overall average)
Based on the alarm we can create scale-out and scale-in policies
New ASG scaling rules:
It is now possible to define “better” ASG scaling rules that are directly managed by EC2:
Target average CPU usage
Number of requests on the ELB per instance
Average network in/out
These rules are easier to set up and make more sense than the previous rules
Auto scaling based on custom metric:
We can auto scale instances based on a custom metric (ex: number of connected users)
This works as follows:
Send a custom metric from the application on EC2 to CloudWatch using the PutMetricData API
Create a CloudWatch alarm to react to low/high values
Use the CloudWatch alarm as the scaling policy for ASG
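A sketch of the custom-metric flow above with the AWS CLI (the namespace, metric name, threshold and policy ARN are illustrative placeholders):
  # 1. The application publishes the custom metric to CloudWatch
  aws cloudwatch put-metric-data \
      --namespace MyApp \
      --metric-name ConnectedUsers \
      --value 42
  # 2. Alarm on high values, wired to a scale-out policy of the ASG
  aws cloudwatch put-metric-alarm \
      --alarm-name myapp-high-connected-users \
      --namespace MyApp \
      --metric-name ConnectedUsers \
      --statistic Average --period 60 --evaluation-periods 2 \
      --threshold 1000 --comparison-operator GreaterThanThreshold \
      --alarm-actions <scale-out-policy-arn>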
ASG - Types of Scaling
Scheduled scaling:
Scaling based on a schedule allows us to scale the application ahead of known load changes
Dynamic scaling:
ASG enables us to follow the demand curve for our application closely, reducing the need to manually provision instances
ASG can automatically adjust the number of EC2 instances as needed to maintain a target
Predictive scaling:
ASG uses machine learning to schedule the right number of EC2 instances in anticipation of traffic changes
ASG Scaling Policies
Target Tracking Scaling
The simplest policy type and the easiest to set up
Example: we want the average ASG CPU to stay around 40%
Simple/Step Scaling
Example:
When a CloudWatch alarm is triggered (example average CPU > 70%), then add 2 units
When a CloudWatch alarm is triggered (example average CPU < 30%), then remove 1 unit
Scheduled Actions
Can be used if we can anticipate scaling based on known usage patterns
Example: increase the min capacity to 10 at 5 PM on Fridays
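A sketch of a target tracking policy and a scheduled action matching the two examples above (the ASG, policy and action names are placeholders):
  # Target tracking: keep average ASG CPU around 40%
  aws autoscaling put-scaling-policy \
      --auto-scaling-group-name my-asg \
      --policy-name keep-cpu-at-40 \
      --policy-type TargetTrackingScaling \
      --target-tracking-configuration '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":40.0}'
  # Scheduled action: raise the minimum capacity to 10 at 5 PM every Friday (UTC by default)
  aws autoscaling put-scheduled-update-group-action \
      --auto-scaling-group-name my-asg \
      --scheduled-action-name friday-5pm-scale-up \
      --recurrence "0 17 * * 5" \
      --min-size 10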
Scaling Cool-downs
The cool-down period helps to ensure that our ASG doesn’t launch or terminate additional instances before the previous scaling activity takes effect
In addition to the default cool-down for the ASG, we can create cool-downs that apply to a specific simple scaling policy
A scaling-specific cool-down overrides the default cool-down period
A common use case for scaling-specific cool-downs is when a scale-in policy terminates instances based on a criterion or metric. Because this policy terminates instances, the ASG needs less time to determine whether to terminate additional instances
If the default cool-down period of 300 seconds is too long, we can reduce costs by applying a scaling-specific cool-down of 180 seconds for example
If our application is scaling up and down multiple times each hour, we can modify the ASG cool-down timers and the CloudWatch alarm period that triggers the scaling
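A sketch of the cool-down tuning described above (the ASG/policy names mirror the examples; the 300/180-second values are illustrative, not prescribed):
  # Default ASG cool-down (applies when a policy has no cool-down of its own)
  aws autoscaling update-auto-scaling-group \
      --auto-scaling-group-name my-asg \
      --default-cooldown 300
  # Simple scaling policy with a shorter, scaling-specific cool-down
  aws autoscaling put-scaling-policy \
      --auto-scaling-group-name my-asg \
      --policy-name scale-in-on-low-cpu \
      --adjustment-type ChangeInCapacity \
      --scaling-adjustment -1 \
      --cooldown 180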
ASG Scaling Termination Policies
Determine which AZ has the most instances
Determine which instance to terminate so as to align the remaining instances with the allocation strategy for On-Demand or Spot instances
Determine whether any of the instances uses the oldest launch template
Determine whether any of the instances uses the oldest launch configuration
If there are multiple unprotected instances to terminate, determine which is closest to the next billing hour
ASG Summary
Scaling policies can be based on CPU, network, or even a custom metric
We can scale instances based on a schedule
ASGs use launch configurations; we update an ASG by providing a new launch configuration
IAM roles attached to an ASG will get assigned to EC2 instances
ASG is a free service, we pay for the underlying resources created
Having instances under an ASG means that if they get terminated for whatever reason, the ASG will automatically launch new instances to replace them
ASG can terminate instances marked as unhealthy by an LB (and replace them)
Scaling Processes in ASG
Launch: ASG adds a new EC2 to the group, increasing the capacity
Terminate: ASG removes an EC2 instance from the group, decreasing its capacity
HealthCheck: ASG checks the health of an instance
ReplaceUnhealthy: ASG terminates unhealthy instances and recreates them
AZRebalance: ASG balances the number of instances across AZs
AlarmNotification: ASG accepts notifications from CloudWatch
ScheduledAction: ASG performs a scheduled action
AddToLoadBalancer: ASG adds instances to the load balancer or target group
We can suspend these processes!
AZRebalance:
Launch new instances then terminate old instances
If we suspend the Launch process:
AZRebalance won't launch instances
AZRebalance won't terminate instances
If we suspend the Terminate process:
The ASG can temporarily grow up to 10% above its maximum size (this is allowed during rebalances)
The ASG could remain at the increased capacity, as it cannot terminate instances
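A sketch of suspending and resuming ASG processes with the AWS CLI (the group name is a placeholder):
  # Stop the ASG from launching instances and from rebalancing across AZs
  aws autoscaling suspend-processes \
      --auto-scaling-group-name my-asg \
      --scaling-processes Launch AZRebalance
  # Re-enable them later
  aws autoscaling resume-processes \
      --auto-scaling-group-name my-asg \
      --scaling-processes Launch AZRebalance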
ASG for SysOps
To make sure we have high availability, we need at least 2 instances running across 2 AZs in the ASG (we must configure a multi-AZ ASG)
Health checks available:
EC2 Status Checks
ELB Health Checks
ASG will launch a new instance after terminating an unhealthy one
ASG will not reboot unhealthy hosts for us
CLI commands:
set-instance-health
terminate-instance-in-auto-scaling-group
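A sketch of using these two commands (the instance ID is a placeholder):
  # Mark an instance as unhealthy so the ASG replaces it
  aws autoscaling set-instance-health \
      --instance-id i-0abcd1234ef567890 \
      --health-status Unhealthy
  # Terminate an instance and shrink the desired capacity at the same time
  aws autoscaling terminate-instance-in-auto-scaling-group \
      --instance-id i-0abcd1234ef567890 \
      --should-decrement-desired-capacity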
Troubleshooting ASG issues
<number of instances> instance(s) are already running. Launching EC2 instance failed:
The ASG has reached the limit set by its DesiredCapacity parameter. We should update the ASG with a new value for the desired capacity
Launching EC2 instances is failing:
The security group does not exist. SG might have been deleted
The key pair does not exist. The key pair might have been deleted
If the ASG fails to launch an instance for over 24 hours, it will automatically suspend the process (administrative suspension)
CloudWatch Metrics for ASG
The following metrics are available for ASG:
GroupMinSize
GroupMaxSize
GroupDesiredCapacity
GroupInServiceInstances
GroupPendingInstances
GroupStandbyInstances
GroupTerminatingInstances
GroupTotalInstances
We should enable metric collection to see these metrics
Metrics for the ASG are collected every 1 minute
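A sketch of turning on group metric collection (the ASG name is a placeholder; "1Minute" is the only supported granularity):
  # Enable all group-level metrics at 1-minute granularity
  aws autoscaling enable-metrics-collection \
      --auto-scaling-group-name my-asg \
      --granularity "1Minute"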