EC2

Placement Groups

Sometimes we want control over the EC2 instance placement strategy
This strategy can be defined using placement groups
When we create a placement group, we specify one of the following strategies:
- Cluster: places instances into a low latency group in a single AZ
- Spread: spreads instances across underlying hardware (limitation: max 7 instances per group per AZ)
- Partition: similar to spread, instances are spread across many different partitions (different sets of racks) within an AZ. Can scale up to hundreds of EC2 instances per group. Recommended for Hadoop, Cassandra, Kafka
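The following is a minimal sketch of the request we might send when creating a placement group. It assumes boto3's EC2 `create_placement_group` call. The helper names (`placement_group_params`, `spread_fits`) and the validation are our own; only the three strategy values and the 7-instance and 7-partition limits come from these notes. Only the pure helpers run without AWS credentials.

```python
from typing import Dict, Optional

STRATEGIES = {"cluster", "spread", "partition"}
MAX_SPREAD_PER_AZ = 7   # spread: max 7 instances per group per AZ (per these notes)
MAX_PARTITIONS = 7      # partition: up to 7 partitions per AZ


def placement_group_params(name: str, strategy: str,
                           partition_count: Optional[int] = None) -> Dict:
    """Build keyword arguments for CreatePlacementGroup (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    params = {"GroupName": name, "Strategy": strategy}
    if strategy == "partition":
        if partition_count is None or not 1 <= partition_count <= MAX_PARTITIONS:
            raise ValueError("partition groups need between 1 and 7 partitions")
        params["PartitionCount"] = partition_count
    elif partition_count is not None:
        raise ValueError("PartitionCount only applies to the partition strategy")
    return params


def spread_fits(instances_per_az: Dict[str, int]) -> bool:
    """True if a planned spread group stays within 7 instances in every AZ."""
    return all(n <= MAX_SPREAD_PER_AZ for n in instances_per_az.values())


def create_placement_group(params: Dict) -> Dict:
    """Send the request; needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the helpers above work without boto3
    return boto3.client("ec2").create_placement_group(**params)
```

For example, `placement_group_params("kafka", "partition", 3)` produces the arguments for a 3-partition group that could then be passed to `create_placement_group`.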

Cluster Placement Group

All instances are placed on the same rack, same AZ
Pros: great network (up to 10 Gbps bandwidth between instances)
Cons: if the rack fails, all instances fail at the same time
Use cases:
- Big Data job that needs to complete fast
- Application that needs extremely low latency and high network throughput

Spread Placement Group

All the EC2 instances are located on different hardware
Pros:
- Can span multiple AZs
- Reduced risk for simultaneous failure
Cons:
- Limited to 7 instances per AZ per placement group
Use cases:
- Application that needs to maximize high availability
- Critical applications where each instance must be isolated from the failure of the others

Partition Placement Group

Partitions are sets of racks
We can create up to 7 partitions per AZ in a partition placement group
We can have hundreds of EC2 instances as part of a placement group
The instances in a partition do not share racks with the instances in the other partitions
A partition failure can affect many EC2 instances but won't affect other partitions
EC2 instances get access to the partition information as metadata
Use cases: HDFS, HBase, Cassandra, Kafka
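Because the partition number is exposed as instance metadata, a partition-aware application could use it to keep replicas of the same data in different partitions. The sketch below assumes the IMDSv2 token flow and the metadata path `placement/partition-number` (verify against the instance metadata documentation); it only works on an EC2 instance. `place_replicas` is a hypothetical helper that shows the idea and runs anywhere.

```python
import urllib.request
from typing import Dict, List

IMDS = "http://169.254.169.254/latest"


def read_partition_number(timeout: float = 2.0) -> int:
    """Fetch this instance's partition number via IMDSv2 (works only on EC2)."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/placement/partition-number",
        headers={"X-aws-ec2-metadata-token": token})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return int(resp.read().decode().strip())


def place_replicas(partition_of: Dict[str, int], replicas: int) -> List[str]:
    """Pick hosts so that no two replicas share a partition (hypothetical helper)."""
    chosen: List[str] = []
    used = set()
    for host, partition in sorted(partition_of.items()):
        if partition in used:
            continue  # would share racks with an already chosen replica
        chosen.append(host)
        used.add(partition)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct partitions for the requested replicas")
```

A single partition failure then takes out at most one replica of each piece of data.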

Shutdown Behavior

Shutdown Behavior: how should an instance react when shutdown is done using the OS?
- Stopped: default
- Terminated
Shutdown Behavior does not apply to stops/terminations done from the AWS Console or AWS API; it applies only to shutdowns initiated from inside the VM
CLI Attribute: InstanceInitiatedShutdownBehavior
Termination protection: protects against accidental termination in AWS Console or CLI
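Example: an instance has its shutdown behavior set to “terminate” and termination protection enabled. When we shut down the instance from the OS, the instance will still be terminated, because termination protection only guards against terminations requested from the Console, CLI or API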
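A small model of the rule above, written as a sketch: the function names are hypothetical, and only the attribute values `stop`/`terminate` of `InstanceInitiatedShutdownBehavior` come from EC2 itself. It makes explicit that termination protection plays no role in an OS-initiated shutdown.

```python
def after_os_shutdown(shutdown_behavior: str, termination_protection: bool) -> str:
    """State an instance ends up in when shut down from inside the OS."""
    if shutdown_behavior not in ("stop", "terminate"):
        raise ValueError("InstanceInitiatedShutdownBehavior is 'stop' or 'terminate'")
    # termination_protection is deliberately ignored: it only blocks
    # terminations requested through the Console, CLI or API.
    return "terminated" if shutdown_behavior == "terminate" else "stopped"


def api_terminate_allowed(termination_protection: bool) -> bool:
    """A terminate request from Console/CLI/API is refused while protection is on."""
    return not termination_protection
```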

EC2 Launch Troubleshooting

# InstanceLimitExceeded error: means that we have reached the maximum number of instances per region (as of September 2019, this limit is counted in vCPUs instead of instances; default is 32 vCPUs)
- Solution: launch the instance in a different region or create a support ticket to increase the limit (the older instance-count limit defaulted to 20 instances per region)
# InsufficientInstanceCapacity: means AWS does not have enough On-Demand capacity in the particular AZ to launch the instance
- Solutions:
  - Wait for a few minutes before requesting again
  - If more than one instance is requested, we can break down the request by creating the instances one by one
  - If urgent, submit a request for a different instance type and upgrade it afterwards
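The first two remedies (wait, then launch one by one) can be combined into a retry loop. This is a sketch under assumptions: `launch` is a hypothetical callable that requests a single instance and returns its ID, and `InsufficientCapacity` is a stand-in for however the real client surfaces the InsufficientInstanceCapacity error.

```python
import time
from typing import Callable, List


class InsufficientCapacity(Exception):
    """Hypothetical stand-in for an InsufficientInstanceCapacity error."""


def launch_one_by_one(launch: Callable[[], str], count: int, retries: int = 3,
                      wait_s: float = 120.0, sleep=time.sleep) -> List[str]:
    """Launch `count` instances individually, waiting and retrying on capacity errors."""
    instance_ids = []
    for _ in range(count):
        for attempt in range(retries):
            try:
                instance_ids.append(launch())  # one instance per request
                break
            except InsufficientCapacity:
                if attempt == retries - 1:
                    raise  # still no capacity: consider a different instance type
                sleep(wait_s)  # wait a few minutes before requesting again
    return instance_ids
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting.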
# Instance terminates immediately (the instance goes from the pending state to the terminated state):
- EBS volume limit is reached
- EBS snapshot is corrupt
- The root EBS volume is encrypted and we don’t have permission to access the KMS key for decryption
- The instance store-backed AMI that we are using to launch the instance is missing a required part (an image.part.xxx file)
To find the exact reason: in the EC2 console, go to Instances > Description tab > State transition reason

EC2 SSH Troubleshooting

We have to make sure the private key (pem file) has 400 permissions, else we get Unprotected Private Key File error
We have to make sure the username for the OS is given correctly when logging in via SSH (e.g. ec2-user on Amazon Linux), else we get a “Host key not found” or “Permission denied (publickey)” error
Connection timeout reasons:
- SG is not configured correctly
- CPU load of the instance is high
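The 400-permissions rule can be checked and fixed from Python as well as with `chmod 400`. This is a minimal sketch; the function names are hypothetical. It relies on the fact that OpenSSH rejects a private key that the group or others can access.

```python
import os
import stat


def restrict_key(path: str) -> None:
    """Equivalent of `chmod 400 key.pem`: read-only for the owner."""
    os.chmod(path, stat.S_IRUSR)


def key_is_protected(path: str) -> bool:
    """False if group/others have any access (ssh would report an unprotected key)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```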

EC2 Instance Launch Types

On Demand Instances: short workload, predictable pricing
Reserved (minimum 1 year)
- Reserved Instances: long workloads
- Convertible Reserved Instances: long workloads with flexible instance types
- Scheduled Reserved Instances
Spot Instances: short workloads for cheap, but instances can be reclaimed by AWS (less reliable)
Dedicated Instances: no other customers will share the hardware
Dedicated Hosts: book an entire physical server, we can control instance placement

EC2 On Demand

Pay for what we use
Billing is per second, after the first minute (minimum charge of 60 seconds)
Has the highest cost but does not require any upfront commitment
Recommended for short-term, uninterrupted workloads where we can’t predict how the application will behave
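Per-second billing with a one-minute minimum works out as below. The $0.36/hour rate is a hypothetical number, not a real price; `Decimal` is used so money amounts stay exact.

```python
from decimal import Decimal

MINIMUM_BILLED_SECONDS = 60


def on_demand_cost(run_seconds: int, hourly_rate: Decimal) -> Decimal:
    """Per-second billing, but never less than one minute."""
    billed = max(run_seconds, MINIMUM_BILLED_SECONDS)
    return hourly_rate * billed / 3600
```

With a hypothetical rate of `Decimal("0.36")`, a 30-second run is billed as 60 seconds ($0.006), a 90-second run costs $0.009 and a full hour costs $0.36.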

EC2 Reserved Instances

Up to 75% discount compared to On Demand
Pay upfront for what we use with long term commitment
Reservation period can be 1 or 3 years
We can reserve a specific instance type
Recommended for steady state usage applications (example: database)
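Why Reserved Instances suit steady-state usage: the reservation is billed for every hour of the term, while On Demand is billed only for the hours the instance runs. The sketch below compares one year of each; the rate, discount and utilization values are hypothetical. With a discount `d`, the reservation wins once utilization exceeds `1 - d`.

```python
from decimal import Decimal

HOURS_PER_YEAR = 8760


def yearly_on_demand(rate: Decimal, utilization: Decimal) -> Decimal:
    """On Demand: pay only for the hours the instance actually runs."""
    return rate * HOURS_PER_YEAR * utilization


def yearly_reserved(rate: Decimal, discount: Decimal) -> Decimal:
    """Reserved: discounted rate, but billed for every hour of the term."""
    return rate * (1 - discount) * HOURS_PER_YEAR


def reserved_is_cheaper(rate: Decimal, discount: Decimal, utilization: Decimal) -> bool:
    return yearly_reserved(rate, discount) < yearly_on_demand(rate, utilization)
```

For example, with a hypothetical 60% discount, a database running 90% of the year is cheaper reserved, while a job running 30% of the year is cheaper On Demand.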

Convertible Reserved Instances

Similar to Reserved Instances
We can change the EC2 instance type
Up to 54% discount

Scheduled Reserved Instances

Instances that are reserved for a recurring time window and launched within it
Recommended when we require compute for a fraction of the day, week, month

Dedicated Hosts

Physical dedicated EC2 server for our use
Offers full control of EC2 instance placements
Offers visibility into the underlying sockets/physical cores of the hardware
Can be reserved for a 3-year period
More expensive
Useful for software with a complicated licensing model (BYOL - Bring Your Own License) or for companies with strong regulatory or compliance needs

Dedicated Instances

Instances are running on hardware dedicated for the account
May share hardware with other instances from the same account
We have no control over instance placement (can move hardware after Stop/Start)

EC2 Spot Instances

Can get a discount of up to 90% compared to On Demand instances
We define a max spot price and get the instance as long as our max price is higher than the current spot price
If the current spot price goes above our max price, AWS reclaims the instance; we can choose to have it stopped or terminated, with a 2-minute grace period
If we don’t want our spot instance to be reclaimed by AWS, we can use a Spot Block
- We can block a spot instance during a specified time frame (1 to 6 hours) without interruptions
- In rare situations the instance may be reclaimed
Use cases for spot instances: batch jobs or workloads that are resilient to failure
We can launch spot instances with a spot request. A spot request contains the following information:
- Maximum price
- Desired number of instances
- Launch specification
- Request type: one-time, persistent
- Valid from, valid until
Request types:
- One time request: as soon as the request is fulfilled, the request will go away
- Persistent request: the request does not go away once it is first fulfilled; if some instances are reclaimed, it keeps trying to bring the number of instances back to the desired count
Cancelling a spot request: we can only cancel spot requests that are in the open, active or disabled state
Cancelling a spot request does not terminate the instances it launched. To get rid of spot instances for good, first cancel the spot request (otherwise a persistent request would launch replacements), then terminate the associated instances
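The difference between one-time and persistent requests shows up when the spot price crosses our max price. The sketch below is a simplified model, not how EC2 schedules capacity: all instances are reclaimed together and the 2-minute warning is collapsed into a single price tick. The function name and prices are hypothetical.

```python
from typing import List


def simulate_spot_request(prices: List[float], max_price: float,
                          target: int, persistent: bool) -> List[int]:
    """Running instance count after each spot price tick (simplified model)."""
    running = 0
    fulfilled_once = False
    history = []
    for price in prices:
        if price > max_price:
            running = 0  # reclaimed after the 2-minute interruption notice
        elif running < target and (persistent or not fulfilled_once):
            running = target  # request (re)fulfilled at the current spot price
            fulfilled_once = True
        history.append(running)
    return history
```

With prices `[0.03, 0.05, 0.03]`, a max price of `0.04` and a target of 2, the one-time request ends with no instances (`[2, 0, 0]`), while the persistent request relaunches them (`[2, 0, 2]`).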

Spot Fleets

Spot Fleet - set of spot instances + (optional) on-demand instances
The spot fleet will try to meet the target capacity with price constraints
Each launch pool defines its own instance type, OS and AZ
We can have multiple launch pools, so the fleet can choose the best
Spot fleet will stop launching instances once the target capacity is reached
Strategies to allocate spot instances:
- lowestPrice: the spot fleet will launch instances from the pool with the lowest price
- diversified: distribute instances across all pools
- capacityOptimized: launch instances from the pools with the optimal capacity for the number of instances being launched
Spot fleets allow us to automatically request spot instances with the lowest price
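A simplified model of the first two allocation strategies. The pool names and spot prices are hypothetical, `capacityOptimized` is not modeled because it depends on capacity data we don't have, and the real `lowestPrice` strategy can also spread across several of the cheapest pools.

```python
from typing import Dict


def allocate(pools: Dict[str, float], target: int, strategy: str) -> Dict[str, int]:
    """Split `target` instances across launch pools (simplified model)."""
    if strategy == "lowestPrice":
        cheapest = min(pools, key=pools.get)  # pool with the lowest spot price
        return {cheapest: target}
    if strategy == "diversified":
        names = sorted(pools)
        counts = {name: 0 for name in names}
        for i in range(target):
            counts[names[i % len(names)]] += 1  # round-robin across all pools
        return counts
    raise ValueError(f"strategy not modeled here: {strategy}")
```

`lowestPrice` is cheapest, while `diversified` trades some cost for resilience: losing one pool only removes part of the fleet.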

EC2 Instance Types

R: applications that need a lot of RAM, example in-memory caches
C: applications that need a good CPU, example compute/databases
M: applications that are balanced, example general web-app
I: applications that need a good local I/O (instance storage), example databases
G: applications that need a GPU, example video-rendering, ML
T2/T3: burstable instances
T2/T3 - unlimited: unlimited CPU burst, we get charged for bursts

Burstable Instances (T2/T3)

Burst: when a machine needs to process something unexpected (a spike), it can burst the CPU power
If the machine bursts, it will utilize burst credits
If the credits are gone, the machine CPU performance will suffer
When the machine is not bursting, burst credits accumulate over time
The bigger the instance, the faster we can earn back burst credits
T2/T3 unlimited: offer unlimited burst credit balance. If the instance used all its burst credits, we pay extra for the additional bursts
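A rough hour-by-hour model of burst credits, assuming one CPU credit equals one vCPU at 100% for one minute. The earn rate, cap and starting balance are hypothetical parameters (check the documented values for a given instance type). Real unlimited mode also pays surplus credits back from later earnings, which this sketch ignores.

```python
from typing import List, Tuple


def simulate_burst(hourly_util: List[float], vcpus: int, earn_per_hour: float,
                   max_balance: float, start_balance: float = 0.0,
                   unlimited: bool = False) -> Tuple[List[float], float]:
    """Return (delivered CPU utilization per hour, surplus credits billed)."""
    balance, surplus, delivered = start_balance, 0.0, []
    for util in hourly_util:
        available = min(balance + earn_per_hour, max_balance)  # earn, capped
        demand = util * vcpus * 60  # credits needed this hour
        if demand <= available:
            balance = available - demand
            delivered.append(util)
        elif unlimited:
            surplus += demand - available  # extra burst we pay for
            balance = 0.0
            delivered.append(util)
        else:
            delivered.append(available / (vcpus * 60))  # throttled to what credits allow
            balance = 0.0
    return delivered, surplus
```

With 1 vCPU earning 6 credits per hour and 30 credits banked, an idle hour followed by two full-load hours gets throttled to about 65% and then 10% in standard mode. In unlimited mode the full load is delivered and 75 surplus credits are billed.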

AMIs

Virtual machine images with customized set-up/software
Custom AMI can provide the following advantages:
- Pre-installed packages
- Faster boot time (no need to configure user data)
- Machine comes pre-configured with monitoring
- Security concerns - control over the machine in the network
- Control of maintenance and updates of AMI over time
- Active Directory Integration out of the box
- Installing our app ahead of time
- Using someone’s AMI which is optimized for running an app (example: database)
AMIs are not global, they are built for a specific region

Using Public AMIs

We can leverage AMIs from other people
We can pay for other people’s AMI by the hour
AMIs can be found and published on the AWS Marketplace

AMI Storage

AMIs take up space and they live in Amazon S3
By default AMIs are private and locked to the account that created them
We can make AMIs public, share them with other AWS accounts or sell them on the AWS Marketplace

AMI Pricing

AMIs live in S3, so we get charged for the actual space it takes in S3
Overall it is quite inexpensive to store private AMIs
We have to make sure to remove AMIs we don’t use to save costs

Cross Account AMI Copy

It is possible to share an AMI with another AWS account
Sharing an AMI does not affect the ownership of the AMI
If a shared AMI is copied, then the account that made the copy becomes the owner of the copy
To copy an AMI that was shared from another account, the owner of the source AMI must grant read permissions for the storage that backs the AMI, either the associated EBS snapshot or an associated S3 bucket
Limits:
- An encrypted AMI shared from another account cannot be copied directly. Instead, if the underlying snapshot and encryption key were shared with us, we can copy the snapshot while re-encrypting it with a key of our own. The copied snapshot can then be registered as a new AMI
- We can't copy an AMI with an associated billingProduct code that was shared with us from another account. This includes Windows AMIs and AMIs from the AWS Marketplace. To copy a shared AMI with a billingProduct code, we have to launch an EC2 instance in our account using the shared AMI and then create an AMI from that instance

Elastic IPs

When we stop and then start an EC2 instance, the public IP will change
If we need to have a fixed public IP, we need an Elastic IP
An Elastic IP is a public IPv4 IP we own as long as we don’t delete it
An Elastic IP can be attached to one instance at a time
We can remap it across instances
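Historically we didn’t pay for an Elastic IP while it was attached to a running instance (since February 2024 AWS charges for every public IPv4 address, attached or not)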
With an Elastic IP we can mask the failure of an instance or software by rapidly remapping the address to another instance in our account
We can have up to 5 Elastic IPs per region in an account (soft limit, can be increased on request)
Overall, we should try to avoid using Elastic IPs:
- We could use a random public IP and register a DNS name to the instance
- Use a LB with a static hostname
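A toy model of the Elastic IP rules in this section: an address maps to at most one instance at a time, remapping moves it away from the failed instance, and allocations stop at the default soft limit of 5. The class and instance IDs are hypothetical; in AWS the remap itself is an address association call.

```python
from typing import Dict, Optional


class ElasticIpPool:
    """Toy model: each Elastic IP points at no more than one instance."""

    def __init__(self, limit: int = 5):  # default soft limit from these notes
        self.limit = limit
        self.assoc: Dict[str, Optional[str]] = {}

    def allocate(self, ip: str) -> None:
        if len(self.assoc) >= self.limit:
            raise RuntimeError("Elastic IP limit reached; request an increase")
        self.assoc[ip] = None  # allocated but not attached

    def associate(self, ip: str, instance_id: str) -> None:
        if ip not in self.assoc:
            raise KeyError(f"{ip} is not allocated")
        # Remapping overwrites: the previous instance loses the address.
        self.assoc[ip] = instance_id
```

Failing over is then a single `associate` call that points the same public address at a standby instance, so clients keep using the same IP.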

CloudWatch Metrics for EC2

AWS Provided Metrics:
- Basic monitoring (default): metrics are collected at a 5 minute interval
- Detailed Monitoring (paid): metrics are collected at a 1 minute interval
- AWS Provided Metrics include: CPU, Network, Disk and Status Check metrics
Custom Metrics:
- Basic resolution of custom metrics is 1 minute
- High resolution can go all the way down to 1 second
- Custom metrics can include RAM, application level metrics
- IAM permission is required on the EC2 instance role to be able to push metrics to CloudWatch
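Since RAM is a custom metric, the instance has to compute and push it itself. The sketch below reads Linux `/proc/meminfo` text and builds a CloudWatch `PutMetricData` request. The namespace `Custom/EC2` and metric name `MemoryUsedPercent` are hypothetical choices. `StorageResolution` 1 requests high resolution and 60 standard resolution. Only `publish` needs boto3 and the IAM permission mentioned above.

```python
from typing import Dict


def mem_used_percent(meminfo: str) -> float:
    """Compute RAM usage from /proc/meminfo text (values are in kB)."""
    fields: Dict[str, int] = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key] = int(rest.split()[0])
    total, available = fields["MemTotal"], fields["MemAvailable"]
    return 100.0 * (total - available) / total


def memory_metric_request(instance_id: str, used_percent: float,
                          high_resolution: bool = False) -> Dict:
    """Keyword arguments for CloudWatch PutMetricData."""
    return {
        "Namespace": "Custom/EC2",  # hypothetical namespace
        "MetricData": [{
            "MetricName": "MemoryUsedPercent",  # hypothetical metric name
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Value": used_percent,
            "Unit": "Percent",
            "StorageResolution": 1 if high_resolution else 60,
        }],
    }


def publish(request: Dict) -> None:
    """Needs boto3 and an instance role allowing cloudwatch:PutMetricData."""
    import boto3  # imported lazily so the helpers above work without boto3
    boto3.client("cloudwatch").put_metric_data(**request)
```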

EC2 Included Metrics

CPU: CPU Utilization + Credit Usage / Balance (in case of T2/T3)
Network: Network In/Out
Status Checks:
- Instance status: checks for the EC2 VM
- System status: checks for the underlying hardware (Amazon health checks, no user control is granted over them)
Disk (only for instance store-backed instances): Read/Write for Ops/Bytes
RAM is NOT included in AWS EC2 metrics!

CloudWatch Unified Agent

Newer agent that gathers both logs and metrics (e.g. RAM) from the instance at the same time