EC2 Storage and Data Management - EBS and EFS
EBS Volumes
- An EC2 instance loses its root volume when it is manually terminated
- Unexpected terminations might happen from time to time (AWS would send an email)
- Sometimes, we need a way to store instance data somewhere
- An EBS (Elastic Block Store) Volume is a network drive we attach to our EC2 instances while they are running
- Data can be persisted on the EBS volumes
- EBS is a network drive:
- It uses the network to communicate with the instance, which means there might be a bit of latency
- It can be detached from an EC2 and attached to another instance
- It is locked to an AZ; to move a volume to another AZ, we need to create a snapshot
- EBS volumes have provisioned capacity (size in GB and IOPS). We get billed for all the provisioned capacity
- The capacity of the drive can be increased over time
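A rough sketch of creating and attaching a volume programmatically with boto3 (the region, AZ, instance ID, and device name below are placeholders, and valid AWS credentials are assumed):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# Create a 100 GiB gp2 volume in the same AZ as the target instance
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # must match the instance's AZ
    Size=100,                        # size in GiB
    VolumeType="gp2",
)

# Wait until the volume is available before attaching it
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# Attach the volume to a (placeholder) instance as a secondary device
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Device="/dev/xvdf",
)
```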
EBS Volume Types
GP2
- Recommended for most workloads
- It can be a system boot volume
- Size can range from 1 GiB to 16 TiB
- Small GP2 volumes can burst IOPS to 3000
- Max IOPS is 16000
- We get 3 IOPS per GiB, meaning at 5334 GiB we reach the max IOPS capacity
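A quick sanity check of the 3 IOPS/GiB rule (pure Python, no AWS calls; the 3000 burst floor and 16000 cap are the GP2 limits listed above):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a GP2 volume: 3 IOPS per GiB, capped at 16,000."""
    return min(size_gib * 3, 16_000)

def gp2_max_iops(size_gib: int) -> int:
    """Small volumes can burst to 3,000 IOPS; above that, the baseline applies."""
    return max(gp2_baseline_iops(size_gib), 3_000)

for size in (100, 1_000, 5_334, 10_000):
    print(size, gp2_baseline_iops(size), gp2_max_iops(size))
# 5,334 GiB already reaches the 16,000 IOPS ceiling
```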
IO1
- Recommended for critical business applications which require sustained IOPS performance or more than 16000 IOPS per volume
- Recommended for database workloads
- Size can range between 4GiB and 16TiB
- IOPS is provisioned: min 100 - max 64000 (Nitro instances), 32000 (other instances)
- Maximum ratio of provisioned IOPS to requested volume size is 50:1
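A sketch of provisioning an IO1 volume while respecting the 50:1 IOPS-to-size ratio (region and AZ are placeholders; boto3 assumed):

```python
import boto3

def max_io1_iops(size_gib: int, nitro: bool = True) -> int:
    """Highest IOPS we can request: 50 IOPS per GiB, capped at 64,000 (Nitro) or 32,000."""
    return min(size_gib * 50, 64_000 if nitro else 32_000)

size_gib = 400
iops = max_io1_iops(size_gib)        # 400 GiB * 50 = 20,000 IOPS

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # placeholder AZ
    Size=size_gib,
    VolumeType="io1",
    Iops=iops,                       # provisioned IOPS, must respect the 50:1 ratio
)
print(volume["VolumeId"])
```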
ST1
- Recommended for streaming workloads requiring consistent, fast throughput at a low price
- Can not be a boot volume
- Size ranges from 500 GiB to 16 TiB
- Max IOPS is 500
- Maximum throughput is 500 MiB/s - with ability to burst
SC1
- Throughput-oriented storage for large volumes of data that is infrequently accessed
- Recommended for scenarios where the lowest storage cost is important
- Can not be a boot volume
- Size ranges from 500 GiB to 16 TiB
- Max IOPS is 250
- Maximum throughput is 250 MiB/s - with ability to burst
GP2 Volume I/O Burst
- If the GP2 volume is smaller than 1000 GiB (meaning its baseline IOPS is below 3000), it can burst to 3000 IOPS
- Concept is similar to t2 instances with CPU bursts
- We accumulate burst credits over time, which allows the volume to have good performance when needed
- The bigger the volume, the faster we can fill up the burst credit balance
- If the I/O credit balance is empty:
- The max I/O we get drops back to the baseline performance we provisioned
- If we see the balance at 0 all the time, we have to increase the GP2 volume size or switch to IO1
- We can use CloudWatch to monitor the I/O credit balance
- Burst is also available for ST1 and SC1
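A toy simulation of the GP2 I/O credit bucket (the 5.4 million credit bucket size and the 3 IOPS/GiB refill rate are the published GP2 numbers; the workload below is invented for illustration):

```python
BUCKET_SIZE = 5_400_000  # maximum I/O credits a GP2 volume can hold

def simulate(size_gib: int, demand_iops: int, seconds: int) -> float:
    """Return the credit balance after running a constant workload."""
    baseline = min(size_gib * 3, 16_000)   # refill rate in credits/second
    balance = BUCKET_SIZE                  # volumes start with a full bucket
    for _ in range(seconds):
        served = min(demand_iops, 3_000 if balance > 0 else baseline)
        balance += baseline - served       # refill minus consumption
        balance = max(0, min(balance, BUCKET_SIZE))
    return balance

# A 100 GiB volume (300 baseline IOPS) pushing 3,000 IOPS drains its
# credits in roughly 2,000 seconds (5.4M / (3,000 - 300)).
print(simulate(size_gib=100, demand_iops=3_000, seconds=2_000))
```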
Computing EBS Throughput
- GP2:
- Throughput = (Volume Size in GiB) * (IOPS per GiB) * (I/O size in KiB), converted to MiB/s
- Example: 3000 I/O operations per second * 256 KiB per I/O operation = 750 MiB/s
- Capped at a max of 250 MiB/s (meaning volumes >= 334 GiB won't increase throughput)
- IO1:
- Throughput = (Provisioned IOPS) * (I/O size in KiB), converted to MiB/s
- The throughput limit of IO1 volumes is 256 KiB/s for each IOPS provisioned
- Limit to a max of 500 MiB/s (at 32000 IOPS) and 1000 MiB/s (at 64000 IOPS)
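The throughput formulas above, written out as a small helper (pure Python; 256 KiB per I/O is the maximum I/O size EBS counts per operation):

```python
KIB_PER_IO = 256  # maximum I/O size EBS counts per operation

def gp2_throughput_mib_s(size_gib: int) -> float:
    """GP2 baseline: 3 IOPS per GiB (max 16,000) times 256 KiB, capped at 250 MiB/s."""
    baseline_iops = min(size_gib * 3, 16_000)
    return min(baseline_iops * KIB_PER_IO / 1024, 250)

def io1_throughput_mib_s(provisioned_iops: int) -> float:
    """IO1: 256 KiB/s per provisioned IOPS, capped at 500 MiB/s (up to 32,000 IOPS)
    or 1,000 MiB/s (above that, on Nitro instances)."""
    cap = 500 if provisioned_iops <= 32_000 else 1_000
    return min(provisioned_iops * KIB_PER_IO / 1024, cap)

print(gp2_throughput_mib_s(100))     # 75 MiB/s
print(gp2_throughput_mib_s(334))     # ~250 MiB/s: the GP2 cap is already reached
print(io1_throughput_mib_s(1_000))   # 250 MiB/s
print(io1_throughput_mib_s(64_000))  # 1,000 MiB/s (cap)
```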
EBS Volume Resizing
- We can only increase an EBS volume's size and provisioned IOPS
- After resizing a volume, we need to repartition the drive (or extend the file system) to use the extra space
- After increasing the size, the volume may stay in an optimization phase for a while; it remains usable during this time
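A sketch of growing a volume in place with boto3 (the volume ID is a placeholder); extending the partition/file system inside the OS is still a separate step:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
volume_id = "vol-0123456789abcdef0"                 # placeholder volume ID

# Request a bigger size (and, for io1, optionally more provisioned IOPS)
ec2.modify_volume(VolumeId=volume_id, Size=200)

# The modification goes through "modifying" -> "optimizing" -> "completed";
# the volume stays usable while it is optimizing.
state = ec2.describe_volumes_modifications(VolumeIds=[volume_id])
print(state["VolumesModifications"][0]["ModificationState"])
```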
EBS Snapshots
- Snapshots are incremental: only changed blocks are backed up
- EBS backups use IO and we should not run them while our application is handling a lot of traffic
- Snapshots are stored in S3, but we are not directly able to see them
- It is not required to detach the volume while the snapshot is being made, but it is recommended
- An account can have a max of 100,000 snapshots
- We can copy snapshots across AZs or regions
- We can make an AMI from a snapshot
- EBS volumes restored from a snapshot need to be pre-warmed (using the `fio` or `dd` commands to read the entire volume)
- Snapshots can be automated using Amazon Data Lifecycle Manager
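A minimal sketch of taking a snapshot and waiting for it to complete (the volume ID and tag value are placeholders; Data Lifecycle Manager would be the managed way to schedule this):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",               # placeholder volume ID
    Description="nightly backup",
    TagSpecifications=[{
        "ResourceType": "snapshot",
        "Tags": [{"Key": "Name", "Value": "nightly-backup"}],
    }],
)

# Snapshots are incremental, but the first one copies every used block,
# so completion can take a while.
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])
print(snapshot["SnapshotId"], "completed")
```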
EBS Migrations
- EBS volumes are locked to a specific AZ
- To migrate a volume to a different AZ or region, we have to do the following:
- Create a snapshot from the volume
- (optional) Copy the snapshot to a different region
- Create a volume from the snapshot in the AZ of choice
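The same three migration steps, sketched with boto3 (the regions, AZ, and volume ID are placeholders):

```python
import boto3

src = boto3.client("ec2", region_name="us-east-1")   # source region (assumed)
dst = boto3.client("ec2", region_name="eu-west-1")   # destination region (assumed)

# 1. Snapshot the source volume
snap = src.create_snapshot(VolumeId="vol-0123456789abcdef0")  # placeholder
src.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. (optional) Copy the snapshot to the destination region
copy = dst.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snap["SnapshotId"],
)
dst.get_waiter("snapshot_completed").wait(SnapshotIds=[copy["SnapshotId"]])

# 3. Create a volume from the copied snapshot in the AZ of choice
volume = dst.create_volume(
    AvailabilityZone="eu-west-1a",
    SnapshotId=copy["SnapshotId"],
)
print(volume["VolumeId"])
```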
EBS Encryption
- When we create an encrypted EBS volume, we get the following:
- Data at rest is encrypted inside the volume
- All the data in flight moving between the instance and the volume is encrypted
- All the snapshots are encrypted
- All volumes created from the snapshots are encrypted
- Encryption and decryption are handled transparently
- Encryption has a minimal impact on latency
- EBS Encryption leverages keys from KMS (AES-256)
- Copying an un-encrypted snapshot allows encryption
- Snapshots of encrypted volumes are also encrypted
Encrypt an Un-encrypted EBS Volume
- Create an EBS snapshot from the volume
- Encrypt the EBS snapshot (using volume copy)
- Create a new EBS volume from the snapshot
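The same three encryption steps as a boto3 sketch (the volume ID, AZ, and KMS key alias are placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # assumed region

# 1. Snapshot the un-encrypted volume
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")  # placeholder
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Copy the snapshot with encryption enabled (the default EBS KMS key is used if none is given)
encrypted = ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snap["SnapshotId"],
    Encrypted=True,
    KmsKeyId="alias/aws/ebs",                         # placeholder key
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[encrypted["SnapshotId"]])

# 3. Create a new, encrypted volume from the encrypted snapshot
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",                    # placeholder AZ
    SnapshotId=encrypted["SnapshotId"],
)
print(volume["VolumeId"])
```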
EBS Volume vs Instance Store
- Some instances do not come with root EBS volumes, instead they come with an instance store
- Instance store = ephemeral storage: data is lost on stop or termination
- An instance store is a drive physically attached to the machine, while EBS is a network drive
- Instance store is block storage just like EBS
- Pros of Instance Store:
- Better I/O performance
- Good for buffer, cache, scratch data and temporary content
- Data on an instance store survives an instance reboot
- Cons:
- On stop or termination of an instance, the instance store is lost
- An instance store can not be resized
- Backups of instance store data must be managed by the user
Local EC2 Instance Store
- Instance store is a physical disk attached to the physical server where the EC2 runs
- Instance store provides very high IOPS
- Instance store disks are up to 7.5 TiB, striped to reach 30 TiB
- It can not be resized
- Risk of data loss if hardware fails
EBS for SysOps
- If we plan to use the root volume of an instance after it is terminated, we have to set the “Delete on Termination” flag to “No”
- If we use EBS for high performance, we should use EBS-optimized instance types
- If an EBS volume is unused, we still pay for it
- For cost saving over a long period of time, we can snapshot a volume, delete it, and restore it from the snapshot later (about 3x cost saving)
- In case of high wait times or slow responses, increase the IOPS
- EC2 won't start with an EBS volume as root: make sure volume names are properly mapped (for example, the root volume ending up as /dev/xvdb instead of /dev/xvda)
- After increasing a volume's size, we still need to repartition the drive to use the additional storage
EBS RAID Configuration
- EBS is already redundant storage (replicated within an AZ)
- RAID is possible as long as OS supports it
- Some RAID options are:
- RAID 0
- RAID 1
- RAID 5, 6 (not recommended for EBS)
RAID 0
- Used to increase performance
- RAID 0 means combining 2 or more volumes and getting the total disk space and I/O
- If one disk fails, all the data is compromised
- Use case for RAID 0:
- An application that needs a lot of IOPS and does not need fault tolerance
- A database which has replication already built in
RAID 1
- Used to increase fault tolerance
- RAID 1 means mirroring a volume to another
- If one disk fails, our logical volume is still working
- We have to send the data to two EBS volumes at the same time, meaning we need EC2 instances with higher network capability
CloudWatch and EBS
- Important EBS volume CloudWatch metrics:
- VolumeIdleTime: number of seconds when no read/write is submitted
- VolumeQueueLength: number of operations waiting to be executed. A high number probably means an IOPS or application issue
- BurstBalance: if it becomes 0, we need a volume with more IOPS
- GP2 volume reporting interval: 5 minutes
- IO1 volume reporting interval: 1 minute
- Reporting intervals are unchangeable
- EBS volumes have status checks:
- OK - the volume is performing well
- Warning - performance is below expected
- Impaired - stalled, performance is severely degraded
- Insufficient data - metric data collection in progress
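A sketch of pulling the BurstBalance metric from CloudWatch for one volume (the volume ID is a placeholder; the 5-minute period matches GP2's reporting interval):

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed region
now = datetime.now(timezone.utc)

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EBS",
    MetricName="BurstBalance",                         # percent of I/O credits remaining
    Dimensions=[{"Name": "VolumeId", "Value": "vol-0123456789abcdef0"}],  # placeholder
    StartTime=now - timedelta(hours=6),
    EndTime=now,
    Period=300,                                        # GP2 volumes report every 5 minutes
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])        # values near 0 mean we need more IOPS
```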
EFS - Elastic File System Overview
- Managed NFS (network file system) that can be mounted on many EC2 instances
- EFS works with EC2 instances in multi-AZ
- Highly available, scalable, expensive (3x GP2), pay per use
- Use cases: content management, web service, data sharing, WordPress
- Uses NFSv4.1 protocol
- Uses security group to control access to EFS
- EFS is compatible with Linux-based AMIs (not Windows)!
- Provides encryption at rest using KMS
- It has a POSIX file system with a standard file API
- EFS Scale:
- Thousands of concurrent NFS clients, 10 GB+/s of throughput
- It can grow to PB scale network file system
- Performance mode (set at EFS creation time):
- General purpose (default): latency-sensitive use cases (web server, CMS, etc.)
- Max I/O: higher latency, throughput, highly parallel (big data, media processing)
- Storage tiers with lifecycle management feature:
- Standard: for frequently accessed files
- Infrequent access (EFS-IA): cost to retrieve files, lower price to store
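A sketch of creating an EFS file system with the options above (the creation token, subnet, and security group IDs are placeholders):

```python
import time
import boto3

efs = boto3.client("efs", region_name="us-east-1")    # assumed region

fs = efs.create_file_system(
    CreationToken="my-efs",                            # idempotency token (placeholder)
    PerformanceMode="generalPurpose",                  # or "maxIO" for highly parallel workloads
    Encrypted=True,                                    # encryption at rest with KMS
)

# Wait until the file system is available before adding mount targets
while efs.describe_file_systems(FileSystemId=fs["FileSystemId"])["FileSystems"][0]["LifeCycleState"] != "available":
    time.sleep(5)

# Move files not accessed for 30 days to the cheaper EFS-IA tier
efs.put_lifecycle_configuration(
    FileSystemId=fs["FileSystemId"],
    LifecyclePolicies=[{"TransitionToIA": "AFTER_30_DAYS"}],
)

# One mount target per AZ, protected by a security group (both placeholders)
efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroups=["sg-0123456789abcdef0"],
)
```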