Amazon S3 allows people to store objects (files) in “buckets” (directories)
Buckets must have a globally unique name
Naming convention:
No uppercase
No underscore
3-63 characters long
Must not be formatted as an IP address
Must start with lowercase letter or number
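The rules above can be sketched as a small validator. This is a minimal illustration in plain Python, not an official AWS check (the real service enforces a few more edge cases, such as rules around consecutive dots):

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Check an S3 bucket name against the naming rules in the notes:
    3-63 chars, no uppercase, no underscore, must start with a lowercase
    letter or number, and must not look like an IP address."""
    if not 3 <= len(name) <= 63:
        return False
    # lowercase letters, digits, hyphens, dots only; first char letter/digit
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]*", name):
        return False
    # must not be formatted like an IPv4 address (e.g. 192.168.0.1)
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", name):
        return False
    return True
```

For example, `is_valid_bucket_name("my-bucket")` passes, while `"My_Bucket"` (uppercase, underscore) and `"192.168.0.1"` (IP format) fail.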
Objects
Objects (files) have a Key. The key is the FULL path:
/my_file.txt
/my_folder/another_folder/my_file.txt
There’s no concept of “directories” within buckets (although the UI will trick you into thinking otherwise)
Just keys with very long names that contain slashes (“/”)
Object Values are the content of the body:
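Since “folders” are just a naming convention, the console’s directory view is pure string manipulation on the key. A minimal sketch of that split (illustrative only, not S3 code):

```python
def split_key(key: str):
    """Split an S3 key into its pseudo-"folder" prefix and object name.
    There are no real directories: this is just splitting on "/"."""
    key = key.lstrip("/")                 # keys don't actually start with "/"
    prefix, _, name = key.rpartition("/")
    return (prefix + "/" if prefix else ""), name
```

For example, `split_key("my_folder/another_folder/my_file.txt")` returns `("my_folder/another_folder/", "my_file.txt")`, while `split_key("my_file.txt")` returns `("", "my_file.txt")`.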
Max Size is 5TB
If uploading more than 5GB, must use “multi-part upload”
Metadata (list of text key / value pairs - system or user metadata)
Tags (Unicode key / value pair - up to 10) - useful for security / lifecycle
Version ID (if versioning is enabled)
AWS S3 - Versioning
It is enabled at the bucket level
Same key overwrite will increment the “version”: 1, 2, 3
It is best practice to version your buckets
Protect against unintended deletes (ability to restore a version)
Easy roll back to previous versions
Any file that is not versioned prior to enabling versioning will have the version “null”
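The versioning behavior above can be illustrated with a toy in-memory model (this is not the AWS API, and real S3 version IDs are opaque strings rather than counters):

```python
class VersionedBucket:
    """Toy model of S3 versioning: overwrites of the same key add versions,
    and objects written before versioning is enabled get version "null"."""

    def __init__(self):
        self.objects = {}        # key -> list of (version_id, body)
        self.versioning = False
        self._counter = 0

    def put(self, key, body):
        if not self.versioning:
            # without versioning, the object simply has version "null"
            self.objects[key] = [("null", body)]
            return "null"
        self._counter += 1
        vid = str(self._counter)
        self.objects.setdefault(key, []).append((vid, body))
        return vid

    def get(self, key):
        return self.objects[key][-1][1]   # latest version wins

b = VersionedBucket()
b.put("file.txt", "v0")    # version "null": written before versioning
b.versioning = True
b.put("file.txt", "v1")    # version "1"
b.put("file.txt", "v2")    # version "2"
```

After the three puts, `b.get("file.txt")` returns the newest body, and the version history keeps the pre-versioning object under “null”.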
S3 Encryption for Objects
There are 4 methods of encrypting objects in S3
SSE-S3: encrypts S3 objects
Encryption using keys handled & managed by AWS S3
Object is encrypted server side
AES-256 encryption type
Must set header: “x-amz-server-side-encryption”:”AES256”
SSE-KMS: encryption using keys handled & managed by KMS
KMS Advantages: user control + audit trail
Object is encrypted server side
Maintain control of the rotation policy for the encryption keys
Must set header: “x-amz-server-side-encryption”:”aws:kms”
SSE-C: server-side encryption using data keys fully managed by the customer outside of AWS
Amazon S3 does not store the encryption key you provide
HTTPS must be used
Encryption key must be provided in HTTP headers, for every HTTP request made
Client Side Encryption
Client library such as the Amazon S3 Encryption Client
Clients must encrypt data themselves before sending to S3
Clients must decrypt data themselves when retrieving from S3
Customer fully manages the keys and encryption cycle
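The server-side options differ mainly in which request headers you send. A sketch that builds the headers for each mode (the SSE-S3 and SSE-KMS header names come from the notes above; the SSE-C header names and base64/MD5 key encoding follow the S3 REST API, but treat the exact spellings as an assumption to verify):

```python
import base64
import hashlib

def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Build PUT-object headers for each server-side encryption mode."""
    if mode == "SSE-S3":
        # keys handled & managed by S3, AES-256
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        # keys handled & managed by KMS (user control + audit trail)
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        # the key travels in headers on EVERY request, so HTTPS is mandatory;
        # S3 does not store the key, only uses it to encrypt/decrypt
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode(),
        }
    raise ValueError(f"unknown mode: {mode}")
```

Client-side encryption needs no special headers: the object body is already ciphertext before it leaves the client.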
Encryption in transit (SSL)
AWS S3 exposes:
HTTP endpoint: non encrypted
HTTPS endpoint: encryption in flight
You’re free to use the endpoint you want, but HTTPS is recommended
HTTPS is mandatory for SSE-C
Encryption in flight is also called SSL / TLS
S3 Security
User based
IAM policies - which API calls should be allowed for a specific user from IAM console
Resource based
Bucket policies - bucket wide rules from the S3 console - allows cross account
Object Access Control List (ACL) - finer grain
Bucket Access Control List (ACL) - less common
Networking
Supports VPC endpoints (for instances in a VPC without internet access)
Logging and Audit:
S3 access logs can be stored in other S3 buckets
API calls can be logged in AWS CloudTrail
User Security:
MFA (multi factor authentication) can be required in versioned buckets to delete objects
Signed URLs: URLS that are valid only for a limited time (ex: premium video services for logged in users)
S3 Bucket Policies
JSON based policies
Resources: buckets and objects
Actions: Set of API to Allow or Deny
Effect: Allow / Deny
Principal: The account or user to apply the policy to
Use S3 bucket for policy to:
Grant public access to the bucket
Force objects to be encrypted at upload
Grant access to another account (Cross Account)
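Putting the elements together (Effect, Principal, Action, Resource), a policy that forces encryption at upload can be sketched as follows. The bucket name is hypothetical, and the statement shape follows the standard IAM policy JSON format: it denies any `PutObject` that does not carry the SSE-S3 header:

```python
import json

# hypothetical bucket name, for illustration only
BUCKET = "examplebucket"

# deny any PutObject request that does not ask for SSE-S3 encryption
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
        "Condition": {
            "StringNotEquals": {
                "s3:x-amz-server-side-encryption": "AES256"
            }
        }
    }]
}

print(json.dumps(policy, indent=2))
```

Because an explicit Deny always wins, any upload without the `x-amz-server-side-encryption: AES256` header is rejected, regardless of other Allow statements.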
S3 Websites
S3 can host static websites and have them accessible on the world wide web
The website URL will be:
<bucket-name>.s3-website-<AWS-region>.amazonaws.com
OR
<bucket-name>.s3-website.<AWS-region>.amazonaws.com
If you get a 403 (forbidden) error, make sure the bucket policy allows public reads!
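A tiny helper showing how the two endpoint formats are assembled (bucket name and region here are illustrative; which separator a region uses, dash or dot, depends on the region):

```python
def website_endpoint(bucket: str, region: str, dash_style: bool = True) -> str:
    """Build an S3 static-website URL.
    Two formats exist: s3-website-<region> (dash) or s3-website.<region> (dot)."""
    sep = "-" if dash_style else "."
    return f"http://{bucket}.s3-website{sep}{region}.amazonaws.com"
```

For example, `website_endpoint("my-bucket", "eu-west-1")` gives `http://my-bucket.s3-website-eu-west-1.amazonaws.com`.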
S3 CORS
If a web page requests data from an S3 bucket on a different origin, CORS must be enabled on that bucket
Cross Origin Resource Sharing allows you to limit which websites can request your files in S3 (and limit your costs)
This is a popular exam question
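A CORS configuration is itself a small JSON document attached to the bucket. A sketch of one rule (the allowed origin is a hypothetical example; the rule shape follows the S3 CORS configuration format):

```python
import json

# allow GETs only from one hypothetical origin, for one hour of preflight caching
cors_config = {
    "CORSRules": [{
        "AllowedOrigins": ["https://www.example.com"],
        "AllowedMethods": ["GET"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": 3000
    }]
}

print(json.dumps(cors_config, indent=2))
```

With this rule in place, browsers on `https://www.example.com` can fetch objects cross-origin; requests from other origins fail the CORS check.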
AWS S3 - Consistency Model
Read after write consistency for PUTS of new objects
As soon as an object is written, we can retrieve it, ex: (PUT 200 -> GET 200)
This is true, except if we did a GET before, to see if the object existed, ex: (GET 404 -> PUT 200 -> GET 404) - eventually consistent
Eventual Consistency for DELETES and PUTS of existing objects
If we read an object after updating it, we might get the older version, ex: (PUT 200 -> PUT 200 -> GET 200 (might be older version))
If we delete an object, we might still be able to retrieve it for a short time, ex: (DELETE 200 -> GET 200)
AWS S3 - Other
S3 can send notifications on changes to
AWS SQS: queue service
AWS SNS: notification service
AWS Lambda: serverless service
S3 has a cross region replication feature (managed)
AWS S3 Performance
For faster upload of large objects, use multipart upload (mandatory for objects >5GB)
Parallelizes PUTs for greater throughput
Maximize your network bandwidth
Decrease time to retry in case a part fails
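The multipart split is simple arithmetic. A sketch with a 100 MB part size chosen as an example (S3 constrains parts to 5 MB-5 GB, except the last part, with at most 10,000 parts per upload):

```python
def plan_multipart(size_bytes: int, part_size: int = 100 * 1024 * 1024):
    """Plan a multipart upload: how many parts, and how big the last one is.
    part_size here defaults to 100 MB as an example value."""
    parts = -(-size_bytes // part_size)          # ceiling division
    last_part = size_bytes - (parts - 1) * part_size
    return parts, last_part
```

For a 6 GB object with 100 MB parts this gives 62 parts, the last one 44 MB; each part can then be PUT in parallel and retried independently if it fails.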
Use CloudFront to cache S3 objects around the world (improves reads)
S3 Transfer Acceleration (uses edge locations) - just need to change the endpoint you write to, not the code
If using SSE-KMS encryption, you may be throttled by your account’s KMS request limits (~100s - 1000s downloads / uploads per second)