Replication uses MD5 checksums together with CRCs to detect object integrity issues
When objects are stored in S3 using the API, an HTTP 200 OK response is returned once the data has been durably stored
Billing:
GB/month of data stored in S3
A per-GB charge for data transferred out (transfer in is free)
Price per 1000 requests
No specific retrieval fee, no minimum duration, no minimum size
S3 standard makes data accessible immediately, can be used for static website hosting
Should be used for frequently accessed data
S3 Standard-IA:
Shares most of the characteristics of S3 standard: objects are replicated in 3 AZs, durability is the same, availability is the same, first byte latency is the same, objects can be made publicly available
Billing:
Storage is cheaper per GB than S3 Standard, making it more cost effective for storing data
Data transfer fee is the same as S3 standard
Retrieval fee: for every GB of data there is a retrieval fee, overall cost may increase with frequent data access
Minimum duration charge: we are billed for a minimum of 30 days, with a minimum billable object size of 128KB (smaller objects are billed as 128KB)
Should be used for long lived data where data access is infrequent
S3 One Zone-IA:
Similar to S3 standard, but cheaper. Also cheaper than S3 standard IA
Data stored using this class is only stored in one Availability Zone (AZ)
Billing:
Similar to S3 standard IA: similar minimum duration fee of 30 days, similar billing for smaller objects and also similar retrieval fee per GB
Same level of durability (if the AZ does not fail)
Data is replicated inside one AZ
Since data is not replicated between AZs, this storage class is not HA. It should be used for non-critical data or for data that can be reproduced easily
S3 Glacier Instant Retrieval:
It is like S3 Standard-IA, but with cheaper storage, more expensive retrieval, and longer minimums
Recommended for data that is infrequently accessed (once per quarter), but it still needs to be retrieved instantly
Minimum storage duration charge is 90 days
S3 Glacier Flexible Retrieval (formerly known as S3 Glacier):
Same data replication as S3 standard and S3 standard IA
Same durability characteristics
Storage cost is about 1/6 of S3 standard
S3 objects stored in Glacier should be considered cold objects (should not be accessed frequently)
Objects in the Glacier class should be thought of as pointers to the real data: they are not immediately accessible and can not be made public
In order to retrieve them, we have to perform a retrieval process:
A job that needs to be done to get access to objects
Retrieval jobs are billed
When objects are retrieved from Glacier, a temporary copy is stored in the S3 Standard-IA class and removed after a set period; the original stays in Glacier. To keep a copy permanently, we can copy it to another storage class
Retrieval job types:
Expedited: objects are retrieved in 1-5 minutes, this is the most expensive retrieval option
Standard: data is accessible in 3-5 hours
Bulk: data is accessible in 5-12 hours at the lowest cost
Glacier Flexible Retrieval has a 40KB minimum billable object size and a 90-day minimum storage duration
Glacier should be used for data archival (yearly access), where data can be retrieved in minutes to hours
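As a sketch of the retrieval process, a restore job could be started with boto3 (the AWS SDK for Python); the bucket, key, number of days and tier below are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Start a retrieval job for an archived object; the restored temporary copy
# stays available for the requested number of days.
s3.restore_object(
    Bucket="my-archive-bucket",      # hypothetical bucket
    Key="backups/2023/archive.tar",  # hypothetical key
    RestoreRequest={
        "Days": 7,  # how long the temporary copy is kept
        "GlacierJobParameters": {"Tier": "Standard"},  # or "Expedited" / "Bulk"
    },
)

# head_object can then be polled; its "Restore" field shows whether the
# restore is still in progress or when the temporary copy expires.
```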
S3 Glacier Deep Archive:
Deep Archive represents data in a frozen state
Has a 40KB minimum billable object size and a 180-day minimum storage duration
Objects can not be made publicly available, data access is similar to standard Glacier class
Restore jobs are longer:
Standard: up to 12 hours
Bulk: up to 48 hours
Should be used for archival which is very rarely accessed
S3 Intelligent-Tiering:
It is a storage class that contains 5 different storage tiers
Objects that are accessed frequently are stored in the Frequent Access tier, less frequently accessed objects are stored in the Infrequent Access tier. Objects accessed very infrequently are stored in either the Archive or the Deep Archive tier
We don’t have to worry about moving objects between tiers; the storage class does this automatically
Intelligent-Tiering can be configured; archiving data is optional and can be enabled/disabled
There is no retrieval cost for moving data between the frequent and infrequent tiers; instead we are billed a monitoring and automation charge per 1000 objects
S3 Intelligent-Tiering is recommended for unknown or uncertain data access usage
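As a sketch, the optional archive tiers can be enabled with a bucket-level Intelligent-Tiering configuration via boto3 (bucket name, configuration ID and day thresholds are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Enable the optional Archive Access and Deep Archive Access tiers for
# objects that have not been accessed for 90 / 180 days.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="my-example-bucket",
    Id="archive-config",
    IntelligentTieringConfiguration={
        "Id": "archive-config",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```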
Storage classes comparison:

| | S3 Standard | S3 Intelligent-Tiering | S3 Standard-IA | S3 One Zone-IA | S3 Glacier Instant | S3 Glacier Flexible | S3 Glacier Deep Archive |
|---|---|---|---|---|---|---|---|
| Designed for durability | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) |
| Designed for availability | 99.99% | 99.9% | 99.9% | 99.5% | 99.9% | 99.99% | 99.99% |
| Availability SLA | 99.9% | 99% | 99% | 99% | 99% | 99.9% | 99.9% |
| Availability Zones | ≥3 | ≥3 | ≥3 | 1 | ≥3 | ≥3 | ≥3 |
| Minimum capacity charge per object | N/A | N/A | 128KB | 128KB | 128KB | 40KB | 40KB |
| Minimum storage duration charge | N/A | 30 days | 30 days | 30 days | 90 days | 90 days | 180 days |
| Retrieval fee | N/A | N/A | per GB retrieved | per GB retrieved | per GB retrieved | per GB retrieved | per GB retrieved |
| First byte latency | milliseconds | milliseconds | milliseconds | milliseconds | milliseconds | select minutes or hours | select hours |
| Storage type | Object | Object | Object | Object | Object | Object | Object |
| Lifecycle transitions | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
S3 Lifecycle Configuration
We can create lifecycle rules on S3 buckets which can move objects between tiers or expire objects automatically
A lifecycle configuration is a set of rules applied to a bucket or a group of objects in a bucket
Rules consist of actions:
Transition actions: move objects from one tier to another after a certain time
Expiration actions: delete objects or versions of objects
Lifecycle rules can not move objects based on how often they are accessed (that is what Intelligent-Tiering does); they move objects based on time elapsed
By moving objects from one tier to another we can save costs; expiring objects also helps save costs
Transitions between tiers:
Considerations:
Objects in One Zone-IA can transition to Glacier Flexible Retrieval or Deep Archive, NOT to Glacier Instant Retrieval
Smaller objects cost more in Standard-IA, One Zone-IA, etc.
An object needs to remain for at least 30 days in standard tier before being able to be moved to infrequent tiers (objects can be uploaded manually in infrequent tiers)
A single rule can not move objects from Standard into the infrequent access tiers and then immediately on to the Glacier tiers: objects have to stay at least 30 days in the infrequent access tiers before the same rule can transition them further. To overcome this, we can define 2 separate rules (see the sketch below)
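A minimal boto3 sketch of a lifecycle rule that transitions and then expires objects (bucket name, prefix and day counts are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under logs/ to Standard-IA after 30 days, to Glacier
# Flexible Retrieval after 90 days, and expire them after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-and-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```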
S3 Replication
2 types of replication are supported by S3:
Cross-Region Replication (CRR)
Same-Region Replication (SRR)
Both types of replication support same account replication and cross-account replication
If we configure cross-account replication, we have to add a bucket policy on the destination bucket allowing the replication role from the source account to write to it
We can replicate all objects from a bucket or create rules for a subset of objects. Objects to replicate can be filtered by prefix, tags, or both
We can specify which storage class to use for an object in the destination bucket (default: use the same class)
We can also define the ownership of the objects in the destination bucket. By default it will be the same as the owner in the source bucket
Replication Time Control (RTC): if enabled, ensures that new objects are replicated within 15 minutes (SLA-backed)
Replication considerations:
By default replication is not retroactive: only newer objects are replicated after the replication is enabled
Versioning needs to be enabled for both source and destination buckets
Batch replication can be used to replicate existing objects. It needs to be specifically configured; if it is not, replication won’t be retroactive
Replication is one-way by default, source => destination. There is an option to use bi-directional replication, but it has to be explicitly configured
Replication is capable of handling objects encrypted with SSE-S3 and SSE-KMS (with extra configuration). SSE-C (customer managed keys) is also supported, historically it was incompatible
Replication requires the owner of the source bucket to have permissions on the objects which will be replicated
System events will not be replicated, only user events
Any objects in the Glacier and Glacier Deep Archive will not be replicated
By default, deletions are not replicated. We can enable replication of delete markers
Replication use cases:
SRR:
Log aggregation
PROD and Test sync
Resilience with strict sovereignty
CRR:
Global resilience improvements
Latency reduction
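A minimal sketch of enabling a replication rule with boto3, assuming hypothetical bucket names and an existing IAM replication role (versioning must already be enabled on both buckets):

```python
import boto3

s3 = boto3.client("s3")

# Replicate objects under logs/ from a source bucket to a destination bucket,
# storing the replicas in Standard-IA.
s3.put_bucket_replication(
    Bucket="source-bucket-example",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-logs",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": "logs/"},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::destination-bucket-example",
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    },
)
```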
S3 Encryption
Buckets aren’t encrypted, objects inside buckets are encrypted
Encryption at rest types:
Client-Side encryption: data is encrypted before it leaves the client
Server-Side encryption: data is encrypted on the server side; it is sent in plaintext format from the client (over HTTPS)
Both encryption types use encryption in-transit for communication
Server-side encryption is mandatory; we cannot store data in S3 unencrypted
There are 3 types of server-side encryption supported:
SSE-C: server-side encryption with customer-provided keys
The customer is responsible for managing the keys, S3 manages the encryption/decryption
When an object is put into S3, we need to provide the key utilized
The object is encrypted with the key, and a hash of the key is generated and stored
The key will be discarded after the encryption is done
In case of object retrieval, we need to provide the key again
SSE-S3 (default): server-side encryption with Amazon S3-managed keys
AWS handles both the encryption/decryption and the key management
When using this method, S3 creates a master key for the encryption process (handled entirely by S3)
When an object is uploaded, a unique key is used to encrypt it. After encryption, this unique key is itself encrypted with the master key and the plaintext key is discarded. The encrypted key and the object are stored together
For most situations this is the default type of encryption. It uses the AES-256 algorithm, and the key management is handled entirely by S3
SSE-KMS: Server-side encryption with customer-managed keys stored in AWS Key Management Service (KMS)
Similar to SSE-S3, but for this method the KMS handles stored keys
When an object is uploaded for the first time, S3 communicates with KMS and creates a customer master key (CMK). This is the default master key used from then on
When new objects are uploaded AWS uses the CMK to generate individual keys for encryption (data encryption keys). The data encryption key will be stored along with the object in encrypted format
We don’t have to use the default CMK provided by AWS, we can use our own CMK. We can control the permissions on it and how it is rotated
SSE-KMS provides role separation:
We can specify who can access the CMK from KMS
Administrators can administer buckets but may not have access to the KMS keys
Default Bucket Encryption:
When an object is uploaded, we can specify which server-side encryption to be used by adding a header to the request: x-amz-server-side-encryption
Values for the header:
To use SSE-S3: AES256
To use SSE-KMS: aws:kms
All Amazon S3 buckets have encryption configured by default, and all new objects that are uploaded to an S3 bucket are automatically encrypted at rest
Server-side encryption with Amazon S3 managed keys (SSE-S3) is the default encryption configuration for every bucket in Amazon S3, this can be overridden in a PUT request with x-amz-server-side-encryption header
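As a sketch with boto3 (bucket, key and KMS alias are hypothetical), the SDK sets the x-amz-server-side-encryption header from the ServerSideEncryption parameter:

```python
import boto3

s3 = boto3.client("s3")

# Upload an object with SSE-KMS explicitly requested; omit the parameter to
# fall back to the bucket's default encryption (SSE-S3 unless overridden).
s3.put_object(
    Bucket="my-example-bucket",
    Key="reports/report.csv",
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="aws:kms",   # or "AES256" for SSE-S3
    # SSEKMSKeyId="alias/my-cmk",     # optional: use our own KMS key
)
```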
S3 Bucket Keys
Each object in a bucket uses a unique data encryption key (DEK)
AWS uses the bucket’s KMS key to generate this data-encryption key
Calls to KMS have a cost and are throttled above certain levels: 5,500/10,000/50,000 requests per second depending on the region
Bucket keys:
A time limited bucket key is used to generate DEKs within S3
KMS generates a bucket key and gives it to S3 to use to generate DEKs for each upload, offloading the load from KMS to S3
Reduces the number of KMS API calls => reduces the costs/increases scalability
Using bucket keys is not retroactive, it will only affect objects after bucket keys are enabled
Things to keep in mind after enabling bucket keys:
CloudTrail KMS event logs will show the bucket ARN instead of the object ARN
Fewer CloudTrail events of KMS will be in the logs (since work is offloaded to S3)
Bucket keys work with SRR and CRR; the object encryption settings are maintained
If we replicate plaintext to a bucket using bucket keys, the object is encrypted at the destination side; this can result in ETag changes on the object
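A boto3 sketch of setting SSE-KMS with an S3 Bucket Key as the bucket default (bucket name and KMS alias are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Default encryption: SSE-KMS with a bucket key, reducing per-object KMS calls.
s3.put_bucket_encryption(
    Bucket="my-example-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    # "KMSMasterKeyID": "alias/my-cmk",  # optional custom key
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)
```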
S3 Presigned URLs
A presigned URL is a way to give other people access to objects in our buckets using our credentials
An IAM admin can generate a presigned URL for a specific object using his credentials. This URL will have an expiry date
The presigned URL can be given to unauthenticated users in order to access the object
The user will interact with S3 using the presigned URL as if it was the person who generated the presigned URL
Presigned URLs can be used for downloads and for uploads
Presigned URLs can be used to give an application's users direct access to private files, offloading the transfer from the application. This approach requires a service account (identity) for the application which generates the presigned URLs
Presigned URL considerations:
We can create a presigned URL for objects we don’t have access to
When using the URL, the permissions match the identity which generated it. The permissions are evaluated at the moment the object is accessed (it might happen that the identity has had its permissions revoked, meaning the URL no longer grants access either)
We should not generate presigned URLs using temporary credentials (e.g. from assuming an IAM role): when the temporary credentials expire, the presigned URL stops working as well. It is recommended to use long-term identities such as an IAM user
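A boto3 sketch of generating a presigned URL (bucket, key and expiry are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Generate a download (GET) URL for a private object, valid for 1 hour.
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "my-example-bucket", "Key": "private/report.pdf"},
    ExpiresIn=3600,  # seconds
)
print(url)

# An upload (PUT) URL works the same way with ClientMethod="put_object".
```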
S3 Select and Glacier Select
Are ways to retrieve parts of objects instead of entire objects
S3 can store huge objects (up to 5 TB)
Retrieving a huge object takes time and consumes transfer capacity
S3/Glacier provides services to access partial objects using SQL-like statements to select parts of objects
Both S3 Select and Glacier Select support the following formats: CSV, JSON, and Parquet, with BZIP2 compression for CSV and JSON
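A boto3 sketch of S3 Select against a hypothetical CSV object, streaming back only the matching rows instead of the whole object:

```python
import boto3

s3 = boto3.client("s3")

# Run a SQL-like expression against a CSV object with a header row.
response = s3.select_object_content(
    Bucket="my-example-bucket",
    Key="data/sales.csv",
    ExpressionType="SQL",
    Expression="SELECT s.product, s.amount FROM S3Object s WHERE s.amount > '100'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The result comes back as an event stream; Records events carry the data.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```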
S3 Access Points
Improves the manageability of objects when buckets are used for many different teams or they contain objects for a large amount of functions
Access Points simplify the process of managing access to S3 buckets/objects
Rather than managing access with 1 bucket policy on 1 bucket, we can create many access points, each with a different policy
Each access point can be limited from where it can be accessed, and each can have different network access controls
Each access point has its own endpoint address
We can create an access point using the console or the CLI: aws s3control create-access-point --name <name> --account-id <account-id> --bucket <bucket-name>
Any permission defined on the access point also needs to be allowed by the bucket policy. Alternatively, we can delegate: the bucket policy grants wide access to the access points, and the granular permissions are defined in each access point policy (see the sketch below)
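A sketch of the delegation pattern using boto3: a bucket policy that trusts requests arriving through any access point owned by a (hypothetical) account, via the s3:DataAccessPointAccount condition key:

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # hypothetical bucket

# Delegate access control to access points: allow actions on the bucket only
# when the request comes through an access point owned by this account.
delegation_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            "Condition": {
                "StringEquals": {"s3:DataAccessPointAccount": "111122223333"}
            },
        }
    ],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(delegation_policy))
```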
S3 Block Public Access
The Amazon S3 Block Public Access feature provides settings for access points, buckets, and accounts to help manage public access to Amazon S3 resources
The settings we can configure with the Block Public Access Feature are:
BlockPublicAcls: prevents any new ACLs being created, or existing ACLs being modified, which would grant public access to the bucket or its objects. With this setting alone, existing public ACLs are not affected
IgnorePublicAcls: any existing (or new) public ACLs on the bucket and its objects are ignored. This does not prevent them being created, but it negates their effect
BlockPublicPolicy: prevents a bucket policy which grants public access from being created or modified on an S3 bucket; an existing public bucket policy stays in effect
RestrictPublicBuckets: restricts access to a bucket with a public policy to AWS service principals and authorized users within the bucket owner's account, preventing public and cross-account access to objects in the bucket
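A boto3 sketch of turning on all four Block Public Access settings for a hypothetical bucket:

```python
import boto3

s3 = boto3.client("s3")

# Enable every Block Public Access setting on the bucket.
s3.put_public_access_block(
    Bucket="my-example-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```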
S3 Cost Saving Options
S3 Select and Glacier Select: save network and CPU cost by retrieving only the necessary data
S3 Lifecycle Rules: transition objects between tiers
Compress objects to save space
S3 Requester Pays:
In general, bucket owners pay for all Amazon S3 storage and data transfer costs associated with their bucket
With Requester Pays buckets, the requester instead of the bucket owner pays the cost of the request and the data download from the bucket
The bucket owner always pays the cost of storing data
Helpful when we want to share large datasets with other accounts
Requires a bucket policy
If an IAM role is assumed, the owner account of that role pays for the request!
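A boto3 sketch of enabling Requester Pays and making a request against it (bucket and key are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Enable Requester Pays on the bucket.
s3.put_bucket_request_payment(
    Bucket="my-example-bucket",
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# Requesters must then acknowledge the charge on each request.
s3.get_object(
    Bucket="my-example-bucket",
    Key="datasets/large-file.bin",
    RequestPayer="requester",
)
```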
S3 Object Lock
Object Lock can be enabled on newly created S3 buckets. For existing buckets, we have to contact AWS Support to enable Object Lock
Versioning will also be enabled when Object Lock is enabled
Object Lock can not be disabled, versioning can not be suspended when Object Lock is active on the bucket
Object Lock is a Write-Once-Read-Many (WORM) architecture: once an object version is written, it cannot be modified or deleted. Individual versions of objects are locked
There are 2 ways S3 manages object retention:
Retention Period
Legal Hold
Object versions can have both a retention period and a legal hold, only one of them, or neither
Object Lock retentions can be individually defined on object versions, a bucket can have default Object Lock settings
Retention Period
When a retention period is enabled on an object, we specify the days and years for the period
The lock is released when the retention period ends
There are 2 types of retention period modes:
Compliance mode:
Object can not be adjusted, deleted or overwritten. The retention period can not be reduced, the retention mode can not be adjusted even by the account root user
Should be used for compliance reasons
Governance mode:
Objects can not be adjusted, deleted or overwritten, but special permissions can be added to some identities to allow for the lock setting to be adjusted
These identities should have the s3:BypassGovernanceRetention permission
Governance mode can be bypassed by passing the x-amz-bypass-governance-retention:true header (the console UI sends this header by default)
Legal Hold
We don’t set a retention period for this type of retention, Legal Hold can be on or off for specific versions of an object
We can’t delete or overwrite an object with Legal Hold
An extra permission is required when we want to add or remove the Legal Hold on an object: s3:PutObjectLegalHold
Legal Hold can be used for preventing accidental removals
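A boto3 sketch of applying a governance-mode retention period and a legal hold to a specific object version (bucket, key, version ID and date are hypothetical; Object Lock must be enabled on the bucket):

```python
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

# Lock an object version in governance mode until a chosen date.
s3.put_object_retention(
    Bucket="my-locked-bucket",
    Key="records/invoice.pdf",
    VersionId="EXAMPLE-VERSION-ID",  # placeholder version ID
    Retention={
        "Mode": "GOVERNANCE",  # or "COMPLIANCE"
        "RetainUntilDate": datetime(2026, 1, 1, tzinfo=timezone.utc),
    },
)

# Place a legal hold on the same version (requires s3:PutObjectLegalHold).
s3.put_object_legal_hold(
    Bucket="my-locked-bucket",
    Key="records/invoice.pdf",
    VersionId="EXAMPLE-VERSION-ID",
    LegalHold={"Status": "ON"},
)
```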
S3 Transfer Acceleration
Used to transfer files into S3. Enables fast, easy, and secure transfers of files over long distances between our client and an S3 bucket
Takes advantage of the globally distributed edge locations in Amazon CloudFront
We might want to use Transfer Acceleration on a bucket for various reasons:
We upload to a centralized bucket from all over the world
We transfer gigabytes to terabytes of data on a regular basis across continents
We can’t use all of our available bandwidth over the internet when uploading to Amazon S3
To use Transfer Acceleration, it must be enabled on the bucket. After we enable Transfer Acceleration on a bucket, it might take up to 20 minutes before the data transfer speed to the bucket increases
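A boto3 sketch of enabling Transfer Acceleration and uploading through the accelerate endpoint (bucket and file names are hypothetical):

```python
import boto3
from botocore.config import Config

# Enable Transfer Acceleration on the bucket.
s3 = boto3.client("s3")
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Create a client that uses the accelerate endpoint
# (<bucket>.s3-accelerate.amazonaws.com) for transfers.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("local-file.bin", "my-example-bucket", "uploads/local-file.bin")
```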
S3 Object Lambda
With Amazon S3 Object Lambda, we can add our own code to Amazon S3 GET, LIST, and HEAD requests to modify and process data as it is returned to an application
S3 Object Lambda uses AWS Lambda functions to automatically process the output of standard S3 GET, LIST, or HEAD requests
After we configure a Lambda function, we attach it to an S3 Object Lambda service endpoint, known as an Object Lambda Access Point
The Object Lambda Access Point uses a standard S3 access point
When we send a request to our Object Lambda Access Point, Amazon S3 automatically calls our Lambda function
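A minimal sketch of an Object Lambda handler for GET requests, assuming the standard event shape (getObjectContext with inputS3Url, outputRoute and outputToken) and an uppercase transformation as a placeholder:

```python
import urllib.request

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # S3 provides a presigned URL to fetch the original object.
    ctx = event["getObjectContext"]
    original = urllib.request.urlopen(ctx["inputS3Url"]).read()

    transformed = original.upper()  # placeholder transformation

    # Return the transformed data to the caller of the GET request.
    s3.write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=transformed,
    )
    return {"statusCode": 200}
```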
Hosting Static Site on S3
We can host a static site on S3
To host a static site in a bucket we must enable static website hosting, configure an index document, and set permissions
We should also make a bucket content public:
We should turn off the Block Public Access settings
We should attach a bucket policy which allows public read on the objects
Amazon S3 website endpoints do not support HTTPS! If we want to use HTTPS, we can use Amazon CloudFront to serve a static website hosted on Amazon S3
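A boto3 sketch of enabling website hosting and allowing public reads (bucket name and document keys are hypothetical; Block Public Access must be disabled first):

```python
import json

import boto3

s3 = boto3.client("s3")
bucket = "my-site-bucket"  # hypothetical bucket

# Enable static website hosting with an index and error document.
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# Bucket policy allowing public read of the objects.
public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(public_read_policy))
```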