RDS is often described as a Database-as-a-service (DBaaS) but this is not accurate. It should be named Database Server as a Service (DBSaaS) product
RDS provides managed database instances, which can themselves hold one or more databases
Benefits of RDS are the we don’t need to manage the physical hardware, the server operating system or the database system itself
RDS supports MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server
Amazon Aurora: it is a db engine created by AWS and we can select it as well for usage
RDS Subnet Group: list of subnets which an RDS database can use. Generally it is best practice to have on Subnet Group per database deployment
RDS Database Instance
Runs one of the few types of db engine mentioned above
Can contain multiple user created databases
A database instance after creation can be accessed using its hostname (CNAME)
RDS instances come in various types, share many of features of EC2. Example of instances: db.m5, db.r5, db.t3
RDS instances can be single AZ or multi AZ (active-passive failover)
When an instance is provisioned, it will have a dedicated storage allocated as well (usually EBS)
Storage allocated can be based on SSD storage (IO1, GP2) or magnetic (mainly for compatibility)
Billing for RDS:
We are billed based on instance size on a hourly rate
We are billed for additional instances used for Multi AZ deployments
We are also billed per storage (GB/month) + extra per iops in case of provisioned iops (IO1)
Data transfer is also billed if data is coming from/goes to the internet/other regions
Backups and snapshots are also billed (per GB per month)
Licensing is applicable is also billed
RDS Multi AZ
They are 2 types of Multi AZ deployments:
Multi AZ Instance (historically called Multi AZ)
Multi AZ Cluster
Multi AZ Instance:
Used to add resilience to an RDS instance
Replication happens at the storage level
Enables synchronous replication between primary and standby instances
Multi AZ is an option which can be enabled on an RDS instance, when enabled secondary hardware is allocated in another AZ (standby replica)
RDS is accessed via provided endpoint address (CNAME)
With a single instance the endpoint address points the instance itself, with multi AZ, by default the endpoint points to the primary instance
We can not directly access the standby instance
If an error occurs with the primary instance, RDS automatically changes the endpoint to point to the standby replica. This failover occurs in around 60-120 seconds
Multi AZ is not available in the Free-tier (generally costs double as it would the single AZ)
Backups are taken from the standby instance (removes performance impact)
In case of a failover the DNS name will be updated to point to the standby replica instance. Since this is a DNS change, for the update it generally takes between 60-120 seconds to occur. This can be lessened by removing DNS caching in the application
With Multi AZ instance we have ONE standby replica. This replica cannot be used for read and writes. It waits for a failover to happen and then it can be used
Backups can be taken from the standby instance to improve performance
Failovers can happen if:
AZ outage
Primary instance failure
Manual failover
Instance type change
Software patching
Multi AZ Cluster:
RDS is capable of having one writer replicate to two reader instances. We can have 2 readers only!
These readers are in different AZs compared to the writer
Compared two Aurora cluster mode, Multi AZ cluster can have 2 readers only, while Aurora Cluster can have more
In case of Multi AZ cluster the instances to which data is replicated are usable, compared to Multi AZ instance mode when they are not
In terms of replication the data is viewed as committed when one of the readers confirms that it was written
Other comparisons two Aurora Cluster:
In RDS multi AZ cluster each instance has its own storage, in case of Aurora this is not the case
Like Aurora, the cluster can be accessed with multiple endpoints:
Cluster endpoint: database CNAME, points to the writer, can be used for reads/writes and administration
Reader endpoint: points to any available endpoint for reads (it can point to the writer instance in certain cases). Generally it points to the dedicated reader instances
Instance endpoints: each instance gets one endpoint, generally not recommended to be used
Generally Multi AZ Cluster runs on faster hardware: Graviton + local NVME SSD storage. Any writes are written to local super fast storage, after that they are flushed to the EBS
Replications are done via transaction logs => much more efficient then Multi AZ instance. This also allows faster failover: ~35 seconds + any time required to apply the transaction logs
RDS Backups and Restores
RPO (Recovery Point Objective): time between the last working backup and the failure. Lower the RPO value, usually the more expensive the solution
RTO (Recovery Time Objective): time between the failure and system being fully recovered. Can be reduced with spare hardware, predefined processes, etc. Lower the RTO value, the system is usually more expensive
RDS backup types:
Manual snapshots:
Have to be run manually, or via a script
First snapshot is full content of the DB, incremental onward
When any snapshot occurs, there is brief interruption in the flowing of data between the compute resource and the storage (no noticeable effect in case of Multi AZ, since the backup is taken from the standby instance)
Manual snapshots do not expire
When we delete an RDS instance, AWS offers to make one final snapshot
Automatic backups:
They occur once per day (backup window is defined on the instance)
Snapshots which occur automatically, first being full snapshot, incremental afterwards
In addition to the automated snapshots, every 5 minute transaction logs are written to S3
Automatic backups are not retained, we can set the retention period between 0 and 35 days
Automatic backups can be retained after a DB is deleted, but they still expire after the retention period
We can replicate backups to another region: both snapshots and transaction logs can be replicated. Charges apply to cross-region data copy and any storage used in the destination region
Cross-region replication has to be explicitly configured within automated backups
Backups are stored in AWS manages S3 buckets (backups are not visible to us directly in S3) => any data in S3 is regionally resilient
RDS backups are taken from the standby instance in case Multi AZ is enabled
RDS Restores:
RDS creates a new RDS instance when we restore an automated backup or a manual snapshot => new address will be created for the DB
When we restore a snapshot, we restore our DB to a single point in time, when the creation time of the snapshots happened
With automated backups we can chose a point-in-time to where we want to restore (any 5 minute point-in-time)
Restoring snapshots is not a fast procedure (important for RTO)
RDS Read-Replicas
Provide 2 main benefits: performance and availability
Read replicas are read-only replicas of an RDS instance
Read replicas can be used for reading only data
Multi AZ Cluster mode is a similar to how read replicas work, but for read replicas we have to think of read replicas as separate things:
They are not part of the main database instance
They have their own endpoint address
Require application support
There is no automatic failover to a read replica
The primary instance and read replica is kept sync using asynchronous replication
There can be a small amount of lag in case of replication
Read replicas can be created in a different AZ or different region (CRR - Cross-Region Replication)
We can 5 direct read-replicas per DB instance
Each read-replica provides an additional instance of read performance
Read-replicas can also have read-replicas, but lag starts to be a problem in this case
Read-replicas can provide global performance improvements
Snapshots and backups improve RPO but not RTO. Read-replicas offer near 0 RPO
Read-replicas can be promoted to primary in case of a failure. This offers low RTO as well (lags of minutes)
Read-replicas can replicate data corruption
Data Security
With all the RDS engines we can use encryption in transit (SSL/TLS). This can be set to be mandatory on a per user bases
For encryption at rest RDS supports EBS volume encryption using KMS which is handled by the host EBS and it is invisible for the database engine
We can use customer managed or AWS generated CMK data keys for encryption at rest
Storage, logs and snapshots will be encrypted with the same customer master key
Encryption can not be removed after it is activated
In addition to encryption at rest MSSQL and Oracle support TDE (Transparent Data Encryption) - encryption at the database engine level
Oracle supports TDE with CloudHSM, offering much stronger encryption
IAM authentication with RDS:
Normally login is controlled with local database users (username/password)
We can configure RDS to allow IAM authentication (only authentication, not authorization, authorization is handled internally!):
RDS Proxy
Opening and closing connections consumes resources and takes time => in case we only want to read/write a tiny amount of data the overhead of establishing a connection creates a significant latency
Handling failure of databases instances is hard, this adds significant overhead and risks to our application
DB proxies can help, but managing them is not always trivial (scaling, resilience)
In case of an RDS proxy our application connects to the proxy, which handles connection polling and connectivity to the database
RDS proxies provide multiplexing: a smaller number of connections can be used to connect to the database while having a larger number of applications using the database through the proxy. This helps to reduce the load on the database
RDS Proxy can help with database failover events abstracting this from the applications. The proxy can wait until a healthy database instance is in place and can automatically connect to it
When to use RDS proxy?
In case we have errors such as Too many connections. An RDS proxy can reduce the number of connections to the dabase while being able to handle many more connections from the applications to itself
Useful when using AWS Lambda, we won’t need to invoke a new connection after each invocation of our function. Saves time by connection reuse and IAM auth
Useful for long running applications (SAAS apps) by reducing latency
RDS Proxy key facts:
Fully managed by RDS/Aurora
By default provides auto scaling, HA
Provides connections pooling, which reduces DB load
Only accessible from a VPC, not accessible from the public internet
Accessed via Proxy Endpoint
Can enforce SSL/TLS connection
Can reduce failover time by over 60% in case of Aurora
Abstracts the failure of a database away for our application
RDS Custom
Fills the gap between the main RDS product and EC2 running a DB engine
The main RDS is fully managed database service => OS/Engine access is limited
In contrast databases running on EC2 are self managed, this can have significant management overhead
RDS custom bridges this gap, we can utilize RDS but still get access to customization we would have when running a DB instance on EC2
Currently RDS custom works from MSSQL or Oracle
We can connect to the underlying OS using SSH, RDP or Session Manager
RDS custom will run withing our AWS account. Classic RDS will run in an AWS managed environment
If we need to perform RDS customization for RDS Custom, we need to look inside the Database Automation settings to make sure we wont have any disruption caused by the Database Automation. We need to pause Database Automation for this period