As part of the Aembit Workload IAM Platform, we use RDS PostgreSQL instances deployed in AWS for online transaction processing. Originally, we deployed a primary RDS instance in a single region, with one or more read-replica instances deployed in the same region for resilience. Additionally, we replicated our database snapshots to an alternate region to facilitate disaster recovery.
With this architecture, we observed two primary limitations:
No automatic failover between RDS instances. In the case of primary instance failure, our monitoring infrastructure needed to detect and report on the failure, and then we needed to promote one of the read replicas to be the primary instance. We also needed to redirect our software to connect to the newly promoted instance. Automation of this process required that we develop and maintain our own system to perform these operations in the event of RDS failure.
No multi-region redundancy. If AWS experienced a regional outage, we needed to deploy a new RDS instance in an alternate region using a stored backup.
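To make the manual failover burden concrete, here is a minimal Python sketch of only the promotion step. It is an illustration, not the tooling we ran: it assumes `rds` is a boto3 RDS client (for example `boto3.client("rds")`), and the function name, polling interval, and timeout are hypothetical. Failure detection and redirecting the application to the new endpoint are left out, and they are the larger part of the work.

```python
import time


def promote_replica(rds, replica_id, poll_seconds=15, timeout_seconds=900):
    """Promote a read replica and return its endpoint address once it is standalone.

    `rds` is assumed to be a boto3 RDS client; the name and defaults are illustrative.
    """
    rds.promote_read_replica(DBInstanceIdentifier=replica_id)

    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        instance = rds.describe_db_instances(
            DBInstanceIdentifier=replica_id
        )["DBInstances"][0]
        # Right after the call the instance can still report "available" while it
        # is a replica, so also require that the replication source is gone.
        is_available = instance["DBInstanceStatus"] == "available"
        is_standalone = not instance.get("ReadReplicaSourceDBInstanceIdentifier")
        if is_available and is_standalone:
            return instance["Endpoint"]["Address"]
        time.sleep(poll_seconds)

    raise TimeoutError(f"{replica_id} was not promoted within {timeout_seconds}s")
```

The returned address is what the application would then need to be pointed at, which is the redirection step that a single cluster endpoint makes unnecessary.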
Given these concerns, we began the process of migrating our AWS RDS PostgreSQL instances to AWS Aurora. In this post, we’d like to share a bit about our thought process, and some of the challenges we encountered during this exercise.
With respect to the limitations described above, Aurora provides:
Automatic failover between instances. Within a single region, if a failure is detected on a primary Aurora DB instance, failover to a replica Aurora DB instance is automatic. The Aurora cluster within a region provides a single endpoint for our software to connect to, so software requires no modification to maintain connectivity with the Aurora cluster.
Multi-region redundancy. Aurora provides the option of a global cluster, whereby we can deploy Aurora in multiple regions. Failover to another region is not automatic, but switching to an active Aurora cluster in another region obviates the need to restore a DB instance from backup, which should streamline our disaster recovery scenarios.
We did have some concerns about Aurora compared with RDS PostgreSQL, including:
Cost. Aurora is not ideal for smaller DB deployments because the minimum DB instance type supported is ‘db.t3.medium’. For global clusters, the minimum instance requirement is one of the ‘db.r5’ or ‘db.r6’ memory-optimized instance types.
Automated cross-region backups. AWS RDS PostgreSQL supports replication of backups to an alternate region. With Aurora, cross-region backups cannot be configured through Aurora directly, but must be implemented through Lambda, Systems Manager Automation, or AWS Backup.
Logging. Upgrade logs cannot be published to Amazon CloudWatch; only PostgreSQL logs can.
Delays. The Aurora PostgreSQL engine version occasionally lags behind AWS RDS PostgreSQL, but we have not considered this a severe limitation.
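For the cross-region backup concern, the following Python sketch shows roughly what a Lambda-style copy job could look like. It is a sketch under assumptions, not our implementation: `src_rds` and `dst_rds` are assumed to be boto3 RDS clients created in the source and destination regions, the function name and the `copy-` naming scheme are hypothetical, and pagination of the snapshot listing is ignored for brevity.

```python
def copy_latest_cluster_snapshot(src_rds, dst_rds, cluster_id, source_region,
                                 kms_key_id=None):
    """Copy the newest available automated cluster snapshot to another region.

    Returns the identifier of the copy, or None if no snapshot is available.
    """
    snapshots = src_rds.describe_db_cluster_snapshots(
        DBClusterIdentifier=cluster_id, SnapshotType="automated"
    )["DBClusterSnapshots"]
    ready = [s for s in snapshots if s["Status"] == "available"]
    if not ready:
        return None
    latest = max(ready, key=lambda s: s["SnapshotCreateTime"])

    # Automated snapshot names contain ":" (e.g. "rds:<cluster>-<timestamp>"),
    # which is not valid in the identifier of a manual copy.
    target_id = "copy-" + latest["DBClusterSnapshotIdentifier"].replace(":", "-")

    params = {
        # A cross-region copy is requested in the destination region and
        # refers to the source snapshot by its ARN.
        "SourceDBClusterSnapshotIdentifier": latest["DBClusterSnapshotArn"],
        "TargetDBClusterSnapshotIdentifier": target_id,
        "SourceRegion": source_region,
        "CopyTags": True,
    }
    if kms_key_id:
        # Encrypted snapshots need a KMS key that exists in the destination region.
        params["KmsKeyId"] = kms_key_id
    dst_rds.copy_db_cluster_snapshot(**params)
    return target_id
```

A scheduled trigger would call this once per backup window; pruning old copies in the destination region would need a separate step.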
Our current AWS RDS infrastructure is fully managed through Terraform, and for the migration to Aurora, we had the following requirements:
1) Existing RDS data must be preserved.
2) The Aurora infrastructure must be wholly managed through Terraform templates.
For the second requirement, we understood that several manual steps might be required during the migration, but once the migration was complete, no further manual intervention should be needed – the resulting infrastructure should be fully managed through Terraform.
Make a backup of the existing RDS database before migration (‘Take Snapshot’ from the ‘Actions’ menu).
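The same snapshot can also be taken from a script rather than the console. The sketch below is one possible way to do it, assuming `rds` is a boto3 RDS client; the function name and the identifiers in the usage comment are hypothetical.

```python
def snapshot_before_migration(rds, instance_id, snapshot_id):
    """Create a manual snapshot of an RDS instance and block until it is usable.

    `rds` is assumed to be a boto3 RDS client. Returns the snapshot ARN.
    """
    response = rds.create_db_snapshot(
        DBInstanceIdentifier=instance_id,
        DBSnapshotIdentifier=snapshot_id,
    )
    # The waiter polls the snapshot until its status becomes "available".
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)
    return response["DBSnapshot"]["DBSnapshotArn"]


# Example usage (identifiers are placeholders):
#   import boto3
#   arn = snapshot_before_migration(boto3.client("rds"), "my-rds-instance",
#                                   "pre-aurora-migration")
```

Waiting for the snapshot to become available before continuing helps ensure the later migration steps do not start against a backup that is still being written.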