Skip to content

Relation Database Service (RDS)

Database Refresher and Models

  • Relational
  • use Structured Query Language (SQL)
  • rigid schema : structure in and between tables of data
    • defined in advance
    • difficult to use for data that has rapidly changing relationships
  • table of rows, where each row has a unique primary key
  • relationship between tables

    • composite key : links to two tables which indirectly connects them
  • NoSQL

  • not one single thing, many models
  • generally more relaxed schema
  • Key-Value
    • list of key value pairs
    • no real schema or tables
    • scalable since sections of the list can be split to different servers
    • really fast
    • in memory caching
  • Wide Column Store (DynamoDB)
    • two keys, partition and other key
    • can have multiple attributes, different items don't need to have the same attributes
  • Document
    • key : value, where the values are documents
    • documents usually json or xml format
    • ideal for interacting with whole documents or deep attribute interactions
  • Column (Redshift)
    • store data based on columns
    • really good at querying single columns for reporting and analytics
  • Graph
    • nodes and edges representing relationships
    • both can have key value pairs of data

ACID v BASE

  • database transaction models
  • cap theorem: (choose 2)
  • consistency : every read to a database will get the most recent write or get an error
  • availability : every request will receive a non-error response, without the guarantee that you get the most recent write
  • partition tolerant : system can be made of multiple network partitions
  • acid : consistency
  • transactions are :
    • atomic : all or no components of a transaction succeeds or fails ex) banking transfers
    • consistent : transaction move database from one valid state to another, no in-between
    • isolated : concurrent transactions don't interfere with each other
    • durable : once committed, transactions are durable, stored on non-volatile memory
  • RDS
  • limits scaling
  • base : availability
  • transactions are :
    • basically available : read and write operations are available as much as possible without consistency guarantees
    • soft state : database doesn't enforce consistency, offloaded to application/user
    • eventually : if we wait long enough, reads from the system will be consistent
  • highly scalable
  • nosql

Databases on EC2

  • usually a bad idea
  • admin overhead : managing ec2 and dbhost
  • backup / DR management / replication
  • ec2 runs in a single AZ
  • missing features aws db products have, performance
  • on or off, not serverless, no easy scaling

  • putting database in an ec2 introduces dependency

  • need reliable communication between application and database ec2
  • cost for data transferring across different AZ

justifications - access to the db instance OS - advanced db option tuning - db or db version aws doesn't support - specific os/db combination aws doesn't provide - architecture aws doesn't provide

Relational Database Service (RDS)

  • Database-as-a-service (DBaas)
  • managed database instance (1 or more databases)
  • multiple engines MySQL, MAriaDB, PostgreSQL, Oracle, Microsoft SQL Server
  • Amazon Aurora

Architecture - RDS Database Instance - use same tooling as engine to access - has one attached block storage in the same AZ as the instance - io1, gp2, magetic - billed for the instance (cpu/memory) and for the storage GB/month - access to to rds instance is controlled by its associated security group

  • create a subnet group
  • inform which subnets to place rds instances into

RDS Multi AZ

  • secondary hardware is allocated in another AZ (standby replica)
  • access database using cname, which points to primary rds
  • Synchronous Replication
  • database writes directed at primary database cname
    • which writes changes to it's storage
  • changes immediately replicated across to standby replica
    • standby writes changes to it's storage
  • almost zero lag, as replication occurs as data is being written to primary database
  • if error occurs with primary database
  • rds changes the database endpoint cname to the standby replica

  • gives high availability, but not fault tolerance

  • not available for free tier
  • standby can't be directly used, doesn't increase performance
  • 60-120 seconds failover
  • same region only, work with other AZs in the VPC
  • backups are taken from standby, (removes performance impact)
  • failure reasons : az outage, primary failure, manual failure, instance type change, software patching

RDS Backups and Restores

Recovery Point Objective (RPO) - time between last backup and the point a failure occurred - maximum data loss possible - influences technical solution and cost : - lower rpo -> HIGHER COST

Recovery Time Objective (RTO) - time between a failure and when system is return to service - influenced by process, staff, tech, and documentation - generally lower values cost more

RDS is capable of performing Manual Snapshots and Automatic backups - both use aws managed s3 buckets that aren't visible to user - gives region resilience - first snapshot is full, then onwards incremental - can affect performance of single instance, but if using multi-az used on standby - Manual Snapshots : live forever - Automatic Backups : configure how often it occurs - every five minutes, database transactions logs are written to s3 - the actual data that changes in the database - database can be restored back in time within a 5 minute granularity - RPO = 5 minutes - database snapshot is restored, then the transaction logs are replayed over the top of the snapshot - are not retained indefinitely, automatically cleaned up - 0-35 day retention period - can retain automated backups after deleting rds instance, but they will still expire - must take manual snapshot

RDS Restores - creates a new rds instance with new address - will have to update application that use the database endpoint address - restoring using snapshots restores to single point in time

RDS READ-Replicas

  • RDS Read Replicas can be added to an RDS Instance - 5 direct per primary instance.
  • each providing an additional instance of read performance
    • ie) make applications used replicas for read operations
  • offers low RTO recovery for any instance failure issues
  • replicas can be promoted to a primary quickly
  • don't help with data corruption as the corruption will be replicated to the RR.
  • like another RDS instance, has its own address
  • asynchronous replication form primary to read replica
  • can be in the same region, or cross-region replicas.
  • global performance improvements
  • read replicas can have read replicas, but lag start to be a problem

RDS Data Security

  • authentication : how users can log in to rds
  • can configure RDS to allow iam authentication
    • create RDS local DB account configured to use aws authentication token
    • policy attached to users or roles maps that iam identity onto the local RDS user
    • generate-db-auth-token for 15 mins
  • authorization : how access is controlled
  • encryption in transit : between client and rds
  • SSL/TLS, can be mandatory
  • encryption at rest : how data is protect when it is written to disk
  • default : EBS volume encryption with KMS
    • handled by RDS HOST/EBS
    • CMK generates data keys used for encryption operations
    • storage, logs, snapshots, and replicas are encrypted
    • can't be removed once added
  • MSSQL and ORACLE support Transparent Data Encryption (TDE)
    • encryption handled within the DB engine, ie encrypted before leaving the instance
    • ORACLE supports integration with CloudHSM
    • stronger key controls

Aurora Architecture

  • uses a cluster
  • made of a single primary instance and 0 or more replicas
  • replicas can be used for reads during normal operation
    • provide availability and read scalability
  • no local storage : uses cluster volume
  • shared and available to all compute instances within a cluster
    • faster provision, improved availability and performance
  • data written to primary disk, aurora synchronously replicates the data across 6 replicas in 3 AZs
  • aurora automatically detects failures in disk volumes of shared storage
    • when segment fails, aurora immediately repairs using data inside other storage nodes
    • avoids data loss and reduces any need for snapshot restores
  • Up to 15 replicas, that can be assigned to failover operations
    • instances can be added without requiring storage provisioning
  • All SSD Based - high IOPS, low latency
  • Don't specify storage, billed on what's used, for the most used
    • create new cluster and migrate data to reduce costs
  • Cluster Endpoint
  • points at primary instance
  • can be used for read and write operations
  • Reader Endpoint
  • load balances across all available replicas
  • used for read operations, easier reader scaling
  • Custom Endpoint
  • can create custom endpoints
  • individual endpoints
  • each instance (primary or replica) has its own endpoint
  • Costs
  • no free-tier option
    • doesn't support micro instances
  • beyond rds single az, aurora offers better value
  • compute : hourly charge per second, 10 minute min
  • storage : GB-Month consumed, IO costs per request
  • 100% DB Size in backups are included
  • Backups
  • work in aurora in the same ay as RDS
  • restores create a new cluster
  • backtrack can be used which allow in-place rewinds to previous point in time
  • fast clone : makes a new databases MUCH faster than copying all the data
    • only stores differences of data that changed in the clone or the original

Aurora Serverless

  • removes the admin overhead of managing individual instances
  • scalable Aurora Capacity UnitsACU
  • allocated from a shared pool managed by aws
  • represent an amount of compute and memory
  • can set min and max acu values
  • can go to 0 and be paused
  • billed per-second basis
  • same resilience as aurora, 6 replicas across AZs
  • requests go to a hidden layer proxy fleet managed by aws
  • which broker connection with acu
  • allows for seamless scaling
  • Use cases
  • infrequently used application
  • new applications
  • variable workloads
  • unpredictable workloads
  • development and test databases
  • multi-tenant applications (scaling is aligned between infrastructure size and revenue )

Aurora Global Database

  • Create global level replication using aurora from a main region to up to 5 secondary regions
  • entire secondary cluster is read only
  • can have up to 16 replicas, which can be promoted to R/W
  • replication occurs at the storage layer
  • 1 second replication from main to all regions
  • no impact on db performance
  • Use Case:
  • cross-region disaster recovery and Business continuity
  • global read scaling : low latency performance improvements

Aurora Multi-Master

  • default aurora mode has one r/w and 0+ replicas
  • not fault tolerant
  • failover takes time as a replica is promoted
  • in multi-master mode all instances are r/w
  • no cluster endpoint to use
  • no load balancing, must be handled by application
  • application can connect with the instances directly
  • when writer gets a request, it tries to get others to agree to add the data to all data nodes
  • changes is replicated to other nodes in memory cache
    • so their reads are consistent with newly updated data
  • application can send request to another instance if a request fails
    • foundation of being able to build a fault tolerant application
    • faster availability

Database Migration Service (DMS)

  • no downtime
  • runs using a replication instance
  • an ec2 instance with migration software that can communicate with the DMS service
  • replication tasks : define the options relating to the migration
  • source and destination endpoints pointing at source and target databases
  • one endpoint must be aws
  • full load
  • migrating existing all data
  • full load + Change Data Capture (CDC)
  • migrates existing data and replicates any on going changes occurring on the source
  • CDC Only
  • if you want to use an alternative method to transfer the bulk of the data
  • migrate only the changes
  • Schema Conversion Tool (SCT)
  • can assist with schema conversion
  • used when converting one database engine to another
  • not used when migrating between compatible engines
  • works with OLTP and OLAP databases
  • DMS & Snowball
  • larger migrations in multi TB size
  • use schema conversion tool to extract data locally and move to a snowball device
  • ship the device to aws, the load it onto aan s3 bucket
  • DMS migrates from s3 into the target store
  • CDC can capture changes and which can be written to target database via s3