Relation Database Service (RDS)
Database Refresher and Models
Relational
- use Structured Query Language (SQL)
rigid schema
: structure in and between tables of data- defined in advance
- difficult to use for data that has rapidly changing relationships
- table of rows, where each row has a unique primary key
-
relationship between tables
composite key
: links to two tables which indirectly connects them
-
NoSQL
- not one single thing, many models
- generally more relaxed schema
Key-Value
- list of key value pairs
- no real schema or tables
- scalable since sections of the list can be split to different servers
- really fast
- in memory caching
Wide Column Store (DynamoDB)
- two keys, partition and other key
- can have multiple attributes, different items don't need to have the same attributes
Document
- key : value, where the values are documents
- documents usually json or xml format
- ideal for interacting with whole documents or deep attribute interactions
Column (Redshift)
- store data based on columns
- really good at querying single columns for reporting and analytics
Graph
- nodes and edges representing relationships
- both can have key value pairs of data
ACID v BASE
- database transaction models
cap theorem: (choose 2)
consistency
: every read to a database will get the most recent write or get an erroravailability
: every request will receive a non-error response, without the guarantee that you get the most recent writepartition tolerant
: system can be made of multiple network partitionsacid : consistency
- transactions are :
atomic
: all or no components of a transaction succeeds or fails ex) banking transfersconsistent
: transaction move database from one valid state to another, no in-betweenisolated
: concurrent transactions don't interfere with each otherdurable
: once committed, transactions are durable, stored on non-volatile memory
RDS
limits scaling
base : availability
- transactions are :
basically available
: read and write operations are available as much as possible without consistency guaranteessoft state
: database doesn't enforce consistency, offloaded to application/usereventually
: if we wait long enough, reads from the system will be consistent
highly scalable
nosql
Databases on EC2
usually a bad idea
admin overhead
: managing ec2 and dbhost- backup / DR management / replication
- ec2 runs in a single AZ
- missing features aws db products have, performance
-
on or off, not serverless, no easy scaling
-
putting database in an ec2 introduces dependency
- need reliable communication between application and database ec2
- cost for data transferring across different AZ
justifications
- access to the db instance OS
- advanced db option tuning
- db or db version aws doesn't support
- specific os/db combination aws doesn't provide
- architecture aws doesn't provide
Relational Database Service (RDS)
- Database-as-a-service (DBaas)
- managed database instance (1 or more databases)
- multiple engines MySQL, MAriaDB, PostgreSQL, Oracle, Microsoft SQL Server
- Amazon Aurora
Architecture - RDS Database Instance - use same tooling as engine to access - has one attached block storage in the same AZ as the instance - io1, gp2, magetic - billed for the instance (cpu/memory) and for the storage GB/month - access to to rds instance is controlled by its associated security group
- create a subnet group
- inform which subnets to place rds instances into
RDS Multi AZ
- secondary hardware is allocated in another AZ (standby replica)
- access database using cname, which points to primary rds
Synchronous Replication
- database writes directed at primary database cname
- which writes changes to it's storage
- changes immediately replicated across to standby replica
- standby writes changes to it's storage
- almost zero lag, as replication occurs as data is being written to primary database
- if error occurs with primary database
-
rds changes the database endpoint cname to the standby replica
-
gives
high availability
, but not fault tolerance - not available for free tier
- standby can't be directly used, doesn't increase performance
60-120
seconds failoversame region
only, work with other AZs in the VPCbackups are taken from standby
, (removes performance impact)- failure reasons : az outage, primary failure, manual failure, instance type change, software patching
RDS Backups and Restores
Recovery Point Objective (RPO)
- time between last backup and the point a failure occurred
- maximum data loss possible
- influences technical solution and cost :
- lower rpo -> HIGHER COST
Recovery Time Objective (RTO)
- time between a failure and when system is return to service
- influenced by process, staff, tech, and documentation
- generally lower values cost more
RDS is capable of performing Manual Snapshots and Automatic backups
- both use aws managed s3 buckets that aren't visible to user
- gives region resilience
- first snapshot is full, then onwards incremental
- can affect performance of single instance, but if using multi-az used on standby
- Manual Snapshots
: live forever
- Automatic Backups
: configure how often it occurs
- every five minutes, database transactions logs are written to s3
- the actual data that changes in the database
- database can be restored back in time within a 5 minute granularity
- RPO = 5 minutes
- database snapshot is restored, then the transaction logs are replayed over the top of the snapshot
- are not retained indefinitely, automatically cleaned up
- 0-35 day retention period
- can retain automated backups after deleting rds instance, but they will still expire
- must take manual snapshot
RDS Restores
- creates a new rds instance with new address
- will have to update application that use the database endpoint address
- restoring using snapshots restores to single point in time
RDS READ-Replicas
- RDS Read Replicas can be added to an RDS Instance -
5 direct per primary instance
. - each providing an additional instance of read performance
- ie) make applications used replicas for read operations
- offers
low RTO recovery
for any instance failure issues - replicas can be promoted to a primary quickly
- don't help with data corruption as the corruption will be replicated to the RR.
- like another RDS instance, has its
own address
- asynchronous replication form primary to read replica
- can be in the
same region, or cross-region replicas
. global
performance improvements- read replicas can have read replicas, but lag start to be a problem
RDS Data Security
- authentication : how users can log in to rds
- can configure RDS to allow iam authentication
- create RDS
local DB account
configured to useaws authentication token
- policy attached to users or roles maps that iam identity onto the local RDS user
generate-db-auth-token
for15 mins
- create RDS
- authorization : how access is controlled
- encryption in transit : between client and rds
- SSL/TLS, can be
mandatory
- encryption at rest : how data is protect when it is written to disk
- default : EBS volume encryption with KMS
handled by RDS HOST/EBS
- CMK generates data keys used for encryption operations
storage, logs, snapshots, and replicas are encrypted
can't be removed
once added
- MSSQL and ORACLE support Transparent Data Encryption (
TDE
)- encryption handled within the DB engine,
ie encrypted before leaving the instance
- ORACLE supports integration with CloudHSM
- stronger key controls
- encryption handled within the DB engine,
Aurora Architecture
- uses a
cluster
- made of a single
primary
instance and0 or more replicas
- replicas can be used for reads during normal operation
- provide availability and read scalability
- no local storage : uses
cluster volume
- shared and available to all compute instances within a cluster
- faster provision, improved availability and performance
- data written to primary disk,
aurora synchronously replicates
the data across 6 replicas in 3 AZs aurora automatically detects failures
in disk volumes of shared storage- when segment fails, aurora
immediately repairs using data inside other storage nodes
- avoids data loss and reduces any need for snapshot restores
- when segment fails, aurora
Up to 15 replicas
, that can be assigned to failover operations- instances can be added without requiring storage provisioning
All SSD Based - high IOPS, low latency
- Don't specify storage,
billed on what's used
, for the most used- create new cluster and migrate data to reduce costs
Cluster Endpoint
- points at primary instance
- can be used for read and write operations
Reader Endpoint
- load balances across all available replicas
- used for read operations, easier reader scaling
Custom Endpoint
- can create custom endpoints
individual endpoints
- each instance (primary or replica) has its own endpoint
Costs
- no free-tier option
- doesn't support micro instances
- beyond rds single az, aurora offers better value
- compute : hourly charge per second, 10 minute min
- storage : GB-Month consumed, IO costs per request
- 100% DB Size in backups are included
Backups
- work in aurora in the same ay as RDS
- restores create a new cluster
backtrack
can be used which allowin-place rewinds
to previous point in timefast clone
: makes a new databasesMUCH
faster than copying all the data- only stores differences of data that changed in the clone or the original
Aurora Serverless
- removes the admin overhead of managing individual instances
- scalable
Aurora Capacity UnitsACU
- allocated from a shared pool managed by aws
- represent an amount of compute and memory
- can set
min and max
acu values - can go to
0 and be paused
- billed per-second basis
- same resilience as aurora, 6 replicas across AZs
- requests go to a hidden layer
proxy fleet
managed by aws - which broker connection with acu
- allows for seamless scaling
Use cases
- infrequently used application
- new applications
- variable workloads
- unpredictable workloads
- development and test databases
- multi-tenant applications (scaling is aligned between infrastructure size and revenue )
Aurora Global Database
- Create global level replication using aurora from a main region to up to
5 secondary regions
- entire secondary cluster is read only
- can have up to
16 replicas
, which can be promoted to R/W - replication occurs at the storage layer
1 second
replication from main to all regions- no impact on db performance
- Use Case:
cross-region
disaster recovery and Business continuityglobal read scaling
: low latency performance improvements
Aurora Multi-Master
- default aurora mode has one
r/w
and0+
replicas not fault tolerant
- failover takes time as a replica is promoted
- in
multi-master
mode all instances arer/w
- no cluster endpoint to use
no load balancing
, must be handled by applicationapplication can connect with the instances directly
- when writer gets a request, it tries to get others to agree to add the data to all data nodes
- changes is replicated to other nodes in memory cache
- so their reads are consistent with newly updated data
- application can send request to another instance if a request fails
- foundation of being able to build a
fault tolerant
application - faster availability
- foundation of being able to build a
Database Migration Service (DMS)
- no downtime
- runs using a replication instance
- an ec2 instance with migration software that can communicate with the DMS service
- replication tasks : define the options relating to the migration
source and destination endpoints
pointing at source and target databases- one endpoint must be aws
full load
- migrating existing all data
full load + Change Data Capture (CDC)
- migrates existing data and replicates any on going changes occurring on the source
CDC Only
- if you want to use an alternative method to transfer the bulk of the data
- migrate only the changes
- Schema Conversion Tool (
SCT
) - can assist with schema conversion
- used when converting one database engine to another
- not used when migrating between compatible engines
- works with OLTP and OLAP databases
DMS & Snowball
- larger migrations in multi TB size
- use schema conversion tool to extract data locally and move to a snowball device
- ship the device to aws, the load it onto aan s3 bucket
- DMS migrates from s3 into the target store
- CDC can capture changes and which can be written to target database via s3