Relational Database Service (RDS)
Database Refresher and Models
Relational - uses Structured Query Language (SQL)
- rigid schema: structure in and between tables of data, defined in advance
- difficult to use for data that has rapidly changing relationships
- table of rows, where each row has a unique primary key
- relationships between tables are also defined in advance
- composite key: made from the keys of two tables, which indirectly links them
NoSQL - not one single thing, many models
- generally more relaxed schema
Key-Value - list of key-value pairs
- no real schema or tables
- scalable, since sections of the list can be split across different servers
- really fast
- common for in-memory caching
Wide Column Store (DynamoDB) - items have a partition key and optionally a second (sort) key
- items can have multiple attributes; different items don't need to have the same attributes (sketch after this list)
Document - key : value, where the values are documents
- documents are usually in JSON or XML format
- ideal for interacting with whole documents or for deep attribute interactions
Column (Redshift) - stores data based on columns rather than rows
- really good at querying single columns for reporting and analytics
Graph - nodes and edges representing relationships
- both can have key value pairs of data
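A minimal boto3 sketch of the wide-column (DynamoDB) model described above - the table name, keys, and attributes are hypothetical (assume a table "SensorData" with DeviceId as the partition key and Timestamp as the sort key):

```python
import boto3

# hypothetical table: partition key DeviceId, sort key Timestamp
table = boto3.resource("dynamodb").Table("SensorData")

# two items sharing a partition key; attributes don't have to match between items
table.put_item(Item={"DeviceId": "sensor-1", "Timestamp": "2024-06-01T00:00", "Temp": 21})
table.put_item(Item={"DeviceId": "sensor-1", "Timestamp": "2024-06-01T00:05", "Humidity": 40, "Battery": 87})

# read one item back by its full key (partition + sort)
item = table.get_item(Key={"DeviceId": "sensor-1", "Timestamp": "2024-06-01T00:05"})["Item"]
```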
ACID v BASE
- database transaction models
CAP theorem (choose 2 of 3)
- consistency: every read of the database gets the most recent write or an error
- availability: every request receives a non-error response, without the guarantee that it contains the most recent write
- partition tolerance: the system can be made of multiple network partitions and keeps working
ACID : consistency - transactions are :
- atomic: all components of a transaction succeed, or all fail, ex) banking transfers
- consistent: transactions move the database from one valid state to another, no in-between
- isolated: concurrent transactions don't interfere with each other
- durable: once committed, transactions are durable, stored on non-volatile memory
- this is the RDS model; it limits scaling
BASE : availability - transactions are :
- basically available: read and write operations are available as much as possible, without consistency guarantees
- soft state: the database doesn't enforce consistency; that's offloaded to the application/user
- eventually consistent: if we wait long enough, reads from the system will be consistent
- this is the typical NoSQL model; highly scalable
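A minimal sketch of atomicity (the A in ACID) using Python's built-in sqlite3 - the simulated failure in the middle of a transfer is rolled back, so the database is never left in an in-between state (the table and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
        raise RuntimeError("simulated failure mid-transfer")  # crash before crediting bob
except RuntimeError:
    pass

# alice is still 100 and bob is still 0 - the partial debit was rolled back
print(list(conn.execute("SELECT name, balance FROM accounts")))
```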
Databases on EC2
usually a bad idea
- admin overhead: managing both the EC2 instance and the database host
- backup / DR management / replication
- EC2 runs in a single AZ
- missing features AWS DB products have; performance
- on or off, not serverless, no easy scaling
- putting the database on EC2 introduces a dependency - need reliable communication between the application and the database EC2 instance
- cost for data transferred across different AZs
justifications
- access to the db instance OS
- advanced db option tuning
- db or db version aws doesn't support
- specific os/db combination aws doesn't provide
- architecture aws doesn't provide
Relational Database Service (RDS)
- Database-as-a-Service (DBaaS)
- managed database instance (hosting 1 or more databases)
- multiple engines: MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server
- Amazon Aurora
Architecture
- RDS Database Instance - use the same tooling as the engine to access it
- has one attached block storage volume in the same AZ as the instance - io1, gp2, magnetic
- billed for the instance (CPU/memory) and for the storage (GB/month)
- access to the RDS instance is controlled by its associated security group
- create a subnet group
- informs RDS which subnets to place RDS instances into (sketch below)
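A minimal boto3 sketch of creating a subnet group - the group name and subnet IDs are placeholders (use private subnets in at least two AZs):

```python
import boto3

rds = boto3.client("rds")

rds.create_db_subnet_group(
    DBSubnetGroupName="app-db-subnets",              # placeholder name
    DBSubnetGroupDescription="Private subnets for RDS",
    SubnetIds=["subnet-aaa111", "subnet-bbb222"],    # placeholder subnet IDs in different AZs
)
```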
RDS Multi AZ
- secondary hardware is allocated in another AZ (standby replica)
- access the database using a CNAME, which points to the primary RDS instance
Synchronous Replication - database writes are directed at the primary database CNAME
- the primary writes changes to its storage
- changes are immediately replicated across to the standby replica
- the standby writes the changes to its storage
- almost zero lag, as replication occurs as data is being written to the primary database
- if an error occurs with the primary database, RDS changes the database endpoint CNAME to point at the standby replica
- gives high availability, but not fault tolerance
- not available on the free tier
- standby can't be used directly, doesn't increase performance
- 60-120 second failover
- same region only, works with other AZs in the VPC
- backups are taken from the standby (removes the performance impact)
- failover reasons : AZ outage, primary failure, manual failover, instance type change, software patching
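A minimal boto3 sketch of provisioning an instance with Multi-AZ enabled - the identifier, engine, class, and credentials are placeholder values:

```python
import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="app-db",           # placeholder identifier
    Engine="mysql",
    DBInstanceClass="db.t3.medium",
    AllocatedStorage=20,                     # GB of storage
    MasterUsername="admin",
    MasterUserPassword="change-me",          # placeholder credentials
    DBSubnetGroupName="app-db-subnets",      # subnet group from the earlier sketch
    MultiAZ=True,                            # synchronous standby replica in another AZ
)
```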
RDS Backups and Restores
Recovery Point Objective (RPO)
- time between last backup and the point a failure occurred
- maximum data loss possible
- influences technical solution and cost :
- lower rpo -> HIGHER COST
Recovery Time Objective (RTO)
- time between a failure and when the system is returned to service
- influenced by process, staff, tech, and documentation
- generally lower values cost more
RDS is capable of performing Manual Snapshots and Automatic backups
- both use aws managed s3 buckets that aren't visible to user
- gives region resilience
- first snapshot is full, then onwards incremental
- backups can affect the performance of a single instance, but with Multi-AZ they are taken from the standby
- Manual Snapshots : live forever
- Automatic Backups : configure how often it occurs
- every five minutes, database transaction logs are written to s3
- the actual data that changes in the database
- database can be restored back in time within a 5 minute granularity
- RPO = 5 minutes
- database snapshot is restored, then the transaction logs are replayed over the top of the snapshot
- are not retained indefinitely, automatically cleaned up
- 0-35 day retention period
- can retain automated backups after deleting the RDS instance, but they will still expire
- to keep data indefinitely, a manual snapshot must be taken
RDS Restores
- creates a new rds instance with new address
- applications that use the database endpoint address will have to be updated
- restoring from a snapshot restores to a single point in time (when the snapshot was taken)
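A minimal boto3 sketch of a point-in-time restore - identifiers and the timestamp are placeholders; note it always produces a brand new instance with a new endpoint:

```python
from datetime import datetime, timezone
import boto3

rds = boto3.client("rds")

rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="app-db",             # placeholder identifiers
    TargetDBInstanceIdentifier="app-db-restored",    # new instance, new endpoint address
    RestoreTime=datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc),  # within the retention window
)
```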
RDS Read Replicas
- RDS Read Replicas can be added to an RDS instance - 5 direct replicas per primary instance
- each provides an additional instance of read performance
- ie) make applications use the replicas for read operations
- offers low RTO recovery for instance failure issues - a replica can be promoted to a primary quickly
- doesn't help with data corruption, as the corruption will be replicated to the read replica
- like another RDS instance, has its own address
- asynchronous replication from the primary to the read replica
- can be in the same region, or cross-region replicas (global performance improvements)
- read replicas can have their own read replicas, but lag starts to be a problem
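A minimal boto3 sketch of adding a read replica to an existing instance - identifiers are placeholders; a source ARN plus SourceRegion would make it cross-region:

```python
import boto3

rds = boto3.client("rds")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica-1",     # placeholder name for the replica
    SourceDBInstanceIdentifier="app-db",         # the primary being replicated (asynchronously)
)
```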
RDS Data Security
- authentication : how users can log in to RDS
- can configure RDS to allow IAM authentication
- create an RDS local DB account configured to use an AWS authentication token
- a policy attached to IAM users or roles maps that IAM identity onto the local RDS user
- generate-db-auth-token creates a token valid for 15 mins (see the token sketch after this section)
- authorization : how access is controlled
- encryption in transit : between client and rds
- SSL/TLS, can be made mandatory
- encryption at rest : how data is protected when it is written to disk
- default : EBS volume encryption with KMS, handled by the RDS host/EBS
- the CMK generates data keys used for the encryption operations
- storage, logs, snapshots, and replicas are encrypted
- encryption can't be removed once added
- MSSQL and Oracle support Transparent Data Encryption (TDE)
- encryption handled within the DB engine, ie) data is encrypted before leaving the instance
- Oracle supports integration with CloudHSM
- stronger key controls; encryption is still handled within the DB engine, with keys sourced from CloudHSM
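Following the IAM authentication bullets above, a minimal boto3 sketch of requesting a short-lived auth token - the hostname, port, and DB user name are placeholder values:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# token is valid for 15 minutes and is used in place of a password
token = rds.generate_db_auth_token(
    DBHostname="app-db.xxxxxxxx.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    Port=3306,
    DBUsername="iam_app_user",  # the local DB account mapped to the IAM identity
)
# then connect with a normal MySQL/PostgreSQL client over SSL, passing `token` as the password
```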
Aurora Architecture
- uses a cluster - made of a single primary instance and 0 or more replicas
- replicas can be used for reads during normal operation
- provide availability and read scalability
- no local storage : uses a cluster volume
- shared and available to all compute instances within the cluster
- faster provisioning, improved availability and performance
- data is written to the primary's disk; aurora synchronously replicates the data across 6 storage replicas in 3 AZs
- aurora automatically detects failures in the disk volumes of the shared storage
- when a segment fails, aurora immediately repairs it using data from the other storage nodes
- avoids data loss and reduces any need for snapshot restores
- up to 15 replicas, which can be assigned to failover operations
- instances can be added without requiring storage provisioning
- all SSD based - high IOPS, low latency
- don't specify storage; billed on what's used, at the high-water mark (the most used)
- create a new cluster and migrate data to reduce costs
Cluster Endpoint - points at the primary instance
- can be used for read and write operations
Reader Endpoint - load balances across all available replicas
- used for read operations, easier read scaling
Custom Endpoints - can create custom endpoints pointing at a chosen subset of instances
Individual Endpoints - each instance (primary or replica) has its own endpoint
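A minimal boto3 sketch of a custom endpoint that load balances across a chosen subset of replicas - the cluster and instance identifiers are placeholders:

```python
import boto3

rds = boto3.client("rds")

rds.create_db_cluster_endpoint(
    DBClusterIdentifier="app-aurora",                   # placeholder cluster
    DBClusterEndpointIdentifier="analytics-readers",    # name of the custom endpoint
    EndpointType="READER",
    StaticMembers=["app-aurora-replica-2", "app-aurora-replica-3"],  # only these instances serve it
)
```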
Costs - no free-tier option
- doesn't support micro instances
- beyond rds single az, aurora offers better value
- compute : hourly rate, billed per second, 10 minute minimum
- storage : GB-month consumed, IO costs per request
- backup storage equal to 100% of the DB size is included
Backups - work in aurora in the same way as RDS
- restores create a new cluster
- backtrack can be used, which allows in-place rewinds to a previous point in time
- fast clone: makes a new database MUCH faster than copying all the data
- only stores the differences - data that changed in the clone or the original
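Minimal boto3 sketches of both features - identifiers and the timestamp are placeholders, and backtrack must have been enabled when the cluster was created:

```python
from datetime import datetime, timezone
import boto3

rds = boto3.client("rds")

# backtrack: in-place rewind of the existing cluster (no new cluster is created)
rds.backtrack_db_cluster(
    DBClusterIdentifier="app-aurora",
    BacktrackTo=datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc),
)

# fast clone: copy-on-write restore - only changed data consumes new storage
rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="app-aurora",
    DBClusterIdentifier="app-aurora-clone",
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)
```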
Aurora Serverless
- removes the admin overhead of managing individual instances
- scalable in Aurora Capacity Units (ACUs)
- ACUs are allocated from a shared pool managed by aws
- represent an amount of compute and memory
- can set min and max ACU values
- can go to 0 and be paused
- billed on a per-second basis
- same resilience as aurora, 6 storage replicas across AZs
- requests go to a hidden proxy fleet managed by aws, which brokers connections with the ACUs
- allows for seamless scaling (cluster creation sketch after the use cases list)
Use cases - infrequently used applications
- new applications
- variable workloads
- unpredictable workloads
- development and test databases
- multi-tenant applications (scaling aligns infrastructure size with revenue)
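As mentioned in the scaling bullets above, a minimal boto3 sketch of creating an Aurora Serverless (v1-style) cluster - identifiers, credentials, and ACU limits are placeholder values:

```python
import boto3

rds = boto3.client("rds")

rds.create_db_cluster(
    DBClusterIdentifier="app-aurora-serverless",   # placeholder identifier
    Engine="aurora-mysql",
    EngineMode="serverless",
    MasterUsername="admin",
    MasterUserPassword="change-me",                # placeholder credentials
    ScalingConfiguration={
        "MinCapacity": 1,                          # ACUs
        "MaxCapacity": 8,
        "AutoPause": True,                         # pause to zero ACUs when idle
        "SecondsUntilAutoPause": 600,
    },
)
```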
Aurora Global Database
- create global-level replication using aurora, from a main region to up to 5 secondary regions
- entire secondary clusters are read only
- each secondary can have up to 16 replicas, which can be promoted to R/W
- replication occurs at the storage layer - ~1 second replication from the main region to all secondary regions
- no impact on db performance
- use case: cross-region disaster recovery and business continuity
- use case: global read scaling - low latency performance improvements
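A minimal boto3 sketch of promoting an existing cluster to a global database and attaching a read-only secondary cluster in another region - all identifiers, regions, and the account ID are placeholders:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# wrap the existing primary cluster in a global cluster
rds.create_global_cluster(
    GlobalClusterIdentifier="app-global",
    SourceDBClusterIdentifier="arn:aws:rds:us-east-1:111122223333:cluster:app-aurora",  # placeholder ARN
)

# add a read-only secondary cluster in another region
rds_eu = boto3.client("rds", region_name="eu-west-1")
rds_eu.create_db_cluster(
    DBClusterIdentifier="app-aurora-eu",
    Engine="aurora-mysql",
    GlobalClusterIdentifier="app-global",   # joins the global database as a secondary
)
```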
Aurora Multi-Master
- default aurora mode has one R/W instance and 0+ replicas - not fault tolerant, failover takes time as a replica is promoted
- in multi-master mode all instances are R/W
- no cluster endpoint to use, no load balancing - must be handled by the application
- the application connects to the instances directly
- when a writer gets a write request, it tries to get the other instances to agree to commit the data to all storage nodes
- changes are replicated to the other nodes' in-memory caches
- so their reads are consistent with the newly updated data
- the application can send the request to another instance if a request fails
- foundation for building a fault tolerant application - faster availability
Database Migration Service (DMS)
- used when a migration without downtime is needed
- runs using a replication instance
- an ec2 instance with migration software that can communicate with the DMS service
- replication tasks : define the options relating to the migration
- source and destination endpoints point at the source and target databases - one endpoint must be in aws
- Full Load - migrates all existing data
- Full Load + Change Data Capture (CDC) - migrates existing data and replicates any ongoing changes occurring on the source
- CDC Only - migrates only the changes; used if you want an alternative method to transfer the bulk of the data
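A minimal boto3 sketch of defining a replication task - the endpoint/instance ARNs are placeholders for resources created beforehand, and the table mapping simply includes everything:

```python
import json
import boto3

dms = boto3.client("dms")

dms.create_replication_task(
    ReplicationTaskIdentifier="onprem-to-rds",
    SourceEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:SRC",      # placeholder ARNs
    TargetEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:TGT",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111122223333:rep:INSTANCE",
    MigrationType="full-load-and-cdc",   # existing data plus ongoing changes
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "all-tables",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```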
- Schema Conversion Tool (SCT) - can assist with schema conversion
- used when converting one database engine to another
- not used when migrating between compatible engines
- works with OLTP and OLAP databases
DMS & Snowball - larger migrations, multi-TB in size
- use the Schema Conversion Tool to extract data locally and move it to a snowball device
- ship the device to aws, then load it into an s3 bucket
- DMS migrates from s3 into the target store
- CDC can capture changes, which can be written to the target database via s3