Simple Storage Service S3

Regionally resilient
bucket name + key (filename)
zero bytes to 5TB per object
versionId, MetaData
Access Control
Subresources
great for large scale data storage, distribution or upload
can be used as input or output to many aws products not a file or block storage

Bucket

containers for data, created within a region
name is globally unique
unlimited objects
flat structure, no true folders, folders are prefixes
100 soft limit, 1000 hard limit

Object

key : value

S3 Security

Private by default. Only root account can access it, and everything else must be explicitly given permission.

Bucket Policy

A type of resource policy.

allow access from the same or different account
policy is attached to resource, which can then reference any identity either within or outside the account
allow or deny anonymous principals
by reference all principals
bucket can only have one policy, but that policy can have multiple statements

black public access - option to override any policy or ACL that allow public access - applies on public access, and not on aws identities (anon principals) - disabling this doesn't give access permissions, instead it grants the ability to grant permission

Static Website Hosting

Allow access via HTTP.

must set: - index : default page, or entry point to website - Error : page shown when there is an error

Website url will be automatically generated from region and bucket name. If you want to use your own custom name, then the bucket name must match the domain.

Use cases: 1. Host blogs 2. Offloading - offload static media to s3, since it is cheaper than compute - then html that is sent to clients browser can point to the buckets for the static media 3. Out-of-band pages - if compute service is down, point customers to static website to give information

Object Versioning and MFA Delete

Object Versioning

Lets you store multiple versions of objects within a bucket.

Operations which would modify objects generate a new version.
versions are identified by an id
when accessing an object, if an id is not specified, then the latest version of the object is used
controlled at bucket level
disabled by default, once enabled it cannot be disabled again
can be suspended, then re-enabled
billed for storage of all versions of objects
when deleting a versioned object without specifying an id, a new latest version called a delete marker will be created
this hides the other versions and makes it looks like the object is deleted
the delete marker can be deleted, thereby making the other versions visible again (un-delete)
when deleting an object with specifying an id
the object of that version will be really deleted
objects versions will be moved up

MFA Delete

Enabled in versioning configuration
mfa is required to change bucket versioning state (enabled, suspended )
mfa is required to delete versions

Performance

Single PUT Upload - default way objects are uploaded to s3 - single data stream to s3 - if a stream fails, upload fails, requires full restart - limited to 5GB of data

Multipart Upload - data is broken up, - minimum size is 100MB - 10,000 max parts of size 5MB - 5GB - last part can be smaller than 5MB - parts can fail and be restarted in isolation - improves transfer rate, by avoiding single stream inefficiencies or limitations

Accelerated Transfer - uses edge locations to use aws global network which is purpose built to connect regions to each other - improve the reliability and speed of transferring data across regions over using the public network - bucket can't have period

Object Encryption

buckets aren't encrypted, objects are
encryption is defined at object level, with a possibility of different objects using different methods

At rest

Client-Side Encryption : object is encrypted on client side, before it gets uploaded to s3 - client is responsible for managing keys and encryption process

Server-Side Encryption : object reaches s3 in plaintext, and then gets encrypted by s3 - Server-Side Encryption with Customer-Provided Keys (SSE-C) - most control - Customer manages keys, s3 manages encryption - provide s3 they key along with object being upload - object is stored with hash of key supplied - hash verifies the same key is being used during decryption - key is never stored in s3 - Server-Side Encryption with Amazon s3-Managed Keys (SSE-S3) - default - s3 managed both encryption and keys (AES 256) - s3 key creates a unique key for every object uploaded - a master key encrypts the unique key which is then stored with the encrypted data - Server-Side Encryption with Customer Master Keys (CMK's) stored in AWS Key Management Service (SSE-KMS) - kms manages the keys, CMK creates DEK to encrypt object uploaded - gives fine grain control over key being used - allows role separation by a limiting permissions to CMK used to encrypt objects - allow control to key rotation

note in both cases the data is encrypted in transit as https is used to upload

Bucket Default Encryption - applied when encryption is not specified at object level

Storage Classes

S3 Standard - default - replicated on at least 3 AZ's - charged GB/month fee for storage, $ per GB for transfer out, price per 1000 requests - no retrieval fee - no minimum duration - no minimum size - used for frequently accessed data

S3 Standard-IA (Infrequent Access) - similar architecture to S3 Standard - cheaper than S3 Standard to store data - per GB retrieval fee, cost increases with data increases - minimum billing duration of 30 days - minimum capacity billed at 128KB - used for long-lived data, that is important but infrequently accessed

S3 One Zone-IA - cheaper than S3 Standard and S3 Standard-IA - still has retrieval fee, minimum duration and capacity billed - stored in one AZ, cheaper storage by more risky - used for long-lived data, which is non-critical or replaceable, where access is infrequent

S3 Glacier - instant - like S3 Standard-IA but cheaper storage, more expensive retrieval, longer minimum - minimum billing duration of 90 days - still have instant access to objects

S3 Glacier - Flexible - objects are retrieved to S3 Standard - IA temporarily - expedited (1-5 mins), standard (3-5) hours, bulk (5-12 hours) - faster retrieval results in more expensive costs - objects cannot be made publicly accessible - 90 days, 40KB minimums - archive data where frequent or fast access is not needed

S3 Glacier Deep Archive - 180 days, 40KB minimums - standard (12 hours), bulk (48 hours) - archival data that is rarely if ever accessed

S3 Intelligent Tiering - frequent access, infrequent access, archive instant access, archive access, deep archive - s4 standard, S3 Standard-IA, S3 Glacier - instant, S3 Glacier - Flexible, S3 Glacier Deep Archive - monitors usage of objects and automatically moves objects to appropriate tiers - monitoring and automation cost per 1000 objects - used for long lived data where usage is changing or unknown

Lifecycle Configuration

Automate deletion of object (or object versions) or change storage classes.

a lifecycle configuration is a set of rules
consisting of actions on a bucket or group of objects
transition actions : change the storage class of object
expiration actions : can delete object or object version
objects can transition down a waterfall, never up
Standard -> Standard IA -> Intelligent-Tiering -> One Zone-IA -> Clacier IR -> Glacie FR -> Glacier DA
minimum of 30 days in standard before transition, if it starts in standard
a single rule cannot transition to standard IA or One Zone IA and then to glacier within 30 days
use two rules to get around 30 day limit

Replication

Cross-Region Replication (CRR) - source bucket replicated to destination bucket in a different region'

Same-Region REplication (SRR) - source and destination buckets in the same region

Replication Configuration - bucket to use - iam role s3 will assume, to be able to read and replicate data in object - if destination is in a different account, add a bucket policy that trusts the role used by source

Options - all object or subset - storage class objects in destination will use, (default same as source class) - ownership - replication time control (RTC), for quicker (15min) replication

Considerations - not retroactive - versioning needs to be on - one way replication : source -> destination - can handle un-encrypted, SSE-S3, and SSE-KMS - source bucket owner needs permission to objects - no system events, glacier, or glacier deep archive - delete aren't replicated

Use case - SRR - Log Aggregation - sync different account (Prod Test) - resilience and strict sovereignty - CRR - global resilience improvements - latency reduction

Presigned URLs

temporary url that gives access to S3 objects
created by identity that has access to S3
used to give unauthenticated identities access to S3, while keeping the bucket private
download (GEt) or upload (PUT) supported

Considerations - when using url, you are assuming the same permission as the identity that generated the url - identity can create url to object he doesn't have access to, but the url will result in an access denied page - if permissions change after the url is created, the permissions of the url will also change - don't generate with a role - since roles can expire faster than the url, the url can stop working abruptly

Select and Glacier Select

Allow to access to partial access using SQL-Like statements.

reduce billing and increase speed by filtering before data is streamed out of s3

Events

when enabled, a notification is generated when events occur in a bucket
can be used to deliver to SNS, SQS, and lambda functions as part of serverless application
add resource policies allowing s4 principal access

events - object created - object deletion - object restored from glacier - object replication statistics

Access Logs

enabling logging on source bucket
destination bucket acl allow the source s3 log delivery group
detailed information about requests made on source bucket
for auditing or research access patterns of customers

Requester Pays

requester pays the cost of transfer out requests
bucket configuration (cannot be set per object)
doesn't work with static website hosting or bitTorrent
requires authentication to allow access
requesters must add x-amz-request-payer in requests to confirm payment responsibility