Simple Storage Service (S3)
- Regionally resilient
- objects are addressed by bucket name + key (filename)
- zero bytes to 5TB per object
- versionId, MetaData
- Access Control
- Subresources
- great for large scale data storage, distribution or upload
- can be used as input or output for many AWS products
- object storage, not file or block storage
Bucket
- containers for data, created within a region
- name is globally unique
- unlimited objects
- flat structure, no true folders, folders are prefixes
- 100 buckets per account (soft limit), 1000 (hard limit)
Object
- key : value
S3 Security
Private by default. Only the root user of the account that owns the bucket can access it; every other identity must be explicitly granted permission.
Bucket Policy
A type of resource policy.
- allow access from the same or different account
- policy is attached to resource, which can then reference any identity either within or outside the account
- can allow or deny anonymous principals by referencing all principals ("*")
- bucket can only have one policy, but that policy can have multiple statements
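As a rough illustration of the points above, a bucket policy that lets anonymous principals read objects might look like the sketch below. It only builds the policy document as a Python dict; the bucket name example-bucket and the Sid value are placeholders chosen for this example.

```python
import json


def public_read_policy(bucket_name: str) -> dict:
    """Build a bucket policy with a single statement allowing anonymous reads."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",  # placeholder statement id
                "Effect": "Allow",
                "Principal": "*",  # all principals, including anonymous ones
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# One policy per bucket; more statements could be appended to "Statement".
policy = public_read_policy("example-bucket")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

A policy like this only takes effect if public access is not being blocked for the bucket.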
Block Public Access
- settings that override any policy or ACL that allows public access
- apply only to public (anonymous) access, not to authenticated AWS identities
- disabling them doesn't grant access by itself; it only makes it possible to grant public access through a policy or ACL
Static Website Hosting
Allows access via HTTP.
Must set:
- Index document: the default page, or entry point to the website
- Error document: the page shown when there is an error
The website URL is generated automatically from the bucket name and region. If you want to use your own custom domain name, the bucket name must match the domain.
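The generated URL can be sketched as a small helper. This assumes the commonly documented endpoint shape, bucket name followed by s3-website and the region; some regions use a dot rather than a dash before the region name, so the separator is left as a parameter. The function name and example values are made up for illustration.

```python
def website_endpoint(bucket: str, region: str, separator: str = "-") -> str:
    """Return the HTTP website endpoint for a bucket (assumed URL shape)."""
    if separator not in ("-", "."):
        raise ValueError("separator must be '-' or '.'")
    return f"http://{bucket}.s3-website{separator}{region}.amazonaws.com"


# With a custom domain, the bucket is named after the domain itself.
print(website_endpoint("www.example.com", "us-east-1"))
```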
Use cases:
1. Hosting blogs
2. Offloading: serve static media from S3, since it is cheaper than compute; the HTML sent to the client's browser can then point to the bucket for the static media
3. Out-of-band pages: if the compute service is down, point customers to a static website that gives them information
Object Versioning and MFA Delete
Object Versioning
Lets you store multiple versions of objects within a bucket.
- operations that would modify an object generate a new version instead
- versions are identified by an id
- when accessing an object without specifying an id, the latest version is returned
- controlled at bucket level
- disabled by default; once enabled it cannot be disabled again, only suspended and later re-enabled
- billed for the storage of all versions of all objects
- deleting a versioned object without specifying an id creates a new latest version called a delete marker
  - the delete marker hides the other versions and makes it look like the object is deleted
  - the delete marker can itself be deleted, making the other versions visible again (un-delete)
- deleting an object with a specified id
  - permanently deletes that version
  - if it was the latest version, the next most recent version becomes the current one
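The delete behaviour above is easier to follow with a toy model. The sketch below is an in-memory imitation written for these notes, not the S3 API: the class name, the v1/v2 style ids and the use of KeyError for a hidden or missing object are all assumptions.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for a delete marker


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body), newest last
        self._ids = itertools.count(1)

    def put(self, key, body):
        """Every write appends a new version instead of overwriting."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def get(self, key, version_id=None):
        """Without an id, return the latest version unless it is a delete marker."""
        versions = self._versions.get(key, [])
        if version_id is None:
            candidates = versions[-1:]
        else:
            candidates = [v for v in versions if v[0] == version_id]
        if not candidates or candidates[0][1] is DELETE_MARKER:
            raise KeyError(key)
        return candidates[0][1]

    def delete(self, key, version_id=None):
        """Without an id, add a delete marker; with an id, really remove that version."""
        if version_id is None:
            return self.put(key, DELETE_MARKER)
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return version_id


bucket = VersionedBucket()
first = bucket.put("cat.jpg", "one")
second = bucket.put("cat.jpg", "two")
marker = bucket.delete("cat.jpg")      # object now looks deleted
bucket.delete("cat.jpg", marker)       # un-delete by removing the marker
print(bucket.get("cat.jpg"))           # latest real version is visible again
```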
MFA Delete
- Enabled in versioning configuration
- MFA is required to change the bucket versioning state (enabled, suspended)
- MFA is required to delete versions
Performance
Single PUT Upload
- default way objects are uploaded to S3
- single data stream to S3
- if the stream fails, the upload fails and requires a full restart
- limited to 5GB of data
Multipart Upload
- data is broken up into parts
- recommended once an object reaches 100MB
- maximum of 10,000 parts, each between 5MB and 5GB
- the last part can be smaller than 5MB
- parts can fail and be restarted in isolation
- improves transfer rate by avoiding single-stream inefficiencies or limitations
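The part limits interact: a large object needs bigger parts to stay under 10,000 of them. A minimal sketch of that arithmetic follows, assuming the MB and GB figures are binary units (MiB, GiB) and using a hypothetical helper name and a 100 MiB preferred part size picked only for illustration.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB    # every part except the last must be at least this big
MAX_PART = 5 * GIB
MAX_PARTS = 10_000


def plan_parts(object_size: int, preferred_part_size: int = 100 * MIB):
    """Return (part_size, part_count) that respects the multipart limits."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size if the preferred size would need more than 10,000 parts.
    part_size = max(preferred_part_size, MIN_PART, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("required part size exceeds the 5 GiB per-part limit")
    return part_size, math.ceil(object_size / part_size)


size, count = plan_parts(1 * GIB)
print(size // MIB, "MiB parts,", count, "parts")
```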
Transfer Acceleration
- uses edge locations to move data onto the AWS global network, which is purpose-built to connect regions to each other
- improves the reliability and speed of transferring data across regions compared with using the public internet
- the bucket name can't contain periods
Object Encryption
- buckets aren't encrypted, objects are
- encryption is defined at object level, with a possibility of different objects using different methods
At rest
Client-Side Encryption
- the object is encrypted on the client side before it is uploaded to S3
- the client is responsible for managing keys and the encryption process
Server-Side Encryption
- the object reaches S3 in plaintext and is then encrypted by S3
- Server-Side Encryption with Customer-Provided Keys (SSE-C)
  - most control
  - the customer manages the keys, S3 manages the encryption
  - the key is provided to S3 along with the object being uploaded
  - the object is stored with a hash of the supplied key
  - the hash verifies that the same key is supplied for decryption
  - the key is never stored in S3
- Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
  - default
  - S3 manages both encryption and keys (AES-256)
  - S3 creates a unique key for every object uploaded
  - a master key encrypts the unique key, which is then stored with the encrypted data
- Server-Side Encryption with Customer Master Keys (CMKs) stored in AWS Key Management Service (SSE-KMS)
  - KMS manages the keys; the CMK generates a data encryption key (DEK) to encrypt each uploaded object
  - gives fine-grained control over the key being used
  - allows role separation by limiting permissions on the CMK used to encrypt objects
  - allows control over key rotation
Note: in both cases the data is also encrypted in transit, as HTTPS is used for the upload.
Bucket Default Encryption
- applied when encryption is not specified at object level
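For SSE-C, "provide the key along with the object" means sending it in request headers. The sketch below only builds those headers; the helper name is hypothetical, and the MD5 digest is a transmission check on the key, which should not be confused with the hash S3 keeps alongside the object.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict:
    """Build the SSE-C request headers for a 256-bit customer-provided key."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        # The key itself, base64-encoded; S3 uses it and then discards it.
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        # Base64 of the key's MD5 digest, so S3 can check the key arrived intact.
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }


customer_key = os.urandom(32)  # the customer must keep this key safe themselves
headers = sse_c_headers(customer_key)
print(sorted(headers))
```

The same headers have to accompany later GET requests, since S3 does not keep the key.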
Storage Classes
S3 Standard
- default
- replicated across at least 3 AZs
- charged a per-GB monthly fee for storage, a per-GB fee for transfer out, and a price per 1,000 requests
- no retrieval fee
- no minimum duration
- no minimum size
- used for frequently accessed data
S3 Standard-IA (Infrequent Access)
- similar architecture to S3 Standard
- cheaper than S3 Standard to store data
- per-GB retrieval fee, so cost rises with the amount of data retrieved
- minimum billing duration of 30 days
- minimum billable size of 128KB per object
- used for long-lived data that is important but infrequently accessed
S3 One Zone-IA
- cheaper than S3 Standard and S3 Standard-IA
- still has a retrieval fee, minimum duration and minimum billable size
- stored in one AZ: cheaper storage but more risky
- used for long-lived data that is non-critical or replaceable and infrequently accessed
S3 Glacier Instant Retrieval
- like S3 Standard-IA but with cheaper storage, more expensive retrieval and a longer minimum duration
- minimum billing duration of 90 days
- objects are still instantly accessible
S3 Glacier Flexible Retrieval
- objects must be retrieved before use; retrieved copies are held temporarily in S3 Standard-IA
- retrieval options: expedited (1-5 minutes), standard (3-5 hours), bulk (5-12 hours)
- faster retrieval costs more
- objects cannot be made publicly accessible
- 90-day and 40KB minimums
- used for archive data where frequent or fast access is not needed
S3 Glacier Deep Archive
- 180-day and 40KB minimums
- retrieval options: standard (12 hours), bulk (48 hours)
- used for archival data that is rarely, if ever, accessed
S3 Intelligent-Tiering
- tiers: frequent access, infrequent access, archive instant access, archive access, deep archive
- these correspond to S3 Standard, S3 Standard-IA, S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, S3 Glacier Deep Archive
- monitors usage of objects and automatically moves them to the appropriate tier
- monitoring and automation cost per 1,000 objects
- used for long-lived data where usage is changing or unknown
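The minimum duration and minimum size mean a small, short-lived object is billed as if it were bigger and older. A minimal sketch of that rule for the non-archive classes, using only the figures from these notes; the dictionary keys and function name are made up here, and the 128KB minimum for Glacier Instant Retrieval is assumed from its "like Standard-IA" description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassMinimums:
    min_days: int  # minimum billing duration in days
    min_kb: int    # minimum billable object size in KB


# Figures as stated in the notes; the archive classes are left out of this sketch.
MINIMUMS = {
    "STANDARD": ClassMinimums(min_days=0, min_kb=0),
    "STANDARD_IA": ClassMinimums(min_days=30, min_kb=128),
    "ONEZONE_IA": ClassMinimums(min_days=30, min_kb=128),
    "GLACIER_IR": ClassMinimums(min_days=90, min_kb=128),  # size minimum is an assumption
}


def billable(storage_class: str, size_kb: int, days_stored: int):
    """Return the (size_kb, days) that would actually be billed."""
    minimums = MINIMUMS[storage_class]
    return max(size_kb, minimums.min_kb), max(days_stored, minimums.min_days)


# A 10KB object kept for 5 days in Standard-IA is billed as 128KB for 30 days.
print(billable("STANDARD_IA", size_kb=10, days_stored=5))
```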
Lifecycle Configuration
Automates deleting objects (or object versions) or changing their storage class.
- a lifecycle configuration is a set of rules
- each rule consists of actions applied to a bucket or a group of objects
  - transition actions: change the storage class of objects
  - expiration actions: delete objects or object versions
- objects can transition down a waterfall, never up
  - Standard -> Standard-IA -> Intelligent-Tiering -> One Zone-IA -> Glacier IR -> Glacier FR -> Glacier DA
- objects that start in Standard must spend a minimum of 30 days there before transitioning to Standard-IA or One Zone-IA
- a single rule cannot transition to Standard-IA or One Zone-IA and then on to a Glacier class within 30 days
  - use two rules to get around the 30-day limit
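A rule with one transition chain and an expiration could be written roughly as below, together with a small check of the waterfall and 30-day constraints. The dict follows the shape the S3 API uses for lifecycle rules as far as I know; the rule id, prefix and day counts are invented, and the validator is a hypothetical helper, not something AWS provides.

```python
# Storage class identifiers in waterfall order (top to bottom).
WATERFALL = [
    "STANDARD", "STANDARD_IA", "INTELLIGENT_TIERING", "ONEZONE_IA",
    "GLACIER_IR", "GLACIER", "DEEP_ARCHIVE",
]


def validate_transitions(transitions, start="STANDARD"):
    """Raise ValueError unless the transitions only move down the waterfall."""
    position, last_days = WATERFALL.index(start), 0
    for step in transitions:
        target, days = WATERFALL.index(step["StorageClass"]), step["Days"]
        if target <= position:
            raise ValueError("transitions can only move down the waterfall")
        if days < last_days:
            raise ValueError("transition days must not decrease")
        if step["StorageClass"] in ("STANDARD_IA", "ONEZONE_IA") and days < 30:
            raise ValueError("objects need 30 days in Standard before moving to an IA class")
        position, last_days = target, days
    return True


rule = {
    "ID": "archive-logs",              # placeholder rule name
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},     # the group of objects the rule applies to
    "Transitions": [                   # transition actions
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
    ],
    "Expiration": {"Days": 365},       # expiration action
}
lifecycle_configuration = {"Rules": [rule]}
print(validate_transitions(rule["Transitions"]))
```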
Replication
Cross-Region Replication (CRR) - source bucket is replicated to a destination bucket in a different region
Same-Region Replication (SRR) - source and destination buckets are in the same region
Replication Configuration
- destination bucket to use
- IAM role that S3 will assume in order to read and replicate the objects
- if the destination is in a different account, add a bucket policy on the destination that trusts the role used by the source
Options
- all objects or a subset
- storage class the objects will use in the destination (default: same as the source class)
- ownership of the replicated objects
- Replication Time Control (RTC), for faster (15-minute) replication
Considerations
- not retroactive
- versioning needs to be on
- one-way replication: source -> destination
- can handle unencrypted, SSE-S3 and SSE-KMS objects
- source bucket owner needs permission to the objects
- system events and objects in Glacier or Glacier Deep Archive are not replicated
- deletes aren't replicated by default
Use cases
- SRR
  - log aggregation
  - syncing between accounts (Prod, Test)
  - resilience with strict sovereignty requirements
- CRR
  - global resilience improvements
  - latency reduction
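To make the configuration pieces concrete, here is a sketch of a replication configuration and the matching destination bucket policy for the cross-account case. The dict shapes follow the S3 replication API as I understand it; the account id, role name, bucket names and helper names are all placeholders.

```python
def replication_configuration(role_arn, destination_bucket, storage_class=None, prefix=""):
    """Build a one-rule replication configuration (an empty prefix means all objects)."""
    destination = {"Bucket": f"arn:aws:s3:::{destination_bucket}"}
    if storage_class is not None:
        destination["StorageClass"] = storage_class  # otherwise same class as the source
    return {
        "Role": role_arn,  # the IAM role S3 assumes to read and replicate objects
        "Rules": [{
            "ID": "replicate-" + (prefix or "all"),
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": prefix},
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": destination,
        }],
    }


def destination_bucket_policy(role_arn, destination_bucket):
    """Bucket policy for the destination account that trusts the source's role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowReplicationFromSourceRole",
            "Effect": "Allow",
            "Principal": {"AWS": role_arn},
            "Action": ["s3:ReplicateObject", "s3:ReplicateDelete"],
            "Resource": f"arn:aws:s3:::{destination_bucket}/*",
        }],
    }


role = "arn:aws:iam::111122223333:role/example-replication-role"  # placeholder ARN
config = replication_configuration(role, "example-destination", storage_class="STANDARD_IA")
policy = destination_bucket_policy(role, "example-destination")
print(config["Rules"][0]["Destination"])
```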
Presigned URLs
- temporary URL that gives access to an S3 object
- created by an identity that has access to S3
- used to give unauthenticated identities access to S3, while keeping the bucket private
- download (GET) or upload (PUT) supported
Considerations
- when using the URL, you assume the same permissions as the identity that generated it
- an identity can create a URL for an object it doesn't have access to, but using the URL will result in access denied
- if the identity's permissions change after the URL is created, the URL's effective permissions change too
- don't generate URLs with a role: the role's temporary credentials can expire before the URL does, so the URL can stop working abruptly
Select and Glacier Select
Allow retrieving part of an object using SQL-like statements.
- reduces billing and increases speed by filtering the data before it is streamed out of S3
Events
- when enabled, a notification is generated when events occur in a bucket
- notifications can be delivered to SNS, SQS and Lambda functions as part of a serverless application
- add resource policies allowing the S3 principal access
Event types
- object created
- object deleted
- object restored from Glacier
- object replication statistics
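For an SNS target, the two pieces are a notification configuration on the bucket and a resource policy on the topic that lets the S3 service principal publish. The sketch below builds both as dicts; the ARNs, account id, bucket name and function name are placeholders, and the condition keys are a common hardening pattern rather than something these notes require.

```python
def sns_policy_for_bucket(topic_arn: str, bucket_name: str, account_id: str) -> dict:
    """Topic resource policy allowing S3 to publish events from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},  # the S3 service principal
            "Action": "SNS:Publish",
            "Resource": topic_arn,
            "Condition": {
                "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket_name}"},
                "StringEquals": {"aws:SourceAccount": account_id},
            },
        }],
    }


topic_arn = "arn:aws:sns:us-east-1:111122223333:example-topic"  # placeholder ARN
topic_policy = sns_policy_for_bucket(topic_arn, "example-bucket", "111122223333")

# Bucket-side configuration: which events go to which destination.
notification_configuration = {
    "TopicConfigurations": [{
        "TopicArn": topic_arn,
        "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
    }]
}
print(topic_policy["Statement"][0]["Principal"])
```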
Access Logs
- logging is enabled on the source bucket
- the destination bucket ACL must allow the S3 Log Delivery group
- logs give detailed information about requests made to the source bucket
- useful for auditing or for researching customers' access patterns
Requester Pays
- the requester, rather than the bucket owner, pays the cost of requests and data transfer out
- a bucket-level configuration (cannot be set per object)
- doesn't work with static website hosting or BitTorrent
- requires authentication to allow access
- requesters must add the x-amz-request-payer header to their requests to confirm payment responsibility
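As a small illustration of the header requirement, the sketch below adds the confirmation header to an existing set of request headers. The helper name and the sample Host header are hypothetical; the request would still need to be signed, since Requester Pays buckets reject anonymous access.

```python
def with_requester_pays(headers: dict) -> dict:
    """Return a copy of the headers with the payment confirmation added."""
    merged = dict(headers)  # leave the caller's dict untouched
    merged["x-amz-request-payer"] = "requester"
    return merged


request_headers = with_requester_pays({"Host": "example-bucket.s3.amazonaws.com"})
print(request_headers)
```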