Skip to content

Elastic Compute Cloud (EC2)

  • virtual machines known as instances
  • private service, configured to run in a VPC network
  • is launched into a specific subnet, AZ resilient
  • different sizes and capabilities available
  • on-demand billing per second
  • charged for compute, storage, and external software
  • storage can be local on-host or Elastic Block Store (EBS)
  • will be charged for EBS even if instance is stopped
  • states : running, stopped, terminated
  • windows rdp port 3389, linux ssh port 22

Amazon Machine Image (AMI)

  • has attached permissions defining who can use it
  • used to create an EC2 or from an EC2
  • contains the root volume, the drive that boots the operating system
  • block device mapping, used to determine which volume is root volume and which is data volume

Permission Options: 1. Public Access 2. Owner Only 3. Specific AWS Accounts

  • used to launch ec2 instances
  • can be created from an existing instance
  • can get from aws or community or marketplace
  • regional

Block device mapping - ebs snapshots created from instance ami is created from - block devices mapping is a table that links, the newly created snapshots with the device id the snapshots had in the original ec2 instance - when ami is used to create a new instance - snapshots are used to create new ebs volumes - the volumes then are attached to the new instance using the same device ids as the original instance

Extra: - only exist in one region - ami baking : creating an ami from a configured instance + application - can't be edited, launch instance, update configurations and make a new ami - can be copied between regions, including snapshots - default permissions is owner only

Billing - storage cost of ebs volumes the ami references

Virtualization

Process of running more than one os on a piece of hardware.

Without virtualization: - server has hardware, os (kernel), and applications - os runs in privileged state giving it access to the hardware - applications make a system call to the kernel to get access to the hardware

  • Emulated Virtualization
  • software (hypervisor) runs in privileged state
  • guest hosts wrapped in containers called virtual machines
  • vm emulated hardware, and os is unaware of virtualization
  • binary translation :hypervisor intercepts hosts call to emulated hardware, to communicate with real hardware
  • Para-virtualization
  • guest os aware of virtualization
  • hypercalls : guest os makes call directly to host hypervisor
  • Hardware Assisted Virtualization
  • host hardware (cpu) aware of virtualization
  • host cpu intercepts calls from guest os and passes it to hypervisor
  • SR-IOV
  • hardware devices become virtualization aware
  • host network card can present itself as several mini cards
  • no translation required by hypervisor
  • host network card directly connects to host mini card
  • consistent lower latency at high amounts of consistent io

Architecture

  • virtual machine (os + resources)
  • run on EC2 Hosts hardware that aws manages
  • shared hosts hardware shared across different aws customers
  • dedicated hosts hardware dedicated to customer
  • AZ Resilient as hosts run in one AZ
  • local storage instance store, temporary storage, lost if instance moves to another host
  • storage and data networking
  • when instances are provisioned in a specific subnet
  • a primary elastic network interface is provisioned in a subnet and maps to physical hardware of EC2 host
  • can have multiple network interfaces in different subnets in the same AZ
  • cannot connect cross region
  • Elastic Block Store (EBS), remote storage
  • runs in one AZ, can't be accessed cross zone
  • volumes, portions of persistent storage allocated to instances in the same AZ
  • instances stay on a host until
  • host fails or is taken down
  • if instance is stopped and then started (not restarted)
  • cannot natively move between AZ
  • create copies and re-provision

Use case: - traditional os + application compute requirements - long-running compute - server style applications waiting for incoming connections - burst or steady-state load requirements - monolithic application stacks - migrating application workloads or disaster recovery

EC2 Instance Types

Different types of instances have different: - raw amount of resources - resource ratios - storage and data network bandwidth - system architecture / vendor - additional features and capabilities like gpu's

  • General Purpose (A, T, M)
  • default
  • diverse workloads, equal resource ratios
  • Compute optimized (C)
  • media processing, HPC, scientific modelling, gaming, machine learning
  • mor cpu than memory
  • Memory optimized (R, X, Z)
  • processing large in memory datasets
  • more memory than cpu
  • Accelerate Computing (P, G, F)
  • hardware gpu, field programmable gate arrays (FPGAs)
  • Storage Optimized (I, D, H)
  • large amounts of super fast local storage
  • good for sequential and random io operations
  • databases, data warehousing, elasicsearch, analytic workloads

Instance type Schema: <instance family><generation><additional capabilities>.<size>

SSH vs EC2 Instance Connect

SSH - make sure instance security group allows your ip address

Instance connect - uses aws ip to connect and presents it in the browser - make sure aws ip compatible with EC2_INSTANCE_CONNECT in your region is allowed

## Storage Basics

  • Direct (local) attached storage
  • directly connected to the EC2 host
  • called the instance store
  • fast because it is directly attached to the hardware
  • if disk or hardware fails, then the storage can be lost
  • if EC2 moves between hosts the storage can be lost
  • Network attached storage
  • called EBS
  • highly resilient
  • separate from instance hardware so it can survive issues with EC@ host

  • Ephemeral Storage

  • temporary storage
  • instance store
  • Persistent storage
  • permanent
  • lives on past the lifetime of the instance
  • EBS

  • Block Storage

  • create volume presented to os as a collection of uniquely addressable blocks
  • no structure
  • like empty hard drive/disk
  • os creates a file system on top of the block and mounts it
    • as c drive in windows or root in linux
  • bootable
  • File Storage
  • presented as a file share, has structure accessible over the network
  • mountable not bootable
  • Object Storage
  • collection of objects
  • not mountable, not bootable

Storage Performance

  • IO (block) size size of data writing to disk (MB)
  • IOPS input output operations per seconds (s)
  • Throughput amount of data that can be transferred ina given second (MB/s)

IO X IOPS = Throughput - choose right block size and then maximize iops to maximize throughput

Elastic Block Store (EBS)

  • block storage, raw disk allocations
  • can be written to or read using a block number
  • can be encrypted using KMS
  • AZ resilient provisioned in one AZ
  • in general attached to one over a storage network
  • can be detached and reattached
  • not lifecycle linked to one instant, persistent
  • can create a snapshot (back up) to S3
  • can create a volume from snapshot (migrate between AZ's)
  • has different, physical storage types, sizes, performance profiles
  • billed based GB per month

EBS Volume Types

General Purpose

GP2

  • default general purpose ssd based storage
  • size range 1GB to 16TB
  • created with io credit allocation
    • capacity of 5.4million io credits
    • fills at rate of baseline performance that is based on its size
    • min of 100 credit per second per gb of volume size + 3 credits per second, per GB of volume size
    • max 4000 iops
    • up to 3000 iops burst rate by depleting the bucket faster than it replenishes
    • EBS larger than 1 TB, maximum 16,000 io credit per second
    • no burst, baseline always achieved, don't use credit system
  • good for boot volumes, low latency applications

GP3

  • every volume regardless of size starts with 3000 iops & 125 MiB/s
  • 20% cheaper than GP2 at base price
  • up to 16,000 iops or 1000 MiB/s
  • add extra iops explicitly not based on size

Provisioned IOPS SSD

  • io1/2/BlockExpress
  • IOPS can be adjusted independently of size
  • designed for super high performance, low latency, io intensive databases
  • Up to 64,000 (256,000) IOPS per volume and 1,000 (4,000) MB/s (block express)
  • volume size ranges that are compatible 4 GB - 16 TB io1/2 , 4GB - 64TB block express
  • max size to performance ratio
  • io1 : 50IOPS/GB MAX
  • io2 : 500IOPS/GB MAX
  • BlockExpress : 1000IOPS/GB MAX
  • pay for size and provisioned iops
  • per instance performance (performance cap for an individual ec2 instance)
  • io1 : 260,000 IOPS & 7,500 MB/s
  • io2 : 160,000 IOPS & 4,750 MB/s
  • io2 & BlockExpress : 260,000 IOPS & 7,500 MB/s
  • cap also depends on the type and size of instance

Hard Disk Drive (HDD)

Both options provide less IOPS than SSD. Generally chosen for cost purposes.

st1

  • (throughput optimized)
  • A low-cost HDD designed for frequently accessed, throughput-intensive workloads.
  • big data, data warehouses, log processing
  • sequentially accessed data
  • 125 GB - 16 TB in size
  • Base : 40MB/s/TB , Burst 250MB/s/TB
  • max 500 IOPS - 500 MB/s

sc1

  • (cold HDD)
  • The lowest-cost HDD design for less frequently accessed workloads.
  • cold data, archives
  • 125 GB - 16 TB in size
  • Base : 12MB/s/TB , Burst 80MB/s/TB
  • max 250 IOPS - 250 MB/s

Instance Store

provide block storage storage devices presented to the os and used as the basis for a file system that can be used as applications

  • raw volumes that can be attached to an instance
  • physically connected to one ec2 host
  • instances on that host can access them
  • highest storage performance in aws
  • D3 instance type -> 4.6 GB/s throughput
  • I3 -> 16 GB/s throughput
  • more IOPS snd throughput vs EBS
  • included in the prices, use it or lose it
  • allocated a certain number of volumes based on instance type and size
  • attach at launch time
  • ephemeral (temporary) storage
  • if an instance moves between hosts then data stored in instance store volume is lost
    • if stopped and started, change instance type, or undergoing maintenance

EBS vs Instance Store

EBS - pair with good size that can give the level of performance - persistent storage - resilient storage - storage isolated from instance lifecycle

  • Cheap -> ST1 or SC1
  • throughput or streaming -> ST1
  • Boot -> not ST1 or SC1

  • GP2/3 - up to 16,000IOPS

  • io1/2 up to 64,000 IOPS (256,000 IOPS block express)

  • take lots of individual ebs volumes to create RAID0 set

  • achieve combined performance of all individual volumes
  • RADI0 + EBS up to 260,000 IOPS (io1/2-BE/GP2/3)

  • keep in mind the performance each volume gives, and the maximum performance of the instance itself

Instance Store - super high performance - cost -> instance store (often included) - more than 260,000 IOPS, CAN GET MILLIONS IOPS

Extra - resilience w/ App in-build replication -> it depends - high performance -> it depends

EBS Snapshots

Efficient way to back up EBS volumes to s3

  • the first is a full copy of data used on the volume
  • future snaps are incremental
  • difference between previous snapshot and current state of volume
  • if you delete incremental snapshot, newer snapshots will still function
  • each snapshot is self sufficient
  • volumes can be created from snapshots
  • snapshots allow you to clone a volume
  • new volume can be created in a new AZ, or regions since they are stored in S3
  • makes EBS volumes that are AZ resilient, to regionally resilient

Performance - new EBS -> full performance immediately - snaps restore lazily - fetched gradually - requested blocks are fetched immediately - can force a read of all data immediately from s3 to volume - using a tool in instance os - Fast Snapshot Restore (FSR) - immediate restore - up to 50 snaps per region, set on the snap and AZ - costs extra

Billing - gigabyte / month - used not allocated data - charged for changed or new allocation in snapshot - reference data that is not changed from older snapshots

EBS Encryption

  • Encryption uses KMS which uses CMK
  • when an encrypted volume is created
  • CMK saves an encrypted DEK onto the volume
  • when the volume is first used
  • EBS asks KMS to use CMK to decrypt the DEK which is then loaded into the memory of the EC2 host using it
  • EC2 instance running on the host can now use the decrypted DEK in the host, to interact with the encrypted EBS
  • cipher text stored at rest
  • snapshots of encrypted volumes, and volumes created from such snapshots all share the same DEK

  • accounts can be set to encrypt by default

  • each volume uses 1 unique DEK
  • can't change a volume to not be encrypted
  • os isn't aware of the encryption
  • (AES256) algo is done on host
  • no performance loss

Network Interfaces, Instance IPs and DNS

  • network interface
  • Every ec2 has an elastic network interface (eni)
    • can have secondary eni in a separate subnet but same AZ
  • has
    • a mac address
    • primary ipv4 private ip -> internal dns ip10-x-x-.ec2.internal
    • 0 or more secondary ips
    • 0 or 1 public ipv4 address
    • dynamic, changes when ec2 is stopped and started in a different host
    • dns -> ec2-x-x-x.compute-1.azmazonaws.com
    • inside vpc resolves to primary private ip
    • outside of vpc it resolves to public ip
    • 1 elastic ip per private ipv4 address
    • allocated to an account
    • can be associated with primary or secondary interface
      • if attached to primary, it replaces the public ipv4
    • 0 or more ipv6 addresses
    • security groups
    • source/destination check
  • can detach secondary interfaces and move them to other ec2 instances

Extra: - can move secondary eni mac licensing - multi-homed (subnets) - use multiple interfaces, with different ips and security groups - os never sees public ipv4, only sees the private - public ipv4 is handled by internet gateway using nat - ipv6 public by default - ipv4 public ips are dynamic, stop/start or (change of host) changes ip - public dns resolves to private in vpc - never leaves the vpc, for instance to instance communication

Purchase Options

  • On-Demand (default)
  • isolated, but multiple customer instances run on shared hardware
  • instances of different sizes run on the same ec2 hosts, consuming a defined allocation of resources
  • per-second billing while an instance is running
    • associated resources such as storage consume capacity, so billed regardless of instance state
  • no interruptions
  • no capacity reservation
  • predictable pricing
  • no upfront cost
  • no discount
  • choose for short-term or unknown workloads or applications which can't be interrupted

  • Spot

  • aws selling unused ex2 host capacity for up to 90% discount
  • spot price is based on the spare capacity at a given time
  • if spot price goes above your maximum price, then your instances are terminated
  • never use for workloads which can't tolerate interruptions
  • for workloads that are non time critical, can be rerun, cost sensitive, stateless, has bursty capacity need

  • Dedicated Hosts

  • pay for host that contained the instances, no instance charges
  • might have software licensing based on sockets or cores
  • host affinity --> if instance is stop and started it remains on the same host
  • only your instances run on the dedicated host
  • specific family and size of instance
    • nitro can use different size of instances at the same time
  • on-demand & reserved options for pricing
  • ami limits : rhel, suse linux, and windows amis aren't supported
  • amazon rds instances are not supported
  • placement groups are not supported
  • hosts can be shared with other ORG accounts using Resource Access Manager (RAM)

    • host own sees all instances running on it, but can only edit ones it owns
    • instances owners, that ar not the host owner, can only see their instances
  • Dedicated Instances

  • no other customers use the same hardware
  • don't own or share the host
  • hourly fee per region regardless of how many dedicated instances are being used
  • for when you have requirements to not share hardware

On-demand, reserved, and spot

  • Reserved
  • for long term consistent usage
  • unused reservation are still billed
  • for a particular type of instance and locked to an AZ or Region
    • reserve based on AZ also reserves capacity
  • can have a partial effect
    • ex) reservation for small T3 instances might partially apply to large T3 instances
  • plans
    • 1 year or 3 year
    • no upfront -> reduce per second fee
    • all upfront -> no per second fee, greatest discount
    • partial upfront
  • scheduled reserved instances
    • ideal for long term usage which doesn't run constantly
    • specify frequency, duration, and time
    • ex) batch processing daily for 5 hours at 23:00
    • slightly cheaper than on demand
    • doesn't support all instance types or regions
    • 1200 hours per year minimum
    • 1 year term minimum
  • capacity reservation

    • when major failure results in lack of available capacity in a region or AZ
    • there is a priority list of which purchase options get available instances first
    • reserved purchases -> on-demand -> spot
    • regional reservation
    • billing discount for valid instances launched in any AZ in that region
    • don't reserve capacity, same priority as on-demand
    • 1 or 3 year term
    • zonal reservation
    • same discount as regional reservation, but only apply to one az
    • capacity reservation in the az
    • 1 or 3 year term
    • on-demand capacity reservation
    • book to enure you always have access to capacity in an az
    • at full on demand price
    • no term limit
    • pay regardless if you consume it
  • Savings Plans

  • hourly commitment for a 1 or 3 year term
  • general plan
    • reservation of general compute $ amount, save up to 66%
    • ec2, fargate, lambda
  • ec2 savings plan
    • up to 72% savings
  • products have an on-demand rate and a savings plane rate
  • get savings plan rate, up to to the amount you commit to

Instance Status Checks & Auto Recovery

  • Instance Status Checks
  • each ec2 instance gets two 2 tests
    • system status
    • loss of system power, network connectivity,, host software/hardware issue
    • instance status
    • corrupted file system, incorrect instance networking, os kernel issue
  • resolution

    • can manually stop and start an instance, restart, or terminate and recreate
  • Auto recovery

  • moves instance to new host with same configs
    • same instance ID, private IP addresses, Elastic IP addresses, and all instance metadata
  • can create a cloudwatch alarm triggered when an instance fails a status check
    • can send message
    • can take action
    • recover (auto-recover), reboot, stop, or terminate
  • works only with instances with ebs volumes attached

  • Termination protection

  • can be enabled in instance settings, to protect from accidental termination
  • can separate permissions to enable and disable termination protection

Horizontal and Vertical Scaling

Scaling is what happens when systems have to grow or shrink depending on changes to the load the experience.

  • Vertical
  • resizing the EC2 instance
  • requires a reboot which can potentially cause disruption
  • generally scale during pre-agreed times (outage windows)
    • limits how quickly you can respond
  • larger instances cost more, cost increase scales faster than size increase
  • upper cap on performance (instance size)
  • no application modification required
  • works for all applications even monoliths

  • Horizontal

  • changes the number of instances
  • require a load balancer
    • to distribute traffic across multiple instances
  • can be shifting across instances
  • requires application support or off-host sessions
  • no disruption when scaling
  • no real limits, keep adding instances
  • often less expensive, no large instance premium
  • more granular, by adding smaller instances

Instance Metadata

  • Service EC2 provides to instances
  • accessible inside all instance
  • at ip http://169.254.169.254/latest/meta-data/
  • data about the instance that can be used to configure or manage the instance
  • info on the environment that the instance is in
  • ex) networking, authentication info, user data
  • not authenticated or encrypted

Bootstrapping

Process where scripts are run when an instance is first launched. To create an instance with a certain configuration. - enabled using user data

User data - accessed via the meta-data ip - http://169.254.169.254/latest/user-data/ - anything in userdata is executed by the instance Os - only executed on launch - ec2 doesn't interpret, the os needs to understand the user data - it's possible to have a bad config and instance to still run - not secure, don't use for passwords or long term credentials - 16KB size limit - for larger, make script download larger file - can be modified when instance stopped

Boot-TimeToService - Basic ami -> launch instance -> manual post launch configuration - Basic ami -> launch instance -> bootstrapping - Baked ami -> launch instance - less flexible - do for time intensive processes

Ideally combine bootstrapping and baked ami

Cloudformation Init (CFN-INIT) - a way to pass complex bootstrap configuration to an instance - simple configuration management system - install packages with version awareness - manipulate groups and users - download sources - create files - run commands and check state - run services - provided with directives via metadata and AS::CloudFormation::init on a CN resource - `works with stack update

CloudFormation CreationPolicy and Signals - inform cloudformation if it has been configured correctly - creation policies create a 'WAIT STATE' on resources - not allowing the resource to move to CREATE_COMPLETE until signalled using the cfn-signal tool - reported to the stack, and updates the instance state (OK or not)

Instance Roles and Profiles

  • Roles that an instance can assume.
  • Anything running in the instance has the permissions granted by the role
  • instance profile : wrapper around an iam role
  • allows permissions to get inside the instance
  • attached to the instance
  • temporary credentials delivered via the instance meta-data
  • iam/security-credentials/role-name
  • creating the role in the console also creates the profile, must create both separate with cloudformation

SSM Parameter` Store

  • storage for configuration and secrets
  • license codes, database strings, full configs and passwords
  • types: String, StringList, SecureString
  • supports hierarchy and versioning
  • plaintext and ciphertext (with KMS)
  • public parameters
  • integrated with IAM to authorize access
  • changes can create events

System and Application Logging on EC2

  • Monitoring inside the instance
  • cloudwatch is for metrics and cloudwatch logs is for logging
  • neither natively capture data inside an instance
  • cloudwatch agent is required with some configuration and permissions
  • software that runs inside the instance and captures os visible data and sends to cloudwatch
  • configuration tells the agent what to do (what data to capture)
  • permissions can be given by attaching a role to the instance
  • one log group for each log to capture. and one log stream in each log group for each instance sending data

Ec2 Placement Groups

Allows you to control where ec2 instances are placed.

  • Cluster Placement Groups (PERFORMANCE)
  • pack instances close together
  • best practices : to let aws find a location with needed capacity
    • launch all instances at the same time
    • use same tyupe of instances
  • single AZ, same rack, sometimes same host, all members have direct connections to each other
    • 10Gbps stream (vs 5 normally)
    • lowest latency and max packets per second (PPS) possible in aws
    • should use enhanced networking for best performance
  • can span VPC peers, but performance will be impacted
  • requires a supported instance type
  • use cases : performance, fast speeds, low latency
  • Spread Placement Groups (Resilience)
  • keep instances separated, different hardware (racks)
    • each rack has its own network and power source
  • can span multiple AZ's, 7 instances per AZ limit
  • not supported for dedicated instances or hosts
  • use case :small number of critical instances that need to be kept separated from each other
  • Partition Placement Groups (Topology Awareness)
  • groups of instances spread apart
  • can span multiple AZ's
  • divided into paritions, max 7 per AZ
  • each partition has its own racks no sharing
  • can launch as many instances as needed, spread across the partitions
  • have awareness of which partition the instance is in
  • use case : huge scale parallel systems

Enhanced Networking & EBS Optimized Instances

Enhanced Networking - SR-IOV : Host has Network Interface Cards (NIC) that are virtualization aware - higher I/O & Lower Host CPU Usage - More Bandwith (fasster network speed) - higher packets per second (PPS) and consistent lower latency

EBS Optimized Instances - dedicated capacity for EBS (doesn't affect data networking) - most instances support, and have it enabled by default - required for high performance EBS volume types