Computing/storing at scale
Customizable resources
Compute from anywhere
Web-based services, industry applications, ...
💰💰💰
Infinitely customizable resources
Fine-tuning resources to your needs is hard
Academia is now filled with tales of grad students inadvertently racking up thousands of dollars in charges. In reality, there are safeguards that you can put in place. But with everything so highly customizable, it can be hard to know what is important.
I am not a cloud computing expert. I am barely even a novice.
The goals of this lecture are to give you the basics of:
awscli
Everything can also be done from a browser GUI (i.e., the AWS Console), but...
You should have downloaded and configured awscli for use with your AWS Educate account using these instructions.
I want to give you enough insight that reading further documentation is not too daunting. But keep in mind, most of the use-cases for AWS are still industrial, web-based applications as opposed to research-focused ones. So, for now at least, much of the online help/tutorials are geared towards that crowd. It's a tough nut to crack, but this is the first step in that journey.
awscli
We need to first configure awscli so that it knows your AWS credentials.
Log in to your AWS Educate account.
Click AWS Account and sign into your starter account.
DO NOT REFRESH/LEAVE THIS PAGE FOR THE REST OF THE LECTURE.
Each session is three hours, at which point your credentials reset.
awscli
In the ~/.aws directory, we need to create two files:
config:

[default]
region = us-east-1
output = yaml
credentials:

[default]
aws_access_key_id=XXX
aws_secret_access_key=XXX
aws_session_token=XXX
Change XXX in the above to what you see under "Account Details"/"Show" and save in the credentials file. Here [default] specifies that these are the default credentials. It is possible to have multiple sets of credentials, and you can often override the default by including a --profile option in command calls.
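For instance, a sketch of what using a non-default profile might look like (the profile name classwork is just an illustration, not something set up in these instructions):

# "classwork" is a hypothetical second profile added to ~/.aws/credentials
$ aws ec2 describe-regions --profile classwork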
awscli
Check that everything is working properly:
$ aws ec2 describe-regions | head -5
Regions:
- Endpoint: ec2.eu-north-1.amazonaws.com
  OptInStatus: opt-in-not-required
  RegionName: eu-north-1
For a standard (non-Educate) account, you can use aws configure; a standard account does not need an aws_session_token.
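If you do run aws configure on a standard account, the prompts look roughly like this (XXX again stands in for your own keys):

$ aws configure
AWS Access Key ID [None]: XXX
AWS Secret Access Key [None]: XXX
Default region name [None]: us-east-1
Default output format [None]: yaml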
From the docs, EC2 provides virtual computing environments, known as instances, and preconfigured templates for instances, known as Amazon Machine Images (AMIs).
Typical use case: you need to do some computing and do not have local resources to complete the job.
What will the workflow look like?
ssh
There are three basic things to know about AWS security.
IAM Users
A key pair
A security group
Additionally, you should enable multi-factor authentication.
IAM users are not enabled in AWS Educate starter accounts, but let's pause to discuss what they are and why they're important.
When you sign up for an AWS account, you create a root user, an account that can do whatever it wants in AWS.
It is suggested that you set up IAM users to specifically manage permissions.
If you work for an organization that uses AWS, chances are you will be given IAM credentials to access their resources. This allows them to control what services of AWS you use.
If you are using AWS for personal use, it's still a good idea to follow best practices and access resources via an IAM profile. It adds an extra layer of security by keeping your credit card information separate from the credentials you use to log in to remote instances.
To log in to an AWS instance, you will need a key pair; it works literally like a key.
We will generate a key pair from the command line as follows:
$ aws ec2 create-key-pair --key-name aws_laptop_key \
    --query 'KeyMaterial' --output text > \
    ~/.ssh/aws_laptop_key.pem
$ chmod 400 ~/.ssh/aws_laptop_key.pem
Use whatever key-name you like.
The key is saved in ~/.ssh, a common place to store keys.
The chmod command makes the key read-only for you.
We will further restrict access by selecting which IP addresses can log in to the machine.
Run aws ec2 describe-vpcs and find VpcId. It will be something like vpc-2d8c4750.
$ aws ec2 create-security-group --group-name my-sg \
    --description "ssh from current IP address" \
    --vpc-id vpc-2d8c4750
You should see output showing GroupId with your security group ID.
Here we're setting up a security group for EC2 VPC (as opposed to EC2 classic). See docs for setting up on EC2 classic.
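If you ever lose track of the group ID, one way to look it up again (a sketch using the group-name my-sg from above) is:

$ aws ec2 describe-security-groups \
    --filters Name=group-name,Values=my-sg \
    --query "SecurityGroups[*].GroupId" --output text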
We will add your current IP address to this security group as follows.
$ my_ip=$(curl https://checkip.amazonaws.com)
$ echo $my_ip
Change the group-id to the output from the previous slide.

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0af9981424cb188b1 \
    --protocol tcp --port 22 --cidr $my_ip/32
Unfortunately, this is pretty convoluted stuff. In short, we'll eventually be using ssh to access our EC2 instance. ssh connects via tcp on port 22, so we're basically telling AWS that this security group allows incoming traffic on that port from your current IP address.
If you were to try to access your instance from a different IP address, then it will not work; you'd need to modify the security group of your instance.
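For example, a sketch of what authorizing a new location might look like (same group ID as above; run this after moving to the new network):

$ my_ip=$(curl https://checkip.amazonaws.com)
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0af9981424cb188b1 \
    --protocol tcp --port 22 --cidr $my_ip/32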
An Amazon Machine Image (AMI) is a template that contains a software configuration.
An instance is a copy of the AMI running as a virtual server in the cloud.
Once an instance is launched, you can sign in to that instance.
There are important technical distinctions between a virtual machine and a Docker container, but some of the key ideas are the same. The AMI tells AWS what you want your environment to look like in terms of OS and software, while the instance additionally includes specifications like how much memory and how many CPUs you want to use.
You can also configure instances to launch and run a script automatically, which we will get to later in the lecture.
There is an incredible number of AMIs available.
What OS do you want?
Remember: any software that you want beyond what is included in your AMI will need to be installed on the instance.
You can spend a lot of time chasing down compilers, etc. AWS is basically handing you an (at least almost) clean, new computer. If you want to run, e.g., R, you will need to install it. So you may end up going through painful processes similar to installing R on WSL or from the command line on a Mac.
Since we will be spinning up instances from the command line, we will need the ID for the AMI that we'd like to use.
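If you want to check what a particular AMI ID corresponds to, one option is to query its name and description (a sketch using the Ubuntu AMI we use later in this lecture):

$ aws ec2 describe-images --image-ids ami-0885b1f6bd170450c \
    --query "Images[*].[Name,Description]" --output text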
Here are a few useful ones:
ami-0885b1f6bd170450c (Ubuntu Server 20.04, 64-bit x86)
ami-0947d2ba12ee1ff75 (64-bit x86)
ami-098f16afa9edf40be (64-bit x86)

There is also an incredible number of instance types.
How many CPUs do you want?
How much memory do you want?
Do you need GPUs? Do you need really fast read/write? ...
The T2 series is a good starting point.
t2.micro is available with the free tier, but memory is small. t2.large is what I often use for moderate computations.
t2.micro is too small to be useful for most computing, but you can access it using the free tier, so that's how we'll start exploring AWS. I have found that I need to go up to about a t2.large in order to have sufficient memory to do the types of computing that I need.
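To compare instance types from the command line, a reasonably recent awscli provides describe-instance-types; a minimal sketch comparing the two types above:

$ aws ec2 describe-instance-types \
    --instance-types t2.micro t2.large \
    --query "InstanceTypes[*].[InstanceType,VCpuInfo.DefaultVCpus,MemoryInfo.SizeInMiB]" \
    --output table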
We're finally ready to run an instance!
$ aws ec2 run-instances --image-id ami-0885b1f6bd170450c \
    --count 1 --instance-type t2.large \
    --key-name aws_laptop_key \
    --security-group-ids sg-0af9981424cb188b1
Replace key-name with the key you generated.
Replace security-group-ids with the security group you created.
You will see a long list of output. If needed, hit q to return to the command line.
When you submit that command, AWS provisions your virtual machine.
You can retrieve the public IP address of your virtual machine as follows.
$ public_ip=$(aws ec2 describe-instances \
    --query Reservations[*].Instances[*].PublicIpAddress \
    --output text)
$ echo $public_ip
Note that all of these describe-instances commands will only work as intended if this is your only running instance. Otherwise, multiple IP addresses will be saved.
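With multiple instances running, you can target one by its ID instead (a sketch; the instance ID shown is hypothetical):

# i-0123456789abcdef0 is a hypothetical instance ID; substitute your own
$ public_ip=$(aws ec2 describe-instances \
    --instance-ids i-0123456789abcdef0 \
    --query Reservations[*].Instances[*].PublicIpAddress \
    --output text)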
You can now log in to your machine using ssh.
$ ssh -i ~/.ssh/aws_laptop_key.pem ubuntu@${public_ip}
ubuntu is the default user name for Ubuntu images. For other AMIs you may need ec2-user@... or root@....
You will see a warning message:
The authenticity of host 'XXX' can't be established.
ECDSA key fingerprint is XXX.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
Type yes and hit enter. You are now computing in the cloud! 😎
You now have a virtual computer in the cloud. What next?
$ instance_id=$(aws ec2 describe-instances \
    --query Reservations[*].Instances[*].InstanceId \
    --output text)
$ aws ec2 terminate-instances --instance-ids $instance_id
After some time, check to make sure State is terminated.
$ aws ec2 describe-instances
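To see just the state rather than the full output, one option is to narrow the query (a sketch):

$ aws ec2 describe-instances \
    --query "Reservations[*].Instances[*].State.Name" \
    --output text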
Install programs, packages, etc... that you need.
E.g., install R and awscli on Ubuntu:
$ sudo apt-key adv --keyserver keyserver.ubuntu.com \
    --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
$ sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
$ sudo apt-get update
$ sudo apt-get install -y r-base
$ sudo apt-get install -y awscli
Anything we have installed via command line can be installed on your image.
The first two lines of the bash code ensure that we are installing the most recent version of R for Ubuntu. From here.
More docs on installing R and RStudio Server on an AWS image.
Once you terminate an instance, everything is deleted!
How can we "save our workspace"?
$ aws ec2 create-image --instance-id $instance_id \
    --name "Ubuntu and R and awscli" --description "Base R and awscli"
$ my_ami="ami-0b56ddda752b70ca2"
This will return an ImageId that you can reference in the --image-id option of aws ec2 run-instances. Here we save that ID in the variable my_ami.
There are two steps to deleting a self-registered AMI:
Don't run these commands yet. We'll use this AMI later.
De-register the image.
$ aws ec2 deregister-image --image-id $my_ami
Delete the associated snapshot.

$ snapshot_id=$(aws ec2 describe-snapshots --owner-ids self \
    --query "Snapshots[*].{Id:SnapshotId}" \
    --output text)
$ aws ec2 delete-snapshot --snapshot-id $snapshot_id
We can use scp to move files to and from the image.
From local computer to image (on local computer):
$ scp -i ~/.ssh/aws_laptop_key.pem \
    path/to/local/file ubuntu@$public_ip:path/to/file/on/image
From image to local computer (on local computer):
$ scp -i ~/.ssh/aws_laptop_key.pem \
    ubuntu@$public_ip:path/to/file/on/image path/to/local/file
Here $public_ip is a local variable referencing the public IP address of the image.
We can run a script automatically at startup via user-data.
Save the contents below in a file called my_user_data.sh.
#! /bin/bash
echo "Hello from AWS" > /home/ubuntu/hello.txt
Now pass in my_user_data.sh in your run-instances command.
aws ec2 run-instances --image-id ami-0885b1f6bd170450c \
    --count 1 --instance-type t2.micro \
    --key-name new_aws_laptop_key \
    --security-group-ids sg-0af9981424cb188b1 \
    --user-data file://my_user_data.sh
After a couple minutes, you can retrieve hello.txt.
scp -i ~/.ssh/aws_laptop_key.pem \
    ubuntu@$public_ip:/home/ubuntu/hello.txt .
Set public_ip to the address of the running image.
If we're careful, we can achieve full reproducibility!
We will soon automate retrieval of results and termination of image.
Simple Storage Service (S3) is a storage platform for AWS.
Buckets are like file folders, but they are accessible via web address.
We can make a bucket using s3 mb.
aws s3 mb s3://dbenkesers-first-bucket
Many aws s3 commands are similar to bash: ls, mv, cp, rm, ...
rb removes a bucket.
For example, we can copy local objects into a bucket.
$ aws s3 cp path/to/local/file s3://dbenkesers-first-bucket
Or, we can retrieve objects from a bucket to current working directory.
$ aws s3 cp s3://dbenkesers-first-bucket/file .
The same mv and cp syntax can be used to move files between buckets.
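For example, a sketch of a bucket-to-bucket copy (the second bucket name is hypothetical):

# dbenkesers-second-bucket is hypothetical; substitute a bucket you own
$ aws s3 cp s3://dbenkesers-first-bucket/file s3://dbenkesers-second-bucket/file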
The idea of moving files is the same as with scp that we saw earlier.
The final demo of the lecture is an automated R workflow.
Write R scripts to run a job locally.
Write a bash script locally that runs the R script, copies the output to S3, and shuts down the instance.
One of the complications of using AWS services on AWS instances is getting your AWS credentials to your instance.
Treat your access/secret key like credit card information.
In the next few steps, we define an IAM role for your instance.
Unfortunately, these steps look more complicated than they are...
This is one of the things in AWS that is trivial to do via the Console (GUI), but a bit of a hassle to do via the command line.
Step 1: Create a trust policy, which specifies that AWS account members are allowed to assume a role.
Copy the following text into a file called ec2-role-trust-policy.json.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" } ]}
More info on the "version" line.
If you get any "contains invalid Json" errors, try typing the text by hand, using spaces and not tabs.
Step 2: Create an IAM role called s3access and link it to the trust policy.
$ aws iam create-role --role-name s3access \
    --assume-role-policy-document file://ec2-role-trust-policy.json
Step 3: Create access policy to grant permission for S3 on the instance.
Copy the following text into a file called ec2-role-access-policy.json.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["s3:*"], "Resource": ["*"] } ]}
Step 4: Attach the access policy to the IAM role.
$ aws iam put-role-policy --role-name s3access \
    --policy-name S3-Permissions \
    --policy-document file://ec2-role-access-policy.json
Step 5: Create an instance profile named s3access-profile and add the s3access role to the s3access-profile.
$ aws iam create-instance-profile \
    --instance-profile-name s3access-profile
$ aws iam add-role-to-instance-profile \
    --instance-profile-name s3access-profile --role-name s3access
We can now launch EC2 instances that have access to S3!
Include --iam-instance-profile Name="s3access-profile" in your ec2 run-instances command.
Save this in a file called ec2_script.R:
#! /usr/bin/Rscript
ten_random_normals <- rnorm(10)
write.csv(ten_random_normals, file = "/home/ubuntu/output.csv")
Copy to your S3 bucket:
$ aws s3 cp ec2_script.R s3://dbenkesers-first-bucket
$ aws s3 ls dbenkesers-first-bucket
Save this in a file called auto_r_user_data.sh:
#! /bin/bash
# copy script from s3
aws s3 cp s3://dbenkesers-first-bucket/ec2_script.R /home/ubuntu
# execute script
chmod +x /home/ubuntu/ec2_script.R
/home/ubuntu/ec2_script.R
# copy script output to s3
aws s3 cp /home/ubuntu/output.csv s3://dbenkesers-first-bucket
# turn off instance
sudo poweroff
Now we can run an instance with our user data.
$ aws ec2 run-instances --image-id $my_ami \
    --count 1 --instance-type t2.large \
    --key-name aws_laptop_key \
    --security-group-ids sg-0af9981424cb188b1 \
    --iam-instance-profile Name="s3access-profile" \
    --user-data file://auto_r_user_data.sh \
    --instance-initiated-shutdown-behavior terminate
We use our saved AMI ($my_ami), where R and awscli are installed;
we attach the s3access role via iam-instance-profile;
the instance terminates on shutdown via instance-initiated-shutdown-behavior.
After a few minutes, we should see output.csv in our S3 bucket.
$ aws s3 ls s3://dbenkesers-first-bucket
2020-11-12 21:56:07        106 ec2_script.R
2020-11-16 17:11:02        232 output.csv
So far we have discussed running EC2 on-demand instances; there is a cheaper approach known as spot instances.
Spot instance prices vary over time/region. E.g., currently in us-east-1:
t2.large is $0.0928/hour on-demand, while
t2.large is $0.0278/hour as a spot instance.
Why not always use spot instances?
There is a middle ground: defined duration spot instances.
t2.large = $0.046/hour for 1 hour, $0.058/hour for 6 hours.
These prices change all the time. These were the prices I saw as of 11/12/2020 at 11AM. You can check the price history of different instance types in different regions in the AWS console.
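You can also pull recent spot prices from the command line; a sketch for t2.large Linux instances (the exact output depends on when and where you run it):

$ aws ec2 describe-spot-price-history --instance-types t2.large \
    --product-descriptions "Linux/UNIX" \
    --query "SpotPriceHistory[*].[AvailabilityZone,SpotPrice]" \
    --output text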
It seems that spot instances are not supported on AWS Educate starter accounts, but here is info on running them on a regular account.
We will show how to use run-instances, but you can also use request-spot-instances.
Save the following contents in a file called spot_options.json.
{ "MarketType": "spot", "SpotOptions": { "MaxPrice": "0.05", "SpotInstanceType": "one-time", "BlockDurationMinutes": 60 }}
The spot_options.json file is in JSON format.
We can request a spot instance as follows.
$ aws ec2 run-instances --image-id ami-0885b1f6bd170450c \
    --count 1 --instance-type t2.medium \
    --key-name aws_laptop_key \
    --security-group-ids sg-0af9981424cb188b1 \
    --instance-market-options file://spot_options.json
Use aws ec2 describe-instances to get the public IP address and log in as usual.
Computing/storing at scale
Customizable resources
Compute from anywhere
Web-based services, industry applications, ...