Using EBS and S3 on Amazon EC2

Over the past couple of days I’ve been getting up to speed with Amazon Web Services (AWS). The aim is to run HBase on EC2 to store a large amount of data that I will later query with cascalog.

The process hasn’t been straightforward as information is scattered about in many different places. Here I’ll present an overview of the steps that I’ve taken and where I’ve got to so far.

Having created an AWS account, noted down the access and secret keys, and created the key pair and X.509 certificate, I uploaded the data I’ve been given to analyse into an S3 storage “bucket” which I created through the AWS console. This gave me somewhere to store the data where other Amazon services could easily reach it.

I then launched a vanilla AWS micro instance in the “free tier” of AWS and logged in:

ssh -i mykey.pem ec2-user@ec2-xx-xxx-xxx-xxx.eu-west-1.compute.amazonaws.com

where the “x”s represent the public IP of the AWS instance.

Having logged in, I enabled the “Extra Packages for Enterprise Linux” (EPEL) repository by editing /etc/yum.repos.d/epel.repo (as root) and changing “enabled” from 0 to 1.
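
The same edit can be scripted rather than done in an editor. This is a minimal sketch, assuming the repo file has an [epel] section containing an enabled=0 line (worth checking on your own instance) and that sed is GNU sed; the function name enable_epel is my own invention, not part of any tool.

```shell
# Flip enabled=0 to enabled=1, but only inside the [epel] section,
# leaving other sections such as [epel-debuginfo] untouched.
# Assumption: GNU sed (for the -i in-place flag).
enable_epel() {
    sed -i '/^\[epel\]/,/^\[/ s/^enabled=0/enabled=1/' "$1"
}
```

On the instance the file is owned by root, so the equivalent one-off command would be: sudo sed -i '/^\[epel\]/,/^\[/ s/^enabled=0/enabled=1/' /etc/yum.repos.d/epel.repo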

I then ran:

sudo yum update

and updated all the packages before running:

sudo yum install s3cmd

to install the s3cmd tools package from EPEL. I would use these to copy the data that I’d uploaded to my S3 bucket onto an Elastic Block Store (EBS) volume attached to my EC2 instance.

The advantage of EBS is that the volume persists after the instance is shut down, which is useful as it means that I can attach it to my HBase instance later.

Back on my home machine I created my new EBS volume with:

ec2-create-volume -s 10 -z eu-west-1a -K privatekey.pem -C certificate.pem --region eu-west-1

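which created a 10 GiB volume for me in the eu-west-1a availability zone (in the same region as my running instance and my S3 bucket; a volume can only be attached to an instance in the same availability zone). The command returned a volume ID. I then attached the volume to my EC2 instance: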

ec2-attach-volume -d /dev/sdh -i (instance-id) (volume-id) -K privatekey.pem -C certificate.pem --region eu-west-1
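
Copying the volume ID by hand from one command into the next is easy to get wrong, so it can be captured in a shell variable instead. This is a rough sketch, assuming ec2-create-volume prints a tab-separated line that starts with VOLUME followed by the ID (check this against your own output); volume_id_from_output is a made-up helper name and the sample line is invented.

```shell
# Pull the volume ID out of ec2-create-volume output read from stdin.
# Assumed output shape: "VOLUME<TAB>vol-xxxxxxxx<TAB>10<TAB>..."
volume_id_from_output() {
    awk '$1 == "VOLUME" { print $2; exit }'
}

# Example with an invented line in the assumed format:
printf 'VOLUME\tvol-1a2b3c4d\t10\t\teu-west-1a\tcreating\n' | volume_id_from_output
```

In practice the create command would be piped straight into the helper, e.g. VOL_ID=$(ec2-create-volume -s 10 -z eu-west-1a -K privatekey.pem -C certificate.pem --region eu-west-1 | volume_id_from_output), and "$VOL_ID" then passed to ec2-attach-volume.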

So far so good.
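
One thing to watch: the attach command can return before the device is actually visible inside the instance. Here is a small sketch of a polling helper to run on the instance, assuming the device name /dev/sdh used above (on some kernels the volume may show up under a different name, such as /dev/xvdh); wait_for_device and its 60 second default are my own choices.

```shell
# Wait until a block device shows up, polling once a second.
# Usage: wait_for_device DEVICE [TIMEOUT_SECONDS]
wait_for_device() {
    dev="$1"
    timeout="${2:-60}"
    waited=0
    while [ ! -b "$dev" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "gave up waiting for $dev after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

Running wait_for_device /dev/sdh before touching the volume avoids confusing "no such device" errors in the steps that follow.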

To use the EBS volume in the instance, though, I needed to make a file system on it and mount it. So, back on the EC2 instance, I checked whether the kernel already supported XFS and, if not, loaded the module:

grep -q xfs /proc/filesystems || sudo modprobe xfs

Creating an XFS file system requires the xfsprogs package to be installed:

sudo yum install -y xfsprogs
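
A brand-new EBS volume is a raw block device, so the file system has to be created on it before it can be mounted. This is a cautious sketch of that step, assuming the volume is attached as /dev/sdh as above and that blkid is available on the instance; make_xfs_if_blank is a hypothetical helper that refuses to reformat a volume which already carries a file system (useful when the volume is later re-attached to another instance).

```shell
# Create an XFS file system on DEVICE only if it does not have one yet.
# Usage: make_xfs_if_blank DEVICE
make_xfs_if_blank() {
    dev="$1"
    # Refuse anything that is not a block device (e.g. a typo in the path).
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    # blkid exits non-zero when it finds no file system signature.
    if sudo blkid "$dev" >/dev/null 2>&1; then
        echo "file system already present on $dev; skipping mkfs" >&2
        return 0
    fi
    sudo mkfs.xfs "$dev"
}
```

For the volume in this post the call would be make_xfs_if_blank /dev/sdh.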

Great. Now I needed to mount the volume in the instance:

echo "/dev/sdh /vol xfs noatime 0 0" | sudo tee -a /etc/fstab

sudo mkdir -m 000 /vol

sudo mount /vol

and I was good to go. I made a new directory for my data under /vol, and made it accessible to the ec2-user account:

cd /vol

sudo mkdir mydata
sudo chown ec2-user:ec2-user mydata/
cd mydata/
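
A caveat about the /etc/fstab step: tee -a appends every time it runs, so repeating it leaves duplicate entries in the file. Here is a small sketch of a guard, where append_once and the optional SUDO variable are conventions of this example only.

```shell
# Append LINE to FILE only if an identical line is not already there.
# Usage: [SUDO=sudo] append_once FILE LINE
append_once() {
    file="$1"
    line="$2"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null && return 0
    # ${SUDO:-} expands to nothing unless SUDO is set, e.g. SUDO=sudo
    printf '%s\n' "$line" | ${SUDO:-} tee -a "$file" >/dev/null
}
```

For the entry used here that would be: SUDO=sudo append_once /etc/fstab "/dev/sdh /vol xfs noatime 0 0"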

Now I could copy the data over from S3. First I needed to configure the S3 tools:

s3cmd --configure

and added my access and secret keys. Then, finally, I could copy the data over so that it’s accessible from the EC2 instance:

s3cmd get s3://mybucket/data.tar.gz .
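
Before unpacking a large download it is worth checking that the archive arrived intact. This is a sketch, assuming the file really is a gzip-compressed tarball as the name data.tar.gz suggests; check_archive is a hypothetical helper.

```shell
# Return 0 if ARCHIVE is a readable gzip-compressed tar file, non-zero otherwise.
# Listing the contents forces tar to read and decompress the whole file.
check_archive() {
    tar -tzf "$1" >/dev/null 2>&1
}
```

A typical use would be check_archive data.tar.gz && tar -xzf data.tar.gz, so that a truncated download is caught before anything is extracted.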

In my experience it has not been possible to do all of this within the AWS “free tier”, so it is costing me a bit of money, though not too much at present (around $10).

That’s quite a lot of information so I’ll stop here. I hope that someone finds it useful! If you do, please let me know.

In the next post I’ll describe how to get the data into HBase.

Resources used:

http://aws.amazon.com/amazon-linux-ami/faqs/#epel

http://docs.amazonwebservices.com/AWSEC2/latest/CommandLineReference/ApiReference-cmd-CreateVolume.html

http://s3tools.org/s3cmd

http://biggdata.blogspot.com.es/2011/03/amazon-ec2-and-s3-shenanigans.html

http://aws.amazon.com/articles/1663

http://www.manamplified.org/archives/2008/03/notes-on-using-ec2-s3.html

https://github.com/datawrangling/trendingtopics#readme

About simonholgate

I'm CEO of Sea Level Research Ltd (www.sealevelresearch.com) - a Liverpool, UK based startup that uses machine learning to predict sea level surges and optimise shipping movements into and out of port. I'm an oceanographer and I'm also a Clojure developer who is interested in democracy and Big Data.