The process hasn’t been straightforward as information is scattered about in many different places. Here I’ll present an overview of the steps that I’ve taken and where I’ve got to so far.
Having created an AWS account, noted down the access and secret keys, and created the key pair and X.509 certificate, I uploaded the data I’ve been given to analyse into an S3 storage “bucket” that I created through the AWS console. This gave me somewhere to store the data where other Amazon services could easily reach it.
I then launched a vanilla AWS micro instance in the “free tier” of AWS and logged in:
ssh -i mykey.pem ec2-user@xxx.xxx.xxx.xxx
where the “x”s represent the public IP of the AWS instance.
Having logged in, I enabled the “Extra Packages for Enterprise Linux” (EPEL) repository by editing /etc/yum.repos.d/epel.repo and changing “enabled” from 0 to 1.
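The same edit can be scripted rather than done by hand. This is only a sketch: it assumes the stock epel.repo layout, where the main section is headed [epel] and contains a line reading enabled=0, and it assumes GNU sed. The name enable_epel is made up for illustration.

```shell
# Hypothetical helper: flip enabled=0 to enabled=1 in the [epel] section only,
# leaving any other sections (e.g. [epel-debuginfo]) untouched.
# Assumes GNU sed (for -i) and the stock epel.repo layout.
enable_epel() {
    repo_file="${1:-/etc/yum.repos.d/epel.repo}"
    # The address range runs from the [epel] header to the next section header,
    # so only the first "enabled" line is rewritten.
    sed -i '/^\[epel\]/,/^\[/ s/^enabled=0/enabled=1/' "$repo_file"
}
```

The real file is owned by root, so on the instance this needs to be run from a root shell; passing a path as the first argument lets you try it on a copy first.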
I then ran:
sudo yum update
and updated all the packages before running:
sudo yum install s3cmd
to install the s3cmd tools package from EPEL. These are used to transfer the data that I’d uploaded to my S3 bucket into an Elastic Block Storage (EBS) volume which I could then attach to my EC2 instance.
The advantage of EBS is that it persists after the instance is shut down, which is useful as it means that I can attach it to my HBase instance later.
Back on my home machine I created my new EBS volume with:
ec2-create-volume -s 10 -z eu-west-1a -K privatekey.pem -C certificate.pem --region eu-west-1
which created a 10 GB volume for me (in the same region as my running instance and my S3 bucket). This process returned a volume id. I then attached the volume to my EC2 instance:
ec2-attach-volume -d /dev/sdh -i (instance-id) (volume-id) -K privatekey.pem -C certificate.pem --region eu-west-1
So far so good.
To use the EBS volume in the instance, though, I needed to make a file system on it and mount it. So, back on the EC2 instance, I checked whether the kernel already supported XFS and, if not, loaded the module:
grep -q xfs /proc/filesystems || sudo modprobe xfs
To use an XFS file system, the xfsprogs package needed to be installed:
sudo yum install -y xfsprogs
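With xfsprogs in place, the file system itself still has to be created on the new volume before it can be mounted. Here is a minimal sketch, assuming the volume is brand new and attached as /dev/sdh as above. mkfs.xfs wipes whatever is on the device, so this is not something to run on a volume that already holds data. The make_xfs wrapper and its DRY_RUN switch are my own illustration, there only so the command can be checked before it is run for real.

```shell
# Hypothetical wrapper: create an XFS file system on a freshly attached, EMPTY volume.
# WARNING: mkfs.xfs destroys any data already on the device.
# The device defaults to /dev/sdh (the attach point used above).
# Set DRY_RUN=1 to print the command instead of running it.
make_xfs() {
    device="${1:-/dev/sdh}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo mkfs.xfs $device"
    else
        sudo mkfs.xfs "$device"
    fi
}
```

Running it as DRY_RUN=1 make_xfs /dev/sdh shows what would be executed without touching the volume.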
Great. Now I needed to mount the volume in the instance:
echo "/dev/sdh /vol xfs noatime 0 0" | sudo tee -a /etc/fstab
sudo mkdir -m 000 /vol
sudo mount /vol
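One thing to watch with the fstab step is that running the echo line a second time appends a duplicate entry. A small guard avoids that. This is just a sketch: add_fstab_entry is a hypothetical helper, and it treats the second whitespace-separated field of each non-comment line as the mount point.

```shell
# Hypothetical helper: append an fstab entry only if its mount point is not already listed.
add_fstab_entry() {
    entry="$1"                   # e.g. "/dev/sdh /vol xfs noatime 0 0"
    fstab="${2:-/etc/fstab}"     # second argument lets you try it on a copy
    mount_point=$(echo "$entry" | awk '{print $2}')
    # awk exits 0 if some non-comment line already uses this mount point.
    if awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        echo "$mount_point already in $fstab, nothing to do"
    else
        echo "$entry" >> "$fstab"
    fi
}
```

Writing to the real /etc/fstab needs root, so on the instance this would be run from a root shell. After mounting, df -h /vol is a quick way to confirm that the volume is where you expect it.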
and I’m good to go. I made a new directory for my data under /vol, and made it accessible to the ec2-user account:
sudo mkdir mydata
sudo chown ec2-user mydata/
sudo chgrp ec2-user mydata/
Now I could copy the data over from S3. First I needed to configure the S3 tools:
s3cmd --configure
and added my access and secret keys. Then, finally, I could copy the data over so it’s accessible in EC2:
s3cmd get s3://mybucket/data.tar.gz .
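Once the archive has arrived it still needs unpacking. A sketch of how that might look, assuming the download is a gzipped tarball and that /vol/mydata is where the data should end up (unpack_data is a made-up name):

```shell
# Hypothetical helper: check a gzipped tarball, then unpack it into a target directory.
unpack_data() {
    archive="$1"                 # e.g. data.tar.gz
    dest="${2:-/vol/mydata}"     # assumed destination directory
    # Listing the archive first is a cheap check that the download is not truncated.
    if ! tar -tzf "$archive" > /dev/null; then
        echo "archive looks corrupt: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}
```

For example, unpack_data data.tar.gz /vol/mydata would check the download and then extract it onto the EBS volume.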
My experience of AWS is that it has not been possible to do all of this in the “free tier”, so it is costing me a bit of money, though not too much at present (around $10).
That’s quite a lot of information so I’ll stop here. I hope that someone finds it useful! If you do, please let me know.
In the next post I’ll describe how to get the data into HBase.