
Amazon EC2 Basics For Python Programmers

Posted: 2007-09-01 17:42
Tags: Python, EC2

I've been very interested in Amazon's Web Services ever since I developed the What Should I Read Next? site for Thoughtplay based on Amazon's book lookup tools so I thought it was time I tried out EC2 and S3 for real.

These are some notes I made as I went along. I hope they are useful, feel free to add corrections in the comments. Bear in mind that because I've done a lot of copying and pasting you shouldn't rely on the instance IDs and DNS names being consistent through the tutorial. Always replace them with your own.

What is Amazon EC2?

Amazon EC2 (which stands for Amazon Elastic Compute Cloud) is a virtual machine hosting service from Amazon which forms a component of Amazon Web Services along with other services like S3 and SQS which provide data storage and message queuing respectively.

Each virtual machine instance runs an Amazon Machine Image which is effectively just a packaged up file system which your instance will boot from when it starts. You are free to choose from a range of AMIs which have been created by Amazon or third parties but it is much more likely you will want to create your own to run your own server software. You can use one of the existing AMIs as a basis for your own, as described in the Getting Started Guide, or create your own from scratch without using any of the supplied tools, which is what we'll do later on in this tutorial to fully understand what is going on.

Each EC2 instance provides the equivalent of a system with a 1.7GHz x86 processor, 1.75GB of RAM, 160GB of local disk and 250Mb/s of network bandwidth. At the time of writing the pricing is as follows:

Instances
$0.10 per instance-hour consumed (or part of an hour consumed)
Data Transfer In
$0.10 per GB - all data transfer in
Data Transfer Out
$0.18 per GB - first 10 TB / month data transfer out
$0.16 per GB - next 40 TB / month data transfer out
$0.13 per GB - data transfer out / month over 50 TB

You pay only for what you use and there is no minimum fee. Data transfer "in" and "out" refers to transfer into and out of Amazon EC2. Data transferred within the Amazon EC2 environment, or between Amazon EC2 and Amazon S3, is free of charge (i.e., $0.00 per GB). All Amazon S3 storage and request charges still apply.
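
To put those numbers in perspective, here's a rough back-of-the-envelope estimate for a single instance running for a month, with made-up traffic figures. The prices are the 2007 ones quoted above, so check the current price list before relying on this.

# Rough monthly cost for one instance, using the 2007 prices quoted above.
instance_hour = 0.10      # $ per instance-hour
transfer_in = 0.10        # $ per GB transferred in
transfer_out = 0.18       # $ per GB transferred out (first 10 TB tier)

hours = 24 * 30           # one instance running for a 30-day month
gb_in, gb_out = 5, 20     # example traffic figures, not from the article

cost = hours * instance_hour + gb_in * transfer_in + gb_out * transfer_out
print "Approximate monthly cost: $%.2f" % cost    # $76.10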

The key benefit of EC2 over traditional hosting is that you don't need to purchase any hardware. If a service you are running suddenly receives a lot of traffic you can easily start a few more instances to handle the extra load and, just as importantly, once the interest has died down you can just as easily terminate the instances you don't need. You only pay for the time each server is actually running so the setup is very flexible.

Another obvious potential use of EC2 is to tackle large computational problems. For example if you have a problem which lends itself to parallel computing and would require 10 years of processing on one machine then in theory you could start around 3,650 EC2 instances, solve the same problem in a day and then terminate them all again. Because Amazon charges by the instance-hour this would cost you the same as running the one server for 10 years but would solve your problem a lot faster. (You'd have to contact Amazon for permission to start this many instances though.)
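
To spell out the arithmetic (ignoring leap years):

# Ten years of work on one machine, squeezed into a single day.
hours_in_ten_years = 10 * 365 * 24        # 87600 instance-hours either way
instances_for_one_day = 10 * 365          # about 3650 instances, each running 24 hours

cost_serial = hours_in_ten_years * 0.10
cost_parallel = instances_for_one_day * 24 * 0.10
print cost_serial, cost_parallel          # 8760.0 8760.0 -- the same bill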

The one big caveat with the whole EC2 setup is that there is no persistent storage. Put another way: if you shut down your server, terminate your instance, suffer a hardware failure or Amazon chooses to terminate the instance itself (which will happen from time to time), you will lose all the data that was on that instance.

The idea instead is that you store all your data in a third party service such as Amazon S3 so that the data will still be available when you start another server to replace the one that disappeared.

This makes EC2 pretty much useless if you are looking to host a small website with a database and perhaps an email facility. It isn't as big a problem as it might first appear for very large multi-server applications, though, because they would typically build in redundancy and backup and restore strategies anyway, so when an EC2 instance goes down it is no more or less inconvenient than if a piece of physical hardware had failed.

How Does It Actually Work?

You don't need to understand Amazon's infrastructure to use the service but it is useful to have an idea of what's going on so that you realise how little the tools supplied by Amazon actually do and how much control you have over the process if you want to handle things yourself.

Amazon's EC2 infrastructure is built using a large number of machines based on x86 hardware running Xen. Rather than controlling the instances (which are Xen DomU guests in Xen terminology) from the host machine using the xm command as you would in a traditional Xen setup, you control them with an XML web services API. Amazon also provides a set of Java command line tools called ec2-tools which call the XML web services API so you can control your instances from the command line if you prefer.

To start an instance you use the XML web services API to instruct EC2 to download a series of encrypted and compressed 10MB chunks for the particular image you wish to use from Amazon's S3 service. EC2 then reassembles, decrypts and decompresses the image and boots the operating system. The kernel on your Amazon Machine Image gets replaced with a Xen 2.6.16 kernel compiled with GCC 4.0 because Amazon does not allow custom kernels, although you can use custom kernel modules, as we'll see later on. You are free to use any of the images (AMIs) which Amazon has created, use ones from third parties or create your own.

The EC2 infrastructure doesn't give you low level access to Amazon's firewall but instead provides a way to control access to your running instances using security groups. These are managed using an XML web services API. The default security group does not allow any public access to any of the ports of the running instance but you'll learn about how to change this later.

Getting Started

If you've looked through the Amazon EC2 Getting Started Guide you could be forgiven for thinking that using EC2 is an incredibly complex business. It is actually fairly straightforward, but you need to be aware that the Getting Started Guide covers three distinct areas of technology at once: SSH key pairs, the XML web services APIs and the AMI tools.

If you deal with each of these separately it is much more obvious what is actually going on.

SSH Key Pairs

Amazon has a problem when they distribute their public AMIs for people to use with the EC2 service: if they set a root password on the image then every instance which was started would have the same root password. This means there is a slim chance a malicious (or curious) user could in theory sign into the instance before the person who started it had a chance to change the password. This isn't very good security, so instead all public AMIs only allow sign-ins using an SSH key pair.

In a traditional SSH key pair setup you would generate a private key and a public key pair. You'd then add the public key to the /root/.ssh/authorized_keys file on the remote computer. When you SSH into the remote computer you specify the private key to use and if the private key matches the public key you can sign in without needing the root password.

Using an SSH key pair to sign into remote machines is standard practice so if you haven't come across it before it is worth reading some tutorials on the web.

The Amazon EC2 setup is slightly different: because lots of different people will use the same image, Amazon can't add everyone's public key to /root/.ssh/authorized_keys. Instead you use one of the XML web services to ask Amazon to generate a key pair for you. Each key pair you generate has a name and Amazon keeps the public key and returns the private key to you to save. When you start an instance you tell Amazon the name of the key pair you want to use and it automatically sets up the instance with the public part of your key pair added to the /root/.ssh/authorized_keys file so only you can sign in. You then SSH in using the private key you saved and are granted access without needing a password.

XML Web Services

All the interactions involved in running instances of your AMI images are handled via a series of XML web services APIs. You can read the API documentation at http://docs.amazonwebservices.com/AWSEC2/2007-03-01/DeveloperGuide/ by clicking "API Reference" then "EC2 Query API". Amazon also provides command line tools written in Java which use the web services API but don't do anything clever at all. To prove it we won't even install them for this tutorial; instead we'll use a Python library called boto to generate the XML for us.

AMI Tools

The real benefit of EC2 comes when you can create your own images. Amazon provides some AMI Tools to help with setting up, compressing, encrypting, splitting and transferring your own custom AMIs to and from S3. The AMI tools are written in Ruby.

Keys and Certificates

In order to get started with EC2 you will need to register for the service. Since EC2 is still in limited beta you may not get an account straight away. Visit the EC2 homepage and click "Sign Up For This Service" at the top of the right hand column. You'll need an Amazon.com account.

In order to get started you will need the following pieces of information:

- Your AWS Access Key ID
- Your AWS Secret Access Key
- Your Amazon account number
- An X.509 certificate and its private key

Once you've signed up for Amazon EC2 you can get the AWS Access Key ID and AWS Secret Access Key by signing in and moving your mouse over the button labeled "Your Web Services Account". Select the "View Access Key Identifiers" link on the menu that appears.

The AWS Access Key ID and AWS Secret Access Key are necessary for using any of the web services APIs so you can't run an EC2 instance without them. The Amazon account number is useful if you want other people to be able to give you permission to launch their AMIs and the X.509 certificate is for bundling up your images.

We'll discuss each of these as we use them.

Caution!

You should never tell anyone your AWS Secret Access Key and you should be very wary of entering it into programs you don't trust, because anyone who has both your AWS Access Key ID and your AWS Secret Access Key can run up huge bills by launching their own instances on your account.

Starting an Existing Image

You can read this in conjunction with Amazon's EC2 Getting Started Guide. If you are using PuTTY you should read the appendix of the guide. This tutorial assumes you are using SSH on Linux.

Setting up an SSH Key Pair

For our interaction with Amazon we are going to use a Python package called Boto but you could equally well build and send the XML required manually.

Get boto and start using it with these commands (you'll need subversion and Python 2.4 installed):

$ svn checkout http://boto.googlecode.com/svn/trunk/ boto
$ cd boto
$ python

Now setup a connection object:

>>> from boto.ec2.connection import EC2Connection
>>> conn = EC2Connection('<aws access key>', '<aws secret key>')

At this point the variable conn will point to an EC2Connection object. In this example, the AWS access key and AWS secret key are passed in to the constructor explicitly. Alternatively, you can set the environment variables:

AWS_ACCESS_KEY_ID - Your AWS Access Key ID
AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key

and then call the constructor without any arguments, like this:

>>> conn = EC2Connection()

To generate an SSH key pair for accessing the public AMIs without a root password you do the following:

>>> key_pair = conn.create_key_pair('gsg-keypair')
>>> print key_pair.name
gsg-keypair
>>> print key_pair.fingerprint
1f:51:ae:28:bf:89:e3:d8:1f:25:5d:37:2d:7d:b8:ca:9f:f5:f1:6f
>>> print key_pair.material
-----BEGIN RSA PRIVATE KEY-----
MIIEoQIBAAKCAQBuLFg5ujHrtm1jnutSuoO8Xe56LlT+HM8v/xkaa39EstM3/aFxTHgElQiJLChp
HungXQ29VTc8rc1bW0lkdi23OH5eqkMHGhvEwqa0HWASUMll4o3o/IX+0f2UcPoKCOVUR+jx71Sg
5AU52EQfanIn3ZQ8lFW7Edp5a3q4DhjGlUKToHVbicL5E+g45zfB95wIyywWZfeW/UUF3LpGZyq/
ebIUlq1qTbHkLbCC2r7RTn8vpQWp47BGVYGtGSBMpTRP5hnbzzuqj3itkiLHjU39S2sJCJ0TrJx5
i8BygR4s3mHKBj8l+ePQxG1kGbF6R4yg6sECmXn17MRQVXODNHZbAgMBAAECggEAY1tsiUsIwDl5
91CXirkYGuVfLyLflXenxfI50mDFms/mumTqloHO7tr0oriHDR5K7wMcY/YY5YkcXNo7mvUVD1pM
ZNUJs7rw9gZRTrf7LylaJ58kOcyajw8TsC4e4LPbFaHwS1d6K8rXh64o6WgW4SrsB6ICmr1kGQI7
3wcfgt5ecIu4TZf0OE9IHjn+2eRlsrjBdeORi7KiUNC/pAG23I6MdDOFEQRcCSigCj+4/mciFUSA
SWS4dMbrpb9FNSIcf9dcLxVM7/6KxgJNfZc9XWzUw77Jg8x92Zd0fVhHOux5IZC+UvSKWB4dyfcI
tE8C3p9bbU9VGyY5vLCAiIb4qQKBgQDLiO24GXrIkswF32YtBBMuVgLGCwU9h9HlO9mKAc2m8Cm1
jUE5IpzRjTedc9I2qiIMUTwtgnw42auSCzbUeYMURPtDqyQ7p6AjMujp9EPemcSVOK9vXYL0Ptco
xW9MC0dtV6iPkCN7gOqiZXPRKaFbWADp16p8UAIvS/a5XXk5jwKBgQCKkpHi2EISh1uRkhxljyWC
iDCiK6JBRsMvpLbc0v5dKwP5alo1fmdR5PJaV2qvZSj5CYNpMAy1/EDNTY5OSIJU+0KFmQbyhsbm
rdLNLDL4+TcnT7c62/aH01ohYaf/VCbRhtLlBfqGoQc7+sAc8vmKkesnF7CqCEKDyF/dhrxYdQKB
gC0iZzzNAapayz1+JcVTwwEid6j9JqNXbBc+Z2YwMi+T0Fv/P/hwkX/ypeOXnIUcw0Ih/YtGBVAC
DQbsz7LcY1HqXiHKYNWNvXgwwO+oiChjxvEkSdsTTIfnK4VSCvU9BxDbQHjdiNDJbL6oar92UN7V
rBYvChJZF7LvUH4YmVpHAoGAbZ2X7XvoeEO+uZ58/BGKOIGHByHBDiXtzMhdJr15HTYjxK7OgTZm
gK+8zp4L9IbvLGDMJO8vft32XPEWuvI8twCzFH+CsWLQADZMZKSsB5sOZ/h1FwhdMgCMcY+Qlzd4
JZKjTSu3i7vhvx6RzdSedXEMNTZWN4qlIx3kR5aHcukCgYA9T+Zrvm1F0seQPbLknn7EqhXIjBaT
P8TTvW/6bdPi23ExzxZn7KOdrfclYRph1LHMpAONv/x2xALIf91UB+v5ohy1oDoasL0gij1houRe
2ERKKdwz0ZL9SWq6VTdhr/5G994CK72fy5WhyERbDjUIdHaK3M849JJuf8cSrvSb4g==
-----END RSA PRIVATE KEY-----

In this case we named our key pair gsg-keypair so this is the name we use later when starting an instance.

You need to save the private key to a file. Copy everything including the lines starting ----- into a file named id_rsa-gsg-keypair and then execute the following command to ensure that no-one other than you can read the file:

$ chmod 600 id_rsa-gsg-keypair ; ls -l id_rsa-gsg-keypair
-rw------- 1 james james 1672 2007-08-27 13:10 id_rsa-gsg-keypair
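
If you'd rather not copy and paste by hand, you can write the key straight out of the same Python session — a small sketch using the key_pair object from above:

>>> import os
>>> f = open('id_rsa-gsg-keypair', 'w')
>>> f.write(key_pair.material)
>>> f.close()
>>> os.chmod('id_rsa-gsg-keypair', 0600)   # same effect as chmod 600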

Caution!

You should keep this file secret too: anyone who has it could sign into your instance over SSH, which would be a bit like giving them your password.

Now that you have your key pair you are ready to start an instance.

Images and Instances

An Image object represents an Amazon Machine Image (AMI) which is an encrypted machine image stored in Amazon S3. It contains all of the information necessary to boot instances of your software in EC2.

To get a listing of all available Images:

>>> images = conn.get_all_images()
>>> images
[Image:ami-20b65349, Image:ami-22b6534b, Image:ami-23b6534a,
Image:ami-25b6534c, Image:ami-26b6534f, Image:ami-2bb65342, Image:ami-78b15411,
Image:ami-a4aa4fcd, Image:ami-c3b550aa, Image:ami-e4b6538d, Image:ami-f1b05598]

There are actually 224 images at the time of writing but I've only shown a few for brevity. You can access the location of each image, which gives a clue about what each one is for, like this:

>>> print images[22].location
ec2-public-images/fedora-core4-base.manifest.xml

Incidentally you can't access these image manifests over the web (which you might have expected to be able to do because they are stored in S3). This particular one is at http://s3.amazonaws.com/ec2-public-images/fedora-core4-base.manifest.xml but you get an access denied message.

We're only interested in the official images so let's get their positions in the list:

>>> for i, image in enumerate(conn.get_all_images()):
...     if image.location.startswith('ec2-public-images'):
...         print "%s, %s"%(i, image.location)
22, ec2-public-images/fedora-core4-base.manifest.xml
25, ec2-public-images/fedora-core4-mysql.manifest.xml
27, ec2-public-images/fedora-core4-apache.manifest.xml
30, ec2-public-images/fedora-core4-apache-mysql.manifest.xml
32, ec2-public-images/developer-image.manifest.xml
40, ec2-public-images/getting-started.manifest.xml
159, ec2-public-images/demo-paid-AMI.manifest.xml

The numbers will be different for you but you can use them to access the particular image you are after. In this case let's run the Amazon getting started image (index 40 in our list):

>>> image = images[40]
>>> print image.location
ec2-public-images/getting-started.manifest.xml
>>> reservation = image.run(key_name='gsg-keypair')

This will begin the boot process for a new EC2 instance using the keypair we generated. The full set of possible parameters to the run method is:

min_count
The minimum number of instances to launch.
max_count
The maximum number of instances to launch.
key_name
Keypair to launch instances with (either a KeyPair object or a string with the name of the desired keypair).
security_groups
A list of security groups to associate with the instance. This can either be a list of SecurityGroup objects or a list of strings with the names of the desired security groups.
user_data
Data to be made available to the launched instances. This should be base64 encoded according to the EC2 documentation.
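
For example, to pass a small settings string to the instance via user_data, a minimal sketch might look like this. The payload is made up, and we base64 encode it ourselves as the note above suggests:

>>> import base64
>>> payload = base64.b64encode('role=web;env=test')    # hypothetical settings string
>>> reservation = image.run(key_name='gsg-keypair', user_data=payload)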

The run method returns a Reservation object which represents a collection of instances that are all started at the same time. In this case, we only started one but you can check the instances attribute of the Reservation object to see all of the instances associated with this reservation:

>>> reservation.instances
[Instance:i-6761850e]
>>> instance = reservation.instances[0]
>>> instance.state
u'pending'

So, we have an instance booting up that is still in the pending state. We can call the update method on the instance to get a refreshed view of its state:

>>> instance.update()
>>> instance.state
u'pending'

It takes a minute or two for the instance to be started so wait a bit then:

>>> instance.update()
>>> instance.state
u'running'

So, now our instance is running. The time it takes to boot a new instance varies based on a number of different factors but usually it takes less than five minutes.
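
If you'd rather not keep calling update() by hand, you can poll in a small loop until the state changes — a simple sketch that checks every ten seconds:

>>> import time
>>> while instance.state == u'pending':
...     time.sleep(10)        # wait ten seconds between checks
...     instance.update()
...
>>> print instance.state
u'running'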

The simplest way to stop an instance is to use the stop method of the Instance object:

>>> instance.stop()
>>> instance.update()
>>> instance.state
u'shutting-down'
>>> # wait a minute
>>> instance.update()
>>> instance.state
u'terminated'
>>>

You can also use the stop_all() method on the Reservation object:

>>> reservation.stop_all()
>>>

If you just want to get a list of all of your running instances, use the get_all_instances() method of the connection object. Note that the list returned is actually a list of Reservation objects (which contain the Instances) and that the list may include recently terminated instances for a small period of time subsequent to their termination:

>>> instances = conn.get_all_instances()
>>> instances
[Reservation:r-a76085ce, Reservation:r-a66085cf, Reservation:r-8c6085e5]
>>> r = instances[0]
>>> for inst in r.instances:
...    print inst.state
u'terminated'
>>>

Instance Addressing

All Amazon EC2 instances are assigned two IP addresses at launch: a private address, and a public address. The public IP address is directly mapped to the private address through Network Address Translation (NAT). Private addresses are only reachable from within the Amazon EC2 network. Public addresses are reachable from the Internet.

Amazon EC2 also provides an internal DNS name and a public DNS name which map to the private and public IP addresses respectively. The internal DNS name is only resolvable from within Amazon EC2. The public DNS name resolves to the public IP address from outside of Amazon EC2 and, currently, resolves to the private IP address from within Amazon EC2.

There are two kinds of IP addresses and DNS names associated with Amazon EC2 instances.

Each instance is assigned a private (RFC1918) address which is allocated by DHCP. This is the only address the operating system knows about. This is the address that should be used when communicating between Amazon EC2 instances. This address is not reachable from the Internet.

Additionally, Amazon EC2 also provides a public (Internet routable) address for each instance using Network Address Translation (NAT). This is the address that must be used from outside the Amazon EC2 network (i.e. the Internet).

If you terminated your instance, start it up again. Once the instance is up and running you can find out its public DNS name like this:

>>> print instance.dns_name
ec2-72-44-40-153.z-2.compute-1.amazonaws.com

and the private DNS name like this:

>>> print instance.private_dns_name
domU-12-31-35-00-53-14.z-2.compute-1.internal

Obviously your DNS names won't be the same as those above. Note that the domU part again hints at Xen hosting. The private DNS name can only be used from other EC2 machines.

At this point you might think you would be able to access the machine but first you need to set up security groups and access rules.

Try visiting the public DNS name of your running instance in a web browser or connecting to it via SSH just to convince yourself that it isn't reachable yet.

Security Groups

The Amazon EC2 service provides the ability to dynamically add and remove instances. However, this flexibility can complicate firewall configuration and maintenance which traditionally relies on IP addresses, subnet ranges or DNS host names as the basis for the firewall rules.

The Amazon EC2 firewall allows you to assign your compute resources to user-defined groups and define firewall rules for and in terms of these groups. As compute resources are added to or removed from groups, the appropriate rules are enforced. Similarly, if a group's rules are changed these changes are automatically applied to all members of the affected group.

A security group is a named collection of access rules. These access rules specify which ingress, i.e. incoming, network traffic should be delivered to your instance. All other ingress traffic will be discarded.

A group's rules may be modified at any time. The new rules are automatically enforced for all running, as well as for subsequently launched, instances affected by the change in rules.

Note: Currently there is a limit of one hundred rules per group.

Group Membership

When an AMI instance is launched it may be assigned membership to any number of groups.

If no groups are specified, the instance is assigned to the "default" group. This group can be modified, by you, like any other group you have created.

To get a listing of all currently defined security groups:

>>> rs = conn.get_all_security_groups()
>>> print rs
[SecurityGroup:default]

Group Access Rights

The access rules define source-based access either for named security groups or for IP addresses, i.e. CIDRs. For CIDRs you may also specify the protocol and port range (or ICMP type/code):

>>> default_security_group = rs[0]
>>> print default_security_group.name
default
>>> default_security_group.rules
[IPPermissions:tcp(0-65535),
 IPPermissions:udp(0-65535),
 IPPermissions:icmp(-1--1)]
>>>

By default, this group allows all network traffic from other members of the "default" group and discards traffic from other IP addresses and groups.

In order to access your instance over the web you will probably want to give everyone access to port 80 (the HTTP port) and port 22 (for SSH). To do this you need to modify a security group; doing so changes the permissions for all the instances running in that group. In this case we only have one instance running so this is fine.

You can authorize TCP access on port 80 (i.e. from port 80 to port 80) from every IP address like this:

>>> default_security_group.authorize('tcp', 80, 80, '0.0.0.0/0')
True

If you now visit http://<your-public-dns-name>/ you'll be greeted by a welcome message:

                  Congratulations!

You've successfully booted an instance of Fedora Core 4.

Of course it isn't really ideal to be modifying the default group for this purpose because if you start a different instance without specifying a security group it will be assigned the default group too, only this time with port 80 open. Let's revoke the port 80 access before we forget:

>>> default_security_group.revoke('tcp', 80, 80, '0.0.0.0/0')
True

Instead let's set up a new group called web_test for exactly this testing purpose and open ports 80 and 22:

>>> web_test = conn.create_security_group('web_test', 'Our website testing group')
>>> web_test.authorize('tcp', 80, 80, '0.0.0.0/0')
>>> web_test.authorize('tcp', 22, 22, '0.0.0.0/0')

You'll need to stop your instance:

>>> instance.stop()

and to start a new one in the web_test security group you can use this command:

>>> reservation = image.run(key_name='gsg-keypair', security_groups=['web_test'])

Your new instance will have a different public DNS name:

>>> print reservation.instances[0].dns_name
ec2-67-202-2-136.z-2.compute-1.amazonaws.com

If you visit this address in your web browser you'll see the same welcome message without having to set up any permissions because this instance is using the web_test security group.

We can now try to SSH into the box. Remember though that no root password is set on the Amazon AMI images so we need to use the key pair authentication we set up earlier. Amazon will have added the public part of the key pair to the instance already, so all you need is the private key you saved in the id_rsa-gsg-keypair file earlier.

Connect like this:

ssh -i id_rsa-gsg-keypair root@ec2-67-202-2-136.z-2.compute-1.amazonaws.com

You'll see a message asking you to check the fingerprint. Enter yes:

The authenticity of host 'ec2-67-202-2-136.z-2.compute-1.amazonaws.com (72.44.47.244)' can't be established.
RSA key fingerprint is f9:9c:3b:f2:f0:75:74:a9:10:5a:8a:18:74:48:63:55.
Are you sure you want to continue connecting (yes/no)?

You'll then get a message saying that the fingerprint has been added:

Warning: Permanently added 'ec2-67-202-2-136.z-2.compute-1.amazonaws.com' (RSA) to the list of known hosts.

Then you'll be shown the EC2 welcome screen:

         __|  __|_  )  Rev: 2
         _|  (     /
        ___|\___|___|

 Welcome to an EC2 Public Image
                       :-)

    Getting Started


    __ c __ /etc/ec2/release-notes.txt

[root@domU-12-31-35-00-52-B4 ~]#

Congratulations, you're in! I always like to ping google to check network connectivity:

[root@domU-12-31-35-00-52-B4 ~]# ping google.com
PING google.com (64.233.167.99) 56(84) bytes of data.
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=0 ttl=244 time=28.9 ms
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=1 ttl=244 time=29.1 ms
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=2 ttl=244 time=29.3 ms

--- google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 28.938/29.146/29.319/0.157 ms, pipe 2
[root@domU-12-31-35-00-40-92 ~]#

If you hadn't set up the key correctly, or had passed a private key to the ssh command that didn't match the keypair name you launched the instance with, you would have seen an error like this and been asked to enter a password. Since there isn't one you wouldn't have got very far!

Warning: Identity file id_rsa-gsg-keypair not accessible: No such file or directory.
root@ec2-67-202-2-136.z-2.compute-1.amazonaws.com's password:

Bundling

We're going to follow the tutorial and create an AMI with a different welcome message.

After using SSH to sign into the EC2 instance run this command, using your own name:

# sed -i -e 's/Congratulations!/Congratulations James!/' /var/www/html/index.html

Check the file has been modified by refreshing the page in your browser to see the updated message, or by looking at the file's modification date.

The EC2 image you are running already comes with the set of tools you need to bundle an AMI so you don't need to install them yourself. They are actually a set of Ruby scripts, so you don't need to do the Java setup described in the Getting Started Guide to use them. We're going to use the ec2-bundle-vol utility to bundle, encrypt and sign the image. First we need to set up an X.509 certificate.

Creating an X.509 certificate

You can create your own X.509 certificate if you know how to but it is much simpler to let Amazon create one for you.

Sign into your Amazon account again, hover over the "Your Web Services Account" button and click the "AWS Access Identifiers" link. At the bottom of this page is a section titled "Your X.509 Certificate". Follow the "Create New" button in this section to create a new X.509 certificate.

You are shown the following message:

You can only have one certificate associated with your AWS account.

If you already have a certificate associated with your AWS account and
create a new certificate, the certificate currently associated with your
AWS account becomes obsolete, and is replaced by the newly created one. Any
requests made to AWS using the obsolete certificate will fail with an
authentication error.

After the certificate files are created, you need to download both the
Certificate and Private Key files in order to use them with your
application or Toolkit.

IMPORTANT: You should store your Private Key file in a secure location. If
you lose your Private Key file you will need to create a new certificate to
use with your account. AWS does not store Private Key Information.

Your Private Key is secret, and should be known only by you. You should
never include your Private Key information in a requests to AWS, except
encrypted as a signature. You should also never e-mail your Private Key
file to anyone. It is important to keep your Private Key confidential to
protect your account.

Once you've created your certificate download the private key (it will be a file starting pk-) and the certificate itself (a file starting cert-). Both files end in .pem.

It is very important that you don't lose the file starting pk- because Amazon doesn't keep a copy and you can't regenerate it without creating a new X.509 certificate, and the new certificate and key won't work with any of the existing images you have encrypted.

Amazon Account ID

Your Amazon account number is the value you should use whenever you need to provide an EC2 user ID. Staying on the AWS portal page, move your mouse over the button labeled "Your Web Services Account" and select the "Account Activity" link on the menu that appears. At the top of this page, just under the Account Activity title, you should see a label named "Account Number", followed by a hyphenated number. Something like this:

4952-1993-3132

Your AWS account ID, with the hyphens removed, is your EC2 user ID. The example above would be 495219933132.
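
If you'd rather let Python strip the hyphens for you:

>>> print '4952-1993-3132'.replace('-', '')
495219933132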

Copy the files to the running EC2 instance so the ec2-bundle-vol command has access to them. Once again you'll need to use your id_rsa-gsg-keypair file to authorise the transfer:

# scp -i id_rsa-gsg-keypair pk-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem cert-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem root@ec2-67-202-2-136.z-2.compute-1.amazonaws.com:/mnt
pk-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem                         100%  717     0.7KB/s   00:00
cert-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem                       100%  684     0.7KB/s   00:00

Again, you'll need to replace the filenames and host with values appropriate for you.

Make sure the files are uploaded to the /mnt directory because we don't want them included in the final image (remember the private key is supposed to be kept private) so we'll specifically exclude the /mnt directory when generating the image.

You'll now use your AWS Account ID to bundle the volume:

ec2-bundle-vol -d /mnt -k /mnt/pk-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem -c /mnt/cert-HKZYKTAIG2ECMXYIBH3HXV4ZBZQ55CLO.pem -u 495219933132

The bundler will work away, excluding key directories (including /mnt), until it eventually finishes (it does take a few minutes). The output looks like this:

Copying / into the image file /mnt/image...
Excluding:
         /sys
         /dev/shm
         /proc
         /dev/pts
         /net
         /proc/sys/fs/binfmt_misc
         /dev
         /media
         /mnt
         /proc
         /sys
         /mnt/image
         /mnt/img-mnt
1+0 records in
1+0 records out
mke2fs 1.38 (30-Jun-2005)
warning: 256 blocks unused.

Bundling image file...
Splitting /mnt/image.tar.gz.enc...
Created image.part.00
Created image.part.01
Created image.part.02
...
Created image.part.20
Created image.part.21
Created image.part.22
Generating digests for each part...
Digests generated.
Creating bundle manifest...
ec2-bundle-vol complete.

Check everything has gone according to plan and that the manifest and all the parts are there:

[root@domU-12-31-35-00-52-B4 ~]# ls -lh /mnt/image.*
-rw-r--r--  1 root root 4.8K Aug 27 11:17 /mnt/image.manifest.xml
-rw-r--r--  1 root root  10M Aug 27 11:17 /mnt/image.part.00
-rw-r--r--  1 root root  10M Aug 27 11:17 /mnt/image.part.01
-rw-r--r--  1 root root  10M Aug 27 11:17 /mnt/image.part.02
-rw-r--r--  1 root root  10M Aug 27 11:17 /mnt/image.part.03
...
-rw-r--r--  1 root root  10M Aug 27 11:17 /mnt/image.part.20
-rw-r--r--  1 root root  10M Aug 27 11:17 /mnt/image.part.21
-rw-r--r--  1 root root 7.2M Aug 27 11:17 /mnt/image.part.22

As you can see the compressed and encrypted image comes to about 227MB in total but each part is no bigger than 10MB. Now you need to upload them.

Uploading the AMI to Amazon S3

All AMIs are loaded from Amazon S3 storage. The newly bundled AMI needs to be uploaded to an existing account on Amazon S3. You'll need an S3 account for this part if you haven't already got one.

S3 stores data objects in buckets, which are similar in concept to directories. You'll need to specify a bucket name in the command below as <your-s3-bucket>. Buckets have globally unique names and are owned by unique users. If you have used S3 before, you can use any of your existing buckets or just give ec2-upload-bundle any name that makes sense to you. The ec2-upload-bundle utility will upload the bundled AMI to a specified bucket. If the specified bucket does not exist it will create it. If the specified bucket belongs to another user ec2-upload-bundle will fail, and you will have to try a different name.

If you've already been playing with S3 you might already have a bucket. Find out like this:

>>> from boto.s3.connection import S3Connection
>>> conn = S3Connection('<aws access key>', '<aws secret key>')
>>> rs = conn.get_all_buckets()
>>> bucket = rs[0]
>>> print bucket.name
james-music

I've already got a bucket called james-music so I'm going to use that. Notice from the above code how accessing S3 using boto is fairly similar to accessing EC2.
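
If you don't have a bucket yet you can create one from the same session. A sketch — remember that bucket names are globally unique, so the name below is just a placeholder and you'll need to choose your own:

>>> bucket = conn.create_bucket('my-ec2-images')    # hypothetical name; pick your own
>>> print bucket.name
my-ec2-images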

The upload process gives you continuous feedback until the upload has completed. The transfer happens over SSL so other people shouldn't have access to your image.

We now need to upload the parts to S3. Here's the command you'll need (run it on the instance, where the parts live in /mnt):

# ec2-upload-bundle -b <your-s3-bucket> -m /mnt/image.manifest.xml -a <aws-access-key-id> -s <aws-secret-access-key>
Setting bucket ACL to allow EC2 read access ...
Uploading bundled AMI parts to https://s3.amazonaws.com:443/james-music ...
Uploaded image.part.00 to https://s3.amazonaws.com:443/james-music/image.part.00.
Uploaded image.part.01 to https://s3.amazonaws.com:443/james-music/image.part.01.
Uploaded image.part.02 to https://s3.amazonaws.com:443/james-music/image.part.02.
Uploaded image.part.03 to https://s3.amazonaws.com:443/james-music/image.part.03.
Uploaded image.part.04 to https://s3.amazonaws.com:443/james-music/image.part.04.
Uploaded image.part.05 to https://s3.amazonaws.com:443/james-music/image.part.05.
Uploaded image.part.06 to https://s3.amazonaws.com:443/james-music/image.part.06.
Uploaded image.part.07 to https://s3.amazonaws.com:443/james-music/image.part.07.
Uploaded image.part.08 to https://s3.amazonaws.com:443/james-music/image.part.08.
Uploaded image.part.09 to https://s3.amazonaws.com:443/james-music/image.part.09.
Uploaded image.part.10 to https://s3.amazonaws.com:443/james-music/image.part.10.
Uploaded image.part.11 to https://s3.amazonaws.com:443/james-music/image.part.11.
Uploaded image.part.12 to https://s3.amazonaws.com:443/james-music/image.part.12.
Uploaded image.part.13 to https://s3.amazonaws.com:443/james-music/image.part.13.
Uploaded image.part.14 to https://s3.amazonaws.com:443/james-music/image.part.14.
Uploaded image.part.15 to https://s3.amazonaws.com:443/james-music/image.part.15.
Uploaded image.part.16 to https://s3.amazonaws.com:443/james-music/image.part.16.
Uploaded image.part.17 to https://s3.amazonaws.com:443/james-music/image.part.17.
Uploaded image.part.18 to https://s3.amazonaws.com:443/james-music/image.part.18.
Uploaded image.part.19 to https://s3.amazonaws.com:443/james-music/image.part.19.
Uploaded image.part.20 to https://s3.amazonaws.com:443/james-music/image.part.20.
Uploaded image.part.21 to https://s3.amazonaws.com:443/james-music/image.part.21.
Uploaded image.part.22 to https://s3.amazonaws.com:443/james-music/image.part.22.
Uploading manifest ...
Uploaded manifest to https://s3.amazonaws.com:443/james-music/image.manifest.xml.
ec2-upload-bundle complete

Registering the AMI and Launching the New Image

The final thing you have to do now that the parts are uploaded is to register the AMI. Because the image is on S3 you can do this from the machine you started with, using an EC2 connection object like the one you used to launch the original image (note that we reused the name conn for the S3 connection above, so you may need to create a fresh EC2Connection first):

>>> conn.register_image('james-music/image.manifest.xml')
u'ami-ddec09b4'

Once the image is registered you'll be given the AMI identifier, in this case ami-ddec09b4. You can now find the image:

>>> images = conn.get_all_images()
>>> for i, image in enumerate(images):
...     if image.id == u'ami-ddec09b4':
...             print i
...
185
>>> images[185].location
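
Depending on your version of boto you may be able to fetch the image directly by ID instead of scanning the whole list — a sketch that assumes get_all_images accepts an image_ids argument (check your boto source if it doesn't):

>>> rs = conn.get_all_images(image_ids=[u'ami-ddec09b4'])
>>> print rs[0].location
james-music/image.manifest.xml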

Let's start it up:

>>> reservation = images[185].run(key_name='gsg-keypair', security_groups=['web_test'])
# Wait a minute or two
>>> reservation.instances[0].update()
>>> reservation.instances[0].state
u'running'
>>> reservation.instances[0].dns_name
u'ec2-67-202-2-136.z-2.compute-1.amazonaws.com'

Visit http://<public-dns-name>/ and you should be able to see the new welcome message:

            Congratulations James!

You've successfully booted an instance of Fedora Core 4.

At this point you have successfully created your own EC2 image, bundled it and started an instance of it. That's pretty much everything you need to know to be productive with EC2.

Shutting Everything Down

If you've been following this tutorial you've probably started up quite a few instances so you'll probably want to shut them down to avoid being charged for them longer than you need to be. Here's some code to do that:

>>> for reservation in conn.get_all_instances():
...     for instance in reservation.instances:
...         instance.update()
...         if instance.state == u'running':
...             print "%s, %s"%(instance.id, instance.dns_name)
...             instance.stop()
...
i-c4c428ad, ec2-67-202-2-136.z-2.compute-1.amazonaws.com
i-92c428fb, ec2-72-44-49-114.z-1.compute-1.amazonaws.com

Wait a few minutes and run this again to check everything has shut down successfully.

If you don't want the image you've just created (you'll be charged about $0.03/month for the storage on S3 to keep it) you can delete it.
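
That $0.03 estimate follows from the bundle size — a quick check, assuming S3's storage price at the time of $0.15 per GB-month:

# 23 parts: 22 of roughly 10MB plus one of 7.2MB, about 227MB in total
size_gb = (22 * 10 + 7.2) / 1024.0
print "about $%.2f/month" % (size_gb * 0.15)    # about $0.03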

First deregister it:

>>> conn.deregister_image(u'ami-ddec09b4')
True

Then remove the files from S3 using the S3 connection object we created earlier:

>>> from boto.s3.key import Key
>>> k = Key(bucket)
>>> k.key = 'image.manifest.xml'
>>> k.delete()
>>> for x in range(23):
...     k = Key(bucket)
...     k.key = 'image.part.%02d' % x
...     k.delete()
...
>>>
