The Amazon cloud, explained.

Storing data in EC2 images

In general, there are 3 ways of storing data from an EC2 instance so that it persists after the instance is terminated:

  1. Bundled AMIs
  2. EBS volumes
  3. S3

Most people use a combination of these 3. I’ll describe the basic use cases for each.

Let’s use a Windows EC2 instance as an example. Suppose you want to host a web site on your instance.

The first thing you do is install IIS on your instance. As you probably know, if you terminate this instance, that installation will be gone.

In order to keep this installation, you need to bundle it into an AMI. As you’ve seen, the console makes quick work of this: you can right-click the instance, and in a few minutes you will have your AMI.

When you start up that AMI, the installation of IIS you made will still be there (note - only the C:\ drive is saved, not the D:\ drive).

If you put your web files in the default IIS directory (i.e. C:\inetpub\wwwroot), then they will be saved as well.

So, you can save everything you need by just bundling. But there are 2 drawbacks to relying on bundling alone:

  1. Bundles are limited to 10GB in total size. This includes the base operating system. So, you have a fixed limit to how much data you can put in your instance.
  2. The only way to save changes to your instance is by bundling again. Bundling is pretty easy, but probably not something you want to do on a frequent (e.g. daily) basis.
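
To get a feel for the first drawback, you can total up the files you plan to keep on C:\ and compare that against the limit. This is only a rough sketch: the 10GB figure is the one from the list above (treated here as 10 * 1024^3 bytes, which is an assumption), `directory_size_bytes` and `fits_in_bundle` are hypothetical helper names, and you have to supply the size of the base operating system yourself.

```python
import os

# The 10GB bundle limit described above, assumed to mean 10 * 1024^3 bytes.
BUNDLE_LIMIT_BYTES = 10 * 1024**3


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (hypothetical helper)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_bundle(data_bytes, os_bytes, limit=BUNDLE_LIMIT_BYTES):
    """Return True if the data plus the base OS stays within the limit."""
    return data_bytes + os_bytes <= limit
```

For example, `fits_in_bundle(directory_size_bytes(r"C:\inetpub\wwwroot"), os_bytes)` tells you whether your web files still fit once the operating system is counted.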

What is really common to do is to bring EBS volumes into the mix. As you’ve seen, you can mount an EBS volume like a regular hard drive and see it as E:\ or something.

What a lot of people then do is use EBS volumes to store their data - so you could set IIS to use E:\inetpub\wwwroot instead. Any change you make to data in an EBS volume is saved pretty much immediately.

It is probably more common to see databases being hosted on EBS - so I might configure SQL Server to store its databases on E:\ instead of C:\. Any change to the database is permanently saved.

So, EBS gives you a few advantages:

  1. No need to re-bundle to save instance data
  2. No 10GB limit; you can store as much as you want.

But there are a few drawbacks with EBS as well.

1. You pay for an EBS volume based on allocation. When you create an EBS volume of 100 GB for example, you immediately start paying for that space, whether you actually have data stored there or not.

2. EBS volumes are conceptually like USB-powered portable hard drives. They are easy to plug into a computer (or instance), but you can’t share them with another computer simultaneously. You can duplicate the contents of an EBS volume by creating a snapshot and then creating a new volume based on that snapshot, but the sharing is not in real time.
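
The snapshot-then-new-volume approach from the second drawback could be scripted roughly like this. It is a sketch, not a recipe: `duplicate_volume` is a hypothetical name, and `ec2_client` is assumed to be a boto3 EC2 client (for example `boto3.client("ec2")`), which is passed in so the function itself has no dependencies. The calls used are boto3's `create_snapshot`, the `snapshot_completed` waiter, and `create_volume`.

```python
def duplicate_volume(ec2_client, volume_id, availability_zone):
    """Snapshot an EBS volume and create a new volume from the snapshot.

    ec2_client is assumed to be a boto3 EC2 client.
    The result is a point-in-time copy; later writes are not shared.
    """
    snapshot = ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description="Copy of " + volume_id,
    )
    snapshot_id = snapshot["SnapshotId"]

    # Block until the snapshot has finished before building a volume from it.
    ec2_client.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    new_volume = ec2_client.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    return new_volume["VolumeId"]
```

The new volume has to be created in the same availability zone as the instance you want to attach it to, and it only reflects the data as it was when the snapshot was taken.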

The last option is S3.

With S3, you put your data into buckets that are Internet-accessible. So, what you might do is put your web site contents in a .zip file in an S3 bucket. When an instance starts up, it is configured to pull down that data from S3 and unzip it onto its local (or EBS) volume.
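
A startup script for that pull-and-unzip step might look something like the following. This is a minimal sketch using only the Python standard library, and it assumes the .zip object is publicly readable over plain HTTP(S); a private object would need signed requests, which this does not handle. The function name `fetch_and_unzip` is hypothetical.

```python
import io
import urllib.request
import zipfile


def fetch_and_unzip(url, dest_dir):
    """Download a .zip from url and extract it into dest_dir.

    Returns the list of archive member names that were extracted.
    The whole archive is read into memory, so this suits small sites only.
    """
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

On the instance you would call it with your own bucket URL and target directory, for example `fetch_and_unzip("https://example-bucket.s3.amazonaws.com/site.zip", r"E:\inetpub\wwwroot")`, where the bucket and object names are placeholders.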

With S3, you get a few advantages:

1. You only pay for the data you store; there is no concept like the pre-allocation in EBS, you just pay for what you upload into S3.

2. You can share data between instances in real-time (or near real-time, look up eventual consistency as it applies to AWS).
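
To make the pay-for-allocation versus pay-for-storage difference concrete, here is a small back-of-the-envelope calculation. The per-GB prices are made-up placeholders, not actual AWS rates, and the sketch ignores S3 request and data transfer charges.

```python
def ebs_monthly_cost(allocated_gb, price_per_gb_month):
    """EBS bills for the allocated size, regardless of how much is used."""
    return allocated_gb * price_per_gb_month


def s3_monthly_cost(stored_gb, price_per_gb_month):
    """S3 bills for the data actually stored."""
    return stored_gb * price_per_gb_month


# Placeholder prices for illustration only; check current AWS pricing.
EXAMPLE_EBS_PRICE = 0.10  # dollars per GB-month, assumed
EXAMPLE_S3_PRICE = 0.15   # dollars per GB-month, assumed

# A 100 GB volume holding only 20 GB of data, versus the same 20 GB in S3.
ebs = ebs_monthly_cost(100, EXAMPLE_EBS_PRICE)
s3 = s3_monthly_cost(20, EXAMPLE_S3_PRICE)
print(f"EBS: ${ebs:.2f}/month, S3: ${s3:.2f}/month")
```

Even with a higher assumed per-GB price, S3 comes out cheaper here because you are not paying for the 80 GB of empty space on the volume.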

But, with S3 there are some drawbacks too:

1. You are making HTTP requests to get data, so it will be, relatively speaking, quite a bit slower than reading data from C:\ or from EBS.

2. Since you are making HTTP requests, it is not as simple as referencing a regular drive letter like C:\ or E:\. There are, though, some plug-ins that map S3 buckets to volumes.

As you can probably see, each technology has its own level of convenience, cost and capability. Most people tend to use a combination of all 3 to get the results they are looking for.

Thanks!

Eric.

One Response to “Storing data in EC2 images”

  1. Christian Palm Says:

    Great post! A nice feature would be if we could attach an EBS volume in read-only mode to several instances, and in read/write mode to one.
    Then we could easily share data between instances. (website staging, load balancing)