
Getting started with Amazon S3 Bucket

Today I had a small task that may be routine for DevOps people, but can get clumsy if you are coming from a neighboring domain, like programming.

What we want to do is set up a backup system for our website using an S3 bucket as the storage medium. Both the database and the content files will be included in the backup.

Since I hadn’t used Amazon S3 before, I wanted to start with the trial offer, which more or less fits our needs: 5 GB of free storage and plenty of free requests per month. Even so, on signup be prepared to fill in your card details, just in case you exceed the free plan.

Define your bucket

This is the easiest step. Just go to the S3 dashboard, use the “Create Bucket” button and name the bucket as you want, as long as the name is unique across the entire S3 namespace. Let’s call our bucket “my-s3-bucket” (choose a real name, of course).

When you choose your region, consider the physical distance from where you will be using this bucket most of the time. For instance, our servers are in Germany, so the best option was the region “EU (Frankfurt)” with ID “eu-central-1”. You will need this region ID later when configuring AWS CLI. For other regions and IDs go here.
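If you prefer the command line over the dashboard, the same bucket can be created with AWS CLI (the tool we install and configure later in this post). A minimal sketch, assuming the CLI is already configured:

# create the bucket in the Frankfurt region
aws s3 mb s3://my-s3-bucket --region eu-central-1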

After defining the bucket, for our use case I created two folders, “db” and “content”. In the db folder we will put daily database backups. The content folder, on the other hand, will be managed incrementally: we add new files from the source, but never auto-delete. From time to time, however, we will run a manual sync with delete, where unneeded files are removed. Why? Because if, say, someone gets access to our servers and deletes the entire content, we don’t want the backup call to delete the files from the S3 bucket as well.
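A side note: S3 “folders” are not real directories but key prefixes. If you want to create them from the command line instead of the dashboard, one way is to put zero-byte marker objects (a sketch, assuming the CLI is configured):

# "folders" in S3 are just zero-byte objects whose key ends with /
aws s3api put-object --bucket my-s3-bucket --key db/
aws s3api put-object --bucket my-s3-bucket --key content/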

Security concerns

The first thing you will notice is the level of security that Amazon puts on all this. They encourage you not to use the main Amazon account to make the various API calls. Moreover, they say you will not need the main account in most cases. Instead, they advise you to create users and groups for each specific use case. This way you can control who gets access to what, and you don’t risk compromising the entire account if one user is compromised.

Defining groups is an efficient way to manage permissions for multiple users at a time. For each group you can assign permissions using predefined policies or define custom ones. All this setup is done in the IAM dashboard, in the Users, Groups and Policies sections.

Identity and Access Management (IAM) users

I had to create users for each of my team colleagues and make them part of a newly defined “Administrators” group. The policy for the group was set to the predefined AdministratorAccess. I also had to define temporary passwords that must be changed on first login.
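This setup can also be scripted with AWS CLI. A sketch, where “alice” and the temporary password are placeholders:

# create the group and attach the predefined AdministratorAccess policy
aws iam create-group --group-name Administrators
aws iam attach-group-policy --group-name Administrators \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

# create a user, add it to the group, and set a temporary password
# that must be changed on first login
aws iam create-user --user-name alice
aws iam add-user-to-group --group-name Administrators --user-name alice
aws iam create-login-profile --user-name alice \
    --password 'ChangeMe123!' --password-reset-required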

Since these users are not regular Amazon accounts, there is a dedicated place where the login takes place. You can find the login URL in the Identity and Access Management dashboard.

We need two more users here, but these are not users with login credentials, as they don’t need access to the AWS dashboard. Rather, they are users with very specific permissions that we will use to upload and read files to/from the bucket.

Read, write and delete policies

The first user has write and delete permissions. In addition to write, this user must have delete rights because we decided to run, from time to time, a sync operation that deletes from the bucket the content that is no longer on our website. On a daily basis, though, we will only add new content, as described in the sync operation below. The policy for this user is as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::my-s3-bucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::my-s3-bucket"
            ]
        }
    ]
}

One particularity here: in order to delete objects from the bucket we had to allow the user to list the bucket contents. This right is at the bucket level, that is, the resource with ARN “arn:aws:s3:::my-s3-bucket”. The other resource, where we defined the delete and write (put) permissions, has a slightly different ARN: “arn:aws:s3:::my-s3-bucket/*”. Here the ending “/*” means we are targeting the objects in the bucket and not the bucket itself.
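To wire this up from the command line, save the policy above to a JSON file and attach it to the user as an inline policy. A sketch, where the user name “backup-writer” and the file name are placeholders:

# create the write/delete user and attach the policy as an inline policy
aws iam create-user --user-name backup-writer
aws iam put-user-policy --user-name backup-writer \
    --policy-name s3-write-delete \
    --policy-document file://write-delete-policy.json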

The second user will have read permissions, in order to be able to get the database backup dump and update my local Vagrant setup with it. Here is the policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::my-s3-bucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::my-s3-bucket"
            ]
        }
    ]
}

In this case, too, we need to allow the ListBucket action at the bucket level, because we want to get a 404 response code when trying to read a file that doesn’t exist. If this permission is missing, the response will be 403 (access denied), which can make you think something is misconfigured when the file simply isn’t there.
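You can verify this behavior with the low-level s3api commands; the key below is just an example of a file that doesn’t exist:

# with ListBucket granted this fails with a 404 (Not Found);
# without it, the same call fails with a 403 (Forbidden)
aws s3api head-object --bucket my-s3-bucket --key db/does-not-exist.sql.gz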

Working with AWS CLI on your server machine

Now that we have finished the S3 bucket setup, along with defining the two users with specific permissions, it’s time to get our hands dirty and do some read/write operations on the bucket. While it is possible to work directly with the REST API, I consider that a bit too low level for our needs, so instead I installed AWS CLI.

Installation is quite easy if you are on a Mac or Unix system with the Python (2.6.5 or higher) pip package manager: just issue the command “pip install awscli”. If you are on Windows, download the installer from here.

After installing the tool you need to run the initial configuration, where you define your access key, secret, region and output format. Do this by running “aws configure”. You will need to go back to the Amazon IAM console and generate an access key along with a secret for the user we granted write and delete permissions. You don’t need to download or save these credentials anywhere; just use them at this point. You can later create another access key if you want to replicate the setup on another machine. The region ID is the one where you defined your bucket, as explained above.
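The configuration dialog looks roughly like this (the key and secret shown are the placeholder values from Amazon’s documentation):

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: eu-central-1
Default output format [None]: json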

Using this tool, one can run the usual Unix-style file and folder commands against the bucket:

aws s3 cp ${BACKUP_NAME} s3://my-s3-bucket/db/ --storage-class 'STANDARD_IA'

It is easy to figure out what this does: it copies the database dump file whose name is stored in the variable ${BACKUP_NAME} to the “db” folder of our S3 bucket. Here we use the standard infrequent access storage class. This has lower storage costs, but access costs are a bit higher compared to standard storage (check here for all storage options).
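Put together, the daily database backup becomes a small cron-friendly script. A minimal sketch, where the MySQL credentials and database name are placeholders:

#!/bin/bash
# dump the database, compress it, upload it to the "db" folder, clean up
BACKUP_NAME="backup-$(date +%Y-%m-%d).sql.gz"
mysqldump -u backup_user -p'secret' my_database | gzip > "${BACKUP_NAME}"
aws s3 cp "${BACKUP_NAME}" s3://my-s3-bucket/db/ --storage-class 'STANDARD_IA'
rm "${BACKUP_NAME}"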

For the incremental backup that we want for our website content (write but do not delete), there is the sync call:

aws s3 sync ${CONTENT_FOLDER} s3://my-s3-bucket/content/ --storage-class 'STANDARD_IA'

This is very similar to the copy operation described above, but it works with folders and by default it won’t delete any file in the bucket, unless we specify the --delete option.
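For the occasional manual cleanup mentioned earlier, the same command gets the --delete option. Running it with --dryrun first is a cheap way to preview what would be removed:

# preview the deletions first, then run the real sync
aws s3 sync ${CONTENT_FOLDER} s3://my-s3-bucket/content/ --delete --dryrun
aws s3 sync ${CONTENT_FOLDER} s3://my-s3-bucket/content/ --delete --storage-class 'STANDARD_IA'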

A complete list of the S3 operations available in AWS CLI can be found here.

Final considerations

To keep things in a safe zone I would strongly encourage you to define a monthly budget and get notified when the budget is exceeded (or close to it). It is not easy to figure out how you are charged for the different services, especially when you use tools like AWS CLI, which exposes operations like folder sync, where one call can issue hundreds or thousands of PUT requests at once. A budget will also protect you in case an account is compromised and a third party makes abusive requests. You can grant IAM users access to your billing information here, in the section “IAM User Access to Billing Information”, and then grant those users an IAM policy like “IAMFullAccess”. But always proceed with caution when security and payments are what you manage.
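One way to get the notification part from the command line is a CloudWatch billing alarm. A rough sketch, assuming you have enabled “Receive Billing Alerts” in the billing preferences and already have an SNS topic to notify; the topic ARN and the 10 USD threshold are placeholders (billing metrics live in us-east-1 regardless of your bucket region):

# alert when estimated monthly charges exceed 10 USD
aws cloudwatch put-metric-alarm --region us-east-1 \
    --alarm-name monthly-billing-alarm \
    --namespace AWS/Billing --metric-name EstimatedCharges \
    --dimensions Name=Currency,Value=USD \
    --statistic Maximum --period 21600 --evaluation-periods 1 \
    --threshold 10 --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts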

Written on March 10, 2017.