Saturday, December 05, 2009

Syncing your Snow Leopard Mac with Amazon S3

First a bit of history. I have had an Amazon S3 account for a while now and, as mentioned in another blog entry, I've been trying to put it to good use in concert with my MacBook. In the past I tried to get MacFUSE with the s3fs tools to work, but that wasn't so easy, and I haven't found any evidence on the web that anyone has succeeded in getting this to work (unless they're not telling, of course).

I did find people reporting success with command-line tools such as s3sync (a Ruby-based script) and s3cmd, which is also Ruby-based.

The article that got me going on this solution is found here.


 

What does it do?

Ideally, what I wanted was to mount an S3 bucket as a drive that shows up as an icon in the Finder, much like the iDisk you get with a MobileMe account. The solution presented here does not give you the same functionality, but it's close.

What this solution does is sync a folder of your choice to an S3 bucket of your choice. It does this with an upload script (a shell script that calls the s3sync tool) that syncs the content of your local folder. launchd runs this script (configured through a plist file) whenever it sees a change in your folder of choice.

Unfortunately this does not take into account any changes made in nested folders…

At the moment the only way to trigger a sync of the whole folder is to put a file in the top level of your folder of choice, which you can delete again afterwards. Another solution might be to run a script every 5 minutes or so, depending on whether there is network connectivity. On Linux it could be made more sophisticated with inotifywait. Apparently Mac OS has an equivalent mechanism called kqueue, but I have yet to find out how to use it from a shell script (if that is even possible).

Until I've found a better way, this is the way I do an automated full recursive folder sync.
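The polling idea mentioned above could look roughly like the sketch below. It is not something I run myself; UPLOAD_SCRIPT and CHECK_URL are names made up for the example, and using the S3 endpoint as a stand-in for "is the network up" is an assumption.

```shell
#!/bin/bash
# Sketch: run the upload script only when the S3 endpoint is reachable.
# UPLOAD_SCRIPT and CHECK_URL are illustrative names, not part of s3sync.
UPLOAD_SCRIPT="${UPLOAD_SCRIPT:-$HOME/s3sync/upload.sh}"
CHECK_URL="${CHECK_URL:-http://s3.amazonaws.com/}"

# Succeeds if an HTTP response (of any status) comes back within 10 seconds.
network_up() {
    curl -s -I --max-time 10 "$CHECK_URL" >/dev/null 2>&1
}

# Runs the upload when online; otherwise reports and returns non-zero.
sync_if_online() {
    if network_up; then
        "$UPLOAD_SCRIPT"
    else
        echo "offline, skipping sync" >&2
        return 1
    fi
}
```

Calling sync_if_online at the end of such a script and scheduling it every 300 seconds (for example with launchd's StartInterval key or a cron entry) would give the 5-minute polling described above.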


 

Prerequisites

  1. An Amazon S3 account, which once created will give you an access ID and a secret key that you need in the programs below. If you don't have one, you can apply for it here.
  2. A copy of s3sync, get it here and also take the time to read the README file.


 

Optionally

  1. s3cmd which you can get here. This command line utility is handy to check your work as you progress.
  2. Apple's Property List Editor (comes with the Xcode developer tools, which you find on your installation DVD). This comes in handy for creating a plist config file for launchd. You can also get a shareware copy of a similar program here.


 

The optional tools are not required to get this solution going, but they come in handy for debugging your work and for creating and editing plist files easily (this can also be done with a standard command-line text editor such as nano or pico).


 

Ok, so how do we put all this stuff together?

First of all, you need to do all of this on the command line, which you get by running the Terminal application (found in the Utilities folder inside your Applications folder).


 

Step 1; Setting up s3sync

First of all you need to download the tool. I have installed the tool in my home directory.

$ wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz

$ tar xvzf s3sync.tar.gz

The two commands above will create a directory called s3sync. You can now clean up (remove the archive you downloaded) like so:

$ rm s3sync.tar.gz
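The wget calls in this post assume wget is installed. Mac OS X does not ship it by default, while curl is included. A small helper along these lines (pick_downloader is a made-up name for the example) picks whichever of the two is available:

```shell
#!/bin/bash
# Print a download command that exists on this machine:
# wget if present, otherwise curl -O (save under the remote file name).
pick_downloader() {
    if command -v wget >/dev/null 2>&1; then
        echo "wget"
    elif command -v curl >/dev/null 2>&1; then
        echo "curl -O"
    else
        echo "neither wget nor curl found" >&2
        return 1
    fi
}

# Usage (not run here):
#   $(pick_downloader) http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
```

In practice you can simply type curl -O in place of wget in the commands shown here.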


 

The next two steps are optional; you only need them if you want your uploads and downloads to be encrypted with SSL. You'll need to download some certificates for SSL to work. First create a directory in your s3sync directory to store the certificates, like so:

$ cd s3sync

$ mkdir certs

$ cd certs

$ wget http://mirbsd.mirsolutions.de/cvs.cgi/~checkout~/src/etc/ssl.certs.shar


 

Then run the following commands:

$ sh ssl.certs.shar

$ cd ..

These commands install the certs and take you back to the s3sync directory.


 

Now would be a good time to create the folder you are going to sync. I created one in my home folder and called it "backup".

Also create a bucket on Amazon S3 and in that bucket, create a directory that you'll use as a target for the syncing. Name that directory the same as your folder of choice (of course, no spaces in the folder name).


 

Next you need to create two shell scripts.


 

The first can be called upload.sh (or whatever you prefer) with the following content:

#!/bin/bash 

# script to upload a local directory to S3

cd /path/to/yourshellscript/ 

export AWS_ACCESS_KEY_ID=yourS3accesskey

export AWS_SECRET_ACCESS_KEY=yourS3secretkey

export SSL_CERT_DIR=/your/path/to/s3sync/certs

ruby s3sync.rb -r --ssl --no-md5 --delete ~/backup/ syncbucket:backup

# copy and modify line above for each additional folder to be synced


 

The second script can be called download.sh. It is the same as the script you created above, so copy it (cp upload.sh download.sh) and simply change the last line as follows:

ruby s3sync.rb -r --ssl --no-md5 --delete --make-dir syncbucket:backup ~/


 

So the only differences are that the source and destination of the sync have been swapped and that the make-dir option has been added.


 

Finally, for security (you don't want anyone peeking at your Amazon key and secret), restrict access to both scripts:

$ chmod 700 upload.sh 

$ chmod 700 download.sh


 

This ensures that only your account can read, change, or run these files.
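Another option is to keep the keys out of the scripts altogether. The sketch below is only one way to do that and is not part of s3sync: load_s3_credentials is a made-up helper, and the file format (plain KEY=value lines) is an assumption.

```shell
#!/bin/bash
# Load S3 credentials from a separate file that only its owner can access.
# Assumed file format (hypothetical): plain KEY=value lines, e.g.
#   AWS_ACCESS_KEY_ID=yourS3accesskey
#   AWS_SECRET_ACCESS_KEY=yourS3secretkey
load_s3_credentials() {
    local file="$1"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Characters 5-10 of the ls mode string are the group and other bits.
    if [ "$(ls -ld "$file" | cut -c5-10)" != "------" ]; then
        echo "$file is accessible by group or others; run chmod 600" >&2
        return 1
    fi
    set -a          # export every variable assigned while sourcing
    . "$file"
    set +a
}
```

In upload.sh and download.sh the two key exports could then be replaced by a single call such as load_s3_credentials "$HOME/.s3sync-credentials" || exit 1, where the file name is just an example.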


 


 

Step 2; Creating a Launchd job to keep your directory synced

This part of the solution was inspired by a rather old, but still useful, posting here that explains the workings of launchd and whose example is exactly what we need for this solution.


 

Basically we'll need to create a .plist file that we will place in a folder that may not yet exist:

~/Library/LaunchAgents (that is, Library/LaunchAgents inside your home folder)


 

If it doesn't exist, create it now (mkdir -p ~/Library/LaunchAgents will do).


 

Then we create the .plist file in this directory as follows. You can use nano for instance and name it your.domain.s3sync.plist

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">

<plist version="1.0">

<dict>

<key>Label</key>

<string>your.domain.s3sync</string>

<key>ProgramArguments</key>

<array>

<string>/Users/youraccount/s3sync/upload.sh</string>

</array>

<key>WatchPaths</key>

<array>

<string>/Users/youraccount/backup</string>

</array>

</dict>

</plist>


 

The last thing you need to do is tell launchd to load this job and start watching your folder of choice. This is done like so:


 

$ launchctl load ~/Library/LaunchAgents/your.domain.s3sync.plist
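To check that the job was actually picked up, you can look for its label in the output of launchctl list. The helper below is a small sketch (job_loaded is a made-up name); it relies on launchctl list printing the PID, last exit status, and label of each job in three columns.

```shell
#!/bin/bash
# Succeeds if a launchd job with the given label is loaded for the current user.
job_loaded() {
    # launchctl list prints one row per job: PID, last exit status, label.
    launchctl list | awk -v label="$1" '$3 == label { found = 1 } END { exit !found }'
}

# Usage (not run here):
#   job_loaded your.domain.s3sync && echo "job is loaded"
```

To stop watching the folder again, unload the job with launchctl unload ~/Library/LaunchAgents/your.domain.s3sync.plist.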


 

And you're all set!


 

Improvements


 

  1. Recursive triggering of the script
  2. More secure setting of the environment variables (currently contained in the script, which is only readable by your user account)
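For the first item, one possible direction is to poll the folder and compare a fingerprint of the whole tree, nested folders included, with the previous one. This is an untested idea rather than something I use; tree_fingerprint, sync_if_changed, and the state file are all made up for the example.

```shell
#!/bin/bash
# Print a fingerprint of a directory tree. It changes when a file anywhere
# below the directory is added, removed, resized, or gets a new timestamp.
tree_fingerprint() {
    # ls -l includes size and modification time; cksum condenses the listing.
    find "$1" -type f -exec ls -l {} + | sort | cksum
}

# Run a command only if the tree changed since the last recorded fingerprint.
# $1 = watched directory, $2 = state file (keep it outside the watched
# directory), remaining arguments = command to run
sync_if_changed() {
    local dir="$1" state="$2"
    shift 2
    local now
    now=$(tree_fingerprint "$dir")
    if [ -f "$state" ] && [ "$now" = "$(cat "$state")" ]; then
        return 0    # nothing changed, skip the sync
    fi
    # Record the new fingerprint only when the command succeeded.
    "$@" && printf '%s\n' "$now" > "$state"
}
```

A call such as sync_if_changed ~/backup ~/.s3sync-state ~/s3sync/upload.sh, run on a timer, would then upload only when something changed. Note that ls -l shows modification times to the minute, so an edit that keeps a file's size and lands within the same minute may be missed.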


 


 


 
