Moving Half a Million Database Tables to AWS Aurora (Part 2)

Quick recap: Migrate half a million database tables from a single bare metal server with 1 database to 101 database slices on AWS Aurora.

Wait, half a million database tables?! Answered in Part 1.

Plan

1. Stop the server, take an LVM snapshot.
2. Use mydumper to dump the snapshot to SQL files.
3. rysnc these to the new server.
4. Use myloader to load the SQL files into the new databases.

A more detailed view of our plan as seen in our migration kanban

Captain Not So Obvious

Why not setup the Aurora as a replica and then switch over?

Because our MariaDB server was a bare metal box outside of AWS. The read Replica docs imply that MySQL has to already be in AWS for that to work. If that’s not enough this doc says use mysqldump to start, then sync after. This doc also says use mysqldump. All signs point to nope.

Why not DMS?

Answered in Part 1.

Mostly, at the end of the day, because our hosted networks are already on AWS it was simply more cost effective to shut down our freemium site and migrate in one swoop than to have our whole team keep at this for weeks, possibly months.

Epilogue: What About Uploads?

Each book has media library files (GIF, PNG, JPG, EPUB, PDF, etc). A few days before the migration, we copied all files from the production server’s uploads/ directory using rsync:

rsync -avz -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" someuser@oldpressbooksdotcom:/path/to/uploads/ /path/to/uploads/ --progress

This process took about 10 hours.

Then, on migration day, we ran the same command again with the --delete option to update the new server with the latest files from the old server and remove any files that have been deleted on the old server:

rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" someuser@oldpressbooksdotcom:/path/to/uploads/ /path/to/uploads/ --progress

Much quicker! (around 7 minutes)

Launch it!

“If we get into the trees it could be rather disastrous, so we’ve got to hit the roses.” – Ken Carter

Scripts from Part 1 (read it already!) were modified to include Slack notifications:

notify() {
  read -d '' payLoad << EOF
  {
    "channel": "#operations",
    "username": "Pressbot",
    "icon_emoji": ":closed_book:",
    "text": "Slice \`${1}\` has been imported on AWS."
  }
EOF

  curl \
    --write-out %{http_code} \
    --silent \
    --output /dev/null \
    -X POST \
    -H 'Content-type: application/json' \
    --data "${payLoad}" "https://hooks.slack.com/services/<SLACK_WEBHOOK_ID>"
}

# Usage

notify $slice
Slack enhanced scripts.

In an effort to reduce downtime we imported slices as soon as they were transferred. Dumping was faster than imports.

Pressbot
Still slacking!

Ned working hard:

Ned
screen -r, control+a control+d, repeat

All while coding sprint tasks in between.

Things That Went Wrong

We noticed an embarrassing typo in the first few database slices we imported. We had to redo them because renaming a database with tens of thousands of tables in it is not obvious.

I ordered takeout from the wrong fish & chips shop. I had to take a subway 30 minutes to downtown to get it. (Psst Foodora, your geolocation feature sucks!)

Otherwise, nothing. We landed in the roses.

Timeline

  • 8:00: Migration started.
  • 10:40: Database migration started.
  • 19:10: Database migration completed!
  • 19:30: Migration completed.
  • Total time: 11 hours 30 minutes.

DONE. Exciting to turn the page on this. Thanks for reading.