Moving Half a Million Database Tables to AWS Aurora (Part 2)

Quick recap: Migrate half a million database tables from a single bare metal server with 1 database to 101 database slices on AWS Aurora.

Wait, half a million database tables?! Answered in Part 1.


1. Stop the server, take an LVM snapshot.
2. Use mydumper to dump the snapshot to SQL files.
3. rysnc these to the new server.
4. Use myloader to load the SQL files into the new databases.

A more detailed view of our plan as seen in our migration kanban

Captain Not So Obvious

Why not setup the Aurora as a replica and then switch over?

Because our MariaDB server was a bare metal box outside of AWS. The read Replica docs imply that MySQL has to already be in AWS for that to work. If that’s not enough this doc says use mysqldump to start, then sync after. This doc also says use mysqldump. All signs point to nope.

Why not DMS?

Answered in Part 1.

Mostly, at the end of the day, because our hosted networks are already on AWS it was simply more cost effective to shut down our freemium site and migrate in one swoop than to have our whole team keep at this for weeks, possibly months.

Epilogue: What About Uploads?

Each book has media library files (GIF, PNG, JPG, EPUB, PDF, etc). A few days before the migration, we copied all files from the production server’s uploads/ directory using rsync:

rsync -avz -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" someuser@oldpressbooksdotcom:/path/to/uploads/ /path/to/uploads/ --progress

This process took about 10 hours.

Then, on migration day, we ran the same command again with the --delete option to update the new server with the latest files from the old server and remove any files that have been deleted on the old server:

rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" someuser@oldpressbooksdotcom:/path/to/uploads/ /path/to/uploads/ --progress

Much quicker! (around 7 minutes)

Launch it!

“If we get into the trees it could be rather disastrous, so we’ve got to hit the roses.” – Ken Carter

Scripts from Part 1 (read it already!) were modified to include Slack notifications:

notify() {
  read -d '' payLoad << EOF
    "channel": "#operations",
    "username": "Pressbot",
    "icon_emoji": ":closed_book:",
    "text": "Slice \`${1}\` has been imported on AWS."

  curl \
    --write-out %{http_code} \
    --silent \
    --output /dev/null \
    -X POST \
    -H 'Content-type: application/json' \
    --data "${payLoad}" "<SLACK_WEBHOOK_ID>"

# Usage

notify $slice
Slack enhanced scripts.

In an effort to reduce downtime we imported slices as soon as they were transferred. Dumping was faster than imports.

Still slacking!

Ned working hard:

screen -r, control+a control+d, repeat

All while coding sprint tasks in between.

Things That Went Wrong

We noticed an embarrassing typo in the first few database slices we imported. We had to redo them because renaming a database with tens of thousands of tables in it is not obvious.

I ordered takeout from the wrong fish & chips shop. I had to take a subway 30 minutes to downtown to get it. (Psst Foodora, your geolocation feature sucks!)

Otherwise, nothing. We landed in the roses.


  • 8:00: Migration started.
  • 10:40: Database migration started.
  • 19:10: Database migration completed!
  • 19:30: Migration completed.
  • Total time: 11 hours 30 minutes.

DONE. Exciting to turn the page on this. Thanks for reading.

Moving Half a Million Database Tables to AWS Aurora (Part 1)

This post is about migrating to AWS.

Does It Scale?

At Pressbooks we use WordPress Multisite as a development platform. Pressbooks changes WordPress and makes every blog a book.

The prevailing wisdom of the day is that a relational database should have a manageable set of tables with lots of rows of data in them …and then there’s WordPress.

WP Multisite creates 10 database tables for every blog (or in our case: for every book.) Tables share IDs but they are not enforced at the database level. There are no foreign key constraints, no triggers, nor routines. It’s simple no frills MySQL.

Pressbooks dot com is currently running on a single bare metal server. (Our hosted instances are already on AWS) This server has a single MariaDB database with 60,000 books in it. When we do the math that’s over 600,000 tables in one database. Are you nuts?! Unusual? Horrible? Yet entirely possible, even plausible.

I’ve met people who worked at Automattic and from the stories, I heard WordPress dot com uses the same WP Multisite technology but instead of half a million tables it’s over a billion tables (probably not all in the same database though, more on that later).

I call this the Schrödinger’s Cat of database design because so long as we don’t look it’s alive?


As I said, we already host hundreds of WP Multisite networks on AWS. We build, manage, and deploy to our infrastructure using Terraform, Ansible, wp-cli and all things competent. We simply just, sort of, well, neglected to move our freemium site over because we were too busy.

The time has come.


We tried mysqldump. It was too slow. Our tests showed that a dump would take days.

Some colleagues recommended AWS DMS. It did not work.  Some reasons:

  • The AUTO_INCREMENT attribute is not migrated.
  • WP Multisite tables are prefixed wp_1_, wp_2_, wp_3_, ... MySQL considers the underscore a one character wildcard when used in LIKE queries, DMS provided no way to escape it for table filters.
  • Unexplained crashing on anything less than a c4.large (failed last Error The task stopped abnormally Stop Reason RECOVERABLE_ERROR Error Level RECOVERABLE)
  • Mostly, the fastest migration we could get going, running 4 tasks in parallel, was an ETA of 10 days

We asked for help. Ned had a conference call with a reputable consulting firm and they gave us a quote: $34K USD + travel & on-site expenses.

Coffee spitting

Next, our research led us to mydumper. From the README:

== What is mydumper? Why? ==

  • Parallelism (hence, speed) and performance (avoids expensive character set conversion routines, efficient code overall)
  • Easier to manage output (separate files for tables, dump metadata, etc, easy to view/parse data)
  • Consistency – maintains snapshot across all threads, provides accurate master and slave log positions, etc
  • Manageability – supports PCRE for specifying database and tables inclusions and exclusions

So far so good…

== How to build it? ==

Jerry Seinfeld leaving

Just kidding. It turns out we don’t have to build mydumper. On Ubuntu sudo apt install mydumper works fine. Similar command using yum on CentOS.

Our tests conclude that mydumper finishes in hours instead of days.


It is our opinion that this kind of problem is better suited for a document-oriented database. Given that this is the database design we inherited, there’s not much we can do about it, so we’ll try our best with what we’ve got. ¯\_(ツ)_/¯

At a billion tables, Automattic has already established its own internal best practices with plugins like HypderDB. Unfortunately HyperDB doesn’t have Composer support and doesn’t look maintained. LudicrousDB, a Composer compatible drop-in that works with our existing tech stack, to the rescue.

LudicrousDB is an advanced database interface for WordPress that supports replication, fail-over, load balancing, and partitioning, based on Automattic’s HyperDB drop-in.

With LudicrousDB tested and working, we are moving towards a 101 slice approach. 1 slice for core tables and 100 slices for books.

The idea is to use the last two digits of a book ID to pick one of 100 slices. If this becomes unmanageable in the future (important to remember that we already have over half a million tables in 1 database and things are fine), we can change the splitting algorithm by adding a condition to use the last X digits on books with IDs bigger than Y.


For informational purposes only. Read the snippets and reason about them. Copy/paste at your own peril.

LudicrousDB  Callback

 * Slices
 * We can predict what slice a blog is in by looking 
 * at the last two digits of the id. Examples:
 * + blog_id: 9, in db09
 * + blog_id: 74, in db74
 * + blog_id: 999989, in db89
 * + blog_id: 9200, in db00
 * @param $query
 * @param \LudicrousDB $wpdb
 * @return string
function pb_db_callback( $query, $wpdb ) {
  if ( preg_match( "/^{$wpdb->base_prefix}\d+_/i", $wpdb->table ) ) {
    $last_two_digits = (int) substr( $wpdb->blogid, -2 );
    $db = sprintf( 'db%02d', $last_two_digits ); // db00, db01, db02, ..., db99
    return $db;
  } else {
    return 'global';
$wpdb->add_callback( 'pb_db_callback' );

Export DB Into 101 Slices:


# This script will CREATE 101 directories in current 
# working directory, you have been warned!


sudo mydumper --regex="^${db}\.wp_[a-zA-Z]+.*" --database="${db}" --outputdir="core" --build-empty-files
for ((i=0; i<=99; i++)); do
  ii=`printf %02d $i`
  sudo mydumper --regex="^${db}\.(wp_${i}_|wp_\d+${ii}_).*" --database="${db}" --outputdir="${ii}" --build-empty-files

Import 101 Slices:


# This script will READ 101 directories in current 
# working directory


sudo myloader --directory="core" --database="${db}" --overwrite-tables
for ((i=0; i<=99; i++)); do
  ii=`printf %02d $i`
  sudo myloader --directory="${ii}" --database="${db}_${ii}" --overwrite-tables

Did it work?

To be continued in Part 2…

Bonus tips:

Because Pressbooks has so many MySQL tables, the Clients I use are always getting stuck or freezing. Here are some tricks I use to keep sane:

  • Don’t let MySQL Workbench load the table schemas. Set up your GUI so that schemas are in a separate tab, disable autoloading, autocomplete, etc. (Edit ⇨ Preferences ⇨ SQL Editor)
  • Disable MySQL CLI auto-completion with --disable-auto-rehash

Rethinking Book Themes

Pressbooks has built book themes the same way for quite a while, with the only significant change being the switch from CSS to SCSS in Pressbooks 3.0. Shown below is the structure for one of our open-source themes, Clarke (with a couple of omissions that aren’t relevant to this post).

├── export
│   ├── epub
│   │   ├── images
│   │   │   ├── asterisk.png
│   │   │   ├── em-dash.png
│   │   ├── style.scss
│   ├── prince
│   │   ├── images
│   │   │   ├── em-dash.png
│   │   ├── script.js
│   │   ├── style.scss
├── _fonts-epub.scss
├── _fonts-prince.scss
├── _fonts-web.scss
├── _mixins.scss
├── functions.php
├── style.css
├── style.scss
├── theme-information.php

We’re in the midst of a significant re-think of how we build themes, and here’s what it looks like so far.

├── assets
│   ├── images
│   │   ├── em-dash.png
│   │   ├── epub
│   │   │   ├── asterisk.png
│   ├── scripts
│   │   ├── prince
│   │   │   ├── script.js
│   ├── styles
│   │   ├── epub
│   │   │   ├── _fonts.scss
│   │   │   ├── style.scss
│   │   ├── prince
│   │   │   ├── _fonts.scss
│   │   │   ├── style.scss
│   │   ├── web
│   │   │   ├── _fonts.scss
│   │   │   ├── style.scss
│   │   ├── components
│   │   │   ├── _accessibility.scss
│   │   │   ├── _alignment.scss
│   │   │   ├── _colors.scss
│   │   │   ├── _elements.scss
│   │   │   ├── _elements-special.scss
│   │   │   ├── _media.scss
│   │   │   ├── _structure.scss
│   │   │   ├── _titles.scss
│   │   │   ├── _toc.scss
├── functions.php
├── style.css
├── theme-information.php

All three core outputs now keep their assets in the assets folder, with shared assets going in the directory roots of assets/fonts (when needed), assets/imagesassets/scripts and assets/styles. We used to keep the uncompiled web stylesheet in the theme root along with a compiled version, but this caused some confusion. The style.css file in the theme root was never loaded in the web view; since Pressbooks 3.0, we’ve always used a freshly compiled version which is generated when the user changes themes or changes their theme options. But WordPress requires that theme information be stored in the file header of style.css. Our practice moving forward will be to use the style.css file in the root for theme information only, and keep a style.scss file for the web book in assets/styles/web/.

Each of the files in assets/styles/components imports a global components file or file(s) and a variables file for the relevant item(s) from the Pressbooks plugin’s assets/book/styles/ directory (which is loaded by our SCSS compiler). For example, assets/styles/components/_elements.scss might contain the following:

// Elements

$orphans: 3 !default;

// Change variables above this line, using the !default flag to allow overrides.
@import 'variables/elements';

// Add custom SCSS below these imports and includes.
@import 'components/elements/links';
@import 'components/elements/blockquotes';
@import 'components/elements/body';
@import 'components/elements/headings';
@import 'components/elements/lists';
@import 'components/elements/miscellaneous';
@import 'components/elements/paragraphs';
@import 'components/elements/tables';
@include tables();

In this hypothetical theme, all of the default element styles have been imported but the theme developer has changed the orphan property from 1 to 3. (Using the SCSS !default flag allows a variable like this to be overridden by the book user once we overhaul our theme options.)

All of these component files are imported into the EPUB, PDF and web style.scss files, like so:

$type: 'prince';

@import 'fonts';
@import '../components/alignment';
@import '../components/colors';
@import '../components/elements';
@import '../components/specials';
@import '../components/media';
@import '../components/titles';
@import '../components/structure';
@import '../components/toc';

So now we have a book theme with a more coherent structure, comprehensive default variables and an easy method to override them, and lots of possibilities.

We have lots to do to move forward with the implementation of this new theme structure (including backwards compatibility). You can follow along on the theme-structure branch to observe or participate in the implementation process. If you want to get involved in the discussion of these changes, feel free to join in on the relevant GitHub issue!