Pressbooks and Gutenberg

We’ve been closely following the development of Gutenberg, WordPress’ new block-based editor, since its earliest technical prototypes. With the impending release of Gutenberg in WordPress 5.0 (currently scheduled for November 19), we want to provide an update to the Pressbooks community on our plans for Gutenberg integration.

In brief: we have deactivated Gutenberg for Pressbooks users, so upgrading to WordPress 5.0 will not change your editing experience. We’ve done this in Pressbooks itself, so additional plugins like Classic Editor are not required. We’ve made this decision for several reasons.
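
For those curious about the mechanics, WordPress 5.0 provides a filter that lets a plugin opt out of the block editor; a minimal sketch of the approach (not our exact implementation) looks like this:

&lt;?php
/**
 * Restore the classic editor by opting out of the block editor entirely.
 * WordPress 5.0 checks this filter before loading Gutenberg for a post.
 * (A minimal sketch of the approach, not our exact implementation.)
 */
add_filter( 'use_block_editor_for_post', '__return_false', 100 );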

1. Gutenberg has continued to change significantly up to the projected release date, without a sufficient period of API and UI stability.

At Pressbooks, our small development team accomplishes a lot with limited resources, but ensuring compatibility with Gutenberg in time for its November 19th launch has proven to be beyond our capacity. The lack of clarity in Gutenberg’s development process has hindered us from integrating Gutenberg into our roadmap. We are now two weeks from its production release, and Gutenberg’s API freeze is not yet complete. We’ve been tracking blocking issues over the last year and a half and have tried to contribute where possible, but ongoing API and user interface changes have made it difficult for us to keep on top of things without neglecting Pressbooks core development, and have made us hesitant to invest our limited resources in building on a codebase that has not yet stabilized. Once WordPress 5.0 is released, we will be able to evaluate a stable version of Gutenberg and map out a plan for integration.

2. Gutenberg lacks functionality that Pressbooks relies upon.

Because Pressbooks is used to produce content for the web and for export, we leverage WordPress custom post statuses to determine whether chapters and other content appear on the web, in export files, or both. (Custom post statuses do not have a full-featured public API in WordPress, but they were the best solution for our use case. This is not the fault of the Gutenberg team—arguably, it’s our fault—but that’s where we are.) We also use a custom post submission UI to more clearly reflect content status for our users. At this point in time, Gutenberg does not support custom post statuses, which is a fundamental issue for us (and others who leverage this functionality). Furthermore, the new post publish panel is not easily modified, and we feel that it’s important to maintain continuity of this interface for our users. Following the release of WordPress 5.0, we will engage on these specific issues with the WordPress and Gutenberg development teams so that we can provide our users with a familiar experience in this area.
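
For background, custom post statuses are registered with register_post_status(); here's a minimal sketch (the 'web-only' status name and labels are illustrative, not our actual code):

&lt;?php
/**
 * A sketch of registering a custom post status, e.g. for a chapter that
 * should appear on the web but be excluded from exports. Gutenberg's
 * publish panel currently offers no way to surface a status like this.
 */
add_action( 'init', function () {
  register_post_status( 'web-only', [
    'label'                     => 'Web Only',
    'public'                    => true,
    'show_in_admin_all_list'    => true,
    'show_in_admin_status_list' => true,
  ] );
} );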

3. Gutenberg does not meet Pressbooks’ accessibility standards.

WordPress Accessibility team lead Rian Rietveld’s resignation in early October and the team’s recent report on Gutenberg have brought widespread attention to the accessibility shortcomings of the Gutenberg project. These issues are not new, as some have implied; Rietveld documented many in March of this year, and tests of Gutenberg by Léonie Watson and Sina Bahram highlighted significant problems around the same time. Rietveld’s resignation, the team’s report, and the conversations surrounding them have highlighted flaws in the Gutenberg development process and have underscored how hard it is to retrofit an existing interface if it was not designed with accessibility and inclusivity in mind. We’re currently making accessibility improvements to elements of our core UI that date back to the early days of our product—so we know this from firsthand experience! More importantly, though, we’re continually seeking to expand our own shared understanding of accessibility and inclusive design principles so that new features are inclusive from day one. We’re grateful to Jess Mitchell, Jonathan Hung, and the rest of the team at OCAD’s Inclusive Design Research Centre for their guidance as we work towards these goals.

We acknowledge the assessment of the WordPress Accessibility team, who write that “based on [Gutenberg’s] current status, we cannot recommend that anybody who has a need for assistive technology allow it to be in use on any sites they need to use at this time.” We’re grateful to the many accessibility experts and advocates who have shone a spotlight on accessibility in WordPress, particularly the volunteers who form the WordPress Accessibility team. We’re also grateful to Rachel Cherry and the WPCampus organization, who are funding a much-needed accessibility audit of Gutenberg. Our focus is on continuously improving Pressbooks’ accessibility, and all signs suggest that Gutenberg in its current state would undermine this goal, so we will await the results of WPCampus’ audit following the release of WordPress 5.0 and evaluate our next steps at that time.

Closing Thoughts

There are parts of the Gutenberg project that we’re pretty excited about. The ability to add and edit complex elements visually could be extremely helpful for book production, allowing educational textboxes and other specialized content to be edited visually while ensuring clean, semantic output on the front end. We’d love to be able to allow shortcodes to be edited visually, which Gutenberg supports. And we’re eager to see what comes of Gutenberg’s Annotation API, which could support expanded integrations with tools like Hypothesis. However, knowing what we know at this point and for the reasons articulated above, we feel that a “wait and see” approach is most prudent for us and our users. We are committed to tracking each of these issues and doing what we can to ensure that authors using our software have the best available tools for the job at hand. We look forward to a future where Gutenberg can be used for making books.

Moving Half a Million Database Tables to AWS Aurora (Part 2)

Quick recap: Migrate half a million database tables from a single bare metal server with 1 database to 101 database slices on AWS Aurora.

Wait, half a million database tables?! Answered in Part 1.

Plan

  1. Stop the server, take an LVM snapshot (see the sketch below).
  2. Use mydumper to dump the snapshot to SQL files.
  3. rsync these to the new server.
  4. Use myloader to load the SQL files into the new databases.
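
For steps 1 and 2, the snapshot dance looked roughly like this (a sketch only: the volume group, snapshot size, and paths are placeholders, not our actual configuration):

#!/bin/bash
# Sketch only: device names, sizes, and paths are placeholders.

sudo systemctl stop mysql                    # 1. Stop the server...
sudo lvcreate --snapshot --size 100G \
  --name mysql-snap /dev/vg0/mysql           # ...take an LVM snapshot...
sudo systemctl start mysql                   # ...bring production back up.

# 2. Mount the snapshot and point a throwaway MySQL instance at its datadir;
# mydumper then dumps from that instance (it talks to a running server,
# not to raw files).
sudo mount /dev/vg0/mysql-snap /mnt/mysql-snap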

Captain Not So Obvious

Why not set up Aurora as a replica and then switch over?

Because our MariaDB server was a bare metal box outside of AWS. The Read Replica docs imply that MySQL has to already be in AWS for that to work. If that’s not enough, this doc says to use mysqldump to start, then sync after. This doc also says to use mysqldump. All signs point to nope.

Why not DMS?

Answered in Part 1.

Mostly, at the end of the day, because our hosted networks are already on AWS, it was simply more cost-effective to shut down our freemium site and migrate in one swoop than to have our whole team keep at this for weeks, possibly months.

Epilogue: What About Uploads?

Each book has media library files (GIF, PNG, JPG, EPUB, PDF, etc.). A few days before the migration, we copied all files from the production server’s uploads/ directory using rsync:

rsync -avz -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" someuser@oldpressbooksdotcom:/path/to/uploads/ /path/to/uploads/ --progress

This process took about 10 hours.

Then, on migration day, we ran the same command again with the --delete option to update the new server with the latest files from the old server and remove any files that had been deleted on the old server:

rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" someuser@oldpressbooksdotcom:/path/to/uploads/ /path/to/uploads/ --progress

Much quicker! (around 7 minutes)

Launch it!

“If we get into the trees it could be rather disastrous, so we’ve got to hit the roses.” – Ken Carter

Scripts from Part 1 (read it already!) were modified to include Slack notifications:

# Post a Slack notification when a slice has been imported
notify() {
  read -r -d '' payLoad << EOF
  {
    "channel": "#operations",
    "username": "Pressbot",
    "icon_emoji": ":closed_book:",
    "text": "Slice \`${1}\` has been imported on AWS."
  }
EOF

  curl \
    --write-out '%{http_code}' \
    --silent \
    --output /dev/null \
    -X POST \
    -H 'Content-type: application/json' \
    --data "${payLoad}" "https://hooks.slack.com/services/<SLACK_WEBHOOK_ID>"
}

# Usage

notify "$slice"

In an effort to reduce downtime, we imported slices as soon as they were transferred; dumping was faster than importing.
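
Roughly, the per-slice loop looked like the sketch below (the host, paths, and database names are placeholders; the real scripts are in Part 1):

#!/bin/bash
# Sketch only: placeholder host, paths, and database names.

for ((i=0; i<=99; i++)); do
  ii=`printf %02d $i`
  # Pull a slice's dump directory as soon as dumping finishes on the old box...
  rsync -avz someuser@oldpressbooksdotcom:/path/to/dumps/${ii}/ /path/to/dumps/${ii}/
  # ...import it right away...
  sudo myloader --directory="/path/to/dumps/${ii}" --database="new_database_name_${ii}" --overwrite-tables
  # ...and ping Slack.
  notify "${ii}"
done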

Pressbot

[Photo: Ned working hard.]

All while coding sprint tasks in between.

Things That Went Wrong

We noticed an embarrassing typo in the first few database slices we imported. We had to redo them, because there’s no straightforward way to rename a database with tens of thousands of tables in it.

I ordered takeout from the wrong fish & chips shop and had to ride the subway 30 minutes downtown to get it. (Psst, Foodora: your geolocation feature sucks!)

Otherwise, nothing. We landed in the roses.

Timeline

  • 8:00: Migration started.
  • 10:40: Database migration started.
  • 19:10: Database migration completed!
  • 19:30: Migration completed.
  • Total time: 11 hours 30 minutes.

DONE. Exciting to turn the page on this. Thanks for reading.

Moving Half a Million Database Tables to AWS Aurora (Part 1)

This post is about migrating Pressbooks.com to AWS.

Does It Scale?

At Pressbooks we use WordPress Multisite as a development platform. Pressbooks changes WordPress and makes every blog a book.

The prevailing wisdom of the day is that a relational database should have a manageable set of tables with lots of rows of data in them …and then there’s WordPress.

WP Multisite creates 10 database tables for every blog (or, in our case, for every book). Tables share IDs, but these relationships are not enforced at the database level: there are no foreign key constraints, no triggers, no routines. It’s simple, no-frills MySQL.
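
For a book with blog ID 2, those 10 tables are the standard per-site WordPress tables:

wp_2_posts            wp_2_postmeta
wp_2_comments         wp_2_commentmeta
wp_2_terms            wp_2_termmeta
wp_2_term_taxonomy    wp_2_term_relationships
wp_2_options          wp_2_links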

Pressbooks dot com is currently running on a single bare metal server. (Our hosted instances are already on AWS.) This server has a single MariaDB database with 60,000 books in it. When we do the math, that’s over 600,000 tables in one database. Are you nuts?! Unusual? Horrible? Yet entirely possible, even plausible.

I’ve met people who worked at Automattic, and from the stories I heard, WordPress dot com uses the same WP Multisite technology, but instead of half a million tables it’s over a billion (probably not all in the same database, though; more on that later).

I call this the Schrödinger’s Cat of database design: so long as we don’t look, it’s alive?

Prerequisites

As I said, we already host hundreds of WP Multisite networks on AWS. We build, manage, and deploy to our infrastructure using Terraform, Ansible, wp-cli, and all things competent. We just, sort of, well, neglected to move our freemium site over because we were too busy.

The time has come.

Research

We tried mysqldump. It was too slow. Our tests showed that a dump would take days.

Some colleagues recommended AWS DMS. It did not work. Some reasons:

  • The AUTO_INCREMENT attribute is not migrated.
  • WP Multisite tables are prefixed wp_1_, wp_2_, wp_3_, ... MySQL treats the underscore as a single-character wildcard in LIKE patterns (so wp_1_% also matches tables like wp_12_posts; you need wp\_1\_% to match only blog 1’s tables), and DMS provided no way to escape it in table filters.
  • Unexplained crashing on anything less than a c4.large (the task failed with Last Error: “The task stopped abnormally”, Stop Reason: RECOVERABLE_ERROR, Error Level: RECOVERABLE)
  • Mostly: the fastest migration we could get going, running 4 tasks in parallel, had an ETA of 10 days.

We asked for help. Ned had a conference call with a reputable consulting firm and they gave us a quote: $34K USD + travel & on-site expenses.

[GIF: spitting out coffee]

Next, our research led us to mydumper. From the README:

== What is mydumper? Why? ==

  • Parallelism (hence, speed) and performance (avoids expensive character set conversion routines, efficient code overall)
  • Easier to manage output (separate files for tables, dump metadata, etc, easy to view/parse data)
  • Consistency – maintains snapshot across all threads, provides accurate master and slave log positions, etc
  • Manageability – supports PCRE for specifying database and tables inclusions and exclusions

So far so good…

== How to build it? ==

[GIF: Jerry Seinfeld leaving]

Just kidding. It turns out we don’t have to build mydumper: on Ubuntu, sudo apt install mydumper works fine, and there’s a similar command using yum on CentOS.

Our tests showed that mydumper finishes in hours instead of days.

Plan

It is our opinion that this kind of problem is better suited to a document-oriented database. Given that this is the database design we inherited, there’s not much we can do about it, so we’ll try our best with what we’ve got. ¯\_(ツ)_/¯

At a billion tables, Automattic has already established its own internal best practices with plugins like HyperDB. Unfortunately, HyperDB doesn’t have Composer support and doesn’t look maintained. LudicrousDB, a Composer-compatible drop-in that works with our existing tech stack, to the rescue.

LudicrousDB is an advanced database interface for WordPress that supports replication, fail-over, load balancing, and partitioning, based on Automattic’s HyperDB drop-in.

With LudicrousDB tested and working, we are moving towards a 101-slice approach: 1 slice for core tables and 100 slices for books.

The idea is to use the last two digits of a book ID to pick one of 100 slices. If this becomes unmanageable in the future (it’s important to remember that we already have over half a million tables in 1 database and things are fine), we can change the splitting algorithm by adding a condition to use the last X digits on books with IDs bigger than Y.

Code

For informational purposes only. Read the snippets and reason about them. Copy/paste at your own peril.

LudicrousDB Callback

/**
 * Slices
 *
 * We can predict what slice a blog is in by looking 
 * at the last two digits of the id. Examples:
 *
 * + blog_id: 9, in db09
 * + blog_id: 74, in db74
 * + blog_id: 999989, in db89
 * + blog_id: 9200, in db00
 *
 * @param $query
 * @param \LudicrousDB $wpdb
 *
 * @return string
 */
function pb_db_callback( $query, $wpdb ) {
  if ( preg_match( "/^{$wpdb->base_prefix}\d+_/i", $wpdb->table ) ) {
    $last_two_digits = (int) substr( $wpdb->blogid, -2 );
    $db = sprintf( 'db%02d', $last_two_digits ); // db00, db01, db02, ..., db99
    return $db;
  } else {
    return 'global';
  }
}
$wpdb->add_callback( 'pb_db_callback' );

Export DB Into 101 Slices:

#!/bin/bash

# This script will CREATE 101 directories in current 
# working directory, you have been warned!

db='old_database_name'

sudo mydumper --regex="^${db}\.wp_[a-zA-Z]+.*" --database="${db}" --outputdir="core" --build-empty-files
for ((i=0; i<=99; i++)); do
  ii=`printf %02d $i`
  sudo mydumper --regex="^${db}\.(wp_${i}_|wp_\d+${ii}_).*" --database="${db}" --outputdir="${ii}" --build-empty-files
done

Import 101 Slices:

#!/bin/bash

# This script will READ 101 directories in current 
# working directory

db='new_database_name'

sudo myloader --directory="core" --database="${db}" --overwrite-tables
for ((i=0; i<=99; i++)); do
  ii=`printf %02d $i`
  sudo myloader --directory="${ii}" --database="${db}_${ii}" --overwrite-tables
done

Did it work?

To be continued in Part 2…

Bonus tips:

Because Pressbooks has so many MySQL tables, the clients I use are always getting stuck or freezing. Here are some tricks I use to keep sane:

  • Don’t let MySQL Workbench load the table schemas. Set up your GUI so that schemas are in a separate tab, disable autoloading, autocomplete, etc. (Edit ⇨ Preferences ⇨ SQL Editor)
  • Disable MySQL CLI auto-completion with --disable-auto-rehash (example below)
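
For instance, connecting with auto-rehash disabled (host and user are placeholders):

mysql --disable-auto-rehash -h somehost -u someuser -p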

Rethinking Book Themes

Pressbooks has built book themes the same way for quite a while, with the only significant change being the switch from CSS to SCSS in Pressbooks 3.0. Shown below is the structure for one of our open-source themes, Clarke (with a couple of omissions that aren’t relevant to this post).

├── export
│   ├── epub
│   │   ├── images
│   │   │   ├── asterisk.png
│   │   │   └── em-dash.png
│   │   └── style.scss
│   └── prince
│       ├── images
│       │   └── em-dash.png
│       ├── script.js
│       └── style.scss
├── _fonts-epub.scss
├── _fonts-prince.scss
├── _fonts-web.scss
├── _mixins.scss
├── functions.php
├── style.css
├── style.scss
└── theme-information.php

We’re in the midst of a significant re-think of how we build themes, and here’s what it looks like so far.

├── assets
│   ├── images
│   │   ├── em-dash.png
│   │   └── epub
│   │       └── asterisk.png
│   ├── scripts
│   │   └── prince
│   │       └── script.js
│   └── styles
│       ├── epub
│       │   ├── _fonts.scss
│       │   └── style.scss
│       ├── prince
│       │   ├── _fonts.scss
│       │   └── style.scss
│       ├── web
│       │   ├── _fonts.scss
│       │   └── style.scss
│       └── components
│           ├── _accessibility.scss
│           ├── _alignment.scss
│           ├── _colors.scss
│           ├── _elements.scss
│           ├── _elements-special.scss
│           ├── _media.scss
│           ├── _structure.scss
│           ├── _titles.scss
│           └── _toc.scss
├── functions.php
├── style.css
└── theme-information.php

All three core outputs now keep their assets in the assets folder, with shared assets going in the directory roots of assets/fonts (when needed), assets/images, assets/scripts, and assets/styles. We used to keep the uncompiled web stylesheet in the theme root along with a compiled version, but this caused some confusion. The style.css file in the theme root was never loaded in the web view; since Pressbooks 3.0, we’ve always used a freshly compiled version, regenerated whenever the user changes themes or changes their theme options. But WordPress requires that theme information be stored in the file header of style.css. Our practice moving forward will be to use the style.css file in the root for theme information only, and keep a style.scss file for the web book in assets/styles/web/.
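
In other words, the root style.css shrinks to little more than the header comment WordPress reads (the field values below are illustrative):

/*
Theme Name: Clarke
Description: An open-source book theme for Pressbooks.
Version: 2.0
Author: Pressbooks (Book Oven Inc.)
Text Domain: pressbooks-clarke
*/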

Each of the files in assets/styles/components imports one or more global component files and a variables file for the relevant item(s) from the Pressbooks plugin’s assets/book/styles/ directory (which is loaded by our SCSS compiler). For example, assets/styles/components/_elements.scss might contain the following:

// Elements

$orphans: 3 !default;

// Change variables above this line, using the !default flag to allow overrides.
@import 'variables/elements';

// Add custom SCSS below these imports and includes.
@import 'components/elements/links';
@import 'components/elements/blockquotes';
@import 'components/elements/body';
@import 'components/elements/headings';
@import 'components/elements/lists';
@import 'components/elements/miscellaneous';
@import 'components/elements/paragraphs';
@import 'components/elements/tables';
@include tables();

In this hypothetical theme, all of the default element styles have been imported, but the theme developer has changed the orphans value from 1 to 3. (Using the SCSS !default flag allows a variable like this to be overridden by the book user once we overhaul our theme options.)
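
As a quick illustration of how !default behaves (hypothetical values):

$orphans: 2;        // Set first, e.g. by a future theme-options layer...
@import 'elements'; // ...so the component's own $orphans: 3 !default; is skipped.
// $orphans is still 2 here.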

All of these component files are imported into the EPUB, PDF and web style.scss files, like so:

$type: 'prince';

@import 'fonts';
@import '../components/alignment';
@import '../components/colors';
@import '../components/elements';
@import '../components/elements-special';
@import '../components/media';
@import '../components/titles';
@import '../components/structure';
@import '../components/toc';

So now we have a book theme with a more coherent structure, comprehensive default variables and an easy method to override them, and lots of possibilities.

We have lots to do to move forward with the implementation of this new theme structure (including backwards compatibility). You can follow along on the theme-structure branch to observe or participate in the implementation process. If you want to get involved in the discussion of these changes, feel free to join in on the relevant GitHub issue!