Blog

Pressbooks 4.4.0, Pressbooks Book 1.11.0, Clarke 2.1.0, and Donham 1.7.0

We tagged Pressbooks 4.4.0, Pressbooks Book 1.11.0, Clarke 2.1.0, and Donham 1.7.0 on GitHub today and deployed them across our hosted networks. Here’s what’s changed:

Pressbooks 4.4.0

NOTICE: Pressbooks >= 4.4 requires WordPress 4.9.

  • [FEATURE] You can now assign Thema subject categories to your book on the Book Information page (see #978).
  • [FEATURE] Part slugs are now editable (props to @colomet for the suggestion; see 3f5eca2).
  • [CORE ENHANCEMENT] Pressbooks now uses WordPress’ included CodeMirror scripts and styles for our Custom Styles editor (see #980).
  • [CORE ENHANCEMENT] Added the pb_global_components_path filter which lets book themes override the global components path to point to their own bundled components libraries (see #982).
  • [CORE ENHANCEMENT] Added the pb_pre_export action to allow tweaks prior to an export routine (see 5302eea).
  • [CORE ENHANCEMENT] Our app() function now matches Laravel 5.4’s function signature (see cdcb9e8).
  • [FIX] Importing a Word document with multiple images now works properly (props to @rootl for the bug report; see #288 and #977).
  • [FIX] Chapters will now correctly inherit their book’s license in the API (see #979).
  • [FIX] Chapters will no longer show raw content in the API if they are password-protected (see #975).
  • [FIX] Uploading an image to the user catalog no longer causes an error (props to @emasters for the bug report; see #983).

Pressbooks Book 1.11.0

  • [FEATURE] Add parameter to pressbooks_copyright_license() to allow hiding custom copyright license (see #50).
  • [CORE ENHANCEMENT] Remove WordPress generator meta tag (see 6c621ad).

Clarke 2.1.0

  • Preparations for Book 2.0 compatibility.

Donham 1.7.0

  • Preparations for Book 2.0 compatibility.

Moving Half a Million Database Tables to AWS Aurora (Part 2)

Quick recap: Migrate half a million database tables from a single bare metal server with 1 database to 101 database slices on AWS Aurora.

Wait, half a million database tables?! Answered in Part 1.

Plan

1. Stop the server, take an LVM snapshot.
2. Use mydumper to dump the snapshot to SQL files.
3. rsync these to the new server.
4. Use myloader to load the SQL files into the new databases.

A more detailed view of our plan as seen in our migration kanban

Captain Not So Obvious

Why not set up Aurora as a replica and then switch over?

Because our MariaDB server was a bare metal box outside of AWS. The read replica docs imply that the MySQL server has to already be in AWS for that to work. If that’s not enough, this doc says to use mysqldump to start, then sync after. This doc also says to use mysqldump. All signs point to nope.

Why not DMS?

Answered in Part 1.

Mostly, at the end of the day, because our hosted networks are already on AWS, it was simply more cost-effective to shut down our freemium site and migrate in one fell swoop than to have our whole team keep at this for weeks, possibly months.

Epilogue: What About Uploads?

Each book has media library files (GIF, PNG, JPG, EPUB, PDF, etc). A few days before the migration, we copied all files from the production server’s uploads/ directory using rsync:

rsync -avz -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" someuser@oldpressbooksdotcom:/path/to/uploads/ /path/to/uploads/ --progress

This process took about 10 hours.

Then, on migration day, we ran the same command again with the --delete option to update the new server with the latest files from the old server and remove any files that had since been deleted on the old server:

rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" someuser@oldpressbooksdotcom:/path/to/uploads/ /path/to/uploads/ --progress

Much quicker! (around 7 minutes)

Launch it!

“If we get into the trees it could be rather disastrous, so we’ve got to hit the roses.” – Ken Carter

Scripts from Part 1 (read it already!) were modified to include Slack notifications:

notify() {
  read -d '' payLoad << EOF
  {
    "channel": "#operations",
    "username": "Pressbot",
    "icon_emoji": ":closed_book:",
    "text": "Slice \`${1}\` has been imported on AWS."
  }
EOF

  curl \
    --write-out %{http_code} \
    --silent \
    --output /dev/null \
    -X POST \
    -H 'Content-type: application/json' \
    --data "${payLoad}" "https://hooks.slack.com/services/<SLACK_WEBHOOK_ID>"
}

# Usage

notify "$slice"
Slack enhanced scripts.

In an effort to reduce downtime, we imported slices as soon as they were transferred. Dumping was faster than importing.

Pressbot
Still slacking!

Ned working hard:

Ned
screen -r, control+a control+d, repeat

All while coding sprint tasks in between.

Things That Went Wrong

We noticed an embarrassing typo in the first few database slices we imported. We had to redo them because renaming a database with tens of thousands of tables in it is not obvious.

I ordered takeout from the wrong fish & chips shop. I had to take a subway 30 minutes to downtown to get it. (Psst Foodora, your geolocation feature sucks!)

Otherwise, nothing. We landed in the roses.

Timeline

  • 8:00: Migration started.
  • 10:40: Database migration started.
  • 19:10: Database migration completed!
  • 19:30: Migration completed.
  • Total time: 11 hours 30 minutes.
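
The totals above can be double-checked with a little shell arithmetic. This is only a sketch: the helper names to_minutes and elapsed are made up for illustration, and it assumes both clock times fall on the same day.

```bash
#!/usr/bin/env bash

# Convert an H:MM or HH:MM clock time to minutes since midnight.
# The 10# prefix forces base 10 so values like "08" are not read as octal.
to_minutes() {
  local h=${1%%:*} m=${1##*:}
  echo $((10#$h * 60 + 10#$m))
}

# Print the elapsed time between two same-day clock times.
elapsed() {
  local diff=$(( $(to_minutes "$2") - $(to_minutes "$1") ))
  printf '%d hours %d minutes\n' $((diff / 60)) $((diff % 60))
}

elapsed 8:00 19:30   # whole migration: 11 hours 30 minutes
elapsed 10:40 19:10  # database portion only: 8 hours 30 minutes
```

By the same arithmetic, the database portion accounts for 8 hours 30 minutes of the 11 hours 30 minutes total.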

DONE. Exciting to turn the page on this. Thanks for reading.

Pressbooks 4.3.5 and Pressbooks Book 1.10.5

We tagged Pressbooks 4.3.5 and Pressbooks Book 1.10.5 on GitHub today and are now deploying them across our hosted networks. Here’s what’s changed:

Pressbooks 4.3.5

NOTICE: Pressbooks >= 4.3.3 requires WordPress 4.8.2.
NOTICE: Users of the Pressbooks Custom CSS theme must upgrade to Pressbooks Custom CSS 1.0 for compatibility with Pressbooks >= 4.3.0.

  • [CORE ENHANCEMENT] Use Laravel Container instead of Pimple as our service container; add Laravel Blade support for future templated outputs (see #831, #962, and #970).
  • [FIX] Content imported from EPUB is now ordered by spine order instead of manifest order (props to @hakkim-pits; see #442 and #968).
  • [FIX] Custom styles are no longer sanitized in ways that improperly encode characters (see #972).
  • [FIX] Sanitize body font size PDF theme option as a float instead of an integer to allow more size options (see #969).
  • [FIX] home_url is now used instead of site_url when linking to front-end content (see #971; reference: roots/bedrock#316).
  • [FIX] Shortcodes will now be cloned as is to preserve more footnote and LaTeX data (see #973).
  • [FIX] Special characters in a book title will no longer lead to filename issues under certain circumstances (see #974).

Pressbooks Book 1.10.5

  • [FIX] Added cache busting to ensure that custom styles are loaded after save (see #46).

Seeking: Junior Front-end Designer/Developer for 2 month+ contract

We’re looking for a junior front-end web designer/developer for a 2 month contract (in Montreal) that could lead into a full-time job with Pressbooks, an open source web-based book publishing platform.

Ideally, you’ve worked with WordPress, know your way around CSS and SASS, and have a particular interest in books, reading, & writing. We hope you’ve got some good design sense, and think designing real print and web books using HTML & CSS sounds pretty cool. We are a small, diverse team, and you’ll be working closely with the product manager, dev team, and marketing and comms team. We have one of the best views of Montreal from our 12th-floor windows in Mile End.

Deets:

  • Experience: 2yrs of WordPress, CSS + HTML (SASS a bonus)
  • Rate: ask us!
  • Make it happen: send us an email with some info about yourself here: jobs@pressbooks.com

Moving Half a Million Database Tables to AWS Aurora (Part 1)

This post is about migrating Pressbooks.com to AWS.

Does It Scale?

At Pressbooks we use WordPress Multisite as a development platform. Pressbooks changes WordPress and makes every blog a book.

The prevailing wisdom of the day is that a relational database should have a manageable set of tables with lots of rows of data in them …and then there’s WordPress.

WP Multisite creates 10 database tables for every blog (or in our case, for every book). Tables share IDs but they are not enforced at the database level. There are no foreign key constraints, no triggers, and no routines. It’s simple, no-frills MySQL.

Pressbooks dot com is currently running on a single bare metal server. (Our hosted instances are already on AWS.) This server has a single MariaDB database with 60,000 books in it. When we do the math, that’s over 600,000 tables in one database. Are you nuts?! Unusual? Horrible? Yet entirely possible, even plausible.

I’ve met people who worked at Automattic, and from the stories I heard, WordPress dot com uses the same WP Multisite technology, but instead of half a million tables it’s over a billion tables (probably not all in the same database though; more on that later).

I call this the Schrödinger’s Cat of database design because so long as we don’t look it’s alive?

Prerequisites

As I said, we already host hundreds of WP Multisite networks on AWS. We build, manage, and deploy to our infrastructure using Terraform, Ansible, wp-cli and all things competent. We simply just, sort of, well, neglected to move our freemium site over because we were too busy.

The time has come.

Research

We tried mysqldump. It was too slow. Our tests showed that a dump would take days.

Some colleagues recommended AWS DMS. It did not work. Some reasons:

  • The AUTO_INCREMENT attribute is not migrated.
  • WP Multisite tables are prefixed wp_1_, wp_2_, wp_3_, ... MySQL treats the underscore as a single-character wildcard in LIKE queries, and DMS provided no way to escape it in table filters.
  • Unexplained crashes on anything smaller than a c4.large (status: failed; Last Error: The task stopped abnormally; Stop Reason: RECOVERABLE_ERROR; Error Level: RECOVERABLE).
  • Mostly, the fastest migration we could get going, running 4 tasks in parallel, had an ETA of 10 days.
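
To see why the underscore wildcard matters for table filters, here is a rough bash sketch that emulates LIKE matching. The helpers like_to_regex and like_match are hypothetical names made up for this illustration. It assumes MySQL's default backslash escape character and plain alphanumeric table names, and it says nothing about how DMS itself evaluates filters.

```bash
#!/usr/bin/env bash

# Translate a SQL LIKE pattern into an anchored POSIX ERE.
# Handles %, _, and backslash-escaped characters only; this is a
# sketch for simple table names, not a full SQL pattern parser.
like_to_regex() {
  local pattern=$1 regex='^' ch i
  for ((i = 0; i < ${#pattern}; i++)); do
    ch=${pattern:i:1}
    case $ch in
      %) regex+='.*' ;;            # % matches any run of characters
      _) regex+='.' ;;             # _ matches exactly one character
      \\)                          # backslash: take the next char literally
        i=$((i + 1))
        regex+=${pattern:i:1}
        ;;
      *) regex+=$ch ;;
    esac
  done
  printf '%s$' "$regex"
}

# Succeed if the string ($2) matches the LIKE pattern ($1).
like_match() {
  local regex
  regex=$(like_to_regex "$1")
  [[ $2 =~ $regex ]]
}

for table in wp_1_posts wp_11_posts wp_123_options; do
  if like_match 'wp_1_%' "$table"; then
    echo "unescaped pattern matches ${table}"
  fi
  if like_match 'wp\_1\_%' "$table"; then
    echo "escaped pattern matches ${table}"
  fi
done
```

The unescaped pattern wp_1_% matches all three tables, because each underscore can stand in for a digit, while the escaped pattern matches only wp_1_posts. Without a way to escape, a filter meant for book 1 also picks up books 11, 123, and so on.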

We asked for help. Ned had a conference call with a reputable consulting firm and they gave us a quote: $34K USD + travel & on-site expenses.

Coffee spitting

Next, our research led us to mydumper. From the README:

== What is mydumper? Why? ==

  • Parallelism (hence, speed) and performance (avoids expensive character set conversion routines, efficient code overall)
  • Easier to manage output (separate files for tables, dump metadata, etc, easy to view/parse data)
  • Consistency – maintains snapshot across all threads, provides accurate master and slave log positions, etc
  • Manageability – supports PCRE for specifying database and tables inclusions and exclusions

So far so good…

== How to build it? ==

Jerry Seinfeld leaving

Just kidding. It turns out we don’t have to build mydumper. On Ubuntu, sudo apt install mydumper works fine. A similar command using yum works on CentOS.

Our tests showed that mydumper finishes in hours instead of days.

Plan

It is our opinion that this kind of problem is better suited for a document-oriented database. Given that this is the database design we inherited, there’s not much we can do about it, so we’ll try our best with what we’ve got. ¯\_(ツ)_/¯

At a billion tables, Automattic has already established its own internal best practices with plugins like HyperDB. Unfortunately HyperDB doesn’t have Composer support and doesn’t look maintained. LudicrousDB, a Composer compatible drop-in that works with our existing tech stack, to the rescue.

LudicrousDB is an advanced database interface for WordPress that supports replication, fail-over, load balancing, and partitioning, based on Automattic’s HyperDB drop-in.

With LudicrousDB tested and working, we are moving towards a 101 slice approach. 1 slice for core tables and 100 slices for books.

The idea is to use the last two digits of a book ID to pick one of 100 slices. If this becomes unmanageable in the future (important to remember that we already have over half a million tables in 1 database and things are fine), we can change the splitting algorithm by adding a condition to use the last X digits on books with IDs bigger than Y.
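
As a rough illustration of that rule, here is a small bash sketch. The function names are made up, and the second function is purely hypothetical: the cutoff and the three-digit slice naming are assumptions for the sake of example, not something we have decided on.

```bash
#!/usr/bin/env bash

# Current rule: the last two digits of the book ID pick one of 100 slices.
# The 10# prefix forces base 10 so an ID with leading zeros is not read as octal.
slice_for() {
  printf 'db%02d\n' $((10#$1 % 100))
}

# Hypothetical future rule: IDs above a cutoff use the last three digits.
# Both the cutoff ($2) and the db000..db999 naming are illustrative only.
slice_for_v2() {
  local id=$((10#$1)) cutoff=$2
  if ((id > cutoff)); then
    printf 'db%03d\n' $((id % 1000))
  else
    slice_for "$id"
  fi
}

slice_for 9        # db09
slice_for 999989   # db89
slice_for 9200     # db00
slice_for_v2 1000123 1000000   # db123
slice_for_v2 74 1000000        # db74
```

Existing books keep their two-digit slice under the second rule, so nothing already migrated would have to move.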

Code

For informational purposes only. Read the snippets and reason about them. Copy/paste at your own peril.

LudicrousDB Callback

/**
 * Slices
 *
 * We can predict what slice a blog is in by looking 
 * at the last two digits of the id. Examples:
 *
 * + blog_id: 9, in db09
 * + blog_id: 74, in db74
 * + blog_id: 999989, in db89
 * + blog_id: 9200, in db00
 *
 * @param $query
 * @param \LudicrousDB $wpdb
 *
 * @return string
 */
function pb_db_callback( $query, $wpdb ) {
  if ( preg_match( "/^{$wpdb->base_prefix}\d+_/i", $wpdb->table ) ) {
    $last_two_digits = (int) substr( $wpdb->blogid, -2 );
    $db = sprintf( 'db%02d', $last_two_digits ); // db00, db01, db02, ..., db99
    return $db;
  } else {
    return 'global';
  }
}
$wpdb->add_callback( 'pb_db_callback' );

Export DB Into 101 Slices:

#!/bin/bash

# This script will CREATE 101 directories in current 
# working directory, you have been warned!

db='old_database_name'

sudo mydumper --regex="^${db}\.wp_[a-zA-Z]+.*" --database="${db}" --outputdir="core" --build-empty-files
for ((i=0; i<=99; i++)); do
  ii=$(printf %02d "$i")
  sudo mydumper --regex="^${db}\.(wp_${i}_|wp_\d+${ii}_).*" --database="${db}" --outputdir="${ii}" --build-empty-files
done
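
Before kicking off a dump that runs for hours, it can be worth checking that each table name lands in exactly one output directory. The sketch below mirrors the export patterns in plain bash. slice_for_table is a hypothetical helper, the database-name prefix is dropped, and [0-9] stands in for \d because bash uses POSIX ERE rather than PCRE.

```bash
#!/usr/bin/env bash

# Print every output directory ("core" or "00".."99") whose export
# pattern would select the given table name, separated by spaces.
# Exactly one directory per table is the expected result.
slice_for_table() {
  local table=$1 i ii re matches=()
  re='^wp_[a-zA-Z]+'
  if [[ $table =~ $re ]]; then
    matches+=("core")
  fi
  for ((i = 0; i <= 99; i++)); do
    ii=$(printf '%02d' "$i")
    re="^(wp_${i}_|wp_[0-9]+${ii}_)"
    if [[ $table =~ $re ]]; then
      matches+=("$ii")
    fi
  done
  printf '%s\n' "${matches[*]}"
}

for table in wp_users wp_9_posts wp_74_posts wp_100_posts wp_999989_options; do
  echo "${table} -> $(slice_for_table "$table")"
done
```

A table that printed two directories, or none, would point to a gap or an overlap in the patterns.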

Import 101 Slices:

#!/bin/bash

# This script will READ 101 directories in current 
# working directory

db='new_database_name'

sudo myloader --directory="core" --database="${db}" --overwrite-tables
for ((i=0; i<=99; i++)); do
  ii=$(printf %02d "$i")
  sudo myloader --directory="${ii}" --database="${db}_${ii}" --overwrite-tables
done

Did it work?

To be continued in Part 2…

Bonus tips:

Because Pressbooks has so many MySQL tables, the clients I use are always getting stuck or freezing. Here are some tricks I use to keep sane:

  • Don’t let MySQL Workbench load the table schemas. Set up your GUI so that schemas are in a separate tab, disable autoloading, autocomplete, etc. (Edit ⇨ Preferences ⇨ SQL Editor)
  • Disable MySQL CLI auto-completion with --disable-auto-rehash

Pressbooks 4.3.4 and Pressbooks Book 1.10.4

We tagged Pressbooks 4.3.4 and Pressbooks Book 1.10.4 on GitHub today and are now deploying them across our hosted networks. Here’s what’s changed:

Pressbooks 4.3.4

NOTICE: Pressbooks >= 4.3.3 requires WordPress 4.8.2.
NOTICE: Users of the Pressbooks Custom CSS theme must upgrade to Pressbooks Custom CSS 1.0 for compatibility with Pressbooks >= 4.3.0.

  • [CORE ENHANCEMENT] The user catalog title can now be changed via the pb_catalog_title filter (props to @monkecheese; see #961).
  • [CORE ENHANCEMENT] SCSS variables from theme options will now be passed to the SCSS compiler as key/value pairs rather than by building SCSS in PHP (see #782 and #963).
  • [FIX] Fixed an issue where the PDF margins theme option was not being applied properly.
  • [FIX] Fixed a conflict between the updated Pressbooks LaTeX module and third-party renderers (props to @monkecheese; see #958 and #959).
  • [FIX] The publication date should now save properly, regardless of book language (thanks to @thomasdumm for the bug report; see #965 and #966).

Pressbooks Book 1.10.4

  • [FIX] Fixed an issue where part numbering would not reset properly in Prince if the part was the book’s first content (see #45).

Pressbooks 4.3.3, Pressbooks Publisher 3.1.3 and DocRaptor for Pressbooks 2.1.0

We tagged Pressbooks 4.3.3, Pressbooks Publisher 3.1.3, and DocRaptor for Pressbooks 2.1.0 and deployed them across our hosted networks today. Here’s what’s changed:

Pressbooks 4.3.3

NOTICE: Pressbooks 4.3.3 requires WordPress 4.8.2.

  • [CORE ENHANCEMENT] The Pressbooks plugin is now self-updating — GitHub Updater is no longer required (see #897 and #954).
  • [CORE ENHANCEMENT] Error logs from export routines can be emailed to an array of email addresses supplied via the pb_error_log_emails filter (see #956).
  • [CORE ENHANCEMENT] Images in cloned or imported books can now be properly edited using the WordPress image editor (see #920 and #949).
  • [FIX] We’ve implemented a better solution for the PDF profile bug (see #951, #952).
  • [FIX] URLs like /catalog/page/1 will no longer attempt to load user catalogs (see #953).

Pressbooks Publisher 3.1.3

  • [FIX] Removed duplicate footer markup (thanks to @jeremyfelt; see #11).

DocRaptor for Pressbooks 2.1.0

  • [CORE ENHANCEMENT] The DocRaptor for Pressbooks plugin is now self-updating — GitHub Updater is no longer required (see #19, #20, and #21).

 

Our new roadmap

At the start of 2017, we established a development roadmap for Pressbooks to guide our work through the coming year. This past week we had the opportunity to review and reflect on that roadmap during our retreat at Pressbooks HQ, and you can see our new roadmap here.

2017: Year of Core

I want to highlight a few of our accomplishments from the past nine months (you’ll see them crossed out on the old roadmap). In retrospect, our clear focus over the past nine months was improving Pressbooks’ core technology. When Dac came back, having a second developer let us expand upon my efforts in recent years to standardize our development processes under the hood. We now use consistent coding standards across all of our open source projects, and we have adopted a standardized build process for admin and front end assets. We have expanded our code coverage on the core Pressbooks plugin, and continue to do so with every release. These improvements let us work more efficiently and give open source contributors a clear framework within which to contribute to the Pressbooks ecosystem.

Our most significant core enhancement is our new REST API, built on the WordPress Core REST API infrastructure. This is the engine that powers our new cloning tool, and we will be making use of it in other areas over the next year. We’re also extremely excited to see what the Pressbooks Open Source community does with an API for books. If you are building something with it, let us know.

2018: Year of the Author

The roadmap for our next year has a new focus: improving Pressbooks for authors. There are a number of editing features that we included on our last roadmap that didn’t make the cut, and our goal for the next year is to fill as many of these gaps in the Pressbooks authoring toolset as we can. This includes:

  • Broadening our support for math, interactive content, video and audio across all formats (with graceful fallbacks in static formats like PDF)
  • Adding support for multiple contributors: authors, editors, translators and more
  • Adding indexing and glossary support

And more! Take a look at our new roadmap to see our comprehensive plans for improving Pressbooks’ authorship and editing tools, as well as all other aspects of the project.

I’m very excited about our accomplishments since January 2017, and I’m looking forward to building on them over the next year. Thanks to Apurva, Dac, Hugh, Liz, and Zoe for making Pressbooks such a productive team!

Pressbooks 4.3.1 and Pressbooks Book 1.10.3

We tagged Pressbooks 4.3.1 and Pressbooks Book 1.10.3 on GitHub today and deployed them across our hosted networks. Here’s what’s changed:

Pressbooks 4.3.1

NOTICE: Pressbooks 4.3.1 requires WordPress 4.8.1.
NOTICE: Users of the Pressbooks Custom CSS theme must upgrade to Pressbooks Custom CSS 1.0 for compatibility with Pressbooks 4.3.1.

  • [CORE ENHANCEMENT] Added a debugging switch to Custom Styles (see #946).
  • [FIX] Resolved an issue where some fonts would not be loaded properly during the PDF export routine (see #944 and #945).
  • [FIX] Updated routines that use XPath for compatibility with HTML5, resolving some issues with multi-level TOC and EpubCheck validation (see #947).

Pressbooks Book 1.10.3

  • [FIX] Fix some issues with Biblical Hebrew, Devanagari, and Turkish fonts.

Pressbooks 4.3, Pressbooks Book 1.10.2, Pressbooks Custom CSS 1.0, and Pressbooks Publisher 3.1.2

We tagged Pressbooks 4.3.0, Pressbooks Book 1.10.2, Pressbooks Custom CSS 1.0.0, and Pressbooks Publisher 3.1.2 on GitHub yesterday, and we’re deploying them across our hosted networks today. Here’s what’s changed:

Pressbooks 4.3

NOTICE: Pressbooks 4.3.0 requires WordPress 4.8.1.
NOTICE: Users of the Pressbooks Custom CSS theme must upgrade to Pressbooks Custom CSS 1.0 for compatibility with Pressbooks 4.3.

  • [FEATURE] Custom Styles: Navigate to Appearance → Custom Styles on your book’s dashboard to add custom CSS or SCSS to any book theme (see #658, #912, #925, #937, #938, #940, #941, and #942).
  • [ENHANCEMENT] Expanded the license property of the /metadata endpoint to include a human-readable license name and custom license text (if present) (see #934 and #936).
  • [ENHANCEMENT] Added the book’s short description to the /metadata endpoint as a disambiguatingDescription (see #930 and #932).
  • [ENHANCEMENT] Clarified errors when trying to clone a book from Pressbooks < 4.1 (see #914 and #931).
  • [ENHANCEMENT] Renamed several action and filter hooks and deprecated the old versions (see #926).
  • [FIX] Fixed an issue which would prevent super administrators without any books on a network from accessing the cloning page (see #913 and #933).
  • [FIX] Fixed a regression which blocked the use of custom LaTeX renderers (props to @monkecheese; see #928).

Pressbooks Book 1.10.2

  • [ENHANCEMENT] Updated to version 2.1 of pressbooks/mix.
  • [FIX] The cover page now displays the subtitle from Book Information as the book’s subtitle, rather than the tagline.

Pressbooks Custom CSS 1.0.0

NOTICE: Pressbooks Custom CSS 1.0.0 requires Pressbooks 4.3.0.

  • [ENHANCEMENT] Custom CSS functionality is now included in this theme (see #2).

Pressbooks Publisher 3.1.2

  • [FIX] Prevented Pressbooks Publisher’s wrapper from being added to the user catalog page.