• Migrating a WordPress Media Library to S3

    Every developer has tasks they cannot be proud of and are a little embarrassed to mention, because everything about them, absolutely everything, is bad. Take WordPress, for example: it stores data as serialized arrays in the database, has no mechanism for data migrations, and violates every rule of writing good code, imaginable and unimaginable, spoken and unspoken… But it runs in production and makes money for the business, so it has to be maintained and developed… and, well, sometimes the entire media library has to move to S3. In fact, the target could be any CDN.

    The Rough Action Plan

    1. Install a plugin for working with S3.
    2. Find all images.
    3. Upload them to S3.
    4. Change the image paths in posts.
    5. Preserve your sanity if possible.

    To make it more fun, all of this has to happen automatically, with no button-clicking in the admin panel and no hand-edited configs, because the site is expected to be deployed to AWS Elastic Beanstalk. And you had better not ask me why.

    Solution

    First, install the tantan_wordpress_s3 plugin in WordPress the standard way, so as not to reinvent the wheel yet again. The plugin files are the easy part: they simply become part of the application. The configuration, however, has to be written to the database by hand:

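    // AWS_KEY, AWS_SECRET and AWS_BUCKET are expected to be defined as constants beforehand
    $config = array(
      "key" => AWS_KEY,
      "secret" => AWS_SECRET,
      "bucket" => AWS_BUCKET,
      "wp-uploads" => 1,
      "expires" => 315360000, // 10 years, in seconds
      "permissions" => "public",
      "cloudfront" => ""
    );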
    
    $db->query("
      INSERT INTO `wp_options` (`option_name`, `option_value`, `autoload`) VALUES ('tantan_wordpress_s3', :config, 'yes');
    ", array(
      'config' => serialize($config)
    ));
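
    As a reminder of why these values are always unserialized, changed, and serialized again rather than edited as text, here is a small sketch of what PHP's serialize() produces. Every string carries its byte length, so a plain text replacement leaves a value that can no longer be read. The bucket names are made up for illustration.

```php
<?php
// Serialized arrays record the length of every string they contain.
$serialized = serialize(['bucket' => 'old-bucket']);
echo $serialized, "\n"; // a:1:{s:6:"bucket";s:10:"old-bucket";}

// A naive text replacement changes the string but not its recorded length...
$broken = str_replace('old-bucket', 'new', $serialized);
// ...so unserialize() rejects it (@ silences the message PHP raises for bad input).
var_dump(@unserialize($broken)); // bool(false)

// The safe route: unserialize, modify the array, serialize again.
$data = unserialize($serialized);
$data['bucket'] = 'new';
$fixed = serialize($data);
echo $fixed, "\n"; // a:1:{s:6:"bucket";s:3:"new";}
```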

    The plugin will also need to be activated:

    $pluginsSerialized = $db->fetchOne("SELECT `option_value` FROM `wp_options` WHERE `option_name` = 'active_plugins'");
    $plugins = unserialize($pluginsSerialized);
    $plugins[] = "tantan-s3-cloudfront/wordpress-s3.php";
    $db->query("UPDATE `wp_options` SET `option_value` = :option_value WHERE `option_name` = 'active_plugins';", array(
      'option_value' => serialize($plugins)
    ));
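
    Since the deployment runs automatically, the activation step may well run more than once, and each run would append the plugin to the list again. One possible guard is sketched below. withActivePlugin is a hypothetical helper, not something WordPress or the plugin provides, and it assumes the option holds a plain serialized array of plugin paths.

```php
<?php
// Hypothetical helper: returns the serialized `active_plugins` value with
// $plugin present exactly once. $serialized is whatever the SELECT returned,
// which may be false when the option row does not exist yet.
function withActivePlugin($serialized, string $plugin): string
{
    $plugins = is_string($serialized)
        ? @unserialize($serialized, ['allowed_classes' => false])
        : false;

    if (!is_array($plugins)) {
        $plugins = []; // option missing or unreadable: start from an empty list
    }

    if (!in_array($plugin, $plugins, true)) {
        $plugins[] = $plugin;
    }

    // Reindex so the stored array has sequential keys.
    return serialize(array_values($plugins));
}

$before = serialize(['akismet/akismet.php']);
$after  = withActivePlugin($before, 'tantan-s3-cloudfront/wordpress-s3.php');
print_r(unserialize($after));
```

    The result can be passed as the option_value parameter of the UPDATE above, and running it a second time leaves the value unchanged.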

    At this stage, all new additions to the media library will go to S3. What remains is the migration of old data. First, select all of it:

    $stmt = $db->query("
      SELECT * FROM wp_postmeta WHERE meta_key = '_wp_attachment_metadata'
    ");

    Now the old data has to be pointed at the CDN instead of at local paths. Local files are accessed through the knplabs/gaufrette library; $localFilesystem in the code below is its filesystem object for the local wp-content directory.

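    foreach ($stmt->fetchAll() as $row) {
      $meta = unserialize($row['meta_value']);
      if (!is_array($meta) || empty($meta['file'])) {
        continue; // skip attachments without usable metadata
      }
    
      // 'file' may be relative to the uploads directory or a full path;
      // keep only the part after '/wp-content/uploads/' in either case,
      // so the prefix is not added twice below.
      $marker = '/wp-content/uploads/';
      $pos = strpos($meta['file'], $marker);
      $file = $pos === false ? ltrim($meta['file'], '/') : substr($meta['file'], $pos + strlen($marker));
    
      $info = array(
        'bucket' => AWS_BUCKET, // same constant as in the plugin config
        'key' => '/wp-content/uploads/' . $file
      );
    
      $db->query("
        INSERT INTO wp_postmeta (post_id, meta_key, meta_value)
        VALUES (:post_id, :meta_key, :meta_value)
      ", array(
        'post_id' => $row['post_id'],
        'meta_key' => 'amazonS3_info',
        'meta_value' => serialize($info)
      ));
    }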
    
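    // keys() returns a flat list of keys; listKeys() groups them into 'keys' and 'dirs'
    foreach ($localFilesystem->keys() as $key) {
      // Bind the paths as parameters so quotes in file names cannot break the SQL
      $db->query("
        UPDATE `wp_posts`
        SET `post_content` = REPLACE(`post_content`, :old_url, :new_url)
      ", array(
        'old_url' => '/old_relative_url/wp-content/' . $key,
        'new_url' => 'http://' . AWS_BUCKET . '.s3.amazonaws.com/wp-content/' . $key
      ));
    }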
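
    Before running the UPDATE against every post, it may be worth trying the same substitution in PHP on a sample post body. The sketch below mirrors the SQL REPLACE for a single key. rewriteMediaUrls is a hypothetical helper, and the bucket name and image path are invented for the example.

```php
<?php
// Hypothetical helper: applies to one string what the SQL REPLACE applies
// to wp_posts.post_content for a single file key.
function rewriteMediaUrls(string $content, string $key, string $oldBase, string $bucket): string
{
    $search  = rtrim($oldBase, '/') . '/wp-content/' . $key;
    $replace = 'http://' . $bucket . '.s3.amazonaws.com/wp-content/' . $key;

    return str_replace($search, $replace, $content);
}

$post = '<img src="/old_relative_url/wp-content/uploads/2013/05/cat.jpg">';
$rewritten = rewriteMediaUrls($post, 'uploads/2013/05/cat.jpg', '/old_relative_url', 'my-bucket');

echo $rewritten, "\n";
// <img src="http://my-bucket.s3.amazonaws.com/wp-content/uploads/2013/05/cat.jpg">
```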

    Most importantly, do not forget to copy the files themselves. Fortunately, that is the simplest part of the task.
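
    A minimal sketch of that copy step follows. It is not the code used for the migration: copyAll is a hypothetical function that only relies on the keys(), isDirectory(), has(), read() and write() methods of a Gaufrette filesystem, and ArrayFilesystem is an in-memory stand-in so the example runs without the library. In a real run, the assumption is that $source would be the local Gaufrette filesystem and $target one backed by an S3 adapter.

```php
<?php
// In-memory stand-in exposing the subset of the Gaufrette Filesystem
// methods used by copyAll(); for illustration only.
final class ArrayFilesystem
{
    private $files;

    public function __construct(array $files = [])
    {
        $this->files = $files;
    }

    public function keys(): array { return array_keys($this->files); }
    public function isDirectory(string $key): bool { return false; }
    public function has(string $key): bool { return isset($this->files[$key]); }
    public function read(string $key): string { return $this->files[$key]; }

    public function write(string $key, string $content, bool $overwrite = false): int
    {
        $this->files[$key] = $content;
        return strlen($content);
    }
}

// Copies every file from $source to $target under $prefix and returns the
// number of files copied. Files already present in $target are skipped,
// so the function can be run again after an interrupted upload.
function copyAll($source, $target, string $prefix = 'wp-content/'): int
{
    $copied = 0;
    foreach ($source->keys() as $key) {
        if ($source->isDirectory($key)) {
            continue; // only files are uploaded
        }
        $targetKey = $prefix . $key;
        if ($target->has($targetKey)) {
            continue; // already uploaded
        }
        $target->write($targetKey, $source->read($key));
        $copied++;
    }
    return $copied;
}

$local  = new ArrayFilesystem(['uploads/a.jpg' => 'A', 'uploads/b.jpg' => 'B']);
$bucket = new ArrayFilesystem(['wp-content/uploads/a.jpg' => 'A']);

echo copyAll($local, $bucket), "\n"; // 1 (only b.jpg was missing)
```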