Sunday, April 20, 2008

Moving to morph: postgres migration.

This is the first in a series of posts about moving from a perfectly good VPS (RapidVPS.com) to the morph exchange platform. We love being on morph but it was not trivial for us to make the move.

Bear in mind, we have a production application with some 200MB of data over 6 months of operation. Not big, by any means, but significant.

Before anything else, we had to move from mysql to postgres. Morph only uses postgres (for now) because it supports almost-real-time database backups to an S3 bucket. We love it because it removes our 10 minute transaction log backups and nightly complete dumps.

First install postgres 8.3 on the mac.
Then install phppgadmin for admin features because that's what morph gives you to interact with your production database.
Doing the data migration turned out to be a headache.
First, we did a mysqldump from production, FTP'd it to a dev machine, and inserted it into a clean copy of the application and database.
We set up a different (clean) mac with only postgres and rails so that we'd be sure to get exactly the right pieces installed.
Then we followed this recipe to get the data into postgres.
For some reason the rake db:schema_dump made some errors with foreign key columns (making them :string instead of :integer). So we had to edit those by hand.
Turns out our data had quite a few rows with embedded line breaks (\n and \r) and we had to edit these by hand.

Application implications for postgres.
Many times we inadvertently relied on mysql retuning rows from an ActiveRecord fetch in id order. Postgres will not. Need to add specific :order hashes to every fetch.
Mysql sorts on :boolean fields differently than postgres does. We wanted NULL values first, which mysql does. Postgres, of course, puts them last.
Beware of mysql-specific functions in your sql. For example Postgres does not support the CONCAT function. I don't know if it's an ANSI/SQL92 function, but postgres uses the "||" operator to concat string columns.

Migration. Getting the data into the morph db when the app is first initialized requires setting up the schema and then inserting all the current production rows. We tried using pgphpadmin's SQL dialogs to do this remotely but we failed each time. We like rails, so we did it in the migration. To do it, we dumped the 200 MB of SQL into 4 separate files at the rails root. Then in the migration we called them via:

%w(tp1.sql tp2.sql tp3.sql tp4.sql).each do |sql_file|
File.open(sql_file) do |file|

execute file.read
end
end

Postgres keeps its serial columns' next sequential ids in so-called sequential tables. The inserts performed in the migration will not have incremented these because we wanted to use the existing id values from the production db. However, for new rows to be inserted in those tables, you'll need to update the sequential tables. We used this in the migration after the bulk load. Notice how we exclude tables that do not use sequential ids and empty tables.

# Gather tables that use sequential ids.
set_id_tables =
ActiveRecord::Base.connection.tables.sort.reject! do |tbl|
['schema_info', 'bookings_users', 'friendships', 'journal_entries_users', 'roles_users', 'sessions', 'slide_desk_images'].include?(tbl)
end

# Set the sequence to the highest of inserted records.
set_id_tables.each do |tbl|
execute "select setval('#{tbl}_id_seq', (select max(id) from #{tbl}));"
end

I'll add references and links to this post over the next few days.

1 comment:

Anonymous said...
This comment has been removed by a blog administrator.