Using Rails with a legacy database schema – Part 2

Part one of this blog post mini-series showed how to override default table names and primary key names in ActiveRecord model classes, and how to define alias attributes for legacy column names.

This part will discuss some options for primary key definitions in the schema file, which are relevant for legacy schemas, as well as primary key value generation on Oracle databases.

Primary key schema definition

The database schema definition of a Rails application is usually provided in a file called schema.rb via a simple domain specific language.

The create_table method implicitly adds a primary key column with name id (of numeric type) by default.

create_table 'users' do |t|
  t.string 'name', limit: 20
  # ...
end

If the primary key has a different name you can easily specify it via the primary_key option:

create_table 'users', primary_key: 'user_key' do |t|
  t.string 'name', limit: 20
  # ...
end

But what if a table has a primary key of non-numeric type? The Rails schema DSL does not directly support this. But there’s a workaround: you can set the id option of create_table to false, declare the primary key column like an ordinary non-nullable column, and add the primary key constraint afterwards via execute.

create_table 'users', id: false do |t|
  t.string 'user_key', null: false
  t.string 'name', limit: 20
  # ...
end
execute 'ALTER TABLE user ADD PRIMARY KEY (user_key)'

Primary key value generation

On Oracle databases new primary key values are usually created via sequences. The Oracle adapter for ActiveRecord assumes sequence names in the form of table name + “_seq”.  You can override this default sequence name in a model class via the sequence_name property:

class User < ActiveRecord::Base
  self.sequence_name = 'user_sequence'
  # ...
end

Sometimes primary key values are auto-generated via triggers. In this case you need the Oracle Enhanced adapter, which is a superset of the original ActiveRecord Oracle adapter, but with additional support for working with legacy databases. Now you can set the sequence_name property to the value :autogenerated:

class User < ActiveRecord::Base
  self.sequence_name = :autogenerated
  # ...
end

This circumvents the default convention and tells the adapter to not include primary key values in generated INSERT statements.

An Oracle story: Null, empty or what?

One big argument for relational databases is SQL which as a standard minimizes the effort needed to switch your app between different DBMSes. This comes particularily handy when using in-memory databases (like HSQL or H2) for development and a “big” one (like PostgreSQL, MySQL, DB2, MS SqlServer or Oracle) in production. The pity is that there are subtle differences with regard to the interpretation of the SQL-standard when it comes to databases from different vendors.

Oracle is particularily picky and offers quite some interesting behaviours: Most databases (all that I know well) treat null and empty as different values when it comes to strings. So it is perfectly valid to store an empty string in a not-null column and retrieving the string from the column yields an empty string. Not so with Oracle 10g! Inserting null and retrieving the value yields unsurprisingly null, even using Oracle. Inserting an empty string and retrieving the value leaves you with null, too! Oracle does not differentiate between empty strings and null values like a Java developer would expect. In our environment this has led to surprised developers and locally unreproducible bug which clearly exist in production a couple of times.

[rant]Oracle has great features for big installations and enterprises that can afford the support, maintenance and hardware of a serious Oracle DBMS installation. But IMHO it is a shame that such a big player in the market does not really care about the shortcomings of their flagship product and standards in general (Oracle 10g only supports SQL92 entry level!). Oracle, please fix such issues and help us developers to get rid of special casing for your database product![/rant]

The lesson to be learnt here is that you need a clone of the production database for your integration tests or acceptance tests to be really effective. Quite some bugs have slipped into production because of subtle differences in behaviour of our inhouse databases and the ones in production at the customer site.

Paging with different DBs

Sometimes you cannot or do not want to use an object-relational mapping tool. When not using an OR-mapper like Hibernate or Oracle Toplink you have to deal with database specifics. One common case especially for web applications is limiting the result set to a number of items that fit nicely on a web. You then often want to allow the users to navigate between these “pages” of items aka “paging”.

This type of functionality became part of SQL only as of SQL2008 in the following form:
SELECT * FROM t WHERE ... ORDER BY c OFFSET start_row FETCH count ONLY

Since most popular database management systems (DBMSes) do not yet implement this syntax you have to implement paging in propriatory ways.

My experience with an Oracle DBMS and the frustrating and comparatively long time it took to find the correct™ solution inspired me to write this post. Now I want to present you the syntax for some widely used DBMSes which we encounter frequently in our projects.

  • MySQL, H2 and PostgreSQL (< 8.4 which will also implement the SQL2008 standard) use the same syntax:
    SELECT * FROM t WHERE ... ORDER BY c LIMIT count OFFSET start
  • Oracle is where the fun begins. There is actually no easy and correct way of doing this. So you will end up with a mess like:
    SELECT col1 FROM (SELECT col1, ROWNUM r FROM (SELECT col1 FROM table ORDER BY col1)) WHERE r BETWEEN start AND end
  • DB2 AFAIK uses the syntax proposed in SQL2008 but correct me if I am wrong because we do not yet work with DB2 databases.
  • As we did not need paging with MS SQLServer as of now I did not bother to look for a solution yet. Hints are very welcome.

With all solutions the ORDER BY clause is critical because SQL does not guarantee the order of the returned rows.

Wikipedia delivers some additional and special case information but does not really explain the general, real world case the specific DBMSes.

I hope that I raised some awareness about database specifics and perhaps saved you some time trying to find a solution the problem using your favorite DBMS.