Oracle – Page 5 – Schneide Blog

What’s your time, database?

Time is a difficult subject. Especially time zones and daylight saving time. Adding layers makes things worse. Ask your database.

Time is a difficult subject. Especially time zones and daylight saving time. Sounds easy? Well, take a look.
Adding layers in software development complicates the issue and every layer has its own view of time. Let’s start with an example: we write a simple application which stores time based data in a SQL database, e.g. Oracle. The table has a column named ‘at’. Since we don’t want to mess around with timezones, we use a column type without timezone information, in Oracle this would be ‘Date’ if we do not need milliseconds and ‘Timestamp’ if we need them. In Java with plain JDBC we can extract it with a call to ‘getTimestamp’:

Date timestamp = resultSet.getTimestamp("at");

The problem is now we have a timestamp in our local timezone. Where is it converted? Oracle itself has two timezone settings: for the database and for the session. We can query them with:

select DBTIMEZONE from dual;

and

select SESSIONTIMEZONE from dual;

First Oracle uses the time zone set in the session, then the database one. The results from those queries are interesting though: some return a named timezone like ‘Europe/Berlin’, the other return an offset ‘+01:00’. Here a first subtle detail is important: the named timezone uses the offset and the daylight saving time from the respective timezone, the offset setting only uses the offset and no daylight saving. So ‘+01:00’ would just add 1 hour to UTC regardless of the date.
In our example changing these two settings does not change our time conversion. The timezone settings are for another column type: timestamp with (local) timezone.
Going up one layer the JDBC API reveals an interesting tidbit:
Timestamp getTimestamp(int columnIndex) throws SQLException

Retrieves the value of the designated column in the current row of this ResultSet object as a java.sql.Timestamp object in the Java programming language.

Sounds about right, but wait there’s another method:
Timestamp getTimestamp(int columnIndex, Calendar cal) throws SQLException

Retrieves the value of the designated column in the current row of this ResultSet object as a java.sql.Timestamp object in the Java programming language. This method uses the given calendar to construct an appropriate millisecond value for the timestamp if the underlying database does not store timezone information.

Just as in Oracle we can use a named timezone or an offset:

Date timestamp = resultSet.getTimestamp("at", Calendar.getInstance(TimeZone.getTimeZone("GMT+1:00")));

This way we have control over what and how the time is extracted from the database. The next time you work with time based information take a close look. And if you work with Java use Joda Time.

When UTF8 != UTF8

Not all encoding problems are problems with different encodings

Problem

Recently I encountered a problem with umlauts in file names. I had to read names from a directory and find and update the appropriate entry in the database. So if I had a file named hund.pdf (Hund is German for dog) I had to find the corresponding record in the database and attach the file. Almost all files went smooth but the ones with umlauts failed all.

Certainly an encoding problem I thought. So I converted the string to UTF-8 before querying. Again the query returned an empty result set. So I read up on the various configuration options for JDBC, Oracle and Active Record (it is a JRuby on Rails based web app). I tried them all starting with nls_language and ending with temporary setting the locale. No luck.

Querying the database with a hard coded string containing umlauts worked. Both strings even printed on the console looked identically.

So last but not least I compared the string from the file name with a hard coded one: they weren’t equal. Looking at the bytes a strange character combination was revealed \204\136. What’s that? UTF8 calls this a combining diaeresis. What’s that? In UTF8 you can encode umlauts with their corresponding characters or use a combination of the character without an umlaut and the combining diaeresis. So ‘ä’ becomes ‘a\204\136’.

Solution

The solution is to normalize the string. In (J)Ruby you can achieve this in the following way:

string = string.mb_chars.normalize.to_s

And in Java this would be:

string = Normalizer.normalize(string, Normalizer.Form.NFKC)

Ruby uses NFKC (or kc for short) as a default and suggests this for databases and validations.

Lesson learned: So the next time you encounter encoding problems look twice it might be in the right encoding but with the wrong bytes.

Using Rails with a legacy database schema – Part 2

Part one of this blog post mini-series showed how to override default table names and primary key names in ActiveRecord model classes, and how to define alias attributes for legacy column names.

This part will discuss some options for primary key definitions in the schema file, which are relevant for legacy schemas, as well as primary key value generation on Oracle databases.

Primary key schema definition

The database schema definition of a Rails application is usually provided in a file called schema.rb via a simple domain specific language.

The create_table method implicitly adds a primary key column with name id (of numeric type) by default.

create_table 'users' do |t|
  t.string 'name', limit: 20
  # ...
end

If the primary key has a different name you can easily specify it via the primary_key option:

create_table 'users', primary_key: 'user_key' do |t|
  t.string 'name', limit: 20
  # ...
end

But what if a table has a primary key of non-numeric type? The Rails schema DSL does not directly support this. But there’s a workaround: you can set the id option of create_table to false, declare the primary key column like an ordinary non-nullable column, and add the primary key constraint afterwards via execute.

create_table 'users', id: false do |t|
  t.string 'user_key', null: false
  t.string 'name', limit: 20
  # ...
end
execute 'ALTER TABLE user ADD PRIMARY KEY (user_key)'

Primary key value generation

On Oracle databases new primary key values are usually created via sequences. The Oracle adapter for ActiveRecord assumes sequence names in the form of table name + “_seq”. You can override this default sequence name in a model class via the sequence_name property:

class User < ActiveRecord::Base
  self.sequence_name = 'user_sequence'
  # ...
end

Sometimes primary key values are auto-generated via triggers. In this case you need the Oracle Enhanced adapter, which is a superset of the original ActiveRecord Oracle adapter, but with additional support for working with legacy databases. Now you can set the sequence_name property to the value :autogenerated:

class User < ActiveRecord::Base
  self.sequence_name = :autogenerated
  # ...
end

This circumvents the default convention and tells the adapter to not include primary key values in generated INSERT statements.

An Oracle story: Null, empty or what?

One big argument for relational databases is SQL which as a standard minimizes the effort needed to switch your app between different DBMSes. This comes particularily handy when using in-memory databases (like HSQL or H2) for development and a “big” one (like PostgreSQL, MySQL, DB2, MS SqlServer or Oracle) in production. The pity is that there are subtle differences with regard to the interpretation of the SQL-standard when it comes to databases from different vendors.

Oracle is particularily picky and offers quite some interesting behaviours: Most databases (all that I know well) treat null and empty as different values when it comes to strings. So it is perfectly valid to store an empty string in a not-null column and retrieving the string from the column yields an empty string. Not so with Oracle 10g! Inserting null and retrieving the value yields unsurprisingly null, even using Oracle. Inserting an empty string and retrieving the value leaves you with null, too! Oracle does not differentiate between empty strings and null values like a Java developer would expect. In our environment this has led to surprised developers and locally unreproducible bug which clearly exist in production a couple of times.

[rant]Oracle has great features for big installations and enterprises that can afford the support, maintenance and hardware of a serious Oracle DBMS installation. But IMHO it is a shame that such a big player in the market does not really care about the shortcomings of their flagship product and standards in general (Oracle 10g only supports SQL92 entry level!). Oracle, please fix such issues and help us developers to get rid of special casing for your database product![/rant]

The lesson to be learnt here is that you need a clone of the production database for your integration tests or acceptance tests to be really effective. Quite some bugs have slipped into production because of subtle differences in behaviour of our inhouse databases and the ones in production at the customer site.

Paging with different DBs

Sometimes you cannot or do not want to use an object-relational mapping tool. When not using an OR-mapper like Hibernate or Oracle Toplink you have to deal with database specifics. One common case especially for web applications is limiting the result set to a number of items that fit nicely on a web. You then often want to allow the users to navigate between these “pages” of items aka “paging”.

This type of functionality became part of SQL only as of SQL2008 in the following form:
SELECT * FROM t WHERE ... ORDER BY c OFFSET start_row FETCH count ONLY

Since most popular database management systems (DBMSes) do not yet implement this syntax you have to implement paging in propriatory ways.

My experience with an Oracle DBMS and the frustrating and comparatively long time it took to find the correct™ solution inspired me to write this post. Now I want to present you the syntax for some widely used DBMSes which we encounter frequently in our projects.

MySQL, H2 and PostgreSQL (< 8.4 which will also implement the SQL2008 standard) use the same syntax:
SELECT * FROM t WHERE ... ORDER BY c LIMIT count OFFSET start
Oracle is where the fun begins. There is actually no easy and correct way of doing this. So you will end up with a mess like:
SELECT col1 FROM (SELECT col1, ROWNUM r FROM (SELECT col1 FROM table ORDER BY col1)) WHERE r BETWEEN start AND end
DB2 AFAIK uses the syntax proposed in SQL2008 but correct me if I am wrong because we do not yet work with DB2 databases.
As we did not need paging with MS SQLServer as of now I did not bother to look for a solution yet. Hints are very welcome.

With all solutions the ORDER BY clause is critical because SQL does not guarantee the order of the returned rows.

Wikipedia delivers some additional and special case information but does not really explain the general, real world case the specific DBMSes.

I hope that I raised some awareness about database specifics and perhaps saved you some time trying to find a solution the problem using your favorite DBMS.

	Anonymous on Cache configuration with WildF…
	Miq on Nested queries like N+1 in pra…
	mariuselvert on Creating functors with lambda…
	Nested queries like… on Common SQL Performance Gotchas…
	Nested queries like… on Make your users happy by not c…