Out of Memory when loading Records in Rails

Recently I ran into a problem that only showed up outside the development environment.

I had a small script that needed to iterate over all records in the database and load blobs.

Document.all.each do |doc|
  process(doc.blob)
end

With a small dataset everything worked as expected.
With production-sized data, however, the job was terminated by the runtime with an out-of-memory error.

This behaviour is not surprising once you look at what all.each actually does.

How all.each works

When you call all.each, ActiveRecord loads the complete result set into memory before the iteration starts.
For large tables this means that thousands or even millions of Ruby objects are instantiated at once.

If each record also references additional data — for example blobs, attachments, or associations — the memory usage grows quickly.
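The effect is easy to reproduce outside of Rails. In the sketch below, build is a made-up stand-in for ActiveRecord instantiating a model object from a database row; the point is that with an eager collection, every object exists before the block runs even once.

```ruby
# `build` stands in for ActiveRecord turning a database row into a model
# object. With an eager collection, all objects are created up front.
built = 0
build = ->(i) { built += 1; { id: i, blob: "blob-#{i}" } }

documents = (1..1_000).map { |i| build.call(i) }  # 1000 objects exist already
built_before_iteration = built                    # => 1000
documents.each { |doc| doc[:blob] }               # iteration only starts here
```

With a million rows the same pattern holds a million objects simultaneously, plus any blobs or associations they reference.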

Loading Records with find_each

ActiveRecord provides find_each for exactly this scenario:

Document.find_each do |doc|
  process(doc.blob)
end

In contrast to each, this method does not load all records at once.
Instead, records are fetched in batches and yielded one by one.

Conceptually the process looks like this:

  1. Load a limited number of records
  2. Yield them to the block
  3. Discard them
  4. Load the next batch
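The steps above can be sketched in plain Ruby. Here fetch_batch is a hypothetical stand-in for a query along the lines of SELECT * FROM documents WHERE id > ? ORDER BY id LIMIT ?; the real find_each works in this spirit, but this is a simplified illustration, not ActiveRecord's actual implementation.

```ruby
# In-memory stand-in for the documents table.
DATA = (1..25).map { |i| { id: i, blob: "blob-#{i}" } }

# Hypothetical stand-in for:
#   SELECT * FROM documents WHERE id > ? ORDER BY id LIMIT ?
def fetch_batch(after_id, limit)
  DATA.select { |row| row[:id] > after_id }.first(limit)
end

def find_each_sketch(batch_size: 10)
  last_id = 0
  loop do
    batch = fetch_batch(last_id, batch_size)  # 1. load a limited number of records
    break if batch.empty?
    batch.each { |row| yield row }            # 2. yield them to the block
    last_id = batch.last[:id]
    # 3. `batch` is overwritten on the next pass, so the previous
    #    records become eligible for garbage collection
  end                                         # 4. load the next batch
end

ids = []
find_each_sketch(batch_size: 10) { |row| ids << row[:id] }
ids.size  # => 25
```

Only one batch of records is alive at any time, which is what keeps memory usage flat.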

By default, find_each loads records in batches of 1000.
The batch size can be configured:

Document.find_each(batch_size: 100) do |doc|
  process(doc.blob)
end

find_each always iterates in primary key order, so the model needs an orderable primary key such as an integer or a string. Any explicit order on the relation is ignored; by default ActiveRecord logs a warning, and it can be configured to raise instead.

If more control is required, find_in_batches can be used instead. Rather than yielding records one by one, it yields each batch as an array, leaving the iteration within the batch to you.
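A minimal sketch of the difference in shape, with Ruby's each_slice standing in for the database-side batching (fake_find_in_batches is a made-up name; the real find_in_batches takes the same batch_size option and likewise yields arrays):

```ruby
RECORDS = (1..7).to_a

# Mimics the *shape* of find_in_batches: it yields whole batches (arrays),
# whereas find_each yields records one at a time.
def fake_find_in_batches(batch_size:)
  RECORDS.each_slice(batch_size) { |batch| yield batch }
end

sizes = []
fake_find_in_batches(batch_size: 3) do |batch|
  sizes << batch.size  # you control the iteration within each batch
end
sizes  # => [3, 3, 1]
```

Yielding whole batches is useful when each batch should be handed to something that works on collections, such as a bulk insert or a background job.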

Conclusion

Iterating over large tables with all.each is easy to write but can lead to excessive memory usage once the dataset grows.

For batch processing tasks, find_each is usually the safer default because it limits the number of instantiated records and keeps memory usage predictable.