DZone Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world

Snippets has posted 5883 posts at DZone. View Full User Profile

Perform A Rails Find() And Iterate Over The Resulting Records In Groups

05.05.2008
| 16672 views |
  • submit to reddit
        
module ActiveRecord
  class Base
    # This method lets you iterate over the results of a .find, in groups. (Basically
    # an interface to LIMIT.) Anything you can pass as options to .find, you can pass here. 
    #
    # Returns the number of records found so you can determine whether or not something
    # actually happened.
    #
    # === Specific options
    # * <tt>:by</tt>     - Number of records in each group
    # * <tt>:update</tt> - If set to true, a message like 'Reading records 1-50' will be
    #                      output before each group of records is fetched from the database.
    #
    # === Examples
    #   # default # of records is 100...
    #   Order.find_each(:conditions => { :cc_processed_at => nil }) do |order|
    #     # do stuff with order
    #   end
    #
    #   # ...but you can explicitly specify it
    #   Person.find_each(:by => 50, :include => :address, :order => 'name') do |person, index|
    #     # do stuff with person and index
    #   end
    #
    #   # using the return value
    #   num_records = Order.find_each do |order|
    #     # ...
    #   end
    #   puts "No orders found." if num_records.zero?
    #
    #   # you can get the offset per iteration too
    #   Order.find_each do |order, offset|
    #     puts "Offset: #{offset}"
    #   end
    #
    #   # even works with named scopes!
    #   Dog.with_ears_cut_short.with_tail_cut_long.find_each do |dog|
    #     # ...
    #   end
    #
    # === Caveats
    # If you're resaving the records that .find_each returns such that executing the query with
    # the same offset and limit on the next iteration would omit the records, .find_each may end
    # up skipping records on subsequent iterations. For instance, if you have something like:
    # 
    #   Order.find_each(:conditions => { :processed => false }) do |order|
    #     order.processed = true
    #     order.save!
    #   end
    #
    # Here's the problem. On the first iteration .find_each will execute a <tt>SELECT .. LIMIT 0,100</tt>
    # query, and the records will get saved. The next iteration, it will execute <tt>SELECT .. LIMIT 100,100</tt>.
    # However, the records that were just saved no longer match the conditions in the query,
    # but the query still assumes that they're there, so in effect we're double-skipping them.
    #
    # In this case what you probably want to do is pull a list of all the ids that match the
    # conditions at the outset, and then iterate through those in groups. Unfortunately this
    # doesn't do that quite yet ;)
    def self.find_each(options={}, &blk)
      group_size = options.delete(:by) || 100
      update = options.delete(:update) || false
      num_records = count(options.except(:from))
      return 0 if num_records == 0
      #raise "Number of records: #{num_records}"
      also_pass_offset = (blk.arity == 2)
      0.step(num_records, group_size) do |offset|
        find_options = { :offset => offset, :limit => group_size, :order => "#{table_name}.#{primary_key}" }.merge(options)
        if update
          if num_records == 1
            puts ">> Reading the only record."
          else
            start_offset = offset + 1
            end_offset   = offset + group_size
            end_offset   = num_records if num_records < end_offset
            puts ">> Reading records #{start_offset}-#{end_offset} of #{num_records} (offset: #{offset}, limit: #{group_size})"
          end
        end 
        find(:all, find_options).each do |record|
          also_pass_offset ? blk.call(record, offset) : blk.call(record)
        end
      end
      num_records
    end
  end
end

Also see:

* http://weblog.jamisbuck.org/2007/4/6/faking-cursors-in-activerecord
* http://weblog.jamisbuck.org/2007/2/28/poor-man-s-pagination
* http://pseudocursors.rubyforge.org/