
Hans has posted 19 posts at DZone.

Cheapest Rsync Replacement (with Ruby)

03.31.2006
I often use rsync to keep a local copy of some HTTPD logs (around 200 MB at the moment). Since they are append-only, having rsync compute and compare checksums for the parts I already have seems wasteful: both my box and the one I'm copying from would be happier if they didn't have to process a couple hundred MB for nothing. (...)
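
The idea is easy to try locally, without SSH. A minimal sketch, assuming the destination is always an exact prefix of the source (which is what append-only implies); `append_tail` is a hypothetical helper name, not part of the script:

```ruby
# Hypothetical helper (not part of the original script): copy only the bytes
# of src that lie beyond the current size of dst.
# Assumption: dst is an exact prefix of src, i.e. src is append-only.
def append_tail(src, dst, block_size = 8192)
  offset = File.size(dst)
  copied = 0
  File.open(src, "rb") do |is|
    is.seek(offset)                        # skip the part we already have
    File.open(dst, "ab") do |os|
      # IO#read with a positive length returns nil at end of file
      while (data = is.read(block_size))
        os.write(data)
        copied += data.bytesize
      end
    end
  end
  copied                                   # number of bytes appended to dst
end
```

Calling `append_tail("access.log", "access.log.copy")` a second time with an unchanged source should append nothing and return 0; no checksums are computed at any point.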

#!/usr/bin/env ruby

REMOTE_RUBY = "ruby"
# TODO: allow REMOTE_RUBY to be specified via a cmdline opt

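if ARGV.size != 2 || ARGV[0][/:/].nil? || !File.exist?(ARGV[1])
  # Usage errors go to stderr with a non-zero exit status.
  abort <<EOF
  ruby logfetcher.rb host:path/to/src dst
  (dst must already exist; new data is appended to it)
EOF
end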

FILE = ARGV[1]
# Split on the first colon only, so remote paths containing ':' stay intact.
REMOTE_HOST, REMOTE_FILE = ARGV[0].split(":", 2)
BLOCK_SIZE = 8192

osize = File.size(FILE)
#FIXME: cheap escaping
command = "File.open(#{REMOTE_FILE.inspect}){|f| " + 
          "f.pos = #{osize}; print f.read(#{BLOCK_SIZE}) until f.eof? }"

command.gsub!(/"/){'\\"'}
fetched = 0
t = nil
$stdout.sync = true
print "Establishing connection\r"
File.open(FILE, "a") do |os|
  IO.popen(%{ssh #{REMOTE_HOST} #{REMOTE_RUBY} -e '"#{command}"'}) do |is|
    until is.eof?
      data = is.read(BLOCK_SIZE)
      t ||= Time.new # ignore the time it takes to establish the SSH connection
      fetched += data.size
      print "Read #{fetched}                          \r"
      os.write(data)
    end
  end
end
print(" " * 50  + "\r")

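# t stays nil when nothing new was read (the local copy was already up to date).
dt = t ? Time.new - t : 0.0
puts "Fetched #{fetched} bytes."
puts "Total size #{osize + fetched}."
puts "Needed %4.1f seconds." % dt
puts "Average speed %d bytes/sec." % (fetched / dt) if dt > 0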
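
The `#FIXME: cheap escaping` note could be addressed with Ruby's standard `shellwords` library instead of hand-quoting. A sketch, assuming the remote login shell is Bourne-compatible; `remote_tail_command` and its parameters are hypothetical names, not part of the script:

```ruby
require "shellwords"

# Hypothetical helper (not part of the original script): build the argv for
# an ssh call that prints everything past `offset` in `remote_file`.
# Assumption: the remote login shell is Bourne-compatible.
def remote_tail_command(host, remote_file, offset, remote_ruby: "ruby", block_size: 8192)
  # String#inspect yields a Ruby string literal for the path.
  script = "File.open(#{remote_file.inspect}, 'rb'){|f| f.pos = #{Integer(offset)}; " \
           "print f.read(#{Integer(block_size)}) until f.eof? }"
  # ssh joins its arguments with spaces and hands the result to the remote
  # shell, so the remote command line is escaped once for that shell.
  remote = [remote_ruby, "-e", script].shelljoin
  # Returning an argv array lets IO.popen run ssh without a local shell.
  ["ssh", host, remote]
end
```

The returned array can be passed straight to `IO.popen`, for example `IO.popen(remote_tail_command(REMOTE_HOST, REMOTE_FILE, osize)) { |is| ... }`, which also makes the `gsub!` on `command` unnecessary.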

Source: Cheapest rsync replacement (http://eigenclass.org/hiki.rb?cheap+rsync)