Mar 31, 2010

Comparing CSV Files with Ruby

Today I was tasked with comparing a spreadsheet of carefully verified data to what’s now on our production servers. After spending less than a minute staring at two spreadsheets in Excel and knowing I didn’t want to compare lots of rows of data with my eyes and scroll wheel finger, I decided it was time to get Ruby on the job. Here’s the outcome:

require 'rubygems'
require 'fastercsv'

def normalize(row)
  row[2].sub!(/^0+/, "") if row[2]
  row[3].sub!(/^0+/, "") if row[3]
  
  return row
end

old_rows = FasterCSV.read("old.csv")
new_rows = FasterCSV.read("new.csv")

old_rows.each { |row| normalize(row) }
new_rows.each { |row| normalize(row) }

additions = new_rows - old_rows
deletions = old_rows - new_rows

puts "Additions (#{additions.size})"

additions.each { |a| puts a.inspect }

FasterCSV.open("additions.csv", "w") do |csv|
  additions.each { |a| csv << a }
end

puts

puts "Deletions (#{deletions.size})"

deletions.each { |d| puts d.inspect }

FasterCSV.open("deletions.csv", "w") do |csv|
  deletions.each { |d| csv << d }
end

I’m using FasterCSV because it’s, um, faster, so make sure you have that gem installed. The script outputs two CSV files. One for additions, items that exist in the new file, but not the old one. And one for deletions, items that exist in the old file but not the new one. I have the normalize method in there to clean up some data. In my case I had leading zeros in some places but not others, so I removed those there.

So there you have it: tedium and eye strain avoided!

About
Be inspired and inspiring. Subscribe via RSS.