Comparing CSV Files with Ruby
Today I was tasked with comparing a spreadsheet of carefully verified data to what’s now on our production servers. After spending less than a minute staring at two spreadsheets in Excel and knowing I didn’t want to compare lots of rows of data with my eyes and scroll wheel finger, I decided it was time to get Ruby on the job. Here’s the outcome:
require 'rubygems'
require 'fastercsv'
def normalize(row)
row[2].sub!(/^0+/, "") if row[2]
row[3].sub!(/^0+/, "") if row[3]
return row
end
old_rows = FasterCSV.read("old.csv")
new_rows = FasterCSV.read("new.csv")
old_rows.each { |row| normalize(row) }
new_rows.each { |row| normalize(row) }
additions = new_rows - old_rows
deletions = old_rows - new_rows
puts "Additions (#{additions.size})"
additions.each { |a| puts a.inspect }
FasterCSV.open("additions.csv", "w") do |csv|
additions.each { |a| csv << a }
end
puts
puts "Deletions (#{deletions.size})"
deletions.each { |d| puts d.inspect }
FasterCSV.open("deletions.csv", "w") do |csv|
deletions.each { |d| csv << d }
end
I’m using FasterCSV because it’s, um, faster, so make sure you have that gem installed. The script outputs two CSV files. One for additions, items that exist in the new file, but not the old one. And one for deletions, items that exist in the old file but not the new one. I have the normalize method in there to clean up some data. In my case I had leading zeros in some places but not others, so I removed those there.
So there you have it: tedium and eye strain avoided!