Regex Two Step
I spend a lot of time parsing and cleaning up text files with regular expressions, and there’s a problem that comes up fairly often that used to really puzzle me. How do you replace something that requires complicated, multi-step logic?
Everyone who has used regular expressions is pretty familiar with doing something like this:
>> "5553331234".gsub(/(\d{3})(\d{3})(\d{4})/, '(\1) \2-\3')
=> "(555) 333-1234"
And truth be told, you can do a lot of good stuff with that simple concept. The problem is that it isn’t always so easy as replace “this” with “that”. Many times you want to inject some logic into the replacement process. Thankfully we can can pass blocks to the regular expression methods in Ruby (.NET offers something similar by passing a delegate to Regex.Replace).
This allows us to do some really complicated matching pretty easily. Last week I needed to reformat hundreds of SQL statements that NHibernate was generating. Here’s what some of them looked like:
exec sp_executesql N'INSERT INTO Actions (CreatedAt, Description,
IsReversible, Name, ObjectClass, UserTransactionID) VALUES (@p0,
@p1, @p2, @p3, @p4, @p5)',N'@p0 datetime,@p1 nvarchar(9),@p2 bit,@p3
nvarchar(12),@p4 nvarchar(15),@p5 int',@p0='2010-04-03
00:36:25.1100000',@p1=N'logged
in',@p2=0,@p3=N'Authenticate',@p4=N'UsersController',@p5=5589
exec sp_executesql N'INSERT INTO UserTransactions (CreatedAt,
Description, UserID) VALUES (@p0, @p1, @p2)',N'@p0 datetime,@p1
nvarchar(30),@p2 int',@p0='2010-04-03 00:37:26.9600000',@p1=N'viewed
applicant Michael Smith',@p2=1
I needed these formatted as simple SQL statement without the exec sp_executesql with parameters nonsense. I needed to replace these:
(@p0, @p1, @p2)
With these:
@p0='2010-04-03 00:37:26.9600000', @p1=N'viewed applicant Michael
Smith',@p2=1
So it would look like this:
('2010-04-03 00:37:26.9600000', 'viewed applicant Michael Smith', 1)
Shew. As you can imagine a simple gsub like our first example wasn’t going to cut it. But, by using a block we can easily execute additional logic on each match. Here’s the code:
# Match each SQL statement that starts with exec sp_executesql
sql.gsub!(/exec sp_executesql.*?$/) do |match|
# collect the params from the end of the statement into an array
values = match.scan(/(@p\d)=N?(.*?)(?=(,@p\d)|$)/)
# Remove the beginning and end of the statement
match.gsub!(/exec sp_executesql N'/, "").gsub!(/',N'@p0.*?$/, "")
# Replace each of the parameters with its corresponding value
values.each { |v| match.gsub!(v[0], v[1]) }
# Return our clean SQL statement
match
end
You can see here that I’m using a block with gsub instead of providing a replacement string. This gives us a match object for each matching SQL statement where we can make additional replacements, and our statements end up looking like this:
INSERT INTO Actions (CreatedAt, Description, IsReversible, Name,
ObjectClass, UserTransactionID) VALUES ('2010-04-03
00:36:25.1100000', 'logged in', 0, 'Authenticate',
'UsersController', 5589)
INSERT INTO UserTransactions (CreatedAt, Description, UserID)
VALUES ('2010-04-03 00:37:26.9600000', 'viewed applicant Michael
Smith', 1)
While some additional explanation about what’s happening in the block would probably be useful, the point here is that this kind of complicated logic is possible and with very little code and hassle.
If your regular expressions are seeming impossible or overly complicated, see if you can make it happen with the regex 2 (or 3 or 4) step instead.
Comparing CSV Files with Ruby
Today I was tasked with comparing a spreadsheet of carefully verified data to what’s now on our production servers. After spending less than a minute staring at two spreadsheets in Excel and knowing I didn’t want to compare lots of rows of data with my eyes and scroll wheel finger, I decided it was time to get Ruby on the job. Here’s the outcome:
require 'rubygems'
require 'fastercsv'
def normalize(row)
row[2].sub!(/^0+/, "") if row[2]
row[3].sub!(/^0+/, "") if row[3]
return row
end
old_rows = FasterCSV.read("old.csv")
new_rows = FasterCSV.read("new.csv")
old_rows.each { |row| normalize(row) }
new_rows.each { |row| normalize(row) }
additions = new_rows - old_rows
deletions = old_rows - new_rows
puts "Additions (#{additions.size})"
additions.each { |a| puts a.inspect }
FasterCSV.open("additions.csv", "w") do |csv|
additions.each { |a| csv << a }
end
puts
puts "Deletions (#{deletions.size})"
deletions.each { |d| puts d.inspect }
FasterCSV.open("deletions.csv", "w") do |csv|
deletions.each { |d| csv << d }
end
I’m using FasterCSV because it’s, um, faster, so make sure you have that gem installed. The script outputs two CSV files. One for additions, items that exist in the new file, but not the old one. And one for deletions, items that exist in the old file but not the new one. I have the normalize method in there to clean up some data. In my case I had leading zeros in some places but not others, so I removed those there.
So there you have it: tedium and eye strain avoided!
How to Restore MySQL Data from a Crashed Server
First off let me say that I hope you never have to deal with this. It’s terrible. I thought I had my servers sufficiently backed up, but I didn’t. Back up your database servers multiple ways; it’ll definitely pay off. Seriously.
OK so let me put you into the thick of it: through a terrible set of circumstances you have an unbootable Ubuntu server and can only recover the data. No other backups exist. A lot of data is on the line. Here’s how to be a hero.
Disclaimer: I’m not actually sure this is the best or right way to do this, but it worked for me. I am repeating this from memory so I apologize in advance for errors. Proceed at your own risk.
Setup a new MySQL server
I use the Rackspace Cloud, so spinning up a new server was no problem for me. If you’re using real hardware, good luck. Next, get MySQL up and running. Hopefully if you’re this far in you know how to do that.
Compress the data directory on your crashed server
You’ll need to start the crashed server in recovery mode if your cloud-ish host offers that or mount the drives in a working machine. In my case the data was stored in /var/lib/mysql. If you don’t know where your data is, check your MySQL config file. Look for datadir in /etc/mysql/my.cnf. Compressing the directory would look something like this:
$ cd path/to/mount/var/lib/
$ sudo tar -czvf mysql.tgz mysql/
Once you’ve gzipped your data, scp it over to your new server.
$ scp mysql.tgz user@newserverip:
Run your new server with your old data
Login to your new server. Your compressed data should by in your home folder. Decompress your data.
$ tar -xzvf mysql.tgz
Now let’s stop MySQL:
$ sudo /etc/init.d/mysql stop
Move the new MySQL data for safe keeping:
$ cd /var/lib
$ sudo mv mysql mysql_new
Copy your old data into place and fix the permissions:
$ sudo cp -r ~/mysql mysql
$ sudo chown -R mysql:mysql mysql/
Now it would be great if you could just start MySQL and call it a day, but there is good chance you’d have a mess. So instead we’re going to start MySQL and have it ignore permissions so that we can dump the data with mysqldump. Start MySQL, ignoring grant tables:
$ sudo mysqld_safe --skip-grant-tables &
You should now be able to connect to MySQL as root without a password and poke around in your databases. This is a good time to do a sanity check and make sure everything is in order.
$ mysql -u root
Once you feel good about what’s in there, it’s time to dump the data. You can dump all the data at once, and if you have lots of databases that is probably your best bet. I prefer to do it one database at a time so that I can selectively restore. Let’s head back to our home folder and put these files somewhere safe.
$ cd ~
$ mkdir backup
$ cd backup
Run the following for each one of your databases and you’ll end up with a backup directory full of .sql files.
$ mysqldump -u root databasename > databasename.sql
Get your new server back in fighting shape
Now at this point you can kill MySQL if you know how to look up its process ID and kill it, but sometimes MySQL can be a hard one to kill so we’re going to resort to rebooting. Let’s get or data directories back in order first:
$ cd /var/lib
$ sudo mv mysql mysql_old
$ sudo mv mysql_new mysql
All right. Our clean data is back in place. Let’s reboot the server.
$ sudo reboot now
Let’s restore some data!
At this point MySQL should be running with a standard set of data. So, reconnect to your server and make sure that MySQL is running and you can login. While you’re in there, create a database for each one that you’d like to restore.
$ mysql -u root -p
> create database databasename;
Once you’ve confirmed that all is well and have created your databases, exit from there and move to your backup directory.
$ cd ~/backup
Hey, we’ve made it to the big moment where you get to restore some data! Run the following for each file that you need to restore.
$ mysql -u root -p databasename < databasename.sql
Your data is back on a fresh MySQL server! You did it! Great work. Everyone whose data you saved will be anxiously lined up to shake your hand (and then ask why there were no backups).
You have averted disaster. Time to grab a cocktail and celebrate! That is unless you still need to reconfigure all your applications to use the new server… drat.