Search Oddity

Last night, I was checking the recent visitors report generated by SiteMeter, and I noticed one visitor had gone through 12 pages on the blog (which is quite a bit more than average). I was curious about what brought that person here; it turned out that they’d reached the site through an MSN Search for “November 2002”.

As of last night, this blog’s November, 2002 archives were the 8th result from that search. I can’t imagine what brought me so far up on the result list — though my blog is also high on MSN if you search for “November 2000” or “November 2001”.

Sometimes, I wonder about the web….

Migrated, with minimal breakage

I’ve finally transferred most of the content of my old blog, Defenestration Corner, to this blog. I wound up writing a bunch of bad Python code to do much of the work, but still had to do quite a bit of manual cleanup (and someday, I may yet get around to categorizing the posts I transferred). I lost all the comments to the blog in the process; there are few enough (and many of them were spam, anyway) that I’ll look at them by hand rather than bother writing yet more single-purpose code.

One of the areas which caused me the most trouble was my use, in the early days, of a non-empty posting to hold a picture. I finally decided that those few comments were not worth the effort and tossed them, changing each link to point to the picture itself instead of the posting.

I also learned, yet again, to Keep It Simple, Stupid. My original plans, months ago, involved writing wonderfully clever code to go through the old site, grabbing each posting, examining it to see if it had any references which needed changing, and, if so, finding the target posting and updating it. This would have involved a stack, worrying about circular references, and many other perils. I eventually (months later) took a simpler path; I made a first pass over all of the articles, capturing essential information about them, such as the date as rendered by Manila (rather than trying to figure it out from the UTC date, sometimes badly-formed, passed back through the Manila SOAP interface into Python) and the title of the article. I used the date and title to create a slug for WordPress; I probably didn’t use the same algorithm WordPress would have used, but it didn’t matter.
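For the curious, the slug step can be sketched roughly as follows. This is a minimal illustration, not the code I actually ran: the function names `slugify` and `post_path`, the year/month/day path layout, and the sample date are all assumptions made for the example, and the algorithm is not claimed to match WordPress's own.

```python
import re
from datetime import date


def slugify(title):
    """Lowercase the title and collapse each run of non-alphanumerics to one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def post_path(posted, title):
    """Combine a posting date and a title into a hypothetical permalink path."""
    return f"{posted:%Y/%m/%d}/{slugify(title)}"


if __name__ == "__main__":
    # The date here is made up purely for illustration.
    print(post_path(date(2005, 8, 14), "Migrated, with minimal breakage"))
```

Keeping the first pass this dumb is the point: once every old article has a date, a title, and a slug, the later link-fixing pass only needs a lookup table.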

After that, it was fairly easy to go through the rendered, content-only version of each article (thereby letting Manila resolve its internal “shortcuts”), find all the internal references, convert them to the new version (or, for images, just go to the underlying image), and use the MySQLdb Python module to directly insert the articles into the database on readthisblog.net.
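The reference-conversion pass might look something like the sketch below. It assumes the first pass has already produced a dictionary mapping old URLs to new ones; the name `rewrite_links`, the regular expression, and the sample URLs are hypothetical, and a real run would also have to cope with whatever odd markup Manila emitted.

```python
import re

# Matches double-quoted href attributes only; an assumption that keeps the sketch short.
HREF = re.compile(r'href="([^"]+)"')


def rewrite_links(html, url_map):
    """Replace each href found in url_map with its new target; leave the rest alone."""

    def swap(match):
        old = match.group(1)
        return 'href="%s"' % url_map.get(old, old)

    return HREF.sub(swap, html)


if __name__ == "__main__":
    # Both URLs below are invented for the example.
    url_map = {
        "http://dss.editthispage.com/discuss/msgReader$42":
            "http://readthisblog.net/2002/11/05/some-post/",
    }
    page = '<a href="http://dss.editthispage.com/discuss/msgReader$42">old post</a>'
    print(rewrite_links(page, url_map))
```

Using a function as the replacement (rather than a replacement string) avoids any trouble with characters like `$` or backslashes in the URLs.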

I ran into a few problems where Manila did, ummm, odd things; rather than program around them, I just manually fixed up the results. And I’ll probably be doing more manual fixups later.

I still have to arrange for a redirect from dss.editthispage.com to this site, and I still will have to convert from the Manila forms (like /discuss/msgReader$nn) to the renamed postings here, but that’s fairly simple. I hope.

Don’t omit the commit

I’m slowly making progress at converting my old blog from Manila to WordPress; it looks like the simplest approach is to write a bunch of Python scripts to read the “content-only” version of the blog, resolve intra-blog references, and then directly insert the result into the underlying MySQL database using MySQLdb.

In testing this approach, I was trying to create a posting from scratch in a copy of WordPress running on my machine; I based the program on the examples I found here and here. The program seemed to work, and I could read the changes while it was executing, but after it finished, the database never reflected them, except that the ID for new entries kept increasing every time I ran the program.

It took me a long time to figure out what was wrong, but I eventually guessed it: I had to do an explicit “COMMIT” to have the changes I made from Python stick. I don’t know why the examples don’t show this, but it sure makes a difference.
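Here is a small sketch of the same trap. To keep it runnable without a MySQL server, it uses Python's built-in `sqlite3` module instead of MySQLdb, so it is an analogy rather than my actual program; the `posts` table and the helper names are made up for the example. Both modules follow the Python DB-API, where the fix is the same: call `commit()` on the connection before closing it.

```python
import os
import sqlite3
import tempfile


def insert_post(db_path, title, commit):
    """Insert one row; return how many rows this connection sees before closing."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("INSERT INTO posts (title) VALUES (?)", (title,))
        seen = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
        if commit:
            conn.commit()  # without this, the insert is rolled back on close
    finally:
        conn.close()
    return seen


def count_posts(db_path):
    """Open a fresh connection and count the rows that actually persisted."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    finally:
        conn.close()


def demo():
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
        conn.commit()
        conn.close()
        seen_without = insert_post(path, "lost", commit=False)
        after_without = count_posts(path)
        insert_post(path, "kept", commit=True)
        after_with = count_posts(path)
        return seen_without, after_without, after_with
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())  # (1, 0, 1)
```

The first number shows the row being visible while the program runs, the second shows it vanishing once the uncommitted connection closes, and the third shows the committed insert sticking.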

More to come, I hope.

A catch in my referer log

One advantage of having a low-volume blog is that I check out my referer log fairly often. To be more accurate, I use the digested version provided by SiteMeter, which makes it easy for me to find out information about my hits, including search terms.

This morning, I noticed that someone had read more than one page on the blog and that they’d found it by doing a UK Google search for “parking lower slaughter”. Since I was curious, I re-ran the search and found that the top hit was on a site with the intriguing name of BeenThere-DoneThat. I clicked through, and liked what I saw (“the Unofficial Guide to Great Britain”), so I’m blogging it here so I have a chance of remembering it. I’ll also dogear it, but that’s only helpful when I have a connection behind the IBM firewall, something I try to avoid when on vacation.