Hello from Windows 8

Well, I went ahead and installed the $40 download version of the Windows 8 upgrade on my ThinkPad. I did an in-place upgrade and hoped for the best. It turned out pretty well. I had to remove several Lenovo utilities, but almost everything else seems to be working. Visual Studio 2010 is working fine. (I haven’t tried much beyond a “hello world” program, but that worked OK.) My “WIMP” stack (Windows, IIS, MySQL, PHP) also survived — MySQL is running, and PHP under IIS still works. I checked phpMyAdmin and my test Drupal site, and they both look OK.

Most of the utility programs I use seem to be fine, including Dropbox, Evernote, KeePass, IrfanView, VLC, and Notepad++. I still need to re-install iTunes, so I can’t vouch for that yet, but I think it will work. And of course Firefox is working fine — I’m using it right now to write this blog post.

The install process was pretty smooth. It took about an hour from starting the purchase to the point where all the files were downloaded, the pre-install steps were done, and the actual install started. The install itself then took about another hour to complete.

My experience on the ThinkPad has been good enough that I think I can probably do an in-place upgrade of my desktop too, so I’ll probably do that when the box upgrade arrives.

I’ve got a bunch of other stuff to mess around with, so I’ll probably write at least one more blog post on this. But, so far, so good!

my first Drupal module

Here’s my second blog post for today. (Still sitting around at home, watching storm coverage.)

I recently read an article on CNET about how companies are increasingly looking at sites like GitHub when evaluating potential hires, to see actual code rather than just going on what people say about themselves on, say, LinkedIn. I think the article exaggerates a bit, and maybe overgeneralizes; there are plenty of great programmers working in environments where they’re not likely to be posting any code on GitHub. I have been thinking lately, though, that it would be good to have some open-source code out there for people to look at. My boss recently wanted me to write a Drupal module that would allow people to embed our store locator in a Drupal site. I just finished the initial version of that module, and posted it as a sandbox project at drupal.org. You can find it here.

Because Drupal itself is open-source, and because PHP is interpreted, you really have to expose your code if you’re writing a Drupal module. So, as a beneficial side-effect, I now have some code out there that I could point someone to, if I needed to show a code sample to anyone. Mind you, I’m not actively looking for a new job, but it’s good to have something out there.

Given the kind of work that I normally do, it’s not that common that I work on any code that (a) I can post openly, (b) isn’t part of a “group effort” that multiple people have worked on, (c) is (somewhat) self-contained, and (d) is non-trivial. I think a lot of developers are likely in this category. It’s a good idea to keep an eye out for opportunities to work on occasional projects that fit these criteria, and can be posted publicly to GitHub or similar sites.

I’d like to do a couple of blog posts later highlighting some of the stuff I learned while writing this module. While Drupal is reasonably well-documented (for an open source project), there are a fair number of “dark corners” that are hard to get a handle on, and which I could possibly write some useful posts on.

Windows 8

It’s Sunday morning, and I’ve got nothing much to do, other than wait for Hurricane Sandy to hit, so I thought I’d catch up on blogging. I have a few things I want to write up, the first being some thoughts on Windows 8. (I’ve found a couple of good reviews/articles on Win 8 at The Register and ComputerWorld.)

I pre-ordered a boxed copy of the Windows 8 upgrade from Newegg, and I’d planned on using that to upgrade my ThinkPad from Windows 7 to 8 this weekend. However, it hasn’t arrived yet. I then thought about just downloading the $40 upgrade from Microsoft and using the boxed copy to upgrade my desktop at some point. I went as far as running the upgrade advisor on the ThinkPad, but the results I got made me back off on that plan and rethink things a bit.

Specifically, Visual Studio 2010 is listed as “not compatible”. I was pretty surprised at this, since I would expect that MS would want developers to be able to move to Win 8 early. I realize that they’d also like to see developers move to VS 2012, but they must know that not everyone can do that right away.

So, I’ve been thinking about my options. One option would be to just do a clean install of Windows 8 on the ThinkPad, and not worry about VS 2010. I do like having it available, but the ThinkPad isn’t my main machine, so there’s no reason I really need it to have VS 2010. Another option would be to just try the upgrade and see what happens. This guy has apparently had some luck with VS 2010 on Windows 8, so maybe it’ll work, even if it’s marked as “not compatible” by the upgrade advisor.

Another interesting thought I’ve had, after reading about how awesome Hyper-V is on Windows 8, is to keep a fairly vanilla Win 8 install on the ThinkPad, then set up VS 2010 and some other stuff in a Win 7 VM. (There are good articles on Hyper-V support in Windows 8 here and here.) Of course, then I need a Win 7 license that I can use in a VM. In the past, I’ve learned the hard way that you can’t reuse a major OEM’s OS license from a physical machine in a VM — activation detects that you’re not on actual hardware from that OEM, and locks you out. I’m not 100% sure that’s still the case, but I’d bet it is. So I can’t just use the ThinkPad’s Win 7 license in the VM.

I think I have a Win 7 product key from my old MSDN subscription, from my previous employer, but that subscription expired a couple of years ago, and I’m not sure if the product keys would still be valid. Which brings up a bigger question that I’ve been putting off: is it time for me to break down and finally buy my own MSDN or TechNet subscription? TechNet is affordable enough, but MSDN costs about as much as a new laptop would. I like being able to mess with VMs and experiment with new stuff from Microsoft, but the cost of doing so is somewhat prohibitive.

geocoding experiments

I wrote an initial blog post on Gisgraphy about a month ago. I wanted to write a follow-up, but hadn’t gotten around to it, what with all the other stuff I have going on. But I’m going to take a few minutes now and write something up.

The initial import process to get the Gisgraphy server up and running took about 250 hours. The documentation said it would take around 40 hours, but of course there’s no way to estimate that accurately without knowing the specific hardware and environment of the server it’s being installed on. I’m guessing that, if I had more experience with AWS/EC2, I would have been better able to configure a machine properly for this project.

Once the import was complete, I started experimenting with the geocoding web service. I quickly discovered something that I’d overlooked when I was first testing against the author’s hosted web service. The geocoding web service takes a free-form address string as a parameter; it’s not set up to accept the parts of the address (street, city, state, zip) separately. It runs that string through an address parser, and here’s where we hit a problem. The address parser, while “part of the gisgraphy project,” is not actually open source. An installed instance of Gisgraphy, by default, calls out to the author’s web service site to do the address parsing. And, if you call it too often, you get locked out. At that point, you have to talk about licensing the DLL or JAR for it, or paying for access to it via his web service.
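
For reference, here’s roughly what a free-form geocoding request against a self-hosted instance looks like, sketched in PHP. The host/port and the sample address are hypothetical, and the endpoint and parameter names are as I understand them from the Gisgraphy docs:

<?php
// Free-form geocoding request against a (hypothetical) local Gisgraphy install.
$address = '123 Main Street, Trenton, NJ 08601';
$url = 'http://localhost:8080/geocoding/geocode?' . http_build_query(array(
  'address' => $address,  // one free-form string; no separate street/city/state/zip
  'country' => 'US',
  'format'  => 'json',
));
$response = json_decode(file_get_contents($url), TRUE);
print_r($response);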

Technically, the geocoder will work without the address parser, but I found that it returns largely useless results without it. For instance, it will happily return a result in California, given an address in New Jersey. I’m not entirely sure how the internal logic works, but it appears to just do a text search when geocoding an un-parsed address, returning a location with a similar street name, for instance, regardless of where in the US it is.

While I don’t think the author is purposely running a bait-and-switch, I also don’t think he’s clear enough about the fact that the address parser isn’t part of the open source project, and that the geocoder is fairly useless without it. So, we shut down the EC2 instance for this and moved on to other things.

Specifically, we moved on to MapQuest Open, which I was going to write up here in this post, but I need to head out to work now, so maybe another time.

almost done

I just got back from my parents’ old house in Whiting. I threw out three bags full of stuff, and carted away four plastic tubs full of random stuff, three of which came home with me and one of which has been dropped off at my storage unit in Bridgewater. The house is now as empty as I hope it needs to be. The closing, as far as I know, is still scheduled for Wednesday. I think I still need to make one more trip to Whiting, to drop off my keys and some paperwork at the Cedar Glen Lakes office, but I don’t think I’ll need to go back to the house again. It feels weird, after having it on the market for two years, to finally be (almost) done with it.

Two Man Gentleman Band

I heard these guys on Soundcheck yesterday. They were pretty funny. Worth a listen, if you like funny songs about food and booze and stuff!

Gisgraphy

My boss stumbled across a project named Gisgraphy recently. A big part of what we do involves the need for geocoding. We have generally been using geocode.com for batch geocoding, but there’s a cost to that, and they only do US and Canada. There are many other geocoding services, but if you’re doing heavy volume, you’re generally excluded from free options, and the paid options can get expensive.

Gisgraphy is an open source project that you can set up on your own server. It will pull in data from freely-available sources, load it all into a local database, then allow you to use a REST web service to geocode addresses. A little testing with some US addresses leads me to believe that it’s generally accurate to street level, but not quite to house level. So, I’m not sure that we’ll want to use it for all of our geocoding, but it ought to be generally useful.

We decided to set it up on an AWS EC2 instance. We started messing with EC2 VMs for another project, and it seemed like EC2 would be a good fit for this project too. I started out with a small instance Linux VM, but switched it to a medium instance, since the importer was really stressing the small instance. I will probably switch back to small after the import is done. That’s one nice thing about EC2: being able to mess with the horsepower available to your VM.

Gisgraphy uses several technologies that are outside my comfort zone. I’m primarily a Windows / .NET / SQL Server guy, with a reasonable amount of experience with Linux / MySQL / PHP. Gisgraphy runs on Linux (also on Windows, but it’s obviously more at home on Linux), so that’s ok. But it’s written in Java, and uses PostgreSQL as its back-end database. I have only very limited experience with Java and PostgreSQL. And, of course, I’m new to AWS/EC2 also.

So, setting this all up was a bit of a challenge. The instructions are OK, but somewhat out of date. I’m using Ubuntu 12.04 LTS on EC2, and many things aren’t found in the same places as they were under whatever Linux environment he based his instructions on. For the sake of anyone else who might need a little help getting the basic setup done under a recent version of Ubuntu, here are a few pointers, covering the places where I had to deviate from the Linux instructions:

  • Java: I installed Java like this: “sudo apt-get install openjdk-6-jdk openjdk-6-jre”. JAVA_HOME should be /usr/lib/jvm/java-6-openjdk-i386/ or /usr/lib/jvm/java-6-openjdk-amd64/, depending on your architecture.
  • PostgreSQL: I installed the most recent versions of PostgreSQL and PostGIS like this: “sudo apt-get install postgresql postgresql-contrib postgis postgresql-9.1-postgis”. Config files are in /etc/postgresql/9.1/main, and data files are in /var/lib/postgresql/9.1/main.
  • PostGIS: In his instructions for configuring PostGIS, the “createlang” command wasn’t necessary. The SQL scripts you need to run are /usr/share/postgresql/9.1/contrib/postgis-1.5/postgis.sql and spatial_ref_sys.sql.

That’s about it for now, I think. I want to write up another blog entry on Gisgraphy, once I’ve got it fully up & running. And there might be some value in a blog entry on EC2. But now I have to get back to finishing my laundry!

Stumbling my way through the Drupal API

I’ve had to fix some interesting problems at work recently, related to a Drupal site that we’ll be rolling out soon. I just finished fixing one issue that, while seemingly minor, took quite a while to figure out.

I’m really glad to have come up with a good solution. The thing that amuses me most about this is that, after more than eight hours of messing around, the final solution involved writing only about a half-dozen lines of code.

The problem, in a nutshell, is that we have a content type in the system that represents a university. There’s a location field on each node with, minimally, city and country specified. The user can search for locations, using some custom search code, and the search results are displayed via a standard Drupal view. We allow the user to sort the results by one of a few different fields, with the sort drop-down exposed from the view. This works fine, except when sorting by country. On the location record, only the two-letter country code is stored, for instance “AE” for “United Arab Emirates”. So, when you sort by country, it’s really sorting on country code, so “AE” goes to the top, which isn’t really what the client wanted.

Of course, the first thing I did was Google the problem. I found this issue discussion, which pretty much matches my problem. There was a suggestion in the comments there about using hook_views_pre_render to re-sort the results right before displaying them (sketched below). That works great if you’re not paging results. But, if you’re pulling results back one page at a time from a large result set, it doesn’t work, since the pre-render hook only gives you the current page.
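
For the record, that pre-render approach looks something like this. It’s a minimal sketch, and the module name, view name, and the row property holding the country code are all assumptions on my part:

function mymodule_views_pre_render(&$view) {
  // Re-sort the (un-paged) result set by full country name (Drupal 7).
  // 'mymodule' and 'university_search' are placeholder names.
  if ($view->name == 'university_search') {
    $countries = country_get_list();  // ISO code => full country name
    usort($view->result, function ($a, $b) use ($countries) {
      return strcasecmp($countries[$a->location_country], $countries[$b->location_country]);
    });
  }
}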

So I figured out that I really need to sort by country name at the SQL level, while retrieving results. This led to my next problem, which is that, even with the location module installed, there’s no SQL country lookup table in Drupal. The list of country codes and names is just stored in code, in an array, which can be retrieved via _country_get_predefined_list. (You shouldn’t call that directly, though, of course; you should use country_get_list.)

So off I went to find a module that could give me a SQL table with country info in it. The countries module does that, and a bit more. So, I installed that and figured out where the country table was. Then, my next blind alley was figuring out how to join to the new country table in a view. I was hoping I could just add a join to it in the view definition, and go from there. Well, I still don’t know that much about Drupal views, and it didn’t seem possible to do that easily.

So, the next blind alley was to see if I could alter the view SQL with hook_views_query_alter, which seemed sensible. Well, the query object that you get from that hook isn’t a nice simple query object that can easily be changed, so that turned out to be another dead end. (It’s possible I could have figured it out eventually, but it seemed like the wrong approach.)

Then, finally, I stumbled across this SO question. The one answer posted there led me in the direction of modifying the query with hook_query_alter, which can be used to modify just about any query Drupal issues to MySQL. So, finally, I found a workable solution.

function mymodule_query_alter(QueryAlterableInterface $query) {
  // ('mymodule' is a placeholder for the actual module name.)
  // Only touch the query generated by our university search view.
  if ($query->hasAllTags('views', 'views_university_search')) {
    // Grab a reference to the query's ORDER BY array.
    $ord =& $query->getOrderBy();
    if (array_key_exists('location_country', $ord)) {
      // Join the countries module's table on the two-letter ISO code,
      // then sort on the full country name instead of the code.
      $query->addJoin('INNER', 'countries_country', 'cc', 'cc.iso2 = location.country');
      $ord = array('cc.name' => $ord['location_country']);
    }
  }
}

So that’s it. I add a join, and replace the ‘order by’ clause. About a half-dozen lines of code. Oh, and I now also understand passing by reference in PHP a little better too!
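
If the reference stuff is unclear, here’s a standalone toy example showing why the =& matters. The class is just a stand-in for the real query object:

<?php
class FakeQuery {
  private $order = array('location_country' => 'ASC');
  // Declared with & so callers can take a reference to the real array.
  public function &getOrderBy() { return $this->order; }
}
$q = new FakeQuery();
$copy = $q->getOrderBy();           // plain assignment: just a copy
$copy = array('cc.name' => 'ASC');  // the query is untouched
$ref =& $q->getOrderBy();           // reference assignment
$ref = array('cc.name' => 'ASC');   // this rewrites the query's actual ORDER BY
print_r($q->getOrderBy());          // array('cc.name' => 'ASC')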

Museum visits

I got on a bit of a museum kick last August, and I seem to be doing the same thing this August. I went to the Met last weekend. I was going to go to the Frick and the Whitney today, but my nearly deaf cab driver misunderstood my destination, and dropped me off much closer to the Met than the Frick, so I decided to just go with it and visited the Met again. That’s fine, since there is so much in the Met that you can go twice in two weeks and see completely different things. This time around, I stumbled into the Degas section, and spent some time browsing around in that neighborhood.

Last weekend, I took a cab to and from the museum. This weekend, I took a cab up, but walked back to Penn Station, which is a nice long walk. My ankles and knees hurt a little now, but I made the walk without any grief, so that makes me feel a little better about my current fitness level.

fun with WSDL and CURL

Ever since the debacle described in this blog post, I’ve made it a point to double-check the WSDL on the SOAP web services for our main product, any time I’m doing a non-trivial rollout, even if I know I haven’t changed anything that should affect the WSDL.

Up until today, I’ve always just done it by bringing up the WSDL URL for each web service in Firefox and saving it to a text file. There are only a half-dozen web services, so it doesn’t take that long. But this morning I finally broke down and wrote a batch file to fetch them all, using cURL.
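
The real script is just a Windows batch file driving the curl executable, but the idea is simple enough to sketch in PHP. The service names and URL pattern here are made up:

<?php
// Fetch each service's WSDL and save it to a file, so I can diff them
// before and after a rollout. Names and URLs are hypothetical.
$services = array('Orders', 'StoreLocator', 'Inventory');
foreach ($services as $svc) {
  $ch = curl_init("https://example.com/services/{$svc}.asmx?WSDL");
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
  $wsdl = curl_exec($ch);
  curl_close($ch);
  file_put_contents("{$svc}.wsdl.xml", $wsdl);
}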

I’ve gotten a bit more enthusiastic recently about using cURL, and other tools, to simplify things for me, since reading this blog post by Scott Hanselman.