Gisgraphy

My boss stumbled across a project named Gisgraphy recently. A big part of what we do involves the need for geocoding. We have generally been using geocode.com for batch geocoding, but there’s a cost to that, and they only do US and Canada. There are many other geocoding services, but if you’re doing heavy volume, you’re generally excluded from free options, and the paid options can get expensive.

Gisgraphy is an open source project that you can set up on your own server. It will pull in data from freely-available sources, load it all into a local database, then allow you to use a REST web service to geocode addresses. A little testing with some US addresses leads me to believe that it’s generally accurate to street level, but not quite to house level. So, I’m not sure that we’ll want to use it for all of our geocoding, but it ought to be generally useful.

We decided to set it up on an AWS EC2 instance. We started messing with EC2 VMs for another project, and it seemed like EC2 would be a good fit for this project too. I started out with a small instance Linux VM, but switched it to a medium instance, since the importer was really stressing the small instance. I will probably switch back to small after the import is done. That’s one nice thing about EC2: being able to mess with the horsepower available to your VM.

Gisgraphy uses several technologies that are outside my comfort zone. I’m primarily a Windows / .NET / SQL Server guy, with a reasonable amount of experience with Linux / MySQL / PHP. Gisgraphy runs on Linux (also on Windows, but it’s obviously more at home on Linux), so that’s ok. But it’s written in Java, and uses PostgreSQL as its back-end database. I have only very limited experience with Java and PostgreSQL. And, of course, I’m new to AWS/EC2 also.

So, setting this all up was a bit of a challenge. The instructions are ok, but somewhat out of date. I’m using Ubuntu 12.04 LTS on EC2, and many things aren’t found in the same places as they were under whatever Linux environment he based his instructions on. For the sake of anyone else who might need a little help getting the basic setup done under a recent version of Ubuntu, I thought I’d list out a few pointers, where I had to do things a bit differently than found in the Linux instructions:

  • Java: I installed Java like this: “sudo apt-get install openjdk-6-jdk openjdk-6-jre”.
  • And JAVA_HOME should be /usr/lib/jvm/java-6-openjdk-i386/ or /usr/lib/jvm/java-6-openjdk-amd64/.
  •  PostgreSQL: I installed the most recent versions of PostgreSQL and PostGIS like this: “sudo apt-get install postgresql postgresql-contrib postgis postgresql-9.1-postgis”.
  • Config files were in /etc/postgresql/9.1/main and data files were in /var/lib/postgresql/9.1/main.
  • PostGIS: In his instructions for configuring PostGIS, the “createlang” command wasn’t necessary. 
  • And the SQL scripts you need to run are /usr/share/postgresql/9.1/contrib/postgis-1.5/postgis.sql and spatial_ref_sys.sql.

That’s about it for now, I think. I want to write up another blog entry on Gisgraphy, once I’ve got it fully up & running. And there might be some value in a blog entry on EC2. But now I have to get back to finishing my laundry!

Stumbling my way through the Drupal API

I’ve had to fix some interesting problems at work recently, related to a Drupal site that we’ll be rolling out soon. I just finished fixing one issue that, while seemingly minor, took quite a while to figure out.

I’m really glad to have come up with a good solution. The thing that amuses me most about this is that, after more than eight hours of messing around, the final solution involved writing only about a half-dozen lines of code.

The problem, in a nutshell, is that we have a content type in the system that represents a university. There’s a location field on each node with, minimally, city and country specified. The user can search for locations, using some custom search code, and the search results are displayed via a standard Drupal view. We allow the user to sort the results by one of a few different fields, with the sort drop-down exposed from the view. This works fine, except when sorting by country. On the location record, only the two-letter country code is stored, for instance “AE” for “United Arab Emirates”. So, when you sort by country, it’s really sorting on country code, so “AE” goes to the top, which isn’t really what the client wanted.

Of course, the first thing I did was Google the problem. I found this issue discussion, which pretty much matches my problem. There was a suggestion in the comments there about using hook_views_pre_render to re-sort the results right before displaying them. That works great, if you’re not paging results. But, if you’re pulling results back one page at a time from a large result set, this doesn’t work, since the pre-render hook only gives you the current page.

So I figured out that I really need to sort by country name at the SQL level, while retrieving results. This led to my next problem, which is that, even with the location module installed, there’s no SQL country lookup table in Drupal. The list of country codes and names is just stored in code, in an array, which can be retrieved via _country_get_predefined_list. (You shouldn’t call that directly, though, of course; you should use country_get_list.)

So off I went to find a module that could give me a SQL table with country info in it. The countries module does that, and a bit more. So, I installed that and figured out where the country table was. Then, my next blind alley was figuring out how to join to the new country table in a view. I was hoping I could just add a join to it in the view definition, and go from there. Well, I still don’t know that much about Drupal views, and it didn’t seem possible to do that easily.

So, the next blind alley was to see if I could alter the view SQL with hook_views_query_alter, which seemed sensible. Well, the query object that you get from that hook isn’t a nice simple query object that can easily be changed, so that turned out to be another dead end. (It’s likely possible that I could have figured it out, but it seemed like the wrong approach.)

Then, finally, I stumbled across this SO question. The one answer posted there led me in the direction of modifying the query with hook_query_alter, which can be used to modify just about any query Drupal issues to MySQL. So, finally, I found a workable solution.

hasAllTags('views', 'views_university_search')) {
    $ord =& $query->getOrderBy();
    if (array_key_exists('location_country', $ord)) {
      $query->addJoin('INNER', 'countries_country', 'cc', 'cc.iso2 = location.country');
      $ord = array('cc.name' => $ord['location_country']);
    }
  }
}

So that’s it. I add a join, and replace the ‘order by’ clause. About a half-dozen lines of code. Oh, and I now also understand passing by reference in PHP a little better too!

Museum visits

I got on a bit of a museum kick last August, and I seem to be doing the same thing this August. I went to the Met last weekend. I was going to go to the Frick and Whitney today, but my nearly deaf cab driver misunderstood my destination, and dropped me off much closer to the Met than the Frick, so I decided to just go with it, and visited the Met again. This is fine, as there is so much stuff in the Met that you can go twice in two weeks and see completely different stuff.  This time around, I stumbled into the Degas section, and spent some time browsing around in that neighborhood.

Last weekend, I took a cab to and from the museum. This weekend, I took a cab up, but walked back to Penn Station, which is a nice long walk. My ankles and knees hurt a little now, but I made the walk without any grief, so that makes me feel a little better about my current fitness level.

fun with WSDL and CURL

Ever since the debacle described in this blog post, I’ve made it a point to double-check the WSDL on the SOAP web services for our main product, any time I’m doing a non-trivial rollout, even if I know I haven’t changed anything that should affect the WSDL.

Up until today, I’ve always just done it by bringing up the WSDL URL for each web service in Firefox, and saving it to a text file. There’s only a half-dozen web services, so it doesn’t take that long. But this morning I finally broke down and wrote a batch file to fetch them all, using cURL.

I’ve gotten a bit more enthusiastic about using cURL, and other tools, to simplify things for me recently, since reading this blog post by Scott Hanselman.

random PHP functions

I’m still doing a fair amount of PHP work. Right now, it’s all Drupal, for a site we’re rolling out very soon. I keep stumbling across random PHP functions I hadn’t heard of before, and that turn out to be nice little time savers. Two examples:

  1. curl_setopt_array: I used to just call curl_setopt() a bunch of times to set all my options Now I can set a bunch of options in one fell swoop.
  2. http_build_query: Nothing I couldn’t previously do with simple string concatenation, but this is much cleaner.

WIndows 8, Mountain Lion, and Ubuntu 12

I have to do a 10pm web site rollout tonight, so I find myself at home with some time to kill. I haven’t gotten much of a chance to play around with Windows 8, so I decided to download the 90-day eval, and install it on my old laptop. I have the ISO downloaded and ready to go now. However, I had installed Ubuntu 11 on the laptop back in February. I haven’t really played around with it much since then, and I was ready to wipe it out, but when I turned it on, I got an update message letting me know that I could update it to Ubuntu 12.04 LTS. Well, I decided I’d rather upgrade the Ubuntu install on this laptop rather than wiping it out and starting over with Windows 8. It’s running now, and seems to be chugging along smoothly.

I did a little searching, and it looks like 12.04.1 was only just released. There’s an article about it on ZDNet, dated yesterday. And I guess the original 12.04 release was a few months back, based on the date on this Lifehacker article.

There’s been a lot of OS-related news lately, with Mountain Lion just released and Windows 8 nearing general availability. My old 2007 MacBook can’t handle Mountain Lion, so I’m sticking with plain-old Lion on that for now. I’m tentatively planning to buy myself a new MacBook Pro early next year, but I’m not really that worried about it right now. And I’m curious about Windows 8, but not that enthusiastic about it, given what I already know. I read an interesting CNET article this morning, comparing Mountain Lion and Windows 8. I think I agree with his conclusions, for the most part.

I will likely upgrade both my Windows desktop and laptop to Windows 8, when the consumer version is released, but I’m not that excited about it. Meanwhile, maybe I’ll play around with Ubuntu a bit more!

Mandrill REST API

OK, here’s another REST API example. In the same system I blogged about yesterday, I also had to use the Mandrill API to send out some transactional emails. Mandrill is a relatively new service from MailChimp that can be used to send out e-mail via simple API calls. It’s pretty nice, and free to use, up to a certain point.

You can actually use it via regular old SMTP, with some control info added into the headers, or you can go ahead and use their REST API. I decided to use the API. Initially, I had no luck with that at all. After thrashing around a bit, I eventually figured out that I had to update to the newest version of EasyHttp. Then, it started working fine, no problem. I didn’t bother looking into the internals of EasyHttp to try and figure out what was causing the problem. I’m kind of curious, but, as usual, it’s more important to just get the thing working.

So, here’s another Gist with some code:

/*
 * Code/Mandrill.cs
 * These are calls to the Mandrill email service.
 * Reference: https://mandrillapp.com/api/docs/
 * 
 * ajh 2012-08-09: new
 */
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Configuration;
using System.Dynamic;
using EasyHttp.Http;
using System.Net;
using log4net;

namespace Sample.Web
{
    enum MandrillError
    {
        OK,
        WebException,
        HttpNotOk,
        Invalid,
        Rejected,
        Unknown
    }

    public class Mandrill
    {
        static string MandrillBaseUrl = ConfigurationManager.AppSettings["MandrillBaseUrl"];
        static Guid MandrillKey = new Guid(ConfigurationManager.AppSettings["MandrillKey"]);

        public static bool SendActivationEMail(BLL.TrialSignup ts, out string errorMsg)
        {
            string activationLink =
                HttpContext.Current.Request.Url.GetLeftPart(UriPartial.Authority) + "/Register/Activation.aspx?id=" + ts.Id;
            
            //send-template(string key, string template_name, array template_content, struct message) 
            dynamic sendParams = new ExpandoObject();
            sendParams.key = MandrillKey;
            sendParams.template_name = "Secret Project Trial Activation";

            sendParams.template_content = new List();
            
            sendParams.message = new ExpandoObject();
            sendParams.message.subject = "Here's your Secret Project activation email";
            sendParams.message.from_email = "info@SecretProject.com";
            sendParams.message.from_name = "Secret Project";

            sendParams.message.to = new List();
            sendParams.message.to.Add(new ExpandoObject());
            sendParams.message.to[0].email = ts.EMail;
            sendParams.message.to[0].name = ts.Name;

            sendParams.message.track_opens = true;
            //sendParams.message.track_clicks = true;

            sendParams.message.global_merge_vars = new List();
            sendParams.message.global_merge_vars.Add(new ExpandoObject());
            sendParams.message.global_merge_vars[0].name = "NAME";
            sendParams.message.global_merge_vars[0].content = ts.Name;

            sendParams.message.global_merge_vars.Add(new ExpandoObject());
            sendParams.message.global_merge_vars[1].name = "LINK";
            sendParams.message.global_merge_vars[1].content = activationLink;

            errorMsg = string.Empty;
            
            MandrillError merr = SendMessage(sendParams);

            switch (merr)
            {
                case MandrillError.OK:
                    return true;
                case MandrillError.WebException:
                case MandrillError.HttpNotOk:
                    errorMsg = "There was an issue sending your activation e-mail. Please try again later or call us directly.";
                    break;
                case MandrillError.Invalid:
                    errorMsg = "Your email address appears to be invalid. Please try again with a valid address, or call us directly.";
                    break;
                case MandrillError.Rejected:
                    errorMsg = "Your activation email was rejected. Please try again with a valid address, or call us directly.";
                    break;
                case MandrillError.Unknown:
                    errorMsg = "There was an unknown problem sending your activation email. Please try again, or call us directly.";
                    break;
            }
            return false;
        }

        public static bool SendSalesNotification(BLL.TrialSignup ts)
        {
            dynamic sendParams = new ExpandoObject();
            sendParams.key = MandrillKey;
            sendParams.template_name = "Secret Project Trial Sales Notification";

            sendParams.template_content = new List();

            sendParams.message = new ExpandoObject();
            sendParams.message.subject = "Secret Project Trial Account Notification";
            sendParams.message.from_email = "info@SecretProject.com";
            sendParams.message.from_name = "Secret Project";

            sendParams.message.to = new List();
            sendParams.message.to.Add(new ExpandoObject());
            sendParams.message.to[0].email = ConfigurationManager.AppSettings["SalesEmail"];
            sendParams.message.to[0].name = "Secret Project Sales";

            //sendParams.message.track_opens = true;
            //sendParams.message.track_clicks = true;

            sendParams.message.global_merge_vars = new List();
            sendParams.message.global_merge_vars.Add(new ExpandoObject());
            sendParams.message.global_merge_vars[0].name = "NAME";
            sendParams.message.global_merge_vars[0].content = ts.Name;

            sendParams.message.global_merge_vars.Add(new ExpandoObject());
            sendParams.message.global_merge_vars[1].name = "COMPANY";
            sendParams.message.global_merge_vars[1].content = ts.CompanyName;
            
            sendParams.message.global_merge_vars.Add(new ExpandoObject());
            sendParams.message.global_merge_vars[2].name = "EMAIL";
            sendParams.message.global_merge_vars[2].content = ts.EMail;

            MandrillError merr = SendMessage(sendParams);

            switch (merr)
            {
                case MandrillError.OK:
                    return true;
                case MandrillError.WebException:
                case MandrillError.HttpNotOk:
                case MandrillError.Invalid:
                case MandrillError.Rejected:
                case MandrillError.Unknown:
                    break;
            }
            return false;
        }

        private static MandrillError SendMessage(dynamic sendParams)
        {
            ILog _log = log4net.LogManager.GetLogger("Mandrill/SendMessage");

            string url = MandrillBaseUrl + "/messages/send-template.json";

            var http = new HttpClient
            {
                Request = { Accept = HttpContentTypes.ApplicationJson }
            };

            EasyHttp.Http.HttpResponse response;
            try
            {
                response = http.Post(url, sendParams, HttpContentTypes.ApplicationJson);
            }
            catch (WebException ex)
            {
                _log.ErrorFormat("Error: WebException - {0}", ex.Message);
                return MandrillError.WebException;
            }

            if (response.StatusCode != HttpStatusCode.OK)
            {
                _log.InfoFormat("Response = {0} - {1}", response.StatusCode, response.StatusDescription);
                _log.Info(response.RawText);
                return MandrillError.HttpNotOk;
            }

            dynamic rv = response.DynamicBody;
            _log.InfoFormat("email: {0}, status: {1}", rv[0].email, rv[0].status);

            string send_status = rv[0].status;
            if (send_status == "sent" || send_status == "queued")
                return MandrillError.OK;

            // otherwise, it should be "rejected" or "invalid"
            if (send_status == "invalid")
            {
                return MandrillError.Invalid;
            }
            if (send_status == "rejected")
            {
                return MandrillError.Rejected;
            }

            // unexpected...
            return MandrillError.Unknown;
        }
        
    }
}

I hope someone finds it useful and/or entertaining.
In this code, I’m sending out two e-mails, one is a trial account activation email, and the other is just a notification to sales. On the first, I’m looking at the errors closely and trying to return a useful error message, so the customer knows that something has gone wrong and has a clue about how to handle it. On the second, I’m less worried about that, as it’s just a notification to our salesperson.

Zoho CRM REST API

I have about a dozen programming-related items floating around in my head that I’d like to write up short blog entries about. Maybe I’ll manage to get one or two written up tonight.

I’ve been doing some work recently with some random REST APIs. When I’m calling REST APIs from .NET, I like to use EasyHttp. Based on the stats on its NuGet page, I guess it’s not that popular, but it’s updated often enough, and it works well for me.

I recently worked on a project where I had to create a lead in Zoho CRM, using their API. I had a little bit of trouble, but only because I didn’t have quite the right parameters set — it turns out that “newformat=1” is really important. I had looked for some sample C# code for the Zoho API when I started this, and didn’t find any, so I thought I’d post a quick Gist with my code:

/*
 * Sample C# code to create a lead with the Zoho CRM API.
 * ajh 2012-08-23
 */
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Dynamic;
using System.Linq;
using System.Net;
using System.Web;
using System.Xml;
using System.Xml.Linq;
using EasyHttp.Http;
using log4net;

namespace Sample.Web
{
    public class ZohoCrm
    {
        static string ZohoApiBaseUrl = "https://crm.zoho.com/crm/private/xml/Leads/insertRecords"; 
        static string ZohoApiKey = ConfigurationManager.AppSettings["ZohoApiKey"];

        public static bool CreateLead(BLL.SampleLead sl)
        {
            ILog _log = log4net.LogManager.GetLogger("ZohoCrm/CreateLead");

            XDocument xmlData = new XDocument(
            new XElement("Leads",
                new XElement("row", new XAttribute("no", "1"),
                    new XElement("FL", new XAttribute("val", "Lead Source"), "Trial Signup"),
                    new XElement("FL", new XAttribute("val", "Company"), sl.CompanyName),
                    new XElement("FL", new XAttribute("val", "Last Name"), sl.Name),
                    new XElement("FL", new XAttribute("val", "Email"), sl.EMail)
                )));

            string url = string.Format("{0}?authtoken={1}&scope=crmapi&newFormat=1&xmlData={2}", 
				ZohoApiBaseUrl, ZohoApiKey, 
                HttpUtility.UrlEncode(xmlData.ToString()));

            var http = new HttpClient
            {
                Request = { Accept = HttpContentTypes.ApplicationXml }
            };

            dynamic emptyPost = new ExpandoObject();            
            EasyHttp.Http.HttpResponse response;
            try
            {
                response = http.Post(url, emptyPost, HttpContentTypes.ApplicationXml);
            }
            catch (WebException ex)
            {
                _log.ErrorFormat("Error: WebException - {0}", ex.Message);
                return false;
            }

            if (response.StatusCode != HttpStatusCode.OK)
            {
                _log.InfoFormat("Response = {0} - {1}", response.StatusCode, response.StatusDescription);
                _log.Info(response.RawText);
                return false;
            }

            XDocument xdoc;
            try
            {
                xdoc = XDocument.Parse(response.RawText);
            }
            catch (XmlException ex)
            {
                _log.ErrorFormat("Error: XmlException parsing API response - {0}", ex.Message);
                _log.Info(response.RawText);
                return false;
            }

            string msg;
            try
            {
                msg = xdoc.Descendants("result").First().Element("message").Value;
            }
            catch (Exception ex)
            {
                _log.ErrorFormat("Error: Exception reading from API response - {0}", ex.Message);
                _log.Info(response.RawText);
                return false;
            }
            if (msg != "Record(s) added successfully")
            {
                _log.InfoFormat("Unexpected XML result: {0}", msg);
                _log.Info(response.RawText);
                return false;
            }

            return true;
        }
    }

(See also this Gist.)

I’m also using log4net here.  I’m a really big fan of log4net, and I use it in almost all of my projects. I have a pretty standard log.config file I use that sets up a RollingFileAppender, with ten files of 100k each. That’s usually enough to keep a few days or weeks worth of history, depending on the level of logging and the level of activity. And I never have to worry about the log files growing out of control; they just keep rolling over. I think log4net is pretty popular, based on the numbers on their NuGet page.

I try to do a lot of error-handling whenever I’m dealing with a REST API. There are plenty of things that can go wrong. You can see in the code above that I try to trap anything that might throw an exception and log it. I’m never sure if I’m doing this the “right” way. I know some people, when writing a routine like this, would just let the exceptions happen, then have a try/catch around the call to the CreateLead method, and deal with it at a higher level.

Ghosts

I’ve only just now figured out that Ted Leo’s “Ghosts” is a cover of The Jam’s “Ghosts”. I’ve probably listened to Ted’s version 100 times in the last few years. I hadn’t really listened to The Jam at all recently, until digging out my copy of At the BBC last week. Listened to the first CD last Saturday, and I’m listening to the second CD today. Good stuff!