Thursday, July 09, 2009

Do Not Use PostgreSQL on Bluehost

Argh.

Much to my dismay, Bluehost treats PostgreSQL as a second-class citizen. You can't back up PostgreSQL databases using the standard Bluehost tools, and the Export function in the phpPgAdmin tool doesn't work.

If you had shell access (why, God, why does Bluehost require ID verification for shell access?) you might be able to do backups using command-line tools, but I'll never know.

Back to MySQL...

Wednesday, July 08, 2009

Drupal: Hell Yes!

Just tried Drupal for the first time. Wow: an hour to get a pleasant-looking corporate site up, and another two to get things configured and teach someone else how to create content.

I will never build my own content management system again.

I also took the opportunity to try Postgres and used it as the underlying database, but the Drupal installation was so smooth that I didn't get to learn anything about Postgres.

Saturday, July 04, 2009

Inside Job at Save On Foods

Witnessed something bizarre this evening while walking past the Save On Foods at 152nd and Highway 99 in Surrey.

A dark-colored SUV stops traffic on the east side of Save On while the occupants have a very animated discussion. Then the driver jumps out, yanks open the back door, and bodily hauls out the much scrawnier dude in the back. I was sure a beatdown was about to ensue, but instead the scrawny guy (who, incidentally, was wearing a golf shirt very much like those worn by Save On staff) mumbles something like "Oh, I didn't know," crosses the sidewalk, walks into the bushes next to Save On, and pulls out a basket full of what I assume to be groceries.

He gets back in the SUV and they all drive off.

I can only assume that one of them snuck the groceries out the side door near the deli counter... I bet if Save On has cameras on that door they would see something entertaining around 6:45pm on July 4th!

Saturday, June 20, 2009

Google Apps SPF Checking More Strict than Gmail

When you host your email through Google Apps, you can create email aliases that are called "groups". Very recently, messages routed via those aliases started silently dropping off the face of the earth. The exact same message sent directly to a real account in the same Google Apps domain was received just fine.

The only thing that stood out as interesting was that the successfully received email was marked as a "soft fail" under SPF verification.

Now, these messages are all routed through Radiant's mail servers. Ah, good old Radiant: several years after SPF became widely accepted, they still haven't published an authoritative list of their mail servers.

So my domain's SPF record listed every other place we might send from, but ended in ~all rather than -all, meaning it did not claim to be an exclusive list. Hence the soft fail.

After totally botching my SPF record a few times because I forgot that "include:"ing a domain without an SPF record produces a permanent error (permerror), which receivers may treat as a failure, I used "ptr:radiant.net" to say we trust any host that reverse resolves to *.radiant.net. (The ptr mechanism is frowned upon, and the SPF spec itself discourages it as slow and unreliable, but it seems like a limited risk for a small domain like mine.)
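
To make the qualifier behavior concrete, here is a deliberately simplified SPF evaluator in Python. It is a toy, not an RFC-compliant implementation: DNS lookups are faked with dictionaries, only the `ip4`, `include`, `ptr` and `all` mechanisms are handled, and every domain and address is a hypothetical documentation value (`isp.example.net` stands in for radiant.net). It shows the three behaviors described above: a record ending in `~all` yields softfail for an unlisted sender, a `ptr:` term whose forward-confirmed reverse name falls under the given domain yields pass, and an `include:` of a domain with no SPF record is a permanent error.

```python
import ipaddress

# Hypothetical DNS data standing in for real lookups (documentation domains/addresses).
SPF_RECORDS = {
    # Lists the known senders but ends in ~all: "not an exclusive list".
    "example.org": "v=spf1 ip4:192.0.2.0/24 include:mail.example.com ~all",
    "mail.example.com": "v=spf1 ip4:198.51.100.10 -all",
    # ptr:isp.example.net plays the role of ptr:radiant.net.
    "example.net": "v=spf1 ptr:isp.example.net ~all",
    # include: of a domain that publishes no SPF record at all.
    "broken.example": "v=spf1 include:nospf.example ~all",
}
PTR = {"203.0.113.5": "smtp1.isp.example.net"}  # reverse DNS
A = {"smtp1.isp.example.net": "203.0.113.5"}    # forward DNS

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


class PermError(Exception):
    """Raised when the published record is broken (SPF's permerror result)."""


def check_host(ip: str, domain: str, depth: int = 0) -> str:
    """Toy SPF check: returns pass, fail, softfail, neutral or none."""
    if depth > 10:  # crude stand-in for SPF's DNS lookup limit
        raise PermError("too many nested includes")
    record = SPF_RECORDS.get(domain)
    if record is None:
        return "none"
    for term in record.split()[1:]:  # skip the "v=spf1" version tag
        qualifier = term[0] if term[0] in QUALIFIERS else "+"
        name, _, arg = term.lstrip("+-~?").partition(":")
        if name == "all":
            matched = True
        elif name == "ip4":
            matched = ipaddress.ip_address(ip) in ipaddress.ip_network(arg, strict=False)
        elif name == "include":
            inner = check_host(ip, arg, depth + 1)
            if inner == "none":
                raise PermError(f"include:{arg} publishes no SPF record")
            matched = inner == "pass"  # fail/softfail/neutral just mean "no match"
        elif name == "ptr":
            # The PTR name must sit under the target domain AND resolve back to the IP.
            host = PTR.get(ip)
            target = arg or domain
            matched = (
                host is not None
                and (host == target or host.endswith("." + target))
                and A.get(host) == ip
            )
        else:
            raise PermError(f"mechanism {name!r} not handled by this toy")
        if matched:
            return QUALIFIERS[qualifier]
    return "neutral"  # nothing matched and no default given


print(check_host("203.0.113.5", "example.org"))  # softfail: only ~all matched
```

The same relay that soft-fails for `example.org` passes for `example.net`, whose record trusts hosts reverse resolving under `isp.example.net`; that is the moral equivalent of adding "ptr:radiant.net". `broken.example` raises `PermError` no matter which IP connects.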

Anyway, as soon as that change propagated, Google Apps started accepting my messages via aliases again. Very curious that they would have inconsistent SPF processing...

Saturday, June 13, 2009

The Ineffectiveness of Double Entry Email Validation

Way back when, pre-2002, I started maintaining an ecommerce website that delivers ordered products via email. This means that customers have to enter their email address during the order process, and that is more challenging than one might expect.

In fact, I recall that the bulk of the customer service revolved around resending orders to people who screwed up their own addresses. Unacceptable! So I whipped up the latest in luser-combatting technology: a double entry email verification system. It was totally vanilla, just two entry boxes that got compared (minus whitespace) on the review page.
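
The comparison is about as simple as it sounds. A minimal sketch in Python, assuming "minus whitespace" means every whitespace character is removed before comparing; the loose regular expression stands in for the real syntax check, and all names are hypothetical:

```python
import re

# Loose stand-in for the real syntax check: something@something.tld, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize(raw: str) -> str:
    """Remove all whitespace, including stray spaces in the middle."""
    return "".join(raw.split())


def check_double_entry(first: str, second: str) -> list[str]:
    """Return error messages for the review page; an empty list means proceed."""
    a, b = normalize(first), normalize(second)
    errors = []
    if not EMAIL_RE.match(a):
        errors.append("The email address does not look valid.")
    if a != b:
        errors.append("The two email addresses do not match.")
    return errors
```

For example, `check_double_entry(" bob@example.com", "bob@example.com ")` returns an empty list, while a transposed `exmaple` in the second box produces the mismatch error.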

This seemed to cut down on problem users significantly, and there was much rejoicing!

Now, at the time of implementation I added a logging function and periodically reviewed the results to make sure that my syntax validation wasn't too strict. A couple of years ago I realized that this data was quite a bit more interesting than I had thought, and started saving it for later analysis.

Later has arrived.

A short Python script that heavily abuses generators filters out the really stupid attempts and groups the rest into sessions, based on difflib comparisons of the email addresses and on the log timestamps (with a one-hour cutoff).
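
The script itself isn't reproduced here, but the grouping step might look something like this generator-based sketch. Only the use of `difflib` and the one-hour cutoff come from the description above; the log record layout, the 0.8 similarity threshold, the junk filter and all names are assumptions for illustration.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterable, Iterator

CUTOFF = timedelta(hours=1)  # one-hour session window
SIMILARITY = 0.8             # assumed difflib ratio for "same person, same address"


@dataclass
class Attempt:
    when: datetime
    first: str   # first email box
    second: str  # confirmation box


@dataclass
class Session:
    attempts: list[Attempt] = field(default_factory=list)

    @property
    def last(self) -> Attempt:
        return self.attempts[-1]


def not_junk(log: Iterable[Attempt]) -> Iterator[Attempt]:
    """Hypothetical junk filter: drop attempts with no '@' anywhere."""
    return (a for a in log if "@" in a.first + a.second)


def similar(x: str, y: str) -> bool:
    return difflib.SequenceMatcher(None, x.lower(), y.lower()).ratio() >= SIMILARITY


def sessions(log: Iterable[Attempt]) -> Iterator[Session]:
    """Group time-ordered attempts; yield each session once it can no longer grow."""
    open_sessions: list[Session] = []
    for a in not_junk(log):
        # Anything idle for more than an hour is finished.
        still_open = []
        for s in open_sessions:
            if a.when - s.last.when > CUTOFF:
                yield s
            else:
                still_open.append(s)
        open_sessions = still_open
        # Join the first open session whose latest entry looks like this one.
        for s in open_sessions:
            if similar(a.first, s.last.first) or similar(a.second, s.last.second):
                s.attempts.append(a)
                break
        else:
            open_sessions.append(Session([a]))
    yield from open_sessions
```

Comparing each new attempt against the session's latest entry, rather than its first, lets a session drift through several small corrections without being split.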

I expected to see lots of people struggling to enter an address twice, possibly to the point that I should stop rejecting mismatches, on the basis that it is more important to take customers' money before they get frustrated and give up. Not so at all.

  • Only 4.22% had any kind of trouble; everyone else got a syntactically correct address entered twice identically on the first try and kept it that way. Huh... that's a lot of users having their time wasted for no reason.
  • 41.92% of trouble sessions gave up and never gave us any money.
  • Of the trouble sessions, 92.62% had trouble entering the same address twice.
  • Of the trouble sessions, 33.01% "round tripped", which is to say they finished with two matching email addresses that happened to match the very first email address they entered. Not helping those guys!
  • Of the trouble sessions, 4.6% started out with a good double email entry but then sabotaged themselves in some way (we have single-page data entry forms...).
  • Average attempts for a trouble session before giving up: 1.44. So if they don't get it right on the first try, they probably aren't coming back.
  • Average attempts for a trouble session that eventually results in success: 2.23
A lot of these categories overlap, so summaries are tricky, but one clear finding is that 75% of the people who had trouble (41.92% who gave up plus 33.01% who round tripped) were not helped by the double entry system.
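
Once sessions are grouped, the headline numbers reduce to a few counts. A sketch in Python, assuming each session is a list of form submissions with whitespace already stripped; the definitions of "trouble", "gave up" and "round trip" are my reading of the list above, the clean-entry test is a crude placeholder, and all names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """One submission of the order form: the two email boxes, whitespace removed."""
    first: str
    second: str

    @property
    def ok(self) -> bool:
        # Assumed notion of a clean entry: boxes match and look roughly like an address.
        return self.first == self.second and "@" in self.first


def is_trouble(session: list[Pair]) -> bool:
    """Trouble = anything other than a clean first entry that was never changed."""
    return not all(p.ok and p.first == session[0].first for p in session)


def summarize(sessions: list[list[Pair]]) -> dict[str, float]:
    trouble = [s for s in sessions if is_trouble(s)]
    gave_up = [s for s in trouble if not s[-1].ok]
    succeeded = [s for s in trouble if s[-1].ok]
    mismatched = [s for s in trouble if any(p.first != p.second for p in s)]
    # Round trip: ended clean on exactly the address typed in the very first box.
    round_trip = [s for s in succeeded if s[-1].first == s[0].first]

    def pct(part: list, whole: list) -> float:
        return 100.0 * len(part) / len(whole) if whole else 0.0

    def avg_attempts(group: list) -> float:
        return sum(len(s) for s in group) / len(group) if group else 0.0

    return {
        "trouble_pct": pct(trouble, sessions),
        "gave_up_pct": pct(gave_up, trouble),
        "mismatch_pct": pct(mismatched, trouble),
        "round_trip_pct": pct(round_trip, trouble),
        "avg_attempts_gave_up": avg_attempts(gave_up),
        "avg_attempts_success": avg_attempts(succeeded),
        # In this sketch gave_up and round_trip can't overlap (one ends clean, one doesn't).
        "not_helped_pct": pct(gave_up, trouble) + pct(round_trip, trouble),
    }
```

Any sessions fed to this are illustrative only; the percentages in the list come from the real log, not from this code.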

What I don't have is information on which sessions resulted in orders, or which included multiple orders. Demographically, I can confidently say that customers skew to the over-30 side, but beyond that I couldn't say.

I am currently leaning towards scrapping the double entry system completely, and possibly making the syntax validation a warning rather than an error. This would move some customers into the post-transaction support queue, but would hopefully increase the total number of transactions.
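
Demoting the syntax check to a warning might look like the sketch below: the order always goes through, and a suspicious address only produces a message for the review page plus a flag for the support queue. The regular expression and all names are hypothetical placeholders.

```python
import re
from dataclasses import dataclass, field

# Loose stand-in for the real syntax check: something@something.tld, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class EntryResult:
    address: str
    warnings: list[str] = field(default_factory=list)

    @property
    def needs_support_check(self) -> bool:
        """Orders with warnings still go through, but get flagged for a human to eyeball."""
        return bool(self.warnings)


def check_single_entry(raw: str) -> EntryResult:
    """Never rejects: a suspicious address only produces a warning for the review page."""
    result = EntryResult("".join(raw.split()))
    if not EMAIL_RE.match(result.address):
        result.warnings.append(
            f"{result.address!r} doesn't look like an email address; please double-check it."
        )
    return result
```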

Monday, June 08, 2009

Pre-Rendering to Defeat Ad Blocking

I wonder why big media companies (like the newspapers going bankrupt because no one will pay for anything anymore) are still offering their content in a form that facilitates ad blocking.

It seems like if you are CNN or the New York Times you probably don't care that much about search engine traffic, and therefore wouldn't be punished as severely for doing something like pre-rendering most of your content as images (with the ads embedded, of course). It seems to me this would make ad filtering a lot harder: you'd need more than a regular expression to do the work.
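
To show why ads in ordinary HTML are easy prey, here is a toy regular-expression filter in Python that deletes any `<img>` tag served from a known ad host. The host names are hypothetical, and real blockers use much larger rule lists, but the principle is the same: the ad is sitting right there in the markup. Once the ad is baked into the same image as the article, a pattern like this has nothing to match.

```python
import re

# Hypothetical ad-serving hosts; real blockers ship far larger rule lists.
AD_HOSTS = ["ads.example.net", "banners.example.com"]

# Matches a whole <img ...> tag whose src points at one of the ad hosts.
AD_IMG = re.compile(
    r"""<img\b[^>]*\bsrc=["']https?://(?:"""
    + "|".join(re.escape(host) for host in AD_HOSTS)
    + r""")/[^>]*>""",
    re.IGNORECASE,
)


def strip_ads(html: str) -> str:
    """Drop <img> tags whose src points at a known ad host."""
    return AD_IMG.sub("", html)
```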

You would have to use image maps (clickable regions) to do links, of course, but as long as you passed all the links through a redirector it still wouldn't be obvious which links are ads.
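
The redirector half can be sketched as a table of opaque tokens: every outbound link, ad or editorial, is rewritten to the same `/go/<token>` shape, so the URL alone no longer reveals which links are ads. The `/go/` prefix, the token scheme and the class name are assumptions for illustration.

```python
import secrets
from typing import Optional


class Redirector:
    """Map real destinations to opaque tokens so ad links look like any other link."""

    def __init__(self, prefix: str = "/go/"):
        self.prefix = prefix
        self._by_token: dict[str, str] = {}
        self._by_url: dict[str, str] = {}

    def link(self, destination: str) -> str:
        """Return the public URL to embed in the page for this destination."""
        token = self._by_url.get(destination)
        if token is None:
            token = secrets.token_urlsafe(8)
            while token in self._by_token:  # astronomically unlikely, but cheap to guard
                token = secrets.token_urlsafe(8)
            self._by_url[destination] = token
            self._by_token[token] = destination
        return self.prefix + token

    def resolve(self, public_url: str) -> Optional[str]:
        """Server side: look up where a /go/<token> request should be redirected."""
        if not public_url.startswith(self.prefix):
            return None
        return self._by_token.get(public_url[len(self.prefix):])
```

Because tokens come from `secrets` rather than, say, a hash of the destination, a blocker can't precompute which tokens belong to ad networks.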

Or maybe ad blocking really isn't as widespread as some make it out to be?

I don't understand why ads on webpages would be fundamentally less valuable than ads in newspapers, though (this is the claim of the dying companies, yes?). Maybe the sheer volume of amateur content is driving rates down?

Sunday, June 07, 2009

The Golden Age

Finally read the last book of John C. Wright's Golden Age sci-fi series. I think I picked it up because of an Orson Scott Card recommendation.

Two things really bugged me in this tale of the far, far future.

Firstly, the main characters were all massively augmented by technological evolution, yet still plagued by obnoxious character flaws. I would like to believe that the elites of thousands of years from now will be more emotionally stable than the drama queens of today.

Secondly, the digressions into philosophy as a plot-driving mechanism were painful. Much like in Zen and the Art of Motorcycle Maintenance, I largely skipped those "deep" sections because I don't care enough to work through the tortured logic. And the idea that anyone would still be arguing on a level that present-day readers could understand is kind of silly.

So I didn't actually think much of the story. All that being said, Wright's exploration of property rights and personhood in a society that doesn't require a biological presence is fascinating. Evolution is explicitly by "intelligent design", with individuals modifying themselves or creating "children" based on whatever criteria they want. Clones have a special place in the law, and there are all sorts of interesting rules declaring how much damage to a copy can be tolerated before the copy is considered a new individual.

Interesting stuff!