Using Perl and PDF::API2 to Update PDF Properties and Metadata

What do you do when you have 600 PDF documents with titles in all caps, when you need the titles to be title-cased? I dreaded the thought of asking anyone to open each document and edit the titles by hand, not to mention fearing the typos that process might introduce.

For better or worse, here was my solution:

Sounds fast and easy, right? Well, there were a few hitches:
Continue reading Using Perl and PDF::API2 to Update PDF Properties and Metadata

Twitter Status IDs and Direct Message IDs

I recently created a Magic Eight Ball twitter-bot as a demo. Written in Python using the python-twitter API wrapper, it runs every 2 minutes and polls twitter for new replies (status updates containing @osric8ball) and direct messages (DMs) to osric8ball. If there are any, it replies with a random 8-Ball response.

Every status update and DM has an associated numeric ID. Initially, I stored the highest ID in a log file and used that when I polled twitter (i.e. “retrieve all replies and DMs with ID > highest ID”). However, I discovered that status updates and DMs apparently are stored in separate tables on twitter’s backend, as they have a separate set of IDs. Since the highest status ID was an order of magnitude larger than the highest DM ID, my bot completely ignored all DMs. This was not entirely obvious at first, as the IDs looked very similar, other than an extra digit: 2950029179 and 273876291.

My fix for this was to store both the highest status update ID and the highest DM ID in separate log files.
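To make the problem concrete, here is a minimal Python sketch of the two approaches. It is not the bot's actual code: the helper `newer_than`, the dictionary layout, and the sample messages are hypothetical, and the python-twitter calls are left out entirely. Only the two starting IDs come from the numbers above.

```python
def newer_than(items, last_seen_id):
    """Return the items whose numeric ID is above the stored high-water mark."""
    return [item for item in items if item["id"] > last_seen_id]


# High-water marks taken from the example IDs in the post.
last_seen = {"status": 2950029179, "dm": 273876291}

# Hypothetical new messages arriving after those marks.
replies = [{"id": 2950029200, "text": "@osric8ball will it rain?"}]
dms = [{"id": 273876300, "text": "should I ship it?"}]

# Original approach: one shared mark, so every DM ID looks "old".
shared_mark = max(last_seen.values())
missed_dms = newer_than(dms, shared_mark)

# Fixed approach: each kind of message is compared against its own mark.
found_replies = newer_than(replies, last_seen["status"])
found_dms = newer_than(dms, last_seen["dm"])

# Advance each mark independently after processing.
if found_replies:
    last_seen["status"] = max(item["id"] for item in found_replies)
if found_dms:
    last_seen["dm"] = max(item["id"] for item in found_dms)
```

With the shared mark the DM list comes back empty, which matches the behavior I saw. With separate marks both lists contain the new message.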

Another interesting twist: you have to be a follower of a user in order to send that user a DM.
Continue reading Twitter Status IDs and Direct Message IDs

University of Michigan jobs site has major browser compatibility issues

At the risk of sounding like a one-note, I would like to again talk about browser compatibility issues. These compatibility issues affect an organization’s bottom line, and should not be ignored. In this particular case, The University of Michigan’s (U-M) job web site is unusable to about 10-15% of visitors, by my estimates (they are using Google Analytics on the page, so they should have that data). To me, this says that U-M may be missing out on some of the most qualified candidates for their position openings, undeniably at great cost to the organization. [I am particularly concerned in this case because U-M is my alma mater.]

In particular, the browsers that are not compatible with the U-M jobs site are Safari, Chrome, and Opera — browsers typically used by more tech-savvy users — so U-M may be missing out on the very candidates best-suited for work in today’s web-based world.
Continue reading University of Michigan jobs site has major browser compatibility issues

T-Mobile Website Unfriendly to Chrome, Safari

Early this morning, Nicola was bugging me to add a data plan to her phone account in anticipation of receiving her shiny new MyTouch. We logged on to the site using our favored browser, Google’s Chrome. Here’s what we found:

T-Mobile's default page in Chrome, post login

After several unsuccessful attempts to view info for her line from several different screens, we called T-Mobile’s customer support. The service rep walked through the same steps and said, “OK, now you should see tabs on the left with your names, phone numbers, and ‘Add A Line’.”

That’s when it hit me. I should try a different browser.
Continue reading T-Mobile Website Unfriendly to Chrome, Safari

Apache Install and Ambiguous Errors

I installed Apache 2.2.11 on the Windows XP portion of my desktop workstation for development purposes, but I got a lot of ambiguous errors when starting from the Apache Service Monitor or the Windows start menu.

Finally, when I started Apache from the command line I got a more informative error:
(OS 10048) Only one usage of each socket address (protocol/network address/port) is normally permitted. : make_sock: could not bind to address 127.0.0.1:80
no listening sockets available, shutting down
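That error means some other process already holds the address and port Apache wants. One quick way to confirm this is to try binding the port yourself. The Python sketch below is an assumption on my part rather than anything Apache provides, and the function name `port_is_free` is hypothetical.

```python
import socket


def port_is_free(host, port):
    """Return True if we can bind host:port ourselves, False otherwise.

    Caveat: on Unix-like systems, binding ports below 1024 needs root, so a
    False result there can mean "not permitted" rather than "already in use".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    # The address and port from the Apache error message.
    print("127.0.0.1:80 free?", port_is_free("127.0.0.1", 80))
```

This only tells you that the port is taken, not which program took it. For that you still need something like netstat or the Task Manager.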

It turns out, I had Skype running, which by default binds to ports 5520, 80, and 443. There are several solutions:
Continue reading Apache Install and Ambiguous Errors

Validating the Referer: Not as Useless as I Thought?

I used to validate the HTTP referer header to verify that users were accessing certain pages from certain other pages. For example, users accessing sampleapp/edit.cfm should be getting there from sampleapp/index.cfm. Anyone accessing sampleapp/edit.cfm without coming from sampleapp/index.cfm was probably monkeying around and should be sent back to the index page, or possibly even logged out.

However, it is fairly trivial to modify your referer header, so anyone who wants to monkey around with sampleapp/edit.cfm can make it look like they are coming from sampleapp/index.cfm. (If you’re interested in modifying your HTTP headers, I suggest checking out the Tamper Data Firefox plugin.) The check provides absolutely no assurance that the user is really coming from the page. Therefore, I decided the check was useless.
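For illustration, the check amounts to something like the Python sketch below. My original pages were ColdFusion, so this is a translation of the idea rather than the code I ran. The function name `came_from`, the host `example.com`, and the use of a plain dict for the request headers are all assumptions.

```python
from urllib.parse import urlsplit


def came_from(headers, expected_host, expected_path):
    """Return True if the Referer header points at the expected page.

    The header is supplied by the client, so this is only a hint and
    never proof of where the request really came from.
    """
    referer = headers.get("Referer", "")
    parts = urlsplit(referer)
    return parts.hostname == expected_host and parts.path == expected_path


# Hypothetical requests for sampleapp/edit.cfm.
from_index = {"Referer": "http://example.com/sampleapp/index.cfm"}
from_elsewhere = {"Referer": "http://evil.example.net/form.html"}
no_referer = {}

ok = came_from(from_index, "example.com", "/sampleapp/index.cfm")
suspicious = not came_from(from_elsewhere, "example.com", "/sampleapp/index.cfm")
missing = not came_from(no_referer, "example.com", "/sampleapp/index.cfm")
```

A request with a forged header passes this check exactly as the legitimate one does, which is the weakness described above.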

I’ve been attending a weekly web application security study group with some of my colleagues for the past several weeks, where we’ve been reading and discussing The Web Application Hacker’s Handbook. The past couple sessions have been about cross-site scripting (XSS). Justin Klein Keane brought up a good point at today’s session: checking the referer may not keep a malicious user from altering his or her referer string, but could help identify victims of XSS attacks who were possibly directed to submit malicious data from a third-party site.

Checking the referer isn’t a sufficient protection against malicious users, by any means, but it could still be helpful. What do you think?

Installing Adobe AIR and Tweetdeck on an Asus eee 701

Tweetdeck is an Adobe AIR application that is a twitter client, and recently also a Facebook client.

My attempts to install Adobe AIR on the Asus eee 701 (running the default Xandros distro) were foiled several times in spite of following the instructions:

  1. Download Adobe AIR
  2. Make the AdobeAIRInstaller.bin file executable
  3. Run the .bin file as a superuser

I got a nice friendly fail message from the Adobe AIR installer every time.

I found a few relevant forum posts, e.g. Adobe Air Linux won’t install on Eee PC, that suggested memory was an issue. Sure enough, running in Full Desktop Mode with 1440×900 screen resolution (on an external display), I only had about 90MB of 500MB free.

I restarted the eee in Easy Mode and then immediately ran AdobeAIRInstaller.bin. Success! (I later found these same instructions on the eee user forums.)

Installing Tweetdeck was trivial at that point: download the .air file, find it in the File Manager, and double-click it. However, when I ran it, it didn’t do anything. At one point I got a message that I was running an unknown desktop, and that Tweetdeck required Gnome or KDE.

I restarted in Full Desktop Mode, and was surprised to find a Tweetdeck icon already on the desktop. I ran it and was prompted to use KWallet, a KDE password manager. I canceled out of that, and found that Tweetdeck opened, but still didn’t do anything.

I tried again, activated the KWallet password manager, and then it worked! Tweetdeck prompted me for my twitter login, I additionally logged in to Facebook, and now I have a mean, lean, social networking machine.

Embedded FLV video players: Flowplayer and JW Player

Over the past several months, I have worked with both Flowplayer and JW Player as embedded FLV video players.

Why wouldn’t you just upload your videos to YouTube and use their embedded player? That’s a pretty fair question, as I think YouTube provides:

  • An excellent player that your users are already familiar with
  • A variety of options to control the appearance (e.g. you can disable related videos)
  • High-availability bandwidth

Of course there are several drawbacks:

  • Limits on length and file size
  • Critical infrastructure is no longer in your control
  • Their logo appears on your site
  • Progressive download only (no streaming)

Let’s take a look at both Flowplayer and JW Player:
Continue reading Embedded FLV video players: Flowplayer and JW Player

Finally got GD working with PHP under OS X

Since time immemorial, I have been having problems with the PHP installation that came on my PowerBook. I have strained to get any new modules installed and to make it work the way even the most simpleminded Linux install does out of the box, and I have generally been frustrated with it. Currently I have two for-pay projects that require the GD library, so I broke down and really attacked it today. After about three tries, I finally got something working. Specifically, I used the entropy install of PHP and got it to actually work by converting apache2 from a fat file to a 32-bit-only binary, based on the instructions from the same site. These instructions were NOT easy to find, and several Google searches didn’t turn them up at any point. I only found them after reading of the trials and tribulations that the blogger at #| had with the same problem.
The good news is that it’s done now, and I’m happy.

Encrypted versus hashed passwords

I’m trying to decide whether it is better to store passwords in a database as key-encrypted strings, or as the result of a hash function (with salt).


An encrypted string is secure as long as the key is secure, which it seems to me is both its strength and its Achilles’ heel. Since the application that accesses the database needs to use the key, that means that if both the database and the application server are compromised, the data is compromised.

It also means that if the application developers have access to the database, or if the DBAs have access to the application code, the data is available to those individuals. Even though those users are most likely trustworthy, it is perhaps an unnecessary risk–not to mention their workstations may be compromised. On top of that, you need to worry about key escrow–if you lose the key, you no longer have access to your own data.

A hashed value is secure even if the database and the application are compromised, as it is the result of a one-way function. The value is also inaccessible to either the application developers or the DBAs, which provides an additional layer of security. On the other hand, since it uses a well-known hash function, such as MD5, creating a dictionary file of hashed values is trivial, and apparently an even more effective method to reveal hashed data is to use a rainbow table (the description of which goes a bit over my head right now).

That’s where the salt comes in: concatenate the original value along with extra data before creating the hash. I can see how this would foil a simple dictionary of hashed values. From what I’ve read, the salt value can be unique for each hashed value, and can be stored alongside the hashed value in the database. I’m not sure I entirely understand how that defends against rainbow tables, but it sounds good to me.
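Here is how I picture the salted approach, as a minimal Python sketch using only the standard library. The function names `hash_password` and `verify_password` are hypothetical, and I chose SHA-256 purely for illustration. From what I understand, a deliberately slow function such as `hashlib.pbkdf2_hmac` is usually preferred for real passwords.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, hex digest) for a password, generating a salt if needed."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each stored value
    # Concatenate the salt and the original value before hashing.
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Re-hash the candidate with the stored salt and compare the digests."""
    _, candidate = hash_password(password, salt)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, stored_digest)


# The salt is stored next to the digest, as described above.
salt_a, digest_a = hash_password("correct horse")
salt_b, digest_b = hash_password("correct horse")

same_password_matches = verify_password("correct horse", salt_a, digest_a)
wrong_password_fails = not verify_password("battery staple", salt_a, digest_a)
```

Because each value gets its own salt, the same password produces two different digests here, so a precomputed table of plain hashes has nothing to match against.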

Using a hashed value, you can’t retrieve the original data–you can only match against it. This would not work well for data that you need to access again in its original form, e.g. phone numbers. This is a drawback in some cases, but probably not for passwords.

Right now I’m leaning towards salted hash over encryption, but maybe that’s because I’m hungry and it sounds more like breakfast. I’d love to know what other people think.