Event processing, interval processing in Excel

(And by Excel, I mean MS Excel, Open Office, and Google Docs.)

I was recently working with a large amount of computer-generated event data. I wanted to analyze the data, but was only concerned with events (rows) that occurred within intervals demarcated by certain start and end events.

At the time, I had no answer for this in Excel. I wrote a small computer program that read the file one line at a time and ignored lines that occurred outside the intervals of interest. Recently I came up with a solution for this problem in Excel, so I thought I would share it here.

This example uses a highly simplified traffic study. A computer at a traffic light records 2 kinds of events:

  • sensor events: on or off, indicating whether or not there is a car in the intersection
  • light events: red, amber, or green, indicating the new light color

Here are some sample data collected by this computer:

seconds event state
0 light green
7 sensor on
8 sensor off
15 sensor on
16 sensor off
25 light amber
30 light red
60 light green
85 light amber
90 light red
92 sensor on
93 sensor off
120 light green
145 light amber
150 light red
180 light green
199 sensor on
200 sensor off
204 sensor on
205 light amber
206 sensor off
210 light red
240 light green
265 light amber
269 sensor on
270 light red
271 sensor off
300 light green

Let’s say we want to find out how many cars drove through a red light–that is, the light was red when the car started driving through the intersection.

First, add a new column. This column will indicate the current state of the light for each event. That’s trivial for each light event, but associating the state of the light with each sensor event is what we’re after. In this column (column D, starting at row 2, since row 1 holds the headers), add the following formula:

Excel and Google Sheets:
=IF(B2="light",C2,D1)

Open Office Spreadsheets:
=IF(B2="light"; C2; D1)

That formula means:

  • IF the current event is a light event
  • THEN set this cell to the current state
  • ELSE set this cell to the most recent light state.

Next, add another column. This column will indicate whether the row represents a car driving through a red light. In this column, add the following formula:

Excel and Google Sheets
=IF(B2="sensor", IF(C2="on", IF(D2="red", 1, 0), 0), 0)

Open Office Spreadsheets
=IF(B2="sensor"; IF(C2="on"; IF(D2="red"; 1; 0); 0); 0)

The above is a nested series of if statements:

  • IF the row contains a sensor event AND
  • IF the sensor event is an on event AND
  • IF the current state of the light is red
  • THEN it is a traffic violation
  • ELSE it is not a traffic violation

Copy these formulae to the other rows, via Edit–Fill–Down (Excel and Open Office) or ctrl-d (or cmd-d on Mac). The spreadsheet should now indicate that there was one incident of running a red light, which occurred at second 92.
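For comparison, here is a minimal Python sketch of the same idea as the two helper columns: carry the most recent light state forward and flag sensor "on" events that occur while it is red. The function name red_light_violations is made up for this illustration, and the log is the sample data from above.

```python
# Sample log from the post: seconds, event, state
LOG = """\
0 light green
7 sensor on
8 sensor off
15 sensor on
16 sensor off
25 light amber
30 light red
60 light green
85 light amber
90 light red
92 sensor on
93 sensor off
120 light green
145 light amber
150 light red
180 light green
199 sensor on
200 sensor off
204 sensor on
205 light amber
206 sensor off
210 light red
240 light green
265 light amber
269 sensor on
270 light red
271 sensor off
300 light green
"""


def red_light_violations(log_text):
    """Return the timestamps of sensor 'on' events that occur while the light is red."""
    light = None  # most recent light state (plays the role of column D)
    violations = []
    for line in log_text.splitlines():
        seconds, event, state = line.split()
        if event == "light":
            light = state  # a light event updates the current state
        elif state == "on" and light == "red":
            violations.append(int(seconds))  # column E would contain 1 here
    return violations


print(red_light_violations(LOG))  # [92]
```

The single variable light does the job of the formula’s reference to the cell above: each row either updates the state or inherits it.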

Friday, January 3rd, 2014 Tips & Tricks No Comments

Using group expressions in regular expression pattern matching

I’ve used group expressions in regexes many times, but only for replacement. Yesterday I learned that they can also be used for matching.

For example, let’s say you have the text:

Banananananas don’t grow in Mississississippi because banananas are afraid of getting turned into Missississippi’s famous bananana pudding.

The following regular expression will find instances of iss or an that occur three or more times in a row.

(iss|an)\1\1+

You can use \1\1 as the replacement (or $1$1 in Dreamweaver, which uses backslashes to identify groups in match expressions, but dollar signs to represent groups in replace expressions) to turn the misspelled words into Mississippi and banana(s).
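As a quick check, here is a small sketch of the same find-and-replace using Python’s re module, which accepts \1 in both the pattern and the replacement. The variable names are only for illustration.

```python
import re

text = (
    "Banananananas don't grow in Mississississippi because banananas "
    "are afraid of getting turned into Missississippi's famous bananana pudding."
)

# (iss|an) captures one syllable; \1\1+ requires at least two more copies of
# whatever was captured, so the whole match is three or more repetitions.
pattern = re.compile(r"(iss|an)\1\1+")

# Replace each run with exactly two copies of the captured syllable.
fixed = pattern.sub(r"\1\1", text)
print(fixed)
# Bananas don't grow in Mississippi because bananas are afraid of getting
# turned into Mississippi's famous banana pudding.
```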

Another example might be applying consistent formatting to phone numbers or dates.

Phone numbers
Let’s say you usually use 555-555-1212 as the format for phone numbers and sometimes you use 555.555.1212, but the new trend is to use spaces instead of dashes or dots as separators:

Find: ([\d]{3})([-\.])([\d]{3})\2([\d]{4})
Replace: \1 \3 \4

Dates
Let’s say you usually use 12/5/2013 as the format for dates, dabbled with 12.5.2013, but now you’ve decided that dashes are clearer:

Find: ([\d]{1,2})([\./])([\d]{1,2})\2([\d]{4})
Replace: \1-\3-\4

In both cases you could just repeat the bracketed character class, but then you could end up matching strings you didn’t intend to:

  • 555-555.1212
  • 12.5/2013
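Here is a short Python sketch of both replacements. The helper names normalize_phone and normalize_date are invented for the example; the patterns are the ones above, minus the redundant brackets around \d. The point is that \2 forces the second separator to be the same character as the first, so the mixed-separator strings are left alone.

```python
import re

# Group 2 captures the first separator; \2 requires the second to match it.
PHONE = re.compile(r"(\d{3})([-.])(\d{3})\2(\d{4})")
DATE = re.compile(r"(\d{1,2})([./])(\d{1,2})\2(\d{4})")


def normalize_phone(s):
    """Rewrite 555-555-1212 or 555.555.1212 as 555 555 1212."""
    return PHONE.sub(r"\1 \3 \4", s)


def normalize_date(s):
    """Rewrite 12/5/2013 or 12.5.2013 as 12-5-2013."""
    return DATE.sub(r"\1-\3-\4", s)


print(normalize_phone("555-555-1212"))  # 555 555 1212
print(normalize_phone("555.555.1212"))  # 555 555 1212
print(normalize_phone("555-555.1212"))  # 555-555.1212 (unchanged)
print(normalize_date("12/5/2013"))      # 12-5-2013
print(normalize_date("12.5.2013"))      # 12-5-2013
print(normalize_date("12.5/2013"))      # 12.5/2013 (unchanged)
```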

Friday, December 6th, 2013 Tips & Tricks No Comments

Converting lines to a list in ColdFusion

I’m so used to dealing with comma-delimited lists in ColdFusion that I would sometimes take a data file that had one item per line and replace the newline characters with commas.

It’s easy to use the carriage return [chr(13)] and line feed [chr(10)] characters as list delimiters, though, and remove the intermediary step. Here’s a quick example:

<cfsavecontent variable="data">
this
is
a
list
with
one
word
per
line
</cfsavecontent>

<cfoutput>
    <ol>
        <cfloop list="#data#" delimiters="#chr(13)##chr(10)#" index="line">
            <li>#line#</li>
        </cfloop>
    </ol>
</cfoutput>

Which produces the following:

  1. this
  2. is
  3. a
  4. list
  5. with
  6. one
  7. word
  8. per
  9. line

(I can’t believe I didn’t think of this until today!)
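For readers who don’t use ColdFusion, here is a rough Python analogue of the same idea (not a translation of the cfloop tag). It treats carriage returns and line feeds as delimiters and, like ColdFusion list loops by default, skips empty elements. The function name lines_to_list is made up for this sketch.

```python
import re

# Mixed Windows (\r\n) and Unix (\n) line endings, plus one blank line.
data = "this\r\nis\r\na\nlist\n\nwith\none\nword\nper\nline\n"


def lines_to_list(text):
    """Split on any run of CR/LF characters and drop empty items."""
    return [item for item in re.split(r"[\r\n]+", text) if item]


for number, line in enumerate(lines_to_list(data), start=1):
    print(f"{number}. {line}")
```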

Monday, November 11th, 2013 ColdFusion 1 Comment

Set operations in ColdFusion

Today I needed to get all the elements in one list that were not members of a second list. That may ring a bell — it’s known as a set difference, or a relative complement.

Although it would have been simple to loop through the first list and add only the elements not present in the second list to a new list, I thought I would look around to see if anyone had already implemented set operations in ColdFusion, e.g. on cflib.org. I was surprised that I didn’t find anything, so I decided to create my own, as much as an exercise as anything, and posted it to Bitbucket:
arraySet: ColdFusion set operations

I included the following operations:

  • union
  • intersection
  • set difference
  • subset
  • equality
  • size

In the process, I experimented with MXUnit, a unit testing suite for ColdFusion that integrates with Eclipse. (I am currently taking a Scala programming course that emphasizes Test-Driven Development, or TDD, so I thought I should try it in CF as well.)

Since I was implementing sets using arrays as the underlying data structure, I also decided to use cfinterface to define a set interface, which arraySet implements. I had never used cfinterface before, and although its usefulness has been questioned, it seemed more appropriate than extending a base set class.

Monday, October 28th, 2013 ColdFusion No Comments

Toad and Oracle Home

I recently upgraded my work PC. One of the bigger hassles was setting up Toad and my Oracle connections again.

Steps (and mis-steps) I took:

  • I downloaded an Instant Client from the Oracle Instant Client Downloads page.
  • I selected a 32-bit client because I recall that Toad is picky about that, and a 10.2 client. I picked 10.2 primarily because I think that is what I had before, but also because I had downloaded a 64-bit 12.1 client that definitely did not work.
  • I copied it to my computer: C:\oracle\instantclient_10_2. That location is arbitrary–you should be able to save it anywhere.
  • I added an ORACLE_HOME environment variable (although this appears to have been unnecessary):
    C:\> SET ORACLE_HOME=C:\oracle\instantclient_10_2
  • I copied my old tnsnames.ora file to the same folder.
  • I started Toad.

Error!
“No valid Oracle Client found. Please note that Toad only supports 32 bit Oracle Client installations. Please view the release notes for additional system requirements.”

When I try to select a client from the installed clients menu, another error:
“You do not have any Oracle homes installed!”

I had to add the client to the PATH environment variable. There are a couple ways you can do this:

  • C:\> PATH=%PATH%;C:\oracle\instantclient_10_2
  • Go to Control Panel–System–Advanced System Settings–Environment Variables–System Variables–Edit

Toad then started without the error, but also did not recognize my tnsnames.ora file.

First I tried adding the TNS_ADMIN environment variable via the command-line:
C:\> SET TNS_ADMIN=C:\oracle\instantclient_10_2

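That did not solve the problem. I could echo the value back with echo %TNS_ADMIN%, but it did not appear under Environment Variables in the Control Panel. (SET only changes the variable for the current command prompt session and for programs launched from it, so the value is not saved and is not visible to programs started elsewhere.)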

I added TNS_ADMIN as a user environment variable in the Control Panel, restarted Toad, and then it recognized my tnsnames.ora file.

Tuesday, October 15th, 2013 Oracle No Comments

Scala function returns Unit but should return Int

I just started learning Scala last week. I created a stub method in one of my programs but was getting an error. Here’s my function:

def howMuch(max: Int) {
  var n = 0
  n
}

I tried to use the result in an expression, e.g.

var m = 1
var n = howMuch(100)
m += n

Here’s the error I received:

error: overloaded method value - with alternatives:
  (x: Int)Int <and>
  (x: Char)Int <and>
  (x: Short)Int <and>
  (x: Byte)Int
 cannot be applied to (Unit)
             m += n

It seemed clear to me that my function should be returning an Int, namely zero. Why was it returning the Unit value?

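Answer: a missing equals sign in the function definition. Without the equals sign, Scala treats the method as a procedure whose result type is Unit, so the final n is evaluated and then discarded. Here’s the function that returns an Int: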

def howMuch(max: Int) = {
  var n = 0
  n
}

Monday, September 2nd, 2013 Scala 1 Comment

Using a lightweight web server to debug requests

I’ve been working a lot with the Canvas API lately. One task was to add communication channels (e-mail addresses) to user accounts. I was able to add them successfully following the documentation’s cURL example using the skip_confirmation flag:

curl 'https://mycanvas.instructure.com/api/v1/users/[user id]/communication_channels' \
  -H "Authorization: Bearer [...]" \
  -d 'communication_channel[address]=chris@osric.com' \
  -d 'communication_channel[type]=email' \
  -d 'skip_confirmation=1'

However, when I ran the ColdFusion script I wrote, the user received notification of the addition even though the skip_confirmation flag was set:

<cfhttp url="https://mycanvas.instructure.com/api/v1/users/[user id]/communication_channels" method="post">
<cfhttpparam type="header" name="Authorization" value="Bearer [...]" />
<cfhttpparam type="formfield" name="communication_channel[address]" value="chris@osric.com" />
<cfhttpparam type="formfield" name="communication_channel[type]" value="email" />
<cfhttpparam type="formfield" name="skip_confirmation" value="1" />
</cfhttp>

Why didn’t the latter work as expected?

I needed to be able to tell what was different about the 2 requests, but it would be difficult to capture those outbound requests. Tools like Fiddler and Wireshark could help, but I was sending one request from my local machine and another from a remote web server.

My idea for capturing the request data was to send the request to a server under my control. I grabbed a Python echo server and modified it slightly to print the data received. Then, instead of sending the requests to the Canvas API I sent them to the echo server. Here are the results of the 2 requests:

cURL
POST / HTTP/1.1
User-Agent: curl/7.24.0 (i686-pc-cygwin) libcurl/7.24.0 OpenSSL/0.9.8t zlib/1.2.5 libidn/1.22 libssh2/1.3.0
Host: osric.com:50000
Accept: */*
Authorization: Bearer [...]
Content-Length: 100
Content-Type: application/x-www-form-urlencoded

communication_channel[address]=chris@osric.com&communication_channel[type]=email&skip_confirmation=1

ColdFusion
POST / HTTP/1.1
User-Agent: ColdFusion
Content-Type: application/x-www-form-urlencoded
Connection: close
Authorization: Bearer [...]
Content-Length: 118
Host: osric.com:50000

communication%5Fchannel%5Baddress%5D=chris%40osric%2Ecom&communication%5Fchannel%5Btype%5D=email&skip%5Fconfirmation=1

I noted the different content lengths for the same data, and upon closer inspection, ColdFusion was URL-encoding characters. I added the encoded attribute to the cfhttpparam tags with a value of “no”, and then the request bodies were identical:

<cfhttp url="https://mycanvas.instructure.com/api/v1/users/[user id]/communication_channels" method="post">
<cfhttpparam type="header" name="Authorization" value="Bearer [...]" />
<cfhttpparam type="formfield" encoded="no" name="communication_channel[address]" value="chris@osric.com" />
<cfhttpparam type="formfield" encoded="no" name="communication_channel[type]" value="email" />
<cfhttpparam type="formfield" encoded="no" name="skip_confirmation" value="1" />
</cfhttp>

Not surprisingly, that solved the problem.
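For anyone who wants to try the same technique, here is a minimal sketch of a request-dumping server built on Python’s standard http.server module. It is not the echo server I originally modified. The names describe_request, EchoHandler, and make_echo_server are invented for this example, and the default port of 50000 simply mirrors the one shown in the captured requests.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def describe_request(request_line, headers, body):
    """Format the request line, headers, and body as one printable block."""
    header_lines = [f"{name}: {value}" for name, value in headers]
    return "\n".join([request_line, *header_lines, "", body])


class EchoHandler(BaseHTTPRequestHandler):
    """Prints each POST request it receives and echoes it back to the client."""

    def do_POST(self):
        # Read exactly as many bytes as the client says it is sending.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")

        dump = describe_request(self.requestline, self.headers.items(), body)
        print(dump)  # shows up on the server's console

        reply = dump.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def make_echo_server(port=50000):
    """Create (but do not start) the server; call serve_forever() to run it."""
    return HTTPServer(("", port), EchoHandler)
```

Calling make_echo_server().serve_forever() and then pointing both clients at that host and port should make differences such as URL-encoded brackets visible side by side.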

Wednesday, August 14th, 2013 ColdFusion, Debugging No Comments

ANT deployment script and SFTP

My development team is moving away from developing on mapped drives/file shares to using cloud-hosted servers on Amazon Web Services (AWS). This is introducing a change to our usual workflow, as our access to the remote servers is limited to SSH and SFTP.

Although I previously used Apache Ant scripts through Eclipse to facilitate deploying application updates, the scripts were generally unpopular with the rest of the development team. (Many of them do not use Eclipse and preferred just to drag-and-drop files from their development sandboxes to the development or production servers.) Additionally, my original Ant scripts relied on the sync task to synchronize folders on the file shares.

Here is a revised Ant script that uses SCP (Secure Copy)–not SFTP but achieves the same goal–to deploy application files from a developer sandbox to the development or production server:

<project name="Deploy myapp" default="Sandbox to Dev">
  <input message="Username:" addproperty="username" />
  <input message="Password:" addproperty="passwd" />
  <property name="applicationFolder" value="myapp"/>
  <property name="site" value="osric.com"/>
  <property name="sandboxRoot" value="${basedir}"/>
  <property 
    name="development" 
    value="${username}:${passwd}@dev.osric.com:/home/web/${site}/${applicationFolder}"/>
  <property 
    name="production" 
    value="${username}:${passwd}@osric.com:/home/web/${site}/${applicationFolder}"/>
  <target name="Sandbox to Dev">
    <scp todir="${development}" trust="true">
      <fileset dir="${sandboxRoot}">
        <exclude name="**/build.xml"/>
        <exclude name="**/.*"/>
      </fileset>
    </scp>
  </target>
  <target name="Sandbox to Production">
    <scp todir="${production}" trust="true">
      <fileset dir="${sandboxRoot}">
        <exclude name="**/build.xml"/>
        <exclude name="**/.*"/>
      </fileset>
    </scp>
  </target>
</project>

There are a couple issues with this script to be aware of:

  • SCP is not included with Ant. The script produced the error “Problem: failed to create task or type scp”. I needed to:
    1. Download JSCH
    2. Place the file in Eclipse’s plugins/[ant folder]/lib folder
    3. Add the JAR file to the Ant build path (via Window–Preferences–Ant Home Entries (default)–Add External JARs…–select the jsch .jar file)
  • The password input is in plain text. Hiding password input in Ant provides a solution for Ant, but one that does not work from Eclipse. I have seen other possible solutions, so I’ll update this once I implement one and confirm that it works.

Wednesday, July 3rd, 2013 Process, Tips & Tricks No Comments

Set difference of two lists using BASH shell

Recently a handful of e-mail messages went undelivered due to some mis-communication between 2 servers.

One server had a record of all the addresses it thought it sent to over the period of time in question, and the other server had a record of all the addresses to which it had actually delivered (including messages from several other servers).

I had both lists, but what I really wanted was just the set difference: only the elements of the first list that did not appear in the second. (In other words, a list of the recipients whose messages were never delivered).

I had two files:

  • possibly-delivered.txt
  • definitely-delivered.txt

First, the possibly-delivered.txt file had a bunch of extraneous lines, all of which contained the same term: “undelivered”. Since that term did not exist in any of the lines I was looking for, I removed all the lines using sed (stream editor):

sed '/undelivered/d' possibly-delivered.txt > possibly-delivered-edited.txt

I already knew (from prior investigations) that there should be 204 addresses in that list, so I performed a check to make sure there were 204 lines in the file using wc (word count):

wc -l possibly-delivered-edited.txt

204 lines returned. Great! Now, how to compare the 2 files to get only the results I wanted?

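With a little help from Set Operations in the Unix Shell I found what I needed–comm (compare). The -23 option suppresses column 2 (lines found only in the second file) and column 3 (lines found in both files), leaving only the lines unique to the first file: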

comm -23 possibly-delivered-edited.txt definitely-delivered.txt

However, comm warned me that the 2 files were not in sorted order, so first I had to sort them:

sort possibly-delivered-edited.txt > possibly-delivered-edited-sorted.txt
sort definitely-delivered.txt > definitely-delivered-sorted.txt

Again:
comm -23 possibly-delivered-edited-sorted.txt definitely-delivered-sorted.txt

This returned zero results. That was not possible (or at least, highly improbable!), so I checked the files. It looks like the sed command had converted my Windows linebreaks to Unix linebreaks, so I ran a command to put them back:
unix2dos possibly-delivered-edited-sorted.txt

Again:
comm -23 possibly-delivered-edited-sorted.txt definitely-delivered-sorted.txt

That returned my list of addresses from the first list that did not appear in the second list. (Quickly, accurately, and without tedium.)
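The same set difference can also be sketched in a few lines of Python, which sidesteps both the sorting requirement and the linebreak mismatch. This is an alternative to the shell approach above, not what I ran. The function names are invented, and the "undelivered" filter mirrors the sed step.

```python
def address_set(text, skip_term=None):
    """Return the set of non-empty lines, ignoring line-ending style.

    Lines containing skip_term (if given) are dropped, like sed '/term/d'.
    """
    lines = (line.strip() for line in text.splitlines())
    return {
        line for line in lines
        if line and not (skip_term and skip_term in line)
    }


def never_delivered(possibly_text, definitely_text):
    """Addresses in the first list that do not appear in the second, sorted."""
    possibly = address_set(possibly_text, skip_term="undelivered")
    definitely = address_set(definitely_text)
    return sorted(possibly - definitely)


# Windows linebreaks in one input, Unix linebreaks in the other.
possibly = "a@example.com\r\nb@example.com\r\nstatus: undelivered\r\nc@example.com\r\n"
definitely = "c@example.com\nz@example.com\na@example.com\n"
print(never_delivered(possibly, definitely))  # ['b@example.com']
```

With real files, the two strings would come from reading possibly-delivered.txt and definitely-delivered.txt.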

Tuesday, March 5th, 2013 Tips & Tricks 1 Comment

Find Feature UI annoyance in Adobe Acrobat Pro

The Find feature in Adobe Acrobat Pro X has bothered me for some time now:

Adobe Acrobat’s Find feature: note the size of the click-targets

The click-targets to find the previous or next occurrence of your search term are tiny. Minuscule. Verging on microscopic. Let’s say I’m searching a 600 page PDF on SharePoint and I’m looking for occurrences of the term workflow. As I click next repeatedly, trying to find the relevant section, I find that it’s very easy for my cursor to edge over just a little bit and close the Find dialog.

(Yes, I know: I can just keep pressing Enter and avoid this issue.)

Friday, February 22nd, 2013 User Interface No Comments