SharePoint Breadcrumb Styles

SharePoint (and .NET) has a built-in SiteMapPath control that is designed to display a breadcrumb trail.

There are a lot of options in terms of styling the breadcrumb trail, but it is not obvious how all of them are used. Even more confusing, though, is when you make a change to the styles but you don’t see that change reflected on your site. For example, on one site I am working on I saw breadcrumbs like these:

An example of SharePoint breadcrumbs

I didn’t like how much vertical space this style occupied. I wanted something a little more compact. But how to change the appearance?

Perl error when running W3C checklink

I’m using ActiveState Perl 5.14.2 on a 64-bit Windows 7 machine. I downloaded and installed the W3C checklink package via the Perl Package Manager.

When I attempted to run checklink on the command line like this:
C:\Users\chris\>checklink http://osric.com
I got the following error message:
"-T" is on the #! line, it must also be used on the command line.

To get this to work, I had to run the command as follows:
C:\Users\chris\>perl -T C:\perl64\site\bin\checklink http://osric.com

The -T switch is to enable taint mode, which helps protect the program from malicious input.

Grammars and the Random Goth Lyric Generator

To celebrate one of the last days of National Poetry Month as well as The Accidental Developer’s 100th blog post, I will attempt to combine a bit of computer science and poetry.

I’ve been studying grammars and formal languages, among other things, this past semester in my Theory of Computation class. One thing that it reminded me of was the second JavaScript application I ever developed (with the help of my friend and college classmate Miranda Tarrow): The Random Goth Lyric Generator.

I took a simple sentence structure (subject, verb, adjective, object) and made random substitutions for each line, 4 lines per stanza, 4 stanzas per poem.

The (slightly less-than-formal) grammar for each line would look something like this:
Line -> SSL | PSL
SSL -> SNP SV A O
PSL -> PNP PV A O
SNP -> Singular Noun | Singular Noun Phrase
PNP -> Plural Noun | Plural Noun Phrase
SV -> Singular Verb
PV -> Plural Verb
A -> Adjective
O -> Object

We could extend this to the entire poem:
Poem -> Stanza Stanza Stanza Stanza
Stanza -> Line Line Line Line

The word list was meant to be dark and foreboding but was often hilarious–the examples included:

Nouns & Noun Phrases:

  • My solitude
  • Your touch
  • A ravenous she-wolf
  • Spiders

Verbs & Verb Phrases:

  • entangles
  • summons
  • grovels before
  • spews forth

Adjectives:

  • labyrinthine
  • diseased
  • spectral
  • infernal

Line -> SSL -> SNP SV A O could become:

Your touch entangles infernal spiders.
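
The grammar above translates almost directly into code. What follows is only a rough sketch in Python, not the original JavaScript: the nouns, verbs, and adjectives are the samples listed above, while the plural verb forms, the object list, and the function names are assumptions made up for illustration.

```python
import random

# Nouns, singular verbs, and adjectives are the samples quoted in the post.
# The plural verbs and the object list are assumptions for illustration.
SINGULAR_NOUNS = ["My solitude", "Your touch", "A ravenous she-wolf"]
PLURAL_NOUNS = ["Spiders"]
SINGULAR_VERBS = ["entangles", "summons", "grovels before", "spews forth"]
PLURAL_VERBS = ["entangle", "summon", "grovel before", "spew forth"]
ADJECTIVES = ["labyrinthine", "diseased", "spectral", "infernal"]
OBJECTS = ["spiders", "shadows", "sorrow"]  # hypothetical word list


def line(rng):
    """Line -> SSL | PSL, where each is: noun phrase, verb, adjective, object."""
    if rng.random() < 0.5:
        subject, verb = rng.choice(SINGULAR_NOUNS), rng.choice(SINGULAR_VERBS)
    else:
        subject, verb = rng.choice(PLURAL_NOUNS), rng.choice(PLURAL_VERBS)
    return f"{subject} {verb} {rng.choice(ADJECTIVES)} {rng.choice(OBJECTS)}."


def stanza(rng):
    """Stanza -> Line Line Line Line"""
    return "\n".join(line(rng) for _ in range(4))


def poem(rng=None):
    """Poem -> Stanza Stanza Stanza Stanza"""
    if rng is None:
        rng = random.Random()
    return "\n\n".join(stanza(rng) for _ in range(4))


if __name__ == "__main__":
    print(poem())
```

Keeping singular and plural subjects in separate branches is what makes the verb agree with the noun, which is the only reason the grammar needs both SSL and PSL.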

I don’t know why the list of objects was a separate list of nouns, as it seems to me now that it could have pulled from the same list. Since the grammar used just one sentence structure, the results were very repetitive but frequently humorous. I often considered expanding the possible sentence structure (something as simple as making the adjective optional), but decided that the repetition was part of the charm. In fact, many poems and song lyrics feature repetition, and the results seemed eerily intentional at times.

The page was very popular for a time. I received quite a bit of e-mail regarding the page, including suggestions for additional words. Someone sent a song they’d recorded for which they used the random lyrics (with the addition of a shouted, “There’s that m———— word again!” in the middle). At least one randomly-generated poem was published in a small poetry journal.

I’ve considered creating a sequel to parody William Carlos Williams and loading it up with words from his own poems:
Poem -> S1 S2 S3 S4
S1 -> Noun Verb newline Preposition
S2 -> Article Adjective Noun newline Noun
S3 -> Adjective Preposition Adjective newline Noun
S4 -> Preposition Article Adjective newline Noun

Which might produce something like:

no one spilled
with

the whole honey
suckle

pressed after sweet
odor

while the urgent
petals

It just doesn’t seem quite as funny or compelling. I can venture a guess that Williams’s sparse form and carefully selected language don’t lend themselves to random imitation as well as verbose and self-indulgent free verse. Although perhaps the sample is merely too small!

Using FFmpeg to programmatically slice and splice video

My wife has a research project in which she needs to analyze brief (8-second) segments of hundreds of much longer videos. My goal was to take the videos (~30 minutes each), cut out only the relevant sections, and splice them together with a static marker between each segment. This should allow her and her colleagues to analyze the videos quickly and with precise time-points (instead of using a slider in a video player to locate and estimate time-points). I’ve posted my notes from this process below for my own reference, and in case it should prove useful to anyone else.

To my knowledge, the best tool for the job is FFmpeg, an open source video tool.

Robert Sedgewick: “Algorithms for the Masses”

On 9 April 2012, I saw Robert Sedgewick give the talk, “Algorithms for the Masses,” on the campus of Drexel University. I have several of Sedgewick’s books on my shelves at home, including Algorithms in Java, Third Edition, Parts 1-5 and Introduction to Programming in Java: An Interdisciplinary Approach. One of my previous computer science professors, Kenneth Sloan, counted Sedgewick among his classmates.

The basic thesis of the lecture was that good algorithms matter and that we need to champion good algorithms where they are most needed (particularly in the sciences).

One of his points was that computer science is currently very abstract and lacks a basis in the scientific method. Algorithms need to be tested against models to see how they actually perform. In some cases, the theoretical performance of an algorithm can be off by several orders of magnitude compared to actual performance. For example, the quicksort algorithm is quadratic (N^2) in the worst case, but N log N most of the time. There’s a reason why quicksort (by the way, the subject of Sedgewick’s 1975 PhD dissertation) is widely used, in spite of the fact that its worst case is O(N^2) versus merge sort’s O(N log N).

Sedgewick said, though, that he has run into many computer scientists who fail to observe the difference between theoretical worst-case and actual performance. Some will choose an algorithm based on Big-O analysis alone. Sedgewick’s response: Big-O is an upper bound, but is your input an example of the worst-case? Probably not. Algorithms should be chosen based on their actual performance.

[As an alternative to Big-O notation, Sedgewick suggested Tilde notation, although from my perspective I don’t see that there is a great difference between them.]

He also gave an example of taking theory too far in the other direction. A computer scientist gave a talk demonstrating that his algorithm, Algorithm B, though exceedingly complex, was superior to the simpler Algorithm A. When Sedgewick asked him why, he explained that Algorithm B removed a log log N factor. Sedgewick’s analysis was that log log N, in this universe, amounts to 6 — hardly worth trading algorithms for what, realistically, amounts to a constant factor.

[Why 6? Wikipedia and other sources estimate the number of atoms in the observable universe at 10^80. The natural logarithm of 10^80 is 184. The natural logarithm of 184 is 5.2. 6 sounds like a fine estimate.]
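
The arithmetic in that aside is easy to check. A minimal sketch, assuming natural logarithms throughout (as in the aside) and taking the atom count to be 10 to the 80th power; the variable names are mine:

```python
import math

# Estimated number of atoms in the observable universe, per the aside above.
atoms = 10 ** 80

ln_n = math.log(atoms)    # natural log of N, roughly 184
ln_ln_n = math.log(ln_n)  # natural log of that, roughly 5.2

print(round(ln_n), round(ln_ln_n, 1))
```

Even for an input as large as every atom in the universe, the log log N factor stays in single digits, which is why it behaves like a constant in practice.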

Another point was that scientists often need algorithms in their daily work, but do things the hard way for a lack of knowledge. One example was a biologist who was trying to use Excel to calculate a standard deviation for over a million data points, an idea that caused several audience members to cringe.

How do we bring a better understanding of algorithms to the masses? (By masses, I think he really means the masses of college-educated scientists–not quite everyone, but still a much larger group than just computer scientists.) He had several suggestions:

Analytic Combinatorics
From what I gather, analytic combinatorics is a way of using formal languages to describe recurrence relations, and thus a simpler (and easier-to-teach) method of creating generating functions. I don’t exactly know what that means, but you can read the book on it (by Flajolet and, of course, Sedgewick): Analytic Combinatorics (PDF).

Testing Algorithms Empirically
In computer science classes, he suggests students run a program on an increasing series of inputs (e.g. n = { 1000, 2000, 4000, 8000, 16000, … }) and examine the ratios of input size to run time to understand the real impact of running in linear time, N log N time, quadratic time, etc. (This is something that some of my past computer science courses have included, so apparently they have already adopted this piece of Sedgewick’s advice.)
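
One way to try this kind of experiment is sketched below. It is not taken from the lecture: the function names and the deliberately quadratic workload are hypothetical, and the sizes are kept small so it finishes quickly.

```python
import time


def quadratic_pairs(n):
    """Deliberately quadratic workload: count pairs (i, j) with i < j."""
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            count += 1
    return count


def doubling_ratios(func, sizes):
    """Time func on each size and return the ratios of successive run times."""
    times = []
    for n in sizes:
        start = time.perf_counter()
        func(n)
        times.append(time.perf_counter() - start)
    return [later / earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    sizes = [250, 500, 1000, 2000]
    for n, ratio in zip(sizes[1:], doubling_ratios(quadratic_pairs, sizes)):
        print(f"n={n}: run time grew by a factor of {ratio:.1f}")
```

When the input doubles, a linear algorithm should take roughly twice as long and a quadratic one roughly four times as long. Timings are noisy, so the printed ratios will only hover around those values.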

Changing Intro to CS Courses
Sedgewick recommends identifying core elements of computer science, such as classic algorithms, and teaching them to everyone as early as possible. Some changes made to the curriculum at Princeton (where Sedgewick teaches) have led to a dramatic increase in enrollment in intro computer science courses and from a wider range of majors.

Change Publishing
Sedgewick touted his Introduction to Programming in Java and its accompanying web site as a major change for textbooks. The programming examples are short and simple, but demonstrate a wide range of real-world applications across several branches of science. (Although he did not mention it in the lecture, I also appreciated that the examples in this book often include graphics, sound, and animation — which are far more thrilling results than the usual ASCII that intro CS students see as the fruits of their labors.)

He also criticized academic publishing for making journal articles look as much like boring print articles as possible, in spite of the fact that they are now primarily accessed online. Where are the full-color figures? The hyperlinks? Animated simulations? These things are all possible online, but instead publishers restrict content to the form that is least likely to be accessed.

Several times in the lecture, he also mentioned that freshly-minted computer scientists often have little or no background in science: no physics, no chemistry, no biology. Although he recognized this as a problem, he had no solutions (other than to, perhaps, require foundational science courses as part of a CS degree).

The last two items–changing the curriculum and publishing–really sounded like a Sedgewick paid-programming infomercial. “Everyone should take courses in the subject of which I am an expert, and they should use the book I wrote to teach it.” It was hard not to be a little skeptical of his motives. At the same time, I can’t help but think that he’s right.

Using a list or array as a PowerShell script input parameter

One of my colleagues created a PowerShell script that we use to migrate SharePoint 2010 sites from the SharePoint 2007 interface (UI 3) to the SharePoint 2010 interface (UI 4). The script works rather well for updating one or two sites at a time:

Set-UserInterface2010.ps1 -url "https://my.sharepoint.site/path/"

Today I received a request to update a list of 32 sites. After updating one, I thought–this is going to be tedious. We can improve this.

First, I updated the parameter to accept an array of strings instead of a single string.

Before
param([string]$url = $(throw "Please specify a Site URL to convert to the SP2010 look and feel"))

After
param([string[]]$url = $(throw "Please specify a Site URL to convert to the SP2010 look and feel"))

Next, I wrapped the call to the main function in a ForEach loop:
ForEach ( $siteurl in $url ) {
    ...
    # Throw in a line break to separate output
    Write-Host ("`n`n")
}

The script can still take a single site as input (the string is treated as a string array with a single element), but now I can also pass it a list:

Set-UserInterface2010.ps1 -url "https://my.sharepoint.site/path1/","https://my.sharepoint.site/path2/","https://my.sharepoint.site/path3/"

Here’s hoping that taking a couple minutes to update the script and running it once saved me more time than running it 32 times!

Douglas Crockford: “Programming Style and Your Brain”

On 13 January 2012, I saw JavaScript expert Douglas Crockford deliver a talk titled “Programming Style and Your Brain” on the campus of the University of Pennsylvania. The brain portion of the talk (which Mr Crockford said borrowed heavily from Daniel Kahneman’s book Thinking, Fast and Slow) was really just to emphasize that human beings have 2 distinct ways of thinking: Head (slow) and Gut (fast). Computer programming requires some of both, but the same Gut-thinking that can provide useful insights can sometimes also lead us astray.

For example, programmers have been arguing since the 1970s about the placement of curly braces. Some people prefer:

if ( true ) {
    doSomething();
}

Others prefer:

if ( true )
{
    doSomething();
}

Crockford says that if the compiler treats these 2 forms as equivalent, then there is really no difference (so long as you are consistent). These are Gut decisions. However, people will use their Head to try to rationalize their Gut decisions and come up with some ridiculous rationalizations.

OK, fine. What does that mean in practical terms, i.e. writing code?

Test-Driven Programming Assignments

I am enrolled in another graduate course this semester, Theory of Computation. Like last semester, the programming assignments include unit tests (JUnit tests for the Java assignments). One can be very confident prior to submitting an assignment that it is done properly if it passes all the unit tests!

I’ve been interested in unit testing and test-driven development for several years now, but have never put it into practice. It seemed like a good idea in theory, but it required picking a testing framework, installing it, configuring it, and, of course, writing good tests. Without any hands-on experience, it’s a difficult practice to adopt. Getting introduced to unit tests this way lifts nearly all those burdens. On top of that, the advantages are clear: the tests will tell you when you have it right.

I was a little surprised to run into unit testing via coursework, because I’d been under the impression that computer science education focuses a lot on theory, and skips over a lot of the practice. I’ve met a lot of people coming out of computer science programs who, while probably excellent theorists, don’t know heads from tails when you sit them at a terminal. I was happy to see that my coursework included practical knowledge as well.

All the unit tests are provided by the professor. I assume we may write some tests of our own later on–and certainly nothing is stopping us from writing our own tests now–but I do wonder how many students will be prepared to take that next step. From what I understand, writing good tests is the better part of successful unit testing.

Some things I’ve noticed about the provided unit tests:

  • There are a lot of them. The unit tests are half as long as the code that passes them.
  • There are many sample values to test the same function, mostly to test specific edge cases. Failure at a specific edge case helps to identify where the code went wrong.
  • In some cases, the tests are within a loop that generates random test data. Although there’s nothing quite like real human input to break your programs, 1000 random tests might help.
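
The last two patterns can be sketched in a few lines. The course used JUnit; this is a loose equivalent using Python’s unittest module, and the clamp function under test is a hypothetical stand-in rather than anything from the assignments.

```python
import random
import unittest


def clamp(value, low, high):
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))


class ClampTests(unittest.TestCase):
    def test_edge_cases(self):
        # Many sample values for one function, each naming an edge case,
        # so a failure points at the exact case that went wrong.
        cases = [
            ("below range", -5, 0),
            ("at lower bound", 0, 0),
            ("inside range", 7, 7),
            ("at upper bound", 10, 10),
            ("above range", 99, 10),
        ]
        for name, value, expected in cases:
            with self.subTest(name):
                self.assertEqual(clamp(value, 0, 10), expected)

    def test_random_values(self):
        # A loop over generated data; the fixed seed keeps failures reproducible.
        rng = random.Random(42)
        for _ in range(1000):
            value = rng.randint(-1000, 1000)
            self.assertTrue(0 <= clamp(value, 0, 10) <= 10)
```

Saved to a file, this can be run with python -m unittest followed by the file name.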

In practice, do programmers tend to write their own unit tests? I imagine it would be ideal if you partnered with someone, and you wrote their unit tests, and they wrote yours. It might be easy to overlook an edge case or dismiss something as impossible if you are too involved with the specifications. At the same time, it is probably difficult to write a unit test without spending some time reading the specifications and understanding exactly what it is supposed to do.

When should alt text be blank?

The alt attribute of an image element is a required HTML attribute (see the IMG element). If it is not present, screen-reader software will typically read the src attribute instead. Text-based user agents such as Lynx, or browsers that allow users to disable images, will also typically use the src attribute in the absence of the alt attribute.

I had always heard that, unless the image conveys important information (e.g. a graphic of text used as navigation, or a chart or graph), the alt text should be left blank:
<img src="myimage.png" alt="" />

A screen-reader passes such an image over without saying anything. This makes sense to me. When I’ve closed my eyes and tried navigating the web using a screen-reader like JAWS, anything non-essential was a distraction and just got in my way. Knowing that a page contained an image of, say, a corporate headquarters in no way helped me understand the page content.
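
Since a missing alt attribute and an empty one behave so differently, it can help to check which of the two a page actually uses. Here is a minimal sketch using Python’s standard html.parser module; the class and function names are my own, and it only looks at img elements in a string of HTML.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Sort each <img> into missing, empty, or described alt text."""

    def __init__(self):
        super().__init__()
        self.missing, self.empty, self.described = [], [], []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        if "alt" not in attributes:
            self.missing.append(src)    # no alt attribute at all
        elif not (attributes["alt"] or "").strip():
            self.empty.append(src)      # alt="" (or whitespace only)
        else:
            self.described.append(src)  # alt carries a description


def audit(html):
    """Parse an HTML string and return the populated AltAudit."""
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser
```

Calling audit(page_source).missing would list the images most likely to be read out by file name.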

However, a colleague of mine suggested that the alt text should describe, briefly, the image. He offered a compelling use case: you are browsing on your mobile device with images disabled, due to bandwidth. How would you know if there was an image that you did want to view if no description was provided? This is a case where the alt text can provide a better user experience for users without vision impairments. But does it make the experience worse for users with vision impairments using assistive technology?

(A case could be made that vision-impaired users should also know there is an image on the page for orientation purposes: “The link to the annual report is underneath the photo of the corporate headquarters.” However, what may appear above or below on screen may not make sense when page content is read linearly.)

I looked to the WCAG 2.0 section on text alternatives, which states that images used for decoration or formatting should be implemented in such a way that they “can be ignored by assistive technology.” That’s a good case for using an empty alt attribute. If the image is sensory (WCAG 2.0 has been criticized for being vague–obviously anything visual is sensory), then the item should “at least provide descriptive identification of the non-text content.”

What about that photo of the corporate headquarters then? It’s decorative, but not in the same way as a fleuron or a border. It may not be an inspiring image, but maybe it should have associated alt text.

I decided to check 4 sites that I thought might demonstrate best-practice, but found little consistency across these examples:

  • The National Federation of the Blind – they use rather extensive alt text for the main image on their homepage: “Graphic consisting of two photos. On left is a group of children with white canes on a hayride. Right is a close-up of a finger reading Braille.” However, they fail to use the alt attribute for their menu divider graphic, which is clearly a decorative element.
  • Freedom Scientific’s JAWS Screen Reading Software – the alt text “A student uses JAWS to do work on a desktop computer” accompanies a photo of a man at a computer. Decorative images (menu dividers, stars) use an empty string for the alt text.
  • WebAIM – Web Accessibility in Mind – they avoid the issue on their services page by inserting the pictures as CSS background images. These would not appear at all to a screen reader, to a text-based browser, or to a user agent with images disabled. This would be functionally equivalent to using empty alt text.
  • The Social Security Administration’s Disabilities Benefits – this page gets it completely wrong, including alt text for images that do not even appear visible to users with normal vision (e.g. a tracking image with the alt text “DCSIMG” and a spacer image with the alt text “blank space”).

MIT’s general web-accessibility guidelines offer some additional guidance:

ALT tags are often misused, mostly people overuse them. It’s better to leave the ALT tag blank (ALT=””) then to enter a text description that’s not useful or is redundant. For example an image with a caption below it does not need alt text that matches the caption, leave the alt text blank to avoid redundancy.

The University of Michigan’s Accessibility Quick Guide suggests using empty alt text for non-informative images.

Unfortunately, we’re still left with a rather vague recommendation: use a description when useful or informative. How do we decide when a description is useful or informative? My gut feeling is to agree with MIT: it should be left blank in most cases (such as with the hypothetical photograph of a corporate headquarters), but I think no great harm is done if a brief description is included.

Updating a Windows timestamp

Let’s say you need to modify the timestamp, or last modified date, on a Windows file. In my case, I wanted to change the last modified date on a lot of PDF files, so that the last modified attribute would reflect the date the content was last modified (rather than the date the file itself was last touched).

The best and easiest way I’ve found to do this is via Cygwin:
touch -t 201109201145 *.pdf
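
If Python happens to be installed instead of Cygwin, roughly the same result should be possible with the standard library. This is a sketch rather than something I have used for this job; the function name is hypothetical, and os.utime sets both the access and modified times, much as touch does by default.

```python
import glob
import os
from datetime import datetime


def set_modified(pattern, when):
    """Set the access and modified times of every file matching pattern."""
    stamp = when.timestamp()  # a naive datetime is interpreted as local time
    changed = []
    for path in glob.glob(pattern):
        os.utime(path, (stamp, stamp))
        changed.append(path)
    return changed
```

Calling set_modified("*.pdf", datetime(2011, 9, 20, 11, 45)) corresponds to the 201109201145 timestamp in the touch command above.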

When I searched for how to accomplish this, I found all sorts of sites advising me to download a $20 shareware application, or write a C# application to do it. Fortunately, in a *nix type environment, it’s a one-line command.

There are other ports of Unix/Linux/*nix commands for Windows, some of which may be more lightweight than Cygwin. (I rely on Cygwin quite a bit, so I have it installed on all my Windows machines.) I have not used any of the following, but here are links to some alternatives, in case you are interested: