The Accidental Developer – Page 15 – What if Gregor Samsa awoke a computer programmer?

Moving SharePoint sites using stsadm.exe

I recently needed to move a SharePoint site from one site collection to another. Fortunately, this is very easy to do using the stsadm tool (located in the bin directory of the 14 hive).

Following the instructions at Using Stsadm.exe to Migrate Site Data (a SharePoint 2007 document, but still applicable to SharePoint 2010), I used the stsadm export command from the 14 hive with the following parameters:

bin\stsadm.exe -o export -includeusersecurity -versions 4 -url 'https://mysite.url/oldcollection/myweb' -filename myweb.cmp

However, that returned the following error message:
Syntax error in argument: url

If you drop the quotation marks around the URL, it works:
bin\stsadm.exe -o export -includeusersecurity -versions 4 -url https://mysite.url/oldcollection/myweb -filename myweb.cmp

(I find that really unbelievable. Text parameters almost always appear as quoted values. Even if it doesn’t need the quotes, you’d still expect it to accept the quotes!)

The instructions for the export command mention cabsize but don’t indicate the default:
"-cabsize <integer> Specifies the maximum size of the .cmp file in megabytes. The range is from 1 to 1024 MB. If the export data exceeds the maximum specified, the data is split into multiple files."

For the site I was working with, the export command saved the site as 4 files, each around 25 megabytes. I decided I would prefer a single .cmp file (and the site I was exporting did not exceed 1024 MB) so I increased the cabsize to the maximum:
bin\stsadm.exe -o export -cabsize 1024 -includeusersecurity -versions 4 -url https://mysite.url/oldcollection/myweb -filename myweb.cmp
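
A likely explanation for the "Syntax error in argument: url" message, assuming the command was run from cmd.exe: cmd.exe does not treat single quotes as quoting characters, so they reach stsadm as part of the URL, whereas double quotes would normally be stripped before the program sees them. One way to sidestep shell quoting altogether is to pass the arguments as a list. The following is only a sketch in Python; the helper names and the install path are hypothetical, and the flags are the ones used above.

```python
import subprocess

# Assumed location of stsadm.exe in the 14 hive; adjust for your server.
STSADM = (
    r"C:\Program Files\Common Files\Microsoft Shared"
    r"\Web Server Extensions\14\bin\stsadm.exe"
)


def build_export_args(url, filename, versions=4, cabsize=None):
    """Build the argument list for 'stsadm -o export' (hypothetical helper)."""
    args = [STSADM, "-o", "export", "-includeusersecurity",
            "-versions", str(versions)]
    if cabsize is not None:
        # 1 to 1024 MB, per the documentation quoted above
        args += ["-cabsize", str(cabsize)]
    args += ["-url", url, "-filename", filename]
    return args


def run_export(url, filename, **options):
    """Run the export; a list of arguments means no shell quoting is involved."""
    return subprocess.run(build_export_args(url, filename, **options), check=True)
```

Calling run_export("https://mysite.url/oldcollection/myweb", "myweb.cmp", cabsize=1024) would then correspond to the single-file export above.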

To import the site into the new site collection, run the import command:
bin\stsadm.exe -o import -url https://mysite.url/newcollection/myweb/ -filename myweb.cmp

Once you’ve moved the site–and checked the new location to make sure everything worked–don’t forget to delete the old site and the .cmp file.

Center City Philadelphia’s Lack of Pedestrian Signals

This is my second log entry for my Human-Computer Interaction class this summer.

When I first moved to Center City, Philadelphia, one thing that struck me as odd was the use of regular traffic lights as pedestrian signals. Even at the intersection of two one-way streets, there would be traffic signals in all four directions. There are generally not separate pedestrian signals.

Pedestrian signals in Center City, Philadelphia

While you will find separate pedestrian signals at broad intersections, these are the exceptions rather than the rule. (Perhaps other cities do this too, but if so I have not taken notice.)

After you have lived here for a while, it becomes part of the landscape and no longer seems abnormal. But I took notice again this year on the 4th of July, a big holiday in Philadelphia that draws a lot of tourists. I noticed a fair number of confused pedestrians, and also a couple of drivers who attempted to drive the wrong way down one-way streets.

It is a convention at most intersections in most cities that if one direction has a visible traffic light, traffic is expected to flow in that direction. Drivers from out of town, many of whom are not used to one-way streets, see these pedestrian signals and think they indicate the direction of traffic. But really, it’s only foot traffic, on the sidewalk, that flows in that direction. Sure, there are other visible signs: one-way signs and cars parked facing only the opposite direction. In the absence of immediate oncoming traffic, though, those signals can really send the wrong signal.

Why did Philadelphia choose to use the usual automobile traffic signals for pedestrian signals? I assume it saves money: not that the signals themselves are necessarily more costly, but that one computer/controller–or perhaps a simpler controller–can manage each intersection. I don’t really know the reason, though. I definitely feel that it is a mistake to break such a common convention. On the other hand, though I have seen confused pedestrians and drivers, I have yet to see an accident caused by this confusion.

Ambiguous “On” Indicators on Television Sets and Monitors

I’m currently taking a course on Human-Computer Interaction (HCI). The instructor advised us to keep logs of things we notice in the world that relate to the course material. This is one of my log entries.

One item I noticed today was the On indicator on a Samsung television at work. It’s a large flat-panel screen that we have connected to a PC for presentation purposes in a small conference room. I was preparing for a presentation and sat down at the keyboard and mouse. The power light glowed amber, so I wiggled the mouse. Nothing. I pressed CTRL-ALT-DEL. Nothing. I checked to make sure that the PC was on, and then I checked to make sure the cables were connected. Everything looked correct–why wasn’t the screen getting a signal?

Bottom panel of a television set displaying an amber light. Note that although the indicator light is near the power symbol, it could be much closer.

Perl error when running W3C checklink

I’m using ActiveState Perl 5.14.2 on a 64-bit Windows 7 machine. I downloaded and installed the W3C checklink package via the Perl Package Manager.

When I attempted to run checklink on the command line like this:
C:\Users\chris\>checklink http://osric.com
I got the following error message:
"-T" is on the #! line, it must also be used on the command line.

To get this to work, I had to run the command as follows:
C:\Users\chris\>perl -T C:\perl64\site\bin\checklink http://osric.com

The -T switch is to enable taint mode, which helps protect the program from malicious input.

Grammars and the Random Goth Lyric Generator

To celebrate one of the last days of National Poetry Month as well as The Accidental Developer’s 100th blog post, I will attempt to combine a bit of computer science and poetry.

I’ve been studying grammars and formal languages, among other things, this past semester in my Theory of Computation class. One thing that it reminded me of was the second JavaScript application I ever developed (with the help of my friend and college classmate Miranda Tarrow): The Random Goth Lyric Generator.

I took a simple sentence structure (subject, verb, adjective, object) and made random substitutions for each line, 4 lines per stanza, 4 stanzas per poem.

The (slightly less-than-formal) grammar for each line would look something like this:
Line -> SSL | PSL
SSL -> SNP SV A O
PSL -> PNP PV A O
SNP -> Singular Noun | Singular Noun Phrase
PNP -> Plural Noun | Plural Noun Phrase
SV -> Singular Verb
PV -> Plural Verb
A -> Adjective
O -> Object

We could extend this to the entire poem:
Poem -> Stanza Stanza Stanza Stanza
Stanza -> Line Line Line Line

The word list was meant to be dark and foreboding but was often hilarious–the examples included:

Nouns & Noun Phrases:

My solitude
Your touch
A ravenous she-wolf
Spiders

Verbs & Verb Phrases:

entangles
summons
grovels before
spews forth

Adjectives:

labyrinthine
diseased
spectral
infernal

Line -> SSL -> SNP SV A O could become:

Your touch entangles infernal spiders.
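
The grammar above can be sketched as a small generator. This is an illustrative Python version, not the original JavaScript. The sample words come from the lists above, but the singular/plural split, the plural verb forms, and the object list are assumptions made for the example, since only a few sample words are shown here.

```python
import random

# Sample words from the post. The singular/plural split, the plural verb
# forms, and the OBJECTS list are assumptions made for this sketch.
SINGULAR_NOUNS = ["My solitude", "Your touch", "A ravenous she-wolf"]
PLURAL_NOUNS = ["Spiders"]
SINGULAR_VERBS = ["entangles", "summons", "grovels before", "spews forth"]
PLURAL_VERBS = ["entangle", "summon", "grovel before", "spew forth"]
ADJECTIVES = ["labyrinthine", "diseased", "spectral", "infernal"]
OBJECTS = ["spiders", "solitude", "she-wolves"]


def generate_line(rng):
    """Line -> SSL | PSL; both expand to noun phrase, verb, adjective, object."""
    if rng.random() < 0.5:
        subject = rng.choice(SINGULAR_NOUNS)  # SSL -> SNP SV A O
        verb = rng.choice(SINGULAR_VERBS)
    else:
        subject = rng.choice(PLURAL_NOUNS)    # PSL -> PNP PV A O
        verb = rng.choice(PLURAL_VERBS)
    return f"{subject} {verb} {rng.choice(ADJECTIVES)} {rng.choice(OBJECTS)}."


def generate_stanza(rng):
    """Stanza -> Line Line Line Line"""
    return "\n".join(generate_line(rng) for _ in range(4))


def generate_poem(rng):
    """Poem -> Stanza Stanza Stanza Stanza"""
    return "\n\n".join(generate_stanza(rng) for _ in range(4))


if __name__ == "__main__":
    print(generate_poem(random.Random()))
```

Passing in a random.Random instance keeps the generator easy to seed, which is handy if you want to reproduce a particularly good poem.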

I don’t know why the list of objects was a separate list of nouns, as it seems to me now that it could have pulled from the same list. Since the grammar used just one sentence structure, the results were very repetitive but frequently humorous. I often considered expanding the possible sentence structure (something as simple as making the adjective optional), but decided that the repetition was part of the charm. In fact, many poems and song lyrics feature repetition, and the results seemed eerily intentional at times.

The page was very popular for a time. I received quite a bit of e-mail regarding the page, including suggestions for additional words. Someone sent a song they’d recorded for which they used the random lyrics (with the addition of a shouted, “There’s that m———— word again!” in the middle). At least one randomly-generated poem was published in a small poetry journal.

I’ve considered creating a sequel to parody William Carlos Williams and loading it up with words from his own poems:
Poem -> S1 S2 S3 S4
S1 -> Noun Verb newline Preposition
S2 -> Article Adjective Noun newline Noun
S3 -> Adjective Preposition Adjective newline Noun
S4 -> Preposition Article Adjective newline Noun

Which might produce something like:

no one spilled
with

the whole honey
suckle

pressed after sweet
odor

while the urgent
petals

It just doesn’t seem quite as funny or compelling. I can venture a guess that Williams’s sparse form and carefully selected language don’t lend themselves to random imitation as well as verbose and self-indulgent free verse. Although perhaps the sample is merely too small!

Using FFmpeg to programmatically slice and splice video

My wife has a research project in which she needs to analyze brief (8-second) segments of hundreds of much longer videos. My goal was to take the videos (~30 minutes each) and cut out only the relevant sections and splice them together, including a static marker between each segment. This should allow her and her colleagues to analyze the videos quickly and using precise time-points (instead of using a slider in a video player to locate and estimate time-points). I’ve posted my notes from this process below for my own reference, and in case it should prove useful to anyone else.

To my knowledge, the best tool for the job is FFmpeg, an open source video tool.

Robert Sedgewick: “Algorithms for the Masses”

On 9 April 2012, I saw Robert Sedgewick give the talk, “Algorithms for the Masses,” on the campus of Drexel University. I have several of Sedgewick’s books on my shelves at home, including Algorithms in Java, Third Edition, Parts 1-5 and Introduction to Programming in Java: An Interdisciplinary Approach. One of my previous computer science professors, Kenneth Sloan, counted Sedgewick among his classmates.

The basic thesis of the lecture was that good algorithms matter and that we need to champion good algorithms where they are most needed (particularly in the sciences).

One of his points was that computer science is currently very abstract and lacks a basis in the scientific method. Algorithms need to be tested against models to see how they actually perform. In some cases, the theoretical performance of an algorithm can be off by several orders of magnitude compared to actual performance. For example, the quicksort algorithm is quadratic (N²) in the worst case, but N log N most of the time. There’s a reason why quicksort (by the way, the subject of Sedgewick’s 1975 PhD dissertation) is widely used, in spite of the fact that its worst case is O(N²) versus mergesort’s O(N log N).

Sedgewick said, though, that he has run into many computer scientists who fail to observe the difference between theoretical worst-case and actual performance. Some will choose an algorithm based on Big-O analysis alone. Sedgewick’s response: Big-O is an upper bound, but is your input an example of the worst-case? Probably not. Algorithms should be chosen based on their actual performance.
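
To make the gap between worst-case and typical behavior concrete, here is a rough sketch in Python (my own illustration, not something from the talk). It uses a deliberately naive quicksort that always picks the first element as the pivot and counts comparisons; the function name is hypothetical. Already-sorted input is the worst case for this pivot choice, while shuffled input is closer to typical.

```python
import random


def quicksort_comparisons(items):
    """Sort a copy of items with a naive first-element-pivot quicksort.

    Returns (sorted_list, number_of_comparisons). An explicit stack is used
    instead of recursion so that the worst case does not hit Python's
    recursion limit.
    """
    data = list(items)
    comparisons = 0
    stack = [(0, len(data) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = data[lo]
        boundary = lo  # last index known to hold a value smaller than the pivot
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if data[j] < pivot:
                boundary += 1
                data[boundary], data[j] = data[j], data[boundary]
        data[lo], data[boundary] = data[boundary], data[lo]
        stack.append((lo, boundary - 1))
        stack.append((boundary + 1, hi))
    return data, comparisons


if __name__ == "__main__":
    n = 1000
    shuffled = list(range(n))
    random.shuffle(shuffled)
    _, worst = quicksort_comparisons(range(n))   # already sorted: N(N-1)/2
    _, typical = quicksort_comparisons(shuffled)
    print("sorted input:  ", worst, "comparisons")
    print("shuffled input:", typical, "comparisons")
```

On sorted input every partition is as lopsided as possible, so the count is exactly N(N-1)/2; on shuffled input it should come out far lower, which is the difference between the Big-O bound and what most inputs actually cost.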

[As an alternative to Big-O notation, Sedgewick suggested Tilde notation, although from my perspective I don’t see that there is a great difference between them.]

He also gave an example of taking theory too far in the other direction. A computer scientist gave a talk demonstrating that his algorithm, Algorithm B, though exceedingly complex, was superior to the simpler Algorithm A. When Sedgewick asked him why, he explained that Algorithm B removed a log log N factor. Sedgewick’s analysis was that log log N, in this universe, amounts to 6 — hardly worth trading algorithms for what, realistically, amounts to a constant factor.

[Why 6? Wikipedia and other sources estimate the number of atoms in the observable universe at 10⁸⁰. The natural logarithm of 10⁸⁰ is 184. The natural logarithm of 184 is 5.2. 6 sounds like a fine estimate.]
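
The arithmetic in the aside above is easy to check. A quick sketch in Python (the base-2 variant is my own addition for comparison, since algorithm analysis often uses base-2 logarithms):

```python
import math

atoms = 10 ** 80  # estimated number of atoms in the observable universe

ln_atoms = math.log(atoms)        # natural log of 10^80, roughly 184
ln_ln_atoms = math.log(ln_atoms)  # natural log of that, roughly 5.2

# Same calculation with base-2 logarithms; still a single-digit number.
log2_log2_atoms = math.log2(math.log2(atoms))

print(round(ln_atoms, 1), round(ln_ln_atoms, 1), round(log2_log2_atoms, 1))
```

Either way, log log N stays in the single digits for any N that could physically exist, which is why it behaves like a constant factor in practice.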

Another point was that scientists often need algorithms in their daily work, but do things the hard way for lack of knowledge. One example was a biologist who was trying to use Excel to calculate a standard deviation for over a million data points, an idea that caused several audience members to cringe.

How do we bring a better understanding of algorithms to the masses? (By masses, I think he really means the masses of college-educated scientists–not quite everyone, but still a much larger group than just computer scientists.) He had several suggestions:

Analytic Combinatorics
From what I gather, analytic combinatorics is a way of using formal languages to describe recurrence relations, and thus a simpler (and easier-to-teach) method of creating generating functions. I don’t exactly know what that means, but you can read the book on it (by Flajolet and, of course, Sedgewick): Analytic Combinatorics (PDF).

Testing Algorithms Empirically
In computer science classes, he suggests that students run a program on an increasing series of inputs (e.g. n = { 1000, 2000, 4000, 8000, 16000, … }) and examine the ratios of input size to run times to understand the real impact of running in linear time, N log N time, quadratic time, etc. (This is something that some of my past computer science courses have included, so apparently they have already adopted this piece of Sedgewick’s advice.)
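
One common way to run such an experiment is to double the input size each time and look at the ratio between successive run times: a ratio near 2 suggests linear time, a little over 2 suggests N log N, and near 4 suggests quadratic. The sketch below is a minimal Python version under that assumption; the function names and the deliberately quadratic example are hypothetical, and real timings will be noisy.

```python
import time


def doubling_experiment(func, make_input, start=1000, doublings=4):
    """Time func on inputs of size start, 2*start, 4*start, ...

    Returns (sizes, times, ratios), where ratios[k] = times[k+1] / times[k].
    """
    sizes = [start * 2 ** k for k in range(doublings + 1)]
    times = []
    for n in sizes:
        data = make_input(n)
        began = time.perf_counter()
        func(data)
        times.append(time.perf_counter() - began)
    ratios = [
        later / earlier if earlier > 0 else float("nan")
        for earlier, later in zip(times, times[1:])
    ]
    return sizes, times, ratios


def count_inversions_naive(values):
    """Deliberately quadratic example: compares every pair of elements."""
    count = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] > values[j]:
                count += 1
    return count


if __name__ == "__main__":
    sizes, times, ratios = doubling_experiment(
        count_inversions_naive,
        lambda n: list(range(n, 0, -1)),
        start=200,
        doublings=3,
    )
    for n, t in zip(sizes, times):
        print(f"n={n:6d}  time={t:.4f}s")
    print("ratios:", [round(r, 2) for r in ratios])
```

The sizes here are smaller than the ones in the example series above only so that the quadratic function finishes quickly in Python.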

Changing Intro to CS Courses
Sedgewick recommends identifying core elements of computer science, such as classic algorithms, and teaching them to everyone as early as possible. Some changes made to the curriculum at Princeton (where Sedgewick teaches) have led to a dramatic increase in enrollment in intro computer science courses and from a wider range of majors.

Change Publishing
Sedgewick touted his Introduction to Programming in Java and its accompanying web site as a major change for textbooks. The programming examples are short and simple, but demonstrate a wide range of real-world applications across several branches of science. (Although he did not mention it in the lecture, I also appreciated that the examples in this book often include graphics, sound, and animation — which are far more thrilling results than the usual ASCII that intro CS students see as the fruits of their labors.)

He also criticized academic publishing for making journal articles look as much like boring print articles as possible, in spite of the fact that they are now primarily accessed online. Where are the full-color figures? The hyperlinks? Animated simulations? These things are all possible online, but instead publishers restrict content to the form that is least likely to be accessed.

Several times in the lecture, he also mentioned that freshly-minted computer scientists often have little or no background in science: no physics, no chemistry, no biology. Although he recognized this as a problem, he had no solutions (other than to, perhaps, require foundational science courses as part of a CS degree).

The last two items–changing the curriculum and publishing–really sounded like a Sedgewick paid-programming infomercial. “Everyone should take courses in the subject of which I am an expert, and they should use the book I wrote to teach it.” It was hard not to be a little skeptical of his motives. At the same time, I can’t help but think that he’s right.