Something in Maureen Dowd’s explanation of her apparent plagiarism of Josh Marshall just doesn’t make sense.
An eminent technologist once explained to me that any specific ordering of a relatively brief sequence of words — I forget the exact number, but it was certainly no more than nine — is distinct enough that (unless it is some boilerplate phrase that gets repeated over and over in some type of document) it can be used as a unique fingerprint for the entire document. He demonstrated this for me with Google searches. (Try it yourself, using the “exact phrase” setting — search string in quote marks.) It’s a pretty nifty idea with all sorts of implications.
One of them is that the odds of reproducing the exact phrasing of a 40-word passage by chance are almost impossibly low.
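For the curious, here is a minimal sketch of the fingerprint idea in Python. It only checks two texts for shared runs of n consecutive words; it is not a description of how Google's exact-phrase search works internally. The function names and the sample sentences are invented for illustration (they are not Dowd's or Marshall's words), and the default of nine words is simply the figure I recall, not an established threshold.

```python
import re

def shingles(text, n=9):
    """Return the set of every run of n consecutive words in text."""
    # Lowercase and strip punctuation so trivial differences don't hide a match.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # A text shorter than n words yields an empty set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_shingles(a, b, n=9):
    """Return the n-word runs that appear verbatim in both texts."""
    return shingles(a, n) & shingles(b, n)

# Invented sample texts, for illustration only.
original = ("the quick brown fox jumps over the lazy dog while the "
            "cat sleeps soundly on the warm windowsill")
copied = ("as my friend said the quick brown fox jumps over the lazy "
          "dog while the cat sleeps soundly on the porch")
unrelated = ("a completely different sentence about foxes dogs and cats "
             "written independently by someone else")

print(len(shared_shingles(original, copied)))
print(len(shared_shingles(original, unrelated)))
```

Any nonzero count for non-boilerplate text is the kind of overlap the fingerprint idea treats as a red flag; two texts written independently on the same subject would be expected to share few or no nine-word runs.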
As you may have read, in a recent column Dowd included a passage of that length that exactly matched the wording of a recent post on Marshall’s Talking Points Memo (with only one phrase changed: “we” became “the Bush gang”). Dowd’s explanation has been that she was “talking to a friend” who suggested the same point that Marshall was making, and that she plugged her friend’s idea into the column without knowing that its original source was Marshall. (She and the Times have since posted corrections.) Dowd seems to have been very specific that this was an actual spoken conversation with said friend and not, say, an email exchange.
There are a couple of problems with this explanation. The lesser one is that it bespeaks an awfully casual attitude toward attribution — as if, though it would not be OK to lift an idea or passage from Josh Marshall, it is OK to do so from one’s friends.
More importantly, it is simply not possible to credit the idea that Dowd picked up this passage while talking with a friend — and somehow, by sheer coincidence, landed on exactly the same 40-word sequence that Marshall had used to express it. Doesn’t wash. Couldn’t have happened.
The evidence we have overwhelmingly suggests that Dowd either (a) cut and pasted this paragraph herself or (b) received it in some typed form from a friend who had cut and pasted it (or who, for all we know, recited it word for word over the phone).
This strikes me as more of a misdemeanor than a felony — an act of carelessness and laziness, embarrassing but not career-ending. Unfortunately, it now seems Dowd is going down the cover-up road, despite the knowledge burned into every journalist’s psyche that the cover-up is always worse than the crime.
The Times wants to move on, and Marshall says that’s fine with him. But the episode still rankles, as does the Times’ apparent unwillingness to hold its stars to normal standards.
I’m not defending Dowd, but the “unique fingerprint” analysis is almost certainly wrong. In general, N words (9 sounds reasonable) can be used as a fingerprint for randomly generated text. That is, if your null hypothesis is that two sentence generators are producing their sentences at random, you can distinguish between the null and plagiarism by looking for shared word sequences of length 9. However, that’s not how people actually generate sentences: there are many stereotyped patterns for constructing them, and we have sharp, specific memory for well-turned phrases, which we remember and repeat (poetry, anyone?). So Dowd’s explanation, as you portray it, could fit.
(We use the same technique in bioinformatics for comparing genomes, and similar issues arise in a number of places.)
–titus
Titus, I’m not a scientist or a mathematician, so I can’t speak to the theoretical validity of the 9-word fingerprint. I can say that (a) I’ve found it to work (with Google) on the universe of documents online, when using non-boilerplate text, and (b) in this case, applying the concept to Dowd also seems to have been valid, since she has now changed her story and admitted that she swiped the words from an email message (see my new post).
I can’t vouch for whether the math holds. I can say that most people with years of editorial experience will agree: 40 words of text (with one deliberate change introduced) do not get duplicated verbatim by accident.