Wordyard

Hand-forged posts since 2002

Machine-readable data and human-memorable stories

September 12, 2006 by Scott Rosenberg

I spent much of the last few years immersing myself in the lore and culture of computer programmers, but not until today did I encounter the Lojban phenomenon. Lojban is an invented language (in the tradition of Esperanto, which I actually studied for a couple of months in seventh grade, thanks to Mr. Glidden). One Lojban enthusiast was profiled on the front page of today’s Wall Street Journal; the article was mostly about a German programmer who has led a campaign against software patents in Europe. But it mentioned in passing his interest in Lojban, “an artificial language…intended to eliminate ambiguity and promoted by some programmers.”

Eliminate ambiguity? No wonder programmers are leading the bandwagon.

The Journal’s shorthand description may not do full justice to Lojban, which turns out, according to Wikipedia, to be an evolution out of Loglan, a “logical language” intended to “test the Sapir-Whorf Hypothesis” (the idea that the structure and nature of language shapes human thought). There is much more here at Lojban.org and also here — including the idea that Lojban is structured to be more machine-readable (i.e., intelligible by computers) than naturally occurring human languages, making it well suited for “human-computer interaction and artificial intelligence research.”

I got to thinking about Lojban and the desire to smooth out all the fuzziness and overlap of our naturally evolving languages while reading Adrian Holovaty’s fascinating recent posting about the future of newspapers. Holovaty is a pioneering figure at the crossroads of the newspaper and technology industries; he started out working for newspapers in Lawrence, Kansas, where he and a group of Python developers created the content-management framework now known as Django; now he’s with the Washington Post.

Holovaty’s post argues that the “story” paradigm of newspaper journalism is a straitjacket the profession needs to shed if it expects to make full use of computers in its future. Stories are just “big blobs of text”; they’re not structured in ways that allow their data to be reused creatively. Newspapers are producing vast volumes of information each day, but because they don’t store the information in ways that allow it to be computer-readable in meaningful ways, they are failing to take real advantage of what technology can do with it all.

I think Holovaty is basically right, particularly when he points in the direction of information like weather data, sports scores, crime stats and the like — “news” that is essentially metric, information that arrives from day to day in a relatively predictable format and ought to be stored in ways that let you compare it and reuse it. And he’s smart enough to understand that the structured-data model he is advocating wouldn’t and shouldn’t replace real old-fashioned stories: “News articles are great for telling stories…The two forms of information dissemination can coexist and complement each other.” Amen.

But I’d also like to pause and reflect for a moment on the enduring value of the “story” as a tool for human memory compression.

Unstructured information, Holovaty complains, is information with a short shelf-life: “The information gets distilled into a big blob of text — a newspaper story — that has no chance of being repurposed.” That’s not quite true: It has no chance of being repurposed by machine. But the process whereby a writer distills a volume of data and detail into a coherent narrative that sticks in the memory, if done with lively care and skill, is one that very much promotes “repurposing” by other people. The story sticks in the mind. You repeat it to your friend at work or your spouse over dinner. They get interested and repeat it. In exceptional cases the story becomes a part of the collective memory.

The kind of “repurposing” that machines do with structured data isn’t often going to result in that kind of experience. It’s closer to the stuff that has always been looked down on in newsrooms as “service journalism” or “news you can use.” That condescension is regrettable, but it’s in part inspired by journalists’ awareness that this sort of work really can be done pretty well by machine.

The “repurposing” of structured information that Holovaty describes — say, the ability of someone looking at a Little League schedule to call up the weather forecast for that day and location — is highly useful. So far, as he points out, the newspaper industry has failed to offer such services, or even see them as part of its mission. And so Yahoo and similar online “portal” businesses have moved into the vacuum and turned them into businesses that newspapers now eye jealously.
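For the programmers in the audience, here is a rough sketch of what that Little League example might look like once the schedule and the forecast are both stored as structured records rather than as text. Everything in it is my own invention for illustration: the `Game` and `Forecast` types, the field names, the `forecast_for` helper and the sample data are all hypothetical and come from no real newspaper system or from Holovaty's post.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Game:
    """One row of a (hypothetical) Little League schedule."""
    day: date
    location: str
    home: str
    away: str


@dataclass(frozen=True)
class Forecast:
    """One row of a (hypothetical) weather forecast feed."""
    day: date
    location: str
    summary: str
    high_f: int


def forecast_for(game: Game, forecasts: Iterable[Forecast]) -> Optional[Forecast]:
    """Return the forecast matching a game's day and location, or None."""
    # Index the forecasts by the two fields both records share.
    index = {(f.day, f.location): f for f in forecasts}
    return index.get((game.day, game.location))


# Invented sample data, for illustration only.
games = [
    Game(date(2006, 9, 16), "Lawrence", "Tigers", "Cubs"),
    Game(date(2006, 9, 17), "Topeka", "Hawks", "Bears"),
]
forecasts = [
    Forecast(date(2006, 9, 16), "Lawrence", "Sunny", 78),
]

for game in games:
    match = forecast_for(game, forecasts)
    weather = f"{match.summary}, high {match.high_f}F" if match else "no forecast on file"
    print(f"{game.day} {game.home} vs. {game.away} at {game.location}: {weather}")
```

The lookup is only possible because both records carry a date and a location as separate fields. Bury the same facts in the middle of a paragraph and a machine has nothing to match on.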

But Holovaty’s post suggests a way that newspapers — and, really, any journalistic enterprise — can get back into the game. If newsrooms begin to build up storehouses of structured data, someone’s going to need to look through them for patterns and insights. Why are state corporate tax returns dropping in a booming economy? If twice as many restaurants opened this year as last year, why were there only half as many health citations? Are sunspots governing the fortunes of the local high school football team? (OK, so there’s also room for fun and nonsense.) This is the kind of work newsrooms remain uniquely well-situated to perform.

In other words, there’s still plenty of room for the old-fashioned journalistic roles of fact-finder, truth-teller, story-creator. The quest to make more information more useful to machines isn’t an end in itself; it’s a way-station along the way to telling new kinds of stories — lovingly mined out of machine-organized data and then composed in “big blobs of text” for human consumption.

I’ll be glad to read those blobs in a language that still leaves plenty of room for double meanings and for poetry. Lojban looks fascinating, but I’ll keep my ambiguity, thank you. Wordplay and nuance and music don’t fit easily into a database schema — but they’re how we encode data so it sticks with us long-term. They delight us, and that delight carves new pathways in our brains. Story is repurposed into memory. It’s our ancestral algorithm. Computers don’t really get it. But who says we have to change for them?

Filed Under: Media, Software