In the context of Web context: How to check out any Web page

One of the great fears about the Web as it becomes our primary source of news is the notion that it rips stories from their moorings and delivers them to us context-free. We’re adrift! In a flood of soundbites! Borne upon a river of bits! Or something like that.

I’ve never understood this argument. As I tried to suggest in my Defense of Links posts, the convention of the link, properly used, provides more valuable context than most printed texts have ever been able to offer.

But links aren’t the only bearers of digital context. Every piece of information you receive online emits a welter of useful signals that can help you appraise it.

The techniques described here first filled my quiver in the ’90s, when I worked as Salon’s technology editor. We’d receive story tips and ideas, some of them pretty far out, and we’d scratch our heads and think, “Can this be for real?” I began applying an informal set of tests and checks to try to prevent us from being manipulated, pranked, or turned into a conduit for bad information. This was our way of trying to take the “discipline of verification” at the heart of the journalism we’d always practiced and apply it to the new medium. We knew we’d never be perfect. But there were scammers, hoaxsters and nuts out there, and we were damn sure not going to be pushovers for them.

Though some of the details have changed in the intervening years, the basic principles for evaluating an unknown source remain relevant, I think.

What’s the top-level domain? Is the page in question on a spammy top-level domain like “.info”? That’s not always a bad sign, but it raises your alert level a bit.
Look the domain name up with whois. Is the registration info available or hidden? Again, lots of domain owners hide their info for privacy reasons. But sometimes the absence of a public contact at the domain level is a sign that people would rather you not look into what they’re doing.
How old or new is the registration? If the site suddenly appeared out of nowhere, that can be another indication of mischief afoot.
Look up the site in the Internet Archive. Did it used to be something else? How has it changed over the years? Did it once reveal information that it now hides?
Look at the source code. Is there anything unusual or suspicious that you can see when you “view source”? (If you’re not up to this, technically, ask a friend who is.)
Check out the ads. Do they seem to be the main purpose of the site? Do they relate to the content or not?
Does the site tell you who runs it — in an about page, or a footer, or anywhere else? Is someone taking responsibility for what’s being published? If so, obviously you can begin this whole investigation again with that person or company’s name, if you need to dig deeper.
Is there a feedback option? Email address, contact form, public comments — any kind of feedback loop suggests there’s someone responsible at home.
What shape are the comments in? If they’re full of spam it may mean that nobody’s home. If people are posting critical comments and no one ever replies, that could also mean that the site owner has gone AWOL. (The owner might also be shy or uninterested in tangling with people.)
Is the content original and unique? Grab a chunk of text (a sentence or so), put it in quotes, and plug it into Google to see whether there are multiple versions of the text you’re reading. If so, which appears to be the original? Keep in mind that the original author might or might not be responsible for these multiple versions.
Does the article make reference to many specific sources or just a few? And are the references linked? More sources, and more linked ones, are usually a good sign, unless the links appear to be assembled by script rather than by a human hand.
Links in are as important a clue as links out. If your hunt for links in turns up a ton of references from dubious sites, your article may be part of a Google-gaming effort. If you see lots of inbound links from sites that seem reputable to you, that’s a better sign.
Google the URL. Google the domain. Google the company name. Poke around if you have any doubts or questions. Then, of course, remember that every single question we’ve been applying here can be asked about every page Google points you to, as well.
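A few of the checks above (top-level domain, whois, Internet Archive, exact-phrase search) start with the same mechanical step: pulling the domain out of a URL and pointing a tool at it. Here is a minimal Python sketch of that step. The function names are my own illustration, and the Wayback Machine and Google URL patterns are assumptions about how those sites currently accept queries. Nothing here fetches anything; it only builds the lookups you would then run by hand.

```python
from urllib.parse import quote_plus, urlsplit


def hostname(url):
    """Return the lowercased host of a full URL, or "" if there is none.

    A URL without a scheme (for example "example.info/page") has no
    parseable host, so it comes back as an empty string.
    """
    return (urlsplit(url).hostname or "").lower()


def top_level_domain(url):
    """Return the last dot-separated label of the host, e.g. "info".

    This is a naive split: it does not special-case bare IP addresses.
    """
    host = hostname(url)
    return host.rsplit(".", 1)[-1] if "." in host else ""


def wayback_calendar_url(url):
    """Build a link to the Internet Archive's capture listing for the host.

    The "/web/*/" pattern is assumed from the Wayback Machine's public URLs.
    """
    return "https://web.archive.org/web/*/" + hostname(url)


def exact_phrase_search_url(snippet):
    """Build a Google search link for the snippet wrapped in quotes."""
    return "https://www.google.com/search?q=" + quote_plus('"' + snippet + '"')


def lookups(url, snippet):
    """Gather the manual lookups for one page into a dictionary."""
    return {
        "tld": top_level_domain(url),
        "whois_command": "whois " + hostname(url),  # run in a terminal
        "archive": wayback_calendar_url(url),
        "phrase_search": exact_phrase_search_url(snippet),
    }


if __name__ == "__main__":
    page = "http://news.example.info/story?id=1"  # hypothetical page
    for label, value in lookups(page, "a sentence from the page").items():
        print(label + ": " + value)
```

Reading the whois record, the archive snapshots, and the search results is still the human part of the job.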

Once you’ve done some or all of this work, it may be time to actually try to contact the author or site owner with your questions. If there’s no way to do so, that’s another bad sign. If there is, but they don’t answer, it might be a problem — or they might just be really swamped!

Software developers use the term “code smell” to describe the signals they catch from a chunk of program code that something might be off. What I’m trying to describe here is a rough equivalent for online journalism: Call it “Web smell.”

No one of these tests, typically, is conclusive in itself. But together they constitute a kind of sniff test for the quality of any given piece of Web-borne information.
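If you want to keep track of how the tests came out for a given page, one option is to record each result and read the tally as a whole. The Python sketch below is only an illustration of that bookkeeping. The `Check` class and `summarize` function are hypothetical names of mine, and the sketch deliberately produces lists rather than a score or a verdict, since no single test is conclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Check:
    name: str
    # True = raised a warning, False = looked fine, None = not checked yet
    warning: Optional[bool] = None


def summarize(checks):
    """Group check names by outcome so they can be weighed together."""
    return {
        "flagged": [c.name for c in checks if c.warning is True],
        "clear": [c.name for c in checks if c.warning is False],
        "unchecked": [c.name for c in checks if c.warning is None],
    }


if __name__ == "__main__":
    # Hypothetical results for one page
    results = [
        Check("registration is brand new", True),
        Check("site names who runs it", False),
        Check("inbound links from dubious sites"),
    ]
    for outcome, names in summarize(results).items():
        print(outcome + ": " + ", ".join(names))
```

The judgment about what a given mix of flagged and clear items means stays with the person doing the sniffing.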

There are probably many more tests that I’m not remembering — or that I never knew in the first place. If you know of some, do post them in the comments.

BONUS LINK: Craig Kanalley’s “How to verify a tweet” assembles a similar set of tests for tweets.

FOLLOWUP: Craig Silverman’s “How To Lose Your Gut” (at Columbia Journalism Review) has some more tips.
