‘All The News That’s Not Fit To Archive’

We relational database people are well-organized, methodical. We like analysis and business rules, strong notions of identity, the use of sets and non-significant keys, normalized designs and value-based links, precise versioning and time-stamps, and careful promotion of systems into production, with secure fall-back procedures. All that is tech-talk, but it means something in the real world. (One of the first articles I had published, back in 1980, in Datamation, was titled ‘The Importance of Good Relations’, which showed the link between solid database design and flexible business practices.)

Yet the Web has changed all this. When I first developed my website, under Microsoft’s FrontPage, there was some semblance of a test environment and a production environment. I would develop the site on my computer, and when I was ready, and had made sure all the links were defined and pointed to real pages, I would upload the whole kit and caboodle to the host site, where the new system would replace the old, giving me the option of importing all pages that had changed (but admittedly with no easy fall-back to the previous version). No more. I now use something called WordPress, which I invoke on a remote server. It allows me to compose and save drafts of individual pages, but it is otherwise tightly integrated with the production system. If I promote a new page, it goes live immediately, and if I change it again ten seconds later, the page is immediately replaced, with the previous one lost forever. (Unless it found its way to some entity called the Wayback Machine, which is described in a fascinating article by Jill Lepore in the New Yorker of January 26, 2015, titled ‘The Cobweb: Can the Internet be archived?’)

I mention all this in connection with my last plaint from the January blog, about the New York Times and its practice of making changes to the electronic versions of articles after they have been published in the printed version (or even after the late printed version, since that happens, too: we in North Carolina get an earlier edition than the people up in New York, for example). This concerns me primarily as a matter of research integrity, since there is no longer a ‘paper of record’ on which historians can rely. I made this point in an email to the Public Editor, whose office eventually acknowledged my inquiry, promised to look into it, but then withdrew in silence. So, after a couple of weeks, I checked out the paper’s Statement of Standards and Ethics, and wrote to the Vice-President of Corporate Communications. The essence of my message ran as follows:

“For there is a vital question to be answered: ‘What is the paper of record?’ Your slogan on the first page of the printed edition is still ‘All The News That’s Fit To Print’, but apparently some of that news is Not Fit To Archive. What happens when historians attempt to use the paper for research purposes? Do they have to keep separate clippings files, since the electronic version is unreliable, and has been purified in some way for later consumption? Is there an active policy under way here that should affect your Ethics statement? How are decisions made to ‘improve’ the content of articles that have already appeared in the printed edition? Why are these not considered ‘Corrections’ that would normally be posted in the relevant section? How often does this happen?”

I received a prompt response, but it was all very dismissive and casual:

“The change you noticed was simply the result of normal editing, which takes place constantly for news stories, both between print editions and for successive online versions. In this case, additional information (including crowd estimates) was added to the story between the early print edition and the final print edition, which meant something had to be cut for the story to fit in the same space. In most cases, the final print version is the one that remains permanently on nytimes.com, though in some cases a story continues to be updated or revised online even after the final print edition.”

So I countered as follows:

“But I must state that I think that you (and I am not sure who ‘you’ are in this case) are being far too casual about this policy, simply treating the process as ‘normal editing’. Is there an audit trail? Do you keep all versions? What changes are allowed to be made after the final print version? Why cannot the on-line version (which has no size constraints) include all the text? Is there any period of limitation after which no further amendments can be made? How do you plan to explain this policy to readers, whose ‘trust’ you say you value so much?

“I am sure you must be aware of the current debate that is being carried on in the world of academic research, where annotations to URLs in serious articles often turn out to be dead links instead of reliable sources. A Times ‘page’ no longer has a unique and durable identity, which I believe is an important issue.

“I look forward to some deeper explanation of this policy in the newspaper.”

Well, maybe I should get out more. As Sylvia would suggest to me: “You clearly need something better to do.” But I maintain that it is an important problem, and not just one of journalistic integrity: getting the story right the first time, and not amending quotations after the fact because the speaker wanted to withdraw them (which, we are told, goes on). It has more to do with what is known as ‘content drift’ and ‘reference rot’. As Jill Lepore’s article states: “. . . a 2013 survey of law- and policy-related publications found that, at the end of six years, nearly fifty per cent of the URLs cited in those publications no longer worked. According to a 2014 study conducted at Harvard Law School, ‘more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs within United States Supreme Court opinions, do not link to the original cited information.’” A more subtle problem is that the links may work, but the content may have changed: it may have been edited, corrected, improved, revised, or sanitised. For researchers like me, this can be very annoying, as books these days frequently cite URLs rather than printed sources in their references. When those pages no longer exist, one feels cheated; when they do exist, one may wonder whether they have been modified. The academic process has been debased. If one has text from the New York Times that is no longer in the archive, does it still exist? Is it still valid? Do I really have to maintain my clippings files, as opposed to an index of URLs? (To make her point, the Times Vice-President had to send me a scan of the two printed versions of the page in question.)

We shall see. I haven’t received a follow-up to my second inquiry yet. Either the Times doesn’t believe it is an issue, or the managers there are having a big debate about the topic, which they don’t currently wish to share. I’ll provide an update if I do hear anything.

The normal set of Commonplace Updates this month. (February 28, 2015)


Filed under Media, Personal, Technology
