(November 2002)

Feedback on 'All the Dead Data'

Bruce Gillespie (April 1999)
Leanne Frahm (April 1999)
Gerald Smith (April 1999)
Eric Lindsay (April 1999)
Karen Johnson (April 1999)
Lucy Schmeidler (April 1999)
John Newman (April 1999)
Bill Wright (April 1999)
Sue Grigg (April 1999)
Jeanne Mealy (April 1999)
Marc Ortlieb (June 1999)
Eric Lindsay (June 1999)
Lucy Schmeidler (August 1999)
Eric Lindsay (August 1999)
Cath Ortlieb (October 1999)
James Allen (December 1999)

BRUCE GILLESPIE writes (April 1999):

The most recent versions of WordPerfect itself do not have filters for WP 4.1 and 4.2. Word 6 does not import WordPerfect 4.2 files, but if you have a copy of Word 2 for Windows, it has filters for both WordPerfect 4.1 and 4.2. I keep Word 2 on the computer as well as all later versions of Word. Word 2 has a filter for WordStar 6 for DOS, but Word 6 doesn't. But add Word 2's WordStar filter to the Windows system, then copy it to Word 6's filters subdirectory. Suddenly you find you have a WordStar 6 filter available for Words 6 and 7. This fact possibly doesn't excite anybody but me.

The easiest way to convert the more 'primitive' files is to use the last pre-Corel version of Ventura (4.1). Bring in a file as WordStar, and save as WordPerfect 5.1 or Word 2 or any of the others that were popular at the time.

Apart from writing those two paragraphs, I can only say that I agree with everything you say. Already my files of all the books I desk-topped for Macmillan are useless. Their current production people couldn't care less that I conscientiously saved every file of every book I did for them. It's much cheaper for Macmillan, when doing new editions, to scan everything, then add corrections.

I reply (June 1999):

Re: The Windows Registry.

Here is a quotation from the illustrious "Scientific American" of September 1998:

"Some recent changes - such as the Windows 95/98/NT registry, designed to be read and edited in the original Martian by people who like walking over a 1,000-foot canyon on a tightrope with no safety-net - seem to me purely lethal. Here is a single database that most people don't know how to back up and whose corruption or loss wipes out all your customization and configuration settings and makes your computer forget it has any software installed. It would have been perfectly possible to design the registry to be forgiving and to track all changes, so that you could go back and undo the last change you made or the changes the program you just installed made that knocked out your network. Instead changes are made on the fly, and there's no way back."

Grossman, Wendy M. Opaque Transparency in "Scientific American" Vol.279. No.3. September 1998. p.42.

(An excellent article that explains how the attempt to make computers more user-friendly, by hiding the inner workings of the operating system behind a GUI, has paradoxically confused and disempowered the user.)

Thank you for the detailed instructions on how to convert WordPerfect 4.2 files to more contemporary word-processor formats. (All I'd need is a copy of some of the old software - and a machine to run it on.) I wasn't aware that Word 6 had a filters directory - the only clever copying I've done is of document templates for work when we upgraded to Office 97.

LEANNE FRAHM writes (April 1999):

Your articles on software and the data cage were fascinating to me, especially now, when I'm new to all this and finding it so interesting. I never expected to find computering to involve philosophical questions and debate, but it certainly does.

GERALD SMITH writes (April 1999):

Isn't technology wonderful! I remember when I moved up from my old 486 machine to the present Pentium that some of the software I had didn't work very well in the new environment. But that was nothing major - just a few games that I hardly used anymore anyway - certainly nothing compared with company records or the like.

It's amazing isn't it? Here we have one of mankind's greatest tools - one that enables us to perform complicated activities with relative ease. Then we become so good at improving it all so quickly that earlier produced data becomes worthless.

ERIC LINDSAY writes (April 1999):

I'm not surprised that magnetic tapes from the 1960s are sitting slowly rotting. The same thing happens to old films, video tapes and even books. With woodpulp paper, the life of a book is probably closer to 50 years now. Nor do libraries hold old books any longer, as they no longer have the storage space.

One of the reasons we discard the old is that the new is simply much, much better. In the case of magnetic storage, the increase in capacity has been 60% per year for the past 30 years! This at least means that the capacity to store stuff keeps increasing. At the same time costs halve every year. Sometime this year the cost of magnetic storage will drop below 1 cent US per megabyte. CD-R is already under a half cent per megabyte. Any competing technology has to either store more or be even cheaper. So we can see we can store the text of books for under a cent per book. Keeping copies around, once they get into an electronic form, should not be an economic problem (why does that remind me of the old phrase "too cheap to meter" from the days when atomic energy was going to power the world?)
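An illustrative aside, not part of Eric's letter: the arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch. The function names are made up for the illustration, and the half-megabyte size for the plain text of a book is an assumed round figure, not one given in the letter.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total multiplication factor after growing by rate_per_year, compounded annually."""
    return (1.0 + rate_per_year) ** years


def cost_after_halving(start_cost: float, years: int) -> float:
    """Cost after it has halved once per year for the given number of years."""
    return start_cost / (2 ** years)


def storage_cost_cents(megabytes: float, cents_per_megabyte: float) -> float:
    """Cost in cents of storing the given number of megabytes."""
    return megabytes * cents_per_megabyte


if __name__ == "__main__":
    # 60% per year for 30 years (figures from the letter).
    print("Capacity factor over 30 years:", round(compound_growth(0.60, 30)))

    # ASSUMPTION: the plain text of one book is roughly half a megabyte.
    assumed_book_megabytes = 0.5
    print("Cents per book at 1 cent/MB:",
          storage_cost_cents(assumed_book_megabytes, 1.0))

    # One further year of halving from 1 cent per megabyte.
    print("Cents per MB a year later:", cost_after_halving(1.0, 1))
```

Growth of 60% a year compounds to a factor of more than a million over 30 years, which is why a drive the size of a washing machine could be replaced so quickly by something far smaller.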

The first Winchester drive I bought at the University was 400 megabytes, was the size of a washing machine, and cost $12,000 in 1987. It was withdrawn from service in 1994, without ever having had an error, and all data on it was transferred to newer systems. The cost of keeping ancient (in computer terms) machines working exceeds their utility, but the cost to users of converting their data is rarely priced. In terms of time, it is considerable. I know that damn well, because I had to help people convert.

My own solution is to absolutely avoid all proprietary data formats, and to use only open source formats where the exact nature of the storage format is published. Proprietary formats are like drugs; you just have to say no, before you get hooked. What type of formats, I hear you ask. Stuff that is simple: ASCII, RTF. Or stuff that can very easily be converted from one format to another: HTML 2, Postscript, DVI, TeX. For data, CSV. More complex material, like spreadsheets, drawings and photos, is much harder. There are fewer non-proprietary formats for these, although if you can accept its lossy nature JPG works for photos.

Sound files are also an interesting case. One relatively obvious solution is to simply record as either 8 bit or 16 bit sound levels, at any frequency you like, but have a short ASCII header saying the number of bits, whether it is big endian or little endian, and the frequency it was recorded at. This isn't as efficient as some methods of sound recording, but it can be played back with really minimal hardware and software. There are several people in this apa who could readily work out how to handle that sort of file (now we wait and see if they stick up their hands with a method).
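An illustrative aside, not part of Eric's letter: one way such a file might look is a single line of ASCII giving the bit depth, byte order and sampling frequency, followed by the raw samples. The Python below is a minimal sketch of that idea. The header wording (`bits=`, `endian=`, `rate=`), the use of signed samples and the function names are all assumptions invented for the illustration; they are not an existing standard.

```python
import struct


def _struct_format(bits: int, endian: str, count: int) -> str:
    """Build the struct format string for `count` signed samples."""
    prefix = "<" if endian == "little" else ">"
    code = "b" if bits == 8 else "h"  # 8-bit or 16-bit signed integers
    return f"{prefix}{count}{code}"


def write_sound(path, samples, bits=16, endian="little", rate=44100):
    """Write a one-line ASCII header, then the raw samples."""
    if bits not in (8, 16) or endian not in ("little", "big"):
        raise ValueError("unsupported bits or endian value")
    header = f"bits={bits} endian={endian} rate={rate}\n".encode("ascii")
    with open(path, "wb") as f:
        f.write(header)
        f.write(struct.pack(_struct_format(bits, endian, len(samples)), *samples))


def read_sound(path):
    """Return (bits, endian, rate, samples) from a file made by write_sound."""
    with open(path, "rb") as f:
        header = f.readline().decode("ascii")
        body = f.read()
    fields = dict(item.split("=", 1) for item in header.split())
    bits, endian, rate = int(fields["bits"]), fields["endian"], int(fields["rate"])
    count = len(body) // (bits // 8)
    samples = list(struct.unpack(_struct_format(bits, endian, count), body))
    return bits, endian, rate, samples
```

Because the header is plain text, anyone opening the file in a text editor years later can see at a glance how to interpret the bytes that follow.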

The real problem is that users in general are more impressed by the ease and superior performance (and I'm not being cynical here) of proprietary solutions. I think many more people and organisations will lose their historic data before the merits of open source sink in.

I've long been pushing the idea that the web needs sensible, philosophical librarians more than it needs any technical advances (or more search engines).

KAREN JOHNSON writes (April 1999):

We've had at least one computer in the house for more than 10 years, which means that we've confronted the 'dead' data problem more than once as we've upgraded our software. A few things have been important enough to convert appropriately, but a lot of data has just been let die. The latest data conversion job was a huge one, and took me at least a week's work. While we were still using WordPerfect, Heather typed her entire recipe collection (about 500 recipes) onto the computer. When we changed to Word a couple of years ago, I converted them all, which involved going into Word, opening each file individually, reformatting the text and page layout, and then saving in Word 6 format. My other major data upgrade occurred when we started using Windows 95. It supports long file names, so I used Norton File manager to go through my entire file collection previewing each file (so I could be sure what it was) and then replacing the old 8-character, no spaces file name with something appropriately descriptive.

LUCY SCHMEIDLER writes (April 1999):

Re they (Microsoft Word) still having a WordStar filter: Why not? WordStar is still in business, with version 6 or higher; I use 3 by preference, though I have a copy of 5 with a spell checker, which my version 3 lacks; I run version 3 mss. through the spell checker and jot down its findings and then go back to 3 to do the fixes, because I don't like the later version (grumble, grumble).

Re data formats: I usually keep all my data in ascii, except for my WordStar files, for which I have 2 different WS-to-ascii programs that my son wrote me: one that runs on the PC and one that I use on Panix, compiled from his "C" source code. I don't see what's so terrible about saving my letters in ascii format, except possibly that, when I am dead and gone, someone may come across them and actually be able to read them all. And indeed, all my email letters are written in straight ascii format (using vi or emacs--find a Unix programmer to ask if you don't know what those are (Hint: they are not word processors)).
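An illustrative aside, not part of Lucy's letter and not her son's program: a WS-to-ascii converter can be very small. The Python sketch below assumes the commonly described behaviour of early WordStar document files, which set the top bit on some characters (such as the last letter of each word and "soft" line breaks) and embed formatting as control characters. It clears the top bit and drops control characters other than tab, line feed and carriage return. The function name is made up for the illustration, and dot-command lines are left untouched.

```python
def wordstar_to_ascii(data: bytes) -> str:
    """Convert WordStar-style bytes to plain ASCII text (a rough sketch)."""
    keep_controls = (0x09, 0x0A, 0x0D)  # tab, line feed, carriage return
    out = []
    for byte in data:
        byte &= 0x7F  # clear the top bit, leaving a 7-bit ASCII code
        is_printable = 0x20 <= byte < 0x7F
        if is_printable or byte in keep_controls:
            out.append(chr(byte))
        # Anything else (e.g. print-control codes) is silently dropped.
    return "".join(out)


if __name__ == "__main__":
    # "The end" with top bits set on the last letters and a stray control code.
    sample = bytes([0x54, 0x68, 0xE5, 0x20, 0x02, 0x65, 0x6E, 0xE4, 0x8D, 0x0A])
    print(repr(wordstar_to_ascii(sample)))
```

Real files may need more care than this, for example with end-of-file padding or with later WordStar versions, so treat it as a starting point only.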

JOHN NEWMAN writes (April 1999):

Regarding 'Data Cages', I guess it's all a reflection of the immaturity of our whole 'personal computer' infrastructure. Like it being dominated by one semiconductor manufacturer and one software supplier, and the tendency to run the same operating system software everywhere, no matter how inappropriate. This too shall pass, but it's a complicated business, which will reasonably enough take a while to get straight.

I guess computing is very much a bellwether when it comes to widespread dissemination of complex, intensive technology for the use of all sorts of people. Even the telephone, the older benchmark for technology diffusion, had it orders of magnitude easier. The functionality was less, the rate of change was less and the interface was minimal. We've got to the point now where some folk put in a phone so they can get on the Internet!

I think most folk like to assume computer software is simple, and we should certainly make it as simple as is reasonable to use, but it's not simple and never will be. The issues you raise stem from this, and from sociology of course!

The whole thing is a great big learning exercise for everyone, which is by no means fully started yet, let alone ending.

BILL WRIGHT writes (April 1999):

Your essay on dead data started two chains of thought in my mind. One strand speculated that the fact that most of the information stored in computers thus far has been lost is a Good Thing. Anything lost for good has to be rethought and built again if it is to be tackled at all. The world goes on. The past is very largely opaque to us in a detailed sense. And yet... wouldn't it be nice if, for once, the legislators were ahead of (instead of reacting to) technological evolution?

Starting with the legal doctrine of effective ownership rights accruing to the creators of data, Parliaments all over the world might compel computer manufacturers and software suppliers to guarantee interpretation facilities for all time. International covenants to that effect could be registered with the United Nations. Then the planet would drown in a sea of information. Those now alive would have access to all the data created by themselves and a significant proportion of the data created by anybody else, alive or dead.

Think of the consequences. With so much of the past to draw upon, what need then for original thought? Librarians would come into their own, at last. They would have by now if the powers that be had twigged to their usefulness. Perhaps they fear that librarians lust for control. Would it were so.

I reply (June 1999):

"...the fact that most of the information stored in computers thus far has been lost is a Good Thing. Anything lost for good has to be rethought and built again if it is to be tackled at all. The world goes on. The past is very largely opaque to us in a detailed sense."
(Interstellar Ramjet Scoop 187).

The winnowing of historical records is always occurring. The conflict is between the elimination of the trivial (the need for selective forgetting) and the retention of the significant. [The James Gleick essay mentioned in the references is relevant here.]

An irony lies in that we cannot always identify what is of lasting importance at the time when it is new.

Historians love primary resources because they give the unenameled facts without the distortion of reinterpretation.

But always humans try to simplify - to get at the pattern of their age.

One consolation for us is that the significant tends to get commented upon, reinterpreted in newspapers, summarised and explained in textbooks. It echoes in many places. Thus - even if the original message is lost - its meaning is retained.

SUE GRIGG writes (April 1999):

Interesting turnaround in your "Reference" notes, where you say you have a review in a library journal, that you cannot find at the moment. The journal is in an easy to understand format (text on page), but is not physically locatable. The data in old software format is easy to locate physically, but not accessible. Is either form therefore more useful than the other, given that neither gives more than the memory of what it contained?

I reply (June 1999):

Re: All The Dead Data

"Interesting turnaround in your "Reference" notes, where you say you have a review in a library journal, that you cannot find at the moment. The journal is in an easy to understand format (text on page), but is not physically locatable. The data in old software format is easy to locate physically, but not accessible. Is either form therefore more useful than the other, given that neither gives more than the memory of what it contained?"
(Megatheriums for Breakfast 19. ANZAPA 187)

(You could also consider my inability to find the James Watson quotation for "Is There Meaning In Dreams?" - either in my clippings or on the Internet.)

There are two issues here.

The first, which my essay addresses, is the inability to access old electronic records because of the superseding of old proprietary standards for datafiles, and of old storage media.

The second issue, which relates to the unlocatable magazine article, has to do with the problem of inadequate indexing and/or having stuff in the wrong location. (It is a management or organising problem - not a technology-related one.)

You might want to compare a typical public library with the Internet. The Internet has more information in it, but the websites lack an overarching organisational structure, which makes it much harder to find exactly what you are after. The library has fewer information resources, but they are properly indexed and organised, so it is much easier to find exactly what you want.

Which is the more valuable information resource?

JEANNE MEALY writes (April 1999):

Technological obsolescence: ah, so there is a downside to all this forward movement.

MARC ORTLIEB writes (June 1999):

Yep, all that dead data. I suspect that this has already hit the Timebinders mob. I know that the electronic versions of the Aussiecon two Tiggers are no longer available to me. They're on 5 1/2" floppies that used to run in my Microbee CP/M system. (I found it typical that, the moment I bought a cable to connect the Bee to my Mac, the Bee stopped working. I had transferred some material, but that was the hard way - 300 modem to a Mac I had rigged as a Bulletin Board at school, and then back, either on disk from school, or by 2400 modem to the Mac.)

The more I read your piece, the more I realise that we poor "moderns" have got it all wrong. The Egyptians and whoever created Stonehenge were right. If you have stuff you really want to keep, like a decent solar calendar or a picture of the results of your biotechnology experiment in crossing a lion with a human, then you store it as really big bits of stone. Either that, or you create a data storage system that makes its own backups, utilises ambient energy to run and which has built-in error checking. I've even worked out a name for it - Dynamic Naus Accumulator, or DNA for short. The moment I work out a couple of bugs in my Fiat Lux sub-routine, I intend to try it.

ERIC LINDSAY writes (June 1999):

On moving data from obsolete computers, most current models are sufficiently fast that they can be made to emulate earlier machines, and thus run a lot of their programs. You can get Apple II, Atari, Amiga, CP/M, Sinclair, Psion and various other emulators.

I reply (August 1999):

Great minds...etc.

LUCY SCHMEIDLER writes (August 1999):

Re your Dead Data addenda: emulators will allow you to run the equivalent of all the popular software of the past, but won't maintain my pure ascii files, for which I'll have to find a suitable environment to maintain them myself, most likely a Linux partition on my PC.

ERIC LINDSAY writes (August 1999):

My own attempt to keep my data viable long term keeps everything in ASCII, and the formatting in either HTML or in Postscript. However my Postscript is also human readable since it is hand coded, and very clean in terms of layout.

I agree with you absolutely that five out of five workers are frustrated with their computers. Don't blame them one bit either. I see the Windows Registry as one of the worst disasters to ever happen to the unsuspecting computing public. Why Microsoft couldn't have come up with some simple and sensible design for installing software is beyond me. The little Psion organiser I use (which is a true multitasking system) manages to detect new software if you feed it a compact flash memory, and install it on the system within seconds of putting the memory unit in it. And if you remove the memory, the software goes away from the main system, all without you doing any installing of anything. You can change drives whenever you want, and whatever is on the drive is installed automatically, and all without even closing down the system.

CATH ORTLIEB writes (October 1999):

Good point about relying on modern technology to record data - it changes so much our children probably won't have access to hardware/software that can read our 'disks'.

JAMES ALLEN writes (December 1999):

I have all sorts of emulators, but Amiga owners tend to. Just got a whole lot of Infocom text games. I used to use pc task to run an IBM version of the Hitch Hikers Guide game.

I reply (April 2000):

I haven't got any Amiga emulation programs! Probably because there is nothing in PC or Mac land which I've felt I just had to be able to run.

CrossDOS does pretty well for transferring stuff via floppy between work and home.

But now that the old pre-PowerPC Macintosh operating systems have become public domain, I am prepared to consider getting hold of a Mac emulator (they work very well in Amigas) - but I'll get a larger hard drive first.


