Inspired by blog articles from Michael Dominic K. and Henri Bergius, in part one of this series I touched on the conflict between the order of databases and the freedom of files. The latter are fiercely hoarded and guarded by information workers who in many cases don’t really care about the object so much as the information it conveys. But what are files, and why does this issue of managing their chaos even exist?
The modern electronic file is a data or readable text object meant to store and/or convey useful information. It can constitute one or more records (I’ll even consider a graphic image as a record in this context). The beauty of a file is its usual independence from the confines of enterprise data stores. This is especially true of files based on open source and widely accepted commercial formats; they can often be opened by a multitude of viewers or renderers.
But that independence comes at a cost. The further a file object lives from a data management system, the more risk it can introduce and encounter. Version drift, inaccurate details and unapproved copying are just some of the risks. Files also consume storage space often disproportionate to the value they add, especially as they are wastefully replicated across myriad emails and hard drives. The cheapness of onsite storage has helped exacerbate this, removing old constraints that once encouraged conservation over ease of access.
A really rogue file can even be considered viral in some environments. A good example I once faced was with manufacturing engineers who insisted on building products to uncontrolled design prints. Failing to keep up with design changes, and marking their personal prints up independent of the larger picture, they were responsible for many expensive mistakes on the production lines.
Unfortunately that scenario is all too common. To reiterate a premise from Part 1: knowledge is power. To relinquish control over one’s files is to cede control over a personal data fiefdom. Checking files into a data management system is a part of that, subconsciously seen as a threat to individual influence and passive-aggressively fought by those fearful of themselves or their work being rendered unnecessary.
Nevertheless, master data management has been growing. Uncontrolled information is too costly for businesses to ignore. So let’s try to envision the trend of increased implementation continuing. What will that eventually mean for the freedom of files?
Maybe all it will involve is a redefinition of what a “file” is. Instead of autonomous blobs of binary or text, perhaps future file specifications will define them as portable database fragments, easily extracted from and re-inserted into data stores, albeit encoded with some form of authentication. Formats such as XML have been a good start toward enabling such a scenario. In fact the metadata (loosely, “data about data”) aspects of XML, as in XMI (XML Metadata Interchange), are exactly the sort of approach that has allowed data management systems to succeed in a sea of file formats; encapsulating any sort of file with strong metadata attributes makes it superior to its raw self and elevates its usefulness. This was intended to be taken further by Microsoft’s WinFS file management system, which did not make it into Windows Vista and appears to be in limbo. Interestingly, the requirement of WinFS that files have strong schemas actually increased their independence, in that they would be wrapped in metadata that described their nature: a much more valuable and portable identification mechanism than a simple file extension.
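To make the idea of a “portable database fragment” a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how WinFS or any real data management system works: the envelope layout, the field names (`schema`, `version`, `owner`), the functions `wrap_file` and `unwrap_file`, and the shared secret are all hypothetical, and JSON stands in for XML purely to keep the example short. The payload travels with metadata describing its nature, and a signature lets the receiving data store reject a fragment that was altered while it was out in the wild.

```python
import base64
import hashlib
import hmac
import json
from typing import Tuple

# Hypothetical shared secret for the sketch; a real system would use managed keys.
SECRET_KEY = b"demo-key-not-for-production"


def _sign(fields: dict) -> str:
    """Sign a canonical (sorted-key) JSON rendering of the envelope fields."""
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hmac.new(SECRET_KEY, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def wrap_file(payload: bytes, schema: str, version: int, owner: str) -> str:
    """Wrap raw file bytes in descriptive metadata plus a signature."""
    fields = {
        "schema": schema,      # what kind of record this is (replaces the file extension)
        "version": version,    # revision of the record in the data store
        "owner": owner,        # who checked it out
        "sha256": hashlib.sha256(payload).hexdigest(),
        "payload": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps({"fields": fields, "signature": _sign(fields)})


def unwrap_file(envelope: str) -> Tuple[dict, bytes]:
    """Verify the signature, then return (metadata, payload)."""
    doc = json.loads(envelope)
    fields = doc["fields"]
    if not hmac.compare_digest(doc["signature"], _sign(fields)):
        raise ValueError("signature mismatch: fragment was altered or is unauthorized")
    payload = base64.b64decode(fields["payload"])
    metadata = {key: value for key, value in fields.items() if key != "payload"}
    return metadata, payload
```

A fragment produced by `wrap_file(b"bracket drawing", "design-print", 3, "engineering")` can be mailed around like any other file, but a data store calling `unwrap_file` on it would refuse a copy whose version or contents had been edited by hand, which is the kind of control the uncontrolled design prints described earlier lacked.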
In this information utopia, a degree of file freedom is sacrificed for convenience. That may be driven to the point of necessity for a future highly mobile workforce. Streaming timely bits of highly controlled information to handheld devices rather than copying gigabytes of largely unneeded overhead makes a lot of sense in that context. The birth pangs of such a future can be seen in the popularity of SMS and Twitter… although controls are typically missing.
Author and journalist Cory Doctorow is understandably pessimistic about the prospects of such a scenario. He cites several obstacles to its fulfillment and it is difficult to argue with his reasoning. This series will continue by exploring the challenges further and seeing if Doctorow’s doubt is ironclad.
A little update: in 2001 Cory Doctorow wrote (link in article) that he could find several instances of items described as “Plam” instead of “Palm”.
I ran that search today (17 November 2009), and found 180 such dyslexic transpositions. Try your luck!
http://shop.ebay.com/?_from=R40&_trksid=p3907.m38.l1313&_nkw=plam&_sacat=See-All-Categories