After a bit of a delay I’m going to wrap up this series tonight. In the last segment, I finished up describing the struggle between file independence and data management, ending with the usefulness of metadata and touching on Cory Doctorow’s dissension on that topic.
I’d rather that readers take some time to read Doctorow’s interesting thesis than repeat too much of its content here, but suffice it to say he pokes seven disturbing holes in what he terms “meta-utopia” and, as I said previously, it isn’t easy to disregard his points. The most troublesome to me concerns the problem of crafting agnostic, universal schemas, which tend to be two-dimensional and thus create difficulties for diverse parties trying to arrive at design consensus. Doctorow correctly points out that data sets and needs tend to be richer than that, extending easily into three dimensions (my extrapolation) in order to properly define any object. Just as databases are better than flat files, data cubes are better still, and there has been work along those lines with XML.
Doctorow ends on a somewhat promising note by admitting that, flaws and all, metadata is still useful. It’s just that those choosing to rely on it as part of any information management or publishing solution need to be aware of the pitfalls going in (his article should be part of the solution design requirements in my opinion).
Which brings us full circle back to cloud computing. Moving to massive, remote data blobs that dispense contextual tweets or SMS messages requires a well-ordered and managed ecosystem. Whereas independent files bring risks such as computer viruses, metadata-driven cloudy solutions exacerbate the risks of crap information. Users need the ability to introduce tags into the source, for their own purposes, but along with that comes spam. In addition, idioms, slang and buzzwords just add further to the noise. So does poor language translation, although comedians might say that the humor value in the latter serves their purpose.
Bottom line: any degree of freedom can introduce noise even into a managed system, so this risk needs to be understood from the outset. Drilling deep into anything containing garbage can get ugly quickly.
Nevertheless, even worse is the current de facto situation of heavily siloed data farms. It’s easy to point to enterprise information management systems as a solution for the typical multiple-silo environment, and that extends to clouds as well. But given the hurdles to universal schema construction, it can be said that these projects simply create bigger silos, i.e., great walled gardens defined by their unique schemas. Denizens of those clouds will be richly rewarded with a wealth of content but then find it isn’t readily translated to other cloud formations.
A while back I described how the wild internet facilitates the emergence of microsocieties; an ordered internet tends instead to cultivate macrosocieties around coalesced clouds. What this will mean for future socioeconomics is a provocative subject.
And I haven’t even gotten into augmented reality, a subject recently addressed by CNN and Wired. Maybe later.
Of course, along with the advent of cloud societies come individual concerns. Who owns my data? Facebook and Google have diverging opinions, although I cynically suspect Google’s stance has much more to do with its own self-interest than with consumer advocacy. And even if you do own your data, can you kill it as you like? Currently, web archival engines say no.
In the end, the prospect of cloud data management and publication brings many advantages to an increasingly mobile workforce (along with the critical caveats mentioned). Synchronizing content between various appliances was a good interim solution to the “my stuff anywhere” problem, but that no longer suffices for users who have grown to expect a higher degree of service and device independence. No more cables, no more cords. Let it rain.