Acting on a WIM for SIS
Monica Bay describes a discussion that's too rarefied, i.e., pitched too much to "insiders," as "too Inside Baseball." This is going to be one of those, but if we can't get high-level about e-discovery in a blog devoted to e-discovery, where, pray tell, do we get in touch with our inner geek?
I was reading an article about SIS, which, for those like me who have only brothers, means Single Instance Storage. It's a de-duplication mechanism that smartly replaces multiple identical copies of a file with pointers to a single stored instance. It's implemented to save backup storage space, and it's a great idea. Assuming you can hash-match a file and be certain it's identical, why not store just one instance of data replicated hundreds or thousands of times across an enterprise? If it's truly identical, one instance is all you need. My thought was that SIS implementations might be a smarter way to optimize e-discovery than all the costly, cumbersome de-duplication we do after collection and restoration. Plus, it's cost-effective and green.
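To make the idea concrete, here's a minimal Python sketch of single instance storage, assuming (as above) that a matching SHA-256 hash means identical content. The class and method names are hypothetical, invented for illustration; real SIS implementations live in the file system or backup engine, and some verify a hash match byte-for-byte before trusting it.

```python
import hashlib


class SingleInstanceStore:
    """Toy single instance store (hypothetical, for illustration only).

    Each unique file body is kept once, keyed by its SHA-256 digest;
    every logical path holds only a pointer (the digest) to that body.
    """

    def __init__(self):
        self.blobs = {}     # digest -> file contents, stored exactly once
        self.pointers = {}  # logical path -> digest of its contents

    def put(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        # Keep the contents only the first time this digest appears;
        # later identical files just add another pointer.
        self.blobs.setdefault(digest, data)
        self.pointers[path] = digest
        return digest

    def get(self, path):
        # Follow the pointer back to the single stored instance.
        return self.blobs[self.pointers[path]]

    def logical_bytes(self):
        # Bytes the custodians think they have (every copy counted).
        return sum(len(self.blobs[d]) for d in self.pointers.values())

    def physical_bytes(self):
        # Bytes actually stored (each unique body counted once).
        return sum(len(b) for b in self.blobs.values())


# Usage: a thousand custodians holding the same memo, plus one unique note.
store = SingleInstanceStore()
memo = b"Quarterly forecast attached. Please do not forward."
for i in range(1000):
    store.put(f"user{i}/Documents/memo.txt", memo)
store.put("user0/Documents/notes.txt", b"A different file")
print(len(store.pointers), "paths;", len(store.blobs), "stored instances")
print(store.logical_bytes(), "logical bytes;", store.physical_bytes(), "physical bytes")
```

A thousand paths point at one stored copy of the memo, so physical storage grows with the number of unique files, not the number of copies. That's the same property that would spare us de-duplicating after collection.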
Green? Sure! Single instance storage means less storage volume. Less storage volume means fewer primary drives and less cooling. Fewer primary drives mean fewer backup drives and still less cooling. Recent estimates put data center power consumption at 1-2% of all U.S. electricity consumption and, lest we forget that being green is also good business, the annual average utility cost of a 100,000-sq-ft data center tops $5.5 million. Of course, all of us over here in EDD-land would add the savings from reduced processing costs and simpler review that would flow from implementing SIS on the input side.
But, though it's a subject meriting considerable enthusiastic discussion, this isn't a post about SIS. It's about WIM.
While I was boning up on SIS, I stumbled across WIM, the Windows Imaging Format. WIM is a file-based format for disk volume imaging; think of it as a more efficient way to "ghost" a drive than using Ghost. WIM implements SIS, so it's very efficient indeed. I started to wonder whether anyone was using WIM for e-discovery collection. The structure is open for developers to emulate, and there's even a white paper detailing the file format (though it's not for the casual reader).
One of the uber-cool things about WIM is that, once a machine is imaged in WIM, the image can be mounted to a folder in Windows and browsed like any other drive. Plus, it can be mounted read-only. That opens up so many possibilities for great desktop e-discovery tools in small- and mid-volume cases (where vendors and budgets have left everyone pretty much to fend for themselves).
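Because a mounted image is just a folder to Windows, ordinary scripting can work against it. Here's a minimal Python sketch of the kind of desktop tool I have in mind: it walks a mounted image, records each file's path, size, and SHA-256 hash, flags duplicate content, and writes a CSV manifest. It assumes the image has already been mounted read-only (for example, with DISM or ImageX); the function names, mount folder, and manifest name are hypothetical, and the sketch makes no attempt to handle locked or unreadable files, alternate data streams, or deleted items.

```python
import csv
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MB chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:  # read-only access
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def inventory(mount_dir):
    """Walk a mounted image; return sorted (relative_path, size, sha256) rows."""
    rows = []
    for root, _dirs, files in os.walk(mount_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, mount_dir)
            rows.append((rel, os.path.getsize(full), hash_file(full)))
    rows.sort()
    return rows


def duplicate_groups(rows):
    """Map each hash seen more than once to the paths that share it."""
    by_hash = {}
    for rel, _size, digest in rows:
        by_hash.setdefault(digest, []).append(rel)
    return {d: paths for d, paths in by_hash.items() if len(paths) > 1}


def write_manifest(rows, out_path):
    """Write the inventory as a CSV manifest (outside the mounted image)."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "size", "sha256"])
        writer.writerows(rows)
```

Usage might look like `rows = inventory(r"C:\mount\custodian01")` followed by `write_manifest(rows, "custodian01_manifest.csv")`. The manifest is written somewhere other than the mount folder because a read-only mount won't accept it, and that's exactly the protection we want for the evidence.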
So my "Inside Baseball" request for discussion is this: Is anyone aware of folks using WIM in an e-discovery setting? Does anyone see any problems with it? (Hint: I question its treatment of the Recycle Bin's contents.) Does any reader care enough to get this Inside Baseball?