_Environmental Artists Datasets._ _This dataset consists of a set of disk images, a backup of a current workstation made with Time Machine, and an email archive exported from Gmail. The dataset contains a wide variety of unknown document types, and most of the disk images are believed to be HFS or HFS+._


_The dataset consists of a disk image of an old iMac (200 GB) and a disk image of a Mac Mini (80.38 GB)_. These two datasets were created by the environmental artists and their students from the late 1990s through 2010 and have not been analyzed.

_The datasets are the property of Stanford University Libraries. All literary rights reside with the document creators. Any copying or replication of the dataset is prohibited without the express permission of Stanford University Libraries and the document creators._

Hash files of the datasets are available here as a zip file \- [^]
*NOTE -* Because time was short, the percentage-of-duplication column was created in Excel rather than incorporated into the script. With more time and effort, this calculation could be scripted.
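
As a rough illustration of how the duplication percentage might be scripted, the sketch below counts repeated hashes in a manifest. It is not the project's script. It assumes, hypothetically, that the hash files use an md5sum-style layout (one `<digest>  <path>` pair per line) and that "percentage of duplication" means the share of files that are extra copies of content already seen. The function names and sample data are invented for the example.

```python
from collections import Counter


def parse_manifest(lines):
    """Yield (digest, path) pairs from lines shaped like '<digest>  <path>'.

    The layout is an assumption; the real hash files may differ.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        digest, _, path = line.partition(" ")
        yield digest.lower(), path.strip()


def duplicate_percentage(digests):
    """Percentage of files that repeat content already present in the list.

    Each extra copy beyond the first counts as one duplicate.
    """
    digests = list(digests)
    if not digests:
        return 0.0
    counts = Counter(digests)
    redundant = sum(n - 1 for n in counts.values())
    return 100.0 * redundant / len(digests)


# Made-up sample manifest: three copies of one file plus one unique file.
sample = [
    "aaa111  Documents/report.doc",
    "bbb222  Documents/photo.jpg",
    "aaa111  Backup/report.doc",
    "aaa111  Desktop/report copy.doc",
]
pct = duplicate_percentage(d for d, _ in parse_manifest(sample))
print(f"{pct:.1f}% of files are duplicates")
```

If the intended definition counts every member of a duplicated group (including the first copy), the `redundant` line would need to change accordingly.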

* _it would be useful to generate a report that outlines the similarities between all of the related datasets such as:_
** _percentage of files that are duplicates_
** _graph of overlap between various disk images_
** _user profile graph: which users were working with particular groups of content over time_
* it would be useful to provide the collection creator with the ability to redact documents and construct a view of their born digital archive that can be presented online
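
The overlap between disk images mentioned in the list above could be estimated from the hash files alone. The following is a minimal sketch under stated assumptions: each image has already been reduced to a set of content hashes, and overlap is measured both as a count of shared hashes and as a Jaccard ratio (shared divided by combined distinct hashes). The image names, hash values, and the `overlap_table` helper are hypothetical.

```python
from itertools import combinations


def overlap_table(images):
    """Return {(name_a, name_b): (shared, jaccard)} for every pair of images.

    `images` maps an image name to the set of content hashes found on it.
    """
    table = {}
    for a, b in combinations(sorted(images), 2):
        shared = len(images[a] & images[b])
        union = len(images[a] | images[b])
        table[(a, b)] = (shared, shared / union if union else 0.0)
    return table


# Made-up hash sets standing in for real manifests.
images = {
    "imac": {"h1", "h2", "h3", "h4"},
    "macmini": {"h3", "h4", "h5"},
    "timemachine": {"h1", "h5"},
}
overlaps = overlap_table(images)
for (a, b), (shared, jaccard) in overlaps.items():
    print(f"{a} / {b}: {shared} shared hashes, Jaccard {jaccard:.2f}")
```

The resulting pairs could feed a graph or heat map; the choice of plotting tool is left open here.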

