Leeds image duplicates and versions

Duplicate image files, a.k.a. find the master, or the external drive of horrors
Description A 150 GB folder of image files of unknown origin. Some are duplicates, some are derivatives, and many are 3rd or 4th generation copies.
20,000+ files and where does one begin?
We need to determine which files we should save and which files we can delete.
We need to clean this drive out and migrate content to the digital library file store and stop the madness!
Licensing Non-exclusive use for undertaking preservation testing. Clear use with Leeds prior to copying the dataset. The majority of the content is OK, but it includes some private data.
Owner University of Leeds
Dataset Location Find Jodie; she has it on an external drive. Enjoy.
Collection Champion Jodie Double
Issues brainstorm
  • duplicate files
  • versions (copy of a copy of a copy)
  • which copy is the master file?
  • is the file worth saving, or should we reshoot it?
  • is the file what it says it is (JPG is a TIFF or TIFF is a JPG or JPG is really a PNG)
  • folder structure has multiplied complexity of finding content
  • what is the colour space? has it been tampered with?
  • is there any EXIF data?
  • have the images been cropped?
  • has the image been corrected?
  • is this the original scan?
  • how many times has the JPG been opened?
  • relationships
  • dependencies
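
Two of the issues above, exact duplicate files and files whose extension does not match their real format, lend themselves to a simple first pass. The following is a minimal sketch only, not a method used on this dataset: it assumes the drive is mounted as an ordinary folder, and the names survey, sha256_of and sniff_format are hypothetical helpers invented for illustration. It groups byte-identical files by SHA-256 checksum and compares each extension against the file's leading bytes for JPEG, PNG and TIFF.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Leading bytes ("magic numbers") for the formats named in the list above.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"II*\x00": "tiff",  # little-endian TIFF
    b"MM\x00*": "tiff",  # big-endian TIFF
}

# File extensions mapped to the format they claim to be.
EXTENSIONS = {
    ".jpg": "jpeg", ".jpeg": "jpeg",
    ".png": "png",
    ".tif": "tiff", ".tiff": "tiff",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sniff_format(path):
    """Return 'jpeg', 'png' or 'tiff' from the file header, else None."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def survey(root):
    """Walk root; return (duplicate groups, extension/format mismatches)."""
    by_hash = defaultdict(list)
    mismatched = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        by_hash[sha256_of(path)].append(path)
        claimed = EXTENSIONS.get(path.suffix.lower())
        actual = sniff_format(path)
        # Only report when both sides are known and they disagree.
        if claimed and actual and claimed != actual:
            mismatched.append((path, claimed, actual))
    duplicates = {h: paths for h, paths in by_hash.items() if len(paths) > 1}
    return duplicates, mismatched
```

A checksum only catches byte-for-byte copies. Derivatives, re-saved JPGs and cropped or corrected versions will hash differently, so the other questions in the list (which copy is the master, how many generations deep a file is) are not answered by this sketch and would need image-level comparison.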
List of Issues A list of links to detailed Issue pages relevant to this Dataset