I need help figuring out if 2 disk images contain the same content. Both are 558 GB, both have 281 files. Comparing the two, there are only 7 files where the checksums don't match. 3 of those 7 files are .txt files from the bagging process, which makes sense given the different metadata assigned.


The disk images are in Expert Witness Format. The other 4 files with different checksums are .EGA, .EGY, .E01, and .info. Not sure if that is a solid indicator that the disk images are dissimilar? These bags are tarred and on preservation storage, and I'm trying to do the least effort required to say 'yes, they're the same (except for metadata assigned during accessioning)' without pulling them both back. Any thoughts appreciated!
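For anyone wanting to reproduce the comparison, here is a rough sketch of how the two manifests could be diffed. It assumes each bag has a BagIt-style manifest with one `<checksum> <path>` pair per line and that the manifest text is available without retrieving the full payload. The function names `parse_manifest` and `diff_manifests` are made up for illustration. It only reports which paths differ. It can't say why they differ.

```python
def parse_manifest(text):
    """Parse manifest lines of the form '<checksum> <path>' into {path: checksum}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace so paths containing spaces survive.
        checksum, path = line.split(None, 1)
        entries[path] = checksum.lower()
    return entries


def diff_manifests(text_a, text_b):
    """Return (paths only in A, paths only in B, shared paths whose checksums differ)."""
    a = parse_manifest(text_a)
    b = parse_manifest(text_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_a, only_b, changed
```

To use it, read each bag's manifest file into a string and pass both to `diff_manifests`. The third list it returns should be the 7 mismatched files.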

If I sound ignorant about disk images it's because I am, but trying to learn.

@elizabeengland We talked about this on Twitter, but seeing it on Mastodon reminds me that you have what seems like a rare opportunity to evaluate how the same imaging process can produce different data. I hope you publish what you find, at least as a lightning talk or blog post, in the future.

@ashley @nkrabben thank you both! I have a lot more digging to do with it, but it's giving good material for my SAA panel this summer with @amelish & others on transparency & complexity in digital preservation

