I need help figuring out if 2 disk images contain the same content. Both are 558 GB, both have 281 files. Comparing the two, there are only 7 files where the checksums don't match. 3 of those 7 files are .txt files from the bagging process, which makes sense given the different metadata assigned.


The disk images are Expert Witness Format. The other 4 files with different checksums are .EGA, .EGY, .E01, .info. Not sure if that is a solid indicator of dissimilarity between the disk images? These bags are tarred and on preservation storage, and I'm trying to do the least effort required to say 'yes they're the same (except for metadata assigned during accessioning)' without pulling them both back. Any thoughts appreciated!
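The checksum comparison described above could be sketched roughly like this, assuming each bag has a manifest whose lines are a checksum followed by whitespace and a file path. The function names and the sample manifest text are hypothetical, for illustration only. Note this only narrows down which files differ. It does not by itself show whether the underlying disk content matches.

```python
def parse_manifest(text):
    """Parse manifest text ("<checksum> <path>" per line) into {path: checksum}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so paths may contain spaces.
        checksum, path = line.split(None, 1)
        entries[path.strip()] = checksum.lower()
    return entries


def compare_manifests(text_a, text_b):
    """Report paths missing from either manifest and paths whose checksums differ."""
    a = parse_manifest(text_a)
    b = parse_manifest(text_b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "differing": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
    }


if __name__ == "__main__":
    # Hypothetical manifest contents; real ones would be read from each bag.
    bag_a = "aaa111  data/image.E01\nbbb222  data/image.E02\n"
    bag_b = "ccc333  data/image.E01\nbbb222  data/image.E02\n"
    print(compare_manifests(bag_a, bag_b))
```

If the manifests are stored outside the tarred bags, or can be extracted on their own, a comparison like this would not need the full 558 GB pulled back.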

If I sound ignorant about disk images it's because I am, but trying to learn.

@elizabeengland We talked about this on Twitter, but seeing it on Mastodon reminds me that you have what seems like a rare opportunity to evaluate how the same imaging process can produce different data. I hope you publish what you find, at least as a lightning talk or blog post in the future.

@ashley @nkrabben thank you both! I have a lot more digging to do with it, but it's giving good material for my SAA panel this summer with @amelish & others on transparency & complexity in digital preservation
