web archiving folks: do you/how do you document decisions such as collection scope rules? I'm struggling with the transparency/authority of such decisions, like: who decided to block this subdomain, when, and why? Where should that info live? cc @sabrams @amelish @landlibrarian @joe

@elizabeengland i think about this all the time (and would be down for, like, a call about this, just fyi @landlibrarian, @joe, @amelish). i record scoping decisions / redirects / etc. in the 'seed-level note section' that archive-it provides, but i don't know what to do about other decisions.

@sabrams thx! So are you currently not documenting decisions such as blocking an entire host? (If you couldn't tell, that's the one that is grinding my gears rn). Also @amelish I'm thinking about changing my SAA panel topic to something on web archiving transparency ¯\_(ツ)_/¯

@elizabeengland @amelish i document that in the seed notes section, too! though it hasn't happened that often yet. do you see a lot of that with institutional sites?

@elizabeengland i get stuck because every one of our collections is built by a different group of selectors with different priorities, biases, goals, etc., and, like, how do i document that? where does it live? we don't do finding aids, and it seems out of place in a marc record?

@elizabeengland @landlibrarian @amelish @sabrams If my last toot is any indication, I'm still learning about scoping rules that are in place here. Metadata about them is not there, mostly because our use of AIT is still nascent. So the ideas on this thread have been very helpful!

@joe @amelish @landlibrarian @elizabeengland another question: if you crawl a site using webrecorder and then upload the warcs to archive-it, does that merit some sort of scope note somewhere?

@sabrams @elizabeengland @landlibrarian @amelish I haven't used the upload feature before, but wouldn't the fact that it's from a webrecorder crawl be mentioned in the header when you view it in archive-it wayback? Or are you talking about documenting the decision to use webrecorder?

digipres.club

digipres.club is a space for folks interested in productive conversations about, well, digital preservation! If you enjoy talking about how to do memory work with computers, or even with cardboard boxes of old photos, you belong with us on digipres.club. Many of us are/were Twitter users looking for an inclusive and community supported approach to social media. If any of these things sound good to you, consider joining us now.