
All my web archivist colleagues - is there a tool/script floating out there that I can just feed a list/CSV of URLs and get back Wayback links (if any)?

Even better would be if I could just point it at a plain-text/source file and it pulled the URLs itself

something like the Broken Link Checker plugin I use on my WordPress blog - but, y'know, not in PHP...

github.com/wpmudev/broken-link

@The_BFOOL IA-checker.py might be useful here, but it's prone to breaking 😅 It doesn't give you the Wayback URL; that might be something the API includes, but I don't remember. github.com/jfcarrano/archives-

@joe oh thanks! the API can definitely return the captured snapshot URL, so I could probably figure out how to tweak that

archive.org/help/wayback_api.p
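
For anyone following along, here is a rough sketch of what asking the Availability API for a snapshot URL might look like. It assumes the endpoint is https://archive.org/wayback/available and that the response nests the link under archived_snapshots -> closest -> url, which is how the API help page describes it as far as I recall; the function names (availability_query, closest_snapshot, lookup) are made up for this example and are not taken from any of the linked scripts.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint of the Wayback Availability JSON API.
API = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the request URL for one page (hypothetical helper)."""
    return API + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the snapshot link out of a decoded response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timeout: float = 30.0) -> Optional[str]:
    """Query the API over the network and return the snapshot URL, if any."""
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Calling lookup("example.com") should give back either a web.archive.org link or None. With no timestamp parameter the API is documented to return the most recent capture, which is the "newest snapshot" case.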

@edsu @sam thank you!!! yes, either/both of these seem like they will probably do the trick, or get me close enough

Just, no matter how many tutorials I've done, I'm very bad at thinking through/starting a Python script from scratch, and much better at looking at an existing thing that's close and making it do exactly what I want

@The_BFOOL @sam happy to field questions - the CDX API is non-trivial and kind of finicky

@edsu @sam I don't think I need the CDX API, just the Availability JSON API - just want to pull the URL for the newest snapshot

this one (linked in that gist you sent) seems to do it cleanly. turns out cleanly parsing the URLs to give to it is the trickier bit, so I'm in the mode of calculating "will it take longer to figure out how to automate this or just feed 'em individually and copy-paste" (it's not that big a file...)

github.com/akamhy/waybackpy
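
On the "parsing the URLs is the trickier bit" point, a minimal sketch of pulling links out of a plain-text file might look like the following. The regular expression is a deliberately simple assumption, not a full URL grammar: it only catches http/https links, stops at whitespace, quotes, and brackets, and trims trailing punctuation, so URLs that legitimately contain parentheses will get cut short. extract_urls is a hypothetical name, not part of waybackpy or any other tool in this thread.

```python
import re

# Simplistic pattern (an assumption, not a complete URL grammar): match
# http/https links up to the next whitespace, quote, or bracket character.
URL_RE = re.compile(r"https?://[^\s<>\"'()\[\]]+")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_RE.findall(text):
        # Drop sentence punctuation that tends to cling to the end of a link.
        url = match.rstrip(".,;:!?")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = "See http://example.com/page, and (https://example.org/a?b=1)."
    for url in extract_urls(sample):
        print(url)
```

For a file that's not that big, reading it with open(path).read() and passing the text to extract_urls would probably be enough; each URL could then be handed to whichever Wayback lookup you settle on.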

@The_BFOOL About 6 years ago I wrote something in Python that does exactly this: github.com/bitsgalore/trouwRam Example input file: github.com/bitsgalore/trouwRam Just did a quick test and it still seems to work. You'll have to use Python 2.7.x though (or modify it for Python 3).
