I just exported all my contacts using:

Works great, but the UI is slightly confusing because by default it only reports contacts with a known Fediverse presence. To report ALL contacts, make sure to:

1. Select all followings, followers and list members
2. Click on the SMALL download link at the bottom (large one only returns contacts with a known Fediverse handle!)

Result: a CSV file with all Twitter handles, names, bios + matching Fediverse handle, if available.
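
Once the CSV is downloaded, it can be split into contacts that already have a Fediverse handle and those that do not. What follows is only a minimal sketch: the column names (`twitter_handle`, `name`, `bio`, `fediverse_handle`) and the sample rows are assumptions made up for illustration, so check the header of the real export and adjust them.

```python
import csv
import io

# Hypothetical column name; the real export's header may differ.
HANDLE_COLUMN = "fediverse_handle"


def split_contacts(rows, handle_column=HANDLE_COLUMN):
    """Split rows into (with_handle, without_handle).

    A row counts as "with handle" when the given column is non-empty
    after stripping whitespace.
    """
    with_handle, without_handle = [], []
    for row in rows:
        handle = (row.get(handle_column) or "").strip()
        if handle:
            with_handle.append(row)
        else:
            without_handle.append(row)
    return with_handle, without_handle


# Invented sample data standing in for the exported file.
SAMPLE = """twitter_handle,name,bio,fediverse_handle
alice,Alice,Archivist,@alice@example.social
bob,Bob,Librarian,
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
found, missing = split_contacts(reader)
print(len(found), "contacts with a Fediverse handle")
print(len(missing), "contacts without one")
```

For a real file, replace `io.StringIO(SAMPLE)` with `open("contacts.csv", newline="", encoding="utf-8")`, where the file name is again a placeholder.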

The end of twitter as a microcosm of the end of capitalism:

People banding together, helping each other out, trying to salvage what is good, shifting to something unfamiliar, discovering new norms and ways of life.

Revealing the revulsion and stupidity of kings & oligarchs. Former hierarchies tumbling down, some being resurrected, but without the same kind of hold.

Rest, repair, reconciliation.

python library 

trafilatura is a nice Python library that we use to extract article full text from HTML documents for indexing in scholar. It has good accuracy and recall, works with "old" HTML (e.g., from web archives), and pulls out metadata like title, author, and date. There are lots of similar tools, mostly focused on news articles, and trafilatura is an improvement on them.

Thanks to Adrien Barbaresi for maintaining it!

#freesoftware #python #webarchiving #digitallibrary

@katethornhill I’d say no. Preservation is a system of care, which can be implemented with or without a singular dedicated technology system.

e.g. what matters is that data is replicated and monitored. Whether or not all the data is in the same storage system or repository is secondary.

Having said that, having a single dedicated preservation system often makes it easier to know what’s going on.

Hey folks. Can you share with me why you think or do not think having a preservation repository system is a requirement for running a library and archive service? Asking for a friend.

Preview release! -- the OpenCheck people directory.

I'm only announcing this to Mastodonians for now (we need it more over here). A bunch more people need to add themselves or flip the visibility toggle, but it's up, complete with Mastodon .csv download per topic. Details at the top of the page.

This is super bare bones, topic extraction is a little random, and I don't even have profile bios, but it's a start! Feedback welcome!

Excited to come across the beta of a version of Internet Archive focused on academic journal articles. The version for books is fantastic, so am hopeful about this:

(but surprised that it apparently starts with 'eighteenth-century journals', rather than 17thC journals... #JournalDesScavans is available on #Gallica; #PhilosophicalTransactions is available from #RoyalSociety)

"What are the most effective ways to minimize latency issues in geographically distributed collections?" Latest question on Q&A .

Sanity check: I frequently use "bitstream", "logical" and "semantic" as in Thibodeau's 3 levels of a digital object to describe different categories of risks and respective mitigation strategies for a digital object's long-term availability/interpretability. I was now told that "logical and semantic preservation are not known / used in digital preservation".
Curious to hear from y'all if you know / use the concept.

I cleaned up the releases so there's only one tag per release.

Still more to improve with the distros...but this is good enough for now.

Tomorrow, back to working on the application.

File format dissection 

I uploaded a small revision of my JPEG image format dissection, along with a PDF version - and a minor bugfix.

For some reason i was a bit AWOL on . And while I did catch gems like @bitsgalore / Digital Dark Age Crew's I somehow totally missed this amazing musical contribution by U Calgary Library.
If you haven't seen it, check out the video below. I'll be singing
"media, i do believe i failed you. media, i know i've let you down" the rest of the night for sure.

On some folk were keen on the idea of a community-managed glossary of digital preservation terms.

If anyone wants to pitch in please make yourself known at

The Webrecorder gang released some new functionality that lets you embed #webarchives in your web pages, which includes a "Receipts" view that displays provenance information about the archive. I provided a little example on my (Jekyll) blawg:

I just discovered that #inkscape has a powerful set of command-line options. I will no longer use the graphical export page when doing web dev. This will save me so much time. Thank you, devs!

Example to convert all svg files recursively to png with twice the resolution:
find ./ -type f -name '*.svg' -exec sh -c 'inkscape -C "$1" -d 192 -o "${1%.svg}-2x.png"' _ {} \;
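
For anyone who would rather script this from Python, here is a rough sketch of the same idea. It reuses only the flags from the command above (`-C`, `-d`, `-o`); the function names and the dry-run switch are my own illustration, not part of Inkscape. By default it only prints the commands, so nothing is exported until `dry_run=False` is passed and `inkscape` is on the PATH.

```python
import subprocess
from pathlib import Path


def build_export_command(svg_path, dpi=192):
    """Return the inkscape argument list for one SVG file.

    The output name mirrors the shell version: foo.svg -> foo-2x.png,
    written next to the source file.
    """
    svg_path = Path(svg_path)
    png_path = svg_path.with_name(svg_path.stem + "-2x.png")
    return ["inkscape", "-C", str(svg_path), "-d", str(dpi), "-o", str(png_path)]


def export_all(root=".", dry_run=True):
    """Walk root recursively and export (or just print) one command per SVG."""
    commands = [
        build_export_command(path)
        for path in sorted(Path(root).rglob("*.svg"))
        if path.is_file()
    ]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # Passing a list avoids shell quoting problems with odd file names.
            subprocess.run(command, check=True)
    return commands


# Example: export_all("assets", dry_run=True) prints the commands only.
```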

The UNIX Pipe Card Game -- umm, this is the cutest?? I think I'd definitely incorporate this into learning if I were still teaching digipres students

Lots to digest and follow up on from the event my colleague Leontien Talboom and I ran yesterday. It was great that peers from a range of disciplines could join!
About the project:

There’s a brand new OpenRefine forum for the whole community. Whether you're a user, a developer or a potential user/developer, pop along and say hello!
