New blog post (apparently by popular demand) - how to preserve your personal archive, featuring @timhutton's Archive parser and @Luca's FediFinder tools. Feedback welcome!

"The longer you can delay exposure to , the more likely an exposure won’t result in infection due to improved vaccines or mitigation measures, and the more likely there will be a better treatment available if you do become infected"

Excellent piece by @chrischirp and @kityates (also pretty much sums up my own approach since the start of the pandemic)

Still figuring this space out, but a new thing on Sustainability of Digital Formats is our 2022-2023 workplan for new FDDs, as well as a recent publication log to document when new FDDs are out. All comments welcome

We have a blog post coming out next week about this and other digipres file format things we've been up to since June.

@bitsgalore and if you want a junk detector to offer some indication of when text extraction has failed, e.g. mojibake or missing Unicode mappings in PDF, look no farther than #ApacheTika’s tika-eval module.

Only today found out about the "isutf8" tool, which checks if one or more input files are valid UTF-8.

More info in this blog post from 2012:
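
For anyone without "isutf8" at hand, the same kind of check can be roughly approximated in a few lines of Python. This is only a sketch of the idea, not how the tool itself works: the function names `is_valid_utf8` and `first_invalid_offset` are made up for illustration, and the check relies on Python's strict UTF-8 decoder.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def first_invalid_offset(data: bytes):
    """Return the byte offset of the first invalid sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        return exc.start
    return None
```

To check a file, read it in binary mode first, e.g. `is_valid_utf8(open(path, "rb").read())`. A Latin-1 encoded "café" fails here because the lone 0xE9 byte is not a complete UTF-8 sequence.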

Helping out on fixing some weird issues in a file earlier today reminded me of the importance of articles like these:

Ten Simple Rules for Digital Data Storage:

Data Organization in Spreadsheets:

Oh, apparently the Open Preservation Foundation is working on a validator for Open Document Format Spreadsheets, looks intriguing 📎 👀

Took a while, but I finally uploaded both the video and a FLAC audio file of the Digital Dark Age Crew's "Wheel Out the Digital Dark Age Klaxon" to for posterity (and immediate download convenience) 📯

Natalia, the lead developer of the open-source project, has just published the latest progress report:

For anyone still celebrating World Digital Dark Age Day, please note that associated files /AF may appear anywhere in a PDF file, not just in annotations or in the EmbeddedFiles name tree!

🤮 🤮 🤮

New blog post:

Archive your Tweets with Tweetback

Tweetback is built with @eleventy and I do think #Eleventy plays a special role here. Eleventy is a production ready, stable site generator that now has very concrete public proof of many projects with ~50,000 page builds (and even one in there with >118,000 pages—hi @nhoizey)

The paper I presented at is now on Open Science Framework:

"Digital Preservation Pipeline for Data Storage Media At The Cinémathèque Suisse"

You can also find my beautiful slides illustrated with some of the nice images from the collections of the Swiss National Film Archive.

New paper, "Slide Decks as Government Publications: Exploring Two Decades of PowerPoint Files Archived from U.S. Government Websites" with both links to the final version and an OA preprint ->

The Digital Preservation team at Cambridge University Libraries has a blog! Here you can read about what team members are working on, issues encountered, events attended, etc. It can be found at digitalpreservation-blog.lib.c

The story of the development of Quattro Pro is amazing.
Early development involved Robert Stein, who was the guy who brought Tetris to the west.

I wrote a blog post about analyzing one instance of the many possible JHOVE PDF-HUL-38 scenarios. Happy about any feedback / comment / recommendation / correction:

I made a visualized dissection of a tiny MP4 file. It's not final, and maybe even the dissected file could be improved?
I welcome feedback.
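
For readers who want to poke at the top-level structure of an MP4 themselves, here is a rough Python sketch. It is not the method used for the dissection above, and the name `iter_boxes` is made up for illustration. It assumes the usual ISO base media box layout: a 4-byte big-endian size, a 4-byte type, a 64-bit size following when the size field is 1, and a box running to the end of the data when the size field is 0.

```python
import struct


def iter_boxes(data: bytes, offset: int = 0, end=None):
    """Yield (type, offset, size) for each box found between offset and end."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        # Standard header: 32-bit big-endian size, then a 4-character type
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data
            size = end - offset
        if size < header or offset + size > end:
            break  # truncated or malformed box; stop rather than guess
        yield box_type.decode("latin-1"), offset, size
        offset += size


# Tiny synthetic example (not a playable file): an 'ftyp' box and an empty 'free' box
sample = struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00" * 4
sample += struct.pack(">I4s", 8, b"free")
boxes = list(iter_boxes(sample))
```

Container boxes such as `moov` can be walked by calling `iter_boxes` again on the child range, starting just past the parent's header.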
