
dear PDF-crowd - what's your favorite tool to identify dead references (to objects) and orphaned objects?


@mickylindlar

By "dead references", you mean references to objects that don't exist?

Most parsers should complain early and often about this kind of thing?!

Although in practice, at least with poppler and friends, we've found that specific tools really do focus only on what they're intended for. For example, pdfimages may not care if fonts are missing, whereas pdftotext does.

@mickylindlar Starting to dig ending every sentence about how things _should_ be with PDF with a '?!'!

@wtfpdf

Yup, that's exactly what i meant by dead references. I also suspect that those are spotted early on because searching objs for their references is common, but orphans are a blind spot, somehow.
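
To make the two terms concrete, here is a rough Python sketch of the distinction: a dead reference is an `N G R` pointing at an object that is never defined, and an orphan is a defined object that cannot be reached from the trailer. The function name `find_dead_and_orphaned` and the regex-based scan are assumptions for illustration only, not how any of the tools mentioned in this thread work. The sketch only handles classic uncompressed files with a `trailer` keyword; it ignores object streams, cross-reference streams and objects superseded by incremental updates, and it can be fooled by binary stream data that happens to look like a reference.

```python
import re

# Naive patterns: "N G obj ... endobj" is an indirect object, "N G R" a reference.
OBJ_BODY = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+R\b")
TRAILER = re.compile(rb"trailer(.*?)startxref", re.DOTALL)


def refs_in(chunk):
    """Return the set of (object number, generation) pairs referenced in chunk."""
    return {(int(n), int(g)) for n, g in REF.findall(chunk)}


def find_dead_and_orphaned(pdf_bytes):
    """Hypothetical helper: return (dead_references, orphaned_objects)."""
    # Map each defined object to the references found inside its body.
    graph = {(int(n), int(g)): refs_in(body)
             for n, g, body in OBJ_BODY.findall(pdf_bytes)}
    defined = set(graph)

    # References in the trailer(s), e.g. /Root and /Info, are the starting points.
    roots = set()
    for trailer in TRAILER.findall(pdf_bytes):
        roots |= refs_in(trailer)

    # Walk the reference graph from the trailer to find every reachable object.
    reachable, stack = set(), list(roots)
    while stack:
        key = stack.pop()
        if key in reachable or key not in graph:
            continue
        reachable.add(key)
        stack.extend(graph[key])

    all_refs = roots.union(*graph.values())
    dead = sorted(all_refs - defined)        # referenced but never defined
    orphaned = sorted(defined - reachable)   # defined but unreachable
    return dead, orphaned


sample = b"""%PDF-1.4
1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [3 0 R] /Count 1 >> endobj
3 0 obj << /Type /Page /Parent 2 0 R /Contents 9 0 R >> endobj
4 0 obj << /Note (left behind) /Next 5 0 R >> endobj
5 0 obj << /Note (only referenced by an orphan) >> endobj
trailer << /Root 1 0 R /Size 6 >>
startxref
0
%%EOF
"""

dead, orphaned = find_dead_and_orphaned(sample)
print(dead)      # [(9, 0)]
print(orphaned)  # [(4, 0), (5, 0)]
```

Walking from the trailer, rather than just checking whether an object number appears anywhere, means object 5 is still reported as orphaned even though another orphan refers to it.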

@wtfpdf @DidierStevens

Thanks! Will have to deep-dive into the Arlington model checker. In @DidierStevens pdf-parser i so far only found the possibility to check for a (known and suspected to be orphaned) obj via the regular search function.
it seems that while many readers ignore orphaned objects, many validators don't ;-P the question behind it being: "if no one cares that it's there, but not really, should validators care?"
Ah, the philosophic mysticism that is

@mickylindlar

From an information disclosure perspective validators MUST care IMHO.

Same goes for incremental updates, no?

@wtfpdf

i think both are very important aspects and should be reported by validators/techMD extractors.

trying to wrap my head better around how orphaned objects are treated by different tools.

from a cultural heritage perspective, it is just one (of the unfortunately many) neglected points when it comes to questions like authenticity / content to be archived.
