dear PDF-crowd - what's your favorite tool to identify dead references (to objects) and orphaned objects?
By "dead references", you mean references to objects that don't exist?
Most parsers should complain early and often about this kind of thing?!
Although in practice, at least with poppler and friends, we've found that specific tools really do focus only on what they're intended for. For example, pdfimages may not care if fonts are missing, whereas pdftotext does.
@mickylindlar Starting to dig the habit of ending every sentence about how PDF _should_ be with a '?!'!
Yup, that's exactly what i meant by dead references. I also suspect those are spotted early on, since searching objects for their references is common, but orphans are a blind spot, somehow.
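For anyone who wants to poke at the distinction by hand, here is a rough Python sketch (not taken from any of the tools mentioned in this thread). It assumes an uncompressed PDF and simply regex-scans the raw bytes for `N G obj` definitions and `N G R` references; the function name `scan_references` and the sample bytes are made up for illustration. It will miss anything inside object streams or compressed streams, and it will flag objects that are legitimately found by other means (e.g. a cross-reference stream) as orphans, so treat the output as a hint, not a verdict.

```python
import re

# "N G obj" introduces an indirect object, "N G R" is a reference to one.
# \b keeps content-stream operators such as "RG" from matching as "R".
OBJ_DEF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")
OBJ_REF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+R\b")


def scan_references(data: bytes):
    """Naive scan: return (dead_references, orphan_candidates) as sets of
    (object number, generation number) tuples."""
    defined = {(int(n), int(g)) for n, g in OBJ_DEF.findall(data)}
    referenced = {(int(n), int(g)) for n, g in OBJ_REF.findall(data)}
    dead = referenced - defined      # referenced but never defined
    orphaned = defined - referenced  # defined but never referenced
    return dead, orphaned


# Hypothetical, hand-written fragment (not a complete, valid PDF):
# object 3 is referenced but missing, object 4 exists but nothing points to it.
sample = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"4 0 obj\n<< /Note (nobody points here) >>\nendobj\n"
    b"trailer\n<< /Root 1 0 R >>\n"
)

dead, orphaned = scan_references(sample)
print("dead references:", sorted(dead))
print("orphan candidates:", sorted(orphaned))
```

A more careful check would walk the object graph from the trailer and report everything unreachable, since an object that is only referenced by another orphan is still effectively orphaned.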
Thanks! Will have to deep-dive into the Arlington model checker. In @DidierStevens' pdf-parser i've so far only found the possibility to check for a (known and suspected to be orphaned) obj via the regular search function.
it seems that while many readers ignore orphaned objects, many validators don't ;-P the question behind it being: "if no one cares that it's there, but not really, should validators care?"
Ah, the philosophic mysticism that is #wtfPDF
From an information disclosure perspective validators MUST care IMHO.
Same goes for incremental updates, no?
i think both are very important aspects and should be reported by validators/techMD extractors.
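As a very rough way to spot incremental updates, one can count end-of-file markers: each incremental update appends a new section that ends in its own `%%EOF`. The Python sketch below is only a heuristic under that assumption; the function name `eof_offsets` and the sample bytes are hypothetical, and linearized files may also carry more than one `%%EOF` without having been updated.

```python
def eof_offsets(data: bytes) -> list:
    """Return the byte offset of every %%EOF marker in the file."""
    offsets = []
    pos = data.find(b"%%EOF")
    while pos != -1:
        offsets.append(pos)
        pos = data.find(b"%%EOF", pos + 1)
    return offsets


# Hypothetical stand-in for a file with one appended revision.
doc = b"%PDF-1.7\n...original body...\n%%EOF\n...appended update...\n%%EOF\n"

markers = eof_offsets(doc)
print(f"{len(markers)} %%EOF marker(s) at offsets {markers}")
if len(markers) > 1:
    print("possible incremental update(s); earlier revisions may be recoverable")
```

Truncating the file just after an earlier `%%EOF` is one way to look at what a previous revision contained, which is exactly the information-disclosure angle raised above.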
trying to wrap my head better around how orphaned objects are treated by different tools.
from a cultural heritage perspective, it is just one (of the unfortunately many) neglected points when it comes to questions like authenticity / content to be archived.
@mickylindlar As you know, so many parsers ignore orphaned objects... I'm following this with interest!
Arlington model checker? https://github.com/pdf-association/arlington-pdf-model/tree/master/TestGrammar
#qpdf?
Maybe @DidierStevens ? https://blog.didierstevens.com/programs/pdf-tools/