Here's an interesting discussion about why it's difficult to archive Facebook, at least with current web archiving tech.
https://forum.webrecorder.net/t/archiving-facebook/37
FB's user interface is driven by HTTP POSTs to facebook.com/api/graphql/, so the usual web archiving crawlers (which typically discover URLs and GET them) won't work. Archiving bots or tools like Webrecorder that load the page and interact with the DOM have more luck recording it.
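Here's a rough sketch of why a "discover URL, GET it" model falls over. It assumes a hypothetical archive index keyed on method plus URL. Every GraphQL call goes to the same URL, so only the POST body tells the responses apart. The payloads below are made up for illustration, and none of this is Facebook's or any archiving tool's actual code.

```python
import hashlib

GRAPHQL_URL = "https://www.facebook.com/api/graphql/"

def url_only_key(method: str, url: str, body: bytes) -> str:
    # Hypothetical index key that ignores the request body,
    # which is effectively what a GET-oriented crawler relies on.
    return f"{method} {url}"

def body_aware_key(method: str, url: str, body: bytes) -> str:
    # Hypothetical key that folds a hash of the POST body into the lookup key,
    # so different queries sent to the same URL stay distinct.
    digest = hashlib.sha256(body).hexdigest()[:16]
    return f"{method} {url} body={digest}"

# Two made-up GraphQL requests: same URL, different payloads.
requests = [
    ("POST", GRAPHQL_URL, b"doc_id=111&variables=%7B%22cursor%22%3A%22a%22%7D"),
    ("POST", GRAPHQL_URL, b"doc_id=222&variables=%7B%22cursor%22%3A%22b%22%7D"),
]

url_keys = {url_only_key(*r) for r in requests}
body_keys = {body_aware_key(*r) for r in requests}

print(len(url_keys))   # 1: both responses collide under a single key
print(len(body_keys))  # 2: each query gets its own entry
```

A crawler that only sees the URL has nothing that distinguishes these two requests. It also has no way to produce the bodies in the first place, because the page's JavaScript generates them.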
@edsu I’m not 100% sure TBH - lemme know what you find! 🙂
@anj it looks like fuzzy matching was set up for HTTP GETs. Here are a bunch of rules that get compiled in:
https://github.com/webrecorder/pywb/blob/c7373ba785c1a8173ef2126a46ffaf5725690b10/pywb/rules.yaml
Maybe I'm missing it or looking in the wrong place, but I don't actually see one for FB's GraphQL.
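As a rough illustration of the general idea of rule-based fuzzy matching for GETs, here's a sketch that ignores "noisy" query parameters during replay lookup. The rule format, the example.com URL, and the `fuzzy_key`/`lookup` helpers are all hypothetical. They are not pywb's code or the rules.yaml format.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical rule: under this URL prefix, only these query parameters
# identify the resource; others (cache busters, session tokens) are ignored.
RULES = [
    {"url_prefix": "https://example.com/api/feed", "keep_params": ["page", "user"]},
]

def fuzzy_key(url: str) -> str:
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    for rule in RULES:
        if base.startswith(rule["url_prefix"]):
            # Keep only the significant params, in a stable (sorted) order.
            params = sorted(
                (k, v) for k, v in parse_qsl(parts.query) if k in rule["keep_params"]
            )
            return base + "?" + "&".join(f"{k}={v}" for k, v in params)
    return url  # no matching rule: fall back to the exact URL

def lookup(archive: dict, url: str):
    # Try an exact match first, then compare fuzzy keys.
    if url in archive:
        return archive[url]
    wanted = fuzzy_key(url)
    for archived_url, record in archive.items():
        if fuzzy_key(archived_url) == wanted:
            return record
    return None

archive = {"https://example.com/api/feed?user=42&page=2&_=1690000000": "feed page 2"}
print(lookup(archive, "https://example.com/api/feed?page=2&user=42&_=1699999999"))
print(lookup(archive, "https://example.com/api/feed?page=3&user=42&_=1699999999"))
```

The first lookup finds "feed page 2" despite the different timestamp and parameter order. The second prints None because `page` is significant. A rule like this only looks at the URL, so it has nothing to work with for FB's GraphQL POSTs, where every request shares one URL.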
@anj nice, thanks Andy! Is the fuzzy matching kind of like a Levenshtein distance type of thing? I guess I could take a peek in the code...