ClueWeb22 is released! It’s a dataset of 10 billion web pages in WARC format, with an accompanying paper.

Read more: idf.social/@ArxivIR/1094302830

Project page: lemurproject.org/clueweb22.php

The video of Michele C. Weigle's #NLMHistTalk "What’s in a Web Archive Collection? Summarization and Discovery of Archived Webpages" is now available at https://videocast.nih.gov/watch=44481

Dr. Weigle is a faculty member leading the Web Science and Digital Libraries Group (WS-DL) at Old Dominion University's Computer Science Department. If you are curious about her work or the rest of WS-DL's work, feel free to DM me.

#WebArchiving #Summarization #Discovery #NLP #MachineLearning #ML #DigitalLibraries

TODAY! Michele C. Weigle, PhD, of Old Dominion University, will give the final #NLMHistTalk of the year, "What’s in a Web Archive Collection? Summarization and Discovery of Archived Webpages," today at 2:00 p.m. ET. Watch live: loom.ly/tT-T_ak

Ref: https://twitter.com/nlm_nih/status/1593258365584826369?s=61&t=N04klA_aeP_5FLCwRxhDPA

#WebArchiving #Summarization #Discovery #NLP #MachineLearning #ML #Collections
