Poster session at #WAC. I learn that there are two countries in Europe *without* a legal deposit of the Web. I'll let you guess their names.

🇳🇱 🇨🇭

#answers #WebLegalDeposit #WAC

Poster session at #WAC: I loved the "(most|least) cool club", a club of people who manage Web crawlers and share (unfortunately with Jira and Confluence) regexps of things to exclude from the crawl (loop traps and things like that).
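
[To make the idea concrete, a minimal Python sketch of such an exclusion list. The two patterns and the name is_excluded are my own illustrations, not the club's actual regexps: one catches a path segment repeated three or more times in a row (a typical loop trap), the other a session identifier in the query string.]

```python
import re

# Hypothetical exclusion patterns, for illustration only.
EXCLUDE_PATTERNS = [
    # The same path segment three or more times in a row: /a/a/a
    re.compile(r"(/[^/?#]+)\1{2,}(?=[/?#]|$)"),
    # A session identifier in the query string
    re.compile(r"[?&](?:PHPSESSID|sessionid)=[^&#]+", re.IGNORECASE),
]


def is_excluded(url: str) -> bool:
    """Return True if any exclusion pattern matches the URL."""
    return any(p.search(url) for p in EXCLUDE_PATTERNS)


print(is_excluded("https://example.org/a/a/a/page"))
print(is_excluded("https://example.org/a/b/c"))
```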

Now, panel at #WAC "Archiving Social Media In An Age of APIcalypse" (Twitter closed its API).

[Wondering how to archive the fediverse.]

@f_moncomble

Facebook closed its API long ago, pretending it was because of Cambridge Analytica (but the real reason was commercial), shutting down many research projects on social media.

#WAC #APIcalypse

Several speakers in the panel do not follow the title: they talk about what they did *before*, when API use was possible.
Anat Ben-David, on the contrary, explains what could be done in the future. APIs were not so good, after all (for instance, you are never sure of what they hide).

#WAC

Jérôme Thièvre (INA) is the first to mention Bluesky (which apparently has a working API but little content).

#WAC

TikTok has an API, but using it requires that every research paper based on it be pre-approved by TikTok!

#WAC

In the end, back to traditional Web scraping, and data donations.
(Medialab SciencesPo currently crawls and scrapes #Doctissimo.)

#WAC

Now, back to identifiers, at #WAC "The Potentials and Challenges for Researchers and Web Archives Using the Persistent Web IDentifier (PWID)" (The speakers even have T-shirts branded "PWID".)

(The English-language Wikipedia page on PWID is not the expected one.)

An example of PWID:
urn:pwid:archive.org:2016-01-22T10:08:23Z:page:https://www.dr.dk

(from the Internet-Draft)
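
[A minimal Python sketch of how the example above can be taken apart. The field names (archive, time, coverage, uri) are my reading of this single example, and the regexp only accepts a full UTC timestamp of the form shown; the real syntax may allow more.]

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class Pwid(NamedTuple):
    archive: str    # the Web archive holding the copy
    time: datetime  # when the resource was archived
    coverage: str   # what the identifier covers, e.g. "page"
    uri: str        # the archived URI

# The timestamp contains colons, so a plain split(":") would not work.
PWID_RE = re.compile(
    r"^urn:pwid:"
    r"(?P<archive>[^:]+):"
    r"(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z):"
    r"(?P<coverage>[^:]+):"
    r"(?P<uri>.+)$"
)


def parse_pwid(text: str) -> Pwid:
    """Split a PWID URN into its four parts (assumed layout, see above)."""
    m = PWID_RE.match(text)
    if m is None:
        raise ValueError(f"not a PWID in the assumed form: {text!r}")
    when = datetime.strptime(m["time"], "%Y-%m-%dT%H:%M:%SZ")
    return Pwid(m["archive"], when.replace(tzinfo=timezone.utc),
                m["coverage"], m["uri"])


print(parse_pwid(
    "urn:pwid:archive.org:2016-01-22T10:08:23Z:page:https://www.dr.dk"))
```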

Argh, Mastodon butchers the PWID :-(

The Internet-Draft is expired and there is no plan to revive it. The only specification for #PWID seems to be https://www.iana.org/assignments/urn-formal/pwid

#WAC

Now, "bit preservation" (preserving bits for the long term, not taking semantics into account).
* several copies (and no SPOF)
* check them
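
[The two points above, as a minimal Python sketch. It assumes the copies are small enough to hold in memory and that the majority digest is the good one; the names digest and find_damaged are mine. A real system would stream files from separate storage locations.]

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of one copy."""
    return hashlib.sha256(data).hexdigest()


def find_damaged(copies: dict[str, bytes]) -> list[str]:
    """Return the names of the copies whose digest differs from the majority.

    Assumption: the most common digest is the correct one. With a tie,
    the result is arbitrary, which is one reason to keep three or more copies.
    """
    digests = {name: digest(data) for name, data in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority)


copies = {"site-a": b"WARC/1.1", "site-b": b"WARC/1.1", "site-c": b"WARC/1.0"}
print(find_damaged(copies))
```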

#WAC

"Decentralized Web Archiving and Replay via InterPlanetary Archival Record Object (IPARO)" is an attractive title.

#WAC

If I understand correctly, the goal is to put Internet Archive on IPFS (as WARC files).

But IPNS (IPFS naming system) has limits. Hence the new type, the IPARO, a list of IPFS names pointing to the various versions of an archived Web page.

Plus links going directly to some points in the list (for resiliency, since IPFS does not guarantee persistence).
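
[A toy Python sketch of why the extra links help, not the actual IPARO format. The dict store stands in for IPFS, and the choice of linking 1, 2, 4, 8... versions back is my assumption, only there to illustrate the principle: if one record is lost, older versions stay reachable.]

```python
import hashlib
import json

store: dict[str, bytes] = {}  # toy stand-in for IPFS: identifier -> content


def put(record: dict) -> str:
    """Store a record under the hash of its content, return the identifier."""
    data = json.dumps(record, sort_keys=True).encode()
    cid = hashlib.sha256(data).hexdigest()
    store[cid] = data
    return cid


def add_version(history: list[str], timestamp: str, body: str) -> str:
    """Add a version linking to the versions 1, 2, 4, 8... steps back."""
    links = []
    step = 1
    while step <= len(history):
        links.append(history[-step])
        step *= 2
    cid = put({"timestamp": timestamp, "body": body, "links": links})
    history.append(cid)
    return cid


def reachable(cid: str) -> set[str]:
    """Timestamps of every version reachable from cid, skipping lost records."""
    seen: set[str] = set()
    found: set[str] = set()
    todo = [cid]
    while todo:
        current = todo.pop()
        if current in seen or current not in store:
            continue
        seen.add(current)
        record = json.loads(store[current])
        found.add(record["timestamp"])
        todo.extend(record["links"])
    return found


history: list[str] = []
for i in range(5):
    add_version(history, f"t{i}", f"body {i}")
del store[history[3]]  # simulate the loss of one record
print(sorted(reachable(history[4])))
```

[With a single link to the previous version only, losing the record for t3 would cut off everything older.]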

#WAC

ReproZip-Web, a tool to capture (in the mathematical sense) everything that is needed to reproduce a very dynamic Web site, with dependencies. (Something that ordinary crawlers cannot get.)

https://github.com/reprozip-news-apps/reprozip-web

"You need a Linux operating system and it can be difficult to get access to such a system."

#WAC


Example done with a Web site depending on PHP 5 :-)

#WAC

Now, sorry, but I have to leave #WAC for another event. Bye
https://cis.cnrs.fr/materialites-du-numerique/#sem-matnum
@bortzmeyer thank you for the live toot!
@Sphinx_Pouet Until recently, there was not even an API.
@Sphinx_Pouet Valérie Schafer reported during the Q&A that it is quite common in business studies: many companies require that before giving access to their archives.
@Sphinx_Pouet Also, Benjamin Ooghe-Tabanou reported that we are not sure of what TikTok checks in the papers, but probably everything that may be bad for their reputation.
@bortzmeyer Thanks for the additional details 🙇
@Sphinx_Pouet Of course, it is more complicated here. Unlike corporations' archives, the content was not created by TikTok.