• In Digital Preservation Metadata for Practitioners
  • Publisher: Springer International Publishing
  • Pages: 59-82

Abstract

Twenty years after the pioneering experiments performed by the Internet Archive and a few national libraries, web archiving has become a common activity of many scientific, cultural, and heritage institutions. They use a set of tools, generally open source, to identify, harvest, store, index, make available to end users, and preserve internet content over the long term. Institutions seeking to preserve web archives nevertheless face major challenges: not only the huge amount of collected data, but also the lack of fully reliable metadata, which are crucial for understanding the web archives and informing future preservation actions on them. Web archives are generally stored in container formats, notably the ARC file format and its successor, the WARC format, which is an ISO standard. Context and provenance information, generated prior to or as part of the harvesting process, is stored in these container formats, but other metadata, especially information on the formats of the collected files, may be generated afterwards. To store and archive these assets in digital repositories, it is necessary to record and manage their metadata. Institutions therefore need to make data and metadata modeling choices, which should be consistent not only with the design of their own repository and the kind and amount of data they have to preserve, but also with their conceptual view of the nature of web archives. This paper presents the choices and achievements of the National Library of France, an approach called “container modeling”. It then compares this approach to those of other members of the International Internet Preservation Consortium and to the projects of the New York Art Resources Consortium. It underlines how the different solutions are implemented with PREMIS and concludes with the use of format identification tools and metadata vocabularies for emulation strategies.
