Historical documents and automatic handwriting recognition
Conference series
Speaker(s)
Recording date: 23 June 2022
Abstract
Sharing HTR datasets with standardized metadata: the HTR-United initiative
By Alix Chagué and Thibault Clérice.
Since some scholars adopted Ocropy in the mid-2010s, the production of HTR and OCR ground truth has grown impressively and steadily. However, few projects share their gold datasets, and those that are shared are scattered across many different hosting options (GitHub, Zenodo, GitLab, institutional repositories, etc.), which makes them very hard to find. Even when they are "discovered", their descriptions often lack details that are crucial for reuse. The HTR-United initiative is an answer to this problem: with a standardized metadata schema, a curated catalogue, and tools that help dataset owners through every step, owners can now easily publish their datasets and make them findable.
Applications, editions and datasets
Production