• Journal: Studi francesi (206)
  • Pages: 377-383

Abstract

Automatic recognition of handwritten text is a fundamental step in digital humanities for creating corpora and analyzing texts. For documents with regular scripts and standard layouts, recognition accuracy above 95% is typically achievable. However, these success rates decline drastically for damaged documents, where damage often causes significant content loss or leaves portions of the page difficult to decipher. Paleographers and philologists can often infer potential readings based on their expertise and knowledge of textual traditions. In automatic document recognition, however, information loss severely impairs both content detection and readability. This study focuses on analyzing the layout of a burned document, specifically the manuscript L.II.14 from Turin. Experimental results using generative neural networks demonstrate a 69-percentage-point improvement in identifying content areas compared to traditional handwritten text recognition (HTR) models. This enhanced pipeline not only advances the computational processing of damaged historical documents but also opens new avenues for document reconstruction.

Disciplines

    Digital humanities