Synthetic lines from historical manuscripts: an experiment using GAN and style transfer

Conference: Visual Processing of Digital Manuscripts: Workflows, Pipelines, Best Practices. ICIAP 2023 Workshops. ICIAP 2023. (2023-09-11 - 2023-09-15)
Publisher: Springer Nature Switzerland
Pages: 477-488

Abstract

Given enough data of sufficient quality, handwritten text recognition (HTR) systems can achieve high accuracy, regardless of language, script or medium. Despite the growing pooling of datasets, the quantity of training material required remains a crucial question for the transfer of models to out-of-domain documents, or for the recognition of new scripts and under-resourced character classes. We propose a new data augmentation strategy using generative adversarial networks (GANs). Inspired by synthetic line generation for printed documents, our objective is to generate handwritten lines in order to mass-produce data for a given style or under-resourced character class. Our approach, based on a variant of ScrabbleGAN, proves feasible for various scripts: in the presence of a high number and variety of abbreviations (Latin) and of spellings or letter forms (Old French), in a situation of data scarcity (Armenian), and in the case of a very cursive script (Arabic Maghribi). We then study the impact of synthetic line generation on HTR by evaluating the gain for out-of-domain documents and under-resourced classes.

Disciplines

Digital humanities

Related works

Explore other works from the École on the same topics.

Digital humanities

Computational Museology in the Age of Experience

Video
- Sarah Kenderdine

Whose Pen Wrote the Map? Battling Over the Armenian Medieval Text Ashkharhatsuyts with Stylometry

Research publication
- Jean-Baptiste Camps,
  Chahan Vidal-Gorène

From questions to insights: a reproducible question-answering pipeline for historiographical corpus exploration

Research publication
- Lucas Terriel,
  Vincent Jolivet

A Riddle in a Haystack: LLM Detection of Intricate Wordplays in Colette and Willy's Novels for Authorship Attribution

Research publication
- Florian Cafiero,
  Marie Puren

Greening your database of literary works: How to avoid reinventing vocabularies, in favor of sustainable, reusable models

Research publication
- Kelly Christensen,
  Jean-Baptiste Camps

Évaluation automatique du retour à la source dans un contexte historique long et bruité : les débats parlementaires de la Troisième République française

Research publication
- Aurélien Pellet,
  Julien Perez,
  Marie Puren

Style in Eight Syllables: Metric Annotation and Stylometry of Chrétien de Troyes and Contemporaries

Research publication
- Jean-Baptiste Camps,
  Florian Cafiero,
  Philippe Chaumet-Riffaud,
  Damien Conceicao,
  Ulysse Godreau,
  Émilie Guidi,
  Théo Moins,
  Pierre-Alexandre Nistor,
  Benedetta Salvati,
  Alexandre Lionnet-Rollin

The times are a-changin': présent vs passé simple in French novels (1811-2024)

Research publication
- Simon Gabay,
  Jean Barré,
  Florian Cafiero