Tables to LaTeX: structure and content extraction from scientific tables

dc.contributor.author Kayal, Pratik
dc.contributor.author Anand, Mrinal
dc.contributor.author Desai, Harsh
dc.contributor.author Singh, Mayank
dc.coverage.spatial United Kingdom
dc.date.accessioned 2022-11-03T05:41:12Z
dc.date.available 2022-11-03T05:41:12Z
dc.date.issued 2022-10
dc.identifier.citation Kayal, Pratik; Anand, Mrinal; Desai, Harsh and Singh, Mayank, "Tables to LaTeX: structure and content extraction from scientific tables", International Journal on Document Analysis and Recognition (IJDAR), DOI: 10.1007/s10032-022-00420-9, Oct. 2022. en_US
dc.identifier.issn 1433-2833
dc.identifier.issn 1433-2825
dc.identifier.uri https://doi.org/10.1007/s10032-022-00420-9
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/8280
dc.description.abstract Scientific documents contain tables that list important information in a concise fashion. Structure and content extraction from tables embedded within PDF research documents is a very challenging task due to the existence of visual features like spanning cells and content features like mathematical symbols and equations. Most existing table structure identification methods tend to ignore these academic writing features. In this paper, we adapt the transformer-based language modeling paradigm for scientific table structure and content extraction. Specifically, the proposed model converts a tabular image to its corresponding LaTeX source code. Overall, we outperform the current state-of-the-art baselines and achieve exact match accuracies of 70.35% and 49.69% on table structure and content extraction, respectively. Further analysis demonstrates that the proposed models efficiently identify the number of rows and columns, the alphanumeric characters, the LaTeX tokens, and symbols.
dc.description.statementofresponsibility by Pratik Kayal, Mrinal Anand, Harsh Desai and Mayank Singh
dc.language.iso en_US en_US
dc.publisher Springer en_US
dc.subject LaTeX en_US
dc.subject L-OCR en_US
dc.subject TSR en_US
dc.subject PGRT en_US
dc.subject FGRT en_US
dc.title Tables to LaTeX: structure and content extraction from scientific tables en_US
dc.type Journal Paper en_US
dc.relation.journal International Journal on Document Analysis and Recognition (IJDAR)

