Sparse graph representations for procedural instructional documents


dc.contributor.author Singh, Shruti
dc.contributor.author Gupta, Rishabh
dc.coverage.spatial United States of America
dc.date.accessioned 2024-02-14T10:09:33Z
dc.date.available 2024-02-14T10:09:33Z
dc.date.issued 2024-02
dc.identifier.citation Singh, Shruti and Gupta, Rishabh, "Sparse graph representations for procedural instructional documents", arXiv, Cornell University Library, DOI: arXiv:2402.03957, Feb. 2024.
dc.identifier.issn 2331-8422
dc.identifier.uri https://doi.org/10.48550/arXiv.2402.03957
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/9759
dc.description.abstract Computation of document similarity is a critical task in various NLP domains that has applications in deduplication, matching, and recommendation. Traditional approaches for document similarity computation include learning representations of documents and employing a similarity or a distance function over the embeddings. However, pairwise similarities and differences are not efficiently captured by individual representations. Graph representations such as Joint Concept Interaction Graph (JCIG) represent a pair of documents as a joint undirected weighted graph. JCIGs facilitate an interpretable representation of document pairs as a graph. However, JCIGs are undirected, and don't consider the sequential flow of sentences in documents. We propose two approaches to model document similarity by representing document pairs as a directed and sparse JCIG that incorporates sequential information. We propose two algorithms inspired by Supergenome Sorting and Hamiltonian Path that replace the undirected edges with directed edges. Our approach also sparsifies the graph to O(n) edges from JCIG's worst case of O(n^2). We show that our sparse directed graph model architecture consisting of a Siamese encoder and GCN achieves comparable results to the baseline on datasets not containing sequential information and beats the baseline by ten points on an instructional documents dataset containing sequential information.
dc.description.statementofresponsibility by Shruti Singh and Rishabh Gupta
dc.language.iso en_US
dc.publisher Cornell University Library
dc.title Sparse graph representations for procedural instructional documents
dc.type Article
dc.relation.journal arXiv
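
The abstract's claim of reducing an undirected weighted graph with up to O(n^2) edges to a directed graph with O(n) edges can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic greedy heuristic that builds a single directed path visiting every node once (a Hamiltonian-path-style ordering), always moving to the unvisited node with the strongest connection to the current one. The function name `greedy_directed_path`, the edge-weight dictionary format, and the choice of start node are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def greedy_directed_path(
    n: int, weights: Dict[Edge, float], start: int = 0
) -> List[Edge]:
    """Sparsify an undirected weighted graph on nodes 0..n-1 into a
    directed path with n-1 edges (hypothetical helper, illustration only).

    `weights` maps an undirected pair (a, b) to its weight; missing
    pairs are treated as weight 0.0.
    """
    if n == 0:
        return []

    def w(a: int, b: int) -> float:
        # Undirected lookup: accept the pair in either order.
        return weights.get((a, b), weights.get((b, a), 0.0))

    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        current = order[-1]
        # Pick the heaviest edge; ties go to the smallest node index
        # because max() returns the first maximum in the sorted list.
        nxt = max(sorted(remaining), key=lambda v: w(current, v))
        order.append(nxt)
        remaining.remove(nxt)

    # Consecutive nodes in the visiting order become directed edges.
    return list(zip(order, order[1:]))


# Example: a complete graph on 4 nodes has 6 undirected edges.
example_weights = {
    (0, 1): 0.9, (0, 2): 0.1, (0, 3): 0.2,
    (1, 2): 0.8, (1, 3): 0.3, (2, 3): 0.7,
}
path_edges = greedy_directed_path(4, example_weights, start=0)
print(path_edges)
```

For any input with n nodes the result has exactly n-1 directed edges, which is the sense in which such a path is O(n) rather than O(n^2). A greedy walk gives no optimality guarantee; it only shows the shape of the output.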

