STEP: simultaneous tracking and estimation of pose for animals and humans

dc.contributor.author	Verma, Shashikant
dc.contributor.author	Katti, Harish
dc.contributor.author	Debnath, Soumyaratna
dc.contributor.author	Swami, Yamuna
dc.contributor.author	Raman, Shanmuganathan
dc.coverage.spatial	United States of America
dc.date.accessioned	2025-03-28T15:38:35Z
dc.date.available	2025-03-28T15:38:35Z
dc.date.issued	2025-03
dc.identifier.citation	Verma, Shashikant; Katti, Harish; Debnath, Soumyaratna; Swami, Yamuna and Raman, Shanmuganathan, "STEP: simultaneous tracking and estimation of pose for animals and humans", arXiv, Cornell University Library, DOI: arXiv:2503.13344, Mar. 2025.
dc.identifier.uri	http://arxiv.org/abs/2503.13344
dc.identifier.uri	https://repository.iitgn.ac.in/handle/123456789/11138
dc.description.abstract	We introduce STEP, a novel framework utilizing Transformer-based discriminative model prediction for simultaneous tracking and estimation of pose across diverse animal species and humans. We are inspired by the fact that the human brain exploits spatiotemporal continuity and performs concurrent localization and pose estimation despite the specialization of brain areas for form and motion processing. Traditional discriminative models typically require predefined target states for determining model weights, a challenge we address through Gaussian Map Soft Prediction (GMSP) and Offset Map Regression Adapter (OMRA) Modules. These modules remove the necessity of keypoint target states as input, streamlining the process. Our method starts with a known target state in the initial frame of a given video sequence. It then seamlessly tracks the target and estimates keypoints of anatomical importance as output for subsequent frames. Unlike prevalent top-down pose estimation methods, our approach doesn't rely on per-frame target detections due to its tracking capability. This facilitates a significant advancement in inference efficiency and potential applications. We train and validate our approach on datasets encompassing diverse species. Our experiments demonstrate superior results compared to existing methods, opening doors to various applications, including but not limited to action recognition and behavioral analysis.
dc.description.statementofresponsibility	by Shashikant Verma, Harish Katti, Soumyaratna Debnath, Yamuna Swami and Shanmuganathan Raman
dc.language.iso	en_US
dc.publisher	Cornell University Library
dc.subject	Pose estimation
dc.subject	Visual tracking
dc.subject	Target model prediction
dc.title	STEP: simultaneous tracking and estimation of pose for animals and humans
dc.type	Article
dc.relation.journal	arXiv

This item appears in the following Collection(s)

E-print Articles [88]