| Category | Approach Description | Key Papers / Models | Pros | Cons |
|---|---|---|---|---|
| Top-Down 2D Pose Estimation | Detect each person, then estimate keypoints inside each bounding box | AlphaPose, HRNet (2019), SimpleBaseline | High accuracy; robust for single- and multi-person scenes | Slow when many people are present; depends on the detector |
| Bottom-Up 2D Pose Estimation | Detect all keypoints globally, then group them into individuals | OpenPose (2017, PAFs), Associative Embedding, DeepCut, HigherHRNet | Fast in crowds; no person detector needed | Grouping errors; slightly less accurate |
| Transformer-Based 2D Pose | Global joint reasoning using Vision Transformers | TokenPose, HRFormer, PETR, MViT-Pose | State-of-the-art accuracy; long-range context | Computationally heavy |
| Monocular 3D Pose (Image → 3D Skeleton) | Predict 3D joints from a single RGB frame | VNect (2017), HMR, SPIN, METRO, CLIFF, PARE | No multi-view setup needed; real-time possible | Depth ambiguity; relies on priors |
| Video-Based 3D Pose (Temporal) | Use frame sequences (RNN/CNN/Transformer) for stable 3D pose | VIBE (2020), TCMR, MotionBERT, MixSTE | Smooth and stable; handles occlusion | Requires a video stream |
| Model-Based Mesh Recovery | Predict a full human mesh (e.g., SMPL) from a monocular RGB image | HMR, SMPLify-X, SPIN, PARE, FrankMocap, ROMP | Full mesh output; good for animation | Complex; compute-heavy |
| Regression-Based Pose | Directly regress joint coordinates (no heatmaps) | DeepPose (Google) | Simple and fast | Lower accuracy by today's standards |
| Heatmap-Based Keypoints | Predict per-joint 2D heatmaps, then decode peak locations | Hourglass, CPN, HRNet | High accuracy; robust | More memory; extra post-processing |
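
As a concrete example of the heatmap-based row, here is a minimal decoding sketch in NumPy (the `decode_heatmaps` helper is hypothetical, not from any particular library): the network outputs one heatmap per joint, and each keypoint is read off at the heatmap's peak.

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Turn per-joint heatmaps of shape (J, H, W) into keypoints.

    Returns (J, 2) pixel coordinates (x, y) and a (J,) confidence score,
    taken as the value at each heatmap's peak. Illustrative sketch only;
    real pipelines add sub-pixel refinement and map the coordinates back
    to the original image crop.
    """
    num_joints, h, w = heatmaps.shape
    coords = np.zeros((num_joints, 2), dtype=np.float32)
    scores = np.zeros(num_joints, dtype=np.float32)
    for j in range(num_joints):
        # Peak location of joint j's heatmap
        y, x = np.unravel_index(np.argmax(heatmaps[j]), (h, w))
        coords[j] = (x, y)
        scores[j] = heatmaps[j, y, x]
    return coords, scores
```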

GATs compute attention coefficients between nodes, allowing the model to focus on the most relevant neighbors when updating node embeddings.

✔️ Advantages

- Learns which neighbors matter most
- Works on graphs with varying node degrees
- Multi-head attention improves training stability (see the sketch after the pipeline below)
- Parallelizable across nodes (unlike RNN-based GNNs)
- No need for Laplacian eigenvectors, unlike spectral methods

| Model | Key Mechanism | Strengths | Weaknesses | Best For |
|---|---|---|---|---|
| GCN (Kipf & Welling, 2017) | Fixed normalized adjacency for message passing | Fast, simple | Treats all neighbors equally | Node classification, semi-supervised learning |
| GraphSAGE (2017) | Neighbor sampling + aggregation (mean, LSTM, pooling) | Handles large graphs | Aggregators may lose structure | Inductive learning on huge graphs |
| GAT (2018) | Attention-based neighbor weighting | Learns importance of neighbors | Slower on large/dense graphs | Social networks, molecules, citation networks |
| Gated GNN / GGNN (2016) | Gated recurrent updates | Captures sequential structure | Harder to train | Program analysis, chemistry |
| Graph Transformer (2020–) | Global self-attention | Handles long-range relations | Computationally heavy | Molecules, scene graphs, NLP-style tasks |
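
To make the contrast with GAT concrete, here is a minimal sketch of the GCN row's fixed-adjacency propagation rule in NumPy (illustrative names; assumes an undirected graph given as a dense 0/1 adjacency matrix):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (N, N) binary adjacency, H: (N, F) node features, W: (F, F') weights.
    Every neighbor contributes with a fixed, degree-based weight --
    exactly the "treats all neighbors equally" limitation in the table.
    """
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # D^-1/2 as a vector
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU activation
```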

A single GAT layer flows through the following pipeline:

1. Input graph: nodes and edges, each node i with features hᵢ
2. Linear projection: hᵢ → W hᵢ
3. Attention coefficients: αᵢⱼ = softmaxⱼ(LeakyReLU(aᵀ[W hᵢ || W hⱼ])), normalized over the neighbors j of node i
4. Aggregation: Σⱼ αᵢⱼ W hⱼ
5. Updated node embedding hᵢ′
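
Below is a minimal single-head sketch of this pipeline in NumPy (names like `gat_layer` are illustrative; real implementations such as `GATConv` in PyTorch Geometric vectorize the loop and run several heads in parallel):

```python
import numpy as np

def gat_layer(H, adj, W, a, negative_slope=0.2):
    """Single-head GAT layer following the pipeline above.

    H:   (N, F)  node features            W: (F, F') projection matrix
    adj: (N, N)  adjacency w/ self-loops  a: (2*F',)  attention vector
    """
    Wh = H @ W                                    # linear projection h_i -> W h_i
    out = np.zeros_like(Wh)
    for i in range(Wh.shape[0]):
        nbrs = np.flatnonzero(adj[i])             # neighborhood of node i
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for each neighbor j
        e = np.array([a @ np.concatenate([Wh[i], Wh[j]]) for j in nbrs])
        e = np.where(e > 0, e, negative_slope * e)
        alpha = np.exp(e - e.max())               # softmax over the neighborhood
        alpha /= alpha.sum()
        out[i] = alpha @ Wh[nbrs]                 # aggregation: sum_j alpha_ij W h_j
    return out
```

Multi-head attention, the stability trick noted in the advantages list, simply runs K copies of this layer with independent W and a parameters and concatenates the per-node outputs.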
