Updated on 2025.06.28
3D Segmentation
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-26 | SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark | Alex Costanzino et.al. | 2506.21549 | null |
2025-06-26 | GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding | Zijun Lin et.al. | 2506.21188 | null |
2025-06-24 | ReCoGNet: Recurrent Context-Guided Network for 3D MRI Prostate Segmentation | Ahmad Mustafa et.al. | 2506.19687 | null |
2025-06-22 | Auto-Regressive Surface Cutting | Yang Li et.al. | 2506.18017 | null |
2025-06-17 | I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs | Yu Qi et.al. | 2506.14495 | null |
2025-06-20 | Cross-Modal Geometric Hierarchy Fusion: An Implicit-Submap Driven Framework for Resilient 3D Place Recognition | Xiaohui Jiang et.al. | 2506.14243 | link |
2025-06-17 | Unified Representation Space for 3D Visual Grounding | Yinuo Zheng et.al. | 2506.14238 | null |
2025-06-09 | PIG: Physically-based Multi-Material Interaction with 3D Gaussians | Zeyu Xiao et.al. | 2506.07657 | null |
2025-06-06 | NeurNCD: Novel Class Discovery via Implicit Neural Representation | Junming Wang et.al. | 2506.06412 | null |
2025-06-05 | From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes | Tianxu Wang et.al. | 2506.04897 | null |
2025-06-05 | Midplane based 3D single pass unbiased segment-to-segment contact interaction using penalty method | Indrajeet Sahu et.al. | 2506.04841 | null |
2025-06-05 | OpenMaskDINO3D : Reasoning 3D Segmentation via Large Language Model | Kunshen Zhang et.al. | 2506.04837 | link |
2025-05-28 | Zero-Shot 3D Visual Grounding from Vision-Language Models | Rong Li et.al. | 2505.22429 | null |
2025-05-26 | Rep3D: Re-parameterize Large 3D Kernels with Low-Rank Receptive Modeling for Medical Imaging | Ho Hin Lee et.al. | 2505.19603 | null |
2025-05-23 | SVL: Spike-based Vision-language Pretraining for Efficient 3D Open-world Understanding | Xuerui Qiu et.al. | 2505.17674 | null |
2025-06-03 | A Unified Multi-Scale Attention-Based Network for Automatic 3D Segmentation of Lung Parenchyma & Nodules In Thoracic CT Images | Muhammad Abdullah et.al. | 2505.17602 | link |
2025-05-23 | From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation | Mahmoud Chick Zaouali et.al. | 2505.17402 | null |
2025-05-18 | Attention-Enhanced U-Net for Accurate Segmentation of COVID-19 Infected Lung Regions in CT Scans | Amal Lahchim et.al. | 2505.12298 | null |
2025-05-17 | iSegMan: Interactive Segment-and-Manipulate 3D Gaussians | Yian Zhao et.al. | 2505.11934 | null |
2025-05-15 | MOSAIC: A Multi-View 2.5D Organ Slice Selector with Cross-Attentional Reasoning for Anatomically-Aware CT Localization in Medical Organ Segmentation | Hania Ghouse et.al. | 2505.10672 | null |
2025-05-27 | HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation | Jiaming Liang et.al. | 2505.10464 | link |
2025-05-13 | Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving | Zongchuang Zhao et.al. | 2505.08725 | link |
2025-05-08 | DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding | Henry Zheng et.al. | 2505.04965 | null |
2025-05-20 | AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding | Feng Xiao et.al. | 2505.04058 | link |
2025-05-04 | Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving | Alexey Nekrasov et.al. | 2505.02148 | null |
2025-05-03 | Probabilistic Interactive 3D Segmentation with Hierarchical Neural Processes | Jie Liu et.al. | 2505.01726 | null |
2025-04-30 | SAM4EM: Efficient memory-based two stage prompt-free segment anything model adapter for complex 3D neuroscience electron microscopy stacks | Uzair Shah et.al. | 2504.21544 | link |
2025-05-04 | Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation | Victoria Yue Chen et.al. | 2504.19718 | null |
2025-04-24 | OmniMamba4D: Spatio-temporal Mamba for longitudinal CT lesion segmentation | Justin Namuk Kim et.al. | 2504.09655 | null |
2025-04-13 | Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding | Atharv Mahesh Mane et.al. | 2504.09623 | link |
2025-04-11 | DSM: Building A Diverse Semantic Map for 3D Visual Grounding | Qinghongbing Xie et.al. | 2504.08307 | null |
2025-04-09 | MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs | Jiawei Mao et.al. | 2504.06897 | null |
2025-04-08 | InvNeRF-Seg: Fine-Tuning a Pre-Trained NeRF for 3D Object Segmentation | Jiangsan Zhao et.al. | 2504.05751 | null |
2025-04-01 | Deconver: A Deconvolutional Network for Medical Image Segmentation | Pooya Ashtari et.al. | 2504.00302 | link |
2025-03-30 | ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning | Zhenyang Liu et.al. | 2503.23297 | null |
2025-03-28 | TranSplat: Lighting-Consistent Cross-Scene Object Transfer with 3D Gaussian Splatting | Boyang et.al. | 2503.22676 | null |
2025-03-28 | NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving | Fuhao Li et.al. | 2503.22436 | null |
2025-03-28 | Segment then Splat: A Unified Approach for 3D Open-Vocabulary Segmentation based on Gaussian Splatting | Yiren Lu et.al. | 2503.22204 | null |
2025-03-26 | COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting | Jiaxin Zhang et.al. | 2503.19443 | link |
2025-03-24 | DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation | Karim Abou Zeid et.al. | 2503.18944 | link |
2025-03-24 | ZECO: ZeroFusion Guided 3D MRI Conditional Generation | Feiran Wang et.al. | 2503.18246 | link |
2025-03-23 | PanopticSplatting: End-to-End Panoptic Gaussian Splatting | Yuxuan Xie et.al. | 2503.18073 | null |
2025-03-19 | SPNeRF: Open Vocabulary 3D Neural Scene Segmentation with Superpoints | Weiwen Hu et.al. | 2503.15712 | null |
2025-03-19 | Federated Continual 3D Segmentation With Single-round Communication | Can Peng et.al. | 2503.15414 | null |
2025-03-18 | Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting | Runsong Zhu et.al. | 2503.14029 | link |
2025-03-17 | Adaptive Transformer Attention and Multi-Scale Fusion for Spine 3D Segmentation | Yanlin Xiang et.al. | 2503.12853 | null |
2025-03-12 | QuickDraw: Fast Visualization, Analysis and Active Learning for Medical Image Segmentation | Daniel Syomichev et.al. | 2503.09885 | link |
2025-03-17 | WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images | Yansong Guo et.al. | 2503.08407 | null |
2025-03-11 | nnInteractive: Redefining 3D Promptable Segmentation | Fabian Isensee et.al. | 2503.08373 | link |
2025-03-11 | Talk2PC: Enhancing 3D Visual Grounding through LiDAR and Radar Point Clouds Fusion for Autonomous Driving | Runwei Guan et.al. | 2503.08336 | null |
2025-03-10 | SegResMamba: An Efficient Architecture for 3D Medical Image Segmentation | Badhan Kumar Das et.al. | 2503.07766 | null |
2025-03-07 | HexPlane Representation for 3D Semantic Scene Understanding | Zeren Chen et.al. | 2503.05127 | null |
2025-03-03 | OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging | Yijie Tang et.al. | 2503.01309 | null |
2025-02-27 | Open-Vocabulary Semantic Part Segmentation of 3D Human | Keito Suzuki et.al. | 2502.19782 | null |
2025-02-27 | Deep Learning-Based Approach for Automatic 2D and 3D MRI Segmentation of Gliomas | Kiranmayee Janardhan et.al. | 2502.19760 | null |
2025-02-27 | ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding | Qihang Peng et.al. | 2502.19247 | null |
2025-02-26 | Subclass Classification of Gliomas Using MRI Fusion Technique | Kiranmayee Janardhan et.al. | 2502.18775 | null |
2025-02-22 | Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field | Wenhao Hu et.al. | 2502.16303 | null |
2025-02-20 | Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing | Yoel Levy et.al. | 2502.14789 | null |
2025-02-19 | Pericoronary adipose tissue attenuation as a predictor of functional severity of coronary stenosis | Marta Pillitteri et.al. | 2502.13649 | null |
2025-02-18 | Learning Wall Segmentation in 3D Vessel Trees using Sparse Annotations | Hinrich Rahlfs et.al. | 2502.12801 | null |
2025-02-14 | Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding | Wenxuan Guo et.al. | 2502.10392 | link |
2025-02-04 | Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation | Junha Lee et.al. | 2502.02548 | null |
2025-02-20 | Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection | Boyu Mi et.al. | 2502.01401 | link |
2025-02-01 | Vision-Language Modeling in PET/CT for Visual Grounding of Positive Findings | Zachary Huemann et.al. | 2502.00528 | null |
2025-01-31 | Laser: Efficient Language-Guided Segmentation in Neural Radiance Fields | Xingyu Miao et.al. | 2501.19084 | link |
2025-01-30 | Full-Head Segmentation of MRI with Abnormal Brain Anatomy: Model and Data Release | Andrew M Birnbaum et.al. | 2501.18716 | link |
2025-01-29 | 3DSES: an indoor Lidar point cloud segmentation dataset with real and pseudo-labels from a 3D model | Maxime Mérizette et.al. | 2501.17534 | null |
2025-01-27 | CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation | Xiaochuan Ma et.al. | 2501.16246 | null |
2025-01-18 | No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling | Young Seok Jeon et.al. | 2501.10814 | null |
2025-01-16 | AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring | Xinyi Wang et.al. | 2501.09428 | null |
2025-01-17 | Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding | Kohei Torimi et.al. | 2501.09278 | null |
2025-01-12 | 3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes | Mahmoud Ahmed et.al. | 2501.06785 | link |
2025-01-10 | Swin-X2S: Reconstructing 3D Shape from 2D Biplanar X-ray with Swin Transformers | Kuan Liu et.al. | 2501.05961 | null |
2025-01-07 | Self-adaptive vision-language model for 3D segmentation of pulmonary artery and vein | Xiaotong Guo et.al. | 2501.03722 | null |
2025-01-09 | GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Zhangyang Qi et.al. | 2501.01428 | link |
2025-01-02 | ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding | Austin T. Wang et.al. | 2501.01366 | null |
2024-12-31 | OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies | Runnan Chen et.al. | 2501.00326 | null |
2024-12-28 | Advances in Additive Manufacturing of 3D-segmented Plastic Scintillator Detectors for Particle Tracking and Calorimetry | Umut Kose et.al. | 2412.20267 | null |
2024-12-24 | LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding | Hao Li et.al. | 2412.17635 | null |
2024-12-22 | GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs | Xingrui Wang et.al. | 2412.16932 | link |
2024-12-18 | MobiFuse: A High-Precision On-device Depth Perception System with Multi-Data Fusion | Jinrui Zhang et.al. | 2412.13848 | null |
2024-12-14 | DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting | Luis Wiedmann et.al. | 2412.10972 | link |
Reasoning Segmentation
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-12 | MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models | Yu Huang et.al. | 2506.10465 | null |
2025-06-11 | Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations | Yizhen Li et.al. | 2506.07943 | null |
2025-06-05 | OpenMaskDINO3D : Reasoning 3D Segmentation via Large Language Model | Kunshen Zhang et.al. | 2506.04837 | link |
2025-06-04 | RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | Yi Lu et.al. | 2506.04277 | null |
2025-05-29 | PixelThink: Towards Efficient Chain-of-Pixel Reasoning | Song Wang et.al. | 2505.23727 | null |
2025-06-15 | PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding | Ansel Blume et.al. | 2505.20759 | null |
2025-05-24 | Reasoning Segmentation for Images and Videos: A Survey | Yiqing Shen et.al. | 2505.18816 | null |
2025-05-22 | PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging | Quoc-Huy Trinh et.al. | 2505.11872 | null |
2025-05-17 | RVTBench: A Benchmark for Visual Reasoning Tasks | Yiqing Shen et.al. | 2505.11838 | link |
2025-05-05 | LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery | Jerome Quenum et.al. | 2505.02829 | null |
2025-04-17 | SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Qianqian Sun et.al. | 2504.12704 | null |
2025-04-23 | MediSee: Reasoning-based Pixel-level Perception in Medical Images | Qinyue Tong et.al. | 2504.11008 | null |
2025-04-15 | LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | Hanning Chen et.al. | 2504.10854 | null |
2025-04-01 | POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation | Lanyun Zhu et.al. | 2504.00640 | null |
2025-03-27 | Online Reasoning Video Segmentation with Just-in-Time Digital Twins | Yiqing Shen et.al. | 2503.21056 | null |
2025-03-26 | Operating Room Workflow Analysis via Reasoning Segmentation over Digital Twins | Yiqing Shen et.al. | 2503.21054 | null |
2025-03-23 | MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | Jiaxin Huang et.al. | 2503.18135 | null |
2025-03-19 | VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation | Shoubin Yu et.al. | 2503.14350 | null |
2025-03-18 | MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation | Donggon Jang et.al. | 2503.13881 | link |
2025-03-13 | Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA | Zhixuan Li et.al. | 2503.10225 | null |
2025-03-11 | TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement | Miao Zhang et.al. | 2503.08168 | null |
2025-03-25 | Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts | Shiu-hong Kao et.al. | 2503.07503 | null |
2025-03-13 | InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models | Yuchen Yan et.al. | 2503.06692 | null |
2025-03-09 | Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Yuqi Liu et.al. | 2503.06520 | link |
2025-03-04 | UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface | Hao Tang et.al. | 2503.01342 | link |
2025-02-13 | Pixel-Level Reasoning Segmentation via Multi-turn Conversations | Dexian Cai et.al. | 2502.09447 | link |
2025-01-15 | The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Sitong Gong et.al. | 2501.08549 | link |
2024-12-19 | PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation | Muntasir Wahed et.al. | 2412.15209 | null |
2024-12-18 | InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Cong Wei et.al. | 2412.14006 | link |
2024-12-02 | HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Cong Wei et.al. | 2411.17606 | link |
2024-11-21 | Multimodal 3D Reasoning Segmentation with Complex Scenes | Xueying Jiang et.al. | 2411.13927 | null |
2024-11-15 | Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Andong Deng et.al. | 2411.09921 | null |
2024-10-31 | SegLLM: Multi-round Reasoning Segmentation | XuDong Wang et.al. | 2410.18923 | null |
2024-09-29 | One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Zechen Bai et.al. | 2409.19603 | link |
2024-09-20 | Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model | Li Zhou et.al. | 2409.13407 | link |
2025-02-10 | Visual Agents as Fast and Slow Thinkers | Guangyan Sun et.al. | 2408.08862 | link |
3D Generative
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-26 | Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image | Pufan Li et.al. | 2506.21152 | null |
2025-06-25 | WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration | Chaojun Ni et.al. | 2506.20590 | null |
2025-06-23 | 3D Arena: An Open Platform for Generative 3D Evaluation | Dylan Ebert et.al. | 2506.18787 | null |
2025-06-23 | Geometry-Aware Preference Learning for 3D Texture Generation | AmirHossein Zamani et.al. | 2506.18331 | null |
2025-06-13 | VEIGAR: View-consistent Explicit Inpainting and Geometry Alignment for 3D object Removal | Pham Khai Nguyen Do et.al. | 2506.15821 | null |
2025-06-18 | Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards | Qingming Liu et.al. | 2506.15684 | null |
2025-06-18 | Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material | Team Hunyuan3D et.al. | 2506.15442 | link |
2025-06-17 | RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills | Chunru Lin et.al. | 2506.14763 | null |
2025-06-16 | Disentangling 3D from Large Vision-Language Models for Controlled Portrait Generation | Nick Yiwen Huang et.al. | 2506.14015 | null |
2025-06-16 | Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching | Weimin Bai et.al. | 2506.13594 | null |
2025-06-11 | DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision | Xiandong Zou et.al. | 2506.09814 | null |
2025-06-10 | Orientation Matters: Making 3D Generative Models Orientation-Aligned | Yichong Lu et.al. | 2506.08640 | null |
2025-06-09 | Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor | Rishit Dagli et.al. | 2506.07932 | null |
2025-06-09 | R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation | William Ljungbergh et.al. | 2506.07826 | null |
2025-06-09 | NOVA3D: Normal Aligned Video Diffusion Model for Single Image to 3D Generation | Yuxiao Yang et.al. | 2506.07698 | null |
2025-06-05 | PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers | Yuchen Lin et.al. | 2506.05573 | null |
2025-06-02 | ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding | Junliang Ye et.al. | 2506.01853 | link |
2025-05-31 | ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary | Zeqi Gu et.al. | 2506.00742 | null |
2025-05-30 | LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion Framework | Xin Kang et.al. | 2505.24245 | null |
2025-05-29 | Universal Radial Scaling of Large-Scale Black Hole Accretion for Magnetically Arrested And Rocking Accretion Disks | Aretaios Lalakos et.al. | 2505.23888 | null |
2025-05-28 | Advancing high-fidelity 3D and Texture Generation with 2.5D latents | Xin Yang et.al. | 2505.21050 | null |
2025-05-27 | Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction | Yifei Wang et.al. | 2505.20755 | null |
2025-05-30 | ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction | Qimin Chen et.al. | 2505.20431 | null |
2025-05-26 | Harnessing the Power of Training-Free Techniques in Text-to-2D Generation for Text-to-3D Generation via Score Distillation Sampling | Junhong Lee et.al. | 2505.19868 | null |
2025-05-26 | Global stability for the compressible isentropic magnetohydrodynamic equations in 3D bounded domains with Navier-slip boundary conditions | Yang Liu et.al. | 2505.19749 | null |
2025-05-23 | SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation | Dekai Zhu et.al. | 2505.17721 | null |
2025-05-26 | Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention | Shuang Wu et.al. | 2505.17412 | null |
2025-05-22 | MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM | Siwei Meng et.al. | 2505.16456 | null |
2025-05-21 | Constructing a 3D Town from a Single Image | Kaizhi Zheng et.al. | 2505.15765 | null |
2025-05-20 | Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image | Yuxuan Wang et.al. | 2505.14537 | null |
2025-05-21 | Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling | Zhihao Li et.al. | 2505.14521 | null |
2025-05-19 | Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction | Yuanbo Wang et.al. | 2505.13091 | null |
2025-05-15 | Pharmacophore-Conditioned Diffusion Model for Ligand-Based De Novo Drug Design | Amira Alakhdar et.al. | 2505.10545 | null |
2025-05-13 | Long timescale numerical simulations of large, super-critical accretion discs | P. Chris Fragile et.al. | 2505.08859 | null |
2025-05-12 | Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets | Weiyu Li et.al. | 2505.07747 | null |
2025-05-11 | CMD: Controllable Multiview Diffusion for 3D Editing and Progressive Generation | Peng Li et.al. | 2505.07003 | null |
2025-05-07 | Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation | Yiming Qin et.al. | 2505.05505 | link |
2025-05-07 | Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond | Jessie Richter-Powell et.al. | 2505.04621 | null |
2025-05-07 | Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting | Feng Yang et.al. | 2505.04262 | null |
2025-05-07 | S3D: Sketch-Driven 3D Model Generation | Hail Song et.al. | 2505.04185 | link |
2025-05-06 | Effects of transient stellar emissions on planetary climates of tidally-locked exo-earths | Howard Chen et.al. | 2505.03723 | null |
2025-05-03 | Rethinking Score Distilling Sampling for 3D Editing and Generation | Xingyu Miao et.al. | 2505.01888 | null |
2025-04-30 | 3D Stylization via Large Reconstruction Model | Ipek Oztas et.al. | 2504.21836 | null |
2025-04-29 | A 3D pocket-aware and affinity-guided diffusion model for lead optimization | Anjie Qiao et.al. | 2504.21065 | null |
2025-04-28 | CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback | Chenhan Jiang et.al. | 2504.19860 | null |
2025-04-27 | Making Physical Objects with Generative AI and Robotic Assembly: Considering Fabrication Constraints, Sustainability, Time, Functionality, and Accessibility | Alexander Htet Kyaw et.al. | 2504.19131 | null |
2025-04-25 | Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation | Shivam Duggal et.al. | 2504.18509 | null |
2025-04-24 | DiMeR: Disentangled Mesh Reconstruction Model | Lutao Jiang et.al. | 2504.17670 | link |
2025-04-23 | Global stability for compressible isentropic Navier-Stokes equations in 3D bounded domains with Navier-slip boundary conditions | Yang Liu et.al. | 2504.17136 | null |
2025-04-22 | Text-based Animatable 3D Avatars with Morphable Model Alignment | Yiqian Wu et.al. | 2504.15835 | link |
2025-04-21 | Cyc3D: Fine-grained Controllable 3D Generation via Cycle Consistency Regularization | Hongbin Xu et.al. | 2504.14975 | null |
2025-04-17 | HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation | Wenqi Dong et.al. | 2504.13072 | null |
2025-04-17 | RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins | Yao Mu et.al. | 2504.13059 | null |
2025-04-17 | SOPHY: Generating Simulation-Ready Objects with Physical Materials | Junyi Cao et.al. | 2504.12684 | null |
2025-04-16 | Recent Advance in 3D Object and Scene Generation: A Survey | Xiang Tang et.al. | 2504.11734 | null |
2025-04-15 | 3D full-GR simulations of magnetorotational core-collapse supernovae on GPUs: A systematic study of rotation rates and magnetic fields | Swapnil Shankar et.al. | 2504.11537 | null |
2025-04-14 | Art3D: Training-Free 3D Generation from Flat-Colored Illustration | Xiaoyan Cong et.al. | 2504.10466 | null |
2025-04-14 | ESCT3D: Efficient and Selectively Controllable Text-Driven 3D Content Generation with Gaussian Splatting | Huiqi Wu et.al. | 2504.10316 | null |
2025-04-16 | GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting | Junlin Hao et.al. | 2504.10001 | null |
2025-04-11 | GeoTexBuild: 3D Building Model Generation from Map Footprints | Ruizhe Wang et.al. | 2504.08419 | null |
2025-04-11 | Generative AI for Film Creation: A Survey of Recent Advances | Ruihan Zhang et.al. | 2504.08296 | null |
2025-04-10 | Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects | Shalini Maiti et.al. | 2504.08125 | null |
2025-04-10 | ContrastiveGaussian: High-Fidelity 3D Generation with Contrastive Learning and Gaussian Splatting | Junbang Liu et.al. | 2504.08100 | link |
2025-04-11 | Objaverse++: Curated 3D Object Dataset with Quality Annotations | Chendi Lin et.al. | 2504.07334 | link |
2025-04-10 | Stochastic Ray Tracing of 3D Transparent Gaussians | Xin Sun et.al. | 2504.06598 | null |
2025-04-05 | Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Yikai Wang et.al. | 2504.04153 | link |
2025-04-04 | D-Garment: Physics-Conditioned Latent Diffusion for Dynamic Garment Deformations | Antoine Dumoulin et.al. | 2504.03468 | null |
2025-04-03 | Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization | Kangle Deng et.al. | 2504.02817 | null |
2025-04-03 | ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation | Yuan Zhou et.al. | 2504.02316 | link |
2025-04-03 | WonderTurbo: Generating Interactive 3D World in 0.72 Seconds | Chaojun Ni et.al. | 2504.02261 | null |
2025-04-02 | WorldPrompter: Traversable Text-to-Scene Generation | Zhaoyang Zhang et.al. | 2504.02045 | null |
2025-04-02 | 3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting | Hao Wu et.al. | 2504.01619 | null |
2025-04-02 | High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model | Yiyang Shen et.al. | 2504.01512 | null |
2025-04-03 | Distilling Multi-view Diffusion Models into 3D Generators | Hao Qin et.al. | 2504.00457 | null |
2025-03-31 | Pre-training with 3D Synthetic Data: Learning 3D Point Cloud Instance Segmentation from 3D Synthetic Scenes | Daichi Otsuka et.al. | 2503.24229 | null |
2025-03-28 | DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness | Ruining Li et.al. | 2503.22677 | null |
2025-03-28 | Clouds and Hazes in GJ 1214b’s Metal-Rich Atmosphere | Isaac Malsky et.al. | 2503.22608 | null |
2025-03-28 | CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving | Yishen Ji et.al. | 2503.22231 | null |
2025-03-27 | 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models | Yuhan Zhang et.al. | 2503.21745 | null |
2025-03-27 | Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data | Zhiyuan Ma et.al. | 2503.21694 | link |
2025-03-29 | GenFusion: Closing the Loop between Reconstruction and Generation via Videos | Sibo Wu et.al. | 2503.21219 | null |
2025-03-26 | FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks | Jinwei Li et.al. | 2503.20784 | link |
2025-03-27 | MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation | Jinnan Chen et.al. | 2503.20519 | null |
2025-03-24 | MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing | Lingting Zhu et.al. | 2503.18461 | null |
2025-03-23 | Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook | Xu Zheng et.al. | 2503.18016 | null |
2025-03-20 | SynCity: Training-Free Generation of 3D Worlds | Paul Engstler et.al. | 2503.16420 | null |
2025-03-26 | Unleashing Vecset Diffusion Model for Fast Shape Generation | Zeqiang Lai et.al. | 2503.16302 | link |
2025-03-21 | Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens | Shuqi Lu et.al. | 2503.16278 | link |
2025-03-20 | Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation | Tiange Xiang et.al. | 2503.15877 | null |
2025-03-19 | Shap-MeD | Nicolás Laverde et.al. | 2503.15562 | null |
2025-03-18 | MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling | Damian Boborzi et.al. | 2503.14002 | link |
2025-03-17 | Amodal3R: Amodal 3D Reconstruction from Occluded 2D Images | Tianhao Wu et.al. | 2503.13439 | null |
2025-03-16 | VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting | Songen Gu et.al. | 2503.12383 | null |
2025-03-15 | DecompDreamer: Advancing Structured 3D Asset Generation with Multi-Object Decomposition and Gaussian Splatting | Utkarsh Nath et.al. | 2503.11981 | null |
2025-03-14 | PBR3DGen: A VLM-guided Mesh Generation with High-quality PBR Texture | Xiaokang Wei et.al. | 2503.11368 | null |
2025-03-08 | Text-to-3D Generation using Jensen-Shannon Score Distillation | Khoi Do et.al. | 2503.10660 | null |
2025-03-13 | Hyper3D: Efficient 3D Representation via Hybrid Triplane and Octree Feature for Enhanced 3D Shape Variational Auto-Encoders | Jingyu Guo et.al. | 2503.10403 | null |
2025-03-13 | RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling | Itay Chachy et.al. | 2503.09601 | link |
2025-03-11 | MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention | Yuhan Wang et.al. | 2503.08664 | link |
2025-03-12 | CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction | Zhiyuan Wu et.al. | 2503.08005 | null |
2025-03-10 | DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation | Xiaoliang Ju et.al. | 2503.06900 | null |
2025-03-09 | A Mesh Is Worth 512 Numbers: Spectral-domain Diffusion Modeling for High-dimension Shape Generation | Jiajie Fan et.al. | 2503.06485 | null |
2025-03-08 | GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation | Ye Tao et.al. | 2503.06136 | null |
2025-03-07 | Decay of solutions of nonlinear Dirac equations | Sebastian Herr et.al. | 2503.05410 | null |
2025-03-06 | Simulating the Real World: A Unified Survey of Multimodal Generative Models | Yuqi Hu et.al. | 2503.04641 | link |
2025-03-03 | On the behavior of the Generalized Alignment Index (GALI) method for dissipative systems | Henok Tenaw Moges et.al. | 2503.01784 | null |
2025-03-03 | The Interplay between Dust Dynamics and Turbulence Induced by the Vertical Shear Instability | Pinghui Huang et.al. | 2503.01656 | null |
2025-03-03 | Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation | Jiantao Lin et.al. | 2503.01370 | link |
2025-03-02 | DreamPrinting: Volumetric Printing Primitives for High-Fidelity 3D Printing | Youjia Wang et.al. | 2503.00887 | null |
2025-03-01 | GenVDM: Generating Vector Displacement Maps From a Single Image | Yuezhi Yang et.al. | 2503.00605 | null |
2025-02-28 | CADDreamer: CAD object Generation from Single-view Images | Yuan Li et.al. | 2502.20732 | null |
2025-02-27 | Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting | Hengyu Meng et.al. | 2502.20045 | null |
2025-02-27 | GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors | An Li et.al. | 2502.19896 | null |
2025-02-24 | Evidence for Low Universal Equilibrium Black Hole Spin in Luminous Magnetically Arrested Disks | Beverly Lowell et.al. | 2502.17559 | null |
2025-02-24 | RELICT: A Replica Detection Framework for Medical Image Generation | Orhun Utku Aydin et.al. | 2502.17360 | link |
2025-02-25 | Evolution 6.0: Evolving Robotic Capabilities Through Generative Design | Muhammad Haris Khan et.al. | 2502.17034 | null |
2025-02-23 | Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control | Jinbo Yan et.al. | 2502.16475 | null |
2025-02-21 | Generative AI Framework for 3D Object Generation in Augmented Reality | Majid Behravan et.al. | 2502.15869 | null |
2025-02-28 | WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents | Xinhang Liu et.al. | 2502.15601 | null |
2025-02-20 | Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting | Boying Li et.al. | 2502.14931 | null |
2025-02-18 | CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image | Kaixin Yao et.al. | 2502.12894 | null |
2025-02-18 | RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation | Chenxi Zheng et.al. | 2502.12640 | null |
2025-02-18 | NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation | Zhiyuan Liu et.al. | 2502.12638 | link |
2025-02-18 | Not-So-Optimal Transport Flows for 3D Point Cloud Generation | Ka-Hei Hui et.al. | 2502.12456 | null |
2025-02-17 | A new convection scheme for GCMs of temperate sub-Neptunes | Edouard F. L. Barrier et.al. | 2502.12234 | null |
2025-02-17 | GaussianMotion: End-to-End Learning of Animatable Gaussian Avatars with Pose Guidance from Text | Gyumin Shim et.al. | 2502.11642 | null |
2025-02-13 | X-SG$^2$S: Safe and Generalizable Gaussian Splatting with X-dimensional Watermarks | Zihang Cheng et.al. | 2502.10475 | null |
2025-02-13 | Latent Radiance Fields with 3D-aware 2D Representations | Chaoyi Zhou et.al. | 2502.09613 | null |
2025-02-17 | ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization | Onat Şahin et.al. | 2502.09278 | null |
2025-02-10 | Grounding Creativity in Physics: A Brief Survey of Physical Priors in AIGC | Siwei Meng et.al. | 2502.07007 | null |
2025-02-10 | Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene | Tai-Yu Pan et.al. | 2502.06682 | null |
2025-02-10 | TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models | Yangguang Li et.al. | 2502.06608 | link |
2025-02-10 | Relativistic Gas Accretion onto Supermassive Black Hole Binaries from Inspiral through Merger | Lorenzo Ennoggi et.al. | 2502.06389 | null |
2025-02-05 | DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization | Zhenglin Zhou et.al. | 2502.04370 | null |
2025-02-04 | ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion | Nissim Maruani et.al. | 2502.02187 | null |
2025-01-31 | TRAPPIST-1 d: Exo-Venus, Exo-Earth or Exo-Dead? | M. J. Way et.al. | 2502.00132 | null |
2025-01-29 | Towards Training-Free Open-World Classification with 3D Generative Models | Xinzhe Xia et.al. | 2501.17547 | null |
2025-01-28 | CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation | Nikolai Kalischek et.al. | 2501.17162 | null |
2025-01-28 | DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation | Chenguo Lin et.al. | 2501.16764 | null |
2025-01-27 | BAG: Body-Aligned 3D Wearable Asset Generation | Zhongjin Luo et.al. | 2501.16177 | null |
2025-01-26 | Comparative clinical evaluation of “memory-efficient” synthetic 3d generative adversarial networks (gan) head-to-head to state of art: results on computed tomography of the chest | Mahshid Shiri et.al. | 2501.15572 | null |
2025-01-22 | InsTex: Indoor Scenes Stylized Texture Synthesis | Yunfan Zhang et.al. | 2501.13969 | null |
2025-01-22 | Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation | Akshay Krishnan et.al. | 2501.13087 | null |
2025-01-17 | Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions | Zhijie Tan et.al. | 2501.10011 | null |
2025-01-16 | CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation | Hwan Heo et.al. | 2501.09433 | link |
2025-01-13 | UnCommon Objects in 3D | Xingchen Liu et.al. | 2501.07574 | link |
2025-01-12 | Synthetic Prior for Few-Shot Drivable Head Avatar Inversion | Wojciech Zielonka et.al. | 2501.06903 | null |
2025-01-09 | Consistent Flow Distillation for Text-to-3D Generation | Runjie Yan et.al. | 2501.05445 | null |
2025-01-09 | Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation | Xuyi Meng et.al. | 2501.05427 | null |
2025-01-07 | Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation | Kam Woh Ng et.al. | 2501.04144 | link |
2025-01-04 | Taming Feed-forward Reconstruction Models as Latent Encoders for 3D Generative Models | Suttisak Wizadwongsa et.al. | 2501.00651 | null |
2024-12-30 | PERSE: Personalized 3D Generative Avatars from A Single Portrait | Hyunsoo Cha et.al. | 2412.21206 | null |
2025-01-02 | Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation | Yuanbo Yang et.al. | 2412.21117 | null |
2024-12-29 | Toward Scene Graph and Layout Guided Complex 3D Scene Generation | Yu-Hsiang Huang et.al. | 2412.20473 | null |
2024-12-26 | Habitability in 4-D: Predicting the Climates of Earth Analogs across Rotation and Orbital Configurations | Arthur D. Adams et.al. | 2412.19357 | link |
2024-12-29 | PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models | Minghao Chen et.al. | 2412.18608 | null |
2024-12-23 | ArchComplete: Autoregressive 3D Architectural Design Generation with Hierarchical Diffusion-Based Upsampling | S. Rasoulzadeh et.al. | 2412.17957 | link |
2024-12-21 | GANFusion: Feed-Forward Text-to-3D with Diffusion in GAN Space | Souhaib Attaiki et.al. | 2412.16717 | null |
2024-12-18 | AdvIRL: Reinforcement Learning-Based Adversarial Attacks on 3D NeRF Models | Tommy Nguyen et.al. | 2412.16213 | link |
2024-12-20 | GCA-3D: Towards Generalized and Consistent Domain Adaptation of 3D Generators | Hengjia Li et.al. | 2412.15491 | null |
2024-12-18 | DreaMark: Rooting Watermark in Score Distillation Sampling Generated Neural Radiance Fields | Xingyu Zhu et.al. | 2412.15278 | null |
2024-12-19 | DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation | Wang Zhao et.al. | 2412.15200 | null |
2024-12-19 | LiftRefine: Progressively Refined View Synthesis from 3D Lifting with Volume-Triplane Representations | Tung Do et.al. | 2412.14464 | null |
2024-12-18 | GraphicsDreamer: Image to 3D Generation with Physical Consistency | Pei Chen et.al. | 2412.14214 | null |
2024-12-15 | Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation | Yujie Zhang et.al. | 2412.11170 | null |
2024-12-17 | Virtual Trial Room with Computer Vision and Machine Learning | Tulashi Prasad Joshi et.al. | 2412.10710 | null |
2024-12-13 | GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark | Sitong Su et.al. | 2412.09997 | null |
2024-12-11 | DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models | Kevin Miao et.al. | 2412.09648 | null |
2024-12-19 | SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing | Xueting Li et.al. | 2412.09545 | null |
2024-12-09 | Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation | Ruihan Gao et.al. | 2412.06785 | link |
2024-12-09 | Diverse Score Distillation | Yanbo Xu et.al. | 2412.06780 | null |
2024-12-14 | You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale | Baorui Ma et.al. | 2412.06699 | link |
2024-12-09 | Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy | Yuxuan Xue et.al. | 2412.06698 | null |
2024-12-08 | Enhanced 3D Generation by 2D Editing | Haoran Li et.al. | 2412.05929 | null |
2024-12-07 | Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation | Wenqing Wang et.al. | 2412.05560 | null |
2024-12-06 | DNF: Unconditional 4D Generation with Dictionary-based Neural Fields | Xinyi Zhang et.al. | 2412.05161 | null |
2024-12-05 | PaintScene4D: Consistent 4D Scene Generation from Text Prompts | Vinayak Gupta et.al. | 2412.04471 | null |
2024-12-05 | Turbo3D: Ultra-fast Text-to-3D Generation | Hanzhe Hu et.al. | 2412.04470 | null |
2024-12-05 | InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models | Yifan Lu et.al. | 2412.03934 | null |
2024-12-04 | MV-Adapter: Multi-view Consistent Image Generation Made Easy | Zehuan Huang et.al. | 2412.03632 | null |
2024-12-04 | MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation | Zehuan Huang et.al. | 2412.03558 | null |
2024-12-04 | CLAS: A Machine Learning Enhanced Framework for Exploring Large 3D Design Datasets | XiuYu Zhang et.al. | 2412.02996 | null |
2024-12-03 | Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation | Yiftach Edelstein et.al. | 2412.02631 | null |
2024-12-03 | Continual Learning of Personalized Generative Face Models with Experience Replay | Annie N. Wang et.al. | 2412.02627 | null |
2024-12-03 | HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset | Zedong Chu et.al. | 2412.02317 | link |
2024-12-03 | Viewpoint Consistency in 3D Generation via Attention and CLIP Guidance | Qing Zhang et.al. | 2412.02287 | null |
2024-12-03 | 3D representation in 512-Byte: Variational tokenizer is the key for autoregressive 3D generation | Jinzhi Zhang et.al. | 2412.02202 | null |
2024-12-03 | CLERF: Contrastive LEaRning for Full Range Head Pose Estimation | Ting-Ruen Wei et.al. | 2412.02066 | null |
2024-12-02 | World-consistent Video Diffusion with Explicit 3D Modeling | Qihang Zhang et.al. | 2412.01821 | null |
2024-12-02 | Structured 3D Latents for Scalable and Versatile 3D Generation | Jianfeng Xiang et.al. | 2412.01506 | link |
2024-11-30 | Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects | Amir Barda et.al. | 2412.00518 | null |
2024-11-28 | 3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes | Tejaswini Medi et.al. | 2411.19037 | null |
2024-11-28 | RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning | Jiacheng Wang et.al. | 2411.18866 | null |
2024-11-27 | G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation | Tianxing Chen et.al. | 2411.18369 | null |
2024-11-27 | ModeDreamer: Mode Guiding Score Distillation for Text-to-3D Generation using Reference Image Prompts | Uy Dieu Tran et.al. | 2411.18135 | null |
2024-11-26 | Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation | Xiang Li et.al. | 2411.17763 | null |
2024-11-27 | SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE | Yongwei Chen et.al. | 2411.16856 | null |
2024-11-27 | DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow | Ken Deng et.al. | 2411.16820 | null |
2024-11-25 | SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis | Hyojun Go et.al. | 2411.16443 | link |
2024-11-24 | Fixing the Perspective: A Critical Examination of Zero-1-to-3 | Jack Yu et.al. | 2411.15706 | null |
2024-11-26 | Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction | Huiwon Jang et.al. | 2411.14762 | null |
2024-11-22 | Any-to-3D Generation via Hybrid Diffusion Supervision | Yijun Fan et.al. | 2411.14715 | null |
2024-11-26 | Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation | Yuanhao Cai et.al. | 2411.14384 | null |
2024-11-19 | Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting | Haoyu Zhao et.al. | 2411.12789 | null |
2024-11-21 | FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting | Fangyu Wu et.al. | 2411.12089 | null |
2024-11-18 | sMoRe: Enhancing Object Manipulation and Organization in Mixed Reality Spaces with LLMs and Generative AI | Yunhao Xing et.al. | 2411.11752 | null |
2024-11-18 | MVLight: Relightable Text-to-3D Generation via Light-conditioned Multi-View Diffusion | Dongseok Shim et.al. | 2411.11475 | null |
2024-11-18 | Thickness-dependent Topological Phases and Flat Bands in Rhombohedral Multilayer Graphene | H. B. Xiao et.al. | 2411.11359 | null |
2024-11-17 | Direct and Explicit 3D Generation from a Single Image | Haoyu Wu et.al. | 2411.10947 | null |
2024-11-16 | ARM: Appearance Reconstruction Model for Relightable 3D Generation | Xiang Feng et.al. | 2411.10825 | null |
2024-11-14 | LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models | Zhengyi Wang et.al. | 2411.09595 | null |
2024-11-16 | A Survey on Vision Autoregressive Model | Kai Jiang et.al. | 2411.08666 | null |
2024-11-12 | GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation | Yushi Lan et.al. | 2411.08033 | null |
2024-11-12 | Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings | Aditya Sanghi et.al. | 2411.08017 | link |
2024-11-16 | SAMPart3D: Segment Any Part in 3D Objects | Yunhan Yang et.al. | 2411.07184 | link |
2024-11-09 | AI-Driven Stylization of 3D Environments | Yuanbo Chen et.al. | 2411.06067 | null |
2024-11-08 | Autoregressive Models in Vision: A Survey | Jing Xiong et.al. | 2411.05902 | link |
2024-11-07 | DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion | Wenqiang Sun et.al. | 2411.04928 | null |
2024-11-05 | Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation | Xianghui Yang et.al. | 2411.02293 | null |
2024-11-03 | DreamPolish: Domain Score Distillation With Progressive Geometry Generation | Yean Cheng et.al. | 2411.01602 | null |
2024-10-31 | Manipulating Vehicle 3D Shapes through Latent Space Editing | JiangDong Miao et.al. | 2410.23931 | null |
2024-11-01 | Fast Transients from Magnetic Disks Around Non-Spinning Collapsar Black Holes | Justin Bopp et.al. | 2410.22401 | null |
2024-10-16 | TV-3DG: Mastering Text-to-3D Customized Generation with Visual Prompt | Jiahui Yang et.al. | 2410.21299 | null |
2024-10-28 | CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians | Chongjian Ge et.al. | 2410.20723 | null |
2024-10-30 | DiffGS: Functional Gaussian Splatting Diffusion | Junsheng Zhou et.al. | 2410.19657 | null |
2024-10-24 | 3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation | Hansheng Chen et.al. | 2410.18974 | link |
2024-10-23 | GenUDC: High Quality 3D Mesh Generation with Unsigned Dual Contouring Representation | Ruowei Wang et.al. | 2410.17802 | link |
2024-10-23 | Under the magnifying glass: A combined 3D model applied to cloudy warm Saturn type exoplanets around M-dwarfs | Sven Kiefer et.al. | 2410.17716 | null |
2024-10-21 | MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors | Honghua Chen et.al. | 2410.16272 | null |
2024-10-22 | LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images | Hao He et.al. | 2410.15636 | null |
2024-10-20 | Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint | Junwei Zhou et.al. | 2410.15391 | null |
2024-10-16 | DreamCraft3D++: Efficient Hierarchical 3D Generation with Multi-Plane Reconstruction Model | Jingxiang Sun et.al. | 2410.12928 | null |
2024-10-15 | Robotic Arm Platform for Multi-View Image Acquisition and 3D Reconstruction in Minimally Invasive Surgery | Alexander Saikia et.al. | 2410.11703 | null |
2024-10-15 | Evolutionary Retrofitting | Mathurin Videau et.al. | 2410.11330 | null |
2024-10-13 | GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation | Dingdong Yang et.al. | 2410.10037 | null |
2024-10-12 | ControLRM: Fast and Controllable 3D Generation via Large Reconstruction Model | Hongbin Xu et.al. | 2410.09592 | null |
2024-10-12 | Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | Hritam Basak et.al. | 2410.09467 | null |
2024-10-11 | SceneCraft: Layout-Guided 3D Scene Generation | Xiuyu Yang et.al. | 2410.09049 | link |
2024-10-11 | Semantic Score Distillation Sampling for Compositional Text-to-3D Generation | Ling Yang et.al. | 2410.09009 | link |
2024-10-11 | One-shot Generative Domain Adaptation in 3D GANs | Ziqiang Li et.al. | 2410.08824 | link |
2024-10-10 | RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image | Xiaoxue Chen et.al. | 2410.08181 | null |
2024-10-10 | SeMv-3D: Towards Semantic and Mutil-view Consistency simultaneously for General Text-to-3D Generation with Triplane Priors | Xiao Cai et.al. | 2410.07658 | null |
2024-10-09 | DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation | Zhiqi Li et.al. | 2410.06756 | null |
2024-10-02 | OCC-MLLM-Alpha: Empowering Multi-modal Large Language Model for the Understanding of Occluded Objects with Self-Supervised Test-Time Learning | Shuxin Yang et.al. | 2410.01861 | null |
2024-10-02 | Towards Native Generative Model for 3D Head Avatar | Yiyu Zhuang et.al. | 2410.01226 | null |
2024-10-01 | Extreme scale height variations and nozzle shocks in warped disks | Nicholas Kaaz et.al. | 2410.00961 | null |
2024-10-02 | Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation | Junlin Han et.al. | 2410.00890 | null |
2024-09-29 | Global well-posedness of the fractional dissipative system in the framework of variable Fourier–Besov spaces | Gastón Vergara-Hermosilla et.al. | 2410.00060 | null |
2024-09-30 | Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images | Bahri Batuhan Bilecen et.al. | 2409.20530 | null |
2024-09-27 | Speech to Reality: On-Demand Production using Natural Language, 3D Generative AI, and Discrete Robotic Assembly | Alexander Htet Kyaw et.al. | 2409.18390 | null |
2024-09-26 | Long-lived neutron-star remnants from asymmetric binary neutron star mergers: element formation, kilonova signals and gravitational waves | Sebastiano Bernuzzi et.al. | 2409.18185 | null |
2024-09-25 | Disco4D: Disentangled 4D Human Generation and Animation from a Single Image | Hui En Pang et.al. | 2409.17280 | null |
2024-09-19 | 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion | Zhaoxi Chen et.al. | 2409.12957 | link |
2024-09-18 | Vista3D: Unravel the 3D Darkside of a Single Image | Qiuhong Shen et.al. | 2409.12193 | link |
2024-09-17 | Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion | Zhenwei Wang et.al. | 2409.11406 | null |
2024-09-16 | The Spin Zone: Synchronously and Asynchronously Rotating Exoplanets Have Spectral Differences in Transmission | Nicholas Scarsdale et.al. | 2409.10752 | null |
2024-09-11 | DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation | Haibo Yang et.al. | 2409.07454 | null |
2024-09-11 | Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Haibo Yang et.al. | 2409.07452 | link |
2024-09-11 | Some effects of limited wall-sensor availability on flow estimation with 3D-GANs | Antonio Cuéllar et.al. | 2409.07348 | null |
2024-09-11 | Detectability Simulations of a NIR Surface Biosignature on Proxima Centauri b with Future Space Observatories | Connor O. Metz et.al. | 2409.07289 | null |
2024-09-12 | 3DGCQA: A Quality Assessment Database for 3D AI-Generated Contents | Yingjie Zhou et.al. | 2409.07236 | link |
2024-09-10 | G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer | Jinzhi Zhang et.al. | 2409.06322 | null |
2024-09-19 | DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping | Zeyu Cai et.al. | 2409.05099 | null |
2024-09-04 | Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models | Zhibin Liu et.al. | 2409.02851 | link |
2024-09-03 | ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis | Wangbo Yu et.al. | 2409.02048 | null |
2024-08-27 | OctFusion: Octree-based Diffusion Models for 3D Shape Generation | Bojun Xiong et.al. | 2408.14732 | link |
2024-08-28 | PhysPart: Physically Plausible Part Completion for Interactable Objects | Rundong Luo et.al. | 2408.13724 | null |
2024-08-26 | Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation | Bonan Li et.al. | 2408.13149 | null |
2024-08-23 | Atlas Gaussians Diffusion for 3D Generation with Infinite Number of Points | Haitao Yang et.al. | 2408.13055 | null |
2024-08-22 | Multimodal Foundational Models for Unsupervised 3D General Obstacle Detection | Tamás Matuszka et.al. | 2408.12322 | null |
2024-08-27 | Pano2Room: Novel View Synthesis from a Single Indoor Panorama | Guo Pu et.al. | 2408.11413 | link |
2024-08-20 | Large Point-to-Gaussian Model for Image-to-3D Generation | Longfei Lu et.al. | 2408.10935 | null |
2024-08-19 | SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views | Chao Xu et.al. | 2408.10195 | null |
2024-08-15 | Single-image coherent reconstruction of objects and humans | Sarthak Batra et.al. | 2408.08086 | null |
2024-08-15 | MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing | Chenjie Cao et.al. | 2408.08000 | null |
2024-08-12 | Efficient and Scalable Point Cloud Generation with Sparse Point-Voxel Diffusion Models | Ioannis Romanelis et.al. | 2408.06145 | link |
2024-08-12 | Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation | Utkarsh Nath et.al. | 2408.05938 | null |
2024-08-09 | DreamCouple: Exploring High Quality Text-to-3D Generation Via Rectified Flow | Hangyu Li et.al. | 2408.05008 | null |
2024-08-06 | An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion | Xingguang Yan et.al. | 2408.03178 | null |
2024-08-09 | DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model | Yiming Zhong et.al. | 2408.02993 | link |
2024-08-05 | SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements | Hou In Ivan Tam et.al. | 2408.02211 | null |
2024-08-02 | A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness | Lutao Jiang et.al. | 2408.01269 | null |
2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Yanpeng Zhao et.al. | 2407.20908 | link |
2024-07-28 | Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle | Zhenyu Tang et.al. | 2407.19548 | null |
2024-07-25 | Signatures of Low Mass Black Hole-Neutron Star Mergers | Rahime Matur et.al. | 2407.18045 | null |
2024-07-23 | She’s Got Her Mother’s Hair: End-to-End Collapsar Simulations Unveil the Origin of Black Holes’ Magnetic Field | Ore Gottlieb et.al. | 2407.16745 | null |
2024-07-23 | DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors | Zizheng Yan et.al. | 2407.16260 | null |
2024-07-19 | HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation | Zezeng Li et.al. | 2407.14419 | null |
2024-07-19 | PlacidDreamer: Advancing Harmony in Text-to-3D Generation | Shuo Huang et.al. | 2407.13976 | link |
2024-07-20 | Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation | Zongrui Li et.al. | 2407.13584 | link |
2024-07-17 | 4Dynamic: Text-to-4D Generation with Hybrid Priors | Yu-Jie Yuan et.al. | 2407.12684 | null |
2024-07-17 | JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation | Chenhan Jiang et.al. | 2407.12291 | null |
2024-07-16 | Superintegrable families of magnetic monopoles with non-radial potential in curved background | Antonella Marchesiello et.al. | 2407.11709 | null |
2024-07-17 | VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation | Wenjie Zhuo et.al. | 2407.09822 | null |
2024-07-08 | Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images | Zhangyang Qi et.al. | 2407.06191 | null |
2024-07-08 | On a new 3D generalized Hunter-Saxton equation | Sergei Sakovich et.al. | 2407.05723 | null |
2024-07-05 | Benchmarking structure-based three-dimensional molecular generative models using GenBench3D: ligand conformation quality matters | Benoit Baillif et.al. | 2407.04424 | link |
2024-07-05 | Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos | Leonhard Sommer et.al. | 2407.04384 | link |
2024-07-03 | NEBULA: Neural Empirical Bayes Under Latent Representations for Efficient and Controllable Design of Molecular Libraries | Ewa M. Nowara et.al. | 2407.03428 | link |
2024-07-02 | Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials | Yawar Siddiqui et.al. | 2407.02445 | null |
2024-07-02 | ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation | Zhiyuan Ma et.al. | 2407.02040 | link |
2024-07-01 | fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence | Francis Williams et.al. | 2407.01781 | null |
2024-07-01 | VolETA: One- and Few-shot Food Volume Estimation | Ahmad AlMughrabi et.al. | 2407.01717 | link |
2024-07-01 | GaussianStego: A Generalizable Stenography Pipeline for Generative 3D Gaussians Splatting | Chenxin Li et.al. | 2407.01301 | null |
2024-06-27 | From Efficient Multimodal Models to World Models: A Survey | Xinji Mai et.al. | 2407.00118 | null |
2024-06-27 | In LIGO’s Sight? Vigorous Coherent Gravitational Waves from Cooled Collapsar Disks | Ore Gottlieb et.al. | 2406.19452 | null |
2024-06-26 | Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling | Abril Corona-Figueroa et.al. | 2406.18422 | link |
2024-06-25 | Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text | Xinyang Li et.al. | 2406.17601 | link |
2024-06-25 | Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds | Hongliang Zeng et.al. | 2406.17342 | null |
2024-07-01 | Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling | Min-Seop Kwak et.al. | 2406.16695 | null |
2024-06-24 | YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals | Sandeep Mishra et.al. | 2406.16273 | null |
2024-06-21 | GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation | Chubin Zhang et.al. | 2406.15333 | link |
2024-06-21 | A3D: Does Diffusion Dream about 3D Alignment? | Savva Ignatyev et.al. | 2406.15020 | null |
2024-06-21 | VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation | Zixuan Chen et.al. | 2406.14964 | null |
2024-06-14 | OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control | Yuzhong Huang et.al. | 2406.10000 | null |
2024-06-14 | GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion | Trapoom Ukarapol et.al. | 2406.09850 | link |
2024-06-15 | 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction | Tianqi Chen et.al. | 2406.08374 | null |
2024-06-12 | Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata | Dongsu Zhang et.al. | 2406.08292 | null |
2024-06-12 | SynthForge: Synthesizing High-Quality Face Dataset with Controllable 3D Generative Models | Abhay Rawat et.al. | 2406.07840 | null |
2024-06-11 | C3DAG: Controlled 3D Animal Generation using 3D pose guidance | Sandeep Mishra et.al. | 2406.07742 | null |
2024-06-11 | Instant 3D Human Avatar Generation using Image Diffusion Models | Nikos Kolotouros et.al. | 2406.07516 | null |
2024-06-11 | 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models | Heng Yu et.al. | 2406.07472 | null |
2024-06-11 | Efficient 3D Molecular Generation with Flow Matching and Scale Optimal Transport | Ross Irwin et.al. | 2406.07266 | null |
2024-06-10 | PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation | Zhenyu Li et.al. | 2406.06679 | null |
2024-06-10 | GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation | Haozhe Xie et.al. | 2406.06526 | link |
2024-06-10 | MVGamba: Unify 3D Content Generation as State Space Sequence Modeling | Xuanyu Yi et.al. | 2406.06367 | link |
2024-06-09 | GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement | Peiye Zhuang et.al. | 2406.05649 | null |
2024-06-11 | Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion | Fangfu Liu et.al. | 2406.04338 | null |
2024-06-07 | DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data | Qihao Liu et.al. | 2406.04322 | link |
2024-06-07 | GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions | Salvatore Esposito et.al. | 2406.04254 | null |
2024-06-05 | Text-to-Image Rectified Flow as Plug-and-Play Priors | Xiaofeng Yang et.al. | 2406.03293 | link |
2024-06-05 | Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion | Hao Wen et.al. | 2406.03184 | link |
2024-06-05 | Adversarial Generation of Hierarchical Gaussians for 3D Generative Model | Sangeek Hyun et.al. | 2406.02968 | link |
2024-06-03 | TAGMol: Target-Aware Gradient-guided Molecule Generation | Vineeth Dorna et.al. | 2406.01650 | link |
2024-06-03 | Tetrahedron Splatting for 3D Generation | Chun Gu et.al. | 2406.01579 | link |
2024-06-04 | Towards Practical Single-shot Motion Synthesis | Konstantinos Roditakis et.al. | 2406.01136 | null |
2024-06-02 | Freeplane: Unlocking Free Lunch in Triplane-Based Sparse-View Reconstruction Models | Wenqiang Sun et.al. | 2406.00750 | null |
2024-06-04 | Lay-A-Scene: Personalized 3D Object Arrangement Using Text-to-Image Priors | Ohad Rahamim et.al. | 2406.00687 | link |
2024-05-31 | Fourier123: One Image to High-Quality 3D Object Generation with Hybrid Fourier Score Distillation | Shuzhou Yang et.al. | 2405.20669 | link |
2024-05-30 | What makes a cosmic filament? The dynamical origin and identity of filaments I. fundamentals in 2D | Job Feldbrugge et.al. | 2405.20475 | null |
2024-05-30 | GECO: Generative Image-to-3D within a SECOnd | Chen Wang et.al. | 2405.20327 | null |
2024-06-05 | PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting | Qiaowei Miao et.al. | 2405.19957 | link |
2024-05-28 | Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication | Yunuo Chen et.al. | 2405.18515 | null |
2024-05-28 | SubDLe: identification of substructures in cosmological simulations with deep learning | Michela Esposito et.al. | 2405.18257 | null |
2024-05-27 | PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance | Haohan Weng et.al. | 2405.16890 | null |
2024-05-27 | Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation | Zhoujie Fu et.al. | 2405.16849 | null |
2024-05-24 | ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching | Yumin Zhang et.al. | 2405.15914 | link |
2024-05-24 | Score Distillation via Reparametrized DDIM | Artem Lukoianov et.al. | 2405.15891 | link |
2024-05-24 | Automating the Diagnosis of Human Vision Disorders by Cross-modal 3D Generation | Li Zhang et.al. | 2405.15239 | link |
2024-05-23 | CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner | Weiyu Li et.al. | 2405.14979 | link |
2024-05-23 | Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer | Shuang Wu et.al. | 2405.14832 | null |
2024-05-23 | MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes | Ruiyuan Gao et.al. | 2405.14475 | null |
2024-05-22 | Multi-Zone Modeling of Black Hole Accretion and Feedback in 3D GRMHD: Bridging Vast Spatial and Temporal Scales | Hyerin Cho et.al. | 2405.13887 | null |
2024-05-22 | Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children’s Engagement in Storytelling | Yibo Wang et.al. | 2405.13701 | null |
2024-05-18 | Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching | Xingyu Miao et.al. | 2405.11252 | link |
2024-05-16 | Flow Score Distillation for Diverse Text-to-3D Generation | Runjie Yan et.al. | 2405.10988 | null |
2024-05-23 | Describing heat dissipation in the resistive state of three-dimensional superconductors | Leonardo Rodrigues Cadorim et.al. | 2405.10415 | null |
2024-05-16 | Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion | Xinyang Li et.al. | 2405.09874 | null |
2024-05-16 | The metallicity and carbon-to-oxygen ratio of the ultra-hot Jupiter WASP-76b from Gemini-S/IGRINS | Megan Weiner Mansfield et.al. | 2405.09769 | null |
2024-05-15 | A Survey On Text-to-3D Contents Generation In The Wild | Chenhan Jiang et.al. | 2405.09431 | null |
2024-05-15 | 3D Shape Augmentation with Content-Aware Shape Resizing | Mingxiang Chen et.al. | 2405.09050 | null |
2024-05-13 | DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation | Ziang Cao et.al. | 2405.08055 | link |
2024-05-13 | Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning | Wenqi Dong et.al. | 2405.08054 | null |
2024-05-14 | SketchDream: Sketch-based Text-to-3D Generation and Editing | Feng-Lin Liu et.al. | 2405.06461 | null |
2024-04-30 | GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting | Kai Zhang et.al. | 2404.19702 | null |
2024-04-30 | MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction | Luxi Chen et.al. | 2404.19525 | link |
2024-04-26 | Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation | Seungwook Kim et.al. | 2404.17419 | null |
2024-04-25 | Interactive3D: Create What You Want by Interactive 3D Generation | Shaocong Dong et.al. | 2404.16510 | null |
2024-04-22 | X-Ray: A Sequential 3D Representation for Generation | Tao Hu et.al. | 2404.14329 | link |
2024-04-18 | MeshLRM: Large Reconstruction Model for High-Quality Mesh | Xinyue Wei et.al. | 2404.12385 | null |
2024-04-17 | Shaping Realities: Enhancing 3D Generative AI with Fabrication Constraints | Faraz Faruqi et.al. | 2404.10142 | null |
2024-04-14 | InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models | Jiale Xu et.al. | 2404.07191 | link |
2024-04-10 | Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior | Fan Lu et.al. | 2404.06780 | null |
2024-04-09 | Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion | Fan Yang et.al. | 2404.06429 | link |
2024-04-09 | DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation | Junkai Yan et.al. | 2404.06119 | link |
2024-04-09 | Hash3D: Training-free Acceleration for 3D Generation | Xingyi Yang et.al. | 2404.06091 | link |
2024-04-08 | StylizedGS: Controllable Stylization for 3D Gaussian Splatting | Dingxi Zhang et.al. | 2404.05220 | null |
2024-04-11 | Diffusion Time-step Curriculum for One Image to 3D Generation | Xuanyu Yi et.al. | 2404.04562 | link |
2024-04-03 | Design2Cloth: 3D Cloth Generation from 2D Masks | Jiali Zheng et.al. | 2404.02686 | null |
2024-04-02 | Towards Robust 3D Pose Transfer with Adversarial Learning | Haoyu Chen et.al. | 2404.02242 | null |
2024-04-02 | Black Hole-Disk Interactions in Magnetically Arrested Active Galactic Nuclei: General Relativistic Magnetohydrodynamic Simulations Using A Time-Dependent, Binary Metric | Sean M. Ressler et.al. | 2404.02193 | null |
2024-04-02 | Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models | Zeyu Yang et.al. | 2404.02148 | link |
2024-04-07 | Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation | Wangguandong Zheng et.al. | 2404.01843 | null |
2024-04-01 | FlexiDreamer: Single Image-to-3D Generation with FlexiCubes | Ruowen Zhao et.al. | 2404.00987 | link |
2024-03-29 | Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior | Jaehoon Ko et.al. | 2403.20153 | link |
2024-04-05 | GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling | Bowen Zhang et.al. | 2403.19655 | null |
2024-03-28 | Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation | Yujin Chen et.al. | 2403.19319 | null |
2024-03-29 | Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction | Qiuhong Shen et.al. | 2403.18795 | link |
2024-03-25 | DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion | Yuanze Lin et.al. | 2403.17237 | null |
2024-03-25 | VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation | Yang Chen et.al. | 2403.17001 | null |
2024-03-25 | Exploiting Priors from 3D Diffusion Models for RGB-Based One-Shot View Planning | Sicong Pan et.al. | 2403.16803 | link |
2024-03-22 | InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Sisi Dai et.al. | 2403.15612 | link |
2024-03-22 | LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis | Kevin Xie et.al. | 2403.15385 | null |
2024-03-22 | ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars | Zhenwei Wang et.al. | 2403.15383 | link |
2024-03-22 | DreamFlow: High-Quality Text-to-3D Generation by Approximating Probability Flow | Kyungmin Lee et.al. | 2403.14966 | null |
2024-03-22 | STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians | Yifei Zeng et.al. | 2403.14939 | null |
2024-03-21 | DreamReward: Text-to-3D Generation with Human Preference | Junliang Ye et.al. | 2403.14613 | null |
2024-03-20 | Compress3D: a Compressed Latent Space for 3D Generation from a Single Image | Bowen Zhang et.al. | 2403.13524 | null |
2024-03-17 | General Line Coordinates in 3D | Joshua Martinez et.al. | 2403.13014 | null |
2024-03-19 | GVGEN: Text-to-3D Generation with Volumetric Representation | Xianglong He et.al. | 2403.12957 | null |
2024-03-19 | Precise-Physics Driven Text-to-3D Generation | Qingshan Xu et.al. | 2403.12438 | null |
2024-03-19 | ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance | Yongwei Chen et.al. | 2403.12409 | null |
2024-03-18 | VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models | Junlin Han et.al. | 2403.12034 | null |
2024-03-19 | Generic 3D Diffusion Adapter Using Controlled Multi-View Editing | Hansheng Chen et.al. | 2403.12032 | link |
2024-03-18 | LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation | Yushi Lan et.al. | 2403.12019 | link |
2024-03-18 | SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion | Vikram Voleti et.al. | 2403.12008 | null |
2024-03-17 | BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis | Lutao Jiang et.al. | 2403.11273 | link |
2024-03-15 | Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding | Pengkun Liu et.al. | 2403.10395 | link |
2024-03-19 | Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting | Zhiqi Li et.al. | 2403.09981 | link |
2024-03-14 | Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation | Fangfu Liu et.al. | 2403.09625 | null |
2024-03-14 | Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph | Donglin Di et.al. | 2403.09236 | link |
2024-03-14 | Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior | Cheng Chen et.al. | 2403.09140 | null |
2024-03-13 | UniLiDAR: Bridge the domain gap among different LiDARs for continual learning | Zikun Xu et.al. | 2403.08512 | null |
2024-03-11 | 3D simulations of TRAPPIST-1e with varying CO2, CH4 and haze profiles | Mei Ting Mak et.al. | 2403.06928 | null |
2024-03-11 | ExoCubed: A Riemann-Solver based Cubed-Sphere Dynamic Core for Planetary Atmospheres | Sihe Chen et.al. | 2403.06844 | link |
2024-03-11 | V3D: Video Diffusion Models are Effective 3D Generators | Zilong Chen et.al. | 2403.06738 | link |
2024-03-11 | 3D-aware Image Generation and Editing with Multi-modal Conditions | Bo Li et.al. | 2403.06470 | null |
2024-03-08 | CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model | Zhengyi Wang et.al. | 2403.05034 | null |
2024-03-04 | 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors | Fangzhou Hong et.al. | 2403.02234 | link |
2024-03-04 | TripoSR: Fast 3D Object Reconstruction from a Single Image | Dmitry Tochilkin et.al. | 2403.02151 | link |
2024-03-08 | G3DR: Generative 3D Reconstruction in ImageNet | Pradyumna Reddy et.al. | 2403.00939 | link |
2024-02-28 | The VOROS: Lifting ROC curves to 3D | Christopher Ratigan et.al. | 2402.18689 | link |
2024-02-27 | DivAvatar: Diverse 3D Avatar Generation with a Single Prompt | Weijing Tao et.al. | 2402.17292 | null |
2024-02-22 | Place Anything into Any Video | Ziling Liu et.al. | 2402.14316 | null |
2024-02-22 | MVD$^2$: Efficient Multiview 3D Reconstruction for Multiview Diffusion | Xin-Yang Zheng et.al. | 2402.14253 | null |
2024-02-20 | MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction | Shitao Tang et.al. | 2402.12712 | null |
2024-02-19 | Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability | Xuelin Qian et.al. | 2402.12225 | null |
2024-02-13 | IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation | Luke Melas-Kyriazi et.al. | 2402.08682 | null |
2024-02-11 | GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting | Xiaoyu Zhou et.al. | 2402.07207 | null |
2024-02-08 | AvatarMMC: 3D Head Avatar Generation and Editing with Multi-Modal Conditioning | Wamiq Reyaz Para et.al. | 2402.05803 | null |
2024-02-07 | SPAD : Spatially Aware Multiview Diffusers | Yash Kant et.al. | 2402.05235 | null |
2024-02-05 | Retrieval-Augmented Score Distillation for Text-to-3D Generation | Junyoung Seo et.al. | 2402.02972 | link |
2024-02-02 | A Comprehensive Survey on 3D Content Generation | Jian Liu et.al. | 2402.01166 | link |
3D Gaussian Splatting
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-26 | MADrive: Memory-Augmented Driving Scene Modeling | Polina Karpikova et.al. | 2506.21520 | null |
2025-06-26 | EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting | Taoyu Wu et.al. | 2506.21420 | null |
2025-06-26 | Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image | Pufan Li et.al. | 2506.21152 | null |
2025-06-26 | User-in-the-Loop View Sampling with Error Peaking Visualization | Ayaka Yasunaga et.al. | 2506.21009 | null |
2025-06-25 | 3DGH: 3D Head Generation with Composable Hair and Face | Chengan He et.al. | 2506.20875 | null |
2025-06-24 | Virtual Memory for 3D Gaussian Splatting | Jonathan Haberl et.al. | 2506.19415 | null |
2025-06-23 | GRAND-SLAM: Local Optimization for Globally Consistent Large-Scale Multi-Agent Gaussian SLAM | Annika Thomas et.al. | 2506.18885 | null |
2025-06-23 | Reconstructing Tornadoes in 3D with Gaussian Splatting | Adam Yang et.al. | 2506.18677 | null |
2025-06-21 | 3D Gaussian Splatting for Fine-Detailed Surface Reconstruction in Large-Scale Scene | Shihan Chen et.al. | 2506.17636 | null |
2025-06-20 | Part$^{2}$GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting | Tianjiao Yu et.al. | 2506.17212 | null |
2025-06-23 | R3eVision: A Survey on Robust Rendering, Restoration, and Enhancement for 3D Low-Level Vision | Weeyoung Kwon et.al. | 2506.16262 | link |
2025-06-24 | RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories | Qingsong Yan et.al. | 2506.15242 | null |
2025-06-17 | Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction | Zhengquan Zhang et.al. | 2506.14856 | null |
2025-06-17 | 3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting | Yuke Xing et.al. | 2506.14642 | link |
2025-06-17 | HRGS: Hierarchical Gaussian Splatting for Memory-Efficient High-Resolution 3D Reconstruction | Changbai Li et.al. | 2506.14229 | null |
2025-06-23 | GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation | Ying Chai et.al. | 2506.14135 | null |
2025-06-16 | GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics | Qianzhong Chen et.al. | 2506.14009 | null |
2025-06-16 | PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images | Lingteng Qiu et.al. | 2506.13766 | null |
2025-06-16 | Multiview Geometric Regularization of Gaussian Splatting for Accurate Radiance Fields | Jungeon Kim et.al. | 2506.13508 | null |
2025-06-16 | GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction | Jinguang Tong et.al. | 2506.13110 | null |
2025-06-15 | Metropolis-Hastings Sampling for 3D Gaussian Reconstruction | Hyunjin Kim et.al. | 2506.12945 | null |
2025-06-17 | Efficient multi-view training for 3D Gaussian Splatting | Minhyuk Choi et.al. | 2506.12727 | null |
2025-06-14 | Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting | Hongbi Zhou et.al. | 2506.12400 | null |
2025-06-12 | PointGS: Point Attention-Aware Sparse View Synthesis with Gaussian Splatting | Lintao Xiang et.al. | 2506.10335 | null |
2025-06-11 | DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos | Chieh Hubert Lin et.al. | 2506.09997 | null |
2025-06-11 | Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS | Tao Wang et.al. | 2506.09534 | null |
2025-06-11 | HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene | Jianing Chen et.al. | 2506.09518 | null |
2025-06-11 | TinySplat: Feedforward Approach for Generating Compact 3D Scene Representation | Zetian Song et.al. | 2506.09479 | null |
2025-06-12 | ODG: Occupancy Prediction Using Dual Gaussians | Yunxiao Shi et.al. | 2506.09417 | null |
2025-06-10 | StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams | Zike Wu et.al. | 2506.08862 | link |
2025-06-11 | Gaussian2Scene: 3D Scene Representation Learning via Self-supervised Learning with 3D Gaussian Splatting | Keyi Liu et.al. | 2506.08777 | null |
2025-06-10 | SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting | Mengjiao Ma et.al. | 2506.08710 | null |
2025-06-10 | Complex-Valued Holographic Radiance Fields | Yicheng Zhan et.al. | 2506.08350 | null |
2025-06-09 | Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes | Allen Tu et.al. | 2506.07917 | link |
2025-06-09 | GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution | Shuja Khalid et.al. | 2506.07897 | null |
2025-06-09 | R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation | William Ljungbergh et.al. | 2506.07826 | null |
2025-06-09 | OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting | Jens Piekenbrinck et.al. | 2506.07697 | null |
2025-06-09 | ProSplat: Improved Feed-Forward 3D Gaussian Splatting for Wide-Baseline Sparse Views | Xiaohan Lu et.al. | 2506.07670 | null |
2025-06-09 | PIG: Physically-based Multi-Material Interaction with 3D Gaussians | Zeyu Xiao et.al. | 2506.07657 | null |
2025-06-09 | Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation | Yijie Deng et.al. | 2506.07338 | null |
2025-06-08 | Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization | Zhican Wang et.al. | 2506.07069 | null |
2025-06-08 | Hybrid Mesh-Gaussian Representation for Efficient Indoor Scene Reconstruction | Binxiao Huang et.al. | 2506.06988 | null |
2025-06-07 | Gaussian Mapping for Evolving Scenes | Vladimir Yugay et.al. | 2506.06909 | null |
2025-06-06 | Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments | Mingrui Li et.al. | 2506.05965 | null |
2025-06-06 | SurGSplat: Progressive Geometry-Constrained Gaussian Splatting for Surgical Scene Reconstruction | Yuchao Zheng et.al. | 2506.05935 | null |
2025-06-06 | Lumina: Real-Time Mobile Neural Rendering by Exploiting Computational Redundancy | Yu Feng et.al. | 2506.05682 | null |
2025-06-05 | VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction | Ziyue Zhu et.al. | 2506.05563 | null |
2025-06-05 | On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images | Andreas Meuleman et.al. | 2506.05558 | null |
2025-06-05 | ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting | Daniel Wang et.al. | 2506.05480 | null |
2025-06-05 | Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting | Duochao Shi et.al. | 2506.05327 | null |
2025-06-05 | Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training | Aneesh Deogan et.al. | 2506.05092 | null |
2025-06-05 | Point Cloud Segmentation of Agricultural Vehicles using 3D Gaussian Splatting | Alfred T. Christiansen et.al. | 2506.05009 | null |
2025-06-05 | Generating Synthetic Stereo Datasets using 3D Gaussian Splatting and Expert Knowledge Transfer | Filip Slezak et.al. | 2506.04908 | null |
2025-06-05 | Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations | Gaia Di Lorenzo et.al. | 2506.04789 | null |
2025-06-04 | Pseudo-Simulation for Autonomous Driving | Wei Cao et.al. | 2506.04218 | link |
2025-06-04 | FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting | Hengyu Liu et.al. | 2506.04174 | null |
2025-06-04 | Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data | Ben Moran et.al. | 2506.04120 | null |
2025-06-04 | SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting | Shengjie Lin et.al. | 2506.03594 | link |
2025-06-04 | Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting | Chengqi Li et.al. | 2506.03538 | null |
2025-06-03 | Multi-Spectral Gaussian Splatting with Neural Color Representation | Lukas Meyer et.al. | 2506.03407 | null |
2025-06-03 | Large Processor Chip Model | Kaiyan Chang et.al. | 2506.02929 | null |
2025-06-04 | Voyager: Real-Time Splatting City-Scale 3D Gaussians on Your Phone | Zheng Liu et.al. | 2506.02774 | null |
2025-06-03 | RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS | Chuanyu Fu et.al. | 2506.02751 | null |
2025-06-03 | EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR | Zihao Ding et.al. | 2506.02380 | link |
2025-06-02 | GSCodec Studio: A Modular Framework for Gaussian Splat Compression | Sicheng Li et.al. | 2506.01822 | link |
2025-06-02 | WorldExplorer: Towards Generating Fully Navigable 3D Scenes | Manuel-Andreas Schneider et.al. | 2506.01799 | null |
2025-06-01 | Globally Consistent RGB-D SLAM with 2D Gaussian Splatting | Xingguang Zhong et.al. | 2506.00970 | link |
2025-05-30 | 3D Gaussian Splat Vulnerabilities | Matthew Hull et.al. | 2506.00280 | link |
2025-05-30 | Adaptive Voxelization for Transform coding of 3D Gaussian splatting data | Chenjunjie Wang et.al. | 2506.00271 | null |
2025-05-30 | Understanding while Exploring: Semantics-driven Active Mapping | Liyan Chen et.al. | 2506.00225 | null |
2025-05-30 | AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion | Yangyi Huang et.al. | 2505.24877 | null |
2025-05-30 | TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores | Zimu Liao et.al. | 2505.24796 | link |
2025-05-30 | Tackling View-Dependent Semantics in 3D Language Gaussian Splatting | Jiazhong Cen et.al. | 2505.24746 | link |
2025-05-30 | LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion Framework | Xin Kang et.al. | 2505.24245 | null |
2025-05-29 | 3DGEER: Exact and Efficient Volumetric Rendering with 3D Gaussians | Zixun Huang et.al. | 2505.24053 | link |
2025-05-30 | ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS | Weijie Wang et.al. | 2505.23734 | link |
2025-05-29 | AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views | Lihan Jiang et.al. | 2505.23716 | null |
2025-05-29 | Mobi-$π$: Mobilizing Your Robot Learning Policy | Jingyun Yang et.al. | 2505.23692 | null |
2025-05-29 | Holistic Large-Scale Scene Reconstruction via Mixed Gaussian Splatting | Chuandong Liu et.al. | 2505.23280 | link |
2025-05-29 | LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering | Jonas Kulhanek et.al. | 2505.23158 | null |
2025-05-29 | Pose-free 3D Gaussian splatting via shape-ray estimation | Youngju Na et.al. | 2505.22978 | null |
2025-05-28 | 3DGS Compression with Sparsity-guided Hierarchical Transform Coding | Hao Xu et.al. | 2505.22908 | null |
2025-05-28 | STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering | Zehao Li et.al. | 2505.22400 | null |
2025-05-28 | UP-SLAM: Adaptively Structured Gaussian SLAM with Uncertainty Prediction in Dynamic Environments | Wancai Zheng et.al. | 2505.22335 | null |
2025-05-28 | Learning Fine-Grained Geometry for Sparse-View Splatting via Cascade Depth Loss | Wenjun Lu et.al. | 2505.22279 | null |
2025-05-28 | Hyperspectral Gaussian Splatting | Sunil Kumar Narayanan et.al. | 2505.21890 | null |
2025-05-27 | Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility | Yidi Li et.al. | 2505.21377 | link |
2025-05-27 | Structure from Collision | Takuhiro Kaneko et.al. | 2505.21335 | null |
2025-05-29 | 3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics Based Appearance-Medium Decoupling | Jieyu Yuan et.al. | 2505.21238 | null |
2025-05-28 | CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians | Weihang Liu et.al. | 2505.21041 | null |
2025-05-27 | Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting | Xiangyu Sun et.al. | 2505.20729 | null |
2025-05-27 | Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting | Zechen Li et.al. | 2505.20714 | link |
2025-05-26 | ParticleGS: Particle-Based Dynamics Modeling of 3D Gaussians for Prior-free Motion Extrapolation | Jinsheng Quan et.al. | 2505.20270 | link |
2025-05-26 | OB3D: A New Dataset for Benchmarking Omnidirectional 3D Reconstruction Using Blender | Shintaro Ito et.al. | 2505.20126 | link |
2025-05-26 | K-Buffers: A Plug-in Method for Enhancing Neural Fields with Multiple Buffers | Haofan Ren et.al. | 2505.19564 | link |
2025-05-25 | Improving Novel view synthesis of 360$^\circ$ Scenes in Extremely Sparse Views by Jointly Training Hemisphere Sampled Synthetic Images | Guangan Chen et.al. | 2505.19264 | link |
2025-05-25 | Triangle Splatting for Real-Time Radiance Field Rendering | Jan Held et.al. | 2505.19175 | null |
2025-05-25 | FHGS: Feature-Homogenized Gaussian Splatting | Q. G. Duan et.al. | 2505.19154 | null |
2025-05-25 | Veta-GS: View-dependent deformable 3D Gaussian Splatting for thermal infrared Novel-view Synthesis | Myeongseok Nam et.al. | 2505.19138 | null |
2025-05-25 | VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes | Tianchen Deng et.al. | 2505.18992 | link |
2025-05-24 | Efficient Differentiable Hardware Rasterization for 3D Gaussian Splatting | Yitian Yuan et.al. | 2505.18764 | null |
2025-05-24 | SuperGS: Consistent and Detailed 3D Super-Resolution Scene Reconstruction via Gaussian Splatting | Shiyun Xie et.al. | 2505.18649 | null |
2025-05-23 | Pose Splatter: A 3D Gaussian Splatting Model for Quantifying Animal Pose and Appearance | Jack Goffinet et.al. | 2505.18342 | null |
2025-05-23 | CGS-GAN: 3D Consistent Gaussian Splatting GANs for High Resolution Human Head Synthesis | Florian Barthel et.al. | 2505.17590 | null |
2025-05-23 | From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation | Mahmoud Chick Zaouali et.al. | 2505.17402 | null |
2025-05-22 | Motion Matters: Compact Gaussian Streaming for Free-Viewpoint Video Reconstruction | Jiacong Chen et.al. | 2505.16533 | null |
2025-05-21 | RUSplatting: Robust 3D Gaussian Splatting for Sparse-View Underwater Scene Reconstruction | Zhuodong Jiang et.al. | 2505.15737 | null |
2025-05-21 | PlantDreamer: Achieving Realistic 3D Plant Models with Diffusion-Guided Gaussian Splatting | Zane K J Hartley et.al. | 2505.15528 | null |
2025-05-21 | GS2E: Gaussian Splatting is an Effective Data Generator for Event Stream Generation | Yuchen Li et.al. | 2505.15287 | null |
2025-05-21 | MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models | Yifan Liu et.al. | 2505.15185 | link |
2025-05-20 | Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning | Amine Elhafsi et.al. | 2505.14938 | null |
2025-05-20 | Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image | Yuxuan Wang et.al. | 2505.14537 | null |
2025-05-20 | MGStream: Motion-aware 3D Gaussian for Streamable Dynamic Scene Reconstruction | Zhenyu Bao et.al. | 2505.13839 | link |
2025-05-19 | 3D Gaussian Adaptive Reconstruction for Fourier Light-Field Microscopy | Chenyu Xu et.al. | 2505.12875 | null |
2025-05-19 | TACOcc:Target-Adaptive Cross-Modal Fusion with Volume Rendering for 3D Semantic Occupancy | Luyao Lei et.al. | 2505.12693 | null |
2025-05-18 | Is Semantic SLAM Ready for Embedded Systems ? A Comparative Survey | Calvin Galagain et.al. | 2505.12384 | null |
2025-05-17 | GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity | Takuya Ikeda et.al. | 2505.11905 | null |
2025-05-16 | GrowSplat: Constructing Temporal Digital Twins of Plants with Gaussian Splats | Simeon Adebola et.al. | 2505.10923 | null |
2025-05-16 | EA-3DGS: Efficient and Adaptive 3D Gaussians with Highly Enhanced Quality for outdoor scenes | Jianlin Guo et.al. | 2505.10787 | link |
2025-05-14 | ExploreGS: a vision-based low overhead framework for 3D scene reconstruction | Yunji Feng et.al. | 2505.10578 | null |
2025-05-15 | Consistent Quantity-Quality Control across Scenes for Deployment-Aware Gaussian Splatting | Fengdi Zhang et.al. | 2505.10473 | link |
2025-05-15 | VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality | Xuechang Tu et.al. | 2505.10144 | link |
2025-05-15 | Advances in Radiance Field for Dynamic Scene: From Neural Field to Gaussian Field | Jinlong Fan et.al. | 2505.10049 | link |
2025-05-15 | Large-Scale Gaussian Splatting SLAM | Zhe Xin et.al. | 2505.09915 | null |
2025-05-14 | Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware | Justin Yu et.al. | 2505.09601 | null |
2025-05-13 | DLO-Splatting: Tracking Deformable Linear Objects Using 3D Gaussian Splatting | Holly Dinkel et.al. | 2505.08644 | null |
2025-05-13 | FOCI: Trajectory Optimization on Gaussian Splats | Mario Gomez Andreu et.al. | 2505.08510 | null |
2025-05-13 | A Survey of 3D Reconstruction with Event Cameras: From Event-based Geometry to Neural 3D Rendering | Chuanzhi Xu et.al. | 2505.08438 | null |
2025-05-10 | Virtualized 3D Gaussians: Flexible Cluster-based Level-of-Detail System for Real-Time Rendering of Composed Scenes | Xijie Yang et.al. | 2505.06523 | null |
2025-05-08 | TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling | Gengyan Li et.al. | 2505.05672 | null |
2025-05-08 | Steepest Descent Density Control for Compact 3D Gaussian Splatting | Peihao Wang et.al. | 2505.05587 | null |
2025-05-08 | SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation | Yonwoo Choi et.al. | 2505.05475 | link |
2025-05-08 | Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields | Runfeng Li et.al. | 2505.05356 | null |
2025-05-07 | SGCR: Spherical Gaussians for Efficient 3D Curve Reconstruction | Xinran Yang et.al. | 2505.04668 | link |
2025-05-07 | Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting | Feng Yang et.al. | 2505.04262 | null |
2025-05-06 | 3D Gaussian Splatting Data Compression with Mixture of Priors | Lei Liu et.al. | 2505.03310 | null |
2025-05-04 | SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting | Shubhendu Jena et.al. | 2505.02175 | null |
2025-05-04 | GarmentGS: Point-Cloud Guided Gaussian Splatting for High-Fidelity Non-Watertight 3D Garment Reconstruction | Zhihao Tang et.al. | 2505.02126 | null |
2025-05-03 | HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder | Qi Yang et.al. | 2505.01938 | link |
2025-05-03 | GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting | Anushka Agarwal et.al. | 2505.01928 | null |
2025-05-03 | Visual enhancement and 3D representation for underwater scenes: a review | Guoxi Huang et.al. | 2505.01869 | null |
2025-05-03 | AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting | Junhao Shi et.al. | 2505.01799 | null |
2025-05-02 | FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research | Yan Miao et.al. | 2505.01383 | null |
2025-05-02 | Compensating Spatiotemporally Inconsistent Observations for Online Dynamic 3D Gaussian Splatting | Youngsik Yun et.al. | 2505.01235 | null |
2025-04-30 | A Survey on 3D Reconstruction Techniques in Plant Phenotyping: From Classical Methods to Neural Radiance Fields (NeRF), 3D Gaussian Splatting (3DGS), and Beyond | Jiajia Li et.al. | 2505.00737 | link |
2025-04-29 | GauSS-MI: Gaussian Splatting Shannon Mutual Information for Active 3D Reconstruction | Yuhan Xie et.al. | 2504.21067 | link |
2025-04-29 | GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion | Jiaxin Hong et.al. | 2504.20829 | null |
2025-04-29 | EfficientHuman: Efficient Training and Reconstruction of Moving Human using Articulated 2D Gaussian | Hao Tian et.al. | 2504.20607 | null |
2025-04-29 | Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting | Hanxi Liu et.al. | 2504.20403 | null |
2025-05-01 | GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting | Jongwon Lee et.al. | 2504.20379 | null |
2025-04-28 | Mesh-Learner: Texturing Mesh with Spherical Harmonics | Yunfei Wan et.al. | 2504.19938 | link |
2025-04-28 | CE-NPBG: Connectivity Enhanced Neural Point-Based Graphics for Novel View Synthesis in Autonomous Driving Scenes | Mohammad Altillawi et.al. | 2504.19557 | null |
2025-04-28 | GSFF-SLAM: 3D Semantic Gaussian Splatting SLAM via Feature Field | Zuxing Lu et.al. | 2504.19409 | null |
2025-04-30 | 4DGS-CC: A Contextual Coding Framework for 4D Gaussian Splatting Data Compression | Zicong Chen et.al. | 2504.18925 | null |
2025-05-01 | TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians | Letian Huang et.al. | 2504.18768 | null |
2025-04-28 | RGS-DR: Reflective Gaussian Surfels with Deferred Rendering for Shiny Objects | Georgios Kouros et.al. | 2504.18468 | null |
2025-04-25 | PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models | Michel Gokan Khan et.al. | 2504.18165 | link |
2025-04-24 | iVR-GS: Inverse Volume Rendering for Explorable Visualization via Editable 3D Gaussian Splatting | Kaiyuan Tang et.al. | 2504.17954 | link |
2025-04-23 | Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning | Mingxuan Cui et.al. | 2504.17815 | link |
2025-04-24 | CasualHDRSplat: Robust High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos | Shucheng Gong et.al. | 2504.17728 | link |
2025-04-23 | HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction | Zhongtao Wang et.al. | 2504.16606 | null |
2025-04-23 | ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration | Andrea Conti et.al. | 2504.16545 | null |
2025-04-21 | StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians | Cailin Zhuang et.al. | 2504.15281 | null |
2025-04-21 | MoBGS: Motion Deblurring Dynamic 3D Gaussian Splatting for Blurry Monocular Video | Minh-Quan Viet Bui et.al. | 2504.15122 | null |
2025-04-20 | NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation | Junyuan Fang et.al. | 2504.14638 | null |
2025-04-20 | VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control | Lifeng Lin et.al. | 2504.14548 | null |
2025-04-20 | Metamon-GS: Enhancing Representability with Variance-Guided Densification and Light Encoding | Junyan Su et.al. | 2504.14460 | null |
2025-04-23 | SEGA: Drivable 3D Gaussian Head Avatar from a Single Image | Chen Guo et.al. | 2504.14373 | null |
2025-04-18 | EG-Gaussian: Epipolar Geometry and Graph Network Enhanced 3D Gaussian Splatting | Beizhen Zhao et.al. | 2504.13540 | null |
2025-04-17 | Volume Encoding Gaussians: Transfer Function-Agnostic 3D Gaussians for Volume Rendering | Landon Dyken et.al. | 2504.13339 | null |
2025-04-17 | Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation | Sizhe Yang et.al. | 2504.13175 | null |
2025-04-18 | ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos | Zetong Zhang et.al. | 2504.13167 | null |
2025-04-17 | Digital Twin Generation from Visual Data: A Survey | Andrew Melnik et.al. | 2504.13159 | link |
2025-04-17 | Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs | Shaohui Dai et.al. | 2504.13153 | link |
2025-04-17 | GSAC: Leveraging Gaussian Splatting for Photorealistic Avatar Creation with Unity Integration | Rendong Zhang et.al. | 2504.12999 | link |
2025-04-17 | Second-order Optimization of Gaussian Splats with Importance Sampling | Hamza Pehlivan et.al. | 2504.12905 | null |
2025-04-17 | AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering | Michael Steiner et.al. | 2504.12811 | null |
2025-04-17 | CAGE-GS: High-fidelity Cage Based 3D Gaussian Splatting Deformation | Yifei Tong et.al. | 2504.12800 | null |
2025-04-17 | TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors | Mingwei Li et.al. | 2504.12799 | null |
2025-04-17 | ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior | Xiao Han et.al. | 2504.12788 | null |
2025-04-16 | CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting | Wei Sun et.al. | 2504.11893 | null |
2025-04-16 | 3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians | Zeming Wei et.al. | 2504.11218 | link |
2025-04-15 | 3D Gabor Splatting: Reconstruction of High-frequency Surface Texture using Gabor Noise | Haato Watanabe et.al. | 2504.11003 | null |
2025-04-15 | LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis | Hao Sun et.al. | 2504.10331 | null |
2025-04-14 | EBAD-Gaussian: Event-driven Bundle Adjusted Deblur Gaussian Splatting | Yufei Deng et.al. | 2504.10012 | null |
2025-04-16 | GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting | Junlin Hao et.al. | 2504.10001 | null |
2025-04-13 | DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering | Yexing Xu et.al. | 2504.09491 | null |
2025-04-12 | A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds | Jizong Peng et.al. | 2504.09129 | null |
2025-04-12 | BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting | Jeongwan On et.al. | 2504.09097 | null |
2025-04-12 | You Need a Transition Plane: Bridging Continuous Panoramic 3D Reconstruction with Perspective Gaussian Splatting | Zhijie Shen et.al. | 2504.09062 | null |
2025-04-15 | BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting | Yongchang Wu et.al. | 2504.09048 | link |
2025-04-11 | FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents | Xin Tan et.al. | 2504.08581 | null |
2025-04-10 | InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians | Kefan Chen et.al. | 2504.07949 | null |
2025-04-10 | View-Dependent Uncertainty Estimation of 3D Gaussian Splatting | Chenyu Han et.al. | 2504.07370 | null |
2025-04-09 | Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting | Daiwei Zhang et.al. | 2504.06978 | null |
2025-04-09 | IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments | Can Zhang et.al. | 2504.06827 | null |
2025-04-09 | SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering | Hanxiao Sun et.al. | 2504.06815 | link |
2025-04-10 | Stochastic Ray Tracing of 3D Transparent Gaussians | Xin Sun et.al. | 2504.06598 | null |
2025-04-08 | Micro-splatting: Maximizing Isotropic Constraints for Refined Optimization in 3D Gaussian Splatting | Jee Won Lee et.al. | 2504.05740 | null |
2025-04-07 | View-Dependent Deformation Fields for 2D Editing of 3D Models | Martin El Mqirmi et.al. | 2504.05544 | null |
2025-04-07 | L3GS: Layered 3D Gaussian Splats for Efficient 3D Scene Delivery | Yi-Zhen Tsai et.al. | 2504.05517 | link |
2025-04-07 | Let it Snow! Animating Static Gaussian Scenes With Dynamic Weather Effects | Gal Fiebelman et.al. | 2504.05296 | null |
2025-04-07 | PanoDreamer: Consistent Text to 360-Degree Scene Generation | Zhexiao Xiong et.al. | 2504.05152 | null |
2025-04-07 | Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM | Zhicong Sun et.al. | 2504.04844 | link |
2025-04-07 | DeclutterNeRF: Generative-Free 3D Scene Recovery for Occlusion Removal | Wanzhou Liu et.al. | 2504.04679 | null |
2025-04-05 | 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS | Zhisheng Huang et.al. | 2504.04294 | null |
2025-04-05 | Interpretable Single-View 3D Gaussian Splatting using Unsupervised Hierarchical Disentangled Representation Learning | Yuyang Zhang et.al. | 2504.04190 | null |
2025-04-04 | HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration | Boyuan Wang et.al. | 2504.03536 | null |
2025-04-03 | Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization | Haishan Wang et.al. | 2504.03059 | link |
2025-04-03 | MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM | Renwu Li et.al. | 2504.02437 | null |
2025-04-03 | ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation | Yuan Zhou et.al. | 2504.02316 | link |
2025-04-02 | UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting | Jaehoon Choi et.al. | 2504.02158 | null |
2025-04-02 | Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis | Niluthpol Chowdhury Mithun et.al. | 2504.01960 | null |
2025-04-02 | BOGausS: Better Optimized Gaussian Splatting | Stéphane Pateux et.al. | 2504.01844 | null |
2025-04-02 | FlowR: Flowing from Sparse to Dense 3D Reconstructions | Tobias Fischer et.al. | 2504.01647 | null |
2025-04-02 | 3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting | Hao Wu et.al. | 2504.01619 | null |
2025-04-02 | RealityAvatar: Towards Realistic Loose Clothing Modeling in Animatable 3D Gaussian Avatars | Yahui Li et.al. | 2504.01559 | null |
2025-04-02 | Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment | Ziteng Cui et.al. | 2504.01503 | link |
2025-04-02 | 3D Gaussian Inverse Rendering with Approximated Global Illumination | Zirui Wu et.al. | 2504.01358 | null |
2025-04-01 | DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting | Hyunwoo Park et.al. | 2504.00773 | null |
2025-04-01 | UnIRe: Unsupervised Instance Decomposition for Dynamic Urban Scene Reconstruction | Yunxuan Mao et.al. | 2504.00763 | null |
2025-04-01 | Monocular and Generalizable Gaussian Talking Head Animation | Shengjie Gong et.al. | 2504.00665 | null |
2025-03-31 | StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting | Shakiba Kheradmand et.al. | 2503.24366 | null |
2025-04-01 | Visual Acoustic Fields | Yuelei Li et.al. | 2503.24270 | null |
2025-03-31 | DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting | Seungjun Lee et.al. | 2503.24210 | null |
2025-03-31 | Learning 3D-Gaussian Simulators from RGB Videos | Mikel Zhobro et.al. | 2503.24009 | null |
2025-03-31 | ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image | Tianyi Gong et.al. | 2503.23881 | null |
2025-03-30 | Gaussian Blending Unit: An Edge GPU Plug-in for Real-Time Gaussian-Based Rendering in AR/VR | Zhifan Ye et.al. | 2503.23625 | null |
2025-03-30 | Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction | Jingui Ma et.al. | 2503.23337 | null |
2025-03-30 | ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning | Zhenyang Liu et.al. | 2503.23297 | null |
2025-03-29 | NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations | Zhenyu Tang et.al. | 2503.23162 | null |
2025-03-29 | CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction | Yuanyuan Gao et.al. | 2503.23044 | null |
2025-03-28 | TranSplat: Lighting-Consistent Cross-Scene Object Transfer with 3D Gaussian Splatting | Boyang et.al. | 2503.22676 | null |
2025-03-28 | AH-GS: Augmented 3D Gaussian Splatting for High-Frequency Detail Representation | Chenyang Xu et.al. | 2503.22324 | null |
2025-03-28 | Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance | Haijie Yang et.al. | 2503.22225 | null |
2025-03-28 | ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting | Wenjie Liu et.al. | 2503.22218 | null |
2025-03-31 | Disentangled 4D Gaussian Splatting: Towards Faster and More Efficient Dynamic Scene Rendering | Hao Feng et.al. | 2503.22159 | null |
2025-03-27 | Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying | Hairong Yin et.al. | 2503.21767 | null |
2025-03-28 | LandMarkSystem Technical Report | Zhenxiang Ma et.al. | 2503.21364 | link |
2025-03-27 | Frequency-Aware Gaussian Splatting Decomposition | Yishai Lavi et.al. | 2503.21226 | null |
2025-03-26 | PGC: Physics-Based Gaussian Cloth from a Single Pose | Michelle Guo et.al. | 2503.20779 | null |
2025-03-26 | TC-GS: Tri-plane based compression for 3D Gaussian Splatting | Taorui Wang et.al. | 2503.20221 | link |
2025-03-26 | EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis | Sheng Miao et.al. | 2503.20168 | null |
2025-03-25 | Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields | Navami Kairanda et.al. | 2503.19976 | null |
2025-03-26 | A Survey on Event-driven 3D Reconstruction: Development under Different Categories | Chuanzhi Xu et.al. | 2503.19753 | null |
2025-03-28 | GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting | Shujuan Li et.al. | 2503.19458 | null |
2025-03-25 | SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors | Yiqing Li et.al. | 2503.19452 | null |
2025-03-26 | COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting | Jiaxin Zhang et.al. | 2503.19443 | link |
2025-03-25 | MATT-GS: Masked Attention-based 3DGS for Robot Perception and Object Detection | Jee Won Lee et.al. | 2503.19330 | null |
2025-03-25 | HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting | Xinpeng Liu et.al. | 2503.19232 | link |
2025-03-24 | NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting | Yulong Zheng et.al. | 2503.18794 | null |
2025-03-24 | GS-Marker: Generalizable and Robust Watermarking for 3D Gaussian Splatting | Lijiang Li et.al. | 2503.18718 | null |
2025-03-24 | Hardware-Rasterized Ray-Based Gaussian Splatting | Samuel Rota Bulò et.al. | 2503.18682 | null |
2025-03-24 | LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment | Haoran Wang et.al. | 2503.18640 | null |
2025-03-25 | StableGS: A Floater-Free Framework for 3D Gaussian Splatting | Luchao Wang et.al. | 2503.18458 | null |
2025-03-24 | 4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video | Qiang Hu et.al. | 2503.18421 | null |
2025-03-24 | DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds | Youyu Chen et.al. | 2503.18402 | null |
2025-03-24 | GI-SLAM: Gaussian-Inertial SLAM | Xulang Liu et.al. | 2503.18275 | null |
2025-03-23 | Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving | Junhao Ge et.al. | 2503.18108 | link |
2025-03-23 | PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding | Hongjia Zhai et.al. | 2503.18107 | null |
2025-03-21 | TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting | Jianchuan Chen et.al. | 2503.17032 | null |
2025-03-21 | DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery | Jiadong Tang et.al. | 2503.16964 | null |
2025-03-21 | Optimized Minimal 3D Gaussian Splatting | Joo Chan Lee et.al. | 2503.16924 | null |
2025-03-20 | SAGE: Semantic-Driven Adaptive Gaussian Splatting in Extended Reality | Chiara Schiavo et.al. | 2503.16747 | null |
2025-03-20 | GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting | Sixu Li et.al. | 2503.16681 | null |
2025-03-20 | M3: 3D-Spatial MultiModal Memory | Xueyan Zou et.al. | 2503.16413 | link |
2025-03-20 | Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images | Shengjun Zhang et.al. | 2503.16338 | null |
2025-03-20 | OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering | Shiyong Liu et.al. | 2503.16177 | null |
2025-03-20 | Enhancing Close-up Novel View Synthesis via Pseudo-labeling | Jiatong Xia et.al. | 2503.15908 | link |
2025-03-20 | VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling | Hyojun Go et.al. | 2503.15855 | null |
2025-03-20 | BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting | Yiren Lu et.al. | 2503.15835 | null |
2025-03-18 | HandSplat: Embedding-Driven Gaussian Splatting for High-Fidelity Hand Rendering | Yilan Dong et.al. | 2503.14736 | null |
2025-03-18 | Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation | Umar Farooq et.al. | 2503.14475 | null |
2025-03-18 | Improving Adaptive Density Control for 3D Gaussian Splatting | Glenn Grubert et.al. | 2503.14274 | link |
2025-03-18 | Lightweight Gradient-Aware Upscaling of 3D Gaussian Splatting Images | Simon Niedermayr et.al. | 2503.14171 | null |
2025-03-18 | Light4GS: Lightweight Compact 4D Gaussian Splatting Generation via Context Model | Mufan Liu et.al. | 2503.13948 | null |
2025-03-17 | Gaussian On-the-Fly Splatting: A Progressive Framework for Robust Near Real-Time 3DGS Optimization | Yiwei Xu et.al. | 2503.13086 | null |
2025-03-17 | CAT-3DGS Pro: A New Benchmark for Efficient 3DGS Compression | Yu-Ting Zhan et.al. | 2503.12862 | null |
2025-03-17 | CompMarkGS: Robust Watermarking for Compression 3D Gaussian Splatting | Sumin In et.al. | 2503.12836 | null |
2025-03-17 | AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis | Hadam Baek et.al. | 2503.12806 | null |
2025-03-16 | SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs | Guibiao Liao et.al. | 2503.12535 | null |
2025-03-16 | VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting | Songen Gu et.al. | 2503.12383 | null |
2025-03-18 | GS-I$^{3}$: Gaussian Splatting for Surface Reconstruction from Illumination-Inconsistent Images | Tengfei Wang et.al. | 2503.12335 | link |
2025-03-16 | Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene | Jiahao Wu et.al. | 2503.12307 | null |
2025-03-18 | 3D Gaussian Splatting against Moving Objects for High-Fidelity Street Scene Reconstruction | Peizhen Zheng et.al. | 2503.12001 | link |
2025-03-15 | DynaGSLAM: Real-Time Gaussian-Splatting SLAM for Online Rendering, Tracking, Motion Predictions of Moving Objects in Dynamic Scenes | Runfa Blark Li et.al. | 2503.11979 | null |
2025-03-14 | Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information | Xuanqi Zhang et.al. | 2503.11601 | null |
2025-03-14 | EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting | Di Li et.al. | 2503.11345 | null |
2025-03-14 | Uncertainty-Aware Normal-Guided Gaussian Splatting for Surface Reconstruction from Sparse Image Sequences | Zhen Tan et.al. | 2503.11172 | null |
2025-03-13 | LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds | Lingteng Qiu et.al. | 2503.10625 | link |
2025-03-13 | VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames | Zhiqi Li et.al. | 2503.10286 | null |
2025-03-13 | ROODI: Reconstructing Occluded Objects with Denoising Inpainters | Yeonjin Chang et.al. | 2503.10256 | null |
2025-03-15 | 3D Student Splatting and Scooping | Jialin Zhu et.al. | 2503.10148 | link |
2025-03-13 | GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping | Jinfeng Liu et.al. | 2503.10143 | null |
2025-03-12 | Physics-Aware Human-Object Rendering from Sparse Views via 3D Gaussian Splatting | Weiquan Wang et.al. | 2503.09640 | null |
2025-03-12 | Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation | Máté Tóth et.al. | 2503.09464 | null |
2025-03-12 | Online Language Splatting | Saimouli Katragadda et.al. | 2503.09447 | null |
2025-03-12 | Close-up-GS: Enhancing Close-Up View Synthesis in 3D Gaussian Splatting with Progressive Self-Training | Jiatong Xia et.al. | 2503.09396 | null |
2025-03-11 | PCGS: Progressive Compression of 3D Gaussian Splatting | Yihang Chen et.al. | 2503.08511 | link |
2025-03-11 | HRAvatar: High-Quality and Relightable Gaussian Head Avatar | Dongbin Zhang et.al. | 2503.08224 | null |
2025-03-11 | S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction | Guangting Zheng et.al. | 2503.08217 | null |
2025-03-11 | Dynamic Scene Reconstruction: Recent Advance in Real-time Rendering and Streaming | Jiaxuan Zhu et.al. | 2503.08166 | null |
2025-03-11 | ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting | Junfu Guo et.al. | 2503.08135 | null |
2025-03-13 | MVGSR: Multi-View Consistency Gaussian Splatting for Robust Surface Reconstruction | Chenfeng Hou et.al. | 2503.08093 | null |
2025-03-11 | GigaSLAM: Large-Scale Monocular SLAM with Hierachical Gaussian Splats | Kai Deng et.al. | 2503.08071 | link |
2025-03-11 | 7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting | Zhongpai Gao et.al. | 2503.07946 | null |
2025-03-10 | POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality | Joey Wilson et.al. | 2503.07819 | null |
2025-03-10 | SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting | Jiahui Zhang et.al. | 2503.07476 | null |
2025-03-10 | EigenGS Representation: From Eigenspace to Gaussian Image Space | Lo-Wei Tai et.al. | 2503.07446 | null |
2025-03-10 | All That Glitters Is Not Gold: Key-Secured 3D Secrets within 3D Gaussian Splatting | Yan Ren et.al. | 2503.07191 | link |
2025-03-10 | Frequency-Aware Density Control via Reparameterization for High-Quality Rendering of 3D Gaussian Splatting | Zhaojie Zeng et.al. | 2503.07000 | link |
2025-03-09 | REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints | Di Wu et.al. | 2503.06677 | null |
2025-03-09 | StructGS: Adaptive Spherical Harmonics and Rendering Enhancements for Superior 3D Gaussian Splatting | Zexu Huang et.al. | 2503.06462 | null |
2025-03-08 | SplatTalk: 3D VQA with Gaussian Splatting | Anh Thai et.al. | 2503.06271 | null |
2025-03-08 | StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams | Yang LI et.al. | 2503.06235 | null |
2025-03-08 | ForestSplats: Deformable transient field for Gaussian Splatting in the Wild | Wongi Park et.al. | 2503.06179 | null |
2025-03-08 | Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction | Kai Li et.al. | 2503.06161 | null |
2025-03-07 | Free Your Hands: Lightweight Relightable Turntable Capture Pipeline | Jiahui Fan et.al. | 2503.05511 | null |
2025-03-07 | LiDAR-enhanced 3D Gaussian Splatting Mapping | Jian Shen et.al. | 2503.05425 | null |
2025-03-07 | Self-Modeling Robots by Photographing | Kejun Hu et.al. | 2503.05398 | null |
2025-03-07 | CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images | Jungho Lee et.al. | 2503.05332 | link |
2025-03-07 | STGA: Selective-Training Gaussian Head Avatars | Hanzhi Guo et.al. | 2503.05196 | null |
2025-03-07 | MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions | Qingyuan Zhou et.al. | 2503.05182 | null |
2025-03-07 | SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting | Linqi Yang et.al. | 2503.05174 | null |
2025-03-07 | SeeLe: A Unified Acceleration Framework for Real-Time Gaussian Splatting | Xiaotong Huang et.al. | 2503.05168 | null |
2025-03-07 | EvolvingGS: High-Fidelity Streamable Volumetric Video via Evolving 3D Gaussian Representation | Chao Zhang et.al. | 2503.05162 | null |
2025-03-07 | GaussianCAD: Robust Self-Supervised CAD Reconstruction from Three Orthographic Views Using 3D Gaussian Splatting | Zheng Zhou et.al. | 2503.05161 | null |
2025-03-06 | S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting | Yecong Wan et.al. | 2503.04314 | null |
2025-03-06 | Instrument-Splatting: Controllable Photorealistic Reconstruction of Surgical Instruments Using Gaussian Splatting | Shuojue Yang et.al. | 2503.04082 | null |
2025-03-06 | Beyond Existance: Fulfill 3D Reconstructed Scenes with Pseudo Details | Yifei Gao et.al. | 2503.04037 | null |
2025-03-06 | GaussianGraph: 3D Gaussian-based Scene Graph Generation for Open-world Scene Understanding | Xihan Wang et.al. | 2503.04034 | null |
2025-03-06 | GRaD-Nav: Efficiently Learning Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics | Qianzhong Chen et.al. | 2503.03984 | null |
2025-03-04 | 2DGS-Avatar: Animatable High-fidelity Clothed Avatar via 2D Gaussian Splatting | Qipeng Yan et.al. | 2503.02452 | null |
2025-03-04 | DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting | Haoyuan Li et.al. | 2503.02223 | link |
2025-03-03 | Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization | Jamie Wynn et.al. | 2503.02009 | null |
2025-03-03 | Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models | Jay Zhangjie Wu et.al. | 2503.01774 | null |
2025-03-03 | OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding | Dianyi Yang et.al. | 2503.01646 | null |
2025-03-03 | FGS-SLAM: Fourier-based Gaussian Splatting for Real-time SLAM with Sparse and Dense Map Fusion | Yansong Xu et.al. | 2503.01109 | null |
2025-03-02 | Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization | You Shen et.al. | 2503.00881 | null |
2025-03-02 | Vid2Fluid: 3D Dynamic Fluid Assets from Single-View Videos with Generative Gaussian Splatting | Zhiwei Zhao et.al. | 2503.00868 | null |
2025-03-02 | PSRGS: Progressive Spectral Residual of 3D Gaussian for High-Frequency Recovery | BoCheng Li et.al. | 2503.00848 | null |
2025-03-02 | DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting | Liao Shen et.al. | 2503.00746 | null |
2025-03-03 | FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering | Jingqiu Zhou et.al. | 2502.21093 | null |
2025-02-28 | EndoPBR: Material and Lighting Estimation for Photorealistic Surgical Simulations via Physically-based Rendering | John J. Han et.al. | 2502.20669 | null |
2025-02-27 | No Parameters, No Problem: 3D Gaussian Splatting without Camera Intrinsics and Extrinsics | Dongbo Shi et.al. | 2502.19800 | null |
2025-02-27 | Open-Vocabulary Semantic Part Segmentation of 3D Human | Keito Suzuki et.al. | 2502.19782 | null |
2025-02-26 | Compression in 3D Gaussian Splatting: A Survey of Methods, Trends, and Future Directions | Muhammad Salman Ali et.al. | 2502.19457 | null |
2025-02-26 | Does 3D Gaussian Splatting Need Accurate Volumetric Rendering? | Adam Celarek et.al. | 2502.19318 | link |
2025-02-28 | OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation | Yunpeng Gao et.al. | 2502.18041 | null |
2025-02-27 | UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | Haoyuan Li et.al. | 2502.17860 | null |
2025-02-24 | Laplace-Beltrami Operator for Gaussian Splatting | Hongyu Zhou et.al. | 2502.17531 | null |
2025-02-24 | Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting | Chong Cheng et.al. | 2502.17377 | null |
2025-02-24 | VR-Pipe: Streamlining Hardware Graphics Pipeline for Volume Rendering | Junseo Lee et.al. | 2502.17078 | null |
2025-02-23 | Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration | Kim Jun-Seong et.al. | 2502.16652 | null |
2025-02-23 | Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control | Jinbo Yan et.al. | 2502.16475 | null |
2025-02-21 | RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes | Sicheng Yu et.al. | 2502.15633 | null |
2025-02-20 | GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models | Miao Tao et.al. | 2502.14938 | null |
2025-02-20 | Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting | Boying Li et.al. | 2502.14931 | null |
2025-02-20 | CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting | Qilin Zhang et.al. | 2502.14684 | link |
2025-02-20 | OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving | Yedong Shen et.al. | 2502.14235 | null |
2025-02-19 | GlossGau: Efficient Inverse Rendering for Glossy Surface with Anisotropic Spherical Gaussian | Bang Du et.al. | 2502.14129 | null |
2025-02-19 | 3D Gaussian Splatting aided Localization for Large and Complex Indoor-Environments | Vincent Ress et.al. | 2502.13803 | null |
2025-02-18 | RadSplatter: Extending 3D Gaussian Splatting to Radio Frequencies for Wireless Radiomap Extrapolation | Yiheng Wang et.al. | 2502.12686 | null |
2025-02-17 | 3D Gaussian Inpainting with Depth-Guided Cross-View Consistency | Sheng-Yu Huang et.al. | 2502.11801 | null |
2025-02-17 | Exploring the Versal AI Engine for 3D Gaussian Splatting | Kotaro Shimamura et.al. | 2502.11782 | null |
2025-02-17 | GaussianMotion: End-to-End Learning of Animatable Gaussian Avatars with Pose Guidance from Text | Gyumin Shim et.al. | 2502.11642 | null |
2025-02-16 | OMG: Opacity Matters in Material Modeling with Gaussian Splatting | Silong Yong et.al. | 2502.10988 | null |
2025-02-16 | GS-GVINS: A Tightly-integrated GNSS-Visual-Inertial Navigation System Augmented by 3D Gaussian Splatting | Zelin Zhou et.al. | 2502.10975 | null |
2025-02-15 | E-3DGS: Event-Based Novel View Rendering of Large-Scale Scenes Using 3D Gaussian Splatting | Sohaib Zahid et.al. | 2502.10827 | null |
2025-02-13 | X-SG$^2$S: Safe and Generalizable Gaussian Splatting with X-dimensional Watermarks | Zihang Cheng et.al. | 2502.10475 | null |
2025-02-12 | Interactive Holographic Visualization for 3D Facial Avatar | Tri Tung Nguyen Nguyen et.al. | 2502.08085 | null |
2025-02-11 | TranSplat: Surface Embedding-guided 3D Gaussian Splatting for Transparent Object Manipulation | Jeongyun Kim et.al. | 2502.07840 | link |
2025-02-11 | Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors | Lin-Zhuo Chen et.al. | 2502.07615 | null |
2025-02-05 | GARAD-SLAM: 3D GAussian splatting for Real-time Anti Dynamic SLAM | Mingrui Li et.al. | 2502.03228 | null |
2025-02-05 | GP-GS: Gaussian Processes for Enhanced Gaussian Splatting | Zhihao Guo et.al. | 2502.02283 | link |
2025-02-04 | LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation | Yang Zhou et.al. | 2502.01949 | null |
2025-02-11 | UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping | Aashish Rai et.al. | 2502.01846 | null |
2025-02-03 | Scalable 3D Gaussian Splatting-Based RF Signal Spatial Propagation Modeling | Kang Yang et.al. | 2502.01826 | null |
2025-02-03 | VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion | Shaoting Zhu et.al. | 2502.01536 | null |
2025-02-02 | EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis | Junuk Cha et.al. | 2502.00654 | null |
2025-01-31 | Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation | Rohan Chacko et.al. | 2502.00173 | null |
2025-01-31 | Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping | Yiming Huang et.al. | 2501.19319 | link |
2025-01-31 | RaySplats: Ray Tracing based Gaussian Splatting | Krzysztof Byrski et.al. | 2501.19196 | link |
2025-01-31 | JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting | Zhoutao Sun et.al. | 2501.19088 | null |
2025-01-30 | Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting | Yansong Qu et.al. | 2501.18672 | null |
2025-01-29 | 3D Reconstruction of Shoes for Augmented Reality | Pratik Shrestha et.al. | 2501.18643 | null |
2025-01-31 | VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting | Mateusz Nowak et.al. | 2501.17978 | null |
2025-01-29 | CrowdSplat: Exploring Gaussian Splatting For Crowd Rendering | Xiaohan Sun et.al. | 2501.17792 | link |
2025-01-29 | FeatureGS: Eigenvalue-Feature Optimization in 3D Gaussian Splatting for Geometrically Accurate and Artifact-Reduced Reconstruction | Miriam Jäger et.al. | 2501.17655 | null |
2025-01-28 | Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds | Xiaohan Sun et.al. | 2501.17085 | null |
2025-01-28 | DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation | Chenguo Lin et.al. | 2501.16764 | null |
2025-01-25 | Towards Better Robustness: Progressively Joint Pose-3DGS Learning for Arbitrarily Long Videos | Zhen-Hui Dong et.al. | 2501.15096 | null |
2025-01-25 | HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion | Yingzhi Tang et.al. | 2501.15008 | null |
2025-01-24 | HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting | Javier Yu et.al. | 2501.14147 | null |
2025-01-27 | 3DGS$^2$: Near Second-order Converging 3D Gaussian Splatting | Lei Lan et.al. | 2501.13975 | null |
2025-01-23 | GoDe: Gaussians on Demand for Progressive Level of Detail and Scalable Compression | Francesco Di Sario et.al. | 2501.13558 | null |
2025-01-23 | MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance | Wooseok Song et.al. | 2501.13449 | null |
2025-01-23 | GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization | Jaewon Lee et.al. | 2501.13417 | null |
2025-01-23 | VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM | Gyuhyeon Pak et.al. | 2501.13402 | null |
2025-01-23 | Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos | Xianrui Luo et.al. | 2501.13335 | null |
2025-01-22 | Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes | Yuang Shi et.al. | 2501.13045 | null |
2025-01-21 | DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial Basis Functions | Vishagar Arunan et.al. | 2501.12369 | null |
2025-01-22 | HAC++: Towards 100X Compression of 3D Gaussian Splatting | Yihang Chen et.al. | 2501.12255 | link |
2025-01-22 | GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting | Longan Wang et.al. | 2501.12060 | null |
2025-01-20 | See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization | Zongqi He et.al. | 2501.11508 | null |
2025-01-19 | RDG-GS: Relative Depth Guidance with Gaussian Splatting for Real-time Sparse-View 3D Rendering | Chenlu Zhan et.al. | 2501.11102 | null |
2025-01-15 | BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation | Xiaolu Hou et.al. | 2501.10462 | link |
2025-01-20 | GSTAR: Gaussian Surface Tracking and Reconstruction | Chengwei Zheng et.al. | 2501.10283 | null |
2025-01-16 | Creating Virtual Environments with 3D Gaussian Splatting: A Comparative Study | Shi Qiu et.al. | 2501.09302 | null |
2025-01-15 | CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with Gaussian Representation | Qi Ma et.al. | 2501.08982 | null |
2025-01-15 | GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping | Sheng Hong et.al. | 2501.08672 | null |
2025-01-14 | 3D Gaussian Splatting with Normal Information for Mesh Extraction and Improved Rendering | Meenakshi Krishnan et.al. | 2501.08370 | null |
2025-01-13 | UnCommon Objects in 3D | Xingchen Liu et.al. | 2501.07574 | link |
2025-01-13 | 3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud or Mesh | Lewis A G Stuart et.al. | 2501.07478 | link |
2025-01-14 | SplatMAP: Online Dense Monocular SLAM with 3D Gaussian Splatting | Yue Hu et.al. | 2501.07015 | null |
2025-01-12 | Synthetic Prior for Few-Shot Drivable Head Avatar Inversion | Wojciech Zielonka et.al. | 2501.06903 | null |
2025-01-12 | ActiveGAMER: Active GAussian Mapping through Efficient Rendering | Liyan Chen et.al. | 2501.06897 | null |
2025-01-11 | NVS-SQA: Exploring Self-Supervised Quality Representation Learning for Neurally Synthesized Scenes without References | Qiang Qu et.al. | 2501.06488 | link |
2025-01-10 | Locality-aware Gaussian Compression for Fast and High-quality Rendering | Seungjoo Shin et.al. | 2501.05757 | null |
2025-01-13 | Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance | Dimitrios Gerogiannis et.al. | 2501.05379 | null |
2025-01-09 | Scaffold-SLAM: Structured 3D Gaussians for Simultaneous Localization and Photorealistic Mapping | Wen Tianci et.al. | 2501.05242 | null |
2025-01-08 | GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting | Andrew Bond et.al. | 2501.04782 | null |
2025-01-07 | MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting | Sangwoon Kwak et.al. | 2501.03714 | null |
2025-01-07 | DehazeGS: Seeing Through Fog with 3D Gaussian Splatting | Jinze Yu et.al. | 2501.03659 | null |
2025-01-07 | ConcealGS: Concealing Invisible Copyright Information in 3D Gaussian Splatting | Yifeng Yang et.al. | 2501.03605 | link |
2025-01-06 | Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs | Soonbin Lee et.al. | 2501.03399 | null |
2025-01-06 | HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation | Wentian Qu et.al. | 2501.02845 | null |
2025-01-03 | Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision | Alberta Longhini et.al. | 2501.01715 | null |
2025-01-03 | CrossView-GS: Cross-view Gaussian Splatting For Large-scale Scene Reconstruction | Chenhao Zhang et.al. | 2501.01695 | null |
2025-01-03 | PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings Reconstruction via Semantic-Aware Grouping | Tengfei Wang et.al. | 2501.01677 | link |
2025-01-02 | Deformable Gaussian Splatting for Efficient and High-Fidelity Reconstruction of Surgical Scenes | Jiwei Shan et.al. | 2501.01101 | null |
2025-01-02 | EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy | Ao Gao et.al. | 2501.01003 | null |
2024-12-31 | PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM | Runnan Chen et.al. | 2501.00352 | null |
2024-12-31 | SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians | Yiwen Wang et.al. | 2501.00342 | null |
2024-12-30 | PERSE: Personalized 3D Generative Avatars from A Single Portrait | Hyunsoo Cha et.al. | 2412.21206 | null |
2024-12-30 | KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences | Keng-Wei Chang et.al. | 2412.20767 | null |
2024-12-29 | MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks | Yifei Liu et.al. | 2412.20522 | link |
2024-12-28 | DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis | Kaijun Deng et.al. | 2412.20148 | link |
2024-12-28 | GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Atticus J. Zeller et.al. | 2412.20056 | link |
2024-12-27 | Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images | Xudong Cai et.al. | 2412.19518 | null |
2024-12-27 | Learning Radiance Fields from a Single Snapshot Compressive Image | Yunhao Li et.al. | 2412.19483 | null |
2024-12-26 | BeSplat – Gaussian Splatting from a Single Blurry Image and Event Stream | Gopi Raju Matta et.al. | 2412.19370 | link |
2024-12-26 | Generating Editable Head Avatars with 3D Gaussian GANs | Guohao Li et.al. | 2412.19149 | link |
2024-12-26 | CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting | Siyu Jiao et.al. | 2412.19142 | null |
2024-12-26 | MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo | Byeonggwon Lee et.al. | 2412.19130 | null |
2024-12-25 | WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian Splatting | Chenghao Qian et.al. | 2412.18862 | link |
2024-12-25 | GSAVS: Gaussian Splatting-based Autonomous Vehicle Simulator | Rami Wilson et.al. | 2412.18816 | null |
2024-12-25 | ArtNVG: Content-Style Separated Artistic Neighboring-View Gaussian Stylization | Zixiao Gu et.al. | 2412.18783 | null |
2024-12-24 | RSGaussian:3D Gaussian Splatting with LiDAR for Aerial Remote Sensing Novel View Synthesis | Yiling Yao et.al. | 2412.18380 | null |
2024-12-23 | GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance | Jingqiu Zhou et.al. | 2412.17715 | null |
2024-12-23 | CoSurfGS:Collaborative 3D Surface Gaussian Splatting with Distributed Learning for Large Scene Reconstruction | Yuanyuan Gao et.al. | 2412.17612 | null |
2024-12-23 | Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling | Hao Gui et.al. | 2412.17378 | null |
2024-12-22 | GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs | Xingrui Wang et.al. | 2412.16932 | link |
2024-12-22 | GeoTexDensifier: Geometry-Texture-Aware Densification for High-Quality Photorealistic 3D Gaussian Splatting | Hanqing Jiang et.al. | 2412.16809 | null |
2024-12-21 | Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity | Tianqi Shen et.al. | 2412.16619 | link |
2024-12-21 | OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities | Suyoung Lee et.al. | 2412.16604 | null |
2024-12-20 | Interactive Scene Authoring with Specialized Generative Primitives | Clément Jambon et.al. | 2412.16253 | null |
2024-12-20 | CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images | Jungho Lee et.al. | 2412.16028 | null |
2024-12-20 | AvatarPerfect: User-Assisted 3D Gaussian Splatting Avatar Refinement with Automatic Pose Suggestion | Jotaro Sakamiya et.al. | 2412.15609 | null |
2024-12-20 | EGSRAL: An Enhanced 3D Gaussian Splatting based Renderer with Automated Labeling for Large-Scale Driving Scene | Yixiong Huo et.al. | 2412.15550 | link |
2024-12-19 | GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting | Qianpu Sun et.al. | 2412.14579 | null |
2024-12-19 | Improving Geometry in Sparse-View 3DGS via Reprojection-based DoF Separation | Yongsung Kim et.al. | 2412.14568 | null |
2024-12-18 | GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians | Xiaobao Wei et.al. | 2412.13983 | link |
2024-12-18 | GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting | Yuning Peng et.al. | 2412.13654 | null |
2024-12-18 | 4D Radar-Inertial Odometry based on Gaussian Modeling and Multi-Hypothesis Scan Matching | Fernando Amodeo et.al. | 2412.13639 | link |
2024-12-18 | Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields | Tao Lu et.al. | 2412.13547 | null |
2024-12-18 | Vivar: A Generative AR System for Intuitive Multi-Modal Sensor Data Presentation | Yunqi Guo et.al. | 2412.13509 | null |
2024-12-17 | CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image | Wonseok Roh et.al. | 2412.12906 | null |
2024-12-17 | HyperGS: Hyperspectral 3D Gaussian Splatting | Christopher Thirgood et.al. | 2412.12849 | null |
2024-12-17 | 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting | Qi Wu et.al. | 2412.12507 | link |
2024-12-16 | Wonderland: Navigating 3D Scenes from a Single Image | Hanwen Liang et.al. | 2412.12091 | null |
2024-12-16 | SweepEvGS: Event-Based 3D Gaussian Splatting for Macro and Micro Radiance Field Rendering from a Single Sweep | Jingqian Wu et.al. | 2412.11579 | null |
2024-12-16 | EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting | Dong In Lee et.al. | 2412.11520 | null |
2024-12-14 | DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting | Luis Wiedmann et.al. | 2412.10972 | link |
2024-12-13 | SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians | Siyun Liang et.al. | 2412.10231 | null |
2024-12-18 | SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video | Jongmin Park et.al. | 2412.09982 | null |
2024-12-13 | RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting | Lizhi Bai et.al. | 2412.09868 | null |
2024-12-12 | PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields | Sean Wu et.al. | 2412.09680 | link |
2024-12-12 | LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors | Yabo Chen et.al. | 2412.09597 | null |
2024-12-12 | LIVE-GS: LLM Powers Interactive VR by Enhancing Gaussian Splatting | Haotian Mao et.al. | 2412.09176 | null |
2024-12-10 | Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians | Yixuan Li et.al. | 2412.07660 | null |
2024-12-10 | Faster and Better 3D Splatting via Group Training | Chengbo Wang et.al. | 2412.07608 | null |
2024-12-10 | ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery | Yanzhe Lyu et.al. | 2412.07494 | null |
2024-12-10 | EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering | Toshiya Yura et.al. | 2412.07293 | null |
2024-12-09 | Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video | Renlong Wu et.al. | 2412.06424 | link |
2024-12-09 | 4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes | Jinbo Yan et.al. | 2412.06299 | null |
2024-12-12 | Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects | Shi Qiu et.al. | 2412.06257 | null |
2024-12-09 | Splatter-360: Generalizable 360$^{\circ}$ Gaussian Splatting for Wide-baseline Panoramic Images | Zheng Chen et.al. | 2412.06250 | link |
2024-12-09 | Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction | Seungtae Nam et.al. | 2412.06234 | null |
2024-12-07 | Temporally Compressed 3D Gaussian Splatting for Dynamic Scenes | Saqib Javed et.al. | 2412.05700 | null |
2024-12-07 | WATER-GS: Toward Copyright Protection for 3D Gaussian Splatting via Universal Watermarking | Yuqi Tan et.al. | 2412.05695 | null |
2024-12-07 | Template-free Articulated Gaussian Splatting for Real-time Reposable Dynamic View Synthesis | Diwen Wan et.al. | 2412.05570 | null |
2024-12-07 | Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation | Wenqing Wang et.al. | 2412.05560 | null |
2024-12-07 | Radiant: Large-scale 3D Gaussian Rendering based on Hierarchical Framework | Haosong Peng et.al. | 2412.05546 | null |
2024-12-06 | Extrapolated Urban View Synthesis Benchmark | Xiangyu Han et.al. | 2412.05256 | link |
2024-12-06 | MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussian Splatting | Peng Chen et.al. | 2412.04955 | link |
2024-12-06 | Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction | Jixuan Fan et.al. | 2412.04887 | link |
2024-12-06 | WRF-GS: Wireless Radiation Field Reconstruction with 3D Gaussian Splatting | Chaozheng Wen et.al. | 2412.04832 | link |
2024-12-06 | Pushing Rendering Boundaries: Hard Gaussian Splatting | Qingshan Xu et.al. | 2412.04826 | null |
2024-12-05 | QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos | Sharath Girish et.al. | 2412.04469 | null |
2024-12-06 | PBDyG: Position Based Dynamic Gaussians for Motion-Aware Clothed Human Avatars | Shota Sasaki et.al. | 2412.04433 | null |
2024-12-05 | Multi-View Pose-Agnostic Change Localization with Zero Labels | Chamuditha Jayanga Galappaththige et.al. | 2412.03911 | link |
2024-12-05 | HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting | Jingyu Lin et.al. | 2412.03844 | link |
2024-12-04 | Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos | Hanxue Liang et.al. | 2412.03526 | null |
2024-12-04 | 2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction | Wanting Zhang et.al. | 2412.03428 | null |
2024-12-04 | Volumetrically Consistent 3D Gaussian Rasterization | Chinmay Talegaonkar et.al. | 2412.03378 | link |
2024-12-04 | SGSST: Scaling Gaussian Splatting StyleTransfer | Bruno Galerne et.al. | 2412.03371 | link |
2024-12-04 | Splats in Splats: Embedding Invisible 3D Watermark within Gaussian Splatting | Yijia Guo et.al. | 2412.03121 | null |
2024-12-03 | Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects | Abdurrahman Zeybey et.al. | 2412.02803 | null |
2024-12-03 | RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians | Qiankun Gao et.al. | 2412.02493 | link |
2024-12-03 | GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos | Zhiyuan Chen et.al. | 2412.02267 | null |
2024-12-03 | Multi-robot autonomous 3D reconstruction using Gaussian splatting with Semantic guidance | Jing Zeng et.al. | 2412.02249 | null |
2024-12-03 | How to Use Diffusion Priors under Sparse Views? | Qisen Wang et.al. | 2412.02225 | link |
2024-12-03 | SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images | Junqiu Yu et.al. | 2412.02140 | null |
2024-12-03 | Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion | Liu Liu et.al. | 2412.02075 | link |
2024-12-02 | Occam’s LGS: A Simple Approach for Language Gaussian Splatting | Jiahuan Cheng et.al. | 2412.01807 | null |
2024-12-02 | CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion | Kai He et.al. | 2412.01792 | null |
2024-12-02 | Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes | Lihan Jiang et.al. | 2412.01745 | null |
2024-12-02 | HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving | Hongyu Zhou et.al. | 2412.01718 | null |
2024-12-02 | GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting | Zixuan Chen et.al. | 2411.19895 | link |
2024-11-29 | TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting | Bojun Xiong et.al. | 2411.19654 | link |
2024-11-29 | Tortho-Gaussian: Splatting True Digital Orthophoto Maps | Xin Wang et.al. | 2411.19594 | null |
2024-11-29 | Gaussian Splashing: Direct Volumetric Rendering Underwater | Nir Mualem et.al. | 2411.19588 | null |
2024-11-29 | Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | Wenbo Zhang et.al. | 2411.19551 | link |
2024-12-02 | GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction | Jiepeng Wang et.al. | 2411.19454 | null |
2024-11-29 | RF-3DGS: Wireless Channel Modeling with Radio Radiance Field and 3D Gaussian Splatting | Lihao Zhang et.al. | 2411.19420 | link |
2024-11-28 | InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception | Haijie Li et.al. | 2411.19235 | null |
2024-11-28 | Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes | Thomas Wimmer et.al. | 2411.19233 | link |
2024-11-28 | RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning | Jiacheng Wang et.al. | 2411.18866 | null |
2024-11-27 | Textured Gaussians for Enhanced 3D Scene Appearance Modeling | Brian Chao et.al. | 2411.18625 | null |
2024-11-27 | PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image | Han Yan et.al. | 2411.18548 | null |
2024-11-27 | HEMGS: A Hybrid Entropy Model for 3D Gaussian Splatting Data Compression | Lei Liu et.al. | 2411.18473 | null |
2024-11-27 | Neural Surface Priors for Editable Gaussian Splatting | Jakub Szymkowiak et.al. | 2411.18311 | link |
2024-11-27 | Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters | Zhiyang Guo et.al. | 2411.18197 | null |
2024-11-27 | GLS: Geometry-aware 3D Language Gaussian Splatting | Jiaxiong Qiu et.al. | 2411.18066 | link |
2024-11-27 | HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction | Wei Zhang et.al. | 2411.17982 | link |
2024-11-26 | DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting | Christian Homeyer et.al. | 2411.17660 | link |
2024-11-26 | Distractor-free Generalizable 3D Gaussian Splatting | Yanqi Bao et.al. | 2411.17605 | link |
2024-11-28 | SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting | Gyeongjin Kang et.al. | 2411.17190 | null |
2024-11-25 | G2SDF: Surface Reconstruction from Explicit Gaussians with Implicit SDFs | Kunyi Li et.al. | 2411.16898 | null |
2024-11-25 | PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence | Zequn Chen et.al. | 2411.16877 | null |
2024-11-25 | SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving | Georg Hess et.al. | 2411.16816 | link |
2024-11-25 | SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis | Hyojun Go et.al. | 2411.16443 | link |
2024-11-25 | Quadratic Gaussian Splatting for Efficient and Detailed Surface Reconstruction | Ziyu Zhang et.al. | 2411.16392 | null |
2024-11-25 | Event-boosted Deformable 3D Gaussians for Fast Dynamic Scene Reconstruction | Wenhao Xu et.al. | 2411.16180 | null |
2024-11-24 | ZeroGS: Training 3D Gaussian Splatting from Unposed Images | Yu Chen et.al. | 2411.15779 | null |
2024-11-24 | GSurf: 3D Reconstruction via Signed Distance Fields with Direct Gaussian Supervision | Xu Baixin et.al. | 2411.15723 | link |
2024-11-23 | Gassidy: Gaussian Splatting SLAM in Dynamic Environments | Long Wen et.al. | 2411.15476 | null |
2024-11-23 | SplatSDF: Boosting Neural Implicit SDF via Gaussian Splatting Fusion | Runfa Blark Li et.al. | 2411.15468 | null |
2024-11-22 | UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations | Yuan Ren et.al. | 2411.15355 | null |
2024-11-22 | 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes | Jan Held et.al. | 2411.14974 | link |
2024-11-22 | Dynamics-Aware Gaussian Splatting Streaming Towards Fast On-the-Fly Training for 4D Reconstruction | Zhening Liu et.al. | 2411.14847 | null |
2024-11-22 | VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving | Haiming Zhang et.al. | 2411.14716 | null |
2024-11-21 | NexusSplats: Efficient 3D Gaussian Splatting in the Wild | Yuzhou Tang et.al. | 2411.14514 | null |
2024-11-21 | Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation | Zhuoman Liu et.al. | 2411.14423 | null |
2024-11-21 | SplatR: Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching | Arjun P S et.al. | 2411.14322 | link |
2024-11-20 | Generating 3D-Consistent Videos from Unposed Internet Photos | Gene Chou et.al. | 2411.13549 | null |
2024-11-20 | GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting | Xiaobao Wei et.al. | 2411.12981 | null |
2024-11-19 | PR-ENDO: Physically Based Relightable Gaussian Splatting for Endoscopy | Joanna Kaleta et.al. | 2411.12510 | link |
2024-11-19 | SCIGS: 3D Gaussians Splatting from a Snapshot Compressive Image | Zixu Wang et.al. | 2411.12471 | null |
2024-11-20 | Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels | Haodong Chen et.al. | 2411.12440 | null |
2024-11-19 | LiV-GS: LiDAR-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments | Renxiang Xiao et.al. | 2411.12185 | null |
2024-11-19 | Sketch-guided Cage-based 3D Gaussian Splatting Deformation | Tianhao Xie et.al. | 2411.12168 | null |
2024-11-21 | FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting | Fangyu Wu et.al. | 2411.12089 | null |
2024-11-18 | TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction | DaDong Jiang et.al. | 2411.11941 | null |
2024-11-18 | DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes | Chensheng Peng et.al. | 2411.11921 | link |
2024-11-18 | RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator | Xinhai Li et.al. | 2411.11839 | null |
2024-11-18 | GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views | Boyao Zhou et.al. | 2411.11363 | null |
2024-11-17 | VeGaS: Video Gaussian Splatting | Weronika Smolak-Dyżewska et.al. | 2411.11024 | link |
2024-11-15 | The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods | Yifu Tao et.al. | 2411.10546 | null |
2024-11-15 | USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting | Kang Chen et.al. | 2411.10504 | link |
2024-11-15 | Efficient Density Control for 3D Gaussian Splatting | Xiaobin Deng et.al. | 2411.10133 | link |
2024-11-15 | GSEditPro: 3D Gaussian Splatting Editing with Attention-based Progressive Localization | Yanhao Sun et.al. | 2411.10033 | null |
2024-11-15 | GGAvatar: Reconstructing Garment-Separated 3D Gaussian Splatting Avatars from Monocular Video | Jingxuan Chen et.al. | 2411.09952 | link |
2024-11-14 | Adversarial Attacks Using Differentiable Rendering: A Survey | Matthew Hull et.al. | 2411.09749 | null |
2024-11-14 | DyGASR: Dynamic Generalized Exponential Splatting with Surface Alignment for Accelerated 3D Mesh Reconstruction | Shengchao Zhao et.al. | 2411.09156 | null |
2024-11-13 | Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models | Chengdong Dong et.al. | 2411.08642 | null |
2024-11-13 | Biomass phenotyping of oilseed rape through UAV multi-view oblique imaging with 3DGS and SAM model | Yutao Shen et.al. | 2411.08453 | null |
2024-11-13 | MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation | Peng Wang et.al. | 2411.08279 | link |
2024-11-14 | Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation | Han Qi et.al. | 2411.07579 | null |
2024-11-12 | GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting | Umangi Jain et.al. | 2411.07555 | null |
2024-11-12 | HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting | Qiankun Gao et.al. | 2411.07541 | link |
2024-11-12 | GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering | Zhihao Liang et.al. | 2411.07478 | null |
2024-11-11 | A Hierarchical Compression Technique for 3D Gaussian Splatting Compression | He Huang et.al. | 2411.06976 | null |
2024-11-10 | Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction | Decai Chen et.al. | 2411.06602 | null |
2024-11-12 | SplatFormer: Point Transformer for Robust 3D Gaussian Splatting | Yutong Chen et.al. | 2411.06390 | link |
2024-11-10 | Through the Curved Cover: Synthesizing Cover Aberrated Scenes with Refractive Field | Liuyue Xie et.al. | 2411.06365 | null |
2024-11-09 | AI-Driven Stylization of 3D Environments | Yuanbo Chen et.al. | 2411.06067 | null |
2024-11-09 | GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting | Yangming Zhang et.al. | 2411.06019 | null |
2024-11-07 | ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing | Jun-Kun Chen et.al. | 2411.05006 | null |
2024-11-07 | MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views | Yuedong Chen et.al. | 2411.04924 | link |
2024-11-08 | GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting | Jilan Mei et.al. | 2411.03807 | null |
2024-11-06 | 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | Ziqi Lu et.al. | 2411.03706 | link |
2024-11-06 | Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis | Rui Peng et.al. | 2411.03637 | link |
2024-11-05 | Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting | Michael Büttner et.al. | 2411.03555 | null |
2024-11-05 | HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features | Arnab Dey et.al. | 2411.03086 | null |
2024-11-05 | LVI-GS: Tightly-coupled LiDAR-Visual-Inertial SLAM using 3D Gaussian Splatting | Huibin Zhao et.al. | 2411.02703 | null |
2024-11-04 | Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting | Joey Wilson et.al. | 2411.02547 | null |
2024-11-06 | SplatOverflow: Asynchronous Hardware Troubleshooting | Amritansh Kwatra et.al. | 2411.02332 | null |
2024-11-05 | FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training | Ruihong Yin et.al. | 2411.02229 | null |
2024-11-06 | GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes | Gaochao Song et.al. | 2411.01853 | null |
2024-11-01 | CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes | Yang Liu et.al. | 2411.00771 | null |
2024-10-31 | Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes | Shaohua Liu et.al. | 2411.00239 | null |
2024-10-31 | Self-Ensembling Gaussian Splatting for Few-shot Novel View Synthesis | Chen Zhao et.al. | 2411.00144 | link |
2024-10-31 | No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | Botao Ye et.al. | 2410.24207 | link |
2024-11-01 | GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering | Kai Ye et.al. | 2410.24204 | null |
2024-10-31 | GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting | Xiufeng Huang et.al. | 2410.23718 | null |
2024-10-31 | GS-Blur: A 3D Scene-Based Dataset for Realistic Image Deblurring | Dongwoo Lee et.al. | 2410.23658 | link |
2024-10-30 | ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting | Muhammad Salman Ali et.al. | 2410.23213 | null |
2024-10-31 | Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis | Zhiyuan Min et.al. | 2410.22817 | null |
2024-10-29 | PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Sunghwan Hong et.al. | 2410.22128 | link |
2024-10-29 | FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives | Qizhi Chen et.al. | 2410.22070 | null |
2024-10-28 | CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians | Chongjian Ge et.al. | 2410.20723 | null |
2024-10-28 | ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings | Suyoung Lee et.al. | 2410.20686 | link |
2024-10-27 | Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering | Meng Wei et.al. | 2410.20593 | null |
2024-10-30 | DiffGS: Functional Gaussian Splatting Diffusion | Junsheng Zhou et.al. | 2410.19657 | null |
2024-10-25 | Robotic Learning in your Backyard: A Neural Simulator from Open Source Components | Liyou Zhou et.al. | 2410.19564 | link |
2024-10-25 | Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization | Weihang Liu et.al. | 2410.19483 | link |
2024-10-24 | Sort-free Gaussian Splatting via Weighted Sum Rendering | Qiqi Hou et.al. | 2410.18931 | null |
2024-10-24 | Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling | Mingtong Zhang et.al. | 2410.18912 | null |
2024-10-27 | Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis | Liang Han et.al. | 2410.18822 | null |
2024-10-23 | VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points | Linus Franke et.al. | 2410.17932 | null |
2024-10-23 | PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting | Yu Wang et.al. | 2410.17505 | null |
2024-10-22 | AG-SLAM: Active Gaussian Splatting SLAM | Wen Jiang et.al. | 2410.17422 | null |
2024-10-22 | SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes | Cheng-De Fan et.al. | 2410.17249 | null |
2024-10-18 | GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting | Yusen Xie et.al. | 2410.17084 | null |
2024-10-22 | E-3DGS: Gaussian Splatting with Exposure and Motion Events | Xiaoting Yin et.al. | 2410.16995 | link |
2024-10-21 | 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors | Xi Liu et.al. | 2410.16266 | null |
2024-10-22 | Fully Explicit Dynamic Gaussian Splatting | Junoh Lee et.al. | 2410.15629 | null |
2024-10-22 | EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting | Bohao Liao et.al. | 2410.15392 | null |
2024-10-18 | Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set | Wenyuan Zhang et.al. | 2410.14189 | null |
2024-10-17 | DepthSplat: Connecting Gaussian Splatting and Depth | Haofei Xu et.al. | 2410.13862 | link |
2024-10-17 | DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering | Jiahao Lu et.al. | 2410.13607 | link |
2024-10-17 | GlossyGS: Inverse Rendering of Glossy Objects with 3D Gaussian Splatting | Shuichang Lai et.al. | 2410.13349 | null |
2024-10-16 | 3D Gaussian Splatting in Robotics: A Survey | Siting Zhu et.al. | 2410.12262 | link |
2024-10-15 | SplatPose+: Real-time Image-Based Pose-Agnostic 3D Anomaly Detection | Yizhe Liu et.al. | 2410.12080 | link |
2024-10-15 | LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images | Yuzhou Cheng et.al. | 2410.11505 | null |
2024-10-15 | MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields | Yuru Xiao et.al. | 2410.11394 | null |
2024-10-15 | GSORB-SLAM: Gaussian Splatting SLAM benefits from ORB features and Transmittance information | Wancai Zheng et.al. | 2410.11356 | null |
2024-10-15 | Scalable Indoor Novel-View Synthesis using Drone-Captured 360 Imagery with 3D Gaussian Splatting | Yuanbo Chen et.al. | 2410.11285 | null |
2024-10-14 | Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting | Raja Kumar et.al. | 2410.11080 | link |
2024-10-15 | 4-LEGS: 4D Language Embedded Gaussian Splatting | Gal Fiebelman et.al. | 2410.10719 | null |
2024-10-11 | SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction | Jialei Chen et.al. | 2410.09292 | null |
2024-10-11 | MeshGS: Adaptive Mesh-Aligned Gaussian Splatting for High-Quality Rendering | Jaehoon Choi et.al. | 2410.08941 | null |
2024-10-11 | Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars | Xuan Huang et.al. | 2410.08840 | link |
2024-10-11 | Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization | Christian Schmidt et.al. | 2410.08743 | link |
2024-10-10 | FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction | Irving Fang et.al. | 2410.08282 | null |
2024-10-10 | Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics | Junyi Cao et.al. | 2410.08257 | null |
2024-10-10 | Poison-splat: Computation Cost Attack on 3D Gaussian Splatting | Jiahao Lu et.al. | 2410.08190 | link |
2024-10-10 | DifFRelight: Diffusion-Based Facial Performance Relighting | Mingming He et.al. | 2410.08188 | null |
2024-10-10 | Efficient Perspective-Correct 3D Gaussian Splatting Using Hybrid Transparency | Florian Hahlbohm et.al. | 2410.08129 | null |
2024-10-10 | IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera | Jian Huang et.al. | 2410.08107 | link |
2024-10-11 | Fast Feedforward 3D Gaussian Splatting Compression | Yihang Chen et.al. | 2410.08017 | link |
2024-10-10 | MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting | Ruijie Zhu et.al. | 2410.07707 | link |
2024-10-09 | Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting | Weixing Zhang et.al. | 2410.07266 | link |
2024-10-09 | 3D Representation Methods: A Survey | Zhengren Wang et.al. | 2410.06475 | null |
2024-10-08 | HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction | Shengji Tang et.al. | 2410.06245 | null |
2024-10-08 | GSLoc: Visual Localization with 3D Gaussian Splatting | Kazii Botashev et.al. | 2410.06165 | null |
2024-10-08 | Comparative Analysis of Novel View Synthesis and Photogrammetry for 3D Forest Stand Reconstruction and extraction of individual tree parameters | Guoji Tian et.al. | 2410.05772 | null |
2024-10-07 | GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting | Yukang Cao et.al. | 2410.05259 | null |
2024-10-07 | DreamSat: Towards a General 3D Model for Novel View Synthesis of Space Objects | Nidhi Mathihalli et.al. | 2410.05097 | link |
2024-10-07 | PhotoReg: Photometrically Registering 3D Gaussian Splatting Models | Ziwen Yuan et.al. | 2410.05044 | null |
2024-10-07 | 6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering | Zhongpai Gao et.al. | 2410.04974 | null |
2024-10-07 | Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting | Matthew Strong et.al. | 2410.04680 | link |
2024-10-06 | Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering | Yonghan Lee et.al. | 2410.04646 | null |
2024-10-04 | Variational Bayes Gaussian Splatting | Toon Van de Maele et.al. | 2410.03592 | link |
2024-10-03 | Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats | Mingyang Xie et.al. | 2410.02764 | null |
2024-10-03 | GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering | Hongze Chen et.al. | 2410.02619 | null |
2024-10-07 | SuperGS: Super-Resolution 3D Gaussian Splatting via Latent Feature Field and Gradient-guided Splitting | Shiyun Xie et.al. | 2410.02571 | link |
2024-10-02 | MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis | Xiaobiao Du et.al. | 2410.02103 | link |
2024-10-03 | EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis | Alexander Mai et.al. | 2410.01804 | null |
2024-10-02 | 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection | Yang Cao et.al. | 2410.01647 | link |
2024-10-02 | Gaussian Splatting in Mirrors: Reflection-Aware Rendering via Virtual Camera Optimization | Zihan Wang et.al. | 2410.01614 | link |
2024-10-02 | UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction | Haoran Wang et.al. | 2410.01517 | link |
2024-10-02 | EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Camera Settings | Yingdong Hu et.al. | 2410.01425 | null |
2024-10-02 | CaRtGS: Computational Alignment for Real-Time Gaussian Splatting SLAM | Dapeng Feng et.al. | 2410.00486 | link |
2024-10-01 | Seamless Augmented Reality Integration in Arthroscopy: A Pipeline for Articular Reconstruction and Guidance | Hongchao Shu et.al. | 2410.00386 | null |
2024-10-01 | GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving | Zhangshuo Qi et.al. | 2410.00299 | link |
2024-09-30 | RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning | Yuxuan Wu et.al. | 2409.20291 | null |
2024-09-30 | Robust Gaussian Splatting SLAM by Leveraging Loop Closure | Zunjie Zhu et.al. | 2409.20111 | null |
2024-10-01 | RNG: Relightable Neural Gaussians | Jiahui Fan et.al. | 2409.19702 | null |
2024-09-28 | 1st Place Solution to the 8th HANDS Workshop Challenge – ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction | Jeongwan On et.al. | 2409.19215 | null |
2024-09-26 | HGS-Planner: Hierarchical Planning Framework for Active Scene Reconstruction Using 3D Gaussian Splatting | Zijun Xu et.al. | 2409.17624 | null |
2024-09-25 | SeaSplat: Representing Underwater Scenes with 3D Gaussian Splatting and a Physically Grounded Image Formation Model | Daniel Yang et.al. | 2409.17345 | null |
2024-09-25 | Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM | Phu Pham et.al. | 2409.16944 | null |
2024-09-24 | GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization | Gennady Sidorov et.al. | 2409.16502 | link |
2024-09-24 | Frequency-based View Selection in Gaussian Splatting Reconstruction | Monica M. Q. Li et.al. | 2409.16470 | null |
2024-09-26 | Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities | Peizhi Yan et.al. | 2409.16147 | link |
2024-09-23 | Human Hair Reconstruction with Strand-Aligned 3D Gaussians | Egor Zakharov et.al. | 2409.14778 | null |
2024-09-22 | MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views | Wangze Xu et.al. | 2409.14316 | null |
2024-09-21 | SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality | Hongjia Zhai et.al. | 2409.14067 | null |
2024-09-20 | Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors | Zixin Zhang et.al. | 2409.13392 | null |
2024-09-20 | 3D-GSW: 3D Gaussian Splatting Watermark for Protecting Copyrights in Radiance Fields | Youngdong Jang et.al. | 2409.13222 | null |
2024-09-19 | MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting | Yan Song Hu et.al. | 2409.13055 | null |
2024-09-18 | SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation | Mingze Sun et.al. | 2409.11682 | link |
2024-09-18 | Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks | Joji Joseph et.al. | 2409.11681 | link |
2024-09-17 | GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module | Yichen Zhang et.al. | 2409.11307 | null |
2024-09-17 | SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction | Marko Mihajlovic et.al. | 2409.11211 | null |
2024-09-17 | GLC-SLAM: Gaussian Splatting SLAM with Efficient Loop Closure | Ziheng Xu et.al. | 2409.10982 | null |
2024-09-16 | Phys3DGS: Physically-based 3D Gaussian Splatting for Inverse Rendering | Euntae Choi et.al. | 2409.10335 | null |
2024-09-16 | BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting | Wugang Meng et.al. | 2409.10216 | link |
2024-09-16 | DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments | Mahmud A. Mohamad et.al. | 2409.10041 | link |
2024-09-15 | MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation | Shuzhao Xie et.al. | 2409.09756 | null |
2024-09-17 | A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis | Yohan Poirier-Ginter et.al. | 2409.08947 | null |
2024-09-13 | AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius | Xinzhe Wang et.al. | 2409.08669 | null |
2024-09-13 | Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints | Shan Chen et.al. | 2409.08613 | null |
2024-09-13 | CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting | Runze Chen et.al. | 2409.08562 | null |
2024-09-12 | FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally | Qiuhong Shen et.al. | 2409.08270 | link |
2024-09-12 | Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis | Qian Chen et.al. | 2409.08042 | link |
2024-09-12 | SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length | Bangya Liu et.al. | 2409.07759 | null |
2024-09-11 | Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs | Sadra Safadoust et.al. | 2409.07456 | null |
2024-09-11 | Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Haibo Yang et.al. | 2409.07452 | link |
2024-09-11 | ThermalGaussian: Thermal 3D Gaussian Splatting | Rongfeng Lu et.al. | 2409.07200 | link |
2024-09-10 | GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction | Junyi Chen et.al. | 2409.06685 | null |
2024-09-10 | Sources of Uncertainty in 3D Scene Reconstruction | Marcus Klasson et.al. | 2409.06407 | link |
2024-09-09 | Lagrangian Hashing for Compressed Neural Field Representations | Shrisudhan Govindarajan et.al. | 2409.05334 | null |
2024-09-08 | GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding via Self-supervised Learning | Keyi Liu et.al. | 2409.04963 | null |
2024-09-11 | Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras | Zimu Liao et.al. | 2409.04751 | link |
2024-09-06 | GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | Lorenza Prospero et.al. | 2409.04196 | link |
2024-09-06 | 3D-GP-LMVIC: Learning-based Multi-View Image Coding with 3D Gaussian Geometric Priors | Yujun Huang et.al. | 2409.04013 | link |
2024-09-05 | LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors | Hanyang Yu et.al. | 2409.03456 | null |
2024-09-05 | Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction | Shen Chen et.al. | 2409.03213 | null |
2024-09-04 | Object Gaussian for Monocular 6D Pose Estimation from Sparse Views | Luqing Luo et.al. | 2409.02581 | null |
2024-09-04 | GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving | Huasong Han et.al. | 2409.02382 | null |
2024-09-03 | DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction | Jenny Seidenschwarz et.al. | 2409.02104 | null |
2024-09-03 | PRoGS: Progressive Rendering of Gaussian Splats | Brent Zoomers et.al. | 2409.01761 | null |
2024-09-03 | GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting | Zixuan Guo et.al. | 2409.01581 | null |
2024-09-02 | Free-DyGS: Camera-Pose-Free Scene Reconstruction based on Gaussian Splatting for Dynamic Surgical Videos | Qian Li et.al. | 2409.01003 | null |
2024-09-06 | 3D Gaussian Splatting for Large-scale 3D Surface Reconstruction from Aerial Images | YuanZheng Wu et.al. | 2409.00381 | null |
2024-08-30 | OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping | Meng Wang et.al. | 2408.17223 | null |
2024-08-29 | ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model | Fangfu Liu et.al. | 2408.16767 | null |
2024-08-28 | Towards Realistic Example-based Modeling via 3D Gaussian Stitching | Xinyu Gao et.al. | 2408.15708 | null |
2024-08-27 | Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty | Saining Zhang et.al. | 2408.15242 | link |
2024-08-27 | Learning-based Multi-View Stereo: A Survey | Fangjinhua Wang et.al. | 2408.15235 | null |
2024-08-27 | LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming | Yuang Shi et.al. | 2408.14823 | link |
2024-08-26 | Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control | Yixuan He et.al. | 2408.13995 | null |
2024-08-27 | Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs | Brandon Smart et.al. | 2408.13912 | null |
2024-08-25 | TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers | Chuanrui Zhang et.al. | 2408.13770 | null |
2024-08-25 | SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting | Wenrui Li et.al. | 2408.13711 | link |
2024-08-23 | BiGS: Bidirectional Gaussian Primitives for Relightable 3D Gaussian Splatting | Zhenyuan Liu et.al. | 2408.13370 | null |
2024-08-23 | FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering | Yunji Seo et.al. | 2408.12894 | null |
2024-08-26 | GSFusion: Online RGB-D Mapping Where Gaussian Splatting Meets TSDF Fusion | Jiaxin Wei et.al. | 2408.12677 | link |
2024-08-22 | Subsurface Scattering for 3D Gaussian Splatting | Jan-Niklas Dihlmann et.al. | 2408.12282 | null |
2024-08-21 | Robust 3D Gaussian Splatting for Novel View Synthesis in Presence of Distractors | Paul Ungermann et.al. | 2408.11697 | link |
2024-08-27 | Pano2Room: Novel View Synthesis from a Single Indoor Panorama | Guo Pu et.al. | 2408.11413 | link |
2024-08-20 | GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting | Changkun Liu et.al. | 2408.11085 | link |
2024-08-20 | ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining | Qi Ma et.al. | 2408.10906 | null |
2024-08-20 | DEGAS: Detailed Expressions on Full-Body Gaussian Avatars | Zhijing Shao et.al. | 2408.10588 | link |
2024-08-20 | LoopSplat: Loop Closure by Registering 3D Gaussian Splats | Liyuan Zhu et.al. | 2408.10154 | link |
2024-08-20 | Gaussian in the Dark: Real-Time View Synthesis From Inconsistent Dark Images Using Gaussian Splatting | Sheng Ye et.al. | 2408.09130 | link |
2024-08-16 | Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS | Wei Sun et.al. | 2408.08723 | null |
2024-08-15 | WaterSplatting: Fast Underwater 3D Scene Reconstruction Using Gaussian Splatting | Huapeng Li et.al. | 2408.08206 | null |
2024-08-19 | FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering | Guofeng Feng et.al. | 2408.07967 | link |
2024-08-14 | 3D Gaussian Editing with A Single Image | Guan Luo et.al. | 2408.07540 | null |
2024-08-13 | SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis | Saptarshi Neil Sinha et.al. | 2408.06975 | null |
2024-08-12 | Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering | Jiameng Li et.al. | 2408.06286 | link |
2024-08-12 | Developing Smart MAVs for Autonomous Inspection in GPS-denied Constructions | Paoqiang Pan et.al. | 2408.06030 | null |
2024-08-10 | Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis | Zhongche Qu et.al. | 2408.05635 | null |
2024-08-09 | DreamCouple: Exploring High Quality Text-to-3D Generation Via Rectified Flow | Hangyu Li et.al. | 2408.05008 | null |
2024-08-08 | InstantStyleGaussian: Efficient Art Style Transfer with 3D Gaussian Splatting | Xin-Yi Yu et.al. | 2408.04249 | null |
2024-08-07 | Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM | Yan Song Hu et.al. | 2408.03825 | null |
2024-08-07 | Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields | Joo Chan Lee et.al. | 2408.03822 | null |
2024-08-07 | 3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting | Zhe Jun Tang et.al. | 2408.03753 | link |
2024-08-07 | PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting | Yijia Guo et.al. | 2408.03538 | null |
2024-08-02 | A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness | Lutao Jiang et.al. | 2408.01269 | null |
2024-08-02 | Reality Fusion: Robust Real-time Immersive Mobile Robot Teleoperation with Volumetric Visual Data Fusion | Ke Li et.al. | 2408.01225 | link |
2024-08-07 | IG-SLAM: Instant Gaussian SLAM | F. Aykut Sarikamis et.al. | 2408.01126 | null |
2024-08-01 | LoopSparseGS: Loop Based Sparse-View Friendly Gaussian Splatting | Zhenyu Bao et.al. | 2408.00254 | null |
2024-07-31 | Localized Gaussian Splatting Editing with Contextual Awareness | Hanyuan Xiao et.al. | 2408.00083 | null |
2024-07-31 | Expressive Whole-Body 3D Gaussian Avatar | Gyeongsik Moon et.al. | 2407.21686 | null |
2024-07-30 | SceneTeller: Language-to-3D Scene Generation | Başak Melis Öcal et.al. | 2407.20727 | null |
2024-07-29 | Radiance Fields for Robotic Teleoperation | Maximum Wilder-Smith et.al. | 2407.20194 | link |
2024-07-24 | 3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities | Yanqi Bao et.al. | 2407.17418 | link |
2024-07-23 | HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images | Shreyas Singh et.al. | 2407.16503 | link |
2024-07-23 | Integrating Meshes and 3D Gaussians for Indoor Scene Reconstruction with SAM Mask Guidance | Jiyeop Kim et.al. | 2407.16173 | null |
2024-07-22 | 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model | Matteo Bortolon et.al. | 2407.15484 | null |
2024-07-22 | Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures | Ruizhe Wang et.al. | 2407.15435 | null |
2024-07-21 | HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions | Haiyang Zhou et.al. | 2407.15187 | null |
2024-07-20 | Realistic Surgical Image Dataset Generation Based On 3D Gaussian Splatting | Tianle Zeng et.al. | 2407.14846 | null |
2024-07-19 | DirectL: Efficient Radiance Fields Rendering for 3D Light Field Displays | Zongyuan Yang et.al. | 2407.14053 | null |
2024-07-20 | Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation | Zongrui Li et.al. | 2407.13584 | link |
2024-07-18 | EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting | Yuchen Weng et.al. | 2407.13520 | null |
2024-07-17 | Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections | Congrong Xu et.al. | 2407.12306 | null |
2024-07-16 | MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification | Zhuoxiao Li et.al. | 2407.11840 | null |
2024-07-16 | Click-Gaussian: Interactive Segmentation to Any 3D Gaussians | Seokhun Choi et.al. | 2407.11793 | null |
2024-07-16 | SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction | Shuang Li et.al. | 2407.11781 | link |
2024-07-16 | Ev-GS: Event-based Gaussian splatting for Efficient and Accurate Radiance Field Rendering | Jingqian Wu et.al. | 2407.11343 | null |
2024-07-14 | 3DEgo: 3D Editing on the Go! | Umar Khalid et.al. | 2407.10102 | null |
2024-07-14 | SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion | Jiyuan Zhang et.al. | 2407.10062 | null |
2024-07-12 | StyleSplat: 3D Object Style Transfer with Gaussian Splatting | Sahil Jain et.al. | 2407.09473 | null |
2024-07-11 | WildGaussians: 3D Gaussian Splatting in the Wild | Jonas Kulhanek et.al. | 2407.08447 | link |
2024-07-11 | Survey on Fundamental Deep Learning 3D Reconstruction Techniques | Yonge Bai et.al. | 2407.08137 | null |
2024-07-17 | MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition | Aggelina Chatziagapi et.al. | 2407.07284 | null |
2024-07-09 | Reference-based Controllable Scene Stylization with Gaussian Splatting | Yiqun Mei et.al. | 2407.07220 | null |
2024-07-10 | 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes | Nicolas Moenne-Loccoz et.al. | 2407.07090 | null |
2024-07-07 | PICA: Physics-Integrated Clothed Avatar | Bo Peng et.al. | 2407.05324 | null |
2024-07-06 | SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction | Weixing Xie et.al. | 2407.05023 | link |
2024-07-12 | Segment Any 4D Gaussians | Shengxiang Ji et.al. | 2407.04504 | null |
2024-07-04 | PFGS: High Fidelity Point Cloud Rendering via Feature Splatting | Jiaxu Wang et.al. | 2407.03857 | link |
2024-07-04 | SpikeGS: Reconstruct 3D scene via fast-moving bio-inspired sensors | Yijia Guo et.al. | 2407.03771 | null |
2024-07-04 | VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors | Sungwon Hwang et.al. | 2407.02945 | link |
2024-07-03 | Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction | Jiaxin Guo et.al. | 2407.02918 | link |
2024-07-04 | AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction | Mustafa Khan et.al. | 2407.02598 | null |
2024-07-02 | TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation | Chaofan Luo et.al. | 2407.02034 | null |
2024-07-01 | GaussianStego: A Generalizable Stenography Pipeline for Generative 3D Gaussians Splatting | Chenxin Li et.al. | 2407.01301 | null |
2024-07-02 | RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated Rendering | Weikai Lin et.al. | 2407.00435 | link |
2024-06-29 | OccFusion: Rendering Occluded Humans with Generative Diffusion Priors | Adam Sun et.al. | 2407.00316 | null |
2024-06-28 | SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting | Sara Sabour et.al. | 2406.20055 | null |
2024-06-28 | EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting | Daiwei Zhang et.al. | 2406.19811 | null |
2024-06-27 | Lightweight Predictive 3D Gaussian Splats | Junli Cao et.al. | 2406.19434 | link |
2024-06-26 | On Scaling Up 3D Gaussian Splatting Training | Hexu Zhao et.al. | 2406.18533 | link |
2024-06-26 | GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality | Taoran Yi et.al. | 2406.18462 | null |
2024-06-26 | Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning | Muhammad Salman Ali et.al. | 2406.18214 | link |
2024-06-26 | GS-Octree: Octree-based 3D Gaussian Splatting for Robust Object-level 3D Reconstruction Under Strong Lighting | Jiaze Li et.al. | 2406.18199 | null |
2024-06-25 | NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods | Jonas Kulhanek et.al. | 2406.17345 | null |
2024-06-24 | Reducing the Memory Footprint of 3D Gaussian Splatting | Panagiotis Papantonakis et.al. | 2406.17074 | null |
2024-06-23 | LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction | Hengyu Liu et.al. | 2406.16073 | link |
2024-06-23 | Learning with Noisy Ground Truth: From 2D Classification to 3D Reconstruction | Yangdi Lu et.al. | 2406.15982 | null |
2024-06-21 | Taming 3DGS: High-Quality Radiance Fields with Limited Resources | Saswat Subhajyoti Mallick et.al. | 2406.15643 | link |
2024-06-21 | Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks | Alex Quach et.al. | 2406.15149 | null |
2024-06-18 | Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models | Paul Henderson et.al. | 2406.13099 | null |
2024-06-18 | HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors | Panwang Pan et.al. | 2406.12459 | link |
2024-06-17 | A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets | Bernhard Kerbl et.al. | 2406.12080 | null |
2024-06-22 | RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians | Bingling Li et.al. | 2406.11836 | null |
2024-06-18 | Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting | Junha Hyung et.al. | 2406.11672 | null |
2024-06-14 | Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections | Jiacong Xu et.al. | 2406.10373 | null |
2024-06-14 | L4GM: Large 4D Gaussian Reconstruction Model | Jiawei Ren et.al. | 2406.10324 | null |
2024-06-14 | PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting | Alex Hanson et.al. | 2406.10219 | link |
2024-06-14 | GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors | Xiqian Yu et.al. | 2406.10111 | null |
2024-06-14 | Unified Gaussian Primitives for Scene Representation and Rendering | Yang Zhou et.al. | 2406.09733 | null |
2024-06-13 | Modeling Ambient Scene Dynamics for Free-view Synthesis | Meng-Li Shih et.al. | 2406.09395 | null |
2024-06-13 | GGHead: Fast and Generalizable 3D Gaussian Heads | Tobias Kirschstein et.al. | 2406.09377 | null |
2024-06-13 | Gaussian-Forest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling | Fengyi Zhang et.al. | 2406.08759 | null |
2024-06-12 | ICE-G: Image Conditional Editing of 3D Gaussian Splats | Vishnu Jaganathan et.al. | 2406.08488 | null |
2024-06-12 | Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models | Yuxuan Xue et.al. | 2406.08475 | null |
2024-06-12 | From Chaos to Clarity: 3DGS in the Dark | Zhihao Li et.al. | 2406.08300 | null |
2024-06-11 | Trim 3D Gaussian Splatting for Accurate Geometry Representation | Lue Fan et.al. | 2406.07499 | null |
2024-06-11 | Cinematic Gaussians: Real-Time HDR Radiance Fields with Depth of Field | Chao Wang et.al. | 2406.07329 | null |
2024-06-10 | GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation | Haozhe Xie et.al. | 2406.06526 | link |
2024-06-10 | PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction | Danpeng Chen et.al. | 2406.06521 | null |
2024-06-10 | MVGamba: Unify 3D Content Generation as State Space Sequence Modeling | Xuanyu Yi et.al. | 2406.06367 | link |
2024-06-10 | Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis | Xin Jin et.al. | 2406.06216 | link |
2024-06-09 | RefGaussian: Disentangling Reflections from 3D Gaussian Splatting for Realistic Rendering | Rui Zhang et.al. | 2406.05852 | null |
2024-06-09 | VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction | Hanlin Chen et.al. | 2406.05774 | null |
2024-06-06 | A Survey on 3D Human Avatar Modeling – From Reconstruction to Generation | Ruihe Wang et.al. | 2406.04253 | null |
2024-06-06 | Localized Gaussian Point Management | Haosen Yang et.al. | 2406.04251 | null |
2024-06-06 | Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction | Diwen Wan et.al. | 2406.03697 | link |
2024-06-10 | Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion | Tianyi Xiong et.al. | 2406.02972 | null |
2024-06-05 | Adversarial Generation of Hierarchical Gaussians for 3D Generative Model | Sangeek Hyun et.al. | 2406.02968 | link |
2024-06-04 | 3D-HGS: 3D Half-Gaussian Splatting | Haolin Li et.al. | 2406.02720 | link |
2024-06-06 | Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting | Inkyu Shin et.al. | 2406.02541 | null |
2024-06-04 | SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition | Van Minh Nguyen et.al. | 2406.02533 | null |
2024-06-04 | DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering | Zhongpai Gao et.al. | 2406.02518 | null |
2024-06-04 | WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections | Yuze Wang et.al. | 2406.02407 | null |
2024-06-04 | Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning | Jiaxu Wang et.al. | 2406.02370 | null |
2024-06-04 | OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding | Yanmin Wu et.al. | 2406.02058 | null |
2024-06-04 | FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping | Yuzhou Ji et.al. | 2406.01916 | null |
2024-06-03 | Tetrahedron Splatting for 3D Generation | Chun Gu et.al. | 2406.01579 | link |
2024-06-03 | DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors | Tianyu Huang et.al. | 2406.01476 | link |
2024-06-03 | RaDe-GS: Rasterizing Depth in Gaussian Splatting | Baowen Zhang et.al. | 2406.01467 | link |
2024-05-31 | ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model | Yufei Wang et.al. | 2405.20721 | link |
2024-05-31 | R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction | Ruyi Zha et.al. | 2405.20693 | link |
2024-05-30 | $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving | Nan Huang et.al. | 2405.20323 | link |
2024-06-03 | A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction | Jianghao Shen et.al. | 2405.20310 | null |
2024-05-29 | EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images | Wangbo Yu et.al. | 2405.20224 | null |
2024-05-30 | Object-centric Reconstruction and Tracking of Dynamic Unknown Objects using 3D Gaussian Splatting | Kuldeep R Barad et.al. | 2405.20104 | null |
2024-05-30 | GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction | Haodong Xiang et.al. | 2405.19671 | null |
2024-05-30 | Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian | Wei Sun et.al. | 2405.19657 | null |
2024-05-30 | TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM | Peifeng Jiang et.al. | 2405.19614 | null |
2024-05-29 | NPGA: Neural Parametric Gaussian Avatars | Simon Giebenhain et.al. | 2405.19331 | null |
2024-05-29 | LP-3DGS: Learning to Prune 3D Gaussian Splatting | Zhaoliang Zhang et.al. | 2405.18784 | link |
2024-05-28 | A Grid-Free Fluid Solver based on Gaussian Spatial Representation | Jingrui Xing et.al. | 2405.18133 | null |
2024-05-28 | FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes | Yunsong Wang et.al. | 2405.17958 | link |
2024-05-28 | A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction | Bin Zhang et.al. | 2405.17891 | null |
2024-05-29 | HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction | Haoyu Zhao et.al. | 2405.17872 | link |
2024-05-30 | Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting | Shuojue Yang et.al. | 2405.17835 | link |
2024-05-28 | Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh | Xiangjun Gao et.al. | 2405.17811 | null |
2024-05-28 | SafeguardGS: 3D Gaussian Primitive Pruning While Avoiding Catastrophic Scene Destruction | Yongjae Lee et.al. | 2405.17793 | link |
2024-05-29 | DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos | Linhan Wang et.al. | 2405.17705 | link |
2024-05-27 | GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane | Yansong Qu et.al. | 2405.17596 | null |
2024-05-27 | DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Refocusing, Defocus Rendering and Blur Removal | Yujie Wang et.al. | 2405.17351 | null |
2024-05-27 | Memorize What Matters: Emergent Scene Decomposition from Multitraverse | Yiming Li et.al. | 2405.17187 | link |
2024-05-28 | F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting | Xiangyu Sun et.al. | 2405.17083 | null |
2024-05-28 | SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain | Butian Xiong et.al. | 2405.16923 | null |
2024-05-28 | PyGS: Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting | Zipeng Wang et.al. | 2405.16829 | null |
2024-05-26 | Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians | Erik Sandström et.al. | 2405.16544 | link |
2024-05-24 | Feature Splatting for Better Novel View Synthesis with Low Overlap | T. Berriel Martins et.al. | 2405.15518 | link |
2024-05-24 | GSDeformer: Direct Cage-based Deformation for 3D Gaussian Splatting | Jiajun Huang et.al. | 2405.15491 | null |
2024-05-27 | HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting | Yuanhao Cai et.al. | 2405.15125 | link |
2024-05-24 | GS-Hider: Hiding Messages into 3D Gaussian Splatting | Xuanyu Zhang et.al. | 2405.15118 | null |
2024-05-23 | TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing | Teng Xu et.al. | 2405.14455 | null |
2024-05-24 | RoGS: Large Scale Road Surface Reconstruction based on 2D Gaussian Splatting | Zhiheng Feng et.al. | 2405.14342 | link |
2024-05-22 | DoGaussian: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus | Yu Chen et.al. | 2405.13943 | link |
2024-05-22 | Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances | Licheng Shen et.al. | 2405.13694 | null |
2024-05-21 | Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery | Hongsheng Wang et.al. | 2405.12477 | null |
2024-05-20 | GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details | Boqian Li et.al. | 2405.12420 | link |
2024-05-22 | AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field | Rong Liu et.al. | 2405.12369 | link |
2024-05-20 | Embracing Radiance Field Rendering in 6G: Over-the-Air Training and Inference with 3D Contents | Guanlin Wu et.al. | 2405.12155 | null |
2024-05-20 | CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization | Jiawei Zhang et.al. | 2405.12110 | link |
2024-05-21 | Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping | Tianhao Wu et.al. | 2405.12069 | null |
2024-05-20 | MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections | Jiayue Liu et.al. | 2405.11921 | null |
2024-05-18 | Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching | Xingyu Miao et.al. | 2405.11252 | link |
2024-05-18 | MotionGS: Compact Gaussian Splatting SLAM by Motion Filter | Xinli Guo et.al. | 2405.11129 | link |
2024-05-17 | Photorealistic 3D Urban Scene Reconstruction and Point Cloud Extraction using Google Earth Imagery and Gaussian Splatting | Kyle Gao et.al. | 2405.11021 | null |
2024-05-17 | ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation | Pengzhi Li et.al. | 2405.10508 | null |
2024-05-16 | GS-Planner: A Gaussian-Splatting-based Planning Framework for Active High-Fidelity Reconstruction | Rui Jin et.al. | 2405.10142 | null |
2024-05-11 | Direct Learning of Mesh and Appearance via 3D Gaussian Splatting | Ancheng Lin et.al. | 2405.06945 | null |
2024-05-10 | I3DGS: Improve 3D Gaussian Splatting from Multiple Dimensions | Jinwei Lin et.al. | 2405.06408 | null |
2024-05-09 | DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation | Sitian Shen et.al. | 2405.05800 | null |
2024-05-09 | FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting | Yikun Ma et.al. | 2405.05768 | null |
2024-05-18 | NGM-SLAM: Gaussian Splatting SLAM with Radiance Field Submap | Mingrui Li et.al. | 2405.05702 | null |
2024-05-09 | Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview | Yuhang Ming et.al. | 2405.05526 | null |
2024-05-08 | GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields | Yuanhao Gong et.al. | 2405.05446 | null |
2024-05-06 | A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose | Kaiwen Jiang et.al. | 2405.03659 | null |
2024-05-03 | HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2 | Miriam Jäger et.al. | 2405.02005 | null |
2024-05-01 | Spectrally Pruned Gaussian Fields with Neural Compensation | Runyi Yang et.al. | 2405.00676 | link |
2024-04-30 | GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting | Kai Zhang et.al. | 2404.19702 | null |
2024-04-29 | SAGS: Structure-Aware 3D Gaussian Splatting | Evangelos Ververas et.al. | 2404.19149 | null |
2024-04-29 | MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing | Cong Wang et.al. | 2404.19026 | null |
2024-04-29 | DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing | Minghao Chen et.al. | 2404.18929 | null |
2024-04-29 | Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting | Yifei Gao et.al. | 2404.18669 | null |
2024-04-29 | 3D Gaussian Splatting with Deferred Reflection | Keyang Ye et.al. | 2404.18454 | link |
2024-04-29 | Reconstructing Satellites in 3D from Amateur Telescope Images | Zhiming Chang et.al. | 2404.18394 | null |
2024-04-26 | SLAM for Indoor Mapping of Wide Area Construction Environments | Vincent Ress et.al. | 2404.17215 | null |
2024-04-25 | GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting | Kyusun Cho et.al. | 2404.16012 | link |
2024-04-25 | OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation | Lizhi Wang et.al. | 2404.15891 | link |
2024-04-22 | Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses | Inhee Lee et.al. | 2404.14410 | null |
2024-04-22 | CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding | Guibiao Liao et.al. | 2404.14249 | link |
2024-04-28 | GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting | Hongyun Yu et.al. | 2404.14037 | null |
2024-04-21 | GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal | Yuxin Wang et.al. | 2404.13679 | null |
2024-04-19 | Learn2Talk: 3D Talking Face Learns from 2D Talking Face | Yixiang Zhuang et.al. | 2404.12888 | null |
2024-04-19 | EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation | Wenkai Liu et.al. | 2404.12777 | null |
2024-04-22 | Does Gaussian Splatting need SFM Initialization? | Yalda Foroutan et.al. | 2404.12547 | null |
2024-04-22 | Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos | Isabella Liu et.al. | 2404.12379 | null |
2024-04-17 | RainyScape: Unsupervised Rainy Scene Reconstruction using Decoupled Neural Rendering | Xianqiang Lyu et.al. | 2404.11401 | null |
2024-04-18 | DeblurGS: Gaussian Splatting for Camera Motion Blur | Jeongtaek Oh et.al. | 2404.11358 | null |
2024-04-17 | Novel View Synthesis for Cinematic Anatomy on Mobile and Immersive Displays | Simon Niedermayr et.al. | 2404.11285 | null |
2024-04-16 | Gaussian Opacity Fields: Efficient and Compact Surface Reconstruction in Unbounded Scenes | Zehao Yu et.al. | 2404.10772 | null |
2024-04-16 | Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks | Florian Barthel et.al. | 2404.10625 | null |
2024-04-16 | AbsGS: Recovering Fine Details for 3D Gaussian Splatting | Zongxin Ye et.al. | 2404.10484 | null |
2024-04-16 | SRGS: Super-Resolution 3D Gaussian Splatting | Xiang Feng et.al. | 2404.10318 | link |
2024-04-15 | LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives | Jiadi Cui et.al. | 2404.09748 | null |
2024-04-15 | 3D Gaussian Splatting as Markov Chain Monte Carlo | Shakiba Kheradmand et.al. | 2404.09591 | null |
2024-04-16 | LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field | Jiyang Li et.al. | 2404.08966 | link |
2024-04-15 | OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering | Jingrui Ye et.al. | 2404.08449 | null |
2024-04-10 | RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion | Jaidev Shriram et.al. | 2404.07199 | null |
2024-04-10 | Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting | Xiaolei Lang et.al. | 2404.06926 | null |
2024-04-10 | SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection | Mathis Kruse et.al. | 2404.06832 | link |
2024-04-12 | SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera | Gaole Dai et.al. | 2404.06710 | null |
2024-04-14 | 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis | Zhicheng Lu et.al. | 2404.06270 | null |
2024-04-09 | Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction | Sierra Bonilla et.al. | 2404.06128 | link |
2024-04-09 | Revising Densification in Gaussian Splatting | Samuel Rota Bulò et.al. | 2404.06109 | null |
2024-04-09 | Hash3D: Training-free Acceleration for 3D Generation | Xingyi Yang et.al. | 2404.06091 | link |
2024-04-08 | StylizedGS: Controllable Stylization for 3D Gaussian Splatting | Dingxi Zhang et.al. | 2404.05220 | null |
2024-04-06 | Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion | Ziyuan Qu et.al. | 2404.04687 | link |
2024-04-05 | Robust Gaussian Splatting | François Darmon et.al. | 2404.04211 | null |
2024-04-04 | Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting | Jeongmin Bae et.al. | 2404.03613 | null |
2024-04-08 | OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images | Longwei Li et.al. | 2404.03202 | link |
2024-04-03 | TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Surrounding Autonomous Driving Scenes | Cheng Zhao et.al. | 2404.02410 | null |
2024-04-01 | Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting | Jiarui Meng et.al. | 2404.01168 | null |
2024-04-07 | CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians | Yang Liu et.al. | 2404.01133 | link |
2024-04-01 | MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements | Lisong C. Sun et.al. | 2404.00923 | null |
2024-03-30 | 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting | Xiaoyang Lyu et.al. | 2404.00409 | null |
2024-03-29 | InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds | Zhiwen Fan et.al. | 2403.20309 | link |
2024-03-29 | Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces | Mauro Comi et.al. | 2403.20275 | null |
2024-03-29 | HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes | Ke Wu et.al. | 2403.20159 | null |
2024-03-29 | SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior | Zhongrui Yu et.al. | 2403.20079 | null |
2024-03-29 | HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes | Zhuopeng Li et.al. | 2403.20032 | null |
2024-03-28 | GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling | Bowen Zhang et.al. | 2403.19655 | null |
2024-03-28 | GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond | Chongjie Ye et.al. | 2403.19632 | link |
2024-03-28 | CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians | Avinash Paliwal et.al. | 2403.19495 | link |
2024-03-29 | Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction | Qiuhong Shen et.al. | 2403.18795 | link |
2024-03-26 | Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians | Kerui Ren et.al. | 2403.17898 | link |
2024-03-26 | 2D Gaussian Splatting for Geometrically Accurate Radiance Fields | Binbin Huang et.al. | 2403.17888 | link |
2024-03-26 | DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing | Matias Turkulainen et.al. | 2403.17822 | link |
2024-03-25 | GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction | Mulin Yu et.al. | 2403.16964 | null |
2024-03-23 | Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections | Dongbin Zhang et.al. | 2403.15704 | null |
2024-03-22 | Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting | Jun Guo et.al. | 2403.15624 | null |
2024-03-22 | Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting | Zheng Zhang et.al. | 2403.15530 | null |
2024-03-22 | STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians | Yifei Zeng et.al. | 2403.14939 | null |
2024-03-21 | MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images | Yuedong Chen et.al. | 2403.14627 | link |
2024-03-21 | Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering | Antoine Guédon et.al. | 2403.14554 | null |
2024-03-21 | HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression | Yihang Chen et.al. | 2403.14530 | link |
2024-03-21 | Isotropic Gaussian Splatting for Real-Time Radiance Field Rendering | Yuanhao Gong et.al. | 2403.14244 | null |
2024-03-19 | GVGEN: Text-to-3D Generation with Volumetric Representation | Xianglong He et.al. | 2403.12957 | null |
2024-03-19 | HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting | Hongyu Zhou et.al. | 2403.12722 | null |
2024-03-22 | RGBD GS-ICP SLAM | Seongbo Ha et.al. | 2403.12550 | link |
2024-03-19 | High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization | Shuo Sun et.al. | 2403.12535 | link |
2024-03-20 | View-Consistent 3D Editing with Gaussian Splatting | Yuxuan Wang et.al. | 2403.11868 | null |
2024-03-19 | BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting | Lingzhe Zhao et.al. | 2403.11831 | link |
2024-03-18 | NEDS-SLAM: A Novel Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting | Yiming Ji et.al. | 2403.11679 | null |
2024-03-20 | GaussNav: Gaussian Splatting for Visual Navigation | Xiaohan Lei et.al. | 2403.11625 | link |
2024-03-18 | 3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration | Quentin Herau et.al. | 2403.11577 | null |
2024-03-18 | Fed3DGS: Scalable 3D Gaussian Splatting with Federated Learning | Teppei Suzuki et.al. | 2403.11460 | link |
2024-03-18 | Bridging 3D Gaussian and Mesh for Freeview Video Rendering | Yuting Xiao et.al. | 2403.11453 | null |
2024-03-18 | Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction | Zhiyang Guo et.al. | 2403.11447 | null |
2024-03-18 | BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors | Tingyang Zhang et.al. | 2403.11427 | null |
2024-03-18 | Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF | Guangyi Liu et.al. | 2403.11396 | null |
2024-03-17 | 3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization | Peng Jiang et.al. | 2403.11367 | null |
2024-03-17 | Compact 3D Gaussian Splatting For Dense Visual SLAM | Tianchen Deng et.al. | 2403.11247 | link |
2024-03-15 | SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians | Hiba Dahmani et.al. | 2403.10427 | null |
2024-03-15 | GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time | Hao Li et.al. | 2403.10147 | null |
2024-03-15 | Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing | Tian-Xing Xu et.al. | 2403.10050 | null |
2024-03-14 | Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting | Aiden Swann et.al. | 2403.09875 | null |
2024-03-14 | GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping | Yuhang Zheng et.al. | 2403.09637 | link |
2024-03-14 | Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting | Jaewoo Jung et.al. | 2403.09413 | link |
2024-03-14 | A New Split Algorithm for 3D Gaussian Splatting | Qiyuan Feng et.al. | 2403.09143 | null |
2024-03-14 | GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing | Jing Wu et.al. | 2403.08733 | link |
2024-03-13 | Gaussian Splatting in Style | Abhishek Saroha et.al. | 2403.08498 | null |
2024-03-12 | StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting | Kunhao Liu et.al. | 2403.07807 | null |
2024-03-13 | DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization | Jiahe Li et.al. | 2403.06912 | link |
2024-03-11 | FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization | Jiahui Zhang et.al. | 2403.06908 | null |
2024-03-07 | Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis | Yuanhao Cai et.al. | 2403.04116 | link |
2024-02-29 | 3D Gaussian Model for Animation and Texturing | Xiangzhi Eric Wang et.al. | 2402.19441 | null |
2024-02-27 | VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction | Jiaqi Lin et.al. | 2402.17427 | null |
2024-02-24 | Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting | Ziyi Yang et.al. | 2402.15870 | null |
2024-02-22 | GaussianPro: 3D Gaussian Splatting with Progressive Propagation | Kai Cheng et.al. | 2402.14650 | null |
2024-02-21 | Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting | Joongho Jo et.al. | 2402.13827 | null |
2024-02-20 | How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey | Fabio Tosi et.al. | 2402.13255 | link |
2024-02-15 | GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering | Abdullah Hamdi et.al. | 2402.10128 | link |
2024-02-11 | 3D Gaussian as a New Vision Era: A Survey | Ben Fei et.al. | 2402.07181 | null |
2024-02-13 | GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data | Haoyuan Li et.al. | 2402.06198 | null |
2024-02-09 | HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting | Zhenglin Zhou et.al. | 2402.06149 | link |
2024-02-06 | Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos | Alfredo Rivero et.al. | 2402.03723 | null |
2024-02-07 | 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes | Yuanxing Duan et.al. | 2402.03307 | link |
2024-02-01 | 360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming | Jiayang Bai et.al. | 2402.00763 | null |
Text-to-Video
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-26 | SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture | Kehan Sui et.al. | 2506.21478 | null |
2025-06-26 | ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models | Hongbo Liu et.al. | 2506.21356 | null |
2025-06-26 | HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation | Diego Biagini et.al. | 2506.21287 | null |
2025-06-26 | Video Virtual Try-on with Conditional Diffusion Transformer Inpainter | Cheng Zou et.al. | 2506.21270 | null |
2025-06-26 | DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing | Lingling Cai et.al. | 2506.20967 | null |
2025-06-26 | Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models | Donggoo Kang et.al. | 2506.20946 | null |
2025-06-25 | Video Perception Models for 3D Scene Synthesis | Rui Huang et.al. | 2506.20601 | null |
2025-06-25 | BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | Jiahao Lin et.al. | 2506.20103 | null |
2025-06-24 | Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation | Xingyang Li et.al. | 2506.19852 | null |
2025-06-24 | GenHSI: Controllable Generation of Human-Scene Interaction Videos | Zekun Li et.al. | 2506.19840 | null |
2025-06-24 | SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution | Liangbin Xie et.al. | 2506.19838 | null |
2025-06-24 | Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router | Yubo Huang et.al. | 2506.19833 | null |
2025-06-24 | Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation | Jintao Rong et.al. | 2506.19348 | null |
2025-06-23 | VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory | Runjia Li et.al. | 2506.18903 | null |
2025-06-23 | From Virtual Games to Real-World Play | Wenqiang Sun et.al. | 2506.18901 | null |
2025-06-23 | FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation | Kaiyi Huang et.al. | 2506.18899 | null |
2025-06-23 | MinD: Unified Visual Imagination and Control via Hierarchical World Models | Xiaowei Chi et.al. | 2506.18897 | null |
2025-06-23 | OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation | Qijun Gan et.al. | 2506.18866 | null |
2025-06-23 | Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset | Zhuowei Chen et.al. | 2506.18851 | null |
2025-06-23 | Matrix-Game: Interactive World Foundation Model | Yifan Zhang et.al. | 2506.18701 | null |
2025-06-23 | RDPO: Real Data Preference Optimization for Physics Consistency Video Generation | Wenxu Qian et.al. | 2506.18655 | null |
2025-06-23 | BulletGen: Improving 4D Reconstruction with Bullet-Time Generation | Denys Rozumnyi et.al. | 2506.18601 | null |
2025-06-23 | VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning | Xuanyu Zhang et.al. | 2506.18564 | null |
2025-06-23 | Emergent Temporal Correspondences from Video Diffusion Transformers | Jisu Nam et.al. | 2506.17220 | link |
2025-06-20 | Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition | Jiaqi Li et.al. | 2506.17201 | null |
2025-06-20 | Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation | Riccardo Corvi et.al. | 2506.16802 | null |
2025-06-19 | VideoGAN-based Trajectory Proposal for Automated Vehicles | Annajoyce Mariani et.al. | 2506.16209 | null |
2025-06-19 | FastInit: Fast Noise Initialization for Temporally Consistent Video Generation | Chengyu Bai et.al. | 2506.16119 | null |
2025-06-19 | PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models | Tianchen Zhao et.al. | 2506.16054 | null |
2025-06-19 | Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization | Cong Wang et.al. | 2506.15980 | null |
2025-06-20 | Sekai: A Video Dataset towards World Exploration | Zhen Li et.al. | 2506.15675 | null |
2025-06-20 | Show-o2: Improved Native Unified Multimodal Models | Jinheng Xie et.al. | 2506.15564 | link |
2025-06-17 | Causally Steered Diffusion for Automated Video Counterfactual Generation | Nikos Spyrou et.al. | 2506.14404 | null |
2025-06-17 | CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation | Jia-Chen Zhang et.al. | 2506.14206 | null |
2025-06-18 | VideoMAR: Autoregressive Video Generation with Continuous Tokens | Hu Yu et.al. | 2506.14168 | null |
2025-06-16 | UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions | Zhucun Xue et.al. | 2506.13691 | null |
2025-06-16 | STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation | Jiamin Wang et.al. | 2506.13138 | null |
2025-06-15 | iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer | Zhelun Shen et.al. | 2506.12847 | null |
2025-06-13 | SignAligner: Harmonizing Complementary Pose Modalities for Coherent Sign Language Generation | Xu Wang et.al. | 2506.11621 | null |
2025-06-12 | GenWorld: Towards Detecting AI-generated Real-world Simulation Videos | Weiliang Chen et.al. | 2506.10975 | null |
2025-06-12 | M4V: Multi-Modal Mamba for Text-to-Video Generation | Jiancheng Huang et.al. | 2506.10915 | null |
2025-06-12 | GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning | Xiaoyi Bao et.al. | 2506.10639 | null |
2025-06-12 | DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers | Lizhen Wang et.al. | 2506.10568 | null |
2025-06-12 | AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation | Haoyuan Shi et.al. | 2506.10540 | null |
2025-06-11 | PlayerOne: Egocentric World Simulator | Yuanpeng Tu et.al. | 2506.09995 | null |
2025-06-11 | InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions | Zhenzhi Wang et.al. | 2506.09984 | null |
2025-06-11 | ReSim: Reliable World Simulation for Autonomous Driving | Jiazhi Yang et.al. | 2506.09981 | null |
2025-06-11 | DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning | Dongxu Liu et.al. | 2506.09644 | null |
2025-06-11 | Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation | Shanchuan Lin et.al. | 2506.09350 | null |
2025-06-10 | Seedance 1.0: Exploring the Boundaries of Video Generation Models | Yu Gao et.al. | 2506.09113 | null |
2025-06-10 | FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Zheqi He et.al. | 2506.09081 | null |
2025-06-10 | MagCache: Fast Video Generation with Magnitude-Aware Cache | Zehong Ma et.al. | 2506.09045 | link |
2025-06-11 | Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Xuanchi Ren et.al. | 2506.09042 | link |
2025-06-10 | HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation | Ziyao Huang et.al. | 2506.08797 | null |
2025-06-10 | How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models | Huixuan Zhang et.al. | 2506.08351 | null |
2025-06-09 | Seeing Voices: Generating A-Roll Video from Audio with Mirage | Aditi Sundararaman et.al. | 2506.08279 | null |
2025-06-09 | Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion | Xun Huang et.al. | 2506.08009 | null |
2025-06-09 | Dreamland: Controllable World Creation with Simulator and Generative Models | Sicheng Mo et.al. | 2506.08006 | null |
2025-06-09 | Audio-Sync Video Generation with Multi-Stream Temporal Control | Shuchen Weng et.al. | 2506.08003 | null |
2025-06-09 | Generative Modeling of Weights: Generalization or Memorization? | Boya Zeng et.al. | 2506.07998 | link |
2025-06-09 | Video Unlearning via Low-Rank Refusal Vector | Simone Facchiano et.al. | 2506.07891 | null |
2025-06-09 | PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement | Teng Hu et.al. | 2506.07848 | null |
2025-06-09 | Consistent Video Editing as Flow-Driven Image-to-Video Generation | Ge Wang et.al. | 2506.07713 | null |
2025-06-10 | From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models | Pablo Acuaviva et.al. | 2506.07280 | null |
2025-06-08 | TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation | Min-Jung Kim et.al. | 2506.07205 | null |
2025-06-08 | Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models | Sangwon Jang et.al. | 2506.07177 | null |
2025-06-06 | Restereo: Diffusion stereo video generation and restoration | Xingchang Huang et.al. | 2506.06023 | null |
2025-06-06 | LLIA – Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models | Haojie Yu et.al. | 2506.05806 | null |
2025-06-05 | EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh | Tao Hu et.al. | 2506.05554 | null |
2025-06-05 | ContentV: Efficient Training of Video Generation Models with Limited Compute | Wenfeng Lin et.al. | 2506.05343 | null |
2025-06-09 | Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | Haosong Liu et.al. | 2506.05096 | null |
2025-06-05 | FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation | Huihan Wang et.al. | 2506.04956 | null |
2025-06-05 | DualX-VSR: Dual Axial Spatial $\times$ Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation | Shuo Cao et.al. | 2506.04830 | null |
2025-06-06 | FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion | Akide Liu et.al. | 2506.04648 | null |
2025-06-05 | Follow-Your-Creation: Empowering 4D Creation through Video Inpainting | Yue Ma et.al. | 2506.04590 | null |
2025-06-04 | LayerFlow: A Unified Model for Layer-aware Video Generation | Sihui Ji et.al. | 2506.04228 | null |
2025-06-04 | UNIC: Unified In-Context Video Editing | Zixuan Ye et.al. | 2506.04216 | null |
2025-06-05 | FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers | Xuanhua He et.al. | 2506.04213 | null |
2025-06-04 | DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models | Ziyi Wu et.al. | 2506.03517 | null |
2025-06-03 | Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas | Austin Silveria et.al. | 2506.03275 | null |
2025-06-03 | IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation | Yuanze Lin et.al. | 2506.03150 | null |
2025-06-03 | Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval | Jiwen Yu et.al. | 2506.03141 | null |
2025-06-03 | CamCloneMaster: Enabling Reference-based Camera Control for Video Generation | Yawen Luo et.al. | 2506.03140 | null |
2025-06-03 | AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation | Lu Qiu et.al. | 2506.03126 | null |
2025-06-03 | DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation | Zhengyao Lv et.al. | 2506.03123 | null |
2025-06-03 | TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | Chetwin Low et.al. | 2506.03099 | null |
2025-06-03 | ORV: 4D Occupancy-centric Robot Video Generation | Xiuyu Yang et.al. | 2506.03079 | link |
2025-06-03 | Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers | Pengtao Chen et.al. | 2506.03065 | null |
2025-06-03 | LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering | Xiaoyi Feng et.al. | 2506.02733 | null |
2025-06-03 | LumosFlow: Motion-Guided Long Video Generation | Jiahao Chen et.al. | 2506.02497 | null |
2025-05-30 | MiniMax-Remover: Taming Bad Noise Helps Video Object Removal | Bojia Zi et.al. | 2505.24873 | null |
2025-05-30 | DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds | Jiaxu Zhang et.al. | 2505.24733 | null |
2025-05-30 | UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation | Yang-Tian Sun et.al. | 2505.24521 | null |
2025-05-30 | Interactive Video Generation via Domain Adaptation | Ishaan Rawal et.al. | 2505.24253 | null |
2025-05-30 | STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models | Zheng Tan et.al. | 2505.24210 | link |
2025-05-29 | MAGREF: Masked Guidance for Any-Reference Video Generation | Yufan Deng et.al. | 2505.23742 | link |
2025-05-29 | VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos | Tingyu Song et.al. | 2505.23693 | link |
2025-05-29 | VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | Xiangdong Zhang et.al. | 2505.23656 | link |
2025-05-29 | VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Shi-Xue Zhang et.al. | 2505.23484 | link |
2025-05-29 | Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis | Hengyuan Cao et.al. | 2505.23325 | null |
2025-05-29 | RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer | Liu Liu et.al. | 2505.23171 | null |
2025-05-29 | Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing | Tongtong Su et.al. | 2505.23134 | link |
2025-05-29 | MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation | Siyuan Wang et.al. | 2505.23120 | link |
2025-05-29 | GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion | Gwanghyun Kim et.al. | 2505.23085 | null |
2025-05-29 | MOVi: Training-free Text-conditioned Multi-Object Video Generation | Aimon Rahman et.al. | 2505.22980 | null |
2025-05-28 | Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation | Zhe Kong et.al. | 2505.22647 | link |
2025-05-28 | Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers | Weilun Feng et.al. | 2505.22167 | null |
2025-05-28 | FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing | Guanwen Feng et.al. | 2505.22141 | null |
2025-05-28 | LatentMove: Towards Complex Human Movement Video Generation | Ashkan Taghipour et.al. | 2505.22046 | null |
2025-05-28 | PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms | Yifei Xia et.al. | 2505.22016 | null |
2025-05-28 | Learning World Models for Interactive Video Generation | Taiye Chen et.al. | 2505.21996 | null |
2025-05-27 | HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation | Bowen Chen et.al. | 2505.21831 | null |
2025-05-27 | Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation | Ke Zhang et.al. | 2505.21653 | null |
2025-05-27 | VideoMarkBench: Benchmarking Robustness of Video Watermarking | Zhengyuan Jiang et.al. | 2505.21620 | link |
2025-05-27 | Frame In-N-Out: Unbounded Controllable Image-to-Video Generation | Boyang Wang et.al. | 2505.21491 | null |
2025-05-27 | Dynamic Vision from EEG Brain Recordings: How much does EEG know? | Prajwal Singh et.al. | 2505.21385 | null |
2025-05-28 | SageAttention2++: A More Efficient Implementation of SageAttention2 | Jintao Zhang et.al. | 2505.21136 | link |
2025-05-27 | Minute-Long Videos with Dual Parallelisms | Zeqing Wang et.al. | 2505.21070 | link |
2025-05-27 | RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy | Aiyue Chen et.al. | 2505.21036 | null |
2025-05-27 | Frame-Level Captions for Long Video Generation with Complex Multi Scenes | Guangcong Zheng et.al. | 2505.20827 | null |
2025-05-27 | Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt | Xiang Zhu et.al. | 2505.20795 | null |
2025-05-27 | Photography Perspective Composition: Towards Aesthetic Perspective Recommendation | Lujian Yao et.al. | 2505.20655 | null |
2025-05-27 | Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training | Bolin Lai et.al. | 2505.20629 | null |
2025-05-28 | OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation | Shenghai Yuan et.al. | 2505.20292 | link |
2025-05-27 | Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM | Peng Liu et.al. | 2505.19901 | null |
2025-05-26 | DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving | Wenchao Sun et.al. | 2505.19692 | link |
2025-05-26 | TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs | Juntong Wang et.al. | 2505.19535 | null |
2025-05-26 | The Role of Video Generation in Enhancing Data-Limited Action Understanding | Wei Li et.al. | 2505.19495 | null |
2025-05-26 | Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals | Nate Gillman et.al. | 2505.19386 | null |
2025-05-25 | From Single Images to Motion Policies via Video-Generation Environment Representations | Weiming Zhi et.al. | 2505.19306 | null |
2025-05-25 | SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation | Shenggan Cheng et.al. | 2505.19151 | null |
2025-05-25 | WorldEval: World Model as Real-World Robot Policies Evaluator | Yaxuan Li et.al. | 2505.19017 | null |
2025-05-24 | Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation | Shuo Yang et.al. | 2505.18875 | null |
2025-05-24 | VORTA: Efficient Video Diffusion via Routing Sparse Attention | Wenhao Sun et.al. | 2505.18809 | link |
2025-05-23 | WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions | Zizhang Li et.al. | 2505.18151 | null |
2025-05-23 | DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation | Junhao Chen et.al. | 2505.18078 | null |
2025-05-23 | SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain | Jiawei Zhou et.al. | 2505.17727 | null |
2025-05-23 | Scaling Image and Video Generation via Test-Time Evolutionary Search | Haoran He et.al. | 2505.17618 | null |
2025-05-23 | InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO | Xueji Fang et.al. | 2505.17574 | link |
2025-05-22 | Training-Free Efficient Video Generation via Dynamic Token Carving | Yuechen Zhang et.al. | 2505.16864 | link |
2025-05-22 | Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts | Taewon Kang et.al. | 2505.16819 | null |
2025-05-22 | MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM | Siwei Meng et.al. | 2505.16456 | null |
2025-05-23 | Challenger: Affordable Adversarial Driving Video Generation | Zhiyuan Xu et.al. | 2505.15880 | null |
2025-05-21 | Generative AI for Autonomous Driving: A Review | Katharina Winter et.al. | 2505.15863 | null |
2025-05-25 | Interspatial Attention for Efficient 4D Human Video Generation | Ruizhi Shao et.al. | 2505.15800 | null |
2025-05-21 | AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection | Zhipei Xu et.al. | 2505.15173 | null |
2025-05-21 | CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation | Xinran Wang et.al. | 2505.15145 | link |
2025-05-20 | Programmatic Video Prediction Using Large Language Models | Hao Tang et.al. | 2505.14948 | link |
2025-05-20 | Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | Sucheng Ren et.al. | 2505.14687 | link |
2025-05-20 | LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer | Changgu Chen et.al. | 2505.14167 | null |
2025-05-20 | Hunyuan-Game: Industrial-grade Intelligent Game Creation Model | Ruihuang Li et.al. | 2505.14135 | null |
2025-05-19 | FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance | Dian Shao et.al. | 2505.13437 | null |
2025-05-19 | MAGI-1: Autoregressive Video Generation at Scale | Sand.ai et.al. | 2505.13211 | link |
2025-05-19 | DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories | Joel Jang et.al. | 2505.12705 | link |
2025-05-19 | Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking | Zihan Su et.al. | 2505.12667 | null |
2025-05-19 | BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | Haiquan Wen et.al. | 2505.12620 | link |
2025-05-18 | Video-GPT via Next Clip Diffusion | Shaobin Zhuang et.al. | 2505.12489 | null |
2025-05-17 | LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | Jiarui Wang et.al. | 2505.12098 | link |
2025-05-17 | VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption | Tianxiong Zhong et.al. | 2505.12053 | null |
2025-05-16 | QVGen: Pushing the Limit of Quantized Video Generative Models | Yushi Huang et.al. | 2505.11497 | null |
2025-05-16 | Face Consistency Benchmark for GenAI Video | Michal Podstawski et.al. | 2505.11425 | null |
2025-05-14 | Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios | Huafeng Shi et.al. | 2505.10584 | null |
2025-05-16 | MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | Yanbo Ding et.al. | 2505.10238 | link |
2025-05-15 | ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars | Rui-Yang Ju et.al. | 2505.10072 | null |
2025-05-18 | EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models | Hu Yue et.al. | 2505.09694 | link |
2025-05-15 | Generating time-consistent dynamics with discriminator-guided image diffusion models | Philipp Hess et.al. | 2505.09089 | null |
2025-05-13 | Generative AI for Autonomous Driving: Frontiers and Opportunities | Yuping Wang et.al. | 2505.08854 | link |
2025-05-13 | Symbolically-Guided Visual Plan Inference from Uncurated Video Data | Wenyan Yang et.al. | 2505.08444 | null |
2025-05-12 | DanceGRPO: Unleashing GRPO on Visual Generation | Zeyue Xue et.al. | 2505.07818 | null |
2025-05-12 | ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models | Ozgur Kara et.al. | 2505.07652 | null |
2025-05-16 | Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model | Wei Li et.al. | 2505.07449 | link |
2025-05-15 | Generative Pre-trained Autoregressive Diffusion Transformer | Yuan Zhang et.al. | 2505.07344 | null |
2025-05-11 | DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | Junhao Xia et.al. | 2505.07057 | null |
2025-05-11 | BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation | Panwen Hu et.al. | 2505.06985 | null |
2025-05-10 | Jailbreaking the Text-to-Video Generative Models | Jiayang Liu et.al. | 2505.06679 | null |
2025-05-10 | ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images | Xianghao Kong et.al. | 2505.06537 | null |
2025-05-08 | T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models | Xuyang Guo et.al. | 2505.04946 | null |
2025-05-08 | HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | Teng Hu et.al. | 2505.04512 | null |
2025-05-06 | Real-Time Person Image Synthesis Using a Flow Matching Model | Jiwoo Jeong et.al. | 2505.03562 | link |
2025-05-06 | Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights | Zhaiming Shen et.al. | 2505.03205 | null |
2025-05-04 | DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization | Wenchuan Wang et.al. | 2505.02192 | null |
2025-05-03 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | Bu Jin et.al. | 2505.01729 | null |
2025-05-02 | VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos | Zongxia Li et.al. | 2505.01481 | link |
2025-05-02 | FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis | Jiangtong Tan et.al. | 2505.01172 | link |
2025-05-01 | Controllable Weather Synthesis and Removal with Video Diffusion Models | Chih-Hao Lin et.al. | 2505.00704 | null |
2025-05-01 | T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation | Xuyang Guo et.al. | 2505.00337 | null |
2025-04-30 | Direct Motion Models for Assessing Generated Videos | Kelsey Allen et.al. | 2505.00209 | null |
2025-04-30 | Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis | Michal Geyer et.al. | 2505.00135 | null |
2025-04-30 | ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction | Qihao Liu et.al. | 2504.21855 | null |
2025-04-30 | HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation | Haiyang Zhou et.al. | 2504.21650 | link |
2025-04-30 | Simple Visual Artifact Detection in Sora-Generated Videos | Misora Sugiyama et.al. | 2504.21334 | null |
2025-04-30 | Capturing Conditional Dependence via Auto-regressive Diffusion Models | Xunpeng Huang et.al. | 2504.21314 | null |
2025-04-29 | TesserAct: Learning 4D Embodied World Models | Haoyu Zhen et.al. | 2504.20995 | null |
2025-04-29 | DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs | Hao Luan et.al. | 2504.20754 | null |
2025-04-29 | Advance Fake Video Detection via Vision Transformers | Joy Battocchio et.al. | 2504.20669 | null |
2025-04-28 | DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer | Junpeng Jiang et.al. | 2504.19614 | null |
2025-04-26 | Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning | Yifan Xie et.al. | 2504.18810 | null |
2025-04-26 | Stealing Creator’s Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation | Jong Inn Park et.al. | 2504.18805 | null |
2025-04-25 | NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration | Haotian Dong et.al. | 2504.18448 | null |
2025-04-23 | Subject-driven Video Generation via Disentangled Identity and Motion | Daneul Kim et.al. | 2504.17816 | null |
2025-04-24 | Dynamic Camera Poses and Where to Find Them | Chris Rockwell et.al. | 2504.17788 | null |
2025-04-24 | MV-Crafter: An Intelligent System for Music-guided Video Generation | Chuer Chen et.al. | 2504.17267 | null |
2025-04-24 | DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks | Yinqi Li et.al. | 2504.17253 | link |
2025-04-25 | We’ll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback | Minkyu Choi et.al. | 2504.17180 | null |
2025-04-23 | BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation | Ruotong Wang et.al. | 2504.16907 | null |
2025-04-23 | ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance | Ying Li et.al. | 2504.16464 | null |
2025-04-23 | VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models | Xuming Hu et.al. | 2504.16359 | null |
2025-04-22 | Survey of Video Diffusion Models: Foundations, Implementations, and Applications | Yimu Wang et.al. | 2504.16081 | link |
2025-04-22 | Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework | Xinyuan Song et.al. | 2504.16016 | null |
2025-04-22 | Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning | Wang Lin et.al. | 2504.15932 | null |
2025-04-22 | Satellite to GroundScape – Large-scale Consistent Ground View Generation from Satellite Views | Ningli Xu et.al. | 2504.15786 | null |
2025-04-22 | DiTPainter: Efficient Video Inpainting with Diffusion Transformers | Xian Wu et.al. | 2504.15661 | null |
2025-04-21 | Solving New Tasks by Adapting Internet Video Knowledge | Calvin Luo et.al. | 2504.15369 | null |
2025-04-21 | Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform | Xianpan Zhou et.al. | 2504.15182 | null |
2025-04-21 | DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation | Weijie He et.al. | 2504.15032 | null |
2025-04-21 | Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation | Chenjie Cao et.al. | 2504.14899 | link |
2025-04-20 | Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis | Jingjing Ren et.al. | 2504.14470 | null |
2025-04-19 | SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation | Minho Park et.al. | 2504.14396 | link |
2025-04-21 | SkyReels-V2: Infinite-length Film Generative Model | Guibin Chen et.al. | 2504.13074 | link |
2025-04-21 | Packing Input Frame Context in Next-Frame Prediction Models for Video Generation | Lvmin Zhang et.al. | 2504.12626 | link |
2025-04-16 | VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate | Zhihang Yuan et.al. | 2504.12259 | link |
2025-04-16 | Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM | Zirui Pan et.al. | 2504.12048 | null |
2025-04-16 | The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation | Bingjie Gao et.al. | 2504.11739 | null |
2025-04-17 | VideoPanda: Video Panoramic Diffusion with Multi-view Attention | Kevin Xie et.al. | 2504.11389 | null |
2025-04-15 | InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation | Yukang Lin et.al. | 2504.10905 | null |
2025-04-15 | OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding | Dianbing Xi et.al. | 2504.10825 | null |
2025-04-14 | H-MoRe: Learning Human-centric Motion Representation for Action Analysis | Zhanbo Huang et.al. | 2504.10676 | link |
2025-04-14 | H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models | Yushu Wu et.al. | 2504.10567 | null |
2025-04-14 | FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos | Rui Chen et.al. | 2504.10358 | null |
2025-04-14 | Aligning Anime Video Generation with Human Feedback | Bingwen Zhu et.al. | 2504.10044 | null |
2025-04-14 | EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise | Chao Liu et.al. | 2504.09789 | null |
2025-04-13 | CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models | Pooja Guhan et.al. | 2504.09472 | null |
2025-04-11 | Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model | Team Seaweed et.al. | 2504.08685 | null |
2025-04-11 | Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Jialu Li et.al. | 2504.08641 | null |
2025-04-11 | Diffusion Models for Robotic Manipulation: A Survey | Rosa Wolf et.al. | 2504.08438 | null |
2025-04-11 | EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model | Renda Li et.al. | 2504.08344 | null |
2025-04-11 | RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements | Guangcong Zheng et.al. | 2504.08212 | link |
2025-04-11 | TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation | Ruineng Li et.al. | 2504.08181 | null |
2025-04-10 | Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction | Zeren Jiang et.al. | 2504.07961 | link |
2025-04-10 | Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos | Rundong Luo et.al. | 2504.07940 | null |
2025-04-10 | Diffusion Transformers for Tabular Data Time Series Generation | Fabrizio Garuti et.al. | 2504.07566 | link |
2025-04-09 | EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation | Diljeet Jagpal et.al. | 2504.06861 | null |
2025-04-09 | DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation | Wangbo Zhao et.al. | 2504.06803 | link |
2025-04-09 | RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism | Elia Peruzzo et.al. | 2504.06672 | null |
2025-04-09 | Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception | Ruotian Peng et.al. | 2504.06666 | null |
2025-04-08 | CamContextI2V: Context-aware Controllable Video Generation | Luis Denninger et.al. | 2504.06022 | link |
2025-04-07 | One-Minute Video Generation with Test-Time Training | Karan Dalal et.al. | 2504.05298 | null |
2025-04-07 | Video-Bench: Human-Aligned Video Generation Benchmark | Hui Han et.al. | 2504.04907 | null |
2025-04-05 | Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Yikai Wang et.al. | 2504.04153 | link |
2025-04-05 | Multi-identity Human Image Animation with Structural Video Diffusion | Zhenzhi Wang et.al. | 2504.04126 | null |
2025-04-05 | Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models | Xuyang Guo et.al. | 2504.04051 | null |
2025-04-05 | DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion | Maksim Siniukov et.al. | 2504.04010 | null |
2025-04-04 | Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models | Xuran Ma et.al. | 2504.03140 | link |
2025-04-03 | How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models | Pascal Chang et.al. | 2504.03072 | null |
2025-04-03 | Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments | Chenyu Zhang et.al. | 2504.02918 | null |
2025-04-03 | Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets | Chuning Zhu et.al. | 2504.02792 | null |
2025-04-03 | Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model | Shengjun Zhang et.al. | 2504.02764 | null |
2025-04-04 | Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Fa-Ting Hong et.al. | 2504.02542 | link |
2025-04-03 | ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer | Jiayi Gao et.al. | 2504.02451 | link |
2025-04-03 | SkyReels-A2: Compose Anything in Video Diffusion Transformers | Zhengcong Fei et.al. | 2504.02436 | link |
2025-04-04 | MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition | Takahiro Shirakawa et.al. | 2504.02361 | null |
2025-04-03 | OmniCam: Unified Multimodal Video Generation via Camera Control | Xiaoda Yang et.al. | 2504.02312 | null |
2025-04-02 | WorldPrompter: Traversable Text-to-Scene Generation | Zhaoyang Zhang et.al. | 2504.02045 | null |
2025-04-03 | VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step | Hanyang Wang et.al. | 2504.01956 | null |
2025-04-01 | WorldScore: A Unified Evaluation Benchmark for World Generation | Haoyi Duan et.al. | 2504.00983 | null |
2025-04-01 | DecoFuse: Decomposing and Fusing the “What”, “Where”, and “How” for Brain-Inspired fMRI-to-Video Decoding | Chong Li et.al. | 2504.00432 | null |
2025-03-31 | GazeLLM: Multimodal LLMs incorporating Human Visual Attention | Jun Rekimoto et.al. | 2504.00221 | null |
2025-03-31 | Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation | Shengqiong Wu et.al. | 2503.24379 | null |
2025-04-01 | HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation | Boyuan Wang et.al. | 2503.24026 | null |
2025-03-31 | JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation | Fangda Chen et.al. | 2503.23951 | null |
2025-04-01 | On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Bosung Kim et.al. | 2503.23796 | link |
2025-03-31 | HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Kun Liu et.al. | 2503.23715 | null |
2025-03-30 | VideoGen-Eval: Agent-based System for Video Generation Evaluation | Yuhang Yang et.al. | 2503.23452 | link |
2025-03-30 | JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization | Kai Liu et.al. | 2503.23377 | null |
2025-04-02 | Towards Physically Plausible Video Generation via VLM Planning | Xindi Yang et.al. | 2503.23368 | null |
2025-03-30 | MoCha: Towards Movie-Grade Talking Character Synthesis | Cong Wei et.al. | 2503.23307 | null |
2025-03-30 | SketchVideo: Sketch-based Video Generation and Editing | Feng-Lin Liu et.al. | 2503.23284 | null |
2025-03-28 | Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model | Jangho Park et.al. | 2503.22622 | null |
2025-03-28 | EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation | Hadrien Reynaud et.al. | 2503.22357 | null |
2025-03-28 | CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving | Yishen Ji et.al. | 2503.22231 | null |
2025-03-27 | VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models | Chi-Pin Huang et.al. | 2503.21781 | null |
2025-03-27 | Exploring the Evolution of Physics Cognition in Video Generation: A Survey | Minghui Lin et.al. | 2503.21765 | link |
2025-03-27 | VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness | Dian Zheng et.al. | 2503.21755 | link |
2025-03-27 | Audio-driven Gesture Generation via Deviation Feature in the Latent Space | Jiahui Chen et.al. | 2503.21616 | null |
2025-03-27 | ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model | Jinwei Qi et.al. | 2503.21144 | null |
2025-03-26 | RecTable: Fast Modeling Tabular Data with Rectified Flow | Masane Fuchi et.al. | 2503.20731 | link |
2025-03-26 | AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports | Xiangwen Zhang et.al. | 2503.20654 | null |
2025-03-26 | GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | Lloyd Russell et.al. | 2503.20523 | null |
2025-03-26 | VPO: Aligning Text-to-Video Generation Models with Prompt Optimization | Jiale Cheng et.al. | 2503.20491 | link |
2025-03-26 | Wan: Open and Advanced Large-Scale Video Generative Models | WanTeam et.al. | 2503.20314 | link |
2025-03-26 | Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models | Prin Phunyaphibarn et.al. | 2503.20240 | null |
2025-03-26 | Video Motion Graphs | Haiyang Liu et.al. | 2503.20218 | null |
2025-03-25 | Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Yuke Lou et.al. | 2503.20118 | null |
2025-03-25 | Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals | Stefan Stojanov et.al. | 2503.19953 | null |
2025-03-25 | FullDiT: Multi-Task Video Generative Foundation Model with Full Attention | Xuan Ju et.al. | 2503.19907 | null |
2025-03-25 | Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation | Tianhao Qi et.al. | 2503.19881 | null |
2025-03-25 | AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers | Jiazhi Guan et.al. | 2503.19824 | null |
2025-03-25 | AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset | Haiyu Zhang et.al. | 2503.19462 | null |
2025-03-26 | Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing | Jaihoon Kim et.al. | 2503.19385 | null |
2025-03-25 | MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation | Yukang Lin et.al. | 2503.19383 | null |
2025-03-26 | EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models | Yufei Cai et.al. | 2503.19369 | link |
2025-03-25 | Long-Context Autoregressive Video Modeling with Next-Frame Prediction | Yuchao Gu et.al. | 2503.19325 | link |
2025-03-25 | Aether: Geometric-Aware Unified World Modeling | Aether Team et.al. | 2503.18945 | null |
2025-03-24 | Video-T1: Test-Time Scaling for Video Generation | Fangfu Liu et.al. | 2503.18942 | null |
2025-03-24 | Training-free Diffusion Acceleration with Bottleneck Sampling | Ye Tian et.al. | 2503.18940 | null |
2025-03-25 | AMD-Hummingbird: Towards an Efficient Text-to-Video Model | Takashi Isobe et.al. | 2503.18559 | link |
2025-03-24 | EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | Qiang Qu et.al. | 2503.18552 | null |
2025-03-24 | Can Text-to-Video Generation help Video-Language Alignment? | Luca Zanella et.al. | 2503.18507 | null |
2025-03-24 | Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation | Dingcheng Zhen et.al. | 2503.18429 | null |
2025-03-24 | Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance | Sicong Feng et.al. | 2503.18386 | null |
2025-03-23 | LongDiff: Training-Free Long Video Generation in One Go | Zhuoling Li et.al. | 2503.18150 | null |
2025-03-23 | TransAnimate: Taming Layer Diffusion to Generate RGBA Video | Xuewei Chen et.al. | 2503.17934 | null |
2025-03-21 | Position: Interactive Generative Video as Next-Generation Game Engine | Jiwen Yu et.al. | 2503.17359 | null |
2025-03-21 | AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process | Junjie Hu et.al. | 2503.17029 | null |
2025-03-21 | Enabling Versatile Controls for Video Diffusion Models | Xu Zhang et.al. | 2503.16983 | link |
2025-03-21 | Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model | Yingying Fan et.al. | 2503.16942 | null |
2025-03-20 | XAttention: Block Sparse Attention with Antidiagonal Scoring | Ruyi Xu et.al. | 2503.16428 | link |
2025-03-20 | MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance | Quanhao Li et.al. | 2503.16421 | null |
2025-03-20 | ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos | Haolin Yang et.al. | 2503.16400 | null |
2025-03-20 | PoseTraj: Pose-Aware Trajectory Control in Video Diffusion | Longbin Ji et.al. | 2503.16068 | null |
2025-03-20 | Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models | Marc Benedí San Millán et.al. | 2503.15996 | null |
2025-03-20 | MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving | Haiguang Wang et.al. | 2503.15875 | link |
2025-03-20 | VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling | Hyojun Go et.al. | 2503.15855 | null |
2025-03-19 | Temporal Regularization Makes Your Video Generator Stronger | Harold Haodong Chen et.al. | 2503.15417 | null |
2025-03-20 | VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention | Mingzhe Zheng et.al. | 2503.15138 | null |
2025-03-18 | MusicInfuser: Making Video Diffusion Listen and Dance | Susung Hong et.al. | 2503.14505 | null |
2025-03-18 | MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation | Hongyu Zhang et.al. | 2503.14428 | null |
2025-03-18 | Impossible Videos | Zechen Bai et.al. | 2503.14378 | null |
2025-03-18 | LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models | Yu Cheng et.al. | 2503.14325 | link |
2025-03-18 | Concat-ID: Towards Universal Identity-Preserving Video Synthesis | Yong Zhong et.al. | 2503.14151 | null |
2025-03-18 | Fast Autoregressive Video Generation with Diagonal Decoding | Yang Ye et.al. | 2503.14070 | null |
2025-03-18 | AIGVE-Tool: AI-Generated Video Evaluation Toolkit with Multifaceted Benchmark | Xinhao Xiang et.al. | 2503.14064 | link |
2025-03-17 | Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction | Zheyuan Liu et.al. | 2503.12953 | null |
2025-03-17 | AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations | Quang Trung Truong et.al. | 2503.12828 | null |
2025-03-16 | SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs | Guibiao Liao et.al. | 2503.12535 | null |
2025-03-15 | A Speech-to-Video Synthesis Approach Using Spatio-Temporal Diffusion for Vocal Tract MRI | Paula Andrea Pérez-Toro et.al. | 2503.12102 | null |
2025-03-15 | SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering | Byeongjun Park et.al. | 2503.12024 | link |
2025-03-14 | ReCamMaster: Camera-Controlled Generative Rendering from A Single Video | Jianhong Bai et.al. | 2503.11647 | null |
2025-03-14 | HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models | Ziqin Zhou et.al. | 2503.11513 | null |
2025-03-14 | TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation | Hongxiang Zhao et.al. | 2503.11423 | null |
2025-03-14 | Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model | Haoyang Huang et.al. | 2503.11251 | link |
2025-03-14 | Cross-Modal Learning for Music-to-Music-Video Description Generation | Zhuoyuan Mao et.al. | 2503.11190 | null |
2025-03-13 | CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models | Hao He et.al. | 2503.10592 | null |
2025-03-13 | Long Context Tuning for Video Generation | Yuwei Guo et.al. | 2503.10589 | null |
2025-03-13 | CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | Yufan Deng et.al. | 2503.10391 | null |
2025-03-13 | Semantic Latent Motion for Portrait Video Generation | Qiyuan Zhang et.al. | 2503.10096 | null |
2025-03-16 | VMBench: A Benchmark for Perception-Aligned Video Motion Generation | Xinran Ling et.al. | 2503.10076 | link |
2025-03-13 | UVE: Are MLLMs Unified Evaluators for AI-Generated Videos? | Yuanxin Liu et.al. | 2503.09949 | link |
2025-03-13 | VideoMerge: Towards Training-free Long Video Generation | Siyang Zhang et.al. | 2503.09926 | null |
2025-03-12 | LuciBot: Automated Robot Policy Learning from Generated Videos | Xiaowen Qiu et.al. | 2503.09871 | null |
2025-03-14 | On the Limitations of Vision-Language Models in Understanding Image Transforms | Ahmad Mustafa Anis et.al. | 2503.09837 | null |
2025-03-12 | I2V3D: Controllable image-to-video generation with 3D guidance | Zhiyuan Zhang et.al. | 2503.09733 | null |
2025-03-12 | PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop | Chenyu Li et.al. | 2503.09595 | link |
2025-03-12 | Unified Dense Prediction of Video Diffusion | Lehan Yang et.al. | 2503.09344 | null |
2025-03-12 | Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space | Jian Zhu et.al. | 2503.09215 | null |
2025-03-13 | WonderVerse: Extendable 3D Scene Generation with Video Generative Models | Hao Feng et.al. | 2503.09160 | null |
2025-03-12 | Reangle-A-Video: 4D Video Generation as Video-to-Video Translation | Hyeonho Jeong et.al. | 2503.09151 | null |
2025-03-11 | REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder | Yitian Zhang et.al. | 2503.08665 | null |
2025-03-11 | Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling | Subin Kim et.al. | 2503.08605 | null |
2025-03-12 | $^R$FLAV: Rolling Flow matching for infinite Audio Video generation | Alex Ergasti et.al. | 2503.08307 | link |
2025-03-11 | WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation | Jing Wang et.al. | 2503.08153 | null |
2025-03-11 | ObjectMover: Generative Object Movement with Video Prior | Xin Yu et.al. | 2503.08037 | null |
2025-03-11 | How Can Video Generative AI Transform K-12 Education? Examining Teachers’ Perspectives through TPACK and TAM | Unggi Lee et.al. | 2503.08003 | null |
2025-03-10 | DreamRelation: Relation-Centric Video Customization | Yujie Wei et.al. | 2503.07602 | null |
2025-03-11 | VACE: All-in-One Video Creation and Editing | Zeyinzi Jiang et.al. | 2503.07598 | null |
2025-03-10 | AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion | Mingzhen Sun et.al. | 2503.07418 | null |
2025-03-10 | Automated Movie Generation via Multi-Agent CoT Planning | Weijia Wu et.al. | 2503.07314 | link |
2025-03-09 | VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation | Hritik Bansal et.al. | 2503.06800 | null |
2025-03-09 | TR-DQ: Time-Rotation Diffusion Quantization | Yihua Shao et.al. | 2503.06564 | null |
2025-03-09 | QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation | Junyi Wu et.al. | 2503.06545 | link |
2025-03-11 | LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation | Quanjian Song et.al. | 2503.06508 | link |
2025-03-09 | Generative Video Bi-flow | Chen Liu et.al. | 2503.06364 | null |
2025-03-08 | Text2Story: Advancing Video Storytelling with Text Guidance | Taewon Kang et.al. | 2503.06310 | null |
2025-03-08 | Object-Centric World Model for Language-Guided Manipulation | Youngjoon Jeong et.al. | 2503.06170 | null |
2025-03-08 | VACT: A Video Automatic Causal Testing System and a Benchmark | Haotong Yang et.al. | 2503.06163 | null |
2025-03-07 | MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio | Xuenan Xu et.al. | 2503.05242 | link |
2025-03-07 | Unified Reward Model for Multimodal Understanding and Generation | Yibin Wang et.al. | 2503.05236 | null |
2025-03-06 | Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation | Alexey Buzovkin et.al. | 2503.04871 | link |
2025-03-06 | FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video | Yue Gao et.al. | 2503.04720 | null |
2025-03-06 | What Are You Doing? A Closer Look at Controllable Human Video Generation | Emanuele Bugliarello et.al. | 2503.04666 | null |
2025-03-08 | The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation | Aoxiong Yin et.al. | 2503.04606 | link |
2025-03-05 | GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control | Xuanchi Ren et.al. | 2503.03751 | link |
2025-03-08 | Rethinking Video Tokenization: A Conditioned Diffusion-based Approach | Nianzu Yang et.al. | 2503.03708 | link |
2025-03-05 | DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Zhao Yang et.al. | 2503.03689 | link |
2025-03-05 | High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights | Yuna Kato et.al. | 2503.03558 | link |
2025-03-05 | Video Super-Resolution: All You Need is a Video Diffusion Model | Zhihao Zhan et.al. | 2503.03355 | null |
2025-03-04 | GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning | Zhun Mou et.al. | 2503.02341 | null |
2025-03-03 | VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation | Wenhao Wang et.al. | 2503.01739 | link |
2025-03-03 | VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors | Juil Koo et.al. | 2503.01107 | null |
2025-03-02 | Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think | Jie Tian et.al. | 2503.00948 | link |
2025-03-01 | Learning to Animate Images from A Few Videos to Portray Delicate Human Actions | Haoxin Li et.al. | 2503.00276 | null |
2025-03-04 | Unified Video Action Model | Shuang Li et.al. | 2503.00200 | null |
2025-02-28 | Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos | Zhiyu Tan et.al. | 2502.21314 | null |
2025-02-28 | Training-free and Adaptive Sparse Attention for Efficient Long Video Generation | Yifei Xia et.al. | 2502.21079 | null |
2025-02-28 | HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | Xiao Wang et.al. | 2502.20811 | null |
2025-02-28 | WorldModelBench: Judging Video Generation Models As World Models | Dacheng Li et.al. | 2502.20694 | null |
2025-02-27 | Mobius: Text to Seamless Looping Video Generation via Latent Shift | Xiuli Bi et.al. | 2502.20307 | link |
2025-02-27 | FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute | Sotiris Anagnostidis et.al. | 2502.20126 | null |
2025-02-27 | C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation | Yuhao Li et.al. | 2502.19868 | link |
2025-02-26 | FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model | Lingzhou Mu et.al. | 2502.19455 | null |
2025-03-03 | TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis | Menghao Li et.al. | 2502.19454 | null |
2025-02-25 | SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference | Jintao Zhang et.al. | 2502.18137 | link |
2025-02-25 | A Survey: Spatiotemporal Consistency in Video Generation | Zhiyu Yin et.al. | 2502.17863 | null |
2025-02-24 | X-Dancer: Expressive Music to Human Dance Video Generation | Zeyuan Chen et.al. | 2502.17414 | null |
2025-02-24 | VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing | Xiangpeng Yang et.al. | 2502.17258 | null |
2025-02-24 | Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions | Zhong Li et.al. | 2502.17119 | link |
2025-02-21 | RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers | Min Zhao et.al. | 2502.15894 | null |
2025-02-21 | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | Florent Bartoccioni et.al. | 2502.15672 | link |
2025-02-20 | Hardware-Friendly Static Quantization Method for Video Diffusion Transformers | Sanghyun Yi et.al. | 2502.15077 | null |
2025-02-20 | LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection | Qingyuan Liu et.al. | 2502.14994 | null |
2025-02-20 | Improving the Diffusability of Autoencoders | Ivan Skorokhodov et.al. | 2502.14831 | null |
2025-02-21 | RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers | Ke Cao et.al. | 2502.14377 | null |
2025-02-20 | Designing Parameter and Compute Efficient Diffusion Transformers using Distillation | Vignesh Sundaresha et.al. | 2502.14226 | null |
2025-02-19 | FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation | Yunpeng Zhang et.al. | 2502.13995 | link |
2025-02-19 | LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | Junchen Fu et.al. | 2502.12945 | null |
2025-02-18 | VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation | Xinlong Chen et.al. | 2502.12782 | link |
2025-02-18 | MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation | Sihyun Yu et.al. | 2502.12632 | null |
2025-02-17 | LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities | Florian Sestak et.al. | 2502.12128 | link |
2025-02-17 | DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation | Zhihang Yuan et.al. | 2502.11897 | link |
2025-02-17 | Object-Centric Image to Video Generation with Language Guidance | Angel Villar-Corrales et.al. | 2502.11655 | null |
2025-02-16 | MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation | Michael Fuest et.al. | 2502.11234 | null |
2025-02-16 | Phantom: Subject-consistent video generation via cross-modal alignment | Lijie Liu et.al. | 2502.11079 | null |
2025-02-17 | Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Guoqing Ma et.al. | 2502.10248 | link |
2025-02-14 | RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control | Teng Li et.al. | 2502.10059 | null |
2025-02-14 | GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation | Hongyin Zhang et.al. | 2502.09268 | null |
2025-02-12 | CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation | Qinghe Wang et.al. | 2502.08639 | null |
2025-02-12 | FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis | Wonjoon Jin et.al. | 2502.08244 | null |
2025-02-12 | Learning Human Skill Generators at Key-Step Levels | Yilu Wu et.al. | 2502.08234 | null |
2025-02-12 | AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance | Zhao Wang et.al. | 2502.08189 | null |
2025-02-12 | Next Block Prediction: Video Generation via Semi-Autoregressive Modeling | Shuhuai Ren et.al. | 2502.07737 | null |
2025-02-14 | Magic 1-For-1: Generating One Minute Video Clips within One Minute | Hongwei Yi et.al. | 2502.07701 | link |
2025-02-12 | VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation | Sixiao Zheng et.al. | 2502.07531 | null |
2025-02-13 | Enhance-A-Video: Better Generated Video for Free | Yang Luo et.al. | 2502.07508 | link |
2025-02-11 | Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos | Haowen Gao et.al. | 2502.07327 | null |
2025-02-11 | Articulate That Object Part (ATOP): 3D Part Articulation from Text and Motion Personalization | Aditya Vora et.al. | 2502.07278 | null |
2025-02-11 | Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation | Pinxin Liu et.al. | 2502.07239 | null |
2025-02-10 | Lotus: Creating Short Videos From Long Videos With Abstractive and Extractive Summarization | Aadit Barua et.al. | 2502.07096 | null |
2025-02-10 | Conditional diffusion model with spatial attention and latent embedding for medical image segmentation | Behzad Hejrati et.al. | 2502.06997 | link |
2025-02-10 | Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT | Dongyang Liu et.al. | 2502.06782 | null |
2025-02-10 | History-Guided Video Diffusion | Kiwhan Song et.al. | 2502.06764 | null |
2025-02-10 | Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists | Bojia Zi et.al. | 2502.06734 | null |
2025-02-10 | TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models | Yangguang Li et.al. | 2502.06608 | link |
2025-02-10 | CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers | D. She et.al. | 2502.06527 | null |
2025-02-10 | Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile | Hangliang Ding et.al. | 2502.06155 | null |
2025-02-08 | Towards AI-driven Sign Language Generation with Non-manual Markers | Han Zhang et.al. | 2502.05661 | null |
2025-02-08 | Training-Free Constrained Generation With Stable Diffusion Models | Stefano Zampini et.al. | 2502.05625 | null |
2025-02-08 | A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction | Yongfan Chen et.al. | 2502.05503 | link |
2025-02-07 | FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Shilong Zhang et.al. | 2502.05179 | link |
2025-02-07 | Goku: Flow Based Video Generative Foundation Models | Shoufa Chen et.al. | 2502.04896 | null |
2025-02-07 | HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation | Qijun Gan et.al. | 2502.04847 | null |
2025-02-06 | Fast Video Generation with Sliding Tile Attention | Peiyuan Zhang et.al. | 2502.04507 | null |
2025-02-06 | UniCP: A Unified Caching and Pruning Framework for Efficient Video Generation | Wenzhang Sun et.al. | 2502.04393 | null |
2025-02-06 | MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation | Jinbo Xing et.al. | 2502.04299 | null |
2025-02-06 | Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression | Lirui Wang et.al. | 2502.04296 | null |
2025-02-06 | Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency | Shangkun Sun et.al. | 2502.04076 | link |
2025-02-06 | UniForm: A Unified Diffusion Transformer for Audio-Video Generation | Lei Zhao et.al. | 2502.03897 | null |
2025-02-05 | Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach | Yunuo Chen et.al. | 2502.03639 | null |
2025-02-05 | FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise | Yunlong Yuan et.al. | 2502.03496 | null |
2025-02-05 | MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent | Xinyao Liao et.al. | 2502.03207 | null |
2025-02-04 | Controllable Video Generation with Provable Disentanglement | Yifan Shen et.al. | 2502.02690 | null |
2025-02-04 | VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models | Hila Chefer et.al. | 2502.02492 | null |
2025-02-05 | IPO: Iterative Preference Optimization for Text-to-Video Generation | Xiaomeng Yang et.al. | 2502.02088 | null |
2025-02-03 | VILP: Imitation Learning with Latent Video Planning | Zhengtong Xu et.al. | 2502.01784 | link |
2025-02-03 | Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity | Haocheng Xi et.al. | 2502.01776 | null |
2025-02-05 | MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | Haibo Tong et.al. | 2502.01719 | null |
2025-02-02 | HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment | Lifan Jiang et.al. | 2502.01690 | null |
2025-02-03 | Improved Training Technique for Latent Consistency Models | Quan Dao et.al. | 2502.01441 | link |
2025-02-03 | VidSketch: Hand-drawn Sketch-Driven Video Generation with Diffusion Control | Lifan Jiang et.al. | 2502.01101 | link |
2025-02-03 | OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models | Gaojie Lin et.al. | 2502.01061 | null |
2025-02-03 | Pushing the Boundaries of State Space Models for Image and Video Generation | Yicong Hong et.al. | 2502.00972 | null |
2025-01-31 | Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Yuta Oshima et.al. | 2501.19252 | null |
2025-01-30 | Every Image Listens, Every Image Dances: Music-Driven Image Animation | Zhikang Dong et.al. | 2501.18801 | null |
2025-01-28 | CascadeV: An Implementation of Wurstchen Architecture for Video Generation | Wenfeng Lin et.al. | 2501.16612 | link |
2025-01-26 | “See What I Imagine, Imagine What I See”: Human-AI Co-Creation System for 360$^\circ$ Panoramic Video Generation in VR | Yunge Wen et.al. | 2501.15456 | null |
2025-01-24 | VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking | Runyi Hu et.al. | 2501.14195 | link |
2025-01-23 | Improving Video Generation with Human Feedback | Jie Liu et.al. | 2501.13918 | null |
2025-01-23 | EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion | Jiangchuan Wei et.al. | 2501.13452 | null |
2025-01-21 | Taming Teacher Forcing for Masked Autoregressive Video Generation | Deyu Zhou et.al. | 2501.12389 | null |
2025-01-22 | Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Sili Chen et.al. | 2501.12375 | null |
2025-01-20 | GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video | Zhenliang Ni et.al. | 2501.11340 | null |
2025-01-20 | CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation | Zheng Chong et.al. | 2501.11325 | link |
2025-01-18 | EMO2: End-Effector Guided Audio-Driven Avatar Video Generation | Linrui Tian et.al. | 2501.10687 | null |
2025-01-17 | DiffuEraser: A Diffusion Model for Video Inpainting | Xiaowen Li et.al. | 2501.10018 | link |
2025-01-17 | RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation | Yuefan Cao et.al. | 2501.09982 | null |
2025-01-16 | VideoWorld: Exploring Knowledge Learning from Unlabeled Videos | Zhongwei Ren et.al. | 2501.09781 | null |
2025-01-16 | Learnings from Scaling Visual Tokenizers for Reconstruction and Generation | Philippe Hansen-Estruch et.al. | 2501.09755 | null |
2025-01-14 | Do generative video models learn physical principles from watching videos? | Saman Motamed et.al. | 2501.09038 | link |
2025-01-15 | Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion | Jingyuan Chen et.al. | 2501.09019 | null |
2025-01-15 | RepVideo: Rethinking Cross-Layer Representation for Video Generation | Chenyang Si et.al. | 2501.08994 | null |
2025-01-15 | Comprehensive Subjective and Objective Evaluation Method for Text-generated Video | Zelu Qi et.al. | 2501.08545 | null |
2025-01-14 | Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models | Weichen Fan et.al. | 2501.08453 | null |
2025-01-14 | 3D Gaussian Splatting with Normal Information for Mesh Extraction and Improved Rendering | Meenakshi Krishnan et.al. | 2501.08370 | null |
2025-01-14 | GameFactory: Creating New Games with Generative Interactive Videos | Jiwen Yu et.al. | 2501.08325 | null |
2025-01-14 | Diffusion Adversarial Post-Training for One-Step Video Generation | Shanchuan Lin et.al. | 2501.08316 | null |
2025-01-14 | LayerAnimate: Layer-specific Control for Animation | Yuxue Yang et.al. | 2501.08295 | null |
2025-01-14 | FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors | Yabo Zhang et.al. | 2501.08225 | link |
2025-01-13 | BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations | Weixi Feng et.al. | 2501.07647 | null |
2025-01-13 | Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss | Xinyu Zhang et.al. | 2501.07563 | null |
2025-01-11 | Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning | Maomao Li et.al. | 2501.06438 | null |
2025-01-10 | MEt3R: Measuring Multi-View Consistency in Generated Images | Mohammad Asim et.al. | 2501.06336 | null |
2025-01-10 | Multi-subject Open-set Personalization in Video Generation | Tsai-Shien Chen et.al. | 2501.06187 | null |
2025-01-10 | VideoAuteur: Towards Long Narrative Video Generation | Junfei Xiao et.al. | 2501.06173 | null |
2025-01-08 | Tuning-Free Long Video Generation via Global-Local Collaborative Diffusion | Yongjia Ma et.al. | 2501.05484 | null |
2025-01-09 | Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces | Aniruddha Mahapatra et.al. | 2501.05442 | null |
2025-01-08 | ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning | Yuzhou Huang et.al. | 2501.04698 | null |
2025-01-08 | LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition | Bowen Hao et.al. | 2501.04204 | null |
2025-01-07 | Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers | Yuechen Zhang et.al. | 2501.03931 | link |
2025-01-09 | Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control | Zekai Gu et.al. | 2501.03847 | link |
2025-01-07 | Motion-Aware Generative Frame Interpolation | Guozhen Zhang et.al. | 2501.03699 | null |
2025-01-06 | License Plate Images Generation with Diffusion Models | Mariia Shpir et.al. | 2501.03374 | null |
2025-01-06 | Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation | Guy Yariv et.al. | 2501.03059 | null |
2025-01-06 | TransPixar: Advancing Text-to-Video Generation with Transparency | Luozhou Wang et.al. | 2501.03006 | link |
2025-01-06 | Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising | Yunlong Yuan et.al. | 2501.02741 | null |
2025-01-05 | GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking | Weikang Bian et.al. | 2501.02690 | null |
2025-01-04 | Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey | Zongxia Li et.al. | 2501.02189 | link |
2025-01-03 | JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing | Qili Wang et.al. | 2501.01798 | link |
2025-01-06 | VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Yuanpeng Tu et.al. | 2501.01427 | null |
2025-01-03 | Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions | Xincheng Shuai et.al. | 2501.01425 | null |
2025-01-02 | On Unifying Video Generation and Camera Pose Estimation | Chun-Hao Paul Huang et.al. | 2501.01409 | null |
2025-01-01 | Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | Cheonsu Jeong et.al. | 2501.00750 | null |
2025-01-03 | DreamDrive: Generative 4D Scene Modeling from Street View Images | Jiageng Mao et.al. | 2501.00601 | null |
2024-12-30 | LTX-Video: Realtime Video Latent Diffusion | Yoav HaCohen et.al. | 2501.00103 | link |
2024-12-30 | Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Yifei Huang et.al. | 2412.21080 | link |
2024-12-30 | VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation | Jiazheng Xu et.al. | 2412.21059 | link |
2024-12-30 | ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation | Ting Zhang et.al. | 2412.20901 | null |
2024-12-30 | Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling | Min Zhang et.al. | 2412.20725 | null |
2024-12-29 | Open-Sora: Democratizing Efficient Video Production for All | Zangwei Zheng et.al. | 2412.20404 | link |
2024-12-27 | Generative Video Propagation | Shaoteng Liu et.al. | 2412.19761 | null |
2024-12-30 | VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models | Tao Wu et.al. | 2412.19645 | null |
2024-12-30 | DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | Xiaotao Hu et.al. | 2412.19505 | link |
2024-12-25 | Accelerating Diffusion Transformers with Dual Feature Caching | Chang Zou et.al. | 2412.18911 | link |
2024-12-24 | Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation | Faraz Waseem et.al. | 2412.18688 | null |
2024-12-24 | DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers | Yuntao Chen et.al. | 2412.18607 | null |
2024-12-24 | ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation | Hongjie Li et.al. | 2412.18600 | null |
2024-12-24 | DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation | Minghong Cai et.al. | 2412.18597 | link |
2024-12-23 | Large Motion Video Autoencoding with Cross-modal Video VAE | Yazhou Xing et.al. | 2412.17805 | null |
2024-12-23 | VidTwin: Video VAE with Decoupled Structure and Dynamics | Yuchi Wang et.al. | 2412.17726 | link |
2024-12-23 | FFA Sora, video generation as fundus fluorescein angiography simulator | Xinyuan Wu et.al. | 2412.17346 | null |
2024-12-23 | Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory | Xingyao Li et.al. | 2412.17254 | null |
2024-12-22 | SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | Jinzhi Wang et.al. | 2412.17077 | null |
2024-12-22 | Adapting Image-to-Video Diffusion Models for Large-Motion Frame Interpolation | Luoxu Jin et.al. | 2412.17042 | null |
2024-12-21 | GANFusion: Feed-Forward Text-to-3D with Diffusion in GAN Space | Souhaib Attaiki et.al. | 2412.16717 | null |
2024-12-21 | TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models | Haocheng Huang et.al. | 2412.16700 | null |
2024-12-21 | VAST 1.0: A Unified Framework for Controllable and Consistent Video Generation | Chi Zhang et.al. | 2412.16677 | null |
2024-12-21 | Follow-Your-MultiPose: Tuning-Free Multi-Character Text-to-Video Generation via Pose Guidance | Beiyuan Zhang et.al. | 2412.16495 | null |
2024-12-20 | DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization | Zihan Ding et.al. | 2412.15689 | null |
2024-12-20 | CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training | Xiuli Bi et.al. | 2412.15646 | link |
2024-12-19 | AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation | Moayed Haji-Ali et.al. | 2412.15191 | null |
2024-12-19 | Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM | Yatai Ji et.al. | 2412.15156 | link |
2024-12-19 | Parallelized Autoregressive Visual Generation | Yuqing Wang et.al. | 2412.15119 | null |
2024-12-19 | Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations | Yucheng Hu et.al. | 2412.14803 | null |
2024-12-19 | Consistent Human Image and Video Generation with Spatially Conditioned Diffusion | Mingdeng Cao et.al. | 2412.14531 | link |
2024-12-19 | DirectorLLM for Human-Centric Video Generation | Kunpeng Song et.al. | 2412.14484 | null |
2024-12-18 | Autoregressive Video Generation without Vector Quantization | Haoge Deng et.al. | 2412.14169 | link |
2024-12-18 | VideoDPO: Omni-Preference Alignment for Video Diffusion Generation | Runtao Liu et.al. | 2412.14167 | null |
2024-12-18 | AKiRa: Augmentation Kit on Rays for optical video generation | Xi Wang et.al. | 2412.14158 | null |
2024-12-18 | SurgSora: Decoupled RGBD-Flow Diffusion Model for Controllable Surgical Video Generation | Tong Chen et.al. | 2412.14018 | null |
2024-12-18 | Real-time One-Step Diffusion-based Expressive Portrait Videos Generation | Hanzhong Guo et.al. | 2412.13479 | link |
2024-12-18 | SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation | Kazuki Shimada et.al. | 2412.13462 | null |
2024-12-17 | CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices | Andrei Znobishchev et.al. | 2412.13273 | null |
2024-12-17 | MotionBridge: Dynamic Video Inbetweening with Flexible Controls | Maham Tanveer et.al. | 2412.13190 | null |
2024-12-17 | VidTok: A Versatile and Open-Source Video Tokenizer | Anni Tang et.al. | 2412.13061 | link |
2024-12-16 | Can video generation replace cinematographers? Research on the cinematic language of generated video | Xiaozhe Li et.al. | 2412.12223 | null |
2024-12-16 | InterDyn: Controllable Interactive Dynamics with Video Diffusion Models | Rick Akkerman et.al. | 2412.11785 | null |
2024-12-16 | Generative Inbetweening through Frame-wise Conditions-Driven Video Generation | Tianyi Zhu et.al. | 2412.11755 | link |
2024-12-16 | VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting | Muhammet Furkan Ilaslan et.al. | 2412.11621 | link |
2024-12-15 | GenLit: Reformulating Single-Image Relighting as Video Generation | Shrisha Bharadwaj et.al. | 2412.11224 | null |
2024-12-15 | DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes | Jinxiu Liu et.al. | 2412.11100 | null |
2024-12-14 | Video Diffusion Transformers are In-Context Learners | Zhengcong Fei et.al. | 2412.10783 | link |
2024-12-13 | SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device | Yushu Wu et.al. | 2412.10494 | null |
2024-12-16 | TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation | Xingrui Wang et.al. | 2412.10275 | null |
2024-12-13 | Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark | Yudong Jiang et.al. | 2412.10255 | link |
2024-12-13 | LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity | Hongjie Wang et.al. | 2412.09856 | null |
2024-12-13 | MSC: Multi-Scale Spatio-Temporal Causal Attention for Autoregressive Video Diffusion | Xunnong Xu et.al. | 2412.09828 | null |
2024-12-12 | Doe-1: Closed-Loop Autonomous Driving with Large World Model | Wenzhao Zheng et.al. | 2412.09627 | link |
2024-12-12 | OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation | Weiqi Li et.al. | 2412.09623 | null |
2024-12-12 | Owl-1: Omni World Model for Consistent Long Video Generation | Yuanhui Huang et.al. | 2412.09600 | link |
2024-12-12 | LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors | Yabo Chen et.al. | 2412.09597 | null |
2024-12-12 | Video Creation by Demonstration | Yihong Sun et.al. | 2412.09551 | null |
2024-12-12 | UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer | Delong Liu et.al. | 2412.09389 | link |
2024-12-12 | T-SVG: Text-Driven Stereoscopic Video Generation | Qiao Jin et.al. | 2412.09323 | null |
2024-12-12 | InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption | Tiehan Fan et.al. | 2412.09283 | null |
2024-12-12 | LVMark: Robust Watermark for latent video diffusion models | MinHyuk Jang et.al. | 2412.09122 | null |
2024-12-12 | Enhancing Facial Consistency in Conditional Video Generation via Facial Landmark Transformation | Lianrui Mu et.al. | 2412.08976 | null |
2024-12-11 | Physical Informed Driving World Model | Zhuoran Yang et.al. | 2412.08410 | null |
2024-12-11 | FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks | Chongkai Gao et.al. | 2412.08261 | null |
2024-12-11 | VSD2M: A Large-scale Vision-language Sticker Dataset for Multi-frame Animated Sticker Generation | Zhiqiang Yuan et.al. | 2412.08259 | null |
2024-12-11 | UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics | Xi Chen et.al. | 2412.07774 | null |
2024-12-10 | From Slow Bidirectional to Fast Causal Video Generators | Tianwei Yin et.al. | 2412.07772 | null |
2024-12-10 | SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints | Jianhong Bai et.al. | 2412.07760 | link |
2024-12-10 | 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation | Xiao Fu et.al. | 2412.07759 | null |
2024-12-10 | Multi-Shot Character Consistency for Text-to-Video Generation | Yuval Atzmon et.al. | 2412.07750 | null |
2024-12-10 | StyleMaster: Stylize Your Video with Artistic Generation and Translation | Zixuan Ye et.al. | 2412.07744 | null |
2024-12-10 | STIV: Scalable Text and Image Conditioned Video Generation | Zongyu Lin et.al. | 2412.07730 | null |
2024-12-10 | ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer | Jinyi Hu et.al. | 2412.07720 | link |
2024-12-09 | SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations | Zhaorun Chen et.al. | 2412.06878 | null |
2024-12-08 | Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training | Zhenghong Zhou et.al. | 2412.06029 | null |
2024-12-08 | FlexDiT: Dynamic Token Density Control for Diffusion Transformer | Shuning Chang et.al. | 2412.06028 | link |
2024-12-08 | Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation | Hyeonho Jeong et.al. | 2412.06016 | null |
2024-12-08 | Accelerating Video Diffusion Models via Distribution Matching | Yuanzhi Zhu et.al. | 2412.05899 | null |
2024-12-08 | MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation | Shuwei Shi et.al. | 2412.05848 | null |
2024-12-08 | Self-Guidance: Boosting Flow and Diffusion Generation on Their Own | Tiancheng Li et.al. | 2412.05827 | null |
2024-12-07 | Combining Genre Classification and Harmonic-Percussive Features with Diffusion Models for Music-Video Generation | Leonardo Pina et.al. | 2412.05694 | null |
2024-12-06 | Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model | Lening Wang et.al. | 2412.05280 | link |
2024-12-06 | Mind the Time: Temporally-Controlled Multi-Event Video Generation | Ziyi Wu et.al. | 2412.05263 | null |
2024-12-06 | UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving | Rui Chen et.al. | 2412.04842 | link |
2024-12-05 | Using Diffusion Priors for Video Amodal Segmentation | Kaihua Chen et.al. | 2412.04623 | null |
2024-12-05 | PaintScene4D: Consistent 4D Scene Generation from Text Prompts | Vinayak Gupta et.al. | 2412.04471 | null |
2024-12-05 | MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation | Longtao Zheng et.al. | 2412.04448 | null |
2024-12-05 | DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models | Yizhuo Li et.al. | 2412.04446 | null |
2024-12-05 | GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration | Kaiyi Huang et.al. | 2412.04440 | null |
2024-12-05 | Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Yuying Ge et.al. | 2412.04432 | link |
2024-12-05 | Instructional Video Generation | Yayuan Li et.al. | 2412.04189 | null |
2024-12-05 | IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation | Sejong Yang et.al. | 2412.04000 | null |
2024-12-05 | DiffSign: AI-Assisted Generation of Customizable Sign Language Videos With Enhanced Realism | Sudha Krishnamurthy et.al. | 2412.03878 | link |
2024-12-05 | Movie Gen: SWOT Analysis of Meta’s Generative AI Foundation Model for Transforming Media Generation, Advertising, and Entertainment Industries | Abul Ehtesham et.al. | 2412.03837 | null |
2024-12-04 | Advancing Auto-Regressive Continuation for Video Frames | Ruibo Ming et.al. | 2412.03758 | null |
2024-12-04 | Navigation World Models | Amir Bar et.al. | 2412.03572 | null |
2024-12-04 | Imagine360: Immersive 360 Video Generation from Perspective Anchor | Jing Tan et.al. | 2412.03552 | null |
2024-12-04 | Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention | Hannan Lu et.al. | 2412.03520 | null |
2024-12-04 | SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model | Yan Li et.al. | 2412.03430 | null |
2024-12-04 | MaterialPicker: Multi-Modal Material Generation with Diffusion Transformers | Xiaohe Ma et.al. | 2412.03225 | null |
2024-12-04 | Mimir: Improving Video Diffusion Models for Precise Text Understanding | Shuai Tan et.al. | 2412.03085 | null |
2024-12-03 | Motion Prompting: Controlling Video Generation with Motion Trajectories | Daniel Geng et.al. | 2412.02700 | null |
2024-12-03 | AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction | Lingteng Qiu et.al. | 2412.02684 | null |
2024-12-03 | Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Hiroki Furuta et.al. | 2412.02617 | null |
2024-12-03 | VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation | Mingzhe Zheng et.al. | 2412.02259 | link |
2024-12-02 | World-consistent Video Diffusion with Explicit 3D Modeling | Qihang Zhang et.al. | 2412.01821 | null |
2024-12-02 | Driving Scene Synthesis on Free-form Trajectories with Generative Prior | Zeyu Yang et.al. | 2412.01717 | null |
2024-12-04 | InfinityDrive: Breaking Time Limits in Driving World Models | Xi Guo et.al. | 2412.01522 | null |
2024-12-02 | CPA: Camera-pose-awareness Diffusion Transformer for Video Generation | Yuelei Wang et.al. | 2412.01429 | null |
2024-12-02 | MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models | Xiaomin Li et.al. | 2412.01343 | null |
2024-12-02 | Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation | Xin Yan et.al. | 2412.01316 | null |
2024-11-29 | Fleximo: Towards Flexible Text-to-Human Motion Video Generation | Yuhang Zhang et.al. | 2411.19459 | null |
2024-11-28 | Trajectory Attention for Fine-grained Video Motion Control | Zeqi Xiao et.al. | 2411.19324 | null |
2024-11-28 | MSG score: A Comprehensive Evaluation for Multi-Scene Video Generation | Daewon Yoon et.al. | 2411.19121 | null |
2024-11-28 | Timestep Embedding Tells: It’s Time to Cache for Video Diffusion Model | Feng Liu et.al. | 2411.19108 | null |
2024-11-28 | SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing | Rong-Cheng Tu et.al. | 2411.18983 | null |
2024-12-02 | AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers | Sherwin Bahmani et.al. | 2411.18673 | null |
2024-11-27 | Towards Chunk-Wise Generation for Long Videos | Siyang Zhang et.al. | 2411.18668 | null |
2024-11-27 | Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models | Yiming Wu et.al. | 2411.18375 | null |
2024-11-30 | MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation | Haopeng Fang et.al. | 2411.18281 | null |
2024-11-26 | Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey | Hong-Hanh Nguyen-Le et.al. | 2411.17911 | null |
2024-11-27 | Accelerating Vision Diffusion Transformers with Skip Branches | Guanjie Chen et.al. | 2411.17616 | link |
2024-11-26 | Identity-Preserving Text-to-Video Generation by Frequency Decomposition | Shenghai Yuan et.al. | 2411.17440 | link |
2024-11-26 | AnchorCrafter: Animate CyberAnchors Selling Your Products via Human-Object Interacting Video Generation | Ziyi Xu et.al. | 2411.17383 | null |
2024-11-26 | AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Jiarui Wang et.al. | 2411.17221 | link |
2024-11-28 | PhysMotion: Physics-Grounded Dynamics From a Single Image | Xiyang Tan et.al. | 2411.17189 | null |
2024-11-26 | PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation | Hengjia Li et.al. | 2411.17048 | null |
2024-11-26 | Free$^2$Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models | Jaemin Kim et.al. | 2411.17041 | null |
2024-11-25 | Pathways on the Image Manifold: Image Editing via Video Generation | Noam Rotstein et.al. | 2411.16819 | null |
2024-11-25 | DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation | Zun Wang et.al. | 2411.16657 | null |
2024-11-25 | Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric | Zhichao Zhang et.al. | 2411.16619 | null |
2024-11-25 | Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing | Kaifeng Gao et.al. | 2411.16375 | link |
2024-11-23 | Optical-Flow Guided Prompt Optimization for Coherent Video Generation | Hyelin Nam et.al. | 2411.15540 | null |
2024-11-22 | MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation | Weijia Wu et.al. | 2411.15262 | link |
2024-11-22 | VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Daeun Lee et.al. | 2411.15115 | null |
2024-11-21 | Understanding World or Predicting Future? A Comprehensive Survey of World Models | Jingtao Ding et.al. | 2411.14499 | null |
2024-11-21 | StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart | Jian Shi et.al. | 2411.14295 | link |
2024-11-21 | TaQ-DiT: Time-aware Quantization for Diffusion Transformers | Xinyan Liu et.al. | 2411.14172 | null |
2024-11-21 | MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control | Ruiyuan Gao et.al. | 2411.13807 | null |
2024-11-20 | What You See Is What Matters: A Novel Visual and Physics-Based Metric for Evaluating Video Generation Quality | Zihan Wang et.al. | 2411.13609 | null |
2024-11-20 | REDUCIO! Generating 1024 $\times$ 1024 Video within 16 Seconds using Extremely Compressed Motion Latents | Rui Tian et.al. | 2411.13552 | link |
2024-11-20 | VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Ziqi Huang et.al. | 2411.13503 | link |
2024-11-19 | Towards motion from video diffusion models | Paul Janson et.al. | 2411.12831 | null |
2024-11-19 | Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting | Haoyu Zhao et.al. | 2411.12789 | null |
2024-11-19 | PoM: Efficient Image and Video Generation with the Polynomial Mixer | David Picard et.al. | 2411.12663 | link |
2024-11-18 | Medical Video Generation for Disease Progression Simulation | Xu Cao et.al. | 2411.11943 | null |
2024-11-18 | SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input | Zhen Lv et.al. | 2411.11934 | null |
2024-11-19 | SoK: On the Role and Future of AIGC Watermarking in the Era of Gen-AI | Kui Ren et.al. | 2411.11478 | null |
2024-11-18 | Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge | Qinglong Cao et.al. | 2411.11343 | null |
2024-11-17 | SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration | Jintao Zhang et.al. | 2411.10958 | link |
2024-11-16 | ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models | Vipula Rawte et.al. | 2411.10867 | null |
2024-11-16 | AnimateAnything: Consistent and Controllable Animation for Video Generation | Guojun Lei et.al. | 2411.10836 | null |
2024-11-15 | OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models | Mathis Koroglu et.al. | 2411.10501 | null |
2024-11-14 | Advancing Diffusion Models: Alias-Free Resampling and Enhanced Rotational Equivariance | Md Fahim Anjum et.al. | 2411.09174 | null |
2024-11-14 | VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation | Youpeng Wen et.al. | 2411.09153 | null |
2024-11-16 | A Survey on Vision Autoregressive Model | Kai Jiang et.al. | 2411.08666 | null |
2024-11-13 | EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation | Xiaofeng Wang et.al. | 2411.08380 | null |
2024-11-13 | Motion Control for Enhanced Complex Action Video Generation | Qiang Zhou et.al. | 2411.08328 | null |
2024-11-12 | Artificial Intelligence for Biomedical Video Generation | Linyuan Li et.al. | 2411.07619 | null |
2024-11-10 | I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength | Wanquan Feng et.al. | 2411.06525 | null |
2024-11-08 | Autoregressive Models in Vision: A Survey | Jing Xiong et.al. | 2411.05902 | link |
2024-11-08 | WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making | Zhilong Zhang et.al. | 2411.05619 | null |
2024-11-07 | SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | Koichi Namekata et.al. | 2411.04989 | null |
2024-11-07 | Uncovering Hidden Subspaces in Video Diffusion Models Using Re-Identification | Mischa Dombrowski et.al. | 2411.04956 | null |
2024-11-07 | DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion | Wenqiang Sun et.al. | 2411.04928 | null |
2024-11-11 | StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration | Panwen Hu et.al. | 2411.04925 | null |
2024-11-07 | MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views | Yuedong Chen et.al. | 2411.04924 | link |
2024-11-07 | Taming Rectified Flow for Inversion and Editing | Jiangshan Wang et.al. | 2411.04746 | link |
2024-11-05 | TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation | Wenhao Wang et.al. | 2411.04709 | null |
2024-11-05 | Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey | Ao Fu et.al. | 2411.02914 | null |
2024-11-07 | Adaptive Caching for Faster Video Generation with Diffusion Transformers | Kumara Kahatapitiya et.al. | 2411.02397 | null |
2024-11-04 | How Far is Video Generation from World Model: A Physical Law Perspective | Bingyi Kang et.al. | 2411.02385 | null |
2024-11-03 | Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation | Zhenbin Wang et.al. | 2411.01647 | null |
2024-11-02 | Fast and Memory-Efficient Video Diffusion Using Streamlined Inference | Zheng Zhan et.al. | 2411.01171 | link |
2024-11-01 | GameGen-X: Interactive Open-world Game Video Generation | Haoxuan Che et.al. | 2411.00769 | link |
2024-11-04 | Fashion-VDM: Video Diffusion Model for Virtual Try-On | Johanna Karras et.al. | 2411.00225 | null |
2024-10-31 | Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning | Penghui Ruan et.al. | 2410.24219 | link |
2024-10-31 | Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | Xiang Deng et.al. | 2410.23836 | null |
2024-10-31 | SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation | Yining Hong et.al. | 2410.23277 | null |
2024-10-30 | LumiSculpt: A Consistency Lighting Control Network for Video Generation | Yuxin Zhang et.al. | 2410.22979 | null |
2024-10-30 | HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models | Shengkai Zhang et.al. | 2410.22901 | link |
2024-10-29 | Investigating Memorization in Video Diffusion Models | Chen Chen et.al. | 2410.21669 | null |
2024-10-28 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Hanyu Wang et.al. | 2410.21264 | null |
2024-10-28 | Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient | Yintai Ma et.al. | 2410.20657 | null |
2024-10-27 | ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation | Zongyi Li et.al. | 2410.20502 | null |
2024-10-26 | MarDini: Masked Autoregressive Diffusion for Video Generation at Scale | Haozhe Liu et.al. | 2410.20280 | null |
2024-10-26 | Your Image is Secretly the Last Frame of a Pseudo Video | Wenlong Chen et.al. | 2410.20158 | null |
2024-10-26 | GiVE: Guiding Visual Encoder to Perceive Overlooked Information | Junjie Li et.al. | 2410.20109 | null |
2024-10-26 | GHIL-Glue: Hierarchical Control with Filtered Subgoal Images | Kyle B. Hatch et.al. | 2410.20018 | null |
2024-10-25 | FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality | Zhengyao Lv et.al. | 2410.19355 | null |
2024-10-24 | Framer: Interactive Frame Interpolation | Wen Wang et.al. | 2410.18978 | null |
2024-10-24 | Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances | Shilin Lu et.al. | 2410.18775 | link |
2024-10-23 | WorldSimBench: Towards Video Generation Models as World Simulators | Yiran Qin et.al. | 2410.18072 | null |
2024-10-23 | VISAGE: Video Synthesis using Action Graphs for Surgery | Yousef Yeganeh et.al. | 2410.17751 | null |
2024-10-21 | 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors | Xi Liu et.al. | 2410.16266 | null |
2024-10-20 | EVA: An Embodied World Model for Future Video Anticipation | Xiaowei Chi et.al. | 2410.15461 | null |
2024-10-20 | Allegro: Open the Black Box of Commercial-Level Video Generation Model | Yuan Zhou et.al. | 2410.15458 | link |
2024-10-20 | FrameBridge: Improving Image-to-Video Generation with Bridge Models | Yuji Wang et.al. | 2410.15371 | null |
2024-10-27 | VidPanos: Generative Panoramic Videos from Casual Panning Videos | Jingwei Ma et.al. | 2410.13832 | null |
2024-10-17 | DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control | Yujie Wei et.al. | 2410.13830 | null |
2024-10-18 | DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Hanbo Cheng et.al. | 2410.13726 | link |
2024-10-17 | Movie Gen: A Cast of Media Foundation Models | Adam Polyak et.al. | 2410.13720 | link |
2024-10-21 | DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation | Guosheng Zhao et.al. | 2410.13571 | null |
2024-10-18 | Fundus to Fluorescein Angiography Video Generation as a Retinal Generative Foundation Model | Weiyi Zhang et.al. | 2410.13242 | null |
2024-10-17 | AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations | Qian Tao et.al. | 2410.13212 | null |
2024-10-16 | SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | Jaehong Yoon et.al. | 2410.12761 | null |
2024-10-16 | Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices | Zhiyuan Ma et.al. | 2410.11795 | null |
2024-10-14 | Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models | Jingzhi Bao et.al. | 2410.10821 | link |
2024-10-14 | LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | Tianwei Xiong et.al. | 2410.10816 | link |
2024-10-14 | Boosting Camera Motion Control for Video Diffusion Transformers | Soon Yau Cheong et.al. | 2410.10802 | null |
2024-10-14 | Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention | Dejia Xu et.al. | 2410.10774 | null |
2024-10-14 | DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships | Zhang Wan et.al. | 2410.10751 | null |
2024-10-16 | MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting | Yue Zhang et.al. | 2410.10122 | link |
2024-10-15 | VideoAgent: Self-Improving Video Generation | Achint Soni et.al. | 2410.10076 | link |
2024-10-11 | Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities | Abhijay Ghildyal et.al. | 2410.08534 | null |
2024-10-10 | Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content | Qiuheng Wang et.al. | 2410.08260 | null |
2024-10-10 | Scaling Laws For Diffusion Transformers | Zhengyang Liang et.al. | 2410.08184 | null |
2024-10-10 | Progressive Autoregressive Video Diffusion Models | Desai Xie et.al. | 2410.08151 | link |
2024-10-10 | HARIVO: Harnessing Text-to-Image Models for Video Generation | Mingi Kwon et.al. | 2410.07763 | null |
2024-10-10 | Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | Jiahao Cui et.al. | 2410.07718 | link |
2024-10-10 | MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion | Onkar Susladkar et.al. | 2410.07659 | link |
2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Bohan Zeng et.al. | 2410.07155 | link |
2024-10-08 | BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way | Jiazi Bu et.al. | 2410.06241 | null |
2024-10-08 | GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation | Chi-Lam Cheang et.al. | 2410.06158 | null |
2024-10-08 | Pyramidal Flow Matching for Efficient Video Generative Modeling | Yang Jin et.al. | 2410.05954 | link |
2024-10-08 | SeeClear: Semantic Distillation Enhances Pixel Condensation for Video Super-Resolution | Qi Tang et.al. | 2410.05799 | link |
2024-10-08 | T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Jiachen Li et.al. | 2410.05677 | null |
2024-10-08 | ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler | Serin Yang et.al. | 2410.05651 | null |
2024-10-08 | TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation | Gihyun Kwon et.al. | 2410.05591 | link |
2024-10-07 | Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation | Fanqing Meng et.al. | 2410.05363 | link |
2024-10-10 | The Dawn of Video Generation: Preliminary Explorations with SORA-like Models | Ailing Zeng et.al. | 2410.05227 | null |
2024-10-07 | Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality | Ge Ya et.al. | 2410.05203 | link |
2024-10-07 | ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction | Hyungjin Chung et.al. | 2410.04721 | null |
2024-10-06 | Realizing Video Summarization from the Path of Language-based Semantic Understanding | Kuan-Chen Mu et.al. | 2410.04511 | null |
2024-10-03 | People are poorly equipped to detect AI-powered voice clones | Sarah Barrington et.al. | 2410.03791 | null |
2024-10-04 | Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach | Yaofang Liu et.al. | 2410.03160 | link |
2024-10-04 | ECHOPulse: ECG controlled echocardio-grams video generation | Yiwei Li et.al. | 2410.03143 | link |
2024-10-03 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Yuqing Wang et.al. | 2410.02757 | null |
2024-10-03 | SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | Jintao Zhang et.al. | 2410.02367 | link |
2024-10-02 | COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation | Mingzhen Sun et.al. | 2410.01718 | null |
2024-10-02 | MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation | Mingzhen Sun et.al. | 2410.01594 | link |
2024-10-01 | Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining | Jie Cheng et.al. | 2410.00564 | link |
2024-09-30 | ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Jian Shi et.al. | 2410.00262 | link |
2024-09-30 | Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | Zicheng Zhang et.al. | 2409.20063 | null |
2024-09-30 | Replace Anyone in Videos | Xiang Wang et.al. | 2409.19911 | link |
2024-09-27 | PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation | Shaowei Liu et.al. | 2409.18964 | link |
2024-09-27 | Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions | Iskander Azangulov et.al. | 2409.18804 | null |
2024-09-26 | Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation | Huan Yang et.al. | 2409.17674 | null |
2024-09-26 | A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation | Masato Ishii et.al. | 2409.17550 | link |
2024-09-25 | Pose-Guided Fine-Grained Sign Language Video Generation | Tongkai Shi et.al. | 2409.16709 | null |
2024-09-24 | Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation | Homanga Bharadhwaj et.al. | 2409.16283 | null |
2024-09-23 | Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond | Hong Chen et.al. | 2409.14993 | null |
2024-09-23 | Advancing Video Quality Assessment for AIGC | Xinli Yue et.al. | 2409.14888 | null |
2024-09-23 | Video-to-Audio Generation with Fine-grained Temporal Semantics | Yuchen Hu et.al. | 2409.14709 | null |
2024-09-22 | Dormant: Defending against Pose-driven Human Image Animation | Jiachen Zhou et.al. | 2409.14424 | link |
2024-09-27 | JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation | Hadrien Reynaud et.al. | 2409.14149 | null |
2024-09-20 | JoyHallo: Digital human model for Mandarin | Sheng Shi et.al. | 2409.13268 | null |
2024-09-19 | Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation | Chenyu Wang et.al. | 2409.12532 | null |
2024-09-19 | Infrared Small Target Detection in Satellite Videos: A New Dataset and A Novel Recurrent Feature Refinement Framework | Xinyi Ying et.al. | 2409.12448 | link |
2024-09-17 | OSV: One Step is Enough for High-Quality Image to Video Generation | Xiaofeng Mao et.al. | 2409.11367 | null |
2024-09-19 | The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives | Samee Arif et.al. | 2409.11261 | link |
2024-09-16 | Embodiment-Agnostic Action Planning via Object-Part Scene Flow | Weiliang Tang et.al. | 2409.10032 | null |
2024-09-13 | STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment | Yong Ren et.al. | 2409.08601 | null |
2024-09-11 | DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures | Steven Hogue et.al. | 2409.07649 | null |
2024-09-11 | Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Haibo Yang et.al. | 2409.07452 | link |
2024-09-11 | EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion | Jian Zhang et.al. | 2409.07255 | link |
2024-09-10 | SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation | Teng Hu et.al. | 2409.06633 | null |
2024-09-10 | G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer | Jinzhi Zhang et.al. | 2409.06322 | null |
2024-09-11 | MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control | Yining Yao et.al. | 2409.06189 | null |
2024-09-12 | DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation | Wei Wu et.al. | 2409.05463 | null |
2024-09-06 | Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task | Jing Wang et.al. | 2409.04005 | link |
2024-09-06 | DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes | Jianbiao Mei et.al. | 2409.04003 | link |
2024-09-04 | PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation | Jun Ling et.al. | 2409.02657 | null |
2024-09-05 | Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency | Jianwen Jiang et.al. | 2409.02634 | null |
2024-09-03 | DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Wenbo Hu et.al. | 2409.02095 | link |
2024-09-05 | CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention | Gaojie Lin et.al. | 2409.01876 | null |
2024-09-03 | DiVE: DiT-based Video Generation with Enhanced Control | Junpeng Jiang et.al. | 2409.01595 | null |
2024-09-02 | AMG: Avatar Motion Guided Video Generation | Zhangsihao Yang et.al. | 2409.01502 | link |
2024-09-09 | OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model | Liuhan Chen et.al. | 2409.01199 | link |
2024-08-31 | Compositional 3D-aware Video Generation with LLM Director | Hanxin Zhu et.al. | 2409.00558 | null |
2024-08-30 | CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion | Yiran Chen et.al. | 2408.17424 | null |
2024-08-30 | VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers | Juncan Deng et.al. | 2408.17131 | null |
2024-08-29 | One-Shot Learning Meets Depth Diffusion in Multi-Object Videos | Anisha Jain et.al. | 2408.16704 | null |
2024-08-29 | DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Yongjie Fu et.al. | 2408.16647 | null |
2024-08-29 | Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation | Xiaoyu Jin et.al. | 2408.16506 | null |
2024-08-28 | GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model | Yongjie Fu et.al. | 2408.15868 | null |
2024-08-27 | GenRec: Unifying Video Generation and Recognition with Diffusion Models | Zejia Weng et.al. | 2408.15241 | link |
2024-08-27 | Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance | Weiyi Zhang et.al. | 2408.15217 | link |
2024-08-28 | SurGen: Text-Guided Diffusion Model for Surgical Video Generation | Joseph Cho et.al. | 2408.14028 | null |
2024-09-02 | Training-free Long Video Generation with Chain of Diffusion Model Experts | Wenhao Li et.al. | 2408.13423 | null |
2024-08-24 | TVG: A Training-free Transition Video Generation Method with Diffusion Models | Rui Zhang et.al. | 2408.13413 | null |
2024-08-23 | CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities | Tao Wu et.al. | 2408.13239 | link |
2024-08-23 | EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation | Cong Wang et.al. | 2408.13005 | null |
2024-08-22 | xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations | Can Qin et.al. | 2408.12590 | null |
2024-08-22 | Real-Time Video Generation with Pyramid Attention Broadcast | Xuanlei Zhao et.al. | 2408.12588 | link |
2024-08-21 | DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Zhifei Xie et.al. | 2408.11788 | null |
2024-08-21 | TrackGo: A Flexible and Efficient Method for Controllable Video Generation | Haitao Zhou et.al. | 2408.11475 | null |
2024-08-19 | Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation | Liu He et.al. | 2408.10453 | null |
2024-08-19 | Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data | Tao Yang et.al. | 2408.10119 | null |
2024-08-19 | Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | Yunxin Li et.al. | 2408.09787 | link |
2024-08-18 | SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama | Jing Tang et.al. | 2408.09333 | link |
2024-08-21 | JPEG-LM: LLMs as Image Generators with Canonical Codec Representations | Xiaochuang Han et.al. | 2408.08459 | null |
2024-08-16 | FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance | Jiasong Feng et.al. | 2408.08189 | null |
2024-08-15 | When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding | Pingping Zhang et.al. | 2408.08093 | null |
2024-08-14 | Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | Yuqing Wen et.al. | 2408.07605 | null |
2024-08-15 | ControlNeXt: Powerful and Efficient Control for Image and Video Generation | Bohao Peng et.al. | 2408.06070 | link |
2024-08-20 | Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE | Yiying Yang et.al. | 2408.05477 | null |
2024-08-10 | High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model | Weizhi Zhong et.al. | 2408.05416 | null |
2024-08-08 | Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics | Ruining Li et.al. | 2408.04631 | null |
2024-08-05 | VidGen-1M: A Large-Scale Dataset for Text-to-video Generation | Zhiyu Tan et.al. | 2408.02629 | null |
2024-08-01 | Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion | Manuel Kansy et.al. | 2408.00458 | null |
2024-07-31 | Tora: Trajectory-oriented Diffusion Transformer for Video Generation | Zhenghao Zhang et.al. | 2407.21705 | link |
2024-07-31 | Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation | Junxuan Yu et.al. | 2407.21490 | null |
2024-07-31 | Fine-gained Zero-shot Video Sampling | Dengsheng Chen et.al. | 2407.21475 | null |
2024-07-31 | Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model | Zhichao Zhang et.al. | 2407.21408 | null |
2024-08-04 | Adding Multimodal Controls to Whole-body Human Motion Generation | Yuxuan Bian et.al. | 2407.21136 | link |
2024-07-30 | EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos | Aashish Rai et.al. | 2407.20592 | null |
2024-07-29 | FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention | Yu Lu et.al. | 2407.19918 | null |
2024-07-29 | Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture | Stefanos Gkikas et.al. | 2407.19811 | null |
2024-07-28 | FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models | Changgu Chen et.al. | 2407.19453 | link |
2024-07-27 | Faster Image2Video Generation: A Closer Look at CLIP Image Embedding’s Impact on Spatio-Temporal Cross-Attentions | Ashkan Taghipour et.al. | 2407.19205 | null |
2024-07-26 | UniForensics: Face Forgery Detection via General Facial Representation | Ziyuan Fang et.al. | 2407.19079 | null |
2024-07-24 | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency | Yiming Xie et.al. | 2407.17470 | null |
2024-07-28 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438 | link |
2024-07-23 | MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence | Canyu Zhao et.al. | 2407.16655 | null |
2024-07-23 | Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data | Hengyu Fu et.al. | 2407.16134 | null |
2024-07-23 | Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos | Jiahe Liu et.al. | 2407.16124 | link |
2024-07-21 | Flow as the Cross-Domain Manipulation Interface | Mengda Xu et.al. | 2407.15208 | null |
2024-07-21 | Anchored Diffusion for Video Face Reenactment | Idan Kligvasser et.al. | 2407.15153 | null |
2024-07-19 | T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Kaiyue Sun et.al. | 2407.14505 | link |
2024-07-19 | Unlearning Concepts from Text-to-Video Diffusion Models | Shiqi Liu et.al. | 2407.14209 | null |
2024-07-25 | Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion | Boyang Deng et.al. | 2407.13759 | null |
2024-07-18 | Multi-sentence Video Grounding for Long Video Generation | Wei Feng et.al. | 2407.13219 | null |
2024-07-20 | VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control | Sherwin Bahmani et.al. | 2407.12781 | null |
2024-07-17 | Towards Understanding Unsafe Video Generation | Yan Pang et.al. | 2407.12581 | link |
2024-07-15 | IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation | Yuanhao Zhai et.al. | 2407.10937 | link |
2024-07-15 | A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication | Jingyi Deng et.al. | 2407.10575 | null |
2024-07-13 | Learning Online Scale Transformation for Talking Head Video Generation | Fa-Ting Hong et.al. | 2407.09965 | null |
2024-07-12 | Inference Optimization of Foundation Models on AI Accelerators | Youngsuk Park et.al. | 2407.09111 | null |
2024-07-16 | Bora: Biomedical Generalist Video Generation Model | Weixiang Sun et.al. | 2407.08944 | null |
2024-07-11 | Still-Moving: Customized Video Generation without Customized Video Data | Hila Chefer et.al. | 2407.08674 | null |
2024-07-11 | A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights | Wentao Lei et.al. | 2407.08428 | link |
2024-07-11 | E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors | Jinxiu Liang et.al. | 2407.08231 | null |
2024-07-10 | VEnhancer: Generative Space-Time Enhancement for Video Generation | Jingwen He et.al. | 2407.07667 | null |
2024-07-10 | Video-to-Audio Generation with Hidden Alignment | Manjie Xu et.al. | 2407.07464 | null |
2024-07-12 | Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task | Yiran Yang et.al. | 2407.06617 | link |
2024-07-08 | MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions | Xuan Ju et.al. | 2407.06358 | null |
2024-07-08 | Dynamics of quantum turbulence in axially rotating thermal counterflow | Ritesh Dwivedi et.al. | 2407.06311 | link |
2024-07-08 | VIMI: Grounding Video Generation through Multi-modal Instruction | Yuwei Fang et.al. | 2407.06304 | null |
2024-07-08 | The Tug-of-War Between Deepfake Generation and Detection | Hannah Lee et.al. | 2407.06174 | null |
2024-07-08 | T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models | Yibo Miao et.al. | 2407.05965 | null |
2024-07-08 | This&That: Language-Gesture Controlled Video Generation for Robot Planning | Boyang Wang et.al. | 2407.05530 | null |
2024-07-05 | Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator | Mehryar Abbasi et.al. | 2407.04258 | null |
2024-07-03 | Robot Shape and Location Retention in Video Generation Using Diffusion Models | Peng Wang et.al. | 2407.02873 | link |
2024-07-02 | OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation | Kepan Nan et.al. | 2407.02371 | null |
2024-07-04 | GVDIFF: Grounded Text-to-Video Generation with Diffusion Models | Huanzhang Dou et.al. | 2407.01921 | null |
2024-07-01 | Evaluation of Text-to-Video Generation Models: A Dynamics Perspective | Mingxiang Liao et.al. | 2407.01094 | link |
2024-06-29 | SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix | Peng Dai et.al. | 2407.00367 | null |
2024-06-28 | MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance | Yuang Zhang et.al. | 2406.19680 | null |
2024-06-27 | What Matters in Detecting AI-Generated Videos like Sora? | Chirui Chang et.al. | 2406.19568 | null |
2024-06-26 | ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Shenghai Yuan et.al. | 2406.18522 | link |
2024-06-25 | Text-Animator: Controllable Visual Text Video Generation | Lin Liu et.al. | 2406.17777 | null |
2024-06-25 | MotionBooth: Motion-Aware Customized Text-to-Video Generation | Jianzong Wu et.al. | 2406.17758 | null |
2024-06-24 | FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models | Haonan Qiu et.al. | 2406.16863 | link |
2024-06-24 | Dreamitate: Real-World Visuomotor Policy Learning via Video Generation | Junbang Liang et.al. | 2406.16862 | null |
2024-06-24 | Video-Infinity: Distributed Long Video Generation | Zhenxiong Tan et.al. | 2406.16260 | null |
2024-06-23 | Listen and Move: Improving GANs Coherency in Agnostic Sound-to-Video Generation | Rafael Redondo et.al. | 2406.16155 | null |
2024-06-22 | MVOC: a training-free multiple video object composition method with diffusion models | Wei Wang et.al. | 2406.15829 | link |
2024-06-24 | VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | Xuan He et.al. | 2406.15252 | null |
2024-06-20 | Fantastic Copyrighted Beasts and How (Not) to Generate Them | Luxi He et.al. | 2406.14526 | null |
2024-06-20 | SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Josef Dai et.al. | 2406.14477 | link |
2024-06-20 | Video Generation with Learned Action Prior | Meenakshi Sarkar et.al. | 2406.14436 | null |
2024-06-20 | ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning | Zhongjie Duan et.al. | 2406.14130 | link |
2024-06-19 | Splatter a Video: Video Gaussian Representation for Versatile Processing | Yang-Tian Sun et.al. | 2406.13870 | null |
2024-06-21 | GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Baiqi Li et.al. | 2406.13743 | link |
2024-06-19 | ARDuP: Active Region Video Diffusion for Universal Policies | Shuaiyi Huang et.al. | 2406.13301 | null |
2024-06-19 | Neural Residual Diffusion Models for Deep Scalable Vision Generation | Zhiyuan Ma et.al. | 2406.13215 | null |
2024-06-18 | Generative Artificial Intelligence-Guided User Studies: An Application for Air Taxi Services | Shengdi Xiao et.al. | 2406.12296 | null |
2024-06-17 | NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation | Niu Guanchen et.al. | 2406.11259 | null |
2024-06-17 | Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion | Rishab Parthasarathy et.al. | 2406.11196 | link |
2024-06-16 | ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models | Kaifeng Gao et.al. | 2406.10981 | link |
2024-06-14 | VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs | Rohit Bharadwaj et.al. | 2406.10326 | link |
2024-06-14 | Training-free Camera Control for Video Generation | Chen Hou et.al. | 2406.10126 | null |
2024-06-13 | Turns Out I’m Not Real: Towards Robust Detection of AI-Generated Videos | Qingyuan Liu et.al. | 2406.09601 | null |
2024-06-13 | Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs | Zijia Zhao et.al. | 2406.09367 | link |
2024-06-12 | Vivid-ZOO: Multi-View Video Generation with Diffusion Model | Bing Li et.al. | 2406.08659 | null |
2024-06-12 | TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation | Weixi Feng et.al. | 2406.08656 | link |
2024-06-12 | DiTFastAttn: Attention Compression for Diffusion Transformer Models | Zhihang Yuan et.al. | 2406.08552 | null |
2024-06-12 | Hierarchical Patch Diffusion Models for High-Resolution Video Generation | Ivan Skorokhodov et.al. | 2406.07792 | null |
2024-06-11 | HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness | Zihui Xue et.al. | 2406.07754 | null |
2024-06-11 | AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation | Kai Wang et.al. | 2406.07686 | null |
2024-06-11 | 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models | Heng Yu et.al. | 2406.07472 | null |
2024-06-11 | Visual Representation Learning with Stochastic Frame Prediction | Huiwon Jang et.al. | 2406.07398 | null |
2024-06-09 | Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion | Ge Ya Luo et.al. | 2406.05630 | link |
2024-06-12 | MotionClone: Training-Free Motion Cloning for Controllable Video Generation | Pengyang Ling et.al. | 2406.05338 | link |
2024-06-07 | CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion | Xingrui Wang et.al. | 2406.05082 | null |
2024-06-07 | Zero-Shot Video Editing through Adaptive Sliding Score Distillation | Lianghan Zhu et.al. | 2406.04888 | null |
2024-06-07 | Online Continual Learning of Video Diffusion Models From a Single Video Stream | Jason Yoo et.al. | 2406.04814 | null |
2024-06-06 | GenAI Arena: An Open Evaluation Platform for Generative Models | Dongfu Jiang et.al. | 2406.04485 | null |
2024-06-06 | ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Lin Chen et.al. | 2406.04325 | null |
2024-06-06 | SF-V: Single Forward Video Generation Model | Zhixing Zhang et.al. | 2406.04324 | link |
2024-06-06 | VideoTetris: Towards Compositional Text-to-Video Generation | Ye Tian et.al. | 2406.04277 | link |
2024-06-05 | VideoPhy: Evaluating Physical Commonsense for Video Generation | Hritik Bansal et.al. | 2406.03520 | null |
2024-06-05 | Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control | Jingyun Xue et.al. | 2406.03035 | null |
2024-06-04 | ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation | Tianchen Zhao et.al. | 2406.02540 | link |
2024-06-04 | V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation | Cong Wang et.al. | 2406.02511 | null |
2024-06-04 | CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation | Dejia Xu et.al. | 2406.02509 | null |
2024-06-04 | I4VGen: Image as Stepping Stone for Text-to-Video Generation | Xiefan Guo et.al. | 2406.02230 | null |
2024-06-04 | Learning Temporally Consistent Video Depth from Video Diffusion Priors | Jiahao Shao et.al. | 2406.01493 | null |
2024-06-03 | DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors | Tianyu Huang et.al. | 2406.01476 | link |
2024-06-04 | Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation | Enhui Ma et.al. | 2406.01349 | null |
2024-06-03 | UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Xiang Wang et.al. | 2406.01188 | null |
2024-06-03 | ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation | Shaoshu Yang et.al. | 2406.00908 | link |
2024-06-02 | EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing | Hadrien Reynaud et.al. | 2406.00808 | link |
2024-05-31 | 4Diffusion: Multi-view Video Diffusion Model for 4D Generation | Haiyu Zhang et.al. | 2405.20674 | null |
2024-05-30 | Improving the Training of Rectified Flows | Sangyun Lee et.al. | 2405.20320 | link |
2024-05-30 | CV-VAE: A Compatible Video VAE for Latent Generative Video Models | Sijie Zhao et.al. | 2405.20279 | link |
2024-06-02 | MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | Muyao Niu et.al. | 2405.20222 | link |
2024-05-30 | Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion | Jiangkai Wu et.al. | 2405.20032 | link |
2024-05-30 | DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | Haoxing Chen et.al. | 2405.19707 | link |
2024-05-29 | EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture | Jiaqi Xu et.al. | 2405.18991 | link |
2024-05-29 | T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback | Jiachen Li et.al. | 2405.18750 | link |
2024-05-28 | Phased Consistency Model | Fu-Yun Wang et.al. | 2405.18407 | link |
2024-05-28 | RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives | Jaehong Yoon et.al. | 2405.18406 | link |
2024-05-28 | VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers | Jun Zheng et.al. | 2405.18326 | null |
2024-05-28 | EG4D: Explicit Generation of 4D Object without Score Distillation | Qi Sun et.al. | 2405.18132 | link |
2024-05-28 | MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling | Bowen Zhang et.al. | 2405.18003 | link |
2024-05-28 | Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation | Akio Hayakawa et.al. | 2405.17842 | link |
2024-05-27 | RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance | Jiaojiao Fan et.al. | 2405.17661 | null |
2024-05-27 | ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance | Jiannan Huang et.al. | 2405.17532 | link |
2024-05-27 | Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control | Zhengfei Kuang et.al. | 2405.17414 | null |
2024-05-27 | Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer | Ruizhi Shao et.al. | 2405.17405 | null |
2024-05-27 | Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | Shenyuan Gao et.al. | 2405.17398 | link |
2024-05-28 | Controllable Longer Image Animation with Diffusion Models | Qiang Wang et.al. | 2405.17306 | null |
2024-05-27 | Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation | Zhoujie Fu et.al. | 2405.16849 | null |
2024-05-27 | Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels | Yikai Wang et.al. | 2405.16822 | null |
2024-05-26 | Towards Multi-Task Multi-Modal Models: A Video Generative Perspective | Lijun Yu et.al. | 2405.16728 | null |
2024-05-28 | Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation | Jinlin Liu et.al. | 2405.16393 | null |
2024-05-25 | Video Prediction Models as General Visual Encoders | James Maier et.al. | 2405.16382 | null |
2024-05-24 | Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation | Shentong Mo et.al. | 2405.15881 | null |
2024-05-24 | A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence | Ali Kashefi et.al. | 2405.15406 | link |
2024-05-24 | iVideoGPT: Interactive VideoGPTs are Scalable World Models | Jialong Wu et.al. | 2405.15223 | link |
2024-05-23 | Video Diffusion Models are Training-free Motion Interpreter and Controller | Zeqi Xiao et.al. | 2405.14864 | null |
2024-05-24 | Fisher Flow Matching for Generative Modeling over Discrete Data | Oscar Davis et.al. | 2405.14664 | null |
2024-05-24 | PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control | Yong Zhong et.al. | 2405.14582 | null |
2024-05-23 | MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes | Ruiyuan Gao et.al. | 2405.14475 | null |
2024-05-22 | ReVideo: Remake a Video with Motion and Content Control | Chong Mou et.al. | 2405.13865 | null |
2024-05-22 | MotionCraft: Physics-based Zero-Shot Video Generation | Luca Savant Aira et.al. | 2405.13557 | link |
2024-05-22 | Enhanced Creativity and Ideation through Stable Video Synthesis | Elijah Miller et.al. | 2405.13357 | null |
2024-05-21 | CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers | Andrew Marmon et.al. | 2405.13195 | null |
2024-05-21 | OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models | Zhaojian Yu et.al. | 2405.12843 | link |
2024-05-21 | DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control | Hong Chen et.al. | 2405.12796 | null |
2024-05-19 | FIFO-Diffusion: Generating Infinite Videos from Text without Training | Jihwan Kim et.al. | 2405.11473 | link |
2024-05-17 | From Sora What We Can See: A Survey of Text-to-Video Generation | Rui Sun et.al. | 2405.10674 | link |
2024-05-15 | Dance Any Beat: Blending Beats with Visuals in Dance Video Generation | Xuanchen Wang et.al. | 2405.09266 | null |
2024-05-13 | The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective | Andrew Shin et.al. | 2405.08720 | null |
2024-05-10 | OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation | Jinwei Lin et.al. | 2405.06547 | link |
2024-05-08 | Reviewing Intelligent Cinematography: AI research for camera-based video production | Adrian Azzarelli et.al. | 2405.05039 | null |
2024-05-15 | TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation | Hritik Bansal et.al. | 2405.04682 | link |
2024-05-07 | Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation | Dogucan Yaman et.al. | 2405.04327 | null |
2024-05-07 | Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models | Fan Bao et.al. | 2405.04233 | null |
2024-05-07 | Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models | Zhixuan Chu et.al. | 2405.04180 | link |
2024-05-07 | Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method | Peisong He et.al. | 2405.04133 | null |
2024-05-06 | Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond | Zheng Zhu et.al. | 2405.03520 | link |
2024-05-06 | Video Diffusion Models: A Survey | Andrew Melnik et.al. | 2405.03150 | link |
2024-05-10 | Matten: Video Generation with Mamba-Attention | Yu Gao et.al. | 2405.03025 | null |
2024-05-02 | StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | Yupeng Zhou et.al. | 2405.01434 | link |
2024-05-05 | VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization | Yuliang Liu et.al. | 2404.19652 | link |
2024-04-30 | Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model | Wentao Lei et.al. | 2404.19277 | null |
2024-04-29 | FlexiFilm: Long Video Generation with Flexible Conditions | Yichen Ouyang et.al. | 2404.18620 | link |
2024-04-25 | Synthesizing Audio from Silent Video using Sequence to Sequence Modeling | Hugo Garrido-Lestache Belinchon et.al. | 2404.17608 | link |
2024-04-25 | TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models | Haomiao Ni et.al. | 2404.16306 | link |
2024-04-26 | Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model | Gehui Chen et.al. | 2404.16305 | null |
2024-04-24 | Beyond Deepfake Images: Detecting AI-Generated Videos | Danial Samadi Vahdati et.al. | 2404.15955 | null |
2024-05-01 | MotionMaster: Training-free Camera Motion Transfer For Video Generation | Teng Hu et.al. | 2404.15789 | null |
2024-04-23 | ID-Animator: Zero-Shot Identity-Preserving Human Video Generation | Xuanhua He et.al. | 2404.15275 | link |
2024-04-22 | TAVGBench: Benchmarking Text to Audible-Video Generation | Yuxin Mao et.al. | 2404.14381 | link |
2024-04-23 | Accelerating Image Generation with Sub-path Linear Approximation Model | Chen Xu et.al. | 2404.13903 | null |
2024-04-27 | Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap | Bowen Qu et.al. | 2404.13573 | link |
2024-04-21 | Motion-aware Latent Diffusion Models for Video Frame Interpolation | Zhilin Huang et.al. | 2404.13534 | null |
2024-04-20 | Music Consistency Models | Zhengcong Fei et.al. | 2404.13358 | null |
2024-04-19 | PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation | Tianyuan Zhang et.al. | 2404.13026 | null |
2024-04-19 | ConCLVD: Controllable Chinese Landscape Video Generation via Diffusion Model | Dingming Liu et.al. | 2404.12903 | null |
2024-04-18 | On the Content Bias in Fréchet Video Distance | Songwei Ge et.al. | 2404.12391 | null |
2024-04-18 | RoboDreamer: Learning Compositional World Models for Robot Imagination | Siyuan Zhou et.al. | 2404.12377 | null |
2024-04-18 | AniClipart: Clipart Animation with Text-to-Video Priors | Ronghuan Wu et.al. | 2404.12347 | null |
2024-04-15 | Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model | Han Lin et.al. | 2404.09967 | null |
2024-04-16 | LoopAnimate: Loopable Salient Object Animation | Fanyi Wang et.al. | 2404.09172 | null |
2024-04-13 | THQA: A Perceptual Quality Assessment Database for Talking Heads | Yingjie Zhou et.al. | 2404.09003 | link |
2024-04-16 | LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field | Jiyang Li et.al. | 2404.08966 | link |
2024-04-10 | A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos | Suleyman Ozdel et.al. | 2404.07351 | null |
2024-04-08 | Action-conditioned video data improves predictability | Meenakshi Sarkar et.al. | 2404.05439 | null |
2024-04-07 | MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators | Shenghai Yuan et.al. | 2404.05014 | link |
2024-04-07 | AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment | Yuanfeng Xu et.al. | 2404.04946 | null |
2024-04-02 | CameraCtrl: Enabling Camera Control for Text-to-Video Generation | Hao He et.al. | 2404.02101 | link |
2024-04-02 | Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model | Xu He et.al. | 2404.01862 | link |
2024-03-28 | A Review of Multi-Modal Large Language and Vision Models | Kilian Carolan et.al. | 2404.01322 | null |
2024-04-01 | Evaluating Text-to-Visual Generation with Image-to-Text Generation | Zhiqiu Lin et.al. | 2404.01291 | link |
2024-03-30 | Grid Diffusion Models for Text-to-Video Generation | Taegyeong Lee et.al. | 2404.00234 | null |
2024-03-29 | Motion Inversion for Video Customization | Luozhou Wang et.al. | 2403.20193 | null |
2024-03-28 | Frame by Familiar Frame: Understanding Replication in Video Diffusion Models | Aimon Rahman et.al. | 2403.19593 | null |
2024-03-26 | Tutorial on Diffusion Models for Imaging and Vision | Stanley H. Chan et.al. | 2403.18103 | null |
2024-03-26 | TC4D: Trajectory-Conditioned Text-to-4D Generation | Sherwin Bahmani et.al. | 2403.17920 | null |
2024-03-26 | Annotated Biomedical Video Generation using Denoising Diffusion Probabilistic Models and Flow Fields | Rüveyda Yilmaz et.al. | 2403.17808 | link |
2024-03-25 | TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models | Zhongwei Zhang et.al. | 2403.17005 | null |
2024-03-25 | A Survey on Long Video Generation: Challenges, Methods, and Prospects | Chengxuan Li et.al. | 2403.16407 | null |
2024-03-24 | Opportunities and challenges in the application of large artificial intelligence models in radiology | Liangrui Pan et.al. | 2403.16112 | null |
2024-03-23 | Adaptive Super Resolution For One-Shot Talking-Head Generation | Luchuan Song et.al. | 2403.15944 | link |
2024-03-22 | Spectral Motion Alignment for Video Motion Transfer using Diffusion Models | Geon Yeong Park et.al. | 2403.15249 | null |
2024-03-21 | StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text | Roberto Henschel et.al. | 2403.14773 | link |
2024-03-21 | Explorative Inbetweening of Time and Space | Haiwen Feng et.al. | 2403.14611 | null |
2024-03-22 | AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks | Max Ku et.al. | 2403.14468 | link |
2024-03-21 | Enabling Visual Composition and Animation in Unsupervised Video Generation | Aram Davtyan et.al. | 2403.14368 | null |
2024-03-21 | StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN | Jongwoo Choi et.al. | 2403.14186 | link |
2024-03-21 | Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition | Sihyun Yu et.al. | 2403.14148 | null |
2024-03-20 | Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation | Fu-Yun Wang et.al. | 2403.13745 | link |
2024-03-22 | S2DM: Sector-Shaped Diffusion Models for Video Generation | Haoran Lang et.al. | 2403.13408 | null |
2024-03-22 | Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Zhengqing Yuan et.al. | 2403.13248 | link |
2024-03-19 | AnimateDiff-Lightning: Cross-Model Diffusion Distillation | Shanchuan Lin et.al. | 2403.12706 | null |
2024-03-18 | CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility | Bojia Zi et.al. | 2403.12035 | link |
2024-03-18 | VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model | Qi Zuo et.al. | 2403.12010 | null |
2024-03-19 | Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment | Tengchuan Kou et.al. | 2403.11956 | link |
2024-03-18 | Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing | Juan Zhang et.al. | 2403.11700 | null |
2024-03-17 | Endora: Video Generation Models as Endoscopy Simulators | Chenxin Li et.al. | 2403.11050 | null |
2024-03-15 | DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers | Xuanlei Zhao et.al. | 2403.10266 | link |
2024-03-15 | Animate Your Motion: Turning Still Images into Dynamic Videos | Mingxiao Li et.al. | 2403.10179 | null |
2024-03-14 | Video Editing via Factorized Diffusion Distillation | Uriel Singer et.al. | 2403.09334 | null |
2024-03-17 | Intention-driven Ego-to-Exo Video Generation | Hongchen Luo et.al. | 2403.09194 | null |
2024-03-13 | VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis | Enric Corona et.al. | 2403.08764 | null |
2024-03-13 | Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | Yue Ma et.al. | 2403.08268 | link |
2024-03-12 | AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production | Jiuniu Wang et.al. | 2403.07952 | null |
2024-03-10 | WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs | Deshun Yang et.al. | 2403.07944 | null |
2024-03-12 | SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces | Yuta Oshima et.al. | 2403.07711 | link |
2024-03-15 | DragAnything: Motion Control for Anything using Entity Representation | Weijia Wu et.al. | 2403.07420 | link |
2024-03-11 | DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | Guosheng Zhao et.al. | 2403.06845 | null |
2024-03-11 | A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos | Weixia Zhang et.al. | 2403.06421 | link |
2024-03-11 | Video Generation with Consistency Tuning | Chaoyi Wang et.al. | 2403.06356 | null |
2024-03-10 | FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing | Youyuan Zhang et.al. | 2403.06269 | null |
2024-03-10 | BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering | Xinmin Qiu et.al. | 2403.06243 | null |
2024-03-10 | VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models | Wenhao Wang et.al. | 2403.06098 | link |
2024-03-08 | VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models | Yabo Zhang et.al. | 2403.05438 | link |
2024-03-08 | Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation | Joseph Cho et.al. | 2403.05131 | null |
2024-03-07 | A spatiotemporal style transfer algorithm for dynamic visual stimulus generation | Antonino Greco et.al. | 2403.04940 | null |
2024-03-08 | Pix2Gif: Motion-Guided Diffusion for GIF Generation | Hitesh Kandala et.al. | 2403.04634 | link |
2024-03-05 | Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation | Weijie Li et.al. | 2403.02827 | null |
2024-03-06 | UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control | Xuweiyi Chen et.al. | 2403.02332 | link |
2024-03-05 | AtomoVideo: High Fidelity Image-to-Video Generation | Litong Gong et.al. | 2403.01800 | null |
2024-03-02 | SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code | Ziniu Hu et.al. | 2403.01248 | null |
2024-03-01 | Abductive Ego-View Accident Video Understanding for Safe Driving Perception | Jianwu Fang et.al. | 2403.00436 | null |
2024-02-29 | Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Tsai-Shien Chen et.al. | 2402.19479 | null |
2024-02-28 | Context-aware Talking Face Video Generation | Meidai Xuanyuan et.al. | 2402.18092 | null |
2024-02-27 | EMO: Emote Portrait Alive – Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | Linrui Tian et.al. | 2402.17485 | null |
2024-02-27 | Sora Generates Videos with Stunning Geometrical Consistency | Xuanyi Li et.al. | 2402.17403 | null |
2024-02-28 | Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | Yixin Liu et.al. | 2402.17177 | link |
2024-02-27 | Video as the New Language for Real-World Decision Making | Sherry Yang et.al. | 2402.17139 | null |
2024-02-22 | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | Willi Menapace et.al. | 2402.14797 | null |
2024-02-22 | Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models | Yixuan Ren et.al. | 2402.14780 | null |
2024-02-21 | Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation | Kihong Kim et.al. | 2402.13729 | null |
2024-02-24 | UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing | Jianhong Bai et.al. | 2402.13185 | null |
2024-02-20 | Neural Network Diffusion | Kai Wang et.al. | 2402.13144 | link |
2024-02-20 | VGMShield: Mitigating Misuse of Video Generative Models | Yan Pang et.al. | 2402.13126 | link |
2024-02-19 | Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating The Same | Sungjun Ahn et.al. | 2402.12412 | null |
2024-02-16 | Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation | Lanqing Guo et.al. | 2402.10491 | link |
2024-02-14 | Magic-Me: Identity-Specific Video Customized Diffusion | Ze Ma et.al. | 2402.09368 | link |
2024-02-10 | Denoising Diffusion Probabilistic Models in Six Simple Steps | Richard E. Turner et.al. | 2402.04384 | null |