
[Paper Review] Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

by SolaKim 2025. 2. 17.

https://arxiv.org/abs/2312.09147

 


 

This paper proposes a fast, generalizable way to reconstruct 3D from a single image using the 3D Gaussian Splatting representation.

📌 What does the paper want to do with 3DGS?

💡 Reconstruct a high-quality 3D model from a single image quickly, and use it to render novel views!

 

 

1️⃣ What is the 3D Gaussian Representation?

  • The 3D Gaussian representation describes the information at a location in 3D space (e.g. RGB, density, or other attributes) with Gaussian distributions (a mean and a variance).
  • It is an attempt to model the whole 3D space as one continuous function.
  • Because it deals with a high-dimensional, complex space, however, training runs into several difficulties.
  • Characteristics
    • Discrete: the 3D space may be represented as a discontinuous set of points. Handling every coordinate is computationally expensive.
    • Non-structural: 3D data often has no regular grid structure (unlike, for example, the pixel array of an image).
      • Example: a point cloud is a set of irregularly sampled 3D points.
    • Higher-dimensional: it holds far more data than a 2D image and is costly to process.

 

2️⃣ What is an Implicit Representation?

  • An implicit representation encodes 3D data compactly with a coordinate-based neural network.
    • Example: a model such as Neural Radiance Field (NeRF) takes a coordinate (x, y, z) as input, together with a viewing direction, and predicts an RGB value and a density.
  • Characteristics:
    • Continuous: RGB and density can be computed at any desired coordinate, without sampling the whole space directly.
    • More efficient: storage is small and training is comparatively easy.
    • It can recover high-resolution output from low-resolution data.

 

Learning to produce 3DGS is harder than learning an implicit representation such as NeRF.
The reasons are as follows.

  • High-dimensional data
    • Working directly in 3D space means storing and learning from an enormous amount of data.
    • With a 3D grid, as in voxel-based representations, memory use and compute cost grow rapidly.
  • Irregular, unstructured form
    • Point clouds and real 3D data have no grid structure, so modeling them requires complex transformations.
    • An implicit representation, by contrast, models a continuous coordinate space and can sidestep this problem.
  • Data sparsity
    • 3D data is often sparse.
      Example: most of the points may fall in empty space, with only a few carrying meaningful information.
    • This sparsity makes generalization and training harder for a Gaussian representation.
  • Learning a complex distribution
    • Each Gaussian requires learning a mean and a covariance, so the model is more complex than an implicit approach that simply predicts RGB and density.

 

 

 

The 3D Gaussian representation is hard to learn, but it has a number of advantages.

For the reasons below, the paper chooses an explicit representation such as 3DGS even though implicit representations such as NeRF exist.

1. Real-time rendering is possible.

  • NeRF's biggest drawback is its slow rendering speed!
  • NeRF has to query a neural network for density and color at every sampled coordinate, so producing a single image takes seconds to minutes.
  • 3DGS is an explicit representation, so it can use the GPU rasterization pipeline (an operation graphics cards support natively) and render far faster.

2. Better editability

  • In NeRF the network learns the coordinate-to-color relationship implicitly, so it is hard to modify or move individual objects directly.
  • 3DGS is an explicit representation, so modifying, moving, deleting, or adding specific Gaussians directly is much easier.
  • For example, when only one object in a virtual environment needs to be moved or deleted, 3DGS is far more convenient.

3. More efficient storage (Compression & Storage)

  • NeRF is neural-network based, so the trained model has to be stored → the size grows.
  • For high-resolution representations in particular, the network becomes very large, and storing and loading it is a burden.
  • 3DGS, in contrast, is represented by simple Gaussians (position, scale, color, orientation), so it can need much less storage space, though the total grows with the number of Gaussians.
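
To make "position, scale, color, orientation" concrete, here is a minimal sketch of one Gaussian as a plain record, plus the usual 3DGS construction of an anisotropic covariance from scale and rotation, Σ = R S Sᵀ Rᵀ. The class name `Gaussian3D`, the field layout, and the (w, x, y, z) quaternion order are assumptions made for illustration, not the paper's code.

```python
import math
from dataclasses import dataclass

@dataclass
class Gaussian3D:
    position: tuple  # (x, y, z) center of the Gaussian
    scale: tuple     # (sx, sy, sz) extent along the local axes
    rotation: tuple  # quaternion (w, x, y, z); normalized before use
    color: tuple     # (r, g, b); 3DGS proper stores SH coefficients instead
    opacity: float   # alpha in [0, 1]

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(g):
    """Anisotropic covariance Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_rotmat(g.rotation)
    s2 = [s * s for s in g.scale]
    return [[sum(R[i][k] * s2[k] * R[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Identity rotation: the covariance is simply diag(scale^2).
g = Gaussian3D((0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0), (1.0, 0.5, 0.2), 0.8)
cov = covariance(g)
```

Because every Gaussian is just a handful of numbers like this, editing a scene means editing a list of records, which is the editability argument made above.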

4. Compatibility with existing 3D graphics pipelines

  • NeRF is neural-network based, so it is hard to integrate with existing 3D graphics engines (e.g. Unreal Engine, Unity).
  • 3DGS is explicit 3D data, so it fits easily with existing rendering technology (OpenGL, Vulkan, etc.).

 

 

🧐 The paper points out that existing NeRF-based methods are slow or need a complicated optimization process. It therefore combines 3D Gaussian Splatting with the triplane representation to get a more efficient approach.

 

Method | Characteristics | Limitations (for the proposed method: benefits)
NeRF (Implicit) | Learns RGB and density per coordinate to reconstruct 3D | Slow rendering, long training time
Triplane Representation | Represents 3D space with three orthogonal planes | Needs volume rendering, so memory cost is high
3D Gaussian Splatting (Explicit) | Represents 3D space as a set of Gaussians, enabling fast rendering | 3D Gaussians are hard to learn directly (unstructured, high-dimensional)
Triplane + 3DGS (proposed in the paper) | Extracts Gaussian features from a triplane and renders with Gaussian Splatting | Fast, high-quality rendering, better training efficiency

 

 

 

 

📌 The method proposed in the paper (Triplane-Gaussian Splatting, TGS)

To address the limitations of existing 3DGS, the paper proposes a new approach that combines the triplane representation with a Transformer-based model.

🌟 Key ideas

  • Single image in → 3D structure generated by a Transformer
    • Point Cloud Decoder: first generates a simple 3D point cloud
    • Triplane Decoder: extracts features of the 3D space based on those points
    • 3D Gaussian Decoder: decodes the Gaussian attributes from the information in the triplane
  • Explicit + implicit hybrid structure
    • Explicit (point cloud) → defines the rough shape
    • Implicit (triplane) → refines the detailed shape
    • 3D Gaussian Splatting → performs the final rendering quickly
  • Improves rendering speed and quality at the same time
    • Faster than existing NeRF methods (results within a few seconds)
    • Can exploit more structural information than plain Gaussian Splatting

 

✅ 1. Explicit Point Cloud → defines the rough shape of the object

  • A point cloud is a set of points that defines the surface of an object in 3D space
  • To reconstruct a 3D model from a single image, the paper first generates the rough shape (geometry) of the object with a Transformer-based Point Cloud Decoder
  • A point cloud alone, however, struggles to express detail and lacks information such as color and opacity

In short, the point cloud is simply the stage that builds the skeleton of the 3D shape

 

✅ 2. Implicit Triplane Field → shape refinement + Gaussian attribute encoding

  • The triplane representation expresses 3D space efficiently with three 2D planes
    • It stores 3D features on three axis-aligned, mutually orthogonal planes.
    • A feature can be extracted by projecting a position in 3D space onto each plane.
  • The paper uses a Transformer-based Triplane Decoder to refine the coarse shape produced as a point cloud
  • Besides plain shape information, the triplane also stores the 3D Gaussian attributes.

In short, the triplane adds detail on top of the point cloud and carries the Gaussian attributes

 

✅ 3. 3D Gaussian Properties

The triplane representation carries the attributes of the 3D Gaussians
These attributes include opacity, spherical harmonics, and so on

  • Opacity (α)
    • Indicates how opaque each Gaussian is.
    • For example, a glass-like object has a low α value, and an opaque object has a high α value.
  • Spherical Harmonics (SH)
    • A mathematical model widely used to represent lighting in 3D environments; in 3DGS the SH coefficients encode view-dependent color.
    • Evaluating them for a given viewing direction gives a direction-dependent appearance, which makes realistic shading possible.

In short, the triplane can represent not only shape information but also what rendering needs, such as color, view-dependent appearance, and opacity.

 

The paper uses end-to-end training.

What is End-to-End (E2E)?
An approach in which a single model is trained directly from the data input to the final output, with no separately handled intermediate stage

  • Every stage is optimized automatically within one network
    • No one has to step in midway to convert or adjust the data by hand.
    • Given only the input data, the model learns the best representation on its own.
  • Less manual intervention (no feature engineering needed)
    • Traditionally people had to process the data (e.g. feature extraction, preprocessing); in an E2E model deep learning learns the features directly, without those steps.

 

 

 


Method

 

1️⃣ Hybrid Triplane-Gaussian

  1. Hybrid 3D representation
    1. Explicit → point cloud
      1. A set of points (point cloud) that defines the rough shape of the object in 3D space.
    2. Implicit → triplane
      1. Stores the features of the 3D space on three axis-aligned planes (the triplane).
      2. Each plane encodes a feature field that carries the 3D Gaussian attributes (color, opacity, view-dependent appearance, etc.).
      3. Each plane has shape C × H × W, where C is the number of feature channels, H the height, and W the width.
  2. Triplane structure (T_xy, T_xz, T_yz)
    1. The triplane T consists of three orthogonal feature planes:
      1. T_xy → XY plane
      2. T_xz → XZ plane
      3. T_yz → YZ plane
    2. With these three planes, a feature vector can be extracted at any position in 3D space.
  3. Querying a feature vector at a 3D position
    • Given a 3D coordinate x:
      1. Project the coordinate onto each plane.
      2. Interpolate the features on each plane (interp). The post calls this trilinear interpolation; on a single 2D plane it amounts to a bilinear lookup.
      3. Concatenate (⊕) the features from the three planes into the final feature vector f_t:
         f_t = interp(T_xy, p_xy) ⊕ interp(T_xz, p_xz) ⊕ interp(T_yz, p_yz)
    • where:
      1. interp: the interpolation described above
      2. ⊕: concatenation of feature vectors
      3. p_xy, p_xz, p_yz: the projected position on each plane

▶ In other words, the Gaussian attributes are decoded from this feature vector, so the 3D model can be represented precisely!!
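
The query steps above (project, interpolate, concatenate) can be sketched in a few lines of plain Python. This is only an illustration under assumptions of my own: coordinates are normalized to [-1, 1], each plane is a nested list indexed `plane[channel][row][col]`, and the first projected coordinate maps to columns. The function names are hypothetical, and a real implementation would use a batched GPU sampler.

```python
import math

def bilinear(plane, u, v):
    """Bilinearly interpolate plane[c][row][col] at (u, v), both in [-1, 1]."""
    H, W = len(plane[0]), len(plane[0][0])
    x = min(max((u + 1) / 2 * (W - 1), 0.0), W - 1.0)  # continuous column
    y = min(max((v + 1) / 2 * (H - 1), 0.0), H - 1.0)  # continuous row
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    tx, ty = x - x0, y - y0
    out = []
    for ch in plane:  # one interpolated value per feature channel
        top = ch[y0][x0] * (1 - tx) + ch[y0][x1] * tx
        bottom = ch[y1][x0] * (1 - tx) + ch[y1][x1] * tx
        out.append(top * (1 - ty) + bottom * ty)
    return out

def query_triplane(T_xy, T_xz, T_yz, point):
    """f_t = interp(T_xy, p_xy) (+) interp(T_xz, p_xz) (+) interp(T_yz, p_yz)."""
    x, y, z = point
    # Projection onto an axis-aligned plane just drops one coordinate;
    # list concatenation plays the role of the (+) operator.
    return bilinear(T_xy, x, y) + bilinear(T_xz, x, z) + bilinear(T_yz, y, z)

# Toy triplane: one channel, 2x2 resolution, same values on all three planes.
plane = [[[0.0, 1.0], [2.0, 3.0]]]
f_t = query_triplane(plane, plane, plane, (0.0, 0.0, 0.0))  # length 3 * C
```

With C channels per plane the queried vector has 3C entries, and nothing is computed beyond a weighted lookup of values already stored in the planes.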

 

More Details,

1️⃣ 3D Gaussian Decoder.

  • Input:
    • A position x ∈ R^3 (a point of the point cloud)
    • The feature vector f taken from the triplane
  • Process:
    • An MLP (multi-layer perceptron) ϕ_g predicts the 3D Gaussian attributes
  • Output:
    • Gaussian attributes:
      • Opacity α
      • Anisotropic covariance → scale and rotation q
      • Spherical harmonics sh (SH coefficients, view-dependent color)
      • Position offset Δx → corrects the position of the point
  • Adding the position offset Δx (correcting the surface points)
    • Using only the points on the surface makes an accurate Gaussian representation difficult,
    • so an additional position offset Δx is predicted to adjust each point to a better position.
    • The final position is: x + Δx
  • Combining triplane features with image features (better texture quality)
    • The feature f_t taken from the triplane is not enough on its own.
      • f_t: triplane-based 3D spatial feature
      • f_l: image-based local feature
    • The final feature vector f is: f = f_t ⊕ f_l
  • Adding image-based local features (handling self-occlusion)
    • To deal with self-occlusion, features are taken directly from the image.
    • Each point of the point cloud is mapped onto the source image by projection:
      • P() is the projection function
      • π is the camera pose (position and orientation)
      • P is the point cloud
    • Local image features that are added:
      • RGB color
      • DINOv2 feature (pre-trained image feature)
      • Mask (separates the object region)
      • 2D distance transform (uses object boundary information)
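
The projection step is easier to picture with a small pinhole-camera sketch. Everything here is an assumption for illustration: `T` is taken to be a world-to-camera extrinsic matrix, the camera looks along +z, features are read from the nearest pixel rather than interpolated, and the names `project_point` and `gather_local_features` are made up.

```python
def project_point(K, T, p_world):
    """Project a world-space point to pixel coordinates (u, v), or None if behind the camera."""
    xw, yw, zw = p_world
    # World -> camera coordinates using the top three rows of the 4x4 extrinsic.
    cam = [T[i][0] * xw + T[i][1] * yw + T[i][2] * zw + T[i][3] for i in range(3)]
    if cam[2] <= 0:
        return None
    # Camera -> image plane with the 3x3 intrinsic, then perspective divide.
    uvw = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

def gather_local_features(feature_map, K, T, points):
    """Read the per-pixel feature under each projected point (nearest pixel, clamped to the image)."""
    H, W = len(feature_map), len(feature_map[0])
    out = []
    for p in points:
        uv = project_point(K, T, p)
        if uv is None:
            out.append(None)  # point is behind the camera
            continue
        col = min(max(int(round(uv[0])), 0), W - 1)
        row = min(max(int(round(uv[1])), 0), H - 1)
        out.append(feature_map[row][col])
    return out

# Toy setup: identity pose, focal length 2 px, principal point (1, 1), 3x3 feature map
# whose "feature" at each pixel is just its (row, col) index.
K = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]]
T = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
feature_map = [[(r, c) for c in range(3)] for r in range(3)]
local = gather_local_features(feature_map, K, T, [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (0.0, 0.0, -1.0)])
```

In the method described above the per-pixel feature would hold the RGB, DINOv2, mask, and distance-transform channels; the tuple here only shows which pixel was hit.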

 

✅ Why project the 3D coordinate onto the triplane?

Question: when the 3D coordinate x is projected onto the triplane, does the information just pop out of nowhere??
→ Nope! Projection does not create new information; it is the step that looks up information already stored in the triplane!!

  • The triplane already stores the information about the 3D space indirectly!
  • Projecting the 3D position x onto the triplane retrieves the stored feature vector
  • In other words, the model is trained so that the triplane itself can carry the 3D Gaussian attributes!!

So how is the triplane learned??

  1. The triplane is a learned representation
    • It is learned through a neural network and stores information about the 3D space (density, opacity, spherical harmonics, etc.).
    • It behaves like a "memory store"
  2. The MLP is trained directly on the triplane features and encodes the 3D attributes
    • The MLP is not a simple linear transform; it is a learned structure that encodes 3D information
    • The triplane is trained together with the MLP and optimized to represent the 3D structure from a single image effectively.
    • Training process 💚
      1. Image → 2D feature extraction
        1. Given an input image, a Transformer-based network extracts features from the 2D image
      2. 3D information is learned while the triplane is generated
        1. The model learns to store 3D information on the three planes
      3. Features are fetched by projection
        1. Projecting a given 3D position retrieves the 3D feature learned for that position
      4. The MLP decodes the 3D Gaussian attributes
        1. In ϕ_g(x, f), the MLP recovers the 3D Gaussian attributes (density, opacity, SH, etc.)
      5. Training is optimized through the loss
        1. Training proceeds by comparing the rendered image with the original image

▶ In other words, the triplane and the MLP learn to represent the 3D space through end-to-end training, without a separately pre-trained 3D model.

 

2️⃣ Rendering.

The paper adopts the rendering method of 3DGS.

Concept | Description
Differentiable Tile-Based Rasterization | Differentiable, and renders quickly by splitting the image into tiles
Fast α-Blending of Anisotropic Splats | Blends anisotropic Gaussians quickly for smooth rendering
Fast Backward Pass by Tracking α Values | Tracks the accumulated α values so that backpropagation is fast
Higher Resolution & Lower GPU Cost | Can train on high-resolution images while using little GPU memory
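
The α-blending row can be illustrated for a single pixel. The sketch below assumes the splats covering the pixel are already sorted front to back and that each has a color and an effective α at that pixel; the early-exit threshold is an illustrative choice. It is standard front-to-back compositing, not the tile-based CUDA rasterizer itself.

```python
def composite_pixel(splats, min_transmittance=1e-4):
    """Front-to-back alpha blending: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by closer splats
    for rgb, alpha in splats:
        weight = alpha * transmittance
        for i in range(3):
            color[i] += weight * rgb[i]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # everything behind is effectively invisible
    return color, transmittance

# A half-transparent red splat in front of a half-transparent green one.
pixel, remaining = composite_pixel([((1.0, 0.0, 0.0), 0.5), ((0.0, 1.0, 0.0), 0.5)])
```

The running transmittance is the "accumulated α" quantity that the table says is tracked so the backward pass does not have to recompute it.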

 


 

2️⃣ Reconstruction from Single-View Images

 



Image Encoding

Let's look at how the input image is processed and used for 3D reconstruction.

  1. Extracting image features with DINOv2, a pre-trained ViT-based model
    1. Using DINOv2 (a ViT-based, i.e. Transformer-based, encoder),
    2. the input image is split into patches, and the Transformer extracts feature tokens for each patch
    3. These feature tokens serve as the base feature vectors for generating the triplane and the point cloud
  2. Adaptive Layer Normalization (AdaLN) using the camera parameters (assumed to be known)
    1. To make the 3D reconstruction more accurate, camera (viewpoint) information is injected into the image features
    2. What is the camera information?
      1. Camera extrinsic matrix T ∈ R^(4×4) → position and orientation of the camera
      2. Camera intrinsic matrix K ∈ R^(3×3) → focal length and principal point of the camera
    3. What AdaLN does
      1. The camera matrices are flattened into the camera feature f_c ∈ R^25 (16 entries from T plus 9 from K), which is then mapped to a high-dimensional embedding
      2. This camera feature is used to modulate the image features → that is, the image features extracted by the ViT are adjusted to the camera viewpoint
      3. An MLP predicts a scale and a shift, which transform the image features accordingly.
 

 

Transformer Backbone

  • Feature tokens for the point cloud (points) and the triplane
    • The paper uses separate feature tokens for the two 3D representations (points and triplane)
    • Each set of feature tokens is fed into the Transformer, which learns the features needed for 3D reconstruction.
      • {f_i}^p: feature tokens for the point cloud
      • {f_i}^t: feature tokens for the triplane
  • The feature tokens are initialized with learnable positional embeddings
    • A Transformer cannot perceive order or position by itself, so positional information has to be added.
    • For this, the feature tokens are initialized as learnable positional embeddings.
  • Structure of a Transformer block
    • Each Transformer block consists of three main layers
    • Self-Attention Layer
      • The feature tokens exchange information with each other within the same 3D representation
      • For example, within the point cloud it learns how one point relates to the others
    • Cross-Attention Layer
      • Connects the tokens extracted from the image (viewpoint-augmented image tokens, i.e. image features that include camera viewpoint information) with the 3D tokens
      • That is, information from the image is reflected in the point cloud and the triplane, enabling more accurate 3D reconstruction.
    • Feed-Forward Layer (MLP)
      • The final processing step for feature tokens that have passed through self-attention and cross-attention.
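
The three-layer block can be sketched with plain scaled dot-product attention. This is deliberately stripped down: a single head, no learned query/key/value projections, and no layer normalization, so it only shows the data flow (3D tokens attend to themselves, then to the image tokens, then pass through a feed-forward step). All function names are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def add(xs, ys):
    """Residual connection: element-wise sum of two token lists."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def transformer_block(tokens_3d, image_tokens, feed_forward):
    x = add(tokens_3d, attention(tokens_3d, tokens_3d, tokens_3d))  # self-attention
    x = add(x, attention(x, image_tokens, image_tokens))            # cross-attention
    return add(x, [feed_forward(t) for t in x])                     # feed-forward

# One 3D token, one image token, and a feed-forward step that outputs zeros.
out = transformer_block([[1.0, 0.0]], [[0.0, 1.0]], lambda t: [0.0, 0.0])
```

Note that the number of output tokens always equals the number of 3D tokens, however many image tokens there are, which is why the same backbone can serve both the point tokens and the triplane tokens.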

 

 

Point Cloud Decoder

  • Role of the Point Cloud Decoder
    • The Point Cloud Decoder generates the rough shape (geometry) of the 3D object.
    • The 3D Gaussians can then be placed at the point cloud coordinates produced in this stage
    • That is, the point cloud is generated first and 3DGS is built on top of it.
  • Generating the point cloud with a Transformer
    • A ViT-based encoder extracts feature tokens from the 2D image.
    • A 6-layer Transformer backbone decodes the point cloud
      • The 3D point cloud is generated directly from learnable positional embeddings.
      • These embeddings act as point cloud tokens and ultimately become the 3D points
      • At this stage only a coarse point cloud of 2048 points is decoded
    • In short, the Transformer turns learned tokens into 3D coordinates (a point cloud)!

 

 

Point Upsampling with Projection-Aware Conditioning

  • The point cloud generated above is low-resolution, so it is not enough for generating the 3D Gaussians
  • Two steps of Snowflake point deconvolution (SPD) densify (upsample) it from 2048 points → 16384 points
  • The coarse → detailed process in SnowflakeNet
    • Extract a global shape code
      • A code (a vector) describing the overall shape is extracted from the input point cloud
    • Upsample by predicting point displacements
      • New points, slightly displaced from the existing ones, are added to perform the upsampling
  • To fill in the fine shape detail that the point cloud alone lacks, the SnowflakeNet-style upsampler here also uses image information
    • For this, Projection-Aware Conditioning is applied
      • A technique that projects the point cloud into image space so that image features are reflected in the 3D reconstruction
      • Each 3D point is mapped to its corresponding location in the 2D image, and the image feature at that location is fetched
    • That is, the point cloud is projected onto the image, and the local features obtained from the image are added to the shape code of the point cloud

 

 

Triplane Decoder with Geometry-Aware Encoding

  • Role of the Triplane Decoder
    • Generates the implicit feature field that stores the features of the 3D space
    • Inputs
      • Image (image tokens)
      • Initial point cloud (the set of points describing the rough shape)
    • Outputs
      • Triplane (3D spatial features stored on three 2D planes)
      • Afterwards, the 3D Gaussian attributes at a given position can be decoded from the triplane.
    • A 10-layer Transformer is used to learn more refined features
  • Instead of a plain positional embedding, a positional embedding derived from the point cloud is used, giving a better geometry-aware encoding!
  • Local features of the point cloud are learned with PointNet and then projected onto the triplane
    • PointNet learns the local features of the point cloud
    • Features projected to the same location are merged into one value with average pooling

 

 

Training

  • Ultimately, the image rendered from the 3D Gaussians should be as close as possible to the original image.
Loss | Description
L_CD (Chamfer Distance) | Trains the predicted point cloud to align well with the GT data
L_EMD (Earth Mover's Distance) | Computes the optimal transport between point clouds to encourage precise alignment
L_MSE (Mean Squared Error) | Trains the rendered image to be as close as possible to the original image
L_MASK (Mask Loss) | Aligns the mask so that the object's shape (silhouette) is reconstructed accurately
L_SSIM (Structural Similarity) | Increases structural similarity for a more natural 3D reconstruction
L_LPIPS (Perceptual Loss) | Compares high-level features so that the reconstruction looks natural to the human eye
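
Of the losses in the table, the Chamfer distance is the simplest to write down. The brute-force sketch below uses one common variant (mean of squared nearest-neighbor distances in both directions); other variants use unsquared distances or sums, and which one the paper uses is not stated in this post.

```python
def chamfer_distance(A, B):
    """Symmetric Chamfer distance between two point sets (lists of 3D tuples)."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    # For each point, the squared distance to its nearest neighbor in the other set.
    a_to_b = sum(min(sq_dist(p, q) for q in B) for p in A) / len(A)
    b_to_a = sum(min(sq_dist(q, p) for p in A) for q in B) / len(B)
    return a_to_b + b_to_a

predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
loss = chamfer_distance(predicted, ground_truth)
```

This version is O(|A|·|B|), which is fine for a toy example but would normally be replaced by a batched GPU implementation for point clouds with thousands of points.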

 

 

 

 

AI is a black box......