
[Paper Review] Wonder3D: Single Image to 3D Using Cross-Domain Diffusion

by SolaKim 2025. 3. 13.

https://arxiv.org/abs/2310.15008

 


 

Wonder3D, proposed in this paper, performs single-image 3D reconstruction by generating multi-view consistent normal maps and color images.

 

Problem definition:

  • Recovering 3D geometry from a single image is an important problem in many fields, including graphics, VR, games, and robotics
  • However, the problem is ill-posed: the parts that are not visible in the image must also be predicted
  • Methods based on 2D diffusion models have emerged, but they suffer from efficiency and consistency problems

 

Limitations of prior work:

  • Score Distillation Sampling (SDS) based models: the optimization is slow (tens of minutes to hours per shape)
  • 2D-based models: each view is generated independently → inconsistency across views (the Janus problem)
  • Direct 3D generative models: poor generalization because 3D training data is scarce
  • Multi-view image generation models (SyncDreamer, MVDream): use only color information and lack normal information

 

 

Contribution

  • Multi-view cross-domain 2D diffusion
    • Predicts normal maps and color images
    • Effectively captures fine details of the 3D surface
  • Cross-domain attention mechanism
    • Generates normal maps and color images that are consistent across views
  • Geometry-aware normal fusion algorithm
    • Extracts clean 3D geometry from the generated normal maps and color images

 

 

Related Works

3D generation with 2D diffusion models

  • DreamFusion, Magic3D, and similar methods generate 3D with 2D diffusion models, but their optimization is slow and the results lack consistency

3D generative models

  • There are models that generate 3D directly as point clouds, meshes, neural fields, etc.
  • However, because 3D data is scarce, they only work for specific categories

Multi-view Diffusion Models

  • SyncDreamer, MVDream, and others try to improve multi-view consistency, but they use only texture (color) information and lack normal information

 

 

Problem Formulation (problem definition and model design)

Problems with existing 3D asset representations

  • Prior work either learns 3D data directly (NeRF, SDF, mesh, point cloud) or trains on single views
  • Wonder3D does not learn 3D directly; it learns the joint distribution of "multi-view normal maps & color images"
  • f(y, π_{1:K}) (generation function)
    • Input: a single image y and camera information π_{1:K}
    • Output: multi-view normal maps n_{1:K} and color images x_{1:K}
    • In other words, this function produces 2D representations of the single input image extended to multiple views
    • π_{1:K} (camera viewpoint information)
      • A set of K camera parameters (angles, positions, etc.)
      • Generating 2D normal maps and color images for multiple views requires knowing the coordinate frame of each view
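In symbols, the bullets above describe the following mapping. This is a restatement in this post's notation, assuming the learned joint distribution is conditioned on both the input image y and the cameras π_{1:K}:

```latex
f:\ (y,\ \pi_{1:K}) \ \longmapsto\ (n_{1:K},\ x_{1:K}),
\qquad
(n_{1:K},\ x_{1:K}) \sim p\big(n_{1:K},\ x_{1:K} \mid y,\ \pi_{1:K}\big)
```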

 

Wonder3D์˜ ์ˆ˜ํ•™์  ํ‘œํ˜„

    • ํ™•์‚ฐ ๋ชจ๋ธ์˜ ๋งˆ๋ฅด์ฝ”ํ”„ ์ฒด์ธ ๊ธฐ๋ฐ˜ ํ™•๋ฅ  ๋ถ„ํฌ๋กœ ๋ชจ๋ธ๋ง
    • ํฌ๋กœ์Šค ๋„๋ฉ”์ธ ํ™•์‚ฐ(Cross-Domain Diffusion)์„ ํ†ตํ•ด ๋ฒ•์„  ๋งต๊ณผ ์ƒ‰์ƒ ์ด๋ฏธ์ง€๋ฅผ ๋™์‹œ์— ์ƒ์„ฑ

  • ํ™•์‚ฐ ๋ชจ๋ธ์—์„œ ๋ฐ์ดํ„ฐ๋Š” ์ฒ˜์Œ์— ์™„์ „ํ•œ ๊ฐ€์šฐ์‹œ์•ˆ ๋…ธ์ด์ฆˆ ์ƒํƒœ์—์„œ ์‹œ์ž‘
  • ์ดํ›„ t ์—์„œ t-1 ๋กœ ์ ์ง„์ ์œผ๋กœ ๋…ธ์ด์ฆˆ๋ฅผ ์ œ๊ฑฐํ•˜๋ฉด์„œ ์ ์  ๋” ์„ ๋ช…ํ•œ ๋ฒ•์„  ๋งต๊ณผ ์ƒ‰์ƒ ์ด๋ฏธ์ง€๋ฅผ ๋ณต์›
  • ์ตœ์ข…์ ์œผ๋กœ t=0 ๋‹จ๊ณ„์—์„œ๋Š” ๊นจ๋—ํ•œ ๋‹ค์ค‘ ์‹œ์  ๋ฒ•์„  ๋งต๊ณผ ์ƒ‰์ƒ ์ด๋ฏธ์ง€๊ฐ€ ์ƒ์„ฑ
  • ๋งˆ๋ฅด์ฝ”ํ”„ ์ฒด์ธ์˜ ์†์„ฑ์„ ํ™œ์šฉํ•˜์—ฌ, ๊ฐ ๋‹จ๊ณ„์—์„œ์˜ ๋ณ€ํ™˜์ด ์ง์ „ ๋‹จ๊ณ„์—๋งŒ ์˜์กดํ•˜๋„๋ก ์„ค๊ณ„
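Assuming the standard DDPM-style factorization, the Markov chain described above can be written with the same symbols. The Gaussian prior at t = T and the conditioning on y and π_{1:K} are assumptions consistent with the bullets, not a quote of the paper's equation; each factor in the product depends only on step t, which is the Markov property:

```latex
p_\theta\big(n_{1:K}^{(0:T)}, x_{1:K}^{(0:T)} \mid y, \pi_{1:K}\big)
= p\big(n_{1:K}^{(T)}, x_{1:K}^{(T)}\big)
\prod_{t=1}^{T} p_\theta\big(n_{1:K}^{(t-1)}, x_{1:K}^{(t-1)} \mid n_{1:K}^{(t)}, x_{1:K}^{(t)}, y, \pi_{1:K}\big),
\qquad
p\big(n_{1:K}^{(T)}, x_{1:K}^{(T)}\big) = \mathcal{N}(\mathbf{0}, \mathbf{I})
```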

 

Method


The overview is as follows.

  1. Apply a multi-view diffusion scheme
    • Generates normal maps and color images for multiple views
    • Multi-view attention keeps the views consistent with each other
  2. Introduce a domain switcher
    • Existing diffusion models (e.g., Stable Diffusion) work in a single domain (images)
    • The domain switcher lets one model operate in two domains: normal maps and color images
    • The pretrained model can be reused without changing its architecture → better generalization
  3. Introduce cross-domain attention
    • Exchanges information between normal maps and color images to keep them geometrically and visually consistent
  4. Apply geometry-aware normal fusion
    • Combines the multi-view normal maps and color images to reconstruct high-quality 3D geometry

 

 

Consistent Multi-view Generation

Existing 2D diffusion models generate each image independently, so the views lack geometric and visual consistency with one another. To address this, Wonder3D improves consistency with an attention mechanism that shares information across views.

  • An attention mechanism for sharing multi-view information
    • Extends standard self-attention into global-aware attention
    • Connects the keys and values of all views to strengthen information exchange
      • Query: pixels of a given view
      • Key/Value: pixels of the views (including the other views) within the same domain (normal or color)
    • In other words, information obtained in one view is reflected when generating the images of the other views
  • Learning inter-view dependencies through attention
    • The model learns to recognize features shared across views
    • As a result, it can generate consistent multi-view color images and normal maps!
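The idea of connecting keys and values across views can be illustrated with a minimal NumPy sketch. It is single-head, uses random projection matrices, and assumes features of shape (K views, N tokens, d channels); it is a toy, not Wonder3D's actual UNet layer. The check at the end shows that with plain per-view attention the output of view 0 ignores a change in view 1, while with multi-view attention it does not.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def per_view_attention(feats, wq, wk, wv):
    """Plain self-attention: each view attends only to its own tokens.
    feats has shape (K, N, d): K views, N tokens per view, d channels."""
    q, k, v = feats @ wq, feats @ wk, feats @ wv                 # (K, N, d) each
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])     # (K, N, N)
    return softmax(scores) @ v                                   # (K, N, d)

def multi_view_attention(feats, wq, wk, wv):
    """Multi-view self-attention: keys/values of all K views are concatenated,
    so every query token can attend to tokens from every view."""
    K, N, d = feats.shape
    q = feats @ wq                                               # (K, N, d)
    k = (feats @ wk).reshape(K * N, d)                           # keys of all views
    v = (feats @ wv).reshape(K * N, d)                           # values of all views
    scores = q @ k.T / np.sqrt(d)                                # (K, N, K*N)
    return softmax(scores) @ v                                   # (K, N, d)

rng = np.random.default_rng(0)
K, N, d = 6, 4, 8  # hypothetical sizes
feats = rng.normal(size=(K, N, d))
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

perturbed = feats.copy()
perturbed[1] += 1.0  # change only view 1

# Per-view attention: the output of view 0 ignores the change in view 1.
print(np.allclose(per_view_attention(feats, wq, wk, wv)[0],
                  per_view_attention(perturbed, wq, wk, wv)[0]))   # True
# Multi-view attention: the output of view 0 now depends on view 1.
print(np.allclose(multi_view_attention(feats, wq, wk, wv)[0],
                  multi_view_attention(perturbed, wq, wk, wv)[0]))  # False
```

The only difference between the two functions is whether keys and values are reshaped into one shared pool across views before the softmax.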

 

 

Cross-Domain Diffusion

Naive Solutions

 

  • The original Stable Diffusion model is designed to generate a single domain (images)
  • Wonder3D has to handle two domains: normal maps (geometry) and color images (color)
  • The key question is therefore how to extend Stable Diffusion to multiple domains

Simple solutions (with limitations)

  1. Expanding the output channels
    • Add the normal map to the UNet output by increasing the number of channels
    • However, this risks corrupting the pretrained weights → slower training and weaker generalization
  2. Two-stage approach (generate normal maps first, then color images)
    • A first model generates the normal maps,
    • and a second model generates color images conditioned on the generated normal maps
    • However, the computational cost rises substantially and performance degrades

=> Since these approaches are not effective in practice, a better method is needed!

 

Domain Switcher

That is where the domain switcher comes in!

Domain switcher s

  • s is a one-dimensional vector
  • s_n is used when generating normal maps
  • s_c is used when generating color images
  • In other words, it acts as a vector that indicates which domain (normal/color) to generate

How does the domain switcher work?

  • The switcher is transformed with positional encoding
    • The domain switcher carries the information of which domain, normal map or color image, to generate
    • Positional encoding maps s to a high-dimensional vector
    • This puts it into a form the model can readily use
  • The encoded high-dimensional switcher vector is combined with the time embedding and fed into the UNet
    • The time (step t) embedding encodes the current denoising step
    • The combined vector is provided as input to the diffusion model's UNet
      • so that the diffusion model knows the current denoising step t
  • This keeps the architecture of the pretrained Stable Diffusion model unchanged while letting one model generate both domains
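A minimal NumPy sketch of the conditioning path described above: encode the switcher, then concatenate it with the time embedding. The one-hot switch values, the sin/cos frequencies 2^i, and the embedding sizes are all hypothetical choices for illustration; the post only says that s is a 1-D vector that is positionally encoded and combined with the time embedding.

```python
import numpy as np

# Hypothetical switch values (s_n for normals, s_c for colors); one-hot is an assumption.
S_NORMAL = np.array([1.0, 0.0])
S_COLOR = np.array([0.0, 1.0])

def positional_encoding(s, num_freqs=3):
    """sin/cos encoding of each component of s at frequencies 2^i, i < num_freqs."""
    freqs = 2.0 ** np.arange(num_freqs)                          # (F,)
    angles = np.asarray(s, dtype=np.float64)[:, None] * freqs    # (D, F)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).ravel()  # (2*D*F,)

def timestep_embedding(t, dim=8):
    """Sinusoidal embedding of the denoising step t (Transformer/DDPM style)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.cos(args), np.sin(args)])

def unet_condition(t, switch):
    """Concatenate the time embedding with the encoded domain switcher."""
    return np.concatenate([timestep_embedding(t), positional_encoding(switch)])

cond_normal = unet_condition(500, S_NORMAL)
cond_color = unet_condition(500, S_COLOR)
print(cond_normal.shape)  # (20,): 8 time dims + 2 components * 2 (sin, cos) * 3 freqs
```

For the same step t, the two vectors share the time part and differ only in the switcher part, which is what lets one set of UNet weights serve both domains.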

 

 

Cross-domain Attention

However, the domain switcher alone does not fully guarantee geometric consistency between normal maps and color images.
→ To address this, Cross-Domain Attention is introduced

  • Extends self-attention so that information can be exchanged across domains
  • A cross-domain attention layer is added in each Transformer block of the UNet, before the cross-attention layer
  • The keys and values of the normal and color domains are combined so that the two domains share consistent information

Thanks to this, color images and normal maps are generated geometrically aligned with each other
✅ 3D reconstruction shows less distortion and higher quality
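A minimal single-head NumPy sketch of cross-domain attention for one view: queries come from each domain, while keys and values are built from the tokens of both domains together. Shapes, random weights, and the single shared projection are assumptions for illustration; the real layer sits inside the UNet's Transformer blocks and may differ in detail.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_domain_attention(normal_tokens, color_tokens, wq, wk, wv):
    """Tokens of each domain (same view) attend to the keys/values of BOTH domains."""
    both = np.concatenate([normal_tokens, color_tokens], axis=0)  # (2N, d)
    k, v = both @ wk, both @ wv                                  # shared keys/values
    scale = np.sqrt(wq.shape[1])
    out = []
    for tokens in (normal_tokens, color_tokens):
        q = tokens @ wq                                          # (N, d)
        out.append(softmax(q @ k.T / scale) @ v)                 # (N, d)
    return out[0], out[1]

rng = np.random.default_rng(0)
N, d = 4, 8  # hypothetical sizes
normal_tokens = rng.normal(size=(N, d))
color_tokens = rng.normal(size=(N, d))
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

normal_out, color_out = cross_domain_attention(normal_tokens, color_tokens, wq, wk, wv)
normal_out2, _ = cross_domain_attention(normal_tokens, color_tokens + 1.0, wq, wk, wv)
print(np.allclose(normal_out, normal_out2))  # False: the normal branch now sees the color branch
```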

 

 

 

 

Textured Mesh Extraction

This part explains how Wonder3D reconstructs 3D geometry from the generated 2D normal maps and color images.

Wonder3D builds the 3D shape by optimizing a Signed Distance Field (SDF), and proposes Geometry-aware Optimization to overcome the limitations of existing SDF-based methods.

What is an SDF?

  • An SDF is a function that assigns to each point in 3D space its signed distance to the nearest surface
  • A positive value means outside the surface, zero means on the surface, and a negative value means inside
  • An SDF is continuous and differentiable, so optimizing it is more stable than optimizing a mesh directly
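To make the sign convention concrete, here is the closed-form SDF of a sphere in NumPy. This is only an illustration of the definition; Wonder3D optimizes a learned SDF rather than an analytic one.

```python
import numpy as np

def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: > 0 outside, = 0 on the surface, < 0 inside."""
    offsets = np.asarray(points, dtype=np.float64) - np.asarray(center, dtype=np.float64)
    return np.linalg.norm(offsets, axis=-1) - radius

points = np.array([
    [2.0, 0.0, 0.0],  # outside the unit sphere
    [1.0, 0.0, 0.0],  # on the surface
    [0.0, 0.5, 0.0],  # inside
])
print(sphere_sdf(points))  # signed distances 1.0, 0.0, -0.5
```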

Problems with existing SDF-based 3D reconstruction methods

  • Existing SDF-based methods (e.g., NeuS) are designed for real captured images and therefore require accurate multi-view data
  • However, Wonder3D's inputs are generated normal maps and color images, which are not perfectly accurate
  • The generated data cover relatively few views and may contain small pixel-level errors
  • As a result, applying existing methods as-is accumulates geometric errors (distortion, noise, incomplete shapes)

 

Geometry-aware Optimization

Optimization Objectives.

 

  • Object mask extraction
    • Separate the object region (mask) M_{0:N} in the 2D normal maps & color images
    • A segmentation model is used to separate the object from the background
  • Building a set of sampled pixels
    • Randomly sample pixels from all views and gather each pixel's information (normal, color, mask, ray direction)
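The sampling step can be sketched as uniform random pixel sampling across all views. The array layout (K views of H×W pixels) and the uniform strategy are assumptions; the post does not say how pixels are chosen.

```python
import numpy as np

def sample_pixels(normals, colors, masks, ray_dirs, num_samples, rng):
    """Uniformly sample pixels from all K views and gather their supervision.
    normals, colors, ray_dirs: (K, H, W, 3); masks: (K, H, W)."""
    K, H, W = masks.shape
    view = rng.integers(0, K, size=num_samples)
    row = rng.integers(0, H, size=num_samples)
    col = rng.integers(0, W, size=num_samples)
    return {
        "view": view, "row": row, "col": col,
        "normal": normals[view, row, col],     # (num_samples, 3)
        "color": colors[view, row, col],       # (num_samples, 3)
        "mask": masks[view, row, col],         # (num_samples,)
        "ray_dir": ray_dirs[view, row, col],   # (num_samples, 3)
    }

rng = np.random.default_rng(0)
K, H, W = 6, 32, 32  # hypothetical sizes
normals = rng.normal(size=(K, H, W, 3))
colors = rng.uniform(size=(K, H, W, 3))
masks = (rng.uniform(size=(K, H, W)) > 0.5).astype(np.float64)
ray_dirs = rng.normal(size=(K, H, W, 3))

batch = sample_pixels(normals, colors, masks, ray_dirs, num_samples=1024, rng=rng)
print(batch["normal"].shape, batch["mask"].shape)  # (1024, 3) (1024,)
```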

 

Description of each loss term

| Loss | Description |
|---|---|
| L_normal | Minimizes the discrepancy between the generated normal maps and the normals of the SDF |
| L_rgb | MSE between the generated colors and the reconstructed colors |
| L_mask | Binary cross-entropy that reduces mask prediction errors |
| R_eik | Eikonal regularization: keeps the SDF gradient norm at 1 so that the field is a valid SDF |
| R_sparse | SDF sparsity regularization: prevents unnecessary floating artifacts (noise hovering in empty space) |
| R_smooth | SDF smoothness regularization: keeps the SDF from changing too abruptly in 3D space |
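Three of the terms in the table have standard forms that can be sketched in NumPy: MSE for L_rgb, binary cross-entropy for L_mask, and the Eikonal penalty (‖∇f‖ − 1)² for R_eik. R_sparse and R_smooth are omitted because the post does not give their formulas. The total objective would be a (weighted) sum of all terms; the weights are not stated here, so none are assumed.

```python
import numpy as np

def rgb_loss(pred_rgb, gen_rgb):
    """L_rgb: mean squared error between rendered and generated colors."""
    return np.mean((pred_rgb - gen_rgb) ** 2)

def mask_loss(pred_alpha, gen_mask, eps=1e-6):
    """L_mask: binary cross-entropy between rendered opacity and the segmentation mask."""
    p = np.clip(pred_alpha, eps, 1.0 - eps)
    return -np.mean(gen_mask * np.log(p) + (1.0 - gen_mask) * np.log(1.0 - p))

def eikonal_reg(sdf_grad):
    """R_eik: penalize deviation of the SDF gradient norm from 1."""
    return np.mean((np.linalg.norm(sdf_grad, axis=-1) - 1.0) ** 2)

rng = np.random.default_rng(0)
pts = rng.normal(size=(256, 3))
unit_grad = pts / np.linalg.norm(pts, axis=-1, keepdims=True)  # gradient of |p| - 1
print(eikonal_reg(unit_grad))        # ≈ 0: |p| - 1 is a valid SDF
print(eikonal_reg(2.0 * unit_grad))  # ≈ 1.0: 2 * (|p| - 1) is not a valid SDF
```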

 

Geometry-aware Normal Loss.

When reconstructing the 3D shape from normal maps, Wonder3D applies a technique that compensates for inaccurate normal information.

  • Error term: the discrepancy between the SDF normal ĝ_k and the generated normal g_k (measured with cosine similarity)
  • w_k (weight): applied differently depending on the angle between the viewing direction v_k and the normal g_k
    • Why apply a weight?
      • The same 3D point can be observed from multiple views
      • But the normal maps generated for those views may not agree perfectly
      • So more reliable normal information is given a higher weight, improving geometric accuracy
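Putting the two bullets together, one way to write the loss is a weighted average of per-sample errors. Here e_k is a name introduced only for this sketch, and both the form e_k = 1 − cos(ĝ_k, g_k) and the normalization by Σ w_k are assumptions:

```latex
\mathcal{L}_{\text{normal}} = \frac{\sum_{k} w_k \, e_k}{\sum_{k} w_k},
\qquad e_k = 1 - \cos(\hat{g}_k, g_k)
```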

 

 

  • v_k: viewing direction of the k-th sample
  • g_k: normal vector of the k-th sample (the generated normal)
  • cos(v_k, g_k): cosine similarity between the viewing direction and the normal (their angular relationship)
  • ε: threshold, a small negative value (e.g., -0.1)
  • w_k: confidence weight of that pixel (geometry-aware weight)
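With these symbols, the two cases discussed below can be written as a piecewise weight. This is a reconstruction consistent with the description (zero weight above the threshold, an exp-based weight otherwise); the exponent |cos(v_k, g_k)| follows my reading of the paper and should be checked against it:

```latex
w_k =
\begin{cases}
0, & \cos(v_k, g_k) > \epsilon \\[4pt]
\exp\!\big(\lvert \cos(v_k, g_k) \rvert\big), & \cos(v_k, g_k) \le \epsilon
\end{cases}
```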

 

🔼 Upper case of the formula (cos(v_k, g_k) > ε)

  • Weight = 0
  • That is, when the normal is back-facing or nearly perpendicular to the viewing ray, the sample is not trusted
  • The normal points outward from the surface, while the viewing direction points from the camera into the scene
    • For a surface facing the camera their cosine is therefore negative; if it exceeds the threshold, the normal direction is likely wrong
    • Such samples are ignored

🔽 Lower case of the formula (cos(v_k, g_k) ≤ ε)

  • When the normal and the viewing direction are in a plausible configuration, the sample is trusted more
  • The smaller (more negative) the cosine, i.e., the more directly the surface faces the camera, the larger the weight
  • exp is used to adjust the weight smoothly
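The weighting rule can be sketched in NumPy as follows. The threshold -0.1 is the example value from the post; the error form 1 − cos and the normalization by the weight sum are assumptions, as are the function names. In the usage, one sample disagrees strongly with the SDF normal, but because it is grazing (weight 0) the loss stays near zero.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (M, 3) arrays."""
    return np.sum(a * b, axis=-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps)

def geometry_aware_weights(view_dirs, gen_normals, threshold=-0.1):
    """w_k = 0 if cos(v_k, g_k) > threshold, else exp(|cos(v_k, g_k)|)."""
    c = cosine(view_dirs, gen_normals)
    return np.where(c > threshold, 0.0, np.exp(np.abs(c)))

def geometry_aware_normal_loss(sdf_normals, gen_normals, view_dirs, threshold=-0.1):
    """Weighted mean of the assumed error e_k = 1 - cos(SDF normal, generated normal)."""
    w = geometry_aware_weights(view_dirs, gen_normals, threshold)
    err = 1.0 - cosine(sdf_normals, gen_normals)
    return np.sum(w * err) / max(np.sum(w), 1e-8)

# Camera looking along -z, so every sample has view direction (0, 0, -1).
view_dirs = np.tile([0.0, 0.0, -1.0], (4, 1))
gen_normals = np.array([
    [0.0, 0.0, 1.0],   # faces the camera: cos = -1   -> weight e
    [1.0, 0.0, 0.0],   # grazing:          cos = 0    -> weight 0
    [0.0, 0.0, -1.0],  # back-facing:      cos = 1    -> weight 0
    [0.0, 0.6, 0.8],   # slanted:          cos = -0.8 -> weight exp(0.8)
])
print(geometry_aware_weights(view_dirs, gen_normals))

sdf_normals = gen_normals.copy()
sdf_normals[1] = [-1.0, 0.0, 0.0]  # large disagreement, but on a zero-weight sample
print(geometry_aware_normal_loss(sdf_normals, gen_normals, view_dirs))  # ≈ 0
```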

 

 

Outlier-dropping Losses.

Because some of the normal, color, and mask information may be inaccurate, samples with large errors are dropped during optimization.

  • Instead of simply summing the loss over all pixels, some of the samples with the largest losses (pixels with severe errors) are removed
  • For example, when computing the color loss, the top % of samples with the largest errors is discarded
  • Samples with severe errors are unreliable, so excluding them allows a more accurate 3D reconstruction

✅ Thanks to this, distorted geometry and noise (holes, odd shapes) are reduced during optimization!
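A minimal sketch of outlier dropping as a trimmed mean: sort the per-sample losses and average only the smallest ones. The drop_ratio value below is hypothetical, since the post does not give the percentage. In an autodiff framework the same selection would mean gradients flow only through the kept samples.

```python
import numpy as np

def outlier_dropping_mean(per_sample_loss, drop_ratio):
    """Mean loss after discarding the drop_ratio fraction of samples with the largest errors."""
    losses = np.sort(np.asarray(per_sample_loss, dtype=np.float64))  # ascending
    keep = max(1, int(round(len(losses) * (1.0 - drop_ratio))))
    return losses[:keep].mean()

losses = np.array([0.1, 0.2, 0.1, 5.0])                 # one outlier pixel
print(losses.mean())                                     # plain mean, pulled up by the outlier
print(outlier_dropping_mean(losses, drop_ratio=0.25))    # mean of 0.1, 0.1, 0.2
```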