
[Supplementary Review] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

by SolaKim 2024. 11. 4.

This time I'm back with a review of the Supplementary of the paper "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation"!

 

The Supplementary consists of seven parts in total, B through H.

In this post I'll go through them in the following order.



[Contents]

C : a detailed look at the neural network architectures and training hyperparameters

D : a detailed look at the detection pipeline

E : a look at further applications of PointNet

F : a more detailed analysis of the PointNet architecture

 

 

Section C : Network Architecture and Training Details

 

 

PointNet Classification Network

The transformation networks normalize (align) PointNet's input point cloud, which strengthens the model's invariance to geometric transformations. PointNet contains two transformation networks, each with the purpose and structure described below.

 

1. Transformation Network: Basic Structure and Role

[First transformation network]

- It takes the raw point cloud as input and regresses a 3 x 3 matrix.

- A shared MLP (64, 128, 1024) is applied to every point, so the layer output sizes are 64, 128, and 1024.

- Max pooling aggregates information across points, and two fully connected layers follow before the final 3 x 3 matrix is produced.

- The fully connected layers have output sizes 512 and 256, and the output matrix is initialized as the identity matrix.

- ReLU and batch normalization are applied to every layer except the last one.

[Second transformation network]

- It has the same architecture as the first network but outputs a 64 x 64 matrix.

- This matrix is also initialized as the identity, and a regularization loss that pushes the matrix toward being orthogonal is added to the softmax classification loss. The regularization loss weight is 0.001.
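To make the regularization term concrete, here is a minimal sketch in plain Python. It assumes the form given in the main paper, L_reg = ||I - A A^T||_F^2, where A is the predicted feature transform. The function names `orthogonality_loss` and `total_loss` are made up for illustration, and a small 4 x 4 matrix stands in for the real 64 x 64 one.

```python
REG_WEIGHT = 0.001  # regularization weight quoted in the text


def orthogonality_loss(A):
    """Squared Frobenius norm of (I - A A^T) for a square matrix A (list of rows)."""
    k = len(A)
    loss = 0.0
    for i in range(k):
        for j in range(k):
            # (A A^T)[i][j] is the dot product of row i and row j of A.
            dot = sum(A[i][m] * A[j][m] for m in range(k))
            target = 1.0 if i == j else 0.0
            loss += (target - dot) ** 2
    return loss


def total_loss(classification_loss, A):
    """Classification loss plus the weighted orthogonality penalty."""
    return classification_loss + REG_WEIGHT * orthogonality_loss(A)


# An orthogonal matrix (here the identity) costs nothing; a scaled one is penalized.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
scaled = [[2.0 * value for value in row] for row in identity]
print(orthogonality_loss(identity))  # 0.0
print(orthogonality_loss(scaled))    # 36.0
```

Because the transform starts as the identity, the penalty is zero at initialization and only kicks in once training moves the matrix away from an orthogonal one.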

 

2. Training Settings (Parameters)

- Dropout: applied to the last fully connected layer with a keep ratio of 0.7, to keep the network from overfitting.

- Batch normalization decay rate: starts at 0.5 and is gradually increased to 0.99.

- Adam optimizer: initial learning rate 0.001, momentum 0.9, batch size 32. The learning rate is halved every 20 epochs.

- Training time: training on ModelNet takes about 3-6 hours with TensorFlow on a GTX1080 GPU.
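The learning rate rule is simple enough to write down directly. The sketch below assumes a staircase schedule (the rate drops in one step at epochs 20, 40, and so on), which is one reading of "halved every 20 epochs". The function name `learning_rate` is hypothetical, and the batch normalization decay schedule is not modeled because the text does not say how it is increased.

```python
def learning_rate(epoch, base_lr=0.001, decay_every=20, factor=0.5):
    """Staircase decay: multiply base_lr by `factor` once per `decay_every` epochs."""
    return base_lr * factor ** (epoch // decay_every)


# Print the rate at a few epochs to see the steps.
for epoch in (0, 19, 20, 40, 60):
    print(epoch, learning_rate(epoch))
```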

 


 

PointNet Segmentation Network

 

1. Basic Structure of the PointNet Part Segmentation Network

- Input point transform (T1) and feature transform (T2): T1 and T2 align/transform the input points and the features, respectively. They apply a 3 x 3 matrix to the points and a 64 x 64 matrix to the features so that the model becomes invariant to a range of input transformations.

- Fully connected (FC) layers: FC layers are applied to each point, with output sizes n x 64, n x 128, n x 128, n x 512, and n x 2048 in order. Max pooling over the output of the last FC layer extracts the global feature.

- One-hot vector: a one-hot vector of size 16 that encodes the class label of the input point cloud is fed to the network. It is concatenated with the global feature so that the object category (e.g., chair, table) is taken into account.

 

2. Combining Local and Global Features

PointNet's part segmentation network combines local per-point features with the global feature to improve the prediction for each point.

- Combining the max pooling result with local features: the local features after the second transformation network (T2) are concatenated, for every point, with the global feature extracted by max pooling, which gives a rich representation suited to segmentation.

- Skip links: skip links that gather local features from different layers are added, so the point features fed into the segmentation network become even richer.
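The concatenation step can be sketched with plain lists. This is only an illustration of the data flow: the feature sizes are shrunk to a handful of dimensions, and the helper names (`max_pool`, `one_hot`, `concat_local_global`) are hypothetical rather than taken from the authors' code.

```python
def max_pool(point_feats):
    """Column-wise maximum over n per-point feature vectors -> one global vector."""
    return [max(column) for column in zip(*point_feats)]


def one_hot(index, size=16):
    """One-hot class vector; size 16 matches the number of categories in the text."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def concat_local_global(local_feats, deep_feats, class_index):
    """Append the pooled global feature and the class vector to every local feature."""
    global_feat = max_pool(deep_feats) + one_hot(class_index)
    return [list(local) + global_feat for local in local_feats]


# Three points, 2-dim local features, 4-dim features before pooling (toy sizes).
local_feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
deep_feats = [[0.1, 0.9, 0.3, 0.0], [0.7, 0.2, 0.4, 0.1], [0.5, 0.5, 0.8, 0.2]]
combined = concat_local_global(local_feats, deep_feats, class_index=4)
print(len(combined), len(combined[0]))  # 3 22
```

Every point ends up with the same global suffix, so the per-point MLP that follows can see both where the point sits locally and what the whole shape looks like.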

 

3. MLP and Output

The segmentation network uses an MLP (256, 256, 128) to predict segmentation scores for each point. The final output is an n x 50 matrix of part scores, which indicates how likely each point is to belong to each part.


4. Training Settings

- No dropout

- Training parameters: the same as for the classification network, including the Adam optimizer and the learning rate decay.

 


 

Baseline 3D CNN Segmentation Network

 

In the ShapeNet part segmentation experiment, PointNet's segmentation network was compared with two traditional segmentation methods and a 3D volumetric CNN. This part describes that baseline 3D CNN segmentation network.



1. 3D Volumetric CNN Architecture

The 3D CNN generalizes well-known 3D CNN architectures such as VoxNet and 3DShapeNets into a fully convolutional network that performs segmentation.



2. Input Preprocessing: Volumetric Representation

- Point cloud conversion: the input point cloud is converted into an occupancy grid with 32 x 32 x 32 resolution, which divides 3D space into voxels.

- Each voxel records whether any point falls inside it, and the network performs segmentation on this volumetric data.
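As a rough sketch of the voxelization step, the function below maps points into a 32 x 32 x 32 occupancy grid. It assumes the cloud is scaled by its longest bounding-box side so the aspect ratio is kept, and it stores the grid sparsely as a set of occupied indices. Both choices, and the name `voxelize`, are assumptions made for illustration and not details from the paper.

```python
def voxelize(points, resolution=32):
    """Return the set of occupied (i, j, k) voxel indices for a list of (x, y, z) points."""
    mins = [min(p[axis] for p in points) for axis in range(3)]
    maxs = [max(p[axis] for p in points) for axis in range(3)]
    # Scale by the longest side; fall back to 1.0 for a degenerate cloud.
    extent = max(maxs[axis] - mins[axis] for axis in range(3)) or 1.0
    occupied = set()
    for p in points:
        index = tuple(
            # Clamp so points on the upper boundary land in the last voxel.
            min(int((p[axis] - mins[axis]) / extent * resolution), resolution - 1)
            for axis in range(3)
        )
        occupied.add(index)
    return occupied


points = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
print(sorted(voxelize(points)))
```

A voxel is "on" if at least one point falls inside it, so several nearby points collapse into a single entry. That loss of detail is the price of the volumetric representation.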

 

3. Network Layers

- 3D convolution layers: for initial feature extraction, five 3D convolutions with 32 output channels and stride 1 are applied in sequence. The receptive field of each voxel is then 19.

- 1 x 1 x 1 3D convolution layers: a sequence of 3D convolution layers with 1 x 1 x 1 kernels is then appended to predict the final segmentation labels. These layers compute the segmentation scores for each voxel.

- Activation and normalization: ReLU and batch normalization are applied to all layers except the last one.



4. Training and Comparison

- Training across categories: the network is trained across multiple object categories, but for a fair comparison with the baselines, where the object category is given, only the output scores for that category are considered at evaluation.

In short, this 3D CNN based volumetric segmentation network represents 3D space with voxels and predicts a segmentation label for each voxel through 3D convolutions. Unlike PointNet, it needs a preprocessing step that converts the input into a volumetric representation, but it has the advantage of extracting more intuitive 3D features from 3D data.

 


 

Section D : Details of the 3D Object Detection Pipeline



A detection pipeline is the step-by-step procedure of an object detection system: a sequence of operations designed to find the location and category of objects in a given scene. Each stage transforms the input data and applies various techniques to infer object locations or types, and the pipeline typically includes the following stages.



1. Object Proposal Generation

A connected component method groups neighboring points with the same label into object proposals (clusters).

A cluster containing more than 200 points is treated as an object proposal, and its bounding box is marked as the object.

The score of each proposed object is the average point score for that category, and proposals that are too small are pruned.
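The proposal step might look roughly like the sketch below. It assumes two points are neighbors when they share a predicted label and lie within a fixed radius of each other; the radius value, the brute-force O(n^2) neighbor search, and the name `propose_objects` are all assumptions for illustration. Only the 200-point threshold and the averaged score come from the text.

```python
from collections import deque


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def propose_objects(points, labels, scores, radius=0.1, min_points=200):
    """Group same-label neighbors; keep groups with more than min_points points."""
    n = len(points)
    visited = [False] * n
    proposals = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        queue, component = deque([start]), []
        while queue:  # breadth-first search over the neighbor graph
            i = queue.popleft()
            component.append(i)
            for j in range(n):
                if (not visited[j] and labels[j] == labels[i]
                        and squared_distance(points[i], points[j]) <= radius ** 2):
                    visited[j] = True
                    queue.append(j)
        if len(component) > min_points:
            lo = tuple(min(points[i][a] for i in component) for a in range(3))
            hi = tuple(max(points[i][a] for i in component) for a in range(3))
            mean_score = sum(scores[i] for i in component) / len(component)
            proposals.append({"label": labels[start], "box": (lo, hi),
                              "score": mean_score, "size": len(component)})
    return proposals


# Toy scene with a tiny threshold so the example stays readable.
points = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.1, 0.0, 0.0),  # chair cluster
          (2.0, 0.0, 0.0),                                      # isolated chair point
          (1.0, 1.0, 0.0), (1.05, 1.0, 0.0)]                    # table cluster
labels = ["chair", "chair", "chair", "chair", "table", "table"]
scores = [0.5, 0.75, 1.0, 0.5, 0.5, 0.25]
proposals = propose_objects(points, labels, scores, min_points=1)
for proposal in proposals:
    print(proposal)
```

The isolated chair point forms a component of size one and is dropped, which is the same mechanism that removes tiny clusters at the real 200-point threshold.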

2. Handling Densely Packed Objects

For objects that sit close together, such as chairs, connected components alone cannot separate them, so a sliding window approach is used for detection.

A binary classification network is trained for each object category, and non-maximum suppression (NMS) removes duplicate boxes.

For the final evaluation, the results of connected components and sliding windows are combined.
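For reference, a greedy NMS over axis-aligned 3D boxes can be sketched as follows. The IoU threshold of 0.5 and the box format `((x0, y0, z0), (x1, y1, z1))` are assumptions, since the text only states that NMS is used.

```python
def box_volume(box):
    lo, hi = box
    return (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        lo = max(a[0][axis], b[0][axis])
        hi = min(a[1][axis], b[1][axis])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        intersection *= hi - lo
    return intersection / (box_volume(a) + box_volume(b) - intersection)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any box that overlaps a kept one too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou_3d(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
         ((0.1, 0.0, 0.0), (1.1, 1.0, 1.0)),   # heavy overlap with the first box
         ((2.0, 2.0, 2.0), (3.0, 3.0, 3.0))]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```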

3. Evaluation

Each model is trained on five areas and tested on the remaining area.

All test results are aggregated to produce a precision-recall (PR) curve for evaluation.

4. Post-processing

Objects with unnecessarily small area or volume are removed.

For chairs, tables, sofas, and the like, the bounding box is extended down to the floor when the legs are separated from the rest of the object.

 


Section E : More Applications of PointNet

1. Bounding Box Adjustment

Small area pruning: proposals with small area or volume are excluded from detection to improve efficiency.

Bounding box extension: for objects whose legs are separated, such as tables, chairs, and sofas, the bounding box is extended to the floor so that the object is captured more accurately.

2. Shape Retrieval

PointNet learns a global shape signature for each input point cloud. Objects with similar shapes are expected to have similar signatures.

To test this, the global signature of a query shape from the ModelNet test set is computed, and similar shapes are retrieved from the training set by nearest neighbor search. This is enough to build a shape retrieval application.
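Retrieval on top of the global signatures comes down to a nearest neighbor lookup. The sketch below uses squared Euclidean distance and a brute-force scan; the distance metric, the toy 3-dimensional signatures, and the name `retrieve` are assumptions for illustration.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def retrieve(query_signature, train_signatures, k=2):
    """Indices of the k training shapes whose signatures are closest to the query."""
    ranked = sorted(
        range(len(train_signatures)),
        key=lambda i: squared_distance(query_signature, train_signatures[i]),
    )
    return ranked[:k]


# Toy signatures; real ones would be the max-pooled global feature of each shape.
train_signatures = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.8, 0.2, 0.1], [0.1, 0.1, 0.9]]
query = [1.0, 0.0, 0.0]
print(retrieve(query, train_signatures))  # [0, 2]
```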

3. Shape Correspondence

The point features learned by PointNet can also be used to compute shape correspondences.

Given two similar objects (e.g., two chairs or two tables), the critical point sets are extracted, and points that activate the same dimension of the global feature are matched to find the correspondence between the two shapes.
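The matching rule can be illustrated with toy per-point features. For each dimension of the global feature, the sketch finds the point that attains the maximum in each shape and pairs the two. The helper names and the tie-breaking rule (first point wins) are assumptions.

```python
def argmax_per_dimension(point_feats):
    """For each feature dimension, the index of the point that attains the maximum."""
    dims = len(point_feats[0])
    return [
        max(range(len(point_feats)), key=lambda i: point_feats[i][d])
        for d in range(dims)
    ]


def correspondences(feats_a, feats_b):
    """Pairs (index in A, index in B) of points that activate the same dimension."""
    winners_a = argmax_per_dimension(feats_a)
    winners_b = argmax_per_dimension(feats_b)
    return sorted(set(zip(winners_a, winners_b)))


# Per-point features of two toy shapes (3 points, 3 dimensions each).
feats_a = [[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.1, 0.3, 0.7]]
feats_b = [[0.1, 0.9, 0.0], [0.8, 0.2, 0.3], [0.0, 0.1, 0.6]]
print(correspondences(feats_a, feats_b))  # [(0, 1), (1, 0), (2, 2)]
```

Point 0 of shape A and point 1 of shape B both win dimension 0, so they are treated as corresponding points, and likewise for the other dimensions.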

 


Section F : More Architecture Analysis

 

1. Effect of Bottleneck Dimension and Number of Input Points

Model performance improves as the number of input points grows, but it saturates at around 1,000 points.

The max layer size (bottleneck dimension) also matters: increasing it from 64 to 1024 gives a 2-4% performance gain, which suggests that enough point feature functions are needed to discriminate between different 3D shapes.

Even with only 64 input points, PointNet achieves decent performance.

 

2. MNIST Digit Classification

PointNet is designed for learning on 3D point clouds, but it can also be applied to 2D point clouds (the pixel sets of MNIST images), and it performs reasonably well.

Its accuracy is lower than that of a CNN, but the experiment shows that PointNet can process a 2D image by treating it as a point set.
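Turning an image into a point set can be as simple as keeping the coordinates of the bright pixels. The threshold of 128 and the function name `image_to_point_set` are assumptions in this sketch, and any resampling to a fixed set size is left out.

```python
def image_to_point_set(image, threshold=128):
    """Return (x, y) coordinates of pixels whose value is above the threshold."""
    return [
        (x, y)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if value > threshold
    ]


# A 3 x 3 toy "digit": a vertical stroke in the middle column.
image = [
    [0, 200, 0],
    [0, 255, 0],
    [0, 129, 128],
]
print(image_to_point_set(image))  # [(1, 0), (1, 1), (1, 2)]
```

The resulting list has no grid structure left, which is exactly the kind of unordered input PointNet is built to consume.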

 

3. Normal Estimation

The segmentation version of PointNet concatenates local and global features to give each local point its context.

Normal estimation demonstrates that the network can learn local geometric properties, and PointNet's predicted normals come out smoother and more continuous than the ground-truth normals.

Figure 16 above shows PointNet's normal reconstruction results. The left column shows PointNet's predictions, and the right column shows the ground-truth normals computed from the mesh.

 

4. Segmentation Robustness

PointNet is robust to data corruption and missing points, because the global shape feature is extracted from only the critical points of the whole point cloud.

This robustness carries over to segmentation, where the network produces consistent segmentation results for a given input point cloud.
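The robustness argument can be checked on toy data: if only the points that win the max pooling are kept, the global feature does not change. The function names below are hypothetical, and ties are broken by taking the first point that attains the maximum.

```python
def global_feature(point_feats):
    """Max pooling: column-wise maximum over all per-point feature vectors."""
    return [max(column) for column in zip(*point_feats)]


def critical_point_indices(point_feats):
    """Indices of points that attain the maximum in at least one dimension."""
    pooled = global_feature(point_feats)
    winners = set()
    for d, value in enumerate(pooled):
        # First point whose d-th feature equals the pooled maximum.
        winners.add(next(i for i, f in enumerate(point_feats) if f[d] == value))
    return sorted(winners)


feats = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.1, 0.2]]
critical = critical_point_indices(feats)
subset = [feats[i] for i in critical]
print(critical)                                          # [0, 1]
print(global_feature(subset) == global_feature(feats))   # True
```

Dropping points 2 and 3 leaves the pooled feature untouched, which is why missing non-critical points do not affect anything computed from the global feature.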

• Figure 17 above illustrates the consistency of PointNet's segmentation results.

      • Left: the input point cloud S (a set of 3D points).

      • Middle: the critical point set C_S, the subset of points that determines the max-pooled global feature.

      • Right: the upper-bound shape N_S, shown with its segmentation result. The different parts are distinguished by color.

 

 

5. Generalization to Unseen Shape Categories

PointNet generalizes to some extent to new shapes it was not trained on (e.g., face, house, rabbit, teapot).

Even though PointNet was trained mostly on man-made objects with planar structures, it shows that it can generalize to these new shapes.

• Figure 18 above shows PointNet's generalization to unseen objects.

      Each row consists of:

         1. Original Shape: the actual shape of each object.

         2. Critical Point Sets: the set made up of only the points that PointNet relies on to recognize the shape.

         3. Upper-bound Shapes: the largest point set that yields the same global feature as the original shape. It outlines the overall shape of the object, and the colors encode depth information.