
[Paper Review][Generative AI] SeqDeepFake: Detecting and Recovering Sequential DeepFake Manipulation

by SolaKim 2023. 7. 17.
SeqDeepFake: Detecting and Recovering Sequential DeepFake Manipulation
S-Lab, ECCV 2022

 

This paper takes an in-depth look at how to detect fake face images manipulated with deepfake techniques and how to recover the originals.

 

Its main goal is to introduce a new research problem: Detecting Sequential DeepFake Manipulation (Seq-DeepFake).

 

Deepfake detection has generally focused on spotting a single manipulation step, but recent face-editing applications have made multi-step manipulation possible. Such sequential deepfake manipulation is a challenging problem that existing methods struggle to detect.

 

 

To detect and recover sequential deepfake manipulations, the paper proposes a new model called the Seq-DeepFake Transformer (SeqFakeFormer). SeqFakeFormer treats detection as an image-to-sequence task, predicting the sequence of manipulations from a face image, and it performs very well at detecting sequential deepfake manipulation.
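
To make the image-to-sequence framing concrete, here is a minimal sketch of greedy decoding: the predictor emits one manipulation token at a time until it emits an end token. The token names, `greedy_decode`, and `toy_score_fn` are all hypothetical stand-ins for illustration, not the paper's code, and the toy scorer simply replays a fixed answer where a trained model would go.

```python
from typing import Callable, Dict, List

# Hypothetical special tokens; the names are illustrative, not from the paper.
SOS, EOS = "<sos>", "<eos>"


def greedy_decode(
    image_features: List[float],
    score_fn: Callable[[List[float], List[str]], Dict[str, float]],
    max_steps: int = 5,
) -> List[str]:
    """Predict a manipulation sequence one token at a time."""
    sequence = [SOS]
    for _ in range(max_steps):
        # Score every candidate token given the image and the tokens so far.
        scores = score_fn(image_features, sequence)
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        sequence.append(best)
    return sequence[1:]  # drop the start token


def toy_score_fn(image_features: List[float], prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a trained model: replays a fixed answer, then stops."""
    answer = ["nose", "eye", "lip"]
    step = len(prefix) - 1
    target = answer[step] if step < len(answer) else EOS
    vocab = ["nose", "eye", "lip", EOS]
    return {tok: (1.0 if tok == target else 0.0) for tok in vocab}


predicted = greedy_decode([0.0], toy_score_fn)
print(predicted)
```

An unmanipulated image would correspond to the end token being emitted first, which yields an empty sequence.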

 

 

SeqFakeFormer consists of two main modules.

First, the Spatial Relation Extraction module passes the face image through a convolutional neural network (CNN) to extract features of the spatially manipulated regions. The encoder then uses self-attention to learn and encode the spatial relations among those features.
Second, the Sequential Relation Modeling with Spatially Enhanced Cross-Attention module sits in the decoder. It uses cross-attention together with self-attention to model the sequential relations and detect the deepfake manipulations.
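
The two attention patterns mentioned above can be sketched with plain scaled dot-product attention. This is a generic, dependency-free illustration rather than the paper's implementation: the toy vectors and the names `spatial_feats` and `token_embeds` are assumptions, and a real model would add learned projections, multiple heads, and positional encodings.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query gets a weighted mix of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(logits)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Toy CNN features for three image positions (assumed values).
spatial_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Toy embedding for one manipulation token in the decoder (assumed value).
token_embeds = [[1.0, 0.0]]

# Encoder side: self-attention, where image positions attend to each other.
encoded = attention(spatial_feats, spatial_feats, spatial_feats)
# Decoder side: cross-attention, where sequence tokens attend to encoded positions.
decoded = attention(token_embeds, encoded, encoded)
```

The only difference between the two calls is where the queries come from: the same feature set for self-attention, and a different one (the sequence tokens) for cross-attention.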

 

Because annotated data is limited, SeqFakeFormer integrates a Spatially Enhanced Cross-Attention (SECA) module to make the cross-attention more effective.
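
This review does not spell out how SECA enhances the cross-attention, so the sketch below only shows one plausible way to inject a spatial prior: adding the log of a Gaussian-like weight map to the attention logits. The Gaussian form, the `center` and `scale` parameters, and every function name here are assumptions for illustration; check the original paper for the actual formulation.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_weights(
    positions: List[Tuple[float, float]],
    center: Tuple[float, float],
    scale: float,
) -> List[float]:
    """Assumed spatial prior: a Gaussian-like weight per position, peaking at `center`."""
    cx, cy = center
    return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * scale ** 2))
            for x, y in positions]


def spatially_weighted_attention(
    logits: List[float], weights: List[float], eps: float = 1e-6
) -> List[float]:
    """Bias raw cross-attention logits toward positions with high spatial weight."""
    biased = [l + math.log(w + eps) for l, w in zip(logits, weights)]
    return softmax(biased)


# Three positions along a row; raw logits are uninformative (all equal).
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
logits = [0.0, 0.0, 0.0]
weights = gaussian_weights(positions, center=(2.0, 0.0), scale=0.5)
attn = spatially_weighted_attention(logits, weights)
```

With equal raw logits, the resulting attention concentrates on the position nearest the assumed center, which is the kind of guidance a spatial prior can provide when few annotations are available.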

 

 

The experimental results demonstrate the strength of the SeqFakeFormer model.

The proposed model outperforms state-of-the-art deepfake detection methods across different manipulation types. This result underlines both the importance of detecting sequential deepfake manipulation and the effectiveness of SeqFakeFormer.

The paper also builds the first Seq-DeepFake dataset, which researchers can use to develop and evaluate better deepfake detection models.

 

 

The dataset contains sequentially manipulated face images along with sequential-vector annotations of the manipulations applied, which makes it a valuable resource for deepfake detection research.
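
As a rough picture of what a sequential-vector annotation could look like, the sketch below encodes an ordered list of manipulation steps as a fixed-length vector of token ids, padded with a "no manipulation" id. The vocabulary, the maximum length, and the function names are hypothetical and chosen only for illustration; the dataset's actual label format may differ.

```python
from typing import List

# Hypothetical vocabulary and length limit (assumptions, not the dataset's spec).
NO_MANIPULATION = 0
VOCAB = {"nose": 1, "eye": 2, "eyebrow": 3, "lip": 4, "hair": 5}
MAX_STEPS = 5


def encode_annotation(steps: List[str], max_steps: int = MAX_STEPS) -> List[int]:
    """Turn an ordered list of manipulation steps into a padded id vector."""
    if len(steps) > max_steps:
        raise ValueError(f"at most {max_steps} steps are supported")
    ids = [VOCAB[step] for step in steps]
    return ids + [NO_MANIPULATION] * (max_steps - len(ids))


def is_manipulated(annotation: List[int]) -> bool:
    """An image counts as manipulated if any step is not the padding id."""
    return any(token != NO_MANIPULATION for token in annotation)


two_step = encode_annotation(["nose", "eye"])   # order of steps is preserved
original = encode_annotation([])                # untouched image: all padding
```

Because the order is part of the label, `["nose", "eye"]` and `["eye", "nose"]` encode to different vectors, which is what separates this setting from a plain real-or-fake label.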

"SeqDeepFake: Detecting and Recovering Sequential DeepFake Manipulation" makes an important contribution to research on deepfake detection and on a safer visual media environment.

The SeqFakeFormer model and the Seq-DeepFake dataset can serve as very useful tools, both for research on deepfake manipulation detection and for real-world applications.

 

For the full details and experimental results, please refer to the original paper.

 

That wraps up this review of "SeqDeepFake: Detecting and Recovering Sequential DeepFake Manipulation"!

Thank you 🤩