Bay Jekyll Theme | CS231A Lecture 7 review

SFM 문제 복습

The SFM problem

m개의 이미지에서 n개의 3차원 고정 포인트를 관측했을 때, $\mathbf{x}_{ij}=\mathbf{M}_i \mathbf{X}_j$
여기서 다음을 추정하라
- $m$ projection matrices $\mathbf{M}_i$→motion
- $n$ 3d points $\mathbf{X}_j$→structure

Affine SFM

더 단순한 상황을 생각해볼 수 있다. 카메라들이 affine하다고 가정하는 것
그냥 perspective model에서는 알 수 있는 것이 없지만, affine 가정에서는 $\mathbf{x}=[\mathbf{m}_1\mathbf{X}, \mathbf{m}_2\mathbf{X}, 1]^\top$ 이므로,

$\mathbf{x}^E=(\mathbf{m}_1\mathbf{X}, \mathbf{m}_2\mathbf{X})^\top=\begin{bmatrix}\mathbf{A}&\mathbf{b}\end{bmatrix}\mathbf{X}=\mathbf{A}\mathbf{X}^E+ \mathbf{b}$

이 때 카메라 $i$와 $j$번째 점에 대해 쓰면, $\mathbf{x}_{ij}=\mathbf{A}_i\mathbf{X}_j+\mathbf{b}_i$이고, 각각 2x1, 2x3, 3x1, 2x1 벡터임에 유의하여야 한다.
$m\times n$개의 관측값 $\mathbf{x}_{ij}$에 대해 각 행렬와 3차원 점의 위치를 추정하는 것이 affine SFM의 목적이다.
$2m\times n$개 방정식에 $8m+3n-8$개 변수가 들어가있음
접근법
- 기하학적 접근(by fundamental matrix): Affine epipolar geometry, F 추정, 카메라, 점
- Factorization 방법(by SVD)

Tomasi&Kanade Algorithm

Data centering&Factorization 으로 구성되어 있음

Data centering

Data centering: $\hat{\mathbf{x}}_{ij}=\mathbf{x}_{ij}-\bar{\mathbf{x}}_i$ where $\bar{\mathbf{x}}=\frac{1}{n} \sum_{k=1}^{n}\mathbf{x}_{ik}$
계산하면 3차원 센터링에 행렬 먹인걸로 바뀜:$\hat{\mathbf{x}}_{ij}=\mathbf{A}_i (\mathbf{X}_{j}-\bar{\mathbf{X}})=\mathbf{A}_i \hat{\mathbf{X}_j}$
이미 세계의 중심이 센터라면? $\hat{\mathbf{x}}_{ij}=\mathbf{A}_i \mathbf{X}_j$
관계식은 구했으나, 좌변만 현재 계산 가능한 상황.

Factorization

다음 행렬을 측정 행렬이라고 한다.$\mathbf{D}=[\hat{\mathbf{x}}_{ij}]_{1\leq i\leq m, 1\leq j\leq n}$. 이 때 각 원소가 $2\times1$ 벡터라 $2m\times n$ 행렬임에 주의.
이제 아까 구했던 센터링 식으로 행렬을 찢으면,

$\mathbf{D}=\begin{bmatrix}\mathbf{A}_1\\ \mathbf{A}_2\\ \vdots\\ \mathbf{A}_m\end{bmatrix}_{2m\times3}\begin{bmatrix}\mathbf{X}_1&\mathbf{X}_2&\cdots\mathbf{X}_n\end{bmatrix}_{3\times n}=\mathbf{M}\mathbf{S}$

식에 따라서 측정 행렬의 랭크가 3임을 알 수 있다.
측정 행렬을 어떻게 쪼갤 것인가? 랭크가 3이므로 singular value가 3개인 SVD
크기가 3인 SVD를 고려하여 각각 $\mathbf{X}_3, \mathbf{X}_3, \mathbf{X}_3$을 생각하면, $\mathbf{D}=\mathbf{U}_3(\mathbf{W}_3\mathbf{V}_3^\top)=\mathbf{M}\mathbf{S}$
그런데 측정 행렬이 노이즈나 근사치 때매 랭크가 3을 넘어가면 어떢함?→ 정리에 의해 위 분해가 가장 가까운 rank-3 approximation임이 밝혀져있다.(프로베니우스 노름 기준)

Ambiguity in reconstruction

Affine ambiguity

$\mathbf{D}=\mathbf{M}H\cdot H^{-1}\mathbf{S}$ 분해가 유일하지 않아서 추가 조건이 필요함.

Similarity ambiguity

3차원 scene은 up to similarity transformation에 의해 유일함.
카메라가 칼리브레이트되어있으면, 이게 유일한 모호성임.
어찌보면 직관적으로 당연함. 이미지만 가지고 적대적인 크기를 알 수 있음?
이런 ambiguity를 해결하는 방법을 metric reconstruction이라고 한다.

Perspective SFM

더 일반적인 상황에서는 $4\times 4$ projective transformation에 의해 ambiguity가 생긴다.
같은 상황에서 행렬 마지막 원소만 1인 $4\times3$ 행렬이 각각 생기고, 따라서 $\mathbf{x}_{ij}=\mathbf{M}_i\mathbf{X}_j$에서 $2m\times n$ 개의 방정식이 $11m+3n-15$개 변수를 풀기 위해 생긴다.
Metric reconstruction: 이 perspective case에서 projective ambiguity를 해결하는 문제를 Self-calibration이라고 한다.

SFM 1: Algebraic approach(2-view case)

오히려 projective ambiguity를 활용하여 한쪽 카메라 행렬이 canonical한 경우만 고려할 수 있다. $\hat{M}_1=M_1 H^{-1}$이 canonical, $\hat{M}_2=M_2 H^{-1}$가 일반 카메라라고 가정하고 $\hat{\mathbf{X}}=H\mathbf{X}$라고 하자.
그러면 $\mathbf{x}=[\mathbf{I}|0]\hat{\mathbf{X}}$이고 $\mathbf{x}’=[\mathbf{A}|\mathbf{b}]\hat{\mathbf{X}}=\mathbf{A}\mathbf{x}+\mathbf{b}$라는 관계식을 얻을 수 있다.
이 관계식으로부터, $\mathbf{x}’\times\mathbf{b}=\mathbf{A}\mathbf{x}\times\mathbf{b}$, 이 외적은 $\mathbf{x}’$과 수직이므로 $0=\mathbf{x}’^\top\cdot(\mathbf{A}\mathbf{x}\times\mathbf{b})=\mathbf{x}’^\top\cdot(\mathbf{b}\times\mathbf{A}\mathbf{x})=\mathbf{x}’^\top F \mathbf{x}$
즉 fundamental matrix에 대한 constraint로 치환할 수 있다.
$\mathbf{b}$의 계산: 정의에 의해 fundamental matrix와 곱하면 0이 나온다. fundamental matrix가 singular matrix이므로, $\mathbf{b}$를 SVD를 통해 크기가 1인 Least square solution으로 구할 수 있다.
$\mathbf{A}$의 계산: 계산에 의해 $\mathbf{A}=-[\mathbf{b}_\times]F$임을 알 수 있다.
따라서 $\hat{M}_1=[I, 0]$, $\hat{M}_2=[-[\mathbf{b}_\times]F, \mathbf{b}]$
$\mathbf{b}$의 정체: 사실 fundamental matrix와 곱한 값이 0이 되는 애들은 epipole이다.
Triangulation: 이제 카메라 행렬, 2차원 점을 다 알고 있으므로 SVD를 좀 쓰면 3차원 점을 구할 수 있다.
시점이 $N$개인 경우는? 2개씩 이미지를 잡으면 pairwise solution을 구할 수 있는데, 나중에 bundle adjustment 기법으로 써먹을 수 있다.

SFM 2: Factorization by SVD

Factorization method의 한계: 모든 점이 보여야 함→ 일부 점이 사진에 따라 생략되거나, 이미지 간에 대응이 안될 수 있음
Algebraic method는 2개 시점에만 적용 가능함

SFM 3: Bundle adjustment

SFM을 정밀하게 하는 비선형 방법
다음 reprojection error를 최소화하는 것을 목표로 한다.

$E(M, \mathbf{X})=\sum_{i=1}^m\sum_{j=1}^n D(\mathbf{x}_{ij}, M_i X_j)^2$

여기서 $D$는 비선형함수이므로, 뉴턴 방법이나 Levenberg-Marquardt Algorithm 등을 사용한다.
- 두 방법 다 반복적이고, 초기해로부터 시작함. 초기해가 멀면 느림
- 추정해가 초기해의 함수값으로 나올 수 있음
- 뉴턴 방법에선 J랑 H를 계산해야됨
- LM 알고리즘에선 H는 계산할 필요 없음
Bundle adjustment의 장점: 많은 시각을 다룰 수 있음, 안보이는 점도 커버함.
Bundle adjustment의 단점: 최적화 문제가 시각이 많으면 너무 커져버림. 초기 조건이 좋아야함.
보통 SFM의 마지막 스텝에 적용되며, algebraic method나 factorization은 초기해를 주기 위해 보통 사용된다.

Self-calibration

보통 카메라에 대한 몇 가지 가정으로 metric reconstruction이 가능함
쓰이는 테크닉들:
- single-view metrology constraints 사용(Lecture 4)
- Direct approach for 2 views(Kruppa equation)
- Algebraic approach
- Stratified approach
bundle adjustment 과정에서 카메라 정보를 주입하면, similarity ambiguity만 빼고 제거할 수 있다.

Direct Approach(Kruppa Eq.)

Algebraic Approach

Application

Reconstruction and Texture mapping(1998)
Incremental reconstruction of construction sites(2008)

Summary

구조와 움직임을 추정한다.
1. algebraic
2. factorization
3. bundle adjustment
Perspective에서 metric으로 고친다(self-calibration)
Bundle Adjustment

또는,

self-calibration constraints가 있는 상태로 바로 bundle adjustment