Active Pictorial Structures

  1. Definition
  2. Gaussian Markov Random Field
  3. Model
  4. Cost Function and Optimization
  5. Fitting Example
  6. References
  7. API Documentation

We highly recommend that you render all matplotlib figures inline in the Jupyter notebook for the best menpowidgets experience. This can be done by running
%matplotlib inline
in a cell. Note that you only have to run it once and not in every cell that renders a figure.

1. Definition

Active Pictorial Structures (APS) is a statistical deformable model of the shape and appearance of a deformable object class. It is a generative model that takes advantage of the strengths, and overcomes the disadvantages, of both Pictorial Structures (PS) [3] and Active Appearance Models (AAMs) [2]. APS is motivated by the tree-based structure of PS and further expands on it, as it can formulate the relations between parts using any graph structure, not only trees. From AAMs, it borrows the use of the Gauss-Newton algorithm in combination with a statistical shape model. The weighted inverse compositional algorithm with fixed Jacobian and Hessian that is used for optimization is very fast. In this page, we provide a basic mathematical definition of APS. For a more in-depth explanation, please refer to the relevant literature in References and especially [1].

A shape instance of a deformable object is represented as $\mathbf{s}=\big[{\mathbf{\ell}_1}^{\mathsf{T}},{\mathbf{\ell}_2}^{\mathsf{T}},\ldots,{\mathbf{\ell}_L}^{\mathsf{T}}\big]^{\mathsf{T}}=\big[x_1,y_1,\ldots,x_L,y_L\big]^{\mathsf{T}}$, a $2L\times 1$ vector consisting of the coordinates of the $L$ landmark points $\mathbf{\ell}_i=[x_i,y_i],~\forall i=1,\ldots,L$. Moreover, let us denote by $\mathcal{A}(\mathbf{I}, \mathbf{s})$ a patch-based feature extraction function that returns an $M\times 1$ feature vector given an input image $\mathbf{I}$ and a shape instance $\mathbf{s}$. The features (e.g. SIFT) are computed on patches that are centered around the landmark points.
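
As a concrete (and purely illustrative) example of these two quantities, the snippet below flattens $L$ landmark points into the $2L\times 1$ shape vector and crops raw pixel patches around them as a simple stand-in for the patch-based feature function $\mathcal{A}(\mathbf{I}, \mathbf{s})$; in practice dense features such as SIFT would be computed on the patches.

```python
import numpy as np

def to_shape_vector(points):
    # points: L x 2 array of (x_i, y_i) -> 2L vector [x_1, y_1, ..., x_L, y_L]
    return points.reshape(-1)

def extract_patch_features(image, points, patch_size=8):
    """Very simple stand-in for A(I, s): crop a square patch around every
    landmark and concatenate the raw pixels into a single M x 1 vector."""
    h = patch_size // 2
    feats = []
    for x, y in np.round(points).astype(int):
        patch = image[y - h:y + h, x - h:x + h]   # image indexed as [row, col]
        feats.append(patch.reshape(-1))
    return np.concatenate(feats)

# Toy usage on a random grayscale image with L = 4 landmarks
rng = np.random.default_rng(0)
image = rng.random((64, 64))
points = np.array([[20, 20], [40, 20], [20, 40], [40, 40]], dtype=float)
s = to_shape_vector(points)                  # shape (8,)   -> 2L x 1
a = extract_patch_features(image, points)    # shape (256,) -> M x 1, M = L * 8 * 8
print(s.shape, a.shape)
```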

2. Gaussian Markov Random Field

Let us define an undirected graph between the $L$ landmark points of an object as $G=(V, E)$, where $V=\big\lbrace v_1, v_2, \ldots, v_L\big\rbrace$ is the set of $L$ vertexes and there is an edge $(v_i, v_j)\in E$ for each pair of connected landmark points. Moreover, let us assume that we have a set of random variables $X=\{X_i\},~\forall i:v_i\in V$ which represent an abstract feature vector of length $k$ extracted from each vertex $v_i$, i.e. $\mathbf{x}_i,~\forall i:v_i\in V$. We model the likelihood of two random variables that correspond to connected vertexes with a normal distribution $$p(X_i=\mathbf{x}_i, X_j=\mathbf{x}_j|G)\sim\mathcal{N}(\mathbf{m}_{ij}, \mathbf{\Sigma}_{ij}),~\forall i,j: (v_i,v_j)\in E$$ where $\mathbf{m}_{ij}$ is the $2k\times 1$ mean vector and $\mathbf{\Sigma}_{ij}$ is the $2k\times 2k$ covariance matrix. Consequently, the cost of observing a set of feature vectors $\{\mathbf{x}_i\},~\forall i: v_i\in V$ can be computed using a Mahalanobis distance per edge, i.e. $$\sum_{\forall i,j: (v_i,v_j)\in E}\left(\left[\begin{array}{c}\mathbf{x}_i\\ \mathbf{x}_j\end{array}\right]-\mathbf{m}_{ij}\right)^{\mathsf{T}} {\mathbf{\Sigma}_{ij}}^{-1} \left(\left[\begin{array}{c}\mathbf{x}_i\\ \mathbf{x}_j\end{array}\right]-\mathbf{m}_{ij}\right)$$ In practice, evaluating this cost is expensive because it requires looping over all the edges of the graph. Inference can be much faster if we convert it to the equivalent matrix form $$(\mathbf{x}-\mathbf{m})^{\mathsf{T}}\mathbf{\Sigma}^{-1}(\mathbf{x}-\mathbf{m})$$ This is equivalent to modelling the set of random variables $X$ with a Gaussian Markov Random Field (GMRF) [4]. A GMRF is described by an undirected graph, where the vertexes stand for random variables and the edges impose statistical constraints on these random variables. Thus, the GMRF models the set of random variables with a multivariate normal distribution $p(X=\mathbf{x}|G)\sim\mathcal{N}(\mathbf{m},\mathbf{\Sigma})$, where $\mathbf{m}=\big[{\mathbf{m}_1}^{\mathsf{T}},\ldots,{\mathbf{m}_L}^{\mathsf{T}}\big]^{\mathsf{T}}=\big[E(X_1)^{\mathsf{T}},\ldots,E(X_L)^{\mathsf{T}}\big]^{\mathsf{T}}$ is the $kL\times 1$ mean vector and $\mathbf{\Sigma}$ is the $kL\times kL$ overall covariance matrix. We denote by $\mathbf{Q}$ the block-sparse precision matrix that is the inverse of the covariance matrix, i.e. $\mathbf{Q}=\mathbf{\Sigma}^{-1}$. By applying the GMRF we make the assumption that the random variables satisfy the three Markov properties (pairwise, local and global) and that the blocks of the precision matrix that correspond to disjoint vertexes are zero, i.e. $\mathbf{Q}_{ij}=\mathbf{0}_{k\times k},~\forall i,j: (v_i, v_j)\notin E$.
By defining $\mathcal{G}_i=\big\lbrace(i-1)k+1, (i-1)k+2, \ldots, ik\big\rbrace$ to be a set of indices for sampling a matrix, we can prove that the structure of the precision matrix is $$\mathbf{Q}=\left\lbrace\begin{array}{ll} \sum_{\forall j: (v_i, v_j)\in E}{\mathbf{\Sigma}_{ij}}^{-1}(\mathcal{G}_1,\mathcal{G}_1) + \sum_{\forall j: (v_j, v_i)\in E}{\mathbf{\Sigma}_{ji}}^{-1}(\mathcal{G}_2,\mathcal{G}_2),~\forall i: v_i\in V, & \text{at~}(\mathcal{G}_i, \mathcal{G}_i)\\ {\mathbf{\Sigma}_{ij}}^{-1}(\mathcal{G}_1,\mathcal{G}_2),~\forall i,j: (v_i, v_j)\in E, & \text{at~}(\mathcal{G}_i, \mathcal{G}_j)\text{~and~}(\mathcal{G}_j, \mathcal{G}_i)\\ 0, & \text{elsewhere} \end{array}\right.$$
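
To make the block structure of $\mathbf{Q}$ concrete, here is a small numpy sketch (our own illustration, not library code) that assembles the precision matrix of an undirected GMRF from per-edge covariance matrices, following the formula above; the toy graph and covariances are made up.

```python
import numpy as np

def gmrf_precision_undirected(edges, edge_covs, L, k):
    """Assemble the kL x kL block-sparse precision matrix Q from the
    2k x 2k covariance matrix of every undirected edge (i, j)."""
    Q = np.zeros((L * k, L * k))
    for (i, j), S_ij in zip(edges, edge_covs):
        P = np.linalg.inv(S_ij)            # 2k x 2k precision of edge (i, j)
        gi = slice(i * k, (i + 1) * k)     # index set G_i (0-based)
        gj = slice(j * k, (j + 1) * k)     # index set G_j
        Q[gi, gi] += P[:k, :k]             # Sigma_ij^{-1}(G_1, G_1)
        Q[gj, gj] += P[k:, k:]             # Sigma_ij^{-1}(G_2, G_2)
        Q[gi, gj] += P[:k, k:]             # off-diagonal blocks
        Q[gj, gi] += P[k:, :k]
    return Q

# Toy usage: 3 vertexes connected in a chain, feature length k = 2
k, L = 2, 3
edges = [(0, 1), (1, 2)]
rng = np.random.default_rng(1)
edge_covs = []
for _ in edges:
    A = rng.standard_normal((2 * k, 2 * k))
    edge_covs.append(A @ A.T + 2 * k * np.eye(2 * k))   # random SPD covariance
Q = gmrf_precision_undirected(edges, edge_covs, L, k)
print(np.round(Q, 2))   # blocks for the non-edge (0, 2) remain zero
```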

Using the same assumptions and given a directed graph (cyclic or acyclic) $G = (V, E)$, where $(v_i, v_j)\in E$ means that $v_i$ is the parent of $v_j$, we can show that $$(\mathbf{x}-\mathbf{m})^{\mathsf{T}}\mathbf{\Sigma}^{-1}(\mathbf{x}-\mathbf{m}) = \sum_{\forall i,j: (v_i, v_j)\in E} (\mathbf{x}_i-\mathbf{x}_j-\mathbf{m}_{ij})^{\mathsf{T}}{\mathbf{\Sigma}_{ij}}^{-1}(\mathbf{x}_i-\mathbf{x}_j-\mathbf{m}_{ij})$$ is true if $$\mathbf{Q}=\left\lbrace\begin{array}{ll} \sum_{\forall j: (v_i, v_j)\in E}{\mathbf{\Sigma}_{ij}}^{-1} + \sum_{\forall j: (v_j, v_i)\in E}{\mathbf{\Sigma}_{ji}}^{-1},~\forall i: v_i\in V, & \text{at~}(\mathcal{G}_i, \mathcal{G}_i)\\ -{\mathbf{\Sigma}_{ij}}^{-1},~\forall i,j: (v_i, v_j)\in E, & \text{at~}(\mathcal{G}_i, \mathcal{G}_j)\text{~and~}(\mathcal{G}_j, \mathcal{G}_i)\\ 0, & \text{elsewhere} \end{array}\right.$$ where $\mathbf{m}_{ij}=E(X_i-X_j)$ and $\mathbf{m}=\big[{\mathbf{m}_1}^{\mathsf{T}},\ldots,{\mathbf{m}_L}^{\mathsf{T}}\big]^{\mathsf{T}}=\big[E(X_1)^{\mathsf{T}},\ldots,E(X_L)^{\mathsf{T}}\big]^{\mathsf{T}}$. In this case, if $G$ is a tree, then we have a Bayesian network. Please refer to the supplementary material of [1] for detailed proofs of the above.
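
A companion sketch for the directed case, which also verifies numerically that the per-edge sum of Mahalanobis distances equals the matrix form $(\mathbf{x}-\mathbf{m})^{\mathsf{T}}\mathbf{Q}(\mathbf{x}-\mathbf{m})$; again, the graph, covariances and samples are synthetic.

```python
import numpy as np

def gmrf_precision_directed(edges, edge_covs, L, k):
    """Precision matrix for the directed case, where the edge statistics
    model the difference x_i - x_j (v_i is the parent of v_j)."""
    Q = np.zeros((L * k, L * k))
    for (i, j), S_ij in zip(edges, edge_covs):
        P = np.linalg.inv(S_ij)            # k x k precision of edge (i, j)
        gi = slice(i * k, (i + 1) * k)
        gj = slice(j * k, (j + 1) * k)
        Q[gi, gi] += P
        Q[gj, gj] += P
        Q[gi, gj] -= P
        Q[gj, gi] -= P
    return Q

# Check the equivalence on a toy tree: 0 -> 1, 0 -> 2, with k = 2
k, L = 2, 3
edges = [(0, 1), (0, 2)]
rng = np.random.default_rng(2)
edge_covs = [np.diag(rng.uniform(0.5, 2.0, k)) for _ in edges]
Q = gmrf_precision_directed(edges, edge_covs, L, k)

x = rng.standard_normal(L * k)
m = rng.standard_normal(L * k)             # concatenation of the per-vertex means
lhs = (x - m) @ Q @ (x - m)
rhs = 0.0
for (i, j), S_ij in zip(edges, edge_covs):
    # m_ij = E(X_i - X_j) = m_i - m_j
    d = (x[i*k:(i+1)*k] - x[j*k:(j+1)*k]) - (m[i*k:(i+1)*k] - m[j*k:(j+1)*k])
    rhs += d @ np.linalg.inv(S_ij) @ d
print(np.isclose(lhs, rhs))               # True
```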

3. Model

An APS [1] is trained using a set of $N$ images $\big\lbrace\mathbf{I}_1,\mathbf{I}_2,\ldots,\mathbf{I}_N\big\rbrace$ that are annotated with a set of $L$ landmarks. It consists of the following parts:

  • Shape Model
    The shape model is trained as explained in the Point Distribution Model section. The training shapes $\big\lbrace\mathbf{s}_1,\mathbf{s}_2,\ldots,\mathbf{s}_N\big\rbrace$ are first aligned using Generalized Procrustes Analysis and then an orthonormal basis is created using Principal Component Analysis (PCA), which is further augmented with four eigenvectors that represent the similarity transform (scaling, in-plane rotation and translation). This results in $\big\lbrace\bar{\mathbf{s}}, \mathbf{U}\big\rbrace$, where $\mathbf{U}\in\mathbb{R}^{2L\times n}$ is the orthonormal basis of $n$ eigenvectors (including the four similarity components) and $\bar{\mathbf{s}}\in\mathbb{R}^{2L\times 1}$ is the mean shape vector. We define the function $\mathcal{S}$ that generates a new shape instance given the shape model's basis, an input shape and the shape parameters vector as $$\mathcal{S}(\mathbf{s}, \mathbf{p}) = \mathbf{s} + \mathbf{U}\mathbf{p}$$ where $\mathbf{p}=\big[p_1,p_2,\ldots,p_n\big]^{\mathsf{T}}$ are the parameters' values. Similarly, we define the set of functions $\mathcal{S}_i,~\forall i=1,\ldots,L$ that return the coordinates of the $i^{\text{th}}$ landmark of the shape instance as $$\mathcal{S}_i(\mathbf{s}, \mathbf{p}) = \mathbf{s}_{2i-1, 2i} + \mathbf{U}_{2i-1, 2i}\mathbf{p},~\forall i=1,\ldots,L$$ where $\mathbf{s}_{2i-1, 2i}$ denotes the coordinates' vector of the $i^{\text{th}}$ landmark point and $\mathbf{U}_{2i-1, 2i}$ denotes the $2i-1$ and $2i$ rows of the shape subspace $\mathbf{U}$.

    Another way to build the shape model is by using a GMRF structure. Specifically, given an undirected graph $G^s = (V^s, E^s)$ and assuming that the pairwise locations' vector of two connected landmarks follows a normal distribution as $\big[x_i, y_i, x_j, y_j\big]^{\mathsf{T}}\sim\mathcal{N}(\mathbf{m}_{ij}^s, \mathbf{\Sigma}_{ij}^s)$, we formulate a GMRF. This can be expressed in matrix form, as shown in the GMRF paragraph, as $p(\mathbf{s}|G^s)\sim\mathcal{N}(\bar{\mathbf{s}}, \mathbf{\Sigma}^s)$. After obtaining the GMRF precision matrix $\mathbf{Q}^s$, we can invert it and apply PCA on the resulting covariance matrix in order to obtain a linear shape model. Note that, as shown in [1], it is more beneficial to directly apply PCA rather than using a GMRF for the shape model.

  • Appearance Model
    Given an undirected graph $G^a=(V^a, E^a)$ and assuming that the concatenation of the appearance vectors of two connected landmarks follows a normal distribution, we form a GMRF that can be expressed as $p(\mathcal{A}(\mathbf{I},\mathbf{s})|G^a)\sim\mathcal{N}(\bar{\mathbf{a}},\mathbf{\Sigma}^a)$, where $\bar{\mathbf{a}}$ is the $M\times 1$ mean appearance vector and $\mathbf{Q}^a={\mathbf{\Sigma}^a}^{-1}$ is the $M\times M$ precision matrix that is formulated as shown in the GMRF paragraph. During the training of the appearance model, a low-rank representation of each edgewise covariance matrix $\mathbf{\Sigma}_{ij}^a$ is obtained by keeping the first $m$ singular values of its SVD factorization. Given $\bar{\mathbf{a}}$ and $\mathbf{Q}^a$, the cost of an observed appearance vector $\mathcal{A}(\mathbf{I},\mathbf{s})$ corresponding to a shape instance $\mathbf{s}=\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})$ in an image $\mathbf{I}$ is $$\big\lVert\mathcal{A}\big(\mathbf{I},\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})\big)-\bar{\mathbf{a}}\big\rVert^2_{\mathbf{Q}^a} = \big[\mathcal{A}\big(\mathbf{I},\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})\big)-\bar{\mathbf{a}}\big]^{\mathsf{T}} \mathbf{Q}^a \big[\mathcal{A}\big(\mathbf{I},\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})\big)-\bar{\mathbf{a}}\big]$$

  • Deformation Prior
    This is similar to the deformation model of Pictorial Structures [3]. Specifically, we define a directed (cyclic or acyclic) graph between the landmark points $G^d=(V^d,E^d)$ and model the relative locations between the parent and child of each edge with a GMRF. We assume that the relative location between the vertexes of each edge follows a normal distribution $[x_i-x_j, y_i-y_j]^{\mathsf{T}}\sim\mathcal{N}(\mathbf{m}_{ij}^d, \mathbf{\Sigma}_{ij}^d),~\forall i,j: (v_i^d,v_j^d)\in E^d$ and model the overall structure with a GMRF that has a $2L\times 2L$ block-sparse precision matrix $\mathbf{Q}^d$, computed as shown in the GMRF paragraph. Note that the mean relative location vector is the same as the mean shape vector. This spring-like prior model manages to constrain extreme shape configurations that could be evoked during fitting with a very bad initialization, leading the optimization process towards a better result. Given $\bar{\mathbf{s}}$ and $\mathbf{Q}^d$, the cost of observing a shape instance $\mathbf{s}=\mathcal{S}(\bar{\mathbf{s}}, \mathbf{p})$ is $$\big\lVert\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})-\bar{\mathbf{s}}\big\rVert^2_{\mathbf{Q}^d} = \big\lVert\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})-\mathcal{S}(\bar{\mathbf{s}},\mathbf{0})\big\rVert^2_{\mathbf{Q}^d} = \mathcal{S}(\mathbf{0},\mathbf{p})^{\mathsf{T}} \mathbf{Q}^d \mathcal{S}(\mathbf{0},\mathbf{p})$$ where we use the properties $\mathcal{S}(\bar{\mathbf{s}},\mathbf{0})=\bar{\mathbf{s}}+\mathbf{U}\mathbf{0}=\bar{\mathbf{s}}$ and $\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})-\bar{\mathbf{s}}=\bar{\mathbf{s}}+\mathbf{U}\mathbf{p}-\bar{\mathbf{s}}=\mathcal{S}(\mathbf{0},\mathbf{p})$. A small numerical sketch of the appearance and deformation cost terms is given right after this list.
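
To make the two cost terms above concrete, the following minimal numpy sketch (an illustration, not menpofit code) evaluates the appearance Mahalanobis cost and the deformation prior cost. All model quantities ($\bar{\mathbf{s}}$, $\mathbf{U}$, $\bar{\mathbf{a}}$, $\mathbf{Q}^a$, $\mathbf{Q}^d$) and the observed appearance vector are random stand-ins for a trained model and for $\mathcal{A}(\mathbf{I},\mathcal{S}(\bar{\mathbf{s}},\mathbf{p}))$.

```python
import numpy as np

def shape_instance(s_bar, U, p):
    # S(s_bar, p) = s_bar + U p  (2L x 1 shape vector)
    return s_bar + U @ p

def deformation_cost(U, Q_d, p):
    # ||S(s_bar, p) - s_bar||^2_{Q^d} = S(0, p)^T Q^d S(0, p), with S(0, p) = U p
    d = U @ p
    return float(d @ Q_d @ d)

def appearance_cost(a, a_bar, Q_a):
    # ||a - a_bar||^2_{Q^a}: Mahalanobis distance under the GMRF precision matrix
    r = a - a_bar
    return float(r @ Q_a @ r)

# Toy example with random (hypothetical) model quantities
L, n, M = 5, 8, 5 * 36                     # landmarks, shape params, appearance length
rng = np.random.default_rng(0)
s_bar = rng.standard_normal(2 * L)
U = np.linalg.qr(rng.standard_normal((2 * L, n)))[0]   # orthonormal shape basis
a_bar = rng.standard_normal(M)
Q_a = np.eye(M)                            # stands in for the appearance precision matrix
Q_d = np.eye(2 * L)                        # stands in for the deformation precision matrix
p = 0.1 * rng.standard_normal(n)

s = shape_instance(s_bar, U, p)
a = a_bar + 0.01 * rng.standard_normal(M)  # stands in for A(I, S(s_bar, p))
print(appearance_cost(a, a_bar, Q_a) + 0.5 * deformation_cost(U, Q_d, p))
```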

4. Cost Function and Optimization

Given a test image $\mathbf{I}$, the optimization cost function of APS is $$\arg\min_{\mathbf{p}} \big\lVert\mathcal{A}\big(\mathbf{I},\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})\big)-\bar{\mathbf{a}}\big\rVert^2_{\mathbf{Q}^a} + \lambda\big\lVert\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})-\bar{\mathbf{s}}\big\rVert^2_{\mathbf{Q}^d}$$ The minimization of this function with respect to the shape parameters $\mathbf{p}$ is performed with a variant of the Gauss-Newton algorithm. The hyper-parameter $\lambda$ controls the weighting between the appearance and deformation costs and, given the generative nature of the model, it can greatly affect performance. An optimal value of this hyper-parameter can be found using a validation dataset. The optimization procedure can be applied in two different ways, depending on the coordinate system in which the shape parameters are updated: (1) forward and (2) inverse. Additionally, the parameters update could be carried out either in an additive or a compositional manner. The compositional update has the form $\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})\leftarrow\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})\circ\mathcal{S}(\bar{\mathbf{s}},\Delta\mathbf{p})^{-1}=\mathcal{S}(\mathcal{S}(\bar{\mathbf{s}},-\Delta\mathbf{p}),\mathbf{p}) = \mathcal{S}(\bar{\mathbf{s}},\mathbf{p}-\Delta\mathbf{p})$. Consequently, due to the translational nature of the motion model, the compositional parameters update reduces to a parameters subtraction, $\mathbf{p}\leftarrow\mathbf{p}-\Delta\mathbf{p}$, which is equivalent to the additive update.

  • Weighted Inverse Compositional with fixed Jacobian and Hessian
    By using this compositional update of the parameters and having an initial estimate of $\mathbf{p}$, the cost function of APS is expressed as minimizing $$\arg\min_{\Delta\mathbf{p}} \big\lVert\mathcal{A}\big(\mathbf{I}, \mathcal{S}(\bar{\mathbf{s}},\mathbf{p})\big)-\bar{\mathbf{a}}(\mathcal{S}(\bar{\mathbf{s}},\Delta\mathbf{p}))\big\rVert^2_{\mathbf{Q}^a} + \lambda\big\lVert\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})-\mathcal{S}(\bar{\mathbf{s}},\Delta\mathbf{p})\big\rVert^2_{\mathbf{Q}^d}$$ with respect to $\Delta\mathbf{p}$. With some abuse of notation, since $\bar{\mathbf{a}}$ is a vector, $\bar{\mathbf{a}}(\mathcal{S}(\bar{\mathbf{s}},\Delta\mathbf{p}))$ can be described as $$\bar{\mathbf{a}}(\mathcal{S}(\bar{\mathbf{s}},\Delta\mathbf{p}))=\left[\begin{array}{c}\mathbf{m}_1^a(\mathcal{S}_1(\bar{\mathbf{s}},\Delta\mathbf{p}))\\ \vdots\\ \mathbf{m}_L^a(\mathcal{S}_L(\bar{\mathbf{s}},\Delta\mathbf{p}))\end{array}\right]$$ where $\mathbf{m}_i^a,~\forall i=1,\ldots,L$ is the mean appearance vector per patch. In order to find the solution, we need to linearize around $\Delta\mathbf{p}=\mathbf{0}$ as $$\left\lbrace\begin{array}{l}\bar{\mathbf{a}}(\mathcal{S}(\bar{\mathbf{s}},\Delta\mathbf{p}))\approx\bar{\mathbf{a}} + {\left.\mathbf{J}_{\bar{\mathbf{a}}}\right|}_{\mathbf{p}=\mathbf{0}}\Delta\mathbf{p}\\ \mathcal{S}(\bar{\mathbf{s}},\Delta\mathbf{p})\approx\bar{\mathbf{s}} + {\left.\mathbf{J}_{\mathcal{S}}\right|}_{\mathbf{p}=\mathbf{0}}\Delta\mathbf{p} \end{array}\right.$$ where ${\left.\mathbf{J}_{\mathcal{S}}\right|}_{\mathbf{p}=\mathbf{0}} = \mathbf{J}_{\mathcal{S}} = \frac{\partial\mathcal{S}}{\partial\mathbf{p}}=\mathbf{U}$ is the $2L\times n$ shape Jacobian and ${\left.\mathbf{J}_{\bar{\mathbf{a}}}\right|}_{\mathbf{p}=\mathbf{0}} = \mathbf{J}_{\bar{\mathbf{a}}}$ is the $M\times n$ appearance Jacobian $$\mathbf{J}_{\bar{\mathbf{a}}} = \nabla\bar{\mathbf{a}}\frac{\partial\mathcal{S}}{\partial\mathbf{p}} = \nabla\bar{\mathbf{a}}\mathbf{U} = \left[\begin{array}{c}\nabla\mathbf{m}_1^a\mathbf{U}_{1,2}\\ \vdots\\ \nabla\mathbf{m}_L^a\mathbf{U}_{2L-1,2L}\end{array}\right]$$ where $\mathbf{U}_{2i-1,2i}$ denotes the $2i-1$ and $2i$ row vectors of the basis $\mathbf{U}$. Note that we make an abuse of notation by writing $\nabla\mathbf{m}_i^a$ because $\mathbf{m}_i^a$ is a vector; it represents the gradient of the mean patch-based appearance that corresponds to landmark $i$.
By substituting, taking the partial derivative with respect to $\Delta\mathbf{p}$, equating it to $\mathbf{0}$ and solving for $\Delta\mathbf{p}$, we get $$\Delta\mathbf{p}=\mathbf{H}^{-1} \big[{\mathbf{J}_{\bar{\mathbf{a}}}}^{\mathsf{T}} \mathbf{Q}^a \left(\mathcal{A}(\mathbf{I}, \mathcal{S}(\bar{\mathbf{s}},\mathbf{p}))-\bar{\mathbf{a}}\right) + \lambda\mathbf{H}_{\mathcal{S}}\mathbf{p}\big]$$ where $$\left.\begin{array}{l}\mathbf{H}_{\bar{\mathbf{a}}}={\mathbf{J}_{\bar{\mathbf{a}}}}^{\mathsf{T}} \mathbf{Q}^a \mathbf{J}_{\bar{\mathbf{a}}}\\ \mathbf{H}_{\mathcal{S}}={\mathbf{J}_{\mathcal{S}}}^{\mathsf{T}} \mathbf{Q}^d \mathbf{J}_{\mathcal{S}}=\mathbf{U}^{\mathsf{T}}\mathbf{Q}^d\mathbf{U}\end{array}\right\rbrace\Rightarrow \mathbf{H}=\mathbf{H}_{\bar{\mathbf{a}}}+\lambda\mathbf{H}_{\mathcal{S}}$$ is the combined $n\times n$ Hessian matrix and we use the property ${\mathbf{J}_{\mathcal{S}}}^{\mathsf{T}} \mathbf{Q}^d \left(\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})-\bar{\mathbf{s}}\right) = \mathbf{U}^{\mathsf{T}}\mathbf{Q}^d\mathbf{U}\mathbf{p}=\mathbf{H}_{\mathcal{S}}\mathbf{p}$. Note that $\mathbf{J}_{\bar{\mathbf{a}}}$, $\mathbf{H}_{\bar{\mathbf{a}}}$, $\mathbf{H}_{\mathcal{S}}$ and $\mathbf{H}^{-1}$ can all be precomputed. The computational cost per iteration is only $\mathcal{O}(nM)$, which is essentially a multiplication between an $n\times M$ matrix and an $M\times 1$ vector, leading to a very fast fitting algorithm. A numpy sketch of this update loop is given right after this list.

  • Forward
    The cost function in this case takes the form of minimizing $$\arg\min_{\Delta\mathbf{p}} \big\lVert\mathcal{A}\big(\mathbf{I},\mathcal{S}(\bar{\mathbf{s}},\mathbf{p}+\Delta\mathbf{p})\big)-\bar{\mathbf{a}}\big\rVert^2_{\mathbf{Q}^a} + \lambda\big\lVert\mathcal{S}(\mathbf{0},\mathbf{p}+\Delta\mathbf{p})\big\rVert^2_{\mathbf{Q}^d}$$ with respect to $\Delta\mathbf{p}$. In order to find the solution, we need to linearize around $\mathbf{p}$, thus using a first-order Taylor expansion at $\mathbf{p}+\Delta\mathbf{p}=\mathbf{p}\Rightarrow\Delta\mathbf{p}=\mathbf{0}$ as $$\left\lbrace\begin{array}{l}\mathcal{A}(\mathbf{I},\mathcal{S}(\bar{\mathbf{s}},\mathbf{p}+\Delta\mathbf{p}))\approx\mathcal{A}(\mathbf{I},\mathcal{S}(\bar{\mathbf{s}},\mathbf{p})) + {\left.\mathbf{J}_{\mathcal{A}}\right|}_{\mathbf{p}=\mathbf{p}}\Delta\mathbf{p}\\ \mathcal{S}(\mathbf{0},\mathbf{p}+\Delta\mathbf{p})\approx\mathcal{S}(\mathbf{0},\mathbf{p}) + {\left.\mathbf{J}_{\mathcal{S}}\right|}_{\mathbf{p}=\mathbf{p}}\Delta\mathbf{p}\end{array}\right.$$ where ${\left.\mathbf{J}_{\mathcal{S}}\right|}_{\mathbf{p}=\mathbf{p}} = \mathbf{J}_{\mathcal{S}} = \frac{\partial\mathcal{S}}{\partial\mathbf{p}}=\mathbf{U}$ is the $2L\times n$ shape Jacobian and ${\left.\mathbf{J}_{\mathcal{A}}\right|}_{\mathbf{p}=\mathbf{p}} = \mathbf{J}_{\mathcal{A}}$ is the $M\times n$ appearance Jacobian $$\mathbf{J}_{\mathcal{A}} = \nabla\mathcal{A}\frac{\partial\mathcal{S}}{\partial\mathbf{p}} = \nabla\mathcal{A}\mathbf{U} = \left[\begin{array}{c}\nabla\mathcal{A}(\mathbf{I},\mathcal{S}_1(\bar{\mathbf{s}},\mathbf{p}))\mathbf{U}_{1,2}\\\nabla\mathcal{A}(\mathbf{I},\mathcal{S}_2(\bar{\mathbf{s}},\mathbf{p}))\mathbf{U}_{3,4}\\\vdots\\\nabla\mathcal{A}(\mathbf{I},\mathcal{S}_L(\bar{\mathbf{s}},\mathbf{p}))\mathbf{U}_{2L-1,2L}\end{array}\right]$$ where $\mathbf{U}_{2i-1,2i}$ denotes the $2i-1$ and $2i$ row vectors of the basis $\mathbf{U}$. By substituting, taking the partial derivative with respect to $\Delta\mathbf{p}$, equating it to $\mathbf{0}$ and solving for $\Delta\mathbf{p}$, we get $$\Delta\mathbf{p}=-\mathbf{H}^{-1} \big[{\mathbf{J}_{\mathcal{A}}}^{\mathsf{T}} \mathbf{Q}^a \left(\mathcal{A}(\mathbf{I},\mathcal{S}(\bar{\mathbf{s}},\mathbf{p}))-\bar{\mathbf{a}}\right) + \lambda\mathbf{H}_{\mathcal{S}}\mathbf{p}\big]$$ where $$\left.\begin{array}{l}\mathbf{H}_{\mathcal{A}}={\mathbf{J}_{\mathcal{A}}}^{\mathsf{T}} \mathbf{Q}^a \mathbf{J}_{\mathcal{A}}\\ \mathbf{H}_{\mathcal{S}}={\mathbf{J}_{\mathcal{S}}}^{\mathsf{T}} \mathbf{Q}^d \mathbf{J}_{\mathcal{S}}=\mathbf{U}^{\mathsf{T}}\mathbf{Q}^d\mathbf{U}\end{array}\right\rbrace\Rightarrow \mathbf{H}=\mathbf{H}_{\mathcal{A}}+\lambda\mathbf{H}_{\mathcal{S}}$$ is the combined $n\times n$ Hessian matrix, taking into account that ${\mathbf{J}_{\mathcal{S}}}^{\mathsf{T}} \mathbf{Q}^d \mathcal{S}(\mathbf{0},\mathbf{p}) = \mathbf{U}^{\mathsf{T}}\mathbf{Q}^d\mathbf{U}\mathbf{p} = \mathbf{H}_{\mathcal{S}}\mathbf{p}$. $\mathbf{H}_{\mathcal{S}}$ can be precomputed, but $\mathbf{J}_{\mathcal{A}}$ and $\mathbf{H}^{-1}$ need to be computed at each iteration. Consequently, the total computational cost is $\mathcal{O}(nM^2 + nM + n^3)$, which is much slower than the weighted inverse compositional algorithm with fixed Jacobian and Hessian.
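
Here is a hedged numpy sketch (not menpofit's implementation) of one run of the weighted inverse compositional update loop described above. The model quantities `s_bar`, `U`, `a_bar`, `J_a_bar`, `Q_a`, `Q_d` are assumed to come from a trained model, and `extract_features` is a hypothetical stand-in for $\mathcal{A}(\mathbf{I},\cdot)$; `lam` and `n_iters` are arbitrary choices.

```python
import numpy as np

def fit_aps_inverse(image, extract_features, s_bar, U, a_bar, J_a_bar,
                    Q_a, Q_d, lam=1.0, n_iters=20):
    """Weighted inverse compositional Gauss-Newton with fixed Jacobian/Hessian.
    All model terms (s_bar, U, a_bar, J_a_bar, Q_a, Q_d) are precomputed."""
    # Precomputed, iteration-independent quantities
    H_a = J_a_bar.T @ Q_a @ J_a_bar          # appearance Hessian
    H_s = U.T @ Q_d @ U                      # deformation Hessian
    H_inv = np.linalg.inv(H_a + lam * H_s)   # combined Hessian inverse
    JtQ = J_a_bar.T @ Q_a                    # n x M, reused at every iteration

    p = np.zeros(U.shape[1])                 # shape parameters, start at the mean shape
    for _ in range(n_iters):
        s = s_bar + U @ p                    # current shape instance S(s_bar, p)
        a = extract_features(image, s)       # A(I, S(s_bar, p))
        dp = H_inv @ (JtQ @ (a - a_bar) + lam * H_s @ p)   # O(nM) per iteration
        p = p - dp                           # compositional update reduces to subtraction
    return s_bar + U @ p
```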

5. Fitting Example
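
A minimal end-to-end sketch of building and fitting an APS with the menpofit package is given below. The class and function names (`GenerativeAPS`, `GaussNewtonAPSFitter`, `fit_from_bb`) and their arguments are assumptions based on one version of the menpofit API and may differ in your installation; the training path is hypothetical. Treat this as an outline rather than a verified recipe.

```python
# Hedged sketch: names follow the menpofit.aps API as of the time of writing
# and may need adjusting for your installed version.
import menpo.io as mio
from menpo.feature import igo
from menpofit.aps import GenerativeAPS, GaussNewtonAPSFitter

# Load and preprocess a set of annotated training images (path is hypothetical)
training_images = []
for img in mio.import_images('/path/to/trainset/', verbose=True):
    img = img.crop_to_landmarks_proportion(0.2)   # discard irrelevant background
    if img.n_channels == 3:
        img = img.as_greyscale()
    training_images.append(img)

# Train the APS with patch-based features and a two-level pyramid
aps = GenerativeAPS(training_images, holistic_features=igo,
                    patch_shape=(17, 17), diagonal=150, scales=(0.5, 1.0),
                    verbose=True)

# Fit a test image starting from a bounding box initialization
fitter = GaussNewtonAPSFitter(aps, n_shape=[6, 12])
image = mio.import_builtin_asset.breakingbad_jpg()
result = fitter.fit_from_bb(image, image.landmarks['PTS'].bounding_box(),
                            max_iters=20, gt_shape=image.landmarks['PTS'])
print(result)
```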

6. References

[1] E. Antonakos, J. Alabort-i-Medina, and S. Zafeiriou. "Active Pictorial Structures", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[2] I. Matthews, and S. Baker. "Active Appearance Models Revisited", International Journal of Computer Vision, vol. 60, no. 2, pp. 135-164, 2004.

[3] P. F. Felzenszwalb and D. P. Huttenlocher. "Pictorial Structures for Object Recognition", International Journal of Computer Vision (IJCV), vol. 61, no. 1, pp. 55-79, 2005.

[4] H. Rue and L. Held. "Gaussian Markov Random Fields: Theory and Applications", CRC Press, 2005.
