Active Pictorial Structures
- Gaussian Markov Random Field
- Cost Function and Optimization
- Fitting Example
- API Documentation
We highly recommend that you render all matplotlib figures inline in the Jupyter notebook for the best menpowidgets experience. This can be done by running `%matplotlib inline` in a cell. Note that you only have to run it once and not in every rendering cell.
1. Definition
Active Pictorial Structures (APS) is a statistical deformable model of the shape and appearance of a deformable object class. It is a generative model that takes advantage of the strengths, and overcomes the disadvantages, of both Pictorial Structures (PS) [3] and Active Appearance Models (AAMs) [2].
APS is motivated by the tree-based structure of PS and further expands on it, as it can formulate the relations between parts using any graph structure, not only trees. From AAMs, it borrows the use of the Gauss-Newton algorithm in combination with a statistical shape model. The weighted inverse compositional algorithm with fixed Jacobian and Hessian that is used for optimization is very fast. In this page, we provide a basic mathematical definition of APS. For a more in-depth explanation, please refer to the relevant literature in References, and especially [1].
A shape instance of a deformable object is represented as $\mathbf{s} = [x_1, y_1, \ldots, x_L, y_L]^T$, a $2L \times 1$ vector consisting of the coordinates of $L$ landmark points $(x_i, y_i)$. Moreover, let us denote by $\mathcal{F}(\mathbf{I}, \mathbf{s})$ a patch-based feature extraction function that returns an $M \times 1$ feature vector given an input image $\mathbf{I}$ and a shape instance $\mathbf{s}$. The features (e.g. SIFT) are computed on patches that are centered around the landmark points.
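To make the feature extraction step concrete, here is a minimal NumPy sketch of a patch-based feature function. It is a toy stand-in, not part of any library API: it uses raw pixel intensities instead of SIFT, and the function name `extract_patch_features` and the patch size are our own choices.

```python
import numpy as np

def extract_patch_features(image, shape, patch_size=8):
    """Toy patch-based feature function F(I, s): crop a square patch
    centered on every landmark and concatenate the pixel values into
    a single M x 1 feature vector (a real APS would compute e.g. SIFT
    on each patch instead of using raw intensities)."""
    half = patch_size // 2
    features = []
    for y, x in shape:  # shape: (L, 2) array of landmark coordinates
        y, x = int(round(y)), int(round(x))
        features.append(image[y - half:y + half, x - half:x + half].ravel())
    return np.concatenate(features)  # length M = L * patch_size**2

image = np.random.rand(64, 64)                                 # dummy image I
shape = np.array([[20.0, 20.0], [20.0, 40.0], [40.0, 30.0]])   # L = 3 landmarks
f = extract_patch_features(image, shape)
print(f.shape)  # (192,)
```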
2. Gaussian Markov Random Field
Let us define an undirected graph between the $L$ landmark points of an object as $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_L\}$ is the set of vertexes and there is an edge $(v_i, v_j) \in E$ for each pair of connected landmark points. Moreover, let us assume that we have a set of random variables $X = \{X_1, X_2, \ldots, X_L\}$
which represent an abstract feature vector of length $k$ extracted from each vertex $v_i$, i.e. $\mathbf{x}_i \in \mathbb{R}^k$. We model the likelihood probability of two random variables that correspond to connected vertexes with a normal distribution
$$p(\mathbf{x}_i, \mathbf{x}_j) \sim \mathcal{N}(\boldsymbol{\mu}_{ij}, \boldsymbol{\Sigma}_{ij})$$
where $\boldsymbol{\mu}_{ij}$ is the $2k \times 1$ mean vector and $\boldsymbol{\Sigma}_{ij}$ is the $2k \times 2k$ covariance matrix. Consequently, the cost of observing a set of feature vectors $\mathbf{x} = [\mathbf{x}_1^T, \mathbf{x}_2^T, \ldots, \mathbf{x}_L^T]^T$ can be computed using a Mahalanobis distance per edge, i.e.
$$\sum_{(v_i, v_j) \in E} \left( \mathbf{x}_{ij} - \boldsymbol{\mu}_{ij} \right)^T \boldsymbol{\Sigma}_{ij}^{-1} \left( \mathbf{x}_{ij} - \boldsymbol{\mu}_{ij} \right), \qquad \mathbf{x}_{ij} = [\mathbf{x}_i^T, \mathbf{x}_j^T]^T$$
In practice, computing this cost is expensive because it requires looping over all the edges of the graph. Inference can be much faster if we convert it to the equivalent matrix form
$$\left( \mathbf{x} - \boldsymbol{\mu} \right)^T \mathbf{Q} \left( \mathbf{x} - \boldsymbol{\mu} \right)$$
This is equivalent to modelling the set of random variables with a Gaussian Markov Random Field (GMRF) [4]. A GMRF is described by an undirected graph, where the vertexes stand for random variables and the edges impose statistical constraints on these random variables. Thus, the GMRF models the set of random variables with a multivariate normal distribution $p(\mathbf{x}) \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu}$ is the mean vector and $\boldsymbol{\Sigma}$ the overall covariance matrix. We denote by $\mathbf{Q}$ the block-sparse precision matrix that is the inverse of the covariance matrix, i.e. $\mathbf{Q} = \boldsymbol{\Sigma}^{-1}$. By applying the GMRF we make the assumption that the random variables satisfy the three Markov properties (pairwise, local and global) and that the blocks of the precision matrix that correspond to disjoint vertexes are zero, i.e. $\mathbf{Q}(G_i, G_j) = \mathbf{0}, \forall (v_i, v_j) \notin E$. By defining $G_i = \{k(i-1)+1, \ldots, k\,i\}$ to be the set of indices for sampling the rows (or columns) of a matrix that correspond to vertex $v_i$, and partitioning each edgewise precision matrix into $k \times k$ blocks as $\boldsymbol{\Sigma}_{ij}^{-1} = \left[\begin{smallmatrix}\mathbf{A}_{ij} & \mathbf{B}_{ij}\\ \mathbf{B}_{ij}^T & \mathbf{C}_{ij}\end{smallmatrix}\right]$, we can prove that the structure of the precision matrix is
$$\mathbf{Q}(G_i, G_i) = \sum_{\ell : (v_i, v_\ell) \in E} \mathbf{A}_{i\ell} + \sum_{\ell : (v_\ell, v_i) \in E} \mathbf{C}_{\ell i}, \qquad \mathbf{Q}(G_i, G_j) = \begin{cases} \mathbf{B}_{ij}, & (v_i, v_j) \in E\\ \mathbf{0}, & \text{otherwise} \end{cases}$$
i.e. $\mathbf{Q}$ is assembled by adding the blocks of each edgewise precision matrix $\boldsymbol{\Sigma}_{ij}^{-1}$ into the rows and columns that correspond to vertexes $v_i$ and $v_j$.
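The equivalence between the per-edge Mahalanobis cost and the single quadratic form with a block-sparse precision matrix can be checked numerically. In the sketch below (plain NumPy; random symmetric positive-definite matrices stand in for precisions learned from data), each edgewise precision is scatter-added into the big matrix and the two costs are compared:

```python
import numpy as np

rng = np.random.default_rng(0)
L, k = 4, 2                        # 4 vertexes, feature length k per vertex
edges = [(0, 1), (1, 2), (1, 3)]   # any undirected graph works, not only trees

mu = rng.normal(size=L * k)        # overall mean vector

# One 2k x 2k precision matrix per edge (inverse edgewise covariance);
# random SPD matrices stand in for precisions learned from data.
edge_prec = {}
for e in edges:
    A = rng.normal(size=(2 * k, 2 * k))
    edge_prec[e] = A @ A.T + 2 * np.eye(2 * k)

def idx(i):
    return slice(k * i, k * (i + 1))   # sampling indices G_i (0-based)

# Assemble Q by adding each edgewise precision's four k x k blocks
# into the rows/columns that correspond to vertexes i and j.
Q = np.zeros((L * k, L * k))
for (i, j), P in edge_prec.items():
    Q[idx(i), idx(i)] += P[:k, :k]
    Q[idx(i), idx(j)] += P[:k, k:]
    Q[idx(j), idx(i)] += P[k:, :k]
    Q[idx(j), idx(j)] += P[k:, k:]

x = rng.normal(size=L * k)         # an observed set of feature vectors

# Cost 1: Mahalanobis distance summed over the edges.
per_edge = 0.0
for (i, j), P in edge_prec.items():
    d = np.concatenate([x[idx(i)] - mu[idx(i)], x[idx(j)] - mu[idx(j)]])
    per_edge += d @ P @ d

# Cost 2: single quadratic form with the block-sparse precision matrix.
matrix_form = (x - mu) @ Q @ (x - mu)

print(np.isclose(per_edge, matrix_form))  # True
```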
Using the same assumptions and given a directed graph (cyclic or acyclic) $G = (V, E)$, where $(v_i, v_j) \in E$ means that $v_i$ is the parent of $v_j$, we can show that
$$\sum_{(v_i, v_j) \in E} \left( \mathbf{x}_i - \mathbf{x}_j - \boldsymbol{\mu}_{ij} \right)^T \boldsymbol{\Sigma}_{ij}^{-1} \left( \mathbf{x}_i - \mathbf{x}_j - \boldsymbol{\mu}_{ij} \right) = \left( \mathbf{x} - \boldsymbol{\mu} \right)^T \mathbf{Q} \left( \mathbf{x} - \boldsymbol{\mu} \right)$$
is true if
$$\mathbf{Q}(G_i, G_j) = \begin{cases} \sum_{e \in E : v_i \in e} \boldsymbol{\Sigma}_{e}^{-1}, & i = j\\ -\boldsymbol{\Sigma}_{ij}^{-1}, & (v_i, v_j) \in E \text{ or } (v_j, v_i) \in E\\ \mathbf{0}, & \text{otherwise} \end{cases}$$
where $\boldsymbol{\mu} = [\boldsymbol{\mu}_1^T, \ldots, \boldsymbol{\mu}_L^T]^T$ and $\boldsymbol{\mu}_{ij} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j$. In this case, if $G$ is a tree, then we have a Bayesian network. Please refer to the supplementary material of [1] for detailed proofs of the above.
3. Model
An APS [1] is trained using a set of images that are annotated with a set of landmarks. It consists of the following parts: a shape model, an appearance model and a deformation model.
Shape Model
The shape model is trained as explained in the Point Distribution Model section. The training shapes are first aligned using Generalized Procrustes Analysis and then an orthonormal basis is created using Principal Component Analysis (PCA), which is further augmented with four eigenvectors that represent the similarity transform (scaling, in-plane rotation and translation). This results in $\{\bar{\mathbf{s}}, \mathbf{U}\}$,
where $\mathbf{U} \in \mathbb{R}^{2L \times n}$ is the orthonormal basis of $n$ eigenvectors (including the four similarity components) and $\bar{\mathbf{s}}$ is the mean shape vector. We define the function $\mathcal{S}$, which generates a new shape instance given the shape model's basis $\{\bar{\mathbf{s}}, \mathbf{U}\}$ and a shape parameters vector, as
$$\mathcal{S}(\mathbf{p}) = \bar{\mathbf{s}} + \mathbf{U} \mathbf{p}$$
where $\mathbf{p} = [p_1, p_2, \ldots, p_n]^T$ are the parameters' values. Similarly, we define the set of functions that return the coordinates of the $i$-th landmark of the shape instance as
$$\mathcal{S}_i(\mathbf{p}) = \bar{\mathbf{s}}_i + \mathbf{U}_i \mathbf{p}, \qquad i = 1, \ldots, L$$
where $\bar{\mathbf{s}}_i = [\bar{x}_i, \bar{y}_i]^T$ denotes the coordinates' vector of the $i$-th landmark point and $\mathbf{U}_i$ denotes the $2i-1$ and $2i$ rows of the shape subspace $\mathbf{U}$.
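The shape model equations translate directly into code. Below is a small NumPy sketch in which a random orthonormal basis stands in for a trained PCA basis; the function names `S` and `S_i` mirror the notation of this section and are our own:

```python
import numpy as np

rng = np.random.default_rng(1)
L, n = 5, 4                                        # 5 landmarks, 4 parameters

s_bar = rng.normal(size=2 * L)                     # mean shape vector
U, _ = np.linalg.qr(rng.normal(size=(2 * L, n)))   # orthonormal basis U

def S(p):
    """Shape instance from parameters: S(p) = s_bar + U p."""
    return s_bar + U @ p

def S_i(p, i):
    """Coordinates of the i-th landmark (1-based): rows 2i-1 and 2i of U."""
    rows = slice(2 * (i - 1), 2 * i)
    return s_bar[rows] + U[rows] @ p

p = rng.normal(size=n)
# The per-landmark function agrees with the matching entries of S(p).
print(np.allclose(S_i(p, 3), S(p)[4:6]))  # True
```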
Another way to build the shape model is by using a GMRF structure. Specifically, given an undirected graph $G^s = (V, E^s)$ and assuming that the pairwise locations' vector $[x_i, y_i, x_j, y_j]^T$ of two connected landmarks follows a normal distribution, we formulate a GMRF. Thus, this can be expressed in matrix form, as shown in the GMRF paragraph, as $p(\mathbf{s}) \sim \mathcal{N}(\bar{\mathbf{s}}, \boldsymbol{\Sigma}^s)$ with a block-sparse precision matrix $\mathbf{Q}^s$. After obtaining the GMRF precision matrix $\mathbf{Q}^s$, we can invert it and apply PCA on the resulting covariance matrix in order to obtain a linear shape model. Note that, as shown in [1], it is more beneficial to directly apply PCA rather than using a GMRF for the shape model.
Appearance Model
Given an undirected graph $G^a = (V, E^a)$ and assuming that the concatenation of the appearance vectors of two connected landmarks follows a normal distribution, we form a GMRF that can be expressed as $p(\mathcal{F}(\mathbf{I}, \mathbf{s})) \sim \mathcal{N}(\bar{\mathbf{a}}, \boldsymbol{\Sigma}^a)$, where $\bar{\mathbf{a}}$ is the mean appearance vector and $\mathbf{Q}^a = (\boldsymbol{\Sigma}^a)^{-1}$ is the precision matrix that is formulated as shown in the GMRF paragraph. During the training of the appearance model, the low-rank representation of each edgewise covariance matrix is utilized by using the first singular values of its SVD factorization. Given $\bar{\mathbf{a}}$ and $\mathbf{Q}^a$, the cost of an observed appearance vector corresponding to a shape instance $\mathbf{s}$ in an image $\mathbf{I}$ is
$$\left( \mathcal{F}(\mathbf{I}, \mathbf{s}) - \bar{\mathbf{a}} \right)^T \mathbf{Q}^a \left( \mathcal{F}(\mathbf{I}, \mathbf{s}) - \bar{\mathbf{a}} \right)$$
Deformation Model
This is similar to the deformation model of Pictorial Structures [3]. Specifically, we define a directed (cyclic or acyclic) graph $G^d = (V, E^d)$ between the landmark points and model the relative locations between the parent and child of each edge with a GMRF. We assume that the relative location between the vertexes of each edge follows a normal distribution, and model the overall structure with a GMRF that has a block-sparse precision matrix $\mathbf{Q}^d$, computed as shown in the GMRF paragraph. Note that the mean relative location vector is the same as the mean shape vector $\bar{\mathbf{s}}$. This spring-like prior model manages to constrain extreme shape configurations that could be evoked during fitting with a very bad initialization, leading the optimization process towards a better result. Given $\bar{\mathbf{s}}$ and $\mathbf{Q}^d$, the cost of observing a shape instance $\mathcal{S}(\mathbf{p})$ is
$$\left( \mathcal{S}(\mathbf{p}) - \bar{\mathbf{s}} \right)^T \mathbf{Q}^d \left( \mathcal{S}(\mathbf{p}) - \bar{\mathbf{s}} \right) = \mathbf{p}^T \mathbf{U}^T \mathbf{Q}^d \mathbf{U} \mathbf{p}$$
where we use the properties $\mathbf{s} = \mathcal{S}(\mathbf{p})$ and $\mathcal{S}(\mathbf{p}) - \bar{\mathbf{s}} = \mathbf{U} \mathbf{p}$.
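The fact that the deformation cost evaluated in shape space reduces to a quadratic in the shape parameters is easy to verify numerically. In this NumPy sketch a random symmetric positive-definite matrix stands in for the learned deformation precision:

```python
import numpy as np

rng = np.random.default_rng(2)
L, n = 5, 4
s_bar = rng.normal(size=2 * L)                     # mean shape
U, _ = np.linalg.qr(rng.normal(size=(2 * L, n)))   # shape basis

A = rng.normal(size=(2 * L, 2 * L))
Qd = A @ A.T + np.eye(2 * L)   # stand-in for the deformation precision matrix

p = rng.normal(size=n)
s = s_bar + U @ p              # S(p)

full = (s - s_bar) @ Qd @ (s - s_bar)   # cost evaluated in shape space
reduced = p @ (U.T @ Qd @ U) @ p        # cost evaluated in parameter space

print(np.isclose(full, reduced))  # True
```

Note that the $n \times n$ matrix `U.T @ Qd @ U` does not depend on the image or the current shape, so it can be computed once per model.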
4. Cost Function and Optimization
Given a test image $\mathbf{I}$, the optimization cost function of APS is
$$\arg\min_{\mathbf{p}} \left( \mathcal{F}(\mathbf{I}, \mathcal{S}(\mathbf{p})) - \bar{\mathbf{a}} \right)^T \mathbf{Q}^a \left( \mathcal{F}(\mathbf{I}, \mathcal{S}(\mathbf{p})) - \bar{\mathbf{a}} \right) + \lambda\, \mathbf{p}^T \mathbf{U}^T \mathbf{Q}^d \mathbf{U} \mathbf{p}$$
The minimization of this function with respect to the shape parameters $\mathbf{p}$ is performed with a variant of the Gauss-Newton algorithm. The hyper-parameter $\lambda$ controls the weighting between the appearance and deformation costs; given the generative nature of the model, it can greatly affect performance. An optimal value of this hyper-parameter can be found using a validation dataset. The optimization procedure can be applied in two different ways, depending on the coordinate system in which the shape parameters are updated: (1) forward and (2) inverse. Additionally, the parameters update can be carried out either in an additive or a compositional manner. The compositional update has the form $\mathcal{S}(\mathbf{p}) \leftarrow \mathcal{S}(\mathbf{p}) \circ \mathcal{S}(\Delta\mathbf{p})^{-1}$. Consequently, due to the translational nature of the motion model, the compositional parameters update reduces to a subtraction of the parameters, $\mathbf{p} \leftarrow \mathbf{p} - \Delta\mathbf{p}$, which is equivalent to the additive update.
Weighted Inverse Compositional with fixed Jacobian and Hessian
By using this compositional update of the parameters and having an initial estimate of $\mathbf{p}$, the cost function of APS is expressed as minimizing
$$\left( \mathcal{F}(\mathbf{I}, \mathcal{S}(\mathbf{p})) - \bar{\mathbf{a}}(\Delta\mathbf{p}) \right)^T \mathbf{Q}^a \left( \mathcal{F}(\mathbf{I}, \mathcal{S}(\mathbf{p})) - \bar{\mathbf{a}}(\Delta\mathbf{p}) \right) + \lambda\, (\mathbf{p} - \Delta\mathbf{p})^T \mathbf{U}^T \mathbf{Q}^d \mathbf{U} (\mathbf{p} - \Delta\mathbf{p})$$
with respect to $\Delta\mathbf{p}$. With some abuse of notation due to $\bar{\mathbf{a}}$ being a vector, $\bar{\mathbf{a}}(\Delta\mathbf{p})$ can be described as
$$\bar{\mathbf{a}}(\Delta\mathbf{p}) = \left[ \bar{\mathbf{a}}_1(\mathcal{S}_1(\Delta\mathbf{p}))^T, \ldots, \bar{\mathbf{a}}_L(\mathcal{S}_L(\Delta\mathbf{p}))^T \right]^T$$
where $\bar{\mathbf{a}}_i$ is the mean appearance vector per patch. In order to find the solution we need to linearize $\bar{\mathbf{a}}(\Delta\mathbf{p})$ around $\Delta\mathbf{p} = \mathbf{0}$ as
$$\bar{\mathbf{a}}(\Delta\mathbf{p}) \approx \bar{\mathbf{a}} + \mathbf{J} \Delta\mathbf{p}$$
where $\frac{\partial \mathcal{S}}{\partial \mathbf{p}} = \mathbf{U}$ is the shape Jacobian and $\mathbf{J}$ is the appearance Jacobian
$$\mathbf{J} = \nabla\bar{\mathbf{a}}\, \frac{\partial \mathcal{S}}{\partial \mathbf{p}} = \left[ (\nabla\bar{\mathbf{a}}_1 \mathbf{U}_1)^T, \ldots, (\nabla\bar{\mathbf{a}}_L \mathbf{U}_L)^T \right]^T$$
where $\mathbf{U}_i$ denotes the $2i-1$ and $2i$ row vectors of the basis $\mathbf{U}$. Note that we make an abuse of notation by writing $\nabla\bar{\mathbf{a}}_i$ because $\bar{\mathbf{a}}$ is a vector. However, it represents the gradient of the mean patch-based appearance that corresponds to landmark $i$. By substituting, taking the partial derivative with respect to $\Delta\mathbf{p}$, equating it to $\mathbf{0}$ and solving for $\Delta\mathbf{p}$ we get
$$\Delta\mathbf{p} = \mathbf{H}^{-1} \left[ \mathbf{J}^T \mathbf{Q}^a \left( \mathcal{F}(\mathbf{I}, \mathcal{S}(\mathbf{p})) - \bar{\mathbf{a}} \right) + \lambda\, \mathbf{U}^T \mathbf{Q}^d \mathbf{U} \mathbf{p} \right]$$
where
$$\mathbf{H} = \mathbf{J}^T \mathbf{Q}^a \mathbf{J} + \lambda\, \mathbf{U}^T \mathbf{Q}^d \mathbf{U}$$
is the combined Hessian matrix and we use the property $\mathcal{S}(\mathbf{p}) - \bar{\mathbf{s}} = \mathbf{U} \mathbf{p}$. Note that $\mathbf{J}$, $\mathbf{Q}^a$, $\mathbf{U}^T \mathbf{Q}^d \mathbf{U}$ and $\mathbf{H}^{-1}$ can be precomputed. The computational cost per iteration is only $O(nM)$, which is practically a multiplication between an $n \times M$ matrix and an $M \times 1$ vector; this leads to a very fast fitting algorithm.
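The precomputation structure can be sketched in NumPy. All matrices below are random stand-ins for trained model components, and we assume the standard weighted Gauss-Newton update $\Delta\mathbf{p} = \mathbf{H}^{-1}[\mathbf{J}^T \mathbf{Q}^a (\mathcal{F} - \bar{\mathbf{a}}) + \lambda\, \mathbf{U}^T \mathbf{Q}^d \mathbf{U}\, \mathbf{p}]$; the point is only that everything except the residual is computed once, so each iteration is a couple of matrix-vector products:

```python
import numpy as np

rng = np.random.default_rng(3)
M, n = 300, 6        # M-dimensional features, n shape parameters
lam = 0.1            # weighting between appearance and deformation costs

J = rng.normal(size=(M, n))                            # fixed appearance Jacobian
A = rng.normal(size=(M, M)); Qa = A @ A.T + np.eye(M)  # appearance precision
B = rng.normal(size=(n, n)); B = B @ B.T + np.eye(n)   # stands for U^T Q^d U

# --- Precomputed once, before fitting ---
H = J.T @ Qa @ J + lam * B          # combined Gauss-Newton Hessian
R = np.linalg.solve(H, J.T @ Qa)    # H^{-1} J^T Q^a : an n x M matrix
P = lam * np.linalg.solve(H, B)     # lam * H^{-1} U^T Q^d U : n x n

# --- Per iteration: only matrix-vector products, O(nM) ---
p = rng.normal(size=n)
residual = rng.normal(size=M)       # F(I, S(p)) - a_bar at the current shape
dp = R @ residual + P @ p           # parameter increment

print(dp.shape)  # (6,)
```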
In the forward case, the cost function takes the form of minimizing
$$\left( \mathcal{F}(\mathbf{I}, \mathcal{S}(\mathbf{p} + \Delta\mathbf{p})) - \bar{\mathbf{a}} \right)^T \mathbf{Q}^a \left( \mathcal{F}(\mathbf{I}, \mathcal{S}(\mathbf{p} + \Delta\mathbf{p})) - \bar{\mathbf{a}} \right) + \lambda\, (\mathbf{p} + \Delta\mathbf{p})^T \mathbf{U}^T \mathbf{Q}^d \mathbf{U} (\mathbf{p} + \Delta\mathbf{p})$$
with respect to $\Delta\mathbf{p}$. In order to find the solution we need to linearize around $\mathbf{p}$, thus using a first order Taylor expansion at $\Delta\mathbf{p} = \mathbf{0}$ as
$$\mathcal{F}(\mathbf{I}, \mathcal{S}(\mathbf{p} + \Delta\mathbf{p})) \approx \mathcal{F}(\mathbf{I}, \mathcal{S}(\mathbf{p})) + \mathbf{J}_{\mathcal{F}} \Delta\mathbf{p}$$
where $\frac{\partial \mathcal{S}}{\partial \mathbf{p}} = \mathbf{U}$ is the shape Jacobian and $\mathbf{J}_{\mathcal{F}}$ is the appearance Jacobian
$$\mathbf{J}_{\mathcal{F}} = \nabla\mathcal{F}\, \frac{\partial \mathcal{S}}{\partial \mathbf{p}} = \left[ (\nabla\mathcal{F}_1 \mathbf{U}_1)^T, \ldots, (\nabla\mathcal{F}_L \mathbf{U}_L)^T \right]^T$$
where $\mathbf{U}_i$ denotes the $2i-1$ and $2i$ row vectors of the basis $\mathbf{U}$ and $\nabla\mathcal{F}_i$ is the gradient of the patch-based features extracted at landmark $i$ of the current shape instance. By substituting, taking the partial derivative with respect to $\Delta\mathbf{p}$, equating it to $\mathbf{0}$ and solving for $\Delta\mathbf{p}$ we get
$$\Delta\mathbf{p} = -\mathbf{H}^{-1} \left[ \mathbf{J}_{\mathcal{F}}^T \mathbf{Q}^a \left( \mathcal{F}(\mathbf{I}, \mathcal{S}(\mathbf{p})) - \bar{\mathbf{a}} \right) + \lambda\, \mathbf{U}^T \mathbf{Q}^d \mathbf{U} \mathbf{p} \right]$$
where
$$\mathbf{H} = \mathbf{J}_{\mathcal{F}}^T \mathbf{Q}^a \mathbf{J}_{\mathcal{F}} + \lambda\, \mathbf{U}^T \mathbf{Q}^d \mathbf{U}$$
is the combined Hessian matrix, taking into account that $\mathcal{S}(\mathbf{p}) - \bar{\mathbf{s}} = \mathbf{U} \mathbf{p}$. $\mathbf{U}^T \mathbf{Q}^d \mathbf{U}$ can be precomputed, but $\mathbf{J}_{\mathcal{F}}$ and $\mathbf{H}$ need to be computed at each iteration, because the image features and their gradients depend on the current shape estimate. Consequently, the cost per iteration is dominated by forming and inverting the Hessian, which makes this algorithm much slower than the weighted inverse compositional algorithm with fixed Jacobian and Hessian.
5. Fitting Example
References
[1] E. Antonakos, J. Alabort-i-Medina, and S. Zafeiriou. "Active Pictorial Structures", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[2] I. Matthews and S. Baker. "Active Appearance Models Revisited", International Journal of Computer Vision (IJCV), 60(2): 135-164, 2004.
[3] P. F. Felzenszwalb and D. P. Huttenlocher. "Pictorial Structures for Object Recognition", International Journal of Computer Vision (IJCV), 61(1): 55-79, 2005.
[4] H. Rue and L. Held. "Gaussian Markov Random Fields: Theory and Applications", CRC Press, 2005.