Active Template Model

  1. Definition
  2. Warp Functions
  3. Cost Function and Optimization
  4. References
  5. API Documentation

We highly recommend that you render all matplotlib figures inline in the Jupyter notebook for the best menpowidgets experience. This can be done by running
%matplotlib inline
in a cell. Note that you only have to run it once, not in every rendering cell.

1. Definition

The aim of deformable image alignment is to find the optimal alignment between a constant template and an input image with respect to the parameters of a parametric shape model. The Active Template Model (ATM) is one such method, inspired by Lucas-Kanade Affine Image Alignment and the Active Appearance Model. Note that we invented the name "Active Template Model" for the purposes of the Menpo Project; the term is not established in the literature. On this page, we provide a basic mathematical definition of an ATM and of all its variations that are implemented within menpofit.

A shape instance of a deformable object is represented as $\mathbf{s}=\big[x_1,y_1,\ldots,x_L,y_L\big]^{\mathsf{T}}$, a $2L\times 1$ vector consisting of the coordinates of $L$ landmark points $(x_i,y_i),\forall i=1,\ldots,L$. An ATM is constructed using a template image that is annotated with $L$ landmark points and a set of $N$ shapes $\big\lbrace\mathbf{s}_1,\mathbf{s}_2,\ldots,\mathbf{s}_N\big\rbrace$ that are needed to build the shape model. Specifically, it consists of the following parts:

  • Shape Model
    The shape model is trained as explained in the Point Distribution Model section. The training shapes $\big\lbrace\mathbf{s}_1,\mathbf{s}_2,\ldots,\mathbf{s}_N\big\rbrace$ are first aligned using Generalized Procrustes Analysis. An orthonormal basis is then created using Principal Component Analysis (PCA) and augmented with four eigenvectors that represent the similarity transform (scaling, in-plane rotation and translation). This results in $\big\lbrace\bar{\mathbf{s}}, \mathbf{U}_s\big\rbrace$, where $\mathbf{U}_s\in\mathbb{R}^{2L\times n}$ is the orthonormal basis of $n$ eigenvectors (including the four similarity components) and $\bar{\mathbf{s}}\in\mathbb{R}^{2L\times 1}$ is the mean shape vector. A new shape instance can be generated as $\mathbf{s}_{\mathbf{p}}=\bar{\mathbf{s}} + \mathbf{U}_s\mathbf{p}$, where $\mathbf{p}=\big[p_1,p_2,\ldots,p_n\big]^{\mathsf{T}}$ is the vector of shape parameters.

  • Motion Model
    The motion model consists of a warp function $\mathcal{W}(\mathbf{p})$. It warps the texture of a shape instance generated with parameters $\mathbf{p}$ into a common reference_shape. By default the reference_shape is the mean shape $\bar{\mathbf{s}}$, but you can pass in a reference_shape of your choice when constructing the ATM.

  • Template
    The provided template image $\mathbf{I}_a$, which is annotated with landmarks $\mathbf{s}_a$ (with corresponding shape parameters $\mathbf{p}_a$), is further processed by:

    1. First extracting features using the features function $\mathcal{F}()$ defined by holistic_features, i.e. $\mathcal{F}(\mathbf{I}_a)$
    2. Warping the feature-based image into the reference_shape in order to get $\mathcal{F}(\mathbf{I}_a)(\mathcal{W}(\mathbf{p}_a))$
    3. Vectorizing the warped image as $\bar{\mathbf{a}} = \mathcal{F}(\mathbf{I}_a)(\mathcal{W}(\mathbf{p}_a))$, where $\bar{\mathbf{a}}\in\mathbb{R}^{M\times 1}$
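The shape-vector layout and the shape model described above can be sketched in plain NumPy. This is a minimal illustration, not menpofit's implementation. It assumes the training shapes are already Procrustes-aligned and omits the four similarity eigenvectors. The helper names (points_to_vector, train_shape_model, shape_instance) are hypothetical.

```python
import numpy as np


def points_to_vector(points):
    """Flatten an (L, 2) array of (x, y) landmarks into the 2L x 1 vector
    s = [x1, y1, ..., xL, yL]^T."""
    return np.asarray(points, dtype=float).reshape(-1, 1)


def train_shape_model(aligned_shapes, n_components):
    """PCA on already-aligned shape vectors stacked as an (N, 2L) array.

    Returns the mean shape (2L x 1) and an orthonormal basis U_s (2L x n).
    Procrustes alignment and the similarity eigenvectors are omitted here.
    """
    X = np.asarray(aligned_shapes, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions of the centred data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean.reshape(-1, 1), Vt[:n_components].T


def shape_instance(mean, U, p):
    """Generate s_p = s_bar + U_s p for a parameter vector p of length n."""
    return mean + U @ np.asarray(p, dtype=float).reshape(-1, 1)


# Toy data: 50 random "aligned" shapes with L = 68 landmarks each.
rng = np.random.default_rng(0)
shapes = np.hstack([points_to_vector(rng.normal(size=(68, 2))) for _ in range(50)]).T
mean, U = train_shape_model(shapes, n_components=10)
s_p = shape_instance(mean, U, np.zeros(10))  # p = 0 reproduces the mean shape
```

Because $\mathbf{U}_s$ is orthonormal, the parameters of a shape generated this way are recovered by a simple projection, $\mathbf{p} = \mathbf{U}_s^{\mathsf{T}}(\mathbf{s}_{\mathbf{p}} - \bar{\mathbf{s}})$.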

Let's first load a test image and a template image $\mathbf{I}_a$. We'll load two images of the same person (Amanda Peet, actress) from the LFPW trainset (see Importing Images for download instructions).

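from pathlib import Path
import menpo.io as mio

# Path to the LFPW trainset (edit to match your local copy)
path_to_lfpw = Path('/path/to/lfpw/trainset/')

# Test image, cropped around its landmarks with a border of 0.5 times their spread
image = mio.import_image(path_to_lfpw / 'image_0004.png')
image = image.crop_to_landmarks_proportion(0.5)

# Template image of the same person, cropped in the same way
template = mio.import_image(path_to_lfpw / 'image_0005.png')
template = template.crop_to_landmarks_proportion(0.5)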

The image and template can be visualized as:

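%matplotlib inline
import matplotlib.pyplot as plt

# Input image (left)
plt.subplot(121)
image.view()
plt.gca().set_title('Input Image')

# Template image with its landmarks (right)
plt.subplot(122)
template.view_landmarks(marker_face_colour='white', marker_edge_colour='black',
                        marker_size=4)
plt.gca().set_title('Template image');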

Let's also load the shapes of the LFPW trainset that will be used to train the PDM:

from menpo.visualize import print_progress

# Collect the annotated shape of every image in the trainset
training_shapes = []
for lg in print_progress(mio.import_landmark_files(path_to_lfpw / '*.pts', verbose=True)):
    training_shapes.append(lg['all'])

The shapes can be visualized using a widget as:

from menpowidgets import visualize_pointclouds

visualize_pointclouds(training_shapes)

2. Warp Functions

With an abuse of notation, let us define $\mathbf{t}(\mathcal{W}(\mathbf{p}))\equiv \mathcal{F}(\mathbf{I})(\mathcal{W}(\mathbf{p}))$ as the feature-based warped $M\times 1$ vector of an image $\mathbf{I}$ given its shape instance generated with parameters $\mathbf{p}$.
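The sketch below shows in NumPy what "vectorized" means here. It assumes a channels-last feature array and a boolean mask marking the reference frame; vectorize_masked is a hypothetical helper, not a menpo function. $M$ is then the number of pixels inside the mask times the number of feature channels.

```python
import numpy as np


def vectorize_masked(features, mask):
    """Stack the feature values of all pixels inside the reference-frame mask
    into an M x 1 vector (M = n_true_pixels * n_channels)."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 2:  # single-channel image: add a channel axis
        features = features[..., None]
    return features[mask].reshape(-1, 1)


# Toy 4x4, 2-channel "warped feature image" and a mask covering the centre 2x2.
warped = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
t = vectorize_masked(warped, mask)  # shape (8, 1): 4 pixels x 2 channels
```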

menpofit provides five different ATM versions, which differ in the way that this appearance warping $\mathbf{t}(\mathcal{W}(\mathbf{p}))$ is performed. Specifically:

The HolisticATM uses a holistic appearance representation obtained by warping the texture into the reference frame with a non-linear warp function $\mathcal{W}(\mathbf{p})$. Two such warp functions are currently supported: Piecewise Affine Warp and Thin Plate Spline. The reference frame is the mask of the mean shape's convex hull.
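Inside a single triangle, a piecewise affine warp is the unique affine map that sends the three reference vertices to the three image vertices. Equivalently, it preserves barycentric coordinates. The following NumPy sketch shows this for one triangle. A full warp would first triangulate the reference shape and apply the same mapping per triangle. The function names are hypothetical; this is not menpo's implementation.

```python
import numpy as np


def barycentric(point, tri):
    """Barycentric coordinates (a, b, c) of `point` in triangle `tri` (3 x 2)."""
    v0, v1, v2 = np.asarray(tri, dtype=float)
    # Solve point = v0 + b (v1 - v0) + c (v2 - v0) for (b, c).
    T = np.column_stack([v1 - v0, v2 - v0])
    b, c = np.linalg.solve(T, np.asarray(point, dtype=float) - v0)
    return np.array([1.0 - b - c, b, c])


def pwa_map_point(point, src_tri, dst_tri):
    """Map a point lying in src_tri to dst_tri by keeping its barycentric weights."""
    w = barycentric(point, src_tri)
    return w @ np.asarray(dst_tri, dtype=float)


ref_tri = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]        # triangle in the reference frame
img_tri = [[10.0, 10.0], [14.0, 10.0], [10.0, 12.0]]  # same triangle in the image
mapped = pwa_map_point([0.25, 0.25], ref_tri, img_tri)  # maps to (11.0, 10.5)
```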

The MaskedATM uses the same warp logic as the HolisticATM. The only difference between them is that its reference frame consists of rectangular mask patches centered around the landmarks instead of the convex hull of the mean shape.

The LinearATM is an experimental variation that uses a linear warp function $\mathcal{W}(\mathbf{p})$ in the motion model, and thus a dense statistical shape model with one shape point per pixel in the reference frame. The advantage is that the linearity of such a warp function makes the computation of its Jacobian trivial.
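To see why, consider a linear warp in which every reference pixel is mapped to $\bar{\mathbf{x}} + \mathbf{U}\mathbf{p}$. The NumPy sketch below uses a random, hypothetical dense basis. It checks numerically that the Jacobian $\partial\mathcal{W}/\partial\mathbf{p}$ equals the basis $\mathbf{U}$ regardless of $\mathbf{p}$, so it never needs to be recomputed during fitting.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_params = 5, 3
mean_points = rng.normal(size=(n_pixels, 2))      # dense mean shape: one point per pixel
basis = rng.normal(size=(n_pixels, 2, n_params))  # hypothetical dense shape basis U


def linear_warp(p):
    """W(p) = mean + U p, evaluated for every reference pixel -> (n_pixels, 2)."""
    return mean_points + basis @ p


def numerical_jacobian(p, eps=1e-6):
    """Central finite-difference estimate of dW/dp, shape (n_pixels, 2, n_params)."""
    J = np.empty((n_pixels, 2, n_params))
    for k in range(n_params):
        step = np.zeros(n_params)
        step[k] = eps
        J[..., k] = (linear_warp(p + step) - linear_warp(p - step)) / (2 * eps)
    return J


# The Jacobian equals the basis itself, whatever p is.
J_at_zero = numerical_jacobian(np.zeros(n_params))
J_at_random = numerical_jacobian(rng.normal(size=n_params))
```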

Similar to the relation between HolisticATM and MaskedATM, a LinearMaskedATM is exactly the same as a LinearATM, except that its reference frame is masked.

A PatchATM represents the appearance in a patch-based fashion, i.e. rectangular patches are extracted around the landmark points. Thus, the warping $\mathbf{t}(\mathcal{W}(\mathbf{p}))$ simply samples the patches centered around the landmarks of the shape instance generated with parameters $\mathbf{p}$.
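A minimal NumPy sketch of patch-based sampling follows. It rounds each (x, y) landmark to the nearest pixel, assumes every patch lies inside the image, and uses a hypothetical helper name. menpofit's own patch extraction may sample differently.

```python
import numpy as np


def extract_patches(image, landmarks, patch_shape=(5, 5)):
    """Return an (L, h, w) array of patches centred on each (x, y) landmark,
    rounded to the nearest pixel. Assumes every patch lies inside the image."""
    image = np.asarray(image)
    h, w = patch_shape
    half_h, half_w = h // 2, w // 2
    patches = []
    for x, y in np.rint(np.asarray(landmarks, dtype=float)).astype(int):
        # Rows index y, columns index x.
        patches.append(image[y - half_h:y - half_h + h, x - half_w:x - half_w + w])
    return np.stack(patches)


img = np.arange(100).reshape(10, 10)
patches = extract_patches(img, [[5, 5], [2, 3]], patch_shape=(3, 3))
t = patches.reshape(-1, 1)  # the patch-based appearance vector
```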

Let's now create a HolisticATM using IGO features:

from menpofit.atm import HolisticATM
from menpo.feature import igo

atm = HolisticATM(template, training_shapes, group='PTS',
                  diagonal=180, scales=(0.25, 1.0),
                  holistic_features=igo, verbose=True)

and visualize it:


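The igo features used above encode the cosine and sine of the image gradient orientation at each pixel. Below is a rough NumPy sketch of that idea. It approximates rather than reproduces menpo.feature.igo; menpo's gradient operator and options such as double_angles may differ. igo_sketch is a hypothetical name.

```python
import numpy as np


def igo_sketch(image):
    """Return a (2, H, W) array holding cos and sin of the gradient angle."""
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    phi = np.arctan2(gy, gx)
    return np.stack([np.cos(phi), np.sin(phi)])


# A horizontal ramp: the gradient points along +x everywhere, so phi = 0.
ramp = np.tile(np.arange(6, dtype=float), (4, 1))
features = igo_sketch(ramp)
```

Because only the orientation is kept, the features do not change when image intensities are scaled by a positive constant (wherever the gradient is non-zero).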
3. Cost Function and Optimization

Fitting an ATM on a test image involves optimizing the following cost function with respect to the shape parameters:
$$\arg\min_{\mathbf{p}} \big\lVert \mathbf{t}(\mathcal{W}(\mathbf{p})) - \bar{\mathbf{a}} \big\rVert^{2}$$
Note that this cost function is exactly the same as in the case of Lucas-Kanade for Affine Image Alignment. The only difference is the nature of the transform (and thus of $\mathbf{p}$) used in the motion model $\mathcal{W}(\mathbf{p})$. Likewise, the cost function is very similar to that of an Active Appearance Model, except that an ATM has no appearance subspace.
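For the inverse-compositional scheme, the standard Gauss-Newton derivation runs as follows. It is a textbook sketch, not taken from menpofit's source. The incremental warp is applied to the template and linearized around $\Delta\mathbf{p}=\mathbf{0}$, assuming $\mathcal{W}(\mathbf{0})$ is the identity warp:

```latex
\Delta\mathbf{p}^{*} = \arg\min_{\Delta\mathbf{p}}
  \big\lVert \mathbf{t}(\mathcal{W}(\mathbf{p})) - \bar{\mathbf{a}}(\mathcal{W}(\Delta\mathbf{p})) \big\rVert^{2},
\qquad
\bar{\mathbf{a}}(\mathcal{W}(\Delta\mathbf{p})) \approx \bar{\mathbf{a}} + \mathbf{J}_{\bar{\mathbf{a}}}\Delta\mathbf{p},
\quad
\mathbf{J}_{\bar{\mathbf{a}}} = \nabla\bar{\mathbf{a}}\,
  \frac{\partial\mathcal{W}}{\partial\mathbf{p}}\Big|_{\mathbf{p}=\mathbf{0}} \in \mathbb{R}^{M\times n}

\Delta\mathbf{p} = \mathbf{H}^{-1}\mathbf{J}_{\bar{\mathbf{a}}}^{\mathsf{T}}
  \big(\mathbf{t}(\mathcal{W}(\mathbf{p})) - \bar{\mathbf{a}}\big),
\qquad
\mathbf{H} = \mathbf{J}_{\bar{\mathbf{a}}}^{\mathsf{T}}\mathbf{J}_{\bar{\mathbf{a}}},
\qquad
\mathcal{W}(\mathbf{p}) \leftarrow \mathcal{W}(\mathbf{p}) \circ \mathcal{W}(\Delta\mathbf{p})^{-1}
```

Since $\mathbf{J}_{\bar{\mathbf{a}}}$ and $\mathbf{H}$ depend only on the template, they can be computed once before the iterations start.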

The ATM deformable image alignment is optimized with the Lucas-Kanade gradient descent algorithm. This works exactly as for the affine transform, so you can refer to the Lucas-Kanade chapter for more information. We currently support Inverse-Compositional and Forward-Compositional optimization.
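Here is a toy inverse-compositional Lucas-Kanade loop for the simplest possible warp, a pure 2D translation $\mathcal{W}(\mathbf{x};\mathbf{p}) = \mathbf{x} + \mathbf{p}$, on raw intensities. It is a sketch under these assumptions, not menpofit's InverseCompositional class. It uses NumPy and SciPy, with bilinear sampling via scipy.ndimage.map_coordinates, and ic_translation is a hypothetical name. The Hessian is computed once from the template, and for translations the inverse composition reduces to $\mathbf{p} \leftarrow \mathbf{p} - \Delta\mathbf{p}$.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def ic_translation(image, template, p0, n_iters=50, tol=1e-6):
    """Inverse-compositional Lucas-Kanade for a translation warp.

    p = (px, py) maps template pixel (row, col) to image (row + py, col + px).
    """
    # Precomputed once: template gradients (steepest-descent images) and Hessian.
    gy, gx = np.gradient(np.asarray(template, dtype=float))
    sd = np.stack([gx.ravel(), gy.ravel()], axis=1)  # (n_pixels, 2), columns (px, py)
    H_inv = np.linalg.inv(sd.T @ sd)
    rows, cols = np.mgrid[0:template.shape[0], 0:template.shape[1]]

    p = np.asarray(p0, dtype=float).copy()
    for _ in range(n_iters):
        # t(W(p)): sample the image on the warped template grid (bilinear).
        warped = map_coordinates(image, [rows + p[1], cols + p[0]], order=1)
        error = (warped - template).ravel()  # t(W(p)) - a_bar
        dp = H_inv @ (sd.T @ error)
        p -= dp                              # W(p) <- W(p) o W(dp)^-1
        if np.linalg.norm(dp) < tol:
            break
    return p


# Toy problem: a Gaussian blob; the template is a crop whose true offset is (18, 19).
yy, xx = np.mgrid[0:64, 0:64]
image = np.exp(-((yy - 32.0) ** 2 + (xx - 32.0) ** 2) / (2 * 6.0 ** 2))
template = image[19:51, 18:50]
p_est = ic_translation(image, template, p0=(16.0, 16.0))  # close to (18, 19)
```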

Let's now create a Fitter using the ATM we created, as:

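from menpofit.atm import LucasKanadeATMFitter, InverseCompositional

# Inverse-compositional Lucas-Kanade fitter; n_shape sets the number of active
# shape components per scale (matching the fitter summary printed below)
fitter = LucasKanadeATMFitter(atm, lk_algorithm_cls=InverseCompositional,
                              n_shape=[3, 20])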

Information about the fitter can be retrieved as:

print(fitter)

which prints

Holistic Active Template Model
 - Images warped with DifferentiablePiecewiseAffine transform
 - Images scaled to diagonal: 180.00
 - Scales: [0.25, 1.0]
   - Scale 0.25
     - Holistic feature: igo
     - Template shape: (38, 38)
     - Shape model class: OrthoPDM
       - 132 shape components
       - 4 similarity transform parameters
   - Scale 1.0
     - Holistic feature: igo
     - Template shape: (133, 134)
     - Shape model class: OrthoPDM
       - 132 shape components
       - 4 similarity transform parameters
Inverse Compositional Algorithm
 - Scales: [0.25, 1.0]
   - Scale 0.25
     - 3 active shape components
     - 4 similarity transform components
   - Scale 1.0
     - 20 active shape components
     - 4 similarity transform components

Let's now fit the ATM on the image we loaded at the beginning. We will use the dlib face detector from menpodetect to acquire an initial bounding box:

from menpodetect import load_dlib_frontal_face_detector

# Load detector
detect = load_dlib_frontal_face_detector()

# Detect
bboxes = detect(image)
print("{} detected faces.".format(len(bboxes)))

# View
if len(bboxes) > 0:
    image.view_landmarks(group='dlib_0', line_colour='white',
                         render_markers=False, line_width=3);
Visualize detected bounding box

and fit the ATM as:

# initial bbox
initial_bbox = bboxes[0]

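# fit image, passing the ground truth shape so that the fitting errors can be reported
result = fitter.fit_from_bb(image, initial_bbox, max_iters=20,
                            gt_shape=image.landmarks['PTS'].lms)

# print result
print(result)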
which prints

Fitting result of 68 landmark points.
Initial error: 0.0877
Final error: 0.0196
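The initial and final errors compare the respective shapes with the ground-truth annotation, normalized by some measure of face size. One common normalization divides the mean Euclidean landmark distance by the average of the ground-truth bounding box's width and height. The sketch below implements that choice. Whether it matches the exact metric menpofit uses for the numbers above is an assumption, and bb_normalised_error_sketch is a hypothetical name.

```python
import numpy as np


def bb_normalised_error_sketch(shape, gt_shape):
    """Mean point-to-point Euclidean error between two (L, 2) shapes, divided by
    the average of the ground-truth bounding box width and height."""
    shape = np.asarray(shape, dtype=float)
    gt = np.asarray(gt_shape, dtype=float)
    mean_dist = np.linalg.norm(shape - gt, axis=1).mean()
    width, height = gt.max(axis=0) - gt.min(axis=0)
    return mean_dist / ((width + height) / 2.0)


gt = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
err = bb_normalised_error_sketch(gt + [1.0, 0.0], gt)  # 1 px error on a 10 px box
```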

The result can be visualized as:


or using a widget as:


Remember that the shape per iteration can be retrieved as


Similarly, the shape parameters per iteration can be obtained as:


4. References

[1] I. Matthews, and S. Baker. "Active Appearance Models Revisited", International Journal of Computer Vision, vol. 60, no. 2, pp. 135-164, 2004.
