# Active Template Model

We highly recommend that you render all matplotlib figures **inline** in the Jupyter notebook for the best *menpowidgets* experience. This can be done by running `%matplotlib inline`.

### 1. Definition

The aim of deformable image alignment is to find the optimal alignment between a constant template and an input image with respect to the parameters of a parametric shape model.
The Active Template Model (ATM) is one such method, inspired by Lucas-Kanade Affine Image Alignment and the Active Appearance Model. Note that we invented the name "Active Template Model" for the purposes of the Menpo Project; the term is not established in the literature. On this page, we provide a basic mathematical definition of an ATM and of all its variations that are implemented within `menpofit`.

A shape instance of a deformable object is represented as $\mathbf{s}=\big[x_1,y_1,\ldots,x_L,y_L\big]^{\mathsf{T}}$, a $2L\times 1$ vector consisting of the coordinates $(x_i,y_i),\forall i=1,\ldots,L$ of $L$ landmark points. An ATM is constructed using a template image that is annotated with $L$ landmark points and a set of $N$ shapes $\big\lbrace\mathbf{s}_1,\mathbf{s}_2,\ldots,\mathbf{s}_N\big\rbrace$ that are essential for building the shape model. Specifically, it consists of the following parts:

**Shape Model**

The shape model is trained as explained in the Point Distribution Model section. The training shapes $\big\lbrace\mathbf{s}_1,\mathbf{s}_2,\ldots,\mathbf{s}_N\big\rbrace$ are first aligned using Generalized Procrustes Analysis. An orthonormal basis is then created using Principal Component Analysis (PCA) and further augmented with four eigenvectors that represent the similarity transform (scaling, in-plane rotation and translation). This results in $\big\lbrace\bar{\mathbf{s}}, \mathbf{U}_s\big\rbrace$, where $\mathbf{U}_s\in\mathbb{R}^{2L\times n}$ is the orthonormal basis of $n$ eigenvectors (including the four similarity components) and $\bar{\mathbf{s}}\in\mathbb{R}^{2L\times 1}$ is the mean shape vector. A new shape instance can be generated as $\mathbf{s}_{\mathbf{p}}=\bar{\mathbf{s}} + \mathbf{U}_s\mathbf{p}$, where $\mathbf{p}=\big[p_1,p_2,\ldots,p_n\big]^{\mathsf{T}}$ is the vector of shape parameters.

**Motion Model**

The motion model consists of a warp function $\mathcal{W}(\mathbf{p})$, which is essential for warping the texture related to a shape instance generated with parameters $\mathbf{p}$ into a common `reference_shape`. The `reference_shape` is by default the mean shape $\bar{\mathbf{s}}$; however, you can pass in a `reference_shape` of your preference during construction of the ATM.

**Template**

The provided template image $\mathbf{I}_a$, which is annotated with landmarks $\mathbf{s}_a$, is further processed by:

- First extracting features using the features function $\mathcal{F}()$ defined by `holistic_features`, i.e. $\mathcal{F}(\mathbf{I}_a)$
- Warping the feature-based image into the `reference_shape` in order to get $\mathcal{F}(\mathbf{I}_a)(\mathcal{W}(\mathbf{p}_a))$
- Vectorizing the warped image as $\bar{\mathbf{a}} = \mathcal{F}(\mathbf{I}_a)(\mathcal{W}(\mathbf{p}_a))$ where $\bar{\mathbf{a}}\in\mathbb{R}^{M\times 1}$
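
To make the shape model part of this construction concrete, here is a minimal NumPy sketch of the linear model $\mathbf{s}_{\mathbf{p}}=\bar{\mathbf{s}} + \mathbf{U}_s\mathbf{p}$. It is not the `menpofit` implementation: it assumes the shapes are already aligned, it omits the four similarity eigenvectors, and the helper names `build_shape_model` and `shape_instance` as well as the toy data are hypothetical.

```python
import numpy as np


def build_shape_model(shapes, n_components):
    """Return the mean shape (2L,) and an orthonormal PCA basis (2L, n)."""
    X = np.asarray(shapes, dtype=float)            # (N, 2L), one shape per row
    mean = X.mean(axis=0)
    # PCA via SVD of the centred data: rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    U = Vt[:n_components].T                        # keep the first n directions
    return mean, U


def shape_instance(mean, U, p):
    """Generate s_p = mean + U p for a vector of shape parameters p."""
    return mean + U @ p


rng = np.random.default_rng(0)
# Toy data (assumption): N = 20 shapes of L = 5 landmarks, stored as
# [x_1, y_1, ..., x_L, y_L] and treated as if they were already aligned.
base = rng.uniform(0, 100, size=10)
shapes = base + rng.normal(scale=2.0, size=(20, 10))

mean, U = build_shape_model(shapes, n_components=3)
p = np.array([1.5, -0.5, 0.2])
s_p = shape_instance(mean, U, p)
```

Because the basis is orthonormal, the parameters of any instance can be recovered by projection, i.e. `U.T @ (s_p - mean)` returns `p`.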

Let's first load a test image and a template image $\bar{\mathbf{a}}$. We'll load two images of the same person (Amanda Peet, actress) from the LFPW trainset (see Importing Images for download instructions).

```
from pathlib import Path
import menpo.io as mio
path_to_lfpw = Path('/path/to/lfpw/trainset/')
image = mio.import_image(path_to_lfpw / 'image_0004.png')
image = image.crop_to_landmarks_proportion(0.5)
template = mio.import_image(path_to_lfpw / 'image_0005.png')
template = template.crop_to_landmarks_proportion(0.5)
```

The image and template can be visualized as:

```
%matplotlib inline
import matplotlib.pyplot as plt
plt.subplot(121)
image.view()
plt.gca().set_title('Input Image')
plt.subplot(122)
template.view_landmarks(marker_face_colour='white', marker_edge_colour='black',
                        marker_size=4)
plt.gca().set_title('Template');
```

Let's also load the shapes of the LFPW trainset, which will be used to train the PDM:

```
from menpo.visualize import print_progress
training_shapes = []
for lg in print_progress(mio.import_landmark_files(path_to_lfpw / '*.pts', verbose=True)):
    training_shapes.append(lg['all'])
```

The shapes can be visualized using a widget as:

```
from menpowidgets import visualize_pointclouds
visualize_pointclouds(training_shapes)
```

### 2. Warp Functions

With an abuse of notation, let us define $\mathbf{t}(\mathcal{W}(\mathbf{p}))\equiv \mathcal{F}(\mathbf{I})(\mathcal{W}(\mathbf{p}))$ as the feature-based warped $M\times 1$ vector of an image $\mathbf{I}$ given its shape instance generated with parameters $\mathbf{p}$.

`menpofit` provides five different ATM versions, which differ in the way that this appearance warping $\mathbf{t}(\mathcal{W}(\mathbf{p}))$ is performed. Specifically:

**HolisticATM**

The `HolisticATM` uses a holistic appearance representation obtained by warping the texture into the reference frame with a non-linear warp function $\mathcal{W}(\mathbf{p})$. Two such warp functions are currently supported: Piecewise Affine Warp and Thin Plate Spline. The reference frame is the mask of the mean shape's convex hull.
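
As an illustration of the piecewise affine idea, the sketch below maps a single point from a triangle in the reference frame to the corresponding triangle of a shape instance by keeping its barycentric coordinates fixed. This is a simplified stand-in, not menpo's warp implementation; the helper names and the triangle coordinates are assumptions made for the example. A full warp would repeat this for every reference-frame pixel, using the triangle that contains it, and then sample the image at the mapped location.

```python
import numpy as np


def barycentric(tri, pt):
    """Barycentric coordinates of a 2D point w.r.t. a triangle of shape (3, 2)."""
    a, b, c = tri
    T = np.column_stack([b - a, c - a])            # 2x2 matrix of edge vectors
    beta, gamma = np.linalg.solve(T, pt - a)
    return np.array([1.0 - beta - gamma, beta, gamma])


def warp_point(src_tri, dst_tri, pt):
    """Map pt from src_tri to dst_tri; the map is affine inside the triangle."""
    weights = barycentric(src_tri, pt)
    return weights @ dst_tri                       # weighted sum of dst vertices


# Hypothetical triangle in the reference frame and its counterpart in the image
ref_tri = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
img_tri = np.array([[5.0, 5.0], [25.0, 5.0], [5.0, 35.0]])

pt_ref = np.array([2.0, 3.0])
pt_img = warp_point(ref_tri, img_tri, pt_ref)      # location to sample in the image
```

The map is only affine per triangle, which is why the overall warp over the whole mesh is non-linear.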

**MaskedATM**

The `MaskedATM` uses the same warp logic as the `HolisticATM`. The only difference between them is that the reference frame consists of rectangular mask patches centered around the landmarks instead of the convex hull of the mean shape.

**LinearATM**

The `LinearATM` is an experimental variation that utilizes a linear warp function $\mathcal{W}(\mathbf{p})$ in the motion model and thus a *dense* statistical shape model, which has one shape point per pixel in the reference frame. The advantage is that the linear nature of such a warp function makes the computation of its Jacobian trivial.
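
The remark about the Jacobian can be checked numerically. The sketch below assumes the dense warp has the same linear form as the shape model, $\bar{\mathbf{s}} + \mathbf{U}\mathbf{p}$, in which case its Jacobian with respect to $\mathbf{p}$ is simply $\mathbf{U}$. The sizes, the random basis and the function name `warp` are illustrative assumptions, not taken from `menpofit`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_params = 6, 3                          # toy sizes (assumption)

# Dense mean shape: one (x, y) point per reference-frame pixel
mean = rng.uniform(0, 50, size=2 * n_pixels)
# Random orthonormal basis standing in for the dense shape basis
U = np.linalg.qr(rng.normal(size=(2 * n_pixels, n_params)))[0]


def warp(p):
    """Linear warp: positions of all dense points for parameters p."""
    return mean + U @ p


# Central finite differences around an arbitrary parameter vector
p0 = rng.normal(size=n_params)
eps = 1e-6
J_fd = np.column_stack([
    (warp(p0 + eps * e) - warp(p0 - eps * e)) / (2 * eps)
    for e in np.eye(n_params)
])
```

Here `J_fd` matches `U` up to rounding error and does not depend on `p0`, so nothing has to be recomputed per iteration.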

**LinearMaskedATM**

Similar to the relation between `HolisticATM` and `MaskedATM`, a `LinearMaskedATM` is exactly the same as a `LinearATM`, with the difference that the reference frame is masked.

**PatchATM**

A `PatchATM` represents the appearance in a patch-based fashion, i.e. rectangular patches are extracted around the landmark points.
Thus, the warp function $\mathbf{t}(\mathcal{W}(\mathbf{p}))$ simply *samples* the patches centered around the landmarks of the shape instance generated with parameters $\mathbf{p}$.
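
The sampling step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the `menpofit` code: the helper `sample_patches` is hypothetical, landmarks are given as (row, column) pairs inside the image, patch sizes are odd, and the image is single-channel.

```python
import numpy as np


def sample_patches(image, landmarks, patch_shape=(5, 5)):
    """Extract patches centred on (row, col) landmarks and stack them in one vector."""
    ph, pw = patch_shape
    half_h, half_w = ph // 2, pw // 2
    # Pad with edge values so that patches near the border keep their full size
    padded = np.pad(image, ((half_h, half_h), (half_w, half_w)), mode="edge")
    patches = []
    for r, c in np.round(landmarks).astype(int):
        # In padded coordinates the top-left corner of the patch is (r, c)
        patches.append(padded[r:r + ph, c:c + pw])
    return np.concatenate([patch.ravel() for patch in patches])


# Toy image and two hypothetical landmarks
image = np.arange(100, dtype=float).reshape(10, 10)
landmarks = np.array([[2.0, 3.0], [7.0, 7.0]])

t = sample_patches(image, landmarks, patch_shape=(3, 3))
```

The resulting vector has one block of `ph * pw` values per landmark, so its length plays the role of $M$ for a patch-based model.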

Let's now create a `HolisticATM` using IGO features:

```
from menpofit.atm import HolisticATM
from menpo.feature import igo
atm = HolisticATM(template, training_shapes, group='PTS',
                  diagonal=180, scales=(0.25, 1.0),
                  holistic_features=igo, verbose=True)
```

and visualize it:

```
atm.view_shape_models_widget()
```

```
atm.view_atm_widget()
```

### 3. Cost Function and Optimization

Fitting an ATM on a test image involves the optimization of the following cost function $\arg\min_{\mathbf{p}} \big\lVert \mathbf{t}(\mathcal{W}(\mathbf{p})) - \bar{\mathbf{a}} \big\rVert^{2}$ with respect to the shape parameters. Note that this cost function is exactly the same as in the case of Lucas-Kanade for Affine Image Alignment. The only difference has to do with the nature of the transform - and thus $\mathbf{p}$ - that is used in the motion model $\mathcal{W}(\mathbf{p})$. Similarly, the cost function is very similar to the one of an Active Appearance Model with the difference that an ATM has no appearance subspace.

The optimization of the ATM deformable image alignment is performed with the Lucas-Kanade gradient descent algorithm. This is the same as in the case of affine image transform, so you can refer to the Lucas-Kanade chapter for more information. We currently support Inverse-Compositional and Forward-Compositional optimization.
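
To give a feel for this kind of optimization, the toy sketch below minimizes the same type of cost, $\big\lVert \mathbf{t}(\mathcal{W}(p)) - \bar{\mathbf{a}} \big\rVert^{2}$, for a 1D signal and a single translation parameter. It uses a plain forward-additive Gauss-Newton update rather than the Inverse-Compositional or Forward-Compositional algorithms of `menpofit`, and every name in it (`template`, `image`, `fit_translation`, `TRUE_SHIFT`) is a hypothetical choice made for the example.

```python
import numpy as np

TRUE_SHIFT = 0.7                                   # ground-truth translation (assumption)


def template(x):
    """Constant template a(x): a smooth 1D bump."""
    return np.exp(-0.5 * x ** 2)


def image(x):
    """Input image I(x) = a(x - TRUE_SHIFT), i.e. the translated template."""
    return template(x - TRUE_SHIFT)


def image_gradient(x):
    """Analytic derivative dI/dx of the input image."""
    return -(x - TRUE_SHIFT) * template(x - TRUE_SHIFT)


def fit_translation(p=0.0, max_iters=20, tol=1e-10):
    """Gauss-Newton on the cost ||I(x + p) - a(x)||^2 over the scalar p."""
    x = np.linspace(-6.0, 6.0, 300)                # sampling grid of the reference frame
    a_bar = template(x)
    costs = []
    for _ in range(max_iters):
        residual = image(x + p) - a_bar            # t(W(p)) - a_bar with W(x; p) = x + p
        costs.append(float(residual @ residual))
        J = image_gradient(x + p)                  # derivative of the warped image w.r.t. p
        dp = -(J @ residual) / (J @ J)             # Gauss-Newton step for one parameter
        p += dp
        if abs(dp) < tol:
            break
    return p, costs


p_hat, costs = fit_translation()
```

In this toy problem `p_hat` converges to `TRUE_SHIFT` and the recorded costs decrease towards zero. In an ATM the scalar `p` is replaced by the vector of shape parameters and the translation by the warp of the motion model.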

Let's now create a `Fitter` using the ATM we created, as:

```
from menpofit.atm import LucasKanadeATMFitter, InverseCompositional
fitter = LucasKanadeATMFitter(atm,
                              lk_algorithm_cls=InverseCompositional, n_shape=[5, 15])
```

Information about the fitter can be retrieved as:

```
print(fitter)
```

which returns

```
Holistic Active Template Model
 - Images warped with DifferentiablePiecewiseAffine transform
 - Images scaled to diagonal: 180.00
 - Scales: [0.25, 1.0]
   - Scale 0.25
     - Holistic feature: igo
     - Template shape: (38, 38)
     - Shape model class: OrthoPDM
       - 132 shape components
       - 4 similarity transform parameters
   - Scale 1.0
     - Holistic feature: igo
     - Template shape: (133, 134)
     - Shape model class: OrthoPDM
       - 132 shape components
       - 4 similarity transform parameters
Inverse Compositional Algorithm
 - Scales: [0.25, 1.0]
   - Scale 0.25
     - 3 active shape components
     - 4 similarity transform components
   - Scale 1.0
     - 20 active shape components
     - 4 similarity transform components

Let's now fit the ATM on the `image` we loaded in the beginning.
We will use the DLib face detector from `menpodetect` in order to acquire an initial bounding box, as:

```
from menpodetect import load_dlib_frontal_face_detector
# Load detector
detect = load_dlib_frontal_face_detector()
# Detect
bboxes = detect(image)
print("{} detected faces.".format(len(bboxes)))
# View
if len(bboxes) > 0:
    image.view_landmarks(group='dlib_0', line_colour='white',
                         render_markers=False, line_width=3);
```

and fit the ATM as:

```
# initial bbox
initial_bbox = bboxes[0]
# fit image
result = fitter.fit_from_bb(image, initial_bbox, max_iters=20,
                            gt_shape=image.landmarks['PTS'].lms)
# print result
print(result)
```

which prints

```
Fitting result of 68 landmark points.
Initial error: 0.0877
Final error: 0.0196
```

The result can be visualized as:

```
result.view(render_initial_shape=True)
```

or using a widget as:

```
result.view_widget()
```

Remember that the shape per iteration can be retrieved as:

```
result.shapes
```

Similarly, the shape parameters per iteration can be obtained as follows (an ATM has no appearance subspace, so there are no appearance parameters to retrieve):

```
print(result.shape_parameters.shape)
```

### 4. References

[1] I. Matthews and S. Baker. "Active Appearance Models Revisited", International Journal of Computer Vision, vol. 60, no. 2, pp. 135-164, 2004.