Background#

animal-soup uses a series of convolutional neural networks (CNNs) in order to perform automated animal behavior classification of the Hantman Lab reach-to-grab task.

Note

The animal-soup architecture was adopted from DeepEthogram. You can find their eLife paper here and their GitHub repo here.

There are three main model components: a flow generator, feature extractor, and sequence model.

Flow Generator#

The flow generator is a CNN that calculates optic flow. It takes a given window size (default 11) and creates “clips” to generate optic flow features. Optic flow features summarize the motion across frames and can be used in determining the behavior at a given time point of a trial.

There are three different flow generator models that can be used: TinyMotionNet3D, MotionNet, and TinyMotionNet. Their corresponding modes are listed below:

mode	model
fast	TinyMotionNet
medium	MotionNet
slow	TinyMotionNet3D

The primary difference between the models is the number of layers. As a result, there is higher accuracy but at the consequence of speed.

Feature Extractor#

The feature extractor is a two-stream fused model that extracts the relevant features in each frame.

The model consists of a flow and spatial classifier. The flow classifier takes in optic flow features from a flow generator and the spatial classifier takes in individual raw frames. The results of these two classifiers is a lower dimensional representation of the features in a given trial.

The type of flow and spatial classifiers constructed are based on the ResNet models listed below:

mode	feature model
slow	ResNet3D-34
medium	ResNet50
fast	ResNet18

Again, the primary difference between the models is the number of layers which introduces the same accuracy versus speed dilemma as above.

Sequence Model#

The sequence model is a TGMJ model. This is a type of Temporal Gaussian Mixture model that is used for activity detection across a series of sequences from a trial. This model allows for long-term learning over a temporal period.

The model takes in the spatial and flow features extracted by the feature extractor and seeks to give the probabilities of a given behavior occurring at each time point. These probabilities can be used to then create a binary matrix (number of behaviors, number of time points) called an ethogram that represents the behavioral classification of a given trial.