MViT

According to "Multiscale Vision Transformers", arXiv:2104.11227 [cs.CV]

Figure: pooling attention (pooling_attention.png)

Configuration

Imports

Configuration

Data

Model

Utilities

Attention

$$ O = V \mathrm{softmax}\left[\frac{1}{\sqrt{c}}K^{\intercal}Q\right] $$
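The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's own implementation. It assumes a channels-first layout, where Q has shape (c, n_q) and K, V have shape (c, n_k), and it assumes the softmax is taken over the key axis so that each column of the weight matrix sums to 1. The function name `attention` and the toy shapes are chosen here for illustration only.

```python
import numpy as np


def attention(q, k, v):
    """Channels-first attention O = V softmax(K^T Q / sqrt(c)).

    q: (c, n_q) queries; k: (c, n_k) keys; v: (c_v, n_k) values.
    Returns an array of shape (c_v, n_q).
    """
    c = q.shape[0]
    scores = k.T @ q / np.sqrt(c)                         # (n_k, n_q)
    scores = scores - scores.max(axis=0, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over keys (assumed axis)
    return v @ weights                                    # (c_v, n_q)


# Toy example: 5 query tokens, 3 key/value tokens, 8 channels.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 5))
k = rng.normal(size=(8, 3))
v = rng.normal(size=(8, 3))
out = attention(q, k, v)
print(out.shape)
```

With this layout the number of output tokens follows the queries while the number of keys and values can be smaller, which is why Q and K/V may have different token counts in the sketch.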

Transformer

MViT

Training

Optimizer

Setup trainer

Start training