T2T-ViT

According to "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet", arXiv:2101.11986 [cs.CV]

ViT achieves inferior performance to CNNs when trained from scratch on a midsize dataset like ImageNet.

vit-features.png

Proposals:

t2t-block.png

Configuration

Imports

Configuration

Data

Model

Utilities

Attention

Transformer

T2T module

T2T-ViT

Training

History

Optimizer

Setup trainer

Start training