2.4. MobileNet-v2

MobileNet模型是Google针对手机等嵌入式设备提出的一种轻量级的深层神经网络,其使用的核心思想便是depthwise separable convolution。

The MobilenetV2 with depthwise convolution and inverted residuals has fewer operations(faster) and less parameters(smaller) compared to other models. Additionally, it has a tunable depth-multiplier parameter(speed-accuracy) for application specific requirements.[7]

MobileNet-v2 [9] utilizes a module architecture similar to the residual unit with bottleneck architecture of ResNet; the modified version of the residual unit where conv3x3 is replaced by depthwise convolution.

As you can see from the above, contrary to the standard bottleneck architecture, the first conv1x1 increases the channel dimension, then depthwise conv is performed, and finally the last conv1x1 decreases the channel dimension.

By reordering the building blocks as above and comparing it with MobileNet-v1 (separable conv), we can see how this architecture works (this reordering does not change the overall model architecture because the MobileNet-v2 is the stack of this module). That is to say, the above module be regarded as a modified version of separable conv where the single conv1x1 in separable conv is factorized into two conv1x1s. Letting T denote an expansion factor of channel dimension, the computational cost of two conv1x1s is 2HWN²/T while that of conv1x1 in separable conv is HWN². In [5], T = 6 is used, reducing the computational cost for conv1x1 by a factor of 3 (T/2 in general).

MobileNetV2是MobileNet的升级版,它具有两个特征点:

1、Inverted residuals,在ResNet50里我们认识到一个结构,bottleneck design结构,在3x3网络结构前利用1x1卷积降维,在3x3网络结构后,利用1x1卷积升维,相比直接使用3x3网络卷积效果更好,参数更少,先进行压缩,再进行扩张。而在MobileNetV2网络部分,其采用Inverted residuals结构,在3x3网络结构前利用1x1卷积升维,在3x3网络结构后,利用1x1卷积降维,先进行扩张,再进行压缩。

2、Linear bottlenecks,为了避免Relu对特征的破坏,在在3x3网络结构前利用1x1卷积升维,在3x3网络结构后,再利用1x1卷积降维后,不再进行Relu6层,直接进行残差网络的加法。[6]

import torch
model = torch.hub.load('pytorch/vision:v0.6.0', 'mobilenet_v2', pretrained=True)
model.eval()
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2(pretrained=True)

model.eval()
input_tensor = torch.rand(1,3,224,224)

script_model = torch.jit.trace(model,input_tensor)
script_model.save("mobilenet-v2.pt")

#[8]
import torch
import torchvision
import yaml

# Save traced TorchScript model.
traced_script_module.save("MobileNetV2.pt")

# Dump root ops used by the model (for custom build optimization).
ops = torch.jit.export_opnames(traced_script_module)

with open('MobileNetV2.yaml', 'w') as output:
    yaml.dump(ops, output)

所有的预训练模型都期望输入图像以同样的方式归一化,即小批3通道RGB图像的形状(3 x H x W),其中H和W预计至少为224。图像加载到范围为[0,1],然后使用mean =[0.485, 0.456, 0.406]和std =[0.229, 0.224, 0.225]进行归一化。

模型描述

MobileNet v2架构基于一个反向残差结构,其中残差块的输入和输出是薄的瓶颈层,与传统残差模型相反,后者在输入中使用扩展表示。MobileNet v2使用轻量级深度卷积来过滤中间扩展层的特征。此外,为了保持代表性,在狭窄的层中去除了非线性。

Model Description The MobileNet v2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers opposite to traditional residual models which use expanded representations in the input. MobileNet v2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, non-linearities in the narrow layers were removed in order to maintain representational power.

Model structure Top-1 error Top-5 error mobilenet_v2 28.12 9.71

MobileNet v2架构是基于一个倒置的残差结构,其中残差块的输入和输出是薄瓶颈层,与传统的残差模型相反,传统的残差模型在输入中使用扩展表示。MobileNet v2使用轻量级的深度卷积来过滤中间扩展层的特性。此外,为了保持代表性,在窄层中去除非线性。

相比MobileNetV1,MobileNetV2提出了Linear bottlenecks与Inverted residual block作为网络基本结构,通过大量地堆叠这些基本模块,构成了MobileNetV2的网络结构。最终,在FLOPS只有MobileNetV1的一半的情况下取得了更高的分类精度。[5]

继续使用Mobilenet V1的深度可分离卷积降低卷积计算量。 增加skip connection,使前向传播时提供特征复用。 采用Inverted residual block结构。该结构使用Point wise convolution先对feature map进行升维,再在升维后的特征接ReLU,减少ReLU对特征的破坏。[9]

Model structure

Top-1 error

Top-5 error

mobilenet_v2

28.12

9.71

#[6]
class MobileNetV2(nn.Module):
    def __init__(self, n_class=1000, input_size=224, width_mult=1.):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280

        interverted_residual_setting = [
            # t, c, n, s
            # 473,473,3 -> 237,237,32
            # 237,237,32 -> 237,237,16
            [1, 16, 1, 1],
            # 237,237,16 -> 119,119,24
            [6, 24, 2, 2],
            # 119,119,24 -> 60,60,32
            [6, 32, 3, 2],
            # 60,60,32 -> 30,30,64
            [6, 64, 4, 2],
            # 30,30,64 -> 30,30,96
            [6, 96, 3, 1],
            # 30,30,96 -> 15,15,160
            [6, 160, 3, 2],
            # 15,15,160 -> 15,15,320
            [6, 320, 1, 1],
        ]

        assert input_size % 32 == 0
        # 建立stem层
        input_channel = int(input_channel * width_mult)
        self.last_channel = int(last_channel * width_mult) if width_mult > 1.0 else last_channel

        self.features = [conv_bn(3, input_channel, 2)]

        # 根据上述列表进行循环,构建mobilenetv2的结构
        for t, c, n, s in interverted_residual_setting:
            output_channel = int(c * width_mult)
            for i in range(n):
                if i == 0:
                    self.features.append(block(input_channel, output_channel, s, expand_ratio=t))
                else:
                    self.features.append(block(input_channel, output_channel, 1, expand_ratio=t))
                input_channel = output_channel

        # mobilenetv2结构的收尾工作
        self.features.append(conv_1x1_bn(input_channel, self.last_channel))
        self.features = nn.Sequential(*self.features)

        # 最后的分类部分
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(self.last_channel, n_class),
        )

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = x.mean(3).mean(2)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()

def load_url(url, model_dir='./model_data', map_location=None):
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)
    filename = url.split('/')[-1]
    cached_file = os.path.join(model_dir, filename)
    if os.path.exists(cached_file):
        return torch.load(cached_file, map_location=map_location)
    else:
        return model_zoo.load_url(url,model_dir=model_dir)

def mobilenetv2(pretrained=False, **kwargs):
    model = MobileNetV2(n_class=1000, **kwargs)
    if pretrained:
        model.load_state_dict(load_url('http://sceneparsing.csail.mit.edu/model/pretrained_resnet/mobilenet_v2.pth.tar'), strict=False)
    return model