Lecture 9. CNN Architectures


Official CS231n course site: http://cs231n.stanford.edu/index.html

These notes follow the Spring 2017 video lectures (BiliBili); the slides are from Spring 2018.

Supplementary handout material for Lecture 9:


Review: LeNet-5

[LeCun et al., 1998]

Case Studies

AlexNet

[Krizhevsky et al. 2012]

The first large convolutional neural network to succeed in the ImageNet classification challenge.

ZFNet

[Zeiler and Fergus, 2013]

  • ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2013 winner

VGG

[Simonyan and Zisserman, 2014]

  • ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014: 2nd in classification, 1st in localization

  • Q: Why use smaller filters? (3x3 conv)
    • A stack of three 3x3 conv (stride 1) layers has the same effective receptive field as one 7x7 conv layer
    • But it is deeper, with more non-linearities
    • And it has fewer parameters: $3 \times (3^2 C^2)$ vs. $7^2 C^2$ for $C$ channels per layer (see the parameter-count sketch after this list)
  • Q: What is the effective receptive field of three 3x3 conv (stride 1) layers?
    • A: 7x7. The first layer sees 3x3 of the input, the second sees 5x5, and the third sees 7x7, since each additional 3x3 layer grows the receptive field by 2.
    • See this post:

  • Details
    • ILSVRC'14 2nd in classification, 1st in localization
    • Similar training procedure to Krizhevsky et al. 2012
    • No Local Response Normalisation (LRN)
    • Use VGG16 or VGG19 (VGG19 only slightly better, more memory)
    • Use ensembles for best results
    • FC7 features generalize well to other tasks
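
The parameter counts above can be checked directly. Below is a minimal sketch, assuming PyTorch and an illustrative channel count of C = 64 (both are my choices, not from the lecture):

```python
import torch.nn as nn

C = 64  # channels per layer; illustrative value

# three stacked 3x3 convs (stride 1, padding 1) mapping C -> C channels
stack_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)
# a single 7x7 conv with the same effective receptive field
one_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(stack_3x3))  # 3 * 3^2 * C^2 = 110592
print(n_params(one_7x7))    # 7^2 * C^2 = 200704
```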

GoogLeNet

[Szegedy et al., 2014]

Deeper networks, with computational efficiency

  • “Inception module”

    • Design a good local network topology (a “network within a network”) and then stack these modules on top of each other
    • Apply parallel filter operations on the input from previous layer:
      • Multiple receptive field sizes for convolution (1x1, 3x3, 5x5)
      • Pooling operation (3x3)
    • Concatenate all filter outputs together depth-wise

    • Q: What is the problem with this?

      • The naive module is very expensive to compute, and since the pooling branch preserves the input's feature depth, the total depth after concatenation can only grow from layer to layer
      • Solution: “bottleneck” layers that use 1x1 convolutions to reduce feature depth (see the sketch below)
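
A minimal sketch of such an Inception-style module, assuming PyTorch; the branch channel counts are illustrative choices, not the ones used in the GoogLeNet paper:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)       # 1x1 conv
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=1),                 # 1x1 bottleneck
            nn.Conv2d(32, 64, kernel_size=3, padding=1),         # 3x3 conv
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),                 # 1x1 bottleneck
            nn.Conv2d(16, 32, kernel_size=5, padding=2),         # 5x5 conv
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),    # 3x3 pooling
            nn.Conv2d(in_ch, 32, kernel_size=1),                 # cap depth after pooling
        )

    def forward(self, x):
        # every branch preserves spatial size, so the outputs can be
        # concatenated depth-wise (along the channel dimension)
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 128, 28, 28)
print(InceptionModule(128)(x).shape)  # torch.Size([1, 192, 28, 28])
```

Without the 1x1 bottlenecks, the 3x3 and 5x5 branches would convolve over the full input depth, which is where most of the naive module's cost comes from.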

Finally, the full GoogLeNet architecture is built by stacking these Inception modules on top of each other.

ResNet

  • ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015 winner

  • [He et al., 2015]

  • What happens when we continue stacking deeper layers on a “plain” convolutional neural network?

    • The deeper model performs worse, but it’s not caused by overfitting!
  • Hypothesis: the problem is optimization; deeper models are harder to optimize.

    • The deeper model should be able to perform at least as well as the shallower model.
    • A solution by construction is copying the learned layers from the shallower model and setting additional layers to identity mapping.
  • **Solution:** Use network layers to fit a residual mapping $F(x) = H(x) - x$ instead of directly trying to fit the desired underlying mapping $H(x)$; the block then computes $H(x) = F(x) + x$ (see the sketch below).
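
A minimal residual block sketch, assuming PyTorch; the two 3x3 convs with batch norm follow the basic-block design, while channel counts and naming are my own:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # first 3x3 conv
        out = self.bn2(self.conv2(out))        # second 3x3 conv, no ReLU yet
        return F.relu(out + x)                 # add identity shortcut, then ReLU

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

If the conv weights are driven toward zero, $F(x) \approx 0$ and the block reduces to the identity mapping, which is exactly the by-construction solution described above.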

Finally, the full ResNet architecture stacks these residual blocks (each with two 3x3 conv layers), periodically doubling the number of filters and downsampling spatially using stride 2.

Like GoogLeNet, deeper ResNets (ResNet-50 and up) also use “bottleneck” layers to improve efficiency: a 1x1 conv first reduces the feature depth, a 3x3 conv operates on the thinner tensor, and a final 1x1 conv restores the depth.
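
A sketch of that bottleneck stack, again assuming PyTorch; the 256 → 64 → 256 channel pattern is the commonly cited ResNet example, and the identity shortcut would wrap this stack exactly as in the previous sketch:

```python
import torch.nn as nn

def bottleneck_stack(ch_in=256, ch_mid=64):
    return nn.Sequential(
        nn.Conv2d(ch_in, ch_mid, kernel_size=1),              # 1x1: reduce depth (256 -> 64)
        nn.ReLU(),
        nn.Conv2d(ch_mid, ch_mid, kernel_size=3, padding=1),  # 3x3: conv on the thin tensor
        nn.ReLU(),
        nn.Conv2d(ch_mid, ch_in, kernel_size=1),              # 1x1: restore depth (64 -> 256)
    )
```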

Finally, for how these architectures compare in practice, see:

An Analysis of Deep Neural Network Models for Practical Applications, 2017

Other architectures to know…

NiN (Network in Network)

[Lin et al. 2014]

Improving ResNets…

Identity Mappings in Deep Residual Networks

[He et al. 2016]

Wide ResNet

[Zagoruyko et al. 2016]

ResNeXt

[Xie et al. 2016]

Stochastic Depth

[Huang et al. 2016]

“Good Practices for Deep Feature Fusion”

[Shao et al. 2016]

Squeeze-and-Excitation Network (SENet)

[Hu et al. 2017]

Beyond ResNets…

FractalNet: Ultra-Deep Neural Networks without Residuals

[Larsson et al. 2017]

DenseNet: Densely Connected Convolutional Networks

[Huang et al. 2017]

Efficient networks…

SqueezeNet: AlexNet-level Accuracy With 50x Fewer Parameters and <0.5MB Model Size

[Iandola et al. 2017]

Meta-learning: Learning to learn network architectures…

NAS (Neural Architecture Search with Reinforcement Learning)

[Zoph et al. 2016]

Learning Transferable Architectures for Scalable Image Recognition

[Zoph et al. 2017]
