analyzed together.
It usually forms part of a larger pattern recognition system. It has been implemented using a perceptron network whose connection weights were trained with backpropagation (supervised learning).[16]
Convolutional
Main article: Convolutional neural network
A convolutional neural network (CNN, ConvNet, shift-invariant or space-invariant network) is a class of deep network composed of one or more convolutional layers with fully connected layers (matching those in typical ANNs) on top.[17][18] It uses tied weights and pooling layers, in particular max-pooling.[19] It is often structured via Fukushima's convolutional architecture.[20] CNNs are variations of multilayer perceptrons that use minimal preprocessing.[21] This architecture allows them to take advantage of the 2D structure of input data.
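As a concrete illustration of this layering, the following sketch chains a tied-weight convolution, max-pooling, and a fully connected layer into a single forward pass. It is only an illustrative toy (the 8×8 input, 4 filters, ReLU nonlinearity, and NumPy implementation are assumptions for this sketch, not details from the cited sources):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernels):
    """Valid 2-D convolution: each kernel's weights are shared (tied)
    across every spatial position of the input image."""
    n_k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((n_k, h - kh + 1, w - kw + 1))
    for k in range(n_k):
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                out[k, i, j] = np.sum(image[i:i + kh, j:j + kw] * kernels[k])
    return out

def max_pool(maps, size=2):
    """Non-overlapping max-pooling over each feature map."""
    n_k, h, w = maps.shape
    out = np.zeros((n_k, h // size, w // size))
    for k in range(n_k):
        for i in range(h // size):
            for j in range(w // size):
                out[k, i, j] = maps[k, i*size:(i+1)*size,
                                        j*size:(j+1)*size].max()
    return out

# Toy forward pass: 8x8 "image" -> convolution -> ReLU -> max-pool -> fully connected.
image = rng.standard_normal((8, 8))
kernels = rng.standard_normal((4, 3, 3)) * 0.1        # 4 tied-weight 3x3 filters
features = np.maximum(conv2d(image, kernels), 0.0)    # convolution + ReLU
pooled = max_pool(features)                           # 4 feature maps of size 3x3
flat = pooled.reshape(-1)
W_fc = rng.standard_normal((10, flat.size)) * 0.1     # fully connected layer on top
scores = W_fc @ flat                                  # scores for 10 hypothetical classes
print(scores.shape)                                   # (10,)
```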
Its unit connectivity pattern is inspired by the organization of the visual cortex. Units respond to stimuli in a restricted region of space known as the receptive field. Receptive fields partially overlap such that together they cover the entire visual field. Unit response can be approximated mathematically by a convolution operation.[22]
CNNs are suitable for processing visual and other two-dimensional data.[23][24] They have shown superior results in both image and speech applications. They can be trained with standard backpropagation, are easier to train than other regular deep feed-forward neural networks, and have many fewer parameters to estimate.[25]
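To make the "many fewer parameters" claim concrete, a rough count for a hypothetical 28×28 input compares a convolutional layer, which reuses the same small filter at every position, with a fully connected layer producing an output of the same size (the layer sizes below are illustrative assumptions, not figures from the cited source):

```python
# Hypothetical layer sizes, chosen only to illustrate the effect of weight sharing.
in_h, in_w = 28, 28                          # input image
n_filters, k = 16, 5                         # 16 convolutional filters of size 5x5
out_h, out_w = in_h - k + 1, in_w - k + 1    # valid-convolution output: 24x24

conv_params = n_filters * (k * k + 1)        # tied weights plus one bias per filter
fc_params = (in_h * in_w + 1) * (n_filters * out_h * out_w)  # dense layer, same output size

print(conv_params)   # 416
print(fc_params)     # 7234560
```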
Capsule Neural Networks (CapsNet) add structures called capsules to a CNN and reuse output from several capsules to form more stable (with respect to various perturbations) representations.[26]
Examples of applications in computer vision include DeepDream[27] and robot navigation.[28] They have wide applications in image and video recognition, recommender systems[29] and natural language processing.[30]
Deep stacking network
A deep stacking network (DSN)[31] (deep convex network) is based on a hierarchy of blocks of simplified neural network modules. It was introduced in 2011 by Deng and Dong.[32] It formulates the learning as a convex optimization problem with a closed-form solution, emphasizing the mechanism's similarity to stacked generalization.[33] Each DSN block is a simple module that is easy to train by itself in a supervised fashion, without backpropagation over the entire stack of blocks.[34]
Each block consists of a simplified multi-layer perceptron (MLP) with a single hidden layer. The hidden layer h has logistic sigmoidal units, and the output layer has linear units. Connections between these layers are represented by weight matrix U; input-to-hidden-layer connections have weight matrix W. Target vectors t form the columns of matrix T, and the input data vectors x form the columns of matrix X. The matrix of hidden units is
{\displaystyle {\boldsymbol {H}}=\sigma ({\boldsymbol {W}}^{T}{\boldsymbol {X}})},
where \sigma performs the element-wise logistic sigmoid operation. Modules are trained in order, so lower-layer weights W are known at each stage. Each block estimates the same final label class y, and its estimate is concatenated with the original input X to form the expanded input for the next block. Thus, the input to the first block contains the original data only, while downstream blocks' input adds the output of preceding blocks. Learning the upper-layer weight matrix U given the other weights in the network can then be formulated as a convex optimization problem:
{\displaystyle \min _{U^{T}}f=\|{\boldsymbol {U}}^{T}{\boldsymbol {H}}-{\boldsymbol {T}}\|_{F}^{2},}
which has a closed-form solution.[31]
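A minimal sketch of this procedure, assuming randomly initialised lower-layer weights W, one-hot target columns, and a small ridge term added for numerical stability (details that are not specified in the cited sources), trains two stacked blocks and solves for U in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dsn_block(X, T, n_hidden=50, ridge=1e-3):
    """Train one DSN block.

    X: (n_features, n_samples) input columns; T: (n_classes, n_samples) target columns.
    The lower-layer weights W are fixed (randomly initialised in this sketch); the
    upper-layer weights follow the least-squares solution of min_U ||U^T H - T||_F^2,
    i.e. U = (H H^T + ridge*I)^(-1) H T^T, with a small ridge term for stability.
    """
    W = rng.standard_normal((X.shape[0], n_hidden)) * 0.1
    H = sigmoid(W.T @ X)                       # hidden activations, element-wise sigmoid
    U = np.linalg.solve(H @ H.T + ridge * np.eye(n_hidden), H @ T.T)
    return W, U

def block_output(X, W, U):
    return U.T @ sigmoid(W.T @ X)              # the block's estimate of the labels

# Toy data: 20-dimensional inputs, 3 classes (one-hot targets), 200 samples.
X = rng.standard_normal((20, 200))
labels = rng.integers(0, 3, 200)
T = np.eye(3)[labels].T                        # (3, 200) target matrix

# Stack two blocks: the second block's input is the original data
# concatenated with the first block's label estimate.
W1, U1 = train_dsn_block(X, T)
Y1 = block_output(X, W1, U1)
X2 = np.vstack([X, Y1])                        # expanded input for the next block
W2, U2 = train_dsn_block(X2, T)
Y2 = block_output(X2, W2, U2)
print(Y2.shape)                                # (3, 200)
```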
Unlike other deep architectures, such as DBNs, the goal is not to discover the transformed feature representation. The hierarchical block structure of this architecture makes parallel learning straightforward, as a batch-mode optimization problem. In purely discriminative tasks, DSNs outperform conventional DBNs.