Deep Learning
11 Sep 2018
Sigmoid activation and its derivative:

$$a = \sigma(z) = \frac{1}{1+e^{-z}}, \qquad \frac{\partial \sigma}{\partial z} = \sigma(1-\sigma)$$

Weighted input to neuron $j$ in layer $l+1$:

$$z_j^{l+1} = \sum_k w_{jk}^{l+1} a_k^l + b_j^{l+1}, \qquad z = w \cdot a + b$$
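A minimal NumPy sketch of these two building blocks; the names `sigmoid`, `sigmoid_prime`, and the layer shapes are my own illustration, not from the original:

```python
import numpy as np

def sigmoid(z):
    # a = 1 / (1 + e^{-z}), applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # dσ/dz = σ(z) * (1 - σ(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# One forward step: z^{l+1} = w^{l+1} a^l + b^{l+1}
rng = np.random.default_rng(0)
w = rng.standard_normal((3, 4))   # weights from a 4-neuron layer into a 3-neuron layer
b = rng.standard_normal((3, 1))   # biases of the 3-neuron layer
a = rng.standard_normal((4, 1))   # activations of the previous layer
z = w @ a + b
a_next = sigmoid(z)
```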
Cost for a single sample $x$, with $m$ output neurons:

$$C_x = \frac{1}{2} \sum_{j=1}^m (y_j - a_j^L)^2$$

Total cost, averaged over the $n$ training samples:

$$C = \frac{1}{n} \sum_{x} C_x$$

The objective is to minimize $C$ with respect to the weights $w_{jk}$ and biases $b_j$.
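As a sketch (function names are mine), the per-sample quadratic cost and its average over the training set:

```python
import numpy as np

def quadratic_cost(a_L, y):
    # C_x = 1/2 * sum_j (y_j - a_j^L)^2
    return 0.5 * np.sum((y - a_L) ** 2)

def total_cost(output_activations, labels):
    # C = (1/n) * sum_x C_x
    return np.mean([quadratic_cost(a, y) for a, y in zip(output_activations, labels)])
```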
Derivative of the per-sample cost with respect to the weighted input of output neuron $j$, writing $\sigma$ for $a_j^L = \sigma(z_j^L)$:

$$\frac{\partial C_x}{\partial z_j^L} = (a_j^L - y_j) \frac{\partial a_j^L}{\partial z_j^L} = (a_j^L - y_j) \cdot \sigma \cdot (1-\sigma)$$

Define the error of neuron $j$ in layer $l$:

$$\delta^l_j \equiv \frac{\partial C}{\partial z_j^l}$$

From

$$z_j^{l+1} = \sum_k \left( w_{jk}^{l+1} \sigma(z_k^l) \right) + b_j^{l+1}, \qquad \frac{\partial z_j^{l+1}}{\partial z_k^l} = w_{jk}^{l+1} \cdot \sigma^{\prime}(z_k^l) = w_{jk}^{l+1} \cdot \sigma(z_k^l)\,(1-\sigma(z_k^l))$$

the chain rule with respect to $z$ gives

$$\frac{\partial C_x}{\partial z_k^l} = \sum_j \frac{\partial C_x}{\partial z_j^{l+1}} \cdot \frac{\partial z_j^{l+1}}{\partial z_k^l}$$

$$\delta^l_k = \sum_j \delta^{l+1}_j \cdot \frac{\partial z_j^{l+1}}{\partial z_k^l}$$
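A quick numerical sanity check of this recurrence on a tiny two-layer example: the analytic $\delta^l$ is compared against central finite differences of the cost. All names and array shapes here are my own illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
w2 = rng.standard_normal((2, 3))   # weights into layer l+1
b2 = rng.standard_normal((2, 1))
z1 = rng.standard_normal((3, 1))   # weighted inputs of layer l
y = rng.standard_normal((2, 1))

def cost_from_z1(z1):
    a1 = sigmoid(z1)
    a2 = sigmoid(w2 @ a1 + b2)
    return 0.5 * np.sum((y - a2) ** 2)

# analytic: delta^2 = (a^2 - y) * sigma'(z^2); delta^1 via the recurrence
a1 = sigmoid(z1)
a2 = sigmoid(w2 @ a1 + b2)
delta2 = (a2 - y) * a2 * (1 - a2)
delta1 = (w2.T @ delta2) * a1 * (1 - a1)

# numeric: central differences on each z^1_k
eps = 1e-6
num = np.zeros_like(z1)
for k in range(z1.size):
    zp, zm = z1.copy(), z1.copy()
    zp[k, 0] += eps
    zm[k, 0] -= eps
    num[k, 0] = (cost_from_z1(zp) - cost_from_z1(zm)) / (2 * eps)

assert np.allclose(delta1, num, atol=1e-8)
```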
Output layer:

$$\delta^L_j \equiv \frac{\partial C}{\partial z_j^L} = \frac{\partial C}{\partial a_j^L} \cdot \sigma^{\prime}(z_j^L) = (a_j^L - y_j) \cdot \sigma(z_j^L)\,(1 - \sigma(z_j^L))$$
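A minimal sketch of the output-layer error as code; `output_delta` is an illustrative name, reusing the `sigmoid` helpers from the first sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# delta^L = (a^L - y) * sigma'(z^L), element-wise
def output_delta(z_L, y):
    return (sigmoid(z_L) - y) * sigmoid_prime(z_L)
```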
In matrix form:

$$\delta^L \equiv \frac{\partial C}{\partial z^L} = \nabla_a C \odot \sigma^{\prime}(z^L) = (a^L - y) \odot \sigma^{\prime}(z^L)$$

Back propagation:
$$\delta^l_k = \sum_j \delta^{l+1}_j \, w^{l+1}_{jk} \, \sigma^{\prime}(z^l_k)$$

$$\delta^L = \nabla_a C \odot \sigma^{\prime}(z^L)$$

$$\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma^{\prime}(z^l)$$
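The layer-to-layer recurrence as a one-line sketch, assuming the `sigmoid_prime` helper defined above; `w_next` stands for $w^{l+1}$ and `delta_next` for $\delta^{l+1}$:

```python
# delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)
def backprop_delta(w_next, delta_next, z_l):
    return (w_next.T @ delta_next) * sigmoid_prime(z_l)
```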
Gradients with respect to the weights $w_{jk}$ and biases $b_j$:

$$\frac{\partial C}{\partial b^l_j} = \delta^l_j, \qquad \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \, \delta^l_j$$

In matrix form:

$$\nabla_{b^l} C = \delta^l, \qquad \nabla_{w^l} C = \delta^l \, (a^{l-1})^T$$
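A compact sketch tying the four equations together for one sample. The function name `backprop`, the list-of-matrices layout for `weights`/`biases`, and the column-vector shapes are my own assumptions, not from the original:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """Forward pass, then backward pass, for one sample x with target y.
    weights[l], biases[l] map layer l activations to layer l+1.
    Returns (nabla_b, nabla_w) matching the shapes of biases and weights."""
    # forward pass, storing every z and activation
    a = x
    activations = [x]
    zs = []
    for w, b in zip(weights, biases):
        z = w @ a + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # output error: delta^L = (a^L - y) ⊙ sigma'(z^L)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    nabla_b[-1] = delta                          # dC/db^L = delta^L
    nabla_w[-1] = delta @ activations[-2].T      # dC/dw^L = delta^L (a^{L-1})^T
    # propagate backwards: delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = delta @ activations[-l - 1].T
    return nabla_b, nabla_w
```

The gradients returned here would feed a gradient-descent step such as `w -= eta * nw` for each layer; that step is outside the scope of these notes.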