Deep Learning

Sigmoid activation and its derivative:

$$a = \sigma(z) = \frac{1}{1+e^{-z}}, \qquad \frac{\partial \sigma}{\partial z} = \sigma(1-\sigma)$$

Forward pass from layer $l$ to layer $l+1$ (or compactly, $z = w \cdot a + b$):

$$z_j^{l+1} = \sum_k w_{jk}^{l+1} a_k^l + b_j^{l+1}$$
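A minimal NumPy sketch of these pieces (the names `sigmoid`, `sigmoid_prime`, and the layer shapes are my own, for illustration):

```python
import numpy as np

def sigmoid(z):
    """a = sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """d(sigma)/dz = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Forward pass for one layer: z^{l+1} = w^{l+1} a^l + b^{l+1}, a^{l+1} = sigma(z^{l+1})
rng = np.random.default_rng(0)
w = rng.standard_normal((3, 4))       # w_{jk}: 4 neurons in layer l, 3 in layer l+1
b = rng.standard_normal((3, 1))       # biases b_j
a_prev = rng.standard_normal((4, 1))  # activations a^l
z = w @ a_prev + b                    # weighted input z^{l+1}
a = sigmoid(z)                        # activation a^{l+1}
```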

Cost for sample $x$:

$$C_x = \frac{1}{2} \sum_{j=1}^m (y_j - a_j^L)^2$$

Total cost, averaged over the $n$ training samples:

$$C = \frac{1}{n} \sum_{x} C_x$$
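As a sketch, the per-sample and total costs in code (assuming outputs and targets are NumPy column vectors; the helper names are illustrative):

```python
import numpy as np

def quadratic_cost(a_out, y):
    """Per-sample cost C_x = 1/2 * sum_j (y_j - a_j^L)^2."""
    return 0.5 * np.sum((y - a_out) ** 2)

def total_cost(outputs, targets):
    """Total cost C = (1/n) * sum_x C_x over n (output, target) pairs."""
    return sum(quadratic_cost(a, y) for a, y in zip(outputs, targets)) / len(targets)
```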

The objective function:

$$\min(C) \text{ w.r.t. } w_{jk}, b_j$$

Differentiating the per-sample cost with respect to the output layer's weighted input:

$$\frac{\partial C_x}{\partial z_j^L} = (a_j^L - y_j) \frac{\partial a_j^L}{\partial z_j^L} = (a_j^L - y_j) \cdot \sigma \cdot (1 - \sigma), \qquad \sigma = a_j^L$$

Define the error of neuron $j$ in layer $l$:

$$\delta^l_j \equiv \frac{\partial C}{\partial z_j^l}$$

Writing $z_j^{l+1}$ in terms of the previous layer's weighted inputs:

$$z_j^{l+1} = \sum_k w_{jk}^{l+1} \sigma(z_k^l) + b_j^{l+1}$$

$$\frac{\partial z_j^{l+1}}{\partial z_k^{l}} = w_{jk}^{l+1} \cdot \sigma'(z_k^l) = w_{jk}^{l+1} \cdot \sigma(z_k^{l})(1 - \sigma(z_k^{l}))$$

By the chain rule, the error in layer $l$ follows from the errors in layer $l+1$:

$$\frac{\partial C_x}{\partial z_k^l} = \sum_j \frac{\partial C_x}{\partial z_j^{l+1}} \cdot \frac{\partial z_j^{l+1}}{\partial z_k^{l}} \quad\Longrightarrow\quad \delta^l_k = \sum_j \delta^{l+1}_j \cdot \frac{\partial z_j^{l+1}}{\partial z_k^{l}}$$

Output layer:

$$\delta^L_j \equiv \frac{\partial C}{\partial z_j^L} = \frac{\partial C}{\partial a_j^L} \cdot \sigma'(z_j^L) = (a_j^L - y_j) \cdot \sigma(z_j^L)(1 - \sigma(z_j^L))$$

In matrix form:

$$\delta^L \equiv \frac{\partial C}{\partial z^L} = \nabla_a C \odot \sigma'(z^L) = (a^L - y) \odot \sigma'(z^L)$$
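In code this is a one-liner; the sketch below reuses `sigmoid_prime` from the earlier snippet (the function name is mine):

```python
def delta_output(a_out, y, z_out):
    """delta^L = (a^L - y) ⊙ sigma'(z^L), for the quadratic cost with a
    sigmoid output layer; * is the elementwise (Hadamard) product."""
    return (a_out - y) * sigmoid_prime(z_out)
```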

Backpropagation, componentwise and in matrix form:

$$\delta^l_k = \sum_j w_{jk}^{l+1} \delta^{l+1}_j \sigma'(z_k^l)$$

$$\delta^L = \nabla_a C \odot \sigma'(z^L), \qquad \delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l)$$

Finally, the gradients with respect to the biases and weights:

$$\nabla_{b^l} C = \delta^l, \qquad \nabla_{w^l} C = \delta^l (a^{l-1})^T$$

or componentwise, $\frac{\partial C}{\partial b_j^l} = \delta_j^l$ and $\frac{\partial C}{\partial w_{jk}^l} = a_k^{l-1} \delta_j^l$.
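Putting the four equations together, here is a minimal backward pass for one sample, reusing `sigmoid` and `sigmoid_prime` from above (a sketch under the assumption that `weights[l]` and `biases[l]` map layer $l$ to layer $l+1$; not a full training loop):

```python
def backprop(weights, biases, x, y):
    """Return (nabla_b, nabla_w) for one sample (x, y)."""
    # Forward pass, storing every z^l and a^l.
    a, activations, zs = x, [x], []
    for w, b in zip(weights, biases):
        z = w @ a + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)

    nabla_b = [np.zeros_like(b) for b in biases]
    nabla_w = [np.zeros_like(w) for w in weights]

    # Output layer: delta^L = (a^L - y) ⊙ sigma'(z^L)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta                          # nabla_{b^L} C = delta^L
    nabla_w[-1] = delta @ activations[-2].T      # nabla_{w^L} C = delta^L (a^{L-1})^T

    # Hidden layers: delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = delta @ activations[-l - 1].T

    return nabla_b, nabla_w
```

These gradients then feed a plain gradient-descent update such as `w -= eta * nw` for each layer's weights and biases.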