πŸ“š Common Concepts

Common concepts of Artificial Neural Networks

Basic Concepts of ANN

🍭 Basic Neural Network

Convention: The NN in the image is called a 2-layer NN, since the input layer is not counted πŸ“’β—

πŸ“š Common Terms

| Term | Description |
| --- | --- |
| 🌚 Input Layer | The layer that contains the inputs to the NN |
| 🌜 Hidden Layer | The layer(s) where the computational operations are done |
| 🌝 Output Layer | The final layer of the NN; it is responsible for generating the predicted value $\hat{y}$ |
| 🧠 Neuron | A placeholder for a mathematical function; it applies a function to its inputs and provides an output |
| πŸ’₯ Activation Function | A function that converts a node's input signal to an output signal by applying some transformation |
| πŸ‘Ά Shallow NN | A NN with a small number of hidden layers (one or two) |
| πŸ’ͺ Deep NN | A NN with a large number of hidden layers |
| $n^{[l]}$ | The number of units in layer $l$ |

🧠 What does an artificial neuron do?

It calculates a weighted sum of its inputs, adds a bias, and then decides whether it should be fired or not according to an activation function

My detailed notes on activation functions are here πŸ‘©β€πŸ«
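The behavior above can be sketched in NumPy; the weights, bias, and input values below are made-up, and sigmoid is just one possible choice of activation:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes z into (0, 1)."""
    return 1 / (1 + np.exp(-z))

def neuron(x, w, b):
    """A single artificial neuron: weighted sum of inputs + bias, then activation."""
    z = np.dot(w, x) + b   # weighted sum plus bias
    return sigmoid(z)      # activation decides how strongly the neuron "fires"

# Example with 3 inputs and illustrative parameter values
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.1])
b = 0.2
print(neuron(x, w, b))  # β‰ˆ 0.6225
```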

πŸ‘©β€πŸ”§ Parameters Dimension Control

| Parameter | Dimension |
| --- | --- |
| $W^{[l]}$ | $(n^{[l]}, n^{[l-1]})$ |
| $b^{[l]}$ | $(n^{[l]}, 1)$ |
| $dW^{[l]}$ | $(n^{[l]}, n^{[l-1]})$ |
| $db^{[l]}$ | $(n^{[l]}, 1)$ |

Making sure that these dimensions are correct helps us write better, bug-free πŸ› code
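As a quick sketch, we can initialize parameters for a small network and assert the dimensions from the table above; the layer sizes here are made-up:

```python
import numpy as np

# Hypothetical layer sizes: n[0] input features, then units per layer
layer_dims = [3, 4, 1]

# Initialize W[l] and b[l] with the dimensions from the table
params = {}
for l in range(1, len(layer_dims)):
    params[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))

# Sanity check: W[l] is (n[l], n[l-1]) and b[l] is (n[l], 1)
for l in range(1, len(layer_dims)):
    assert params[f"W{l}"].shape == (layer_dims[l], layer_dims[l - 1])
    assert params[f"b{l}"].shape == (layer_dims[l], 1)
```

Catching a shape mismatch at initialization time is much cheaper than debugging it inside forward or back propagation.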

🎈 Summary of Forward Propagation Process

Input: $a^{[l-1]}$

Output: $a^{[l]}$, cache $(z^{[l]})$

πŸ‘©β€πŸ”§ Vectorized Equations

$$Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$$

$$A^{[l]} = g^{[l]}(Z^{[l]})$$
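The two vectorized equations above map directly to NumPy; this is a minimal sketch of one forward step, with made-up shapes and sigmoid standing in for $g^{[l]}$:

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward_step(A_prev, W, b):
    """One forward-propagation step for layer l.

    Input:  A_prev = A[l-1], shape (n[l-1], m)
    Output: A = A[l], plus Z[l] cached for back propagation.
    """
    Z = W @ A_prev + b   # Z[l] = W[l] A[l-1] + b[l]  (b broadcasts over the m columns)
    A = sigmoid(Z)       # A[l] = g[l](Z[l])
    return A, Z          # Z is the "cache"

# Example: n[l-1] = 3 features, m = 5 examples, n[l] = 4 units
A_prev = np.random.randn(3, 5)
W = np.random.randn(4, 3)
b = np.zeros((4, 1))
A, Z = forward_step(A_prev, W, b)
print(A.shape)  # (4, 5)
```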

🎈 Summary of Back Propagation Process

Input: $da^{[l]}$

Output: $da^{[l-1]}$, $dW^{[l]}$, $db^{[l]}$

πŸ‘©β€πŸ”§ Vectorized Equations

$$dZ^{[l]} = dA^{[l]} * {g^{[l]}}'(Z^{[l]})$$

$$dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$$

$$db^{[l]} = \frac{1}{m} \text{np.sum}(dZ^{[l]}, \text{axis}=1, \text{keepdims}=\text{True})$$

$$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$
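The four equations above can be sketched as one backward step in NumPy; shapes are made-up, and sigmoid is assumed as the layer's activation so that $g'(z) = g(z)(1 - g(z))$:

```python
import numpy as np

def backward_step(dA, Z, A_prev, W):
    """One back-propagation step for layer l (sigmoid activation assumed).

    Input:  dA = dA[l]; Z and A_prev come from the forward cache.
    Output: dA_prev = dA[l-1], dW[l], db[l].
    """
    m = A_prev.shape[1]
    g = 1 / (1 + np.exp(-Z))                        # sigmoid(Z[l])
    dZ = dA * g * (1 - g)                           # dZ[l] = dA[l] * g'(Z[l])
    dW = (1 / m) * dZ @ A_prev.T                    # dW[l] = (1/m) dZ[l] A[l-1]^T
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = W.T @ dZ                              # dA[l-1] = W[l]^T dZ[l]
    return dA_prev, dW, db

# Example: n[l-1] = 3, n[l] = 4, m = 5
A_prev = np.random.randn(3, 5)
W = np.random.randn(4, 3)
Z = np.random.randn(4, 5)
dA = np.random.randn(4, 5)
dA_prev, dW, db = backward_step(dA, Z, A_prev, W)
print(dW.shape, db.shape, dA_prev.shape)  # (4, 3) (4, 1) (3, 5)
```

Note how the output shapes match the dimension table: $dW^{[l]}$ is $(n^{[l]}, n^{[l-1]})$ and $db^{[l]}$ is $(n^{[l]}, 1)$.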

➰➰ To Put Forward Prop. and Back Prop. Together

πŸ˜΅πŸ€•

✨ Parameters vs Hyperparameters

πŸ‘©β€πŸ« Parameters

  • $W^{[1]}, W^{[2]}, W^{[3]}$

  • $b^{[1]}, b^{[2]}$

  • ......

πŸ‘©β€πŸ”§ Hyperparameters

  • Learning rate

  • Number of iterations

  • Number of hidden layers

  • Number of hidden units

  • Choice of activation function

  • ......

We can say that hyperparameters control the final values of the parameters πŸ€”