Activation Functions

The main purpose of Activation Functions is to convert the input signal of a node in an ANN into an output signal by applying a transformation. That output signal is then used as an input to the next layer in the stack.

πŸ“ƒ Types of Activation Functions

  • Linear Activation Function: inefficient, used in regression
  • Sigmoid Function: good for the output layer in binary classification problems
  • Tanh Function: better than sigmoid
  • Relu Function ✨: the default choice for hidden layers
  • Leaky Relu Function: a little bit better than Relu, but Relu is more popular

πŸ“ˆ Linear Activation Function (Identity Function)

Formula:

linear(x) = x

Graph:

It can be used in the output layer of a regression problem

🎩 Sigmoid Function

Formula:

sigmoid(x) = \frac{1}{1 + e^{-x}}

Graph:

🎩 Tanh Function (Hyperbolic Tangent)

Almost always strictly better than the sigmoid function for hidden units

Formula:

tanh(x) = \frac{2}{1 + e^{-2x}} - 1

It is a shifted and rescaled version of the Sigmoid function: tanh(x) = 2 \cdot sigmoid(2x) - 1 πŸ€”
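
A quick NumPy sanity check of that relationship (just an illustrative sketch, not part of the original notes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# tanh(x) should equal 2 * sigmoid(2x) - 1 everywhere
x = np.linspace(-5.0, 5.0, 11)
print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
```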

Graph:

Activation functions can be different for different layers; for example, we may use tanh for a hidden layer and sigmoid for the output layer

πŸ™„ Downsides of Tanh and Sigmoid

If z is very large or very small, then the derivative (or slope) of these functions becomes very small (it ends up close to 0), and this can slow down gradient descent 🐒
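
To see how small that slope gets, here is a minimal NumPy sketch that evaluates the sigmoid derivative, sigmoid'(z) = sigmoid(z)(1 - sigmoid(z)), at a few arbitrary values of z:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:5.1f}  ->  sigmoid'(z) = {sigmoid_derivative(z):.6f}")

# z =   0.0  ->  sigmoid'(z) = 0.250000   (largest possible slope)
# z =  10.0  ->  sigmoid'(z) = 0.000045   (the gradient has almost vanished)
```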

🎩 Rectified Linear Activation Unit (Relu ✨)

Another very popular choice

Formula:

relu(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}

Graph:

So the derivative is 1 when z is positive and 0 when z is negative

Disadvantage: derivative=0 when z is negative 😐

🎩 Leaky Relu

Formula:

leaky\_relu(x) = \begin{cases} 0.01x & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}

Graph:

Or, equivalently: leaky\_relu(x) = max(0.01x, x) πŸ˜›
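
Putting the formulas above together, here is a minimal NumPy sketch of these activation functions (vectorized, so they can be applied to a whole layer of units at once); the helper names are just for illustration:

```python
import numpy as np

def linear(x):
    return x                          # identity: linear(x) = x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                 # same as 2 / (1 + e^(-2x)) - 1, squashes into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # 0 for negative inputs, identity otherwise

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)   # small slope alpha instead of 0 for negative inputs

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z))        # relu zeroes out the negatives:      [0.  0.  0.  0.5  3.]
print(leaky_relu(z))  # leaky relu keeps a small slope:  [-0.03  -0.005  0.  0.5  3.]
```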

πŸŽ€ Advantages of Relu's

  • For a large part of the space of z, the derivative of the activation function is far from 0

  • So the NN will learn much faster than when using tanh or sigmoid

πŸ€” Why Do NNs Need Non-linear Activation Functions?

Well, if we use a linear activation function, then the NN just outputs a linear function of the input; so no matter how many layers our NN has πŸ™„, all it is doing is computing a linear function πŸ˜•

❗ Remember that the composition of two linear functions is itself a linear function
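
A tiny NumPy sketch of that point (the weight shapes are chosen only for illustration): two stacked layers with identity activations collapse into one equivalent linear layer, so the extra layer adds no expressive power.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" whose activation is the identity (linear) function
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both layers with a(z) = z
two_layer_out = W2 @ (W1 @ x + b1) + b2

# The exact same mapping as a single linear layer: W' = W2 W1, b' = W2 b1 + b2
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
one_layer_out = W_combined @ x + b_combined

print(np.allclose(two_layer_out, one_layer_out))  # True
```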

πŸ‘©β€πŸ« Rules For Choosing Activation Function

  • If the output is 0 or 1 (binary classification) ➑ sigmoid is good for the output layer

  • For all other units ➑ Relu ✨

We can say that Relu is the default choice of activation function for hidden layers

Note:

If you are not sure which of these functions works best 😡, try them all πŸ€•, evaluate them on a validation set, and go with the one that works best πŸ€“πŸ˜‡
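
If you want to try that in code, here is a hedged tf.keras sketch of the idea; the synthetic data, layer sizes, and number of epochs below are made-up placeholders rather than anything these notes prescribe:

```python
import numpy as np
import tensorflow as tf

# Synthetic binary-classification data, only so the sketch runs end to end;
# in practice use your own training / validation split
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 20)).astype("float32")
y = (x[:, 0] + x[:, 1] > 0).astype("float32")
x_train, y_train = x[:800], y[:800]
x_val, y_val = x[800:], y[800:]

def build_model(activation):
    # Small binary classifier; sigmoid on the output layer, as recommended above
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(20,)),
        tf.keras.layers.Dense(32, activation=activation),
        tf.keras.layers.Dense(32, activation=activation),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

results = {}
for act in ["relu", "tanh", "sigmoid"]:
    model = build_model(act)
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=10, verbose=0)
    results[act] = max(history.history["val_accuracy"])

print(results)  # go with whichever activation gives the best validation accuracy
```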

🧐 Read More

  • Which Activation Function Should I Use? (Siraj Raval ✨)
  • Activation Functions in Neural Networks
  • Understanding Activation Functions in Neural Networks