Recurrent Neural Networks

Details of recurrent neural networks


πŸ”„ Recurrent Neural Networks

πŸ”Ž Definition

A class of neural networks that allow outputs from previous time steps to be used as inputs to the following steps

Thanks to their hidden state, they remember information from earlier steps of the input sequence ✨

🧱 Architecture

πŸ”Ά The Whole RNN Architecture

🧩 An RNN Cell

Basic RNN cell. Takes as input $$x^{\langle t \rangle}$$ (current input) and $$a^{\langle t-1 \rangle}$$ (previous hidden state containing information from the past), and outputs $$a^{\langle t \rangle}$$, which is passed to the next RNN cell and also used to predict $$\hat{y}^{\langle t \rangle}$$.

⏩ Forward Propagation

To find $a^{\langle t \rangle}$ :

$$a^{\langle t \rangle} = g(W_{aa}a^{\langle t-1 \rangle} + W_{ax}x^{\langle t \rangle} + b_a)$$

To find $\hat{y}^{\langle t \rangle}$ :

$$\hat{y}^{\langle t \rangle} = g(W_{ya}a^{\langle t \rangle} + b_y)$$
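
A minimal NumPy sketch of these two equations for a single time step, assuming $g = \tanh$ for the hidden state and softmax for the output; the function name, parameter names, and dimensions below are illustrative, not a reference implementation.

```python
import numpy as np

def rnn_cell_forward(x_t, a_prev, Waa, Wax, Wya, ba, by):
    """One forward step of a basic RNN cell (x_t: (n_x, m), a_prev: (n_a, m))."""
    # a<t> = g(Waa a<t-1> + Wax x<t> + ba), with g = tanh
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    # y_hat<t> = softmax(Wya a<t> + by)
    z = Wya @ a_t + by
    y_hat_t = np.exp(z - z.max(axis=0, keepdims=True))
    y_hat_t /= y_hat_t.sum(axis=0, keepdims=True)
    return a_t, y_hat_t

# Toy usage: n_x = 3 input features, n_a = 5 hidden units, n_y = 2 outputs, m = 1 example
rng = np.random.default_rng(0)
n_x, n_a, n_y, m = 3, 5, 2, 1
params = dict(
    Waa=rng.normal(size=(n_a, n_a)), Wax=rng.normal(size=(n_a, n_x)),
    Wya=rng.normal(size=(n_y, n_a)), ba=np.zeros((n_a, 1)), by=np.zeros((n_y, 1)),
)
a_t, y_hat = rnn_cell_forward(rng.normal(size=(n_x, m)), np.zeros((n_a, m)), **params)
print(a_t.shape, y_hat.shape)  # (5, 1) (2, 1)
```

The same cell (with the same parameters) is applied at every time step, feeding each $a^{\langle t \rangle}$ into the next step.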

πŸ‘€ Visualization

βͺ Back Propagation

The loss function is defined as follows:

$$L^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle}) = -y^{\langle t \rangle}\log(\hat{y}^{\langle t \rangle}) - (1-y^{\langle t \rangle})\log(1-\hat{y}^{\langle t \rangle})$$

$$L(\hat{y},y)=\sum_{t=1}^{T_y}L^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle})$$
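
A small NumPy sketch of this loss, assuming a binary label and one predicted probability per time step; the function name and shapes are made up for illustration.

```python
import numpy as np

def sequence_loss(y_hat, y):
    """Sum of the per-time-step losses L<t> over a sequence.

    y_hat : predicted probabilities, shape (T_y,)
    y     : binary labels (0 or 1),  shape (T_y,)
    """
    eps = 1e-12  # guard against log(0)
    # L<t> = -y<t> log(y_hat<t>) - (1 - y<t>) log(1 - y_hat<t>)
    per_step = -y * np.log(y_hat + eps) - (1 - y) * np.log(1 - y_hat + eps)
    return per_step.sum()  # L = sum over t of L<t>

print(sequence_loss(np.array([0.9, 0.2, 0.7]), np.array([1.0, 0.0, 1.0])))
```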

🎨 Types of RNNs

  • 1️⃣ ➑ 1️⃣ One-to-One (Traditional ANN)

  • 1️⃣ ➑ πŸ”’ One-to-Many (Music Generation)

  • πŸ”’ ➑ 1️⃣ Many-to-One (Sentiment Analysis)

  • πŸ”’ ➑ πŸ”’ Many-to-Many $$T_x = T_y$$ (Speech Recognition)

  • πŸ”’ ➑ πŸ”’ Many-to-Many $$T_x \neq T_y$$ (Machine Translation)
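
As a rough illustration (not from the original notes), here is a minimal Keras sketch of how a many-to-one model and a many-to-many ($$T_x = T_y$$) model differ; the input shape and layer sizes are arbitrary assumptions.

```python
from tensorflow.keras import layers, models

T_x, n_features = 10, 8  # assumed sequence length and input size

# Many-to-one (e.g. sentiment analysis): only the final hidden state feeds the output
many_to_one = models.Sequential([
    layers.Input(shape=(T_x, n_features)),
    layers.SimpleRNN(32),                         # return_sequences=False -> one vector
    layers.Dense(1, activation="sigmoid"),
])

# Many-to-many with T_x = T_y: one prediction per time step
many_to_many = models.Sequential([
    layers.Input(shape=(T_x, n_features)),
    layers.SimpleRNN(32, return_sequences=True),  # one hidden state per step
    layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),
])

print(many_to_one.output_shape)   # (None, 1)
print(many_to_many.output_shape)  # (None, 10, 1)
```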

πŸ”₯ Advanced Recurrent Neural Networks

πŸ”„ Bidirectional RNNs (BRNN)

  • In many applications we want to output a prediction of $$y^{\langle t \rangle}$$ that may depend on the whole input sequence

  • Bidirectional RNNs combine an RNN that moves forward through time beginning from the start of the sequence with another RNN that moves backward through time beginning from the end of the sequence ✨

πŸ’¬ In Other Words

  • Bidirectional recurrent neural networks (BRNNs) are really just two independent RNNs put together.

  • The input sequence is fed in normal time order to one network, and in reverse time order to the other.

  • The outputs of the two networks are usually concatenated at each time step.

  • πŸŽ‰ This structure allows the networks to have both backward and forward information about the sequence at every time step.
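
A minimal sketch of this idea using Keras' Bidirectional wrapper (an assumed framework choice; the shapes and unit counts are made up): one RNN reads the sequence forward, the other backward, and their per-step outputs are concatenated.

```python
from tensorflow.keras import layers, models

T_x, n_features = 20, 16  # assumed shapes

brnn = models.Sequential([
    layers.Input(shape=(T_x, n_features)),
    # One RNN reads the sequence forward, a second reads it backward;
    # their outputs are concatenated at every time step (default merge_mode="concat").
    layers.Bidirectional(layers.SimpleRNN(32, return_sequences=True)),
    layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),
])

print(brnn.output_shape)  # (None, 20, 1); the Bidirectional layer itself outputs 2 * 32 features per step
```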

πŸ‘Ž Disadvantages

We need the entire input sequence before we can make a prediction anywhere.

e.g. it is not suitable for real-time speech recognition

πŸ‘€ Visualization

πŸ•Έ Deep RNNs

The computation in most RNNs can be decomposed into three blocks of parameters and associated transformations:

  1. From the input to the hidden state, $$x^{\langle t \rangle}$$ ➑ $$a^{\langle t \rangle}$$
  2. From the previous hidden state to the next hidden state, $$a^{\langle t-1 \rangle}$$ ➑ $$a^{\langle t \rangle}$$
  3. From the hidden state to the output, $$a^{\langle t \rangle}$$ ➑ $$y^{\langle t \rangle}$$

We can use multiple layers for each of the above transformations, which results in deep recurrent networks πŸ˜‹
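
A small Keras sketch of a stacked (deep) RNN; the number of layers and units is an arbitrary assumption. Every recurrent layer except the last must return its full sequence so the layer above receives one input per time step.

```python
from tensorflow.keras import layers, models

T_x, n_features = 30, 8  # assumed shapes

deep_rnn = models.Sequential([
    layers.Input(shape=(T_x, n_features)),
    layers.SimpleRNN(64, return_sequences=True),  # pass the whole sequence upward
    layers.SimpleRNN(64, return_sequences=True),
    layers.SimpleRNN(64),                         # final recurrent layer keeps only the last state
    layers.Dense(10, activation="softmax"),
])
deep_rnn.summary()
```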

πŸ‘€ Visualization

❌ Problem: Vanishing Gradients with RNNs

  • An RNN that processes a sequence of 10,000 time steps is effectively a 10,000-layer-deep network, which is very hard to optimize πŸ™„

  • Just as in deep neural networks, the deeper the network gets, the more it suffers from the vanishing gradient problem πŸ₯½

  • The same happens to RNNs processing long sequences πŸ›
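
A toy NumPy sketch (my own illustration, not from these notes) of the effect: backpropagating through many tanh cells multiplies many Jacobians whose norms are typically below 1, so the gradient norm collapses toward zero as it flows back in time.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a = 16
# Recurrent weight matrix with a modest, hypothetical initialization scale
Waa = rng.normal(scale=0.5 / np.sqrt(n_a), size=(n_a, n_a))

grad = np.ones(n_a)                     # stand-in for dL/da at the last time step
for t in range(1, 101):                 # backpropagate through 100 time steps
    a = np.tanh(rng.normal(size=n_a))   # some hidden activation a<t> (toy values)
    # d a<t> / d a<t-1> = diag(1 - a<t>^2) @ Waa, so the backward pass multiplies by its transpose
    grad = (Waa.T * (1 - a ** 2)) @ grad
    if t % 25 == 0:
        print(t, np.linalg.norm(grad))  # the norm typically shrinks toward zero
```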

πŸ§™β€β™€οΈ Solutions

🧐 Read More

a=g(Waaa+Waxx+ba)a^{}=g(W_{aa}a^{}+W_{ax}x^{}+b_a)a=g(Waa​a+Wax​x+ba​)

Read for my notes on Vanishing Gradients with RNNs πŸ€Έβ€β™€οΈ

  • Part-2
  • Recurrent Neural Networks Cheatsheet ✨
  • All About RNNs πŸš€