🎈Practical Aspects

📈 Data Normalization

Normalization is a part of data preparation.

  • If a feature is all positive or all negative, learning becomes harder for the nodes in the layer that follows: their weight updates have to zigzag, as happens for nodes that follow a sigmoid activation function.

  • If we transform the data so that its mean is close to zero, we make sure that it contains both positive and negative values.
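To see where the zigzag comes from, here is a short sketch for a single unit and a single training example (the symbols $z$, $w_j$, $b$ and $x_j$ are notation assumed here, not taken from the notes above):

$$z = \sum_j w_j x_j + b, \qquad \frac{\partial J}{\partial w_j} = \frac{\partial J}{\partial z}\, x_j$$

If every $x_j > 0$, each $\frac{\partial J}{\partial w_j}$ has the same sign as $\frac{\partial J}{\partial z}$, so for that example all the weights of the unit can only increase together or decrease together. Reaching a point where some weights must go up and others down then takes alternating steps, which is the zigzag.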

Formula:

$$\text{normalized} = \frac{x_{i}-\mu}{\sigma}$$

Benefit: It makes the cost function J easier and faster to optimize 😋
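The formula above can be sketched in plain Python as follows. This is only a minimal illustration: the helper names `fit_normalizer` and `normalize`, the `eps` guard, and the sample values are assumptions made for this example. It also assumes the common practice of computing μ and σ on the training data and reusing them for dev and test data.

```python
from statistics import fmean, pstdev

def fit_normalizer(column):
    """Return (mu, sigma) computed from the training values of one feature."""
    mu = fmean(column)
    sigma = pstdev(column)  # population standard deviation
    return mu, sigma

def normalize(column, mu, sigma, eps=1e-8):
    """Apply (x_i - mu) / sigma; eps (an assumption) guards against a constant feature."""
    return [(x - mu) / (sigma + eps) for x in column]

# Hypothetical all-positive feature, made up for illustration.
train_feature = [20.0, 30.0, 40.0, 50.0, 60.0]
mu, sigma = fit_normalizer(train_feature)
train_normalized = normalize(train_feature, mu, sigma)

# Reuse the training mu and sigma for dev/test data instead of recomputing them.
dev_normalized = normalize([25.0, 55.0], mu, sigma)
```

After this transformation the training feature has a mean of about zero and contains both negative and positive values, which is the property described above.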

🚩 Things to consider before implementing a NN

Number of layers, number of hidden units, learning rates, activation functions...

It is too difficult to get all of them right on the first try, so choosing them is an iterative process

Idea ➡ Code ➡ Experiment ➡ Idea 🔁

So the point here is how to go around this cycle efficiently 🤔

👷‍♀️ Train / Dev / Test Splitting

For a reliable evaluation, it is good practice to split the dataset as follows:

| Part | Description |
| --- | --- |
| Training Set | Used to fit the model |
| Development (Validation) Set | Used to provide an unbiased evaluation while tuning model hyperparameters |
| Test Set | Used to provide an unbiased evaluation of a final model |
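A minimal sketch of such a split, using only the Python standard library. The function name `train_dev_test_split`, the 60/20/20 proportions, and the fixed seed are assumptions for illustration; the notes do not prescribe specific ratios.

```python
import random

def train_dev_test_split(examples, dev_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and cut it into train, dev and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_dev = int(len(shuffled) * dev_fraction)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test

# Hypothetical dataset: 100 example ids.
examples = list(range(100))
train, dev, test = train_dev_test_split(examples)
```

A purely random split like this one does not guarantee that every class appears in each part; for imbalanced data a stratified split may be a better fit.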

🤓 Training Set

The actual dataset that we use to train the model (that is, to fit the weights and biases in the case of a neural network).

The model sees and learns from this data 👶

😐 Validation (Development) Set

The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.

The model sees this data, but never learns from it 👨‍🚀

🧐 Test Set

The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset. It provides the gold standard used to evaluate the model 🌟.

Implementation Note: The test set should contain carefully sampled data that spans the various classes the model would face when used in the real world 🚩🚩🚩❗❗❗

It is only used once a model is completely trained 👨‍🎓

😕 Bias / Variance

🕹 Bias

Bias is how far the predicted values are from the actual values. If the average prediction is far from the actual values, the bias is high.

Having high bias implies that the model is too simple and does not capture the complexity of the data, so it underfits the data 🤕

🕹 Variance

  • Variance is the variability of the model's prediction for a given data point, that is, how much the prediction changes when the model is trained on different samples of the data.

  • A model with high variance fails to generalize to data it hasn't seen before.

Having high variance implies that the algorithm models the random noise present in the training data, so it overfits the data 🤓

👀 Variance / Bias Visualization

↘ While implementing the model...

If we aren't able to get the performance we want, we should ask these questions to improve our model:

We check the effect of each of the following solutions on the dev set

  1. Do we have high bias? If yes, it is a training set performance problem, and we may:

    • Try a bigger network

    • Train longer

    • Try a better optimization algorithm

    • Try another NN architecture

We can say that it is a structural problem 🤔

  2. Do we have high variance? If yes, it is a dev set performance problem, and we may:

    • Get more data

    • Do regularization

      • L2, dropout, data augmentation

We can say that it may be a data or algorithmic problem 🤔
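As an illustration of the L2 option, here is a minimal sketch that adds a penalty on the squared weights to an already computed cost. The function names, the `lam / (2 * m)` scaling, and the sample numbers are assumptions chosen for this example (one common convention), not something the notes specify.

```python
def l2_penalty(layer_weights, lam, m):
    """Return (lam / (2 * m)) * sum of squared weights over all layers.

    layer_weights: list of weight matrices, each a list of rows of floats.
    lam: regularization strength (hypothetical value in the example below).
    m: number of training examples.
    """
    squared_sum = sum(w * w for matrix in layer_weights for row in matrix for w in row)
    return lam / (2 * m) * squared_sum

def regularized_cost(data_cost, layer_weights, lam, m):
    """Add the L2 penalty to the unregularized cost."""
    return data_cost + l2_penalty(layer_weights, lam, m)

# Hypothetical two-layer network weights (biases are left out of the penalty).
weights = [[[1.0, -2.0], [0.5, 0.0]], [[3.0, 1.0]]]
cost = regularized_cost(0.40, weights, lam=0.1, m=10)
```

Larger weights raise the cost, so the optimizer is pushed toward smaller weights, which tends to reduce overfitting.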

  3. No high variance and no high bias?

TADAAA it is done 🤗🎉🎊
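The questions above can be sketched as a small check on the training and dev error rates. The function name `diagnose`, the `target_error` baseline, and the `tolerance` threshold are all assumptions for illustration; what counts as "high" depends on the problem.

```python
def diagnose(train_error, dev_error, target_error=0.0, tolerance=0.02):
    """Return the problems suggested by the train/dev error rates.

    High bias: the training error is well above the target error.
    High variance: the dev error is well above the training error.
    """
    problems = []
    if train_error - target_error > tolerance:
        problems.append("high bias")
    if dev_error - train_error > tolerance:
        problems.append("high variance")
    return problems

underfit = diagnose(0.15, 0.16)  # ['high bias']
overfit = diagnose(0.01, 0.11)   # ['high variance']
both = diagnose(0.15, 0.30)      # ['high bias', 'high variance']
done = diagnose(0.01, 0.02)      # []
```

An empty list corresponds to the third case above: neither high bias nor high variance.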
