Notes on Structuring Machine Learning Projects
Make your training procedure more effective
When looking at precision P and recall R (for example), we may not be able to choose the best model reliably
So we construct a single evaluation metric that combines P and R
Now we can choose the best model according to our new metric
For example, a popular combined metric is the F1 score, the harmonic mean of P and R: F1 = 2PR / (P + R)
To summarize: we can construct our own single-number metrics, tailored to our models and what we value, so we can make the best choice (see the sketch below)
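A minimal sketch of this idea in Python; the model names and precision/recall values are made up for illustration, the point is that one combined number makes the ranking unambiguous:

```python
# Rank hypothetical models by a single combined metric (F1).
# The model names and precision/recall values below are made up for illustration.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

models = {
    "model_A": {"precision": 0.95, "recall": 0.90},
    "model_B": {"precision": 0.98, "recall": 0.85},
}

# With two separate numbers (P and R) the ranking can be ambiguous;
# with one combined number it is not.
best = max(models, key=lambda name: f1_score(**models[name]))
print(best, round(f1_score(**models[best]), 3))
```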
For better evaluation, we classify our metrics into two types, optimizing and satisficing (summarized in the table at the end of these notes)
Technically, if we have N metrics, we try to optimize 1 metric and to satisfice the other N-1 metrics
Clarification: we tune satisficing metrics against a threshold that we determine (e.g., running time ≤ 100 ms), as in the sketch below
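A minimal sketch of the "optimize 1, satisfice N-1" rule, assuming accuracy is the optimizing metric and running time is the satisficing one; the models, accuracies, and runtimes are hypothetical:

```python
# Maximize accuracy (optimizing metric) among models whose running time
# meets a threshold (satisficing metric). All values are hypothetical.

models = [
    {"name": "model_A", "accuracy": 0.90, "runtime_ms": 80},
    {"name": "model_B", "accuracy": 0.92, "runtime_ms": 95},
    {"name": "model_C", "accuracy": 0.95, "runtime_ms": 1500},  # best accuracy, but too slow
]

RUNTIME_THRESHOLD_MS = 100  # the threshold we determine for the satisficing metric

# Keep only the models that satisfy the threshold ...
acceptable = [m for m in models if m["runtime_ms"] <= RUNTIME_THRESHOLD_MS]

# ... then pick the one that is best on the optimizing metric.
best = max(acceptable, key=lambda m: m["accuracy"])
print(best["name"])  # -> model_B
```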
It is recommended to choose the dev and test sets from the same distribution, so we have to shuffle the data randomly and then split it.
As a result, both test and dev sets have data from all categories
We have to choose a dev set and test set - from the same distribution - to reflect the data we expect to get in the future and consider important to do well on
If we have a small dataset (m < 10,000)
60% training, 20% dev, 20% test will be good
If we have a huge dataset (1M for example)
98% training, 1% dev, 1% test will be acceptable
And so on; considering these two cases we can choose the correct ratio (see the sketch below)
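A minimal sketch of the shuffle-then-split procedure, with the ratio chosen by dataset size as above; `split_dataset` and its inputs are hypothetical names used only for illustration:

```python
# Shuffle, then split into train/dev/test with a size-dependent ratio,
# so all three splits come from the same distribution.
import random

def split_dataset(examples, seed=0):
    data = list(examples)
    random.Random(seed).shuffle(data)        # shuffle so every split sees all categories

    m = len(data)
    if m < 10_000:                           # small dataset: 60 / 20 / 20
        train_frac, dev_frac = 0.60, 0.20
    else:                                    # very large dataset: 98 / 1 / 1
        train_frac, dev_frac = 0.98, 0.01

    n_train = int(m * train_frac)
    n_dev = int(m * dev_frac)
    return data[:n_train], data[n_train:n_train + n_dev], data[n_train + n_dev:]

train, dev, test = split_dataset(range(50_000))
print(len(train), len(dev), len(test))       # 49000 500 500
```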
Guideline: if doing well on your metric and dev/test set does not correspond to doing well in the real-world application, we have to change our metric and/or our dev/test set
| Metric Type | Description |
| --- | --- |
| Optimizing Metric | A metric that has to be at its best value |
| Satisficing Metric | A metric that just has to be good enough |