👩‍🔧 Notes on Structuring Machine Learning Projects
Make your training procedure more effective
- While looking at precision P and recall R separately (for example), we may not be able to choose the best model correctly
- So we have to create a single evaluation metric that combines P and R
- Now we can choose the best model based on our new metric 🐣
- For example, a popular combined metric is the F1 score, the harmonic mean of P and R:
- F1 = 2PR / (P + R)
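As a quick sketch, here is how the single combined metric settles the comparison (the two classifiers and their P/R values below are made up for illustration):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical classifiers with (precision, recall) values:
classifiers = {"A": (0.95, 0.90), "B": (0.98, 0.85)}
for name, (p, r) in classifiers.items():
    print(f"Classifier {name}: F1 = {f1_score(p, r):.3f}")
# Classifier A: F1 = 0.924
# Classifier B: F1 = 0.910  -> A is the better choice on the combined metric
```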
To summarize: we can construct our own metrics, based on our models and what we value, to be able to make the best choice 👩🏫
For better evaluation, we should classify our metrics as follows:
| Metric Type | Description |
| --- | --- |
| ✨ Optimizing Metric | A metric that has to be at its best possible value |
| 🤗 Satisficing Metric | A metric that just has to be good enough |
Technically, if we have N metrics, we should try to optimize 1 metric and satisfice the other N-1 metrics 🙄🙌
Clarification: we tune satisficing metrics against a threshold that we determine.
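As a sketch of this selection rule, suppose accuracy is the optimizing metric and running time is the satisficing metric (all model names, numbers, and the 100 ms threshold are hypothetical):

```python
# Hypothetical candidates: (name, accuracy, running_time_ms)
candidates = [
    ("model_A", 0.90, 80),
    ("model_B", 0.92, 95),
    ("model_C", 0.95, 1500),  # most accurate, but far too slow
]

MAX_RUNTIME_MS = 100  # the threshold we determine for the satisficing metric

# Keep only the models that are "good enough" on the satisficing metric...
feasible = [m for m in candidates if m[2] <= MAX_RUNTIME_MS]
# ...then pick the best of those on the optimizing metric.
best = max(feasible, key=lambda m: m[1])
print(best)  # ('model_B', 0.92, 95)
```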
- It is recommended to choose the dev and test sets from the same distribution, so we have to shuffle the data randomly and then split it.
- As a result, both test and dev sets have data from all categories ✨
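A minimal sketch of shuffle-then-split, assuming X and y are NumPy arrays (the function name and default fractions are just an example):

```python
import numpy as np

def split_dataset(X, y, dev_frac=0.20, test_frac=0.20, seed=0):
    """Shuffle randomly, then split, so dev and test share one distribution."""
    m = len(X)
    idx = np.random.default_rng(seed).permutation(m)  # random shuffle of indices
    n_dev, n_test = int(m * dev_frac), int(m * test_frac)
    dev_idx = idx[:n_dev]
    test_idx = idx[n_dev:n_dev + n_test]
    train_idx = idx[n_dev + n_test:]
    return (X[train_idx], y[train_idx]), (X[dev_idx], y[dev_idx]), (X[test_idx], y[test_idx])
```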
We have to choose dev and test sets - from the same distribution - that reflect the data we expect to get in the future and consider important to do well on.
- If we have a small dataset (m < 10,000)
- 60% training, 20% dev, 20% test will be good
- If we have a huge dataset (1M for example)
- 98% training, 1% dev, 1% test will be acceptable

And so on: considering these two cases, we can choose the correct ratio 👮
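One way to encode this rule of thumb (the 10,000 cutoff and the exact fractions are assumptions, not fixed rules):

```python
def choose_split_fractions(m: int):
    """Return (train, dev, test) fractions based on dataset size m."""
    if m < 10_000:               # small dataset: classic 60/20/20 split
        return 0.60, 0.20, 0.20
    return 0.98, 0.01, 0.01      # huge dataset: 1% is already plenty of examples

print(choose_split_fractions(5_000))      # (0.6, 0.2, 0.2)
print(choose_split_fractions(1_000_000))  # (0.98, 0.01, 0.01)
```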
Guideline: if doing well on your metric and dev/test set does not correspond to doing well in the real-world application, change your metric and/or your dev/test set 🏳