Notes on Structuring Machine Learning Projects

How to effectively set up evaluation metrics?

  • When looking at precision P and recall R separately (for example), we may not be able to choose the best model, because one model can win on P while another wins on R

    • So we have to create a single evaluation metric that combines P and R

    • Now we can choose the best model using our new metric

    • For example, a popular combined metric is the F1 score, the harmonic mean of P and R:

      • $$F1 = \frac{2}{\frac{1}{P}+\frac{1}{R}}$$

To summarize: we can construct our own metrics, based on our models and their values, so that we can make the best choice
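As a minimal sketch of the idea above, the snippet below combines P and R into F1 and picks the model with the highest score. The model names and the precision/recall values are made up for illustration, and `f1_score` is a hypothetical helper, not part of any library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when either is 0."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    return 2.0 / (1.0 / precision + 1.0 / recall)


# Hypothetical (precision, recall) pairs for two classifiers.
# A has the better recall, B the better precision, so P and R alone disagree.
models = {
    "A": (0.95, 0.90),
    "B": (0.98, 0.85),
}

# A single number per model makes the comparison unambiguous.
scores = {name: f1_score(p, r) for name, (p, r) in models.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"model {name}: F1 = {score:.3f}")
print("best model:", best)  # A wins here: F1 is about 0.924 vs 0.910
```

With one number per model, ranking candidates becomes a simple `max` instead of a judgment call between two competing values.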

Types of Metrics

For better evaluation, we classify our metrics as follows:

| Metric Type | Description |
| --- | --- |
| Optimizing Metric | A metric that has to be at its best possible value |
| Satisficing Metric | A metric that just has to be good enough |

Technically, if we have N metrics, we try to optimize 1 metric and to satisfice the other N-1 metrics

Clarification: we judge satisficing metrics against a threshold that we determine
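The optimizing/satisficing split can be sketched as a filter followed by a maximization. In this illustrative example, accuracy is assumed to be the optimizing metric and latency the satisficing one; the `Candidate` type, the `pick_model` helper, the numbers, and the 100 ms threshold are all assumptions, not values from these notes.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    accuracy: float    # optimizing metric: higher is better
    latency_ms: float  # satisficing metric: only has to be under a threshold


def pick_model(candidates: List[Candidate], max_latency_ms: float) -> Optional[Candidate]:
    """Keep models that meet the satisficing threshold, then maximize accuracy."""
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not feasible:
        return None  # no model is "good enough" on the satisficing metric
    return max(feasible, key=lambda c: c.accuracy)


# Hypothetical candidates: C is the most accurate but far too slow.
candidates = [
    Candidate("A", accuracy=0.90, latency_ms=80.0),
    Candidate("B", accuracy=0.92, latency_ms=95.0),
    Candidate("C", accuracy=0.95, latency_ms=1500.0),
]

chosen = pick_model(candidates, max_latency_ms=100.0)
print(chosen.name if chosen else "no feasible model")  # prints "B"
```

Once a model passes the threshold, being even faster earns it nothing; only the optimizing metric decides among the feasible models.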

How to set up datasets to maximize efficiency

  • It is recommended to choose the dev and test sets from the same distribution, so we have to shuffle the data randomly and then split it.

  • As a result, both test and dev sets have data from all categories
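A minimal sketch of the shuffle-then-split step, using only the standard library. The held-out pool, its two categories, and the `shuffled_dev_test_split` helper are hypothetical; the point is only that shuffling before cutting spreads every category across both sets.

```python
import random


def shuffled_dev_test_split(examples, dev_fraction=0.5, seed=0):
    """Shuffle a held-out pool, then cut it into a dev set and a test set."""
    pool = list(examples)              # copy so the caller's data is untouched
    random.Random(seed).shuffle(pool)  # seeded for reproducibility
    n_dev = int(len(pool) * dev_fraction)
    return pool[:n_dev], pool[n_dev:]


# Hypothetical held-out data stored grouped by category.
# Splitting it in order would put one category in dev and the other in test.
held_out = [("category_1", i) for i in range(50)] + [("category_2", i) for i in range(50)]

dev, test = shuffled_dev_test_split(held_out)

print(len(dev), len(test))
print(sorted({category for category, _ in dev}))
print(sorted({category for category, _ in test}))
```

Without the shuffle, the same slicing would give a dev set drawn entirely from the first category and a test set drawn entirely from the second, which is exactly the mismatch the recommendation is meant to avoid.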

๐Ÿ‘ฉโ€๐Ÿซ Guideline

We have to choose a dev set and a test set, both from the same distribution, that reflect the data we expect to get in the future and consider important to do well on

How to choose the size of sets

  • If we have a small dataset (m < 10,000)

    • 60% training, 20% dev, 20% test will be good

  • If we have a huge dataset (1M for example)

    • 98% training, 1% dev, 1% test will be acceptable

      And so on; keeping these two cases in mind, we can choose a suitable ratio for datasets of other sizes
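To make the ratios concrete, the sketch below turns percentages into example counts. `split_sizes` is a hypothetical helper, and the two dataset sizes are illustrative; the split for the large case is taken as 98/1/1 so that the three parts sum to 100%.

```python
def split_sizes(m: int, train_pct: int, dev_pct: int):
    """Return (n_train, n_dev, n_test) for m examples, given integer percentages.

    The test set receives whatever remains, so the three sizes always sum to m.
    """
    if train_pct < 0 or dev_pct < 0 or train_pct + dev_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    n_train = m * train_pct // 100
    n_dev = m * dev_pct // 100
    n_test = m - n_train - n_dev
    return n_train, n_dev, n_test


# Small dataset: a 60/20/20 split.
print(split_sizes(5_000, 60, 20))      # (3000, 1000, 1000)

# Huge dataset: 1% is already 10,000 examples for each of dev and test.
print(split_sizes(1_000_000, 98, 1))   # (980000, 10000, 10000)
```

The absolute counts are what matter: 1% of a million examples gives a dev set as large as the entire small dataset in the first case.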

When to change dev/test sets and metrics

Guideline: if doing well on our metric and dev/test set does not correspond to doing well in the real-world application, we have to change our metric and/or dev/test set