Standard Evaluation Procedures

TunedIT provides evaluation procedures for the three most common metrics: accuracy, RMSE and MAE. They can be applied to challenges in which solutions take the form of lists of predicted values (as opposed to a working algorithm).

Accuracy

Accuracy is a standard metric for classification problems. It is equal to the ratio of the number of correctly classified samples to the total number of samples, i.e.:

(1)
\begin{align} Accuracy = \frac{\#(\text{Correctly classified samples})}{\#(\text{All samples})} \end{align}

A higher score is better: it means that more samples were classified correctly.

Class labels must be integers; this procedure does not currently work with symbolic labels.
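
The definition above can be sketched in a few lines of Python. This is only an illustration of the formula, not TunedIT's actual implementation; the function name `accuracy` and the use of in-memory lists are assumptions made for the example.

```python
def accuracy(predicted, target):
    """Fraction of samples whose predicted integer label equals the target label.

    Hypothetical helper illustrating equation (1); not part of TunedIT.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not target:
        raise ValueError("at least one sample is required")
    # Count the positions where the predicted label matches the target label.
    correct = sum(1 for p, t in zip(predicted, target) if p == t)
    return correct / len(target)


# Three of the four labels match, so the score is 3/4.
score = accuracy([1, 0, 2, 1], [1, 0, 1, 1])
print(score)  # 0.75
```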

RMSE (Root Mean Squared Error)

RMSE is one of the most common metrics for regression problems. It is defined as the square root of the mean squared error:

(2)
\begin{align} RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (p_i - t_i)^2 } \end{align}

where
$n$ is the total number of samples,
$p_i$ is the predicted value,
$t_i$ is the target value.

This metric calculates an error, so the smaller this value is, the better.
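
As a rough illustration of equation (2), the following Python sketch computes RMSE for two lists of numbers. The function name `rmse` is an assumption made for this example and does not refer to TunedIT's own code.

```python
import math


def rmse(predicted, target):
    """Square root of the mean squared difference between predictions and targets.

    Hypothetical helper illustrating equation (2); not part of TunedIT.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    n = len(target)
    if n == 0:
        raise ValueError("at least one sample is required")
    squared_errors = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared_errors / n)


# Every prediction is off by exactly 2, so the RMSE is 2.
error = rmse([3.0, 1.0, 5.0, 2.0], [1.0, 3.0, 3.0, 4.0])
print(error)  # 2.0
```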

MAE (Mean Absolute Error)

MAE is another popular metric for regression problems. It is defined as the mean of the absolute differences between predicted and target values:

(3)
\begin{align} MAE = \frac{1}{n}\sum_{i=1}^{n} |p_i - t_i| \end{align}

where
$n$ is the total number of samples,
$p_i$ is the predicted value,
$t_i$ is the target value.

As with RMSE, a smaller value of this metric denotes a better algorithm.
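
A minimal Python sketch of the mean absolute error is given below for illustration. The function name `mae` is an assumption made for this example, not a TunedIT identifier.

```python
def mae(predicted, target):
    """Mean of the absolute differences between predictions and targets.

    Hypothetical helper illustrating the MAE formula; not part of TunedIT.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    n = len(target)
    if n == 0:
        raise ValueError("at least one sample is required")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / n


# Absolute errors are 1, 0 and 2, so the mean is 3/3.
error = mae([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
print(error)  # 1.0
```

Unlike RMSE, which squares each difference before averaging, MAE weights all errors linearly, so a single large error influences it less.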

Required file format

Evaluation procedures work on two files: the test file provided by the challenge organizer and a participant's solution. For standard evaluation procedures, both of them should be text files containing lists of values, one per line.

The solution should simply list all predicted values. The format of the test files is slightly different because testing has two phases: preliminary and final. There are therefore two test sets, each containing a different subset of samples. If a sample was not chosen for a given test set, it should be marked with an empty line in that test file. Thus, a test file contains target values mixed with blank lines. If one of the test files contains a value at a given line, the other should contain an empty line at that position (and vice versa).
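
The layout described above can be sketched in Python as follows. The helper names `make_test_files` and `paired_values` are hypothetical, and the sketch assumes that the evaluation matches solution and test files line by line and ignores solution lines whose test line is blank; the actual TunedIT procedure may differ.

```python
def make_test_files(targets, preliminary_mask):
    """Build the contents of two complementary test files.

    Each target appears in exactly one file; the other file has an
    empty line at that position. (Hypothetical helper.)
    """
    pairs = list(zip(targets, preliminary_mask))
    preliminary = [str(t) if chosen else "" for t, chosen in pairs]
    final = ["" if chosen else str(t) for t, chosen in pairs]
    return "\n".join(preliminary) + "\n", "\n".join(final) + "\n"


def paired_values(solution_text, test_text):
    """Pair each predicted value with its target, skipping blank test lines.

    Assumes line i of the solution corresponds to line i of the test file.
    """
    solution = solution_text.splitlines()
    test = test_text.splitlines()
    if len(solution) != len(test):
        raise ValueError("solution and test file must have the same number of lines")
    return [(float(p), float(t)) for p, t in zip(solution, test) if t.strip()]


# Samples 1 and 4 go to the preliminary set, samples 2 and 3 to the final set.
preliminary_file, final_file = make_test_files(
    [1.0, 2.0, 3.0, 4.0], [True, False, False, True]
)
solution_file = "1.5\n2.5\n2.0\n4.0\n"

print(paired_values(solution_file, preliminary_file))  # [(1.5, 1.0), (4.0, 4.0)]
print(paired_values(solution_file, final_file))        # [(2.5, 2.0), (2.0, 3.0)]
```

The resulting pairs could then be passed to whichever metric the challenge uses.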

The simplest way to generate appropriate files is to use the Data Wizard.