What is TunedIT Research?


TunedIT Research is an integrated platform for the sharing, evaluation and comparison of machine learning (ML) and data mining (DM) algorithms. Its aim is to help researchers and users evaluate learning methods in reproducible experiments and to enable valid comparison of different algorithms. TunedIT also serves as a place where researchers may share their implementations and datasets with others.

Motivation

Designing a new machine-learning or data-mining algorithm is a challenging task. An algorithm cannot be claimed valuable unless its performance is verified experimentally on real-world datasets, and the experiments must be repeatable, so that other researchers can validate the results. Unfortunately, in ML/DM repeatability is very hard to achieve: reproducing someone else's experiments is a complex, time-consuming and error-prone task. In the end, when the final results turn out to differ from those expected, it is completely unclear whether the difference invalidates the claims of the experiment's original author, or should instead be attributed to:

  1. implementation bugs in the new experiment,
  2. mistakes in data preparation or experimental procedure,
  3. non-deterministic behaviour of the algorithm, producing different results in every run, or
  4. seemingly irrelevant differences between the original and new implementation, e.g., the use of slightly different data types.

Usually, it is not possible to resolve this issue. The problem lies in the nature of ML/DM algorithms…

In classical algorithmics, algorithms either work correctly or not at all; they cannot be "partially" correct. Correctness is a binary property: either the algorithm satisfies the required specification or it does not. And if not, it is always possible to find a single "counterexample" or "witness" - a particular combination of input values - which proves the incorrectness of the algorithm. For instance, if we implemented quicksort and the implementation contained a bug, we could notice for some particular input data that the output is incorrect, since the generated sequence would be improperly sorted.
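
To make this concrete, here is a minimal Python sketch - our illustration, not part of TunedIT - of a buggy quicksort together with a witness input that proves it incorrect:

    def quicksort(xs):
        if len(xs) <= 1:
            return xs
        pivot = xs[0]
        left = [x for x in xs if x < pivot]
        right = [x for x in xs if x > pivot]  # bug: duplicates of the pivot are dropped
        return quicksort(left) + [pivot] + quicksort(right)

    # A witness: any input containing duplicates exposes the bug.
    witness = [3, 1, 3, 2]
    print(quicksort(witness))  # [1, 2, 3]    - the second 3 has vanished
    print(sorted(witness))     # [1, 2, 3, 3] - the expected output

A single such input settles the matter: the implementation is incorrect, and the witness even hints where the bug may be hidden.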

In ML/DM, there is no such thing as an "incorrect algorithm". An algorithm can be "more" or "less" correct, but never "incorrect". We all accept as a basic axiom of ML/DM that algorithms may occasionally make mistakes - it is simply impossible to design an ML/DM algorithm that is always right when tested on real-world data. Thus, wrong answers for some input samples do not invalidate the whole algorithm. If we had a classifier trained to recognize hand-written digits and passed it an image of "7" but the answer was "1", we would not start looking for implementation bugs, but rather presume that the input pattern was vague or atypical.
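
The following sketch illustrates this point (it uses scikit-learn, which is our choice for the example, not a TunedIT component): a correctly implemented classifier that still misclassifies a few test digits.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Train a simple digit classifier on scikit-learn's bundled digits dataset.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

    # Accuracy is high but below 100%; the residual errors come from vague or
    # atypical digit images, not from implementation bugs.
    print(f"test accuracy: {clf.score(X_test, y_test):.3f}")

The handful of misclassified images are expected behaviour of the method, not witnesses of a bug.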

If the algorithm is always "correct" in this sense, we never get any indication of implementation bugs. There are no "witnesses" that would clearly prove incorrectness and point in the direction where bugs are hidden. Even if the experimenter suspects that something is wrong, there is no clue where to start the investigation. For these reasons, reimplementing and reproducing someone else's experiments is practically impossible: the researcher can never be sure whether the experiment was reproduced correctly, with all important details done in the same way as originally, without any implementation bugs or mistakes in the experimental procedure.

If experiments are not repeatable, verification of experimental results is impossible, so it becomes easier to design a new algorithm than to verify the results of an existing one. In consequence, there are thousands of competing algorithms for every type of ML/DM problem, but no general consensus on their actual quality or on the strengths and weaknesses of each. This makes the quest for better algorithms difficult, if not blind. A cogent illustration of these paradoxes can be found in "Empiricism Is Not a Matter of Faith" (Pedersen, 2008).

This is where TunedIT comes in. With its creation we want to give the ML/DM community tools that will help conduct reproducible research and obtain meaningful results, leading to the formulation of generally accepted conclusions.

We want to make experiments fully repeatable by automating them with TunedTester. This automation goes hand in hand with flexibility and extensibility, provided by TunedTester's plug-in architecture and its ability to handle entirely new evaluation procedures, designed for new types of tasks and algorithms.
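
Purely to give a flavour of the idea - this is a hypothetical sketch, not the actual TunedTester API, and the class names and algorithm interface are our assumptions - a pluggable, repeatable evaluation procedure might look like this:

    from abc import ABC, abstractmethod
    import random

    class EvaluationProcedure(ABC):
        """Hypothetical plug-in interface: given an algorithm and a dataset,
        produce a single numeric quality score."""

        @abstractmethod
        def evaluate(self, algorithm, dataset):
            ...

    class HoldoutAccuracy(EvaluationProcedure):
        """Example plug-in: a seeded train/test split with accuracy as the score."""

        def __init__(self, split=0.7, seed=0):
            self.split = split
            self.seed = seed

        def evaluate(self, algorithm, dataset):
            samples = list(dataset)                    # dataset: iterable of (x, y) pairs
            random.Random(self.seed).shuffle(samples)  # fixed seed -> identical split in every run
            cut = int(len(samples) * self.split)
            train, test = samples[:cut], samples[cut:]
            model = algorithm.fit(train)               # assumed algorithm interface
            hits = sum(model.predict(x) == y for x, y in test)
            return hits / len(test)

Because the split is derived from a fixed seed, anyone who runs the same procedure on the same algorithm and dataset obtains the same score - the kind of repeatability that automation with TunedTester aims for.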

We are creating a collaboration environment for researchers, in which a general consensus on the performance of different algorithms can arise. The central point of this environment is the Knowledge Base (KB), where all researchers can submit experimental results and together build a rich and comprehensive database of performance profiles of different algorithms. The results stored in the KB are repeatable and verifiable by everyone. The KB is coupled with a public Repository of ML/DM resources: algorithms, datasets, evaluation methods and others. The Repository secures the interpretability of results collected in the KB and fosters the exchange of data, implementations and ideas among researchers.

Finally, with the development of these tools, we want to facilitate the design of even more advanced and effective algorithms, able to solve numerous practical problems that are unsolvable today.

TunedIT builds upon previous efforts of the scientific community to facilitate experimentation and collaboration in ML&DM. In particular, it employs and extends the ideas that underlie:

ExpDB: Experiment Databases for Machine Learning,
MLOSS: Machine Learning Open Source Software,
DELVE: a software environment for the evaluation of learning algorithms in valid experiments.

TunedIT combines the strengths of these systems to deliver a comprehensive, extensible and easy-to-use platform for ML&DM research.

See next: Architecture