The TunedIT platform is composed of three complementary tools:
- TunedTester: a stand-alone application for automated evaluation of algorithms.
- Repository: a database of ML&DM resources. These include algorithms, datasets and evaluation procedures, which can be used by TunedTester to set up and execute experiments.
- Knowledge Base: a database of test results. At the user's request, TunedTester may send test results to TunedIT, where results submitted by different researchers are merged into a rich and comprehensive Knowledge Base that can be easily browsed for accurate and thorough information on specific algorithms or datasets.
TunedIT = Repository + TunedTester + Knowledge Base
Repository is a database of ML&DM-related files, called resources. It is located on the TunedIT server and is accessible to all registered users, who can view and download resources as well as upload new ones. The role of Repository in TunedIT is three-fold:
- It serves as a collection of algorithms, datasets and evaluation procedures that can be downloaded by TunedTester and used in tests.
- It provides space where users can share ML&DM resources with each other.
- It constitutes a context and point of reference for the interpretation of results generated by TunedTester and logged in Knowledge Base. For instance, when you are browsing KB and viewing results for a given test specification, you can easily navigate to the corresponding resources in Repository and check their contents, so as to validate research hypotheses or come up with new ones. Thus, Repository is not only a convenient tool that facilitates execution of tests and sharing of resources but, most of all, it secures the interpretability of results collected in Knowledge Base.
Repository has a structure similar to a local file system: it contains a hierarchy of folders, which in turn contain files, i.e. resources. Upon registration, every user is assigned a home folder in Repository's root folder, named the same as the user's login. The user has full access to his home folder, where he can upload and delete files, create subfolders and manage access rights for resources. All resources uploaded by users have unique names (access paths in Repository) and can be used in TunedTester in exactly the same way as preexisting resources.
Every file or folder in Repository is either public (the default) or private. All users can view and download public resources. Private files are visible only to their owner; to other users they appear as if they did not exist: they cannot be viewed or downloaded, and their results do not show up on Knowledge Base pages. Private folders cannot be viewed by other users, although the subfolders and files contained in them can, provided that they are public themselves. In other words, being private does not propagate from a folder to the files and subfolders contained inside.
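The non-propagating visibility rule can be sketched as a simple check: only the resource's own public/private flag decides its visibility to other users, regardless of enclosing folders. The class and method names below are illustrative, not part of the actual TunedIT codebase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RepositoryVisibility {
    /**
     * Decides whether a resource is visible to users other than its owner.
     * Per the rule described above, only the resource's own flag matters;
     * the privacy of enclosing folders does NOT propagate to their contents.
     */
    public static boolean visibleToOthers(Map<String, Boolean> isPublic, String path) {
        return isPublic.getOrDefault(path, true); // resources are public by default
    }

    public static void main(String[] args) {
        Map<String, Boolean> isPublic = new LinkedHashMap<>();
        isPublic.put("alice", false);             // private home folder
        isPublic.put("alice/secret.arff", false); // private file
        isPublic.put("alice/shared.arff", true);  // public file inside a private folder

        System.out.println(visibleToOthers(isPublic, "alice"));             // false
        System.out.println(visibleToOthers(isPublic, "alice/secret.arff")); // false
        System.out.println(visibleToOthers(isPublic, "alice/shared.arff")); // true
    }
}
```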
TunedTester (TT) is a Java application for automated evaluation of algorithms according to a test specification provided by the user. A single run of evaluation is called a test or experiment and corresponds to a triple of resources from Repository:
- Algorithm is the subject of evaluation.
- Dataset represents an instance of a data mining problem to be solved by the algorithm.
- Evaluation procedure is a Java class that implements all steps of the experiment and, at the end, calculates a quality measure.
The evaluation procedure is not hard-wired into TunedTester but is a part of the test configuration, just like the algorithm and dataset themselves. Every user can implement new evaluation procedures to handle new kinds of algorithms, data types, quality measures or data mining tasks. In this way, TunedTester provides not only full automation of experiments but also a high level of flexibility and extensibility.
TT runs locally on the user's computer. All resources necessary to set up a test are automatically downloaded from Repository. If requested, TT can submit test results to Knowledge Base, where they can be analysed later through KB's convenient web interface.
Resources for TunedTester
All TunedIT resources are either files, like UCI/hepatitis.arff, or Java classes contained in JAR files, like Weka/weka-3.6.1.jar:weka.classifiers.lazy.IB1.
Typically, datasets take the form of files, while evaluation procedures and algorithms take the form of Java classes. For datasets and algorithms, though, this is not a strict rule.
To be executable by TunedTester, an evaluation procedure must be a subclass of org.tunedit.core.EvaluationProcedure, located in the TunedIT/core.jar file in Repository. TunedIT/core.jar also contains the ResourceLoader and StandardLoader classes, which can be used by the evaluation procedure to communicate with the TunedTester environment and read the algorithm and dataset files. It is up to the evaluation procedure how the contents of these files are interpreted: as bytecode of Java classes, as a text file, as an ARFF, CSV or ZIP file etc. Thus, different evaluation procedures may expect different file formats, and not every evaluation procedure must be compatible with a given algorithm or dataset. This is natural, because incompatibility of file formats is usually just a reflection of a more inherent incompatibility of resource types. There are many different types of algorithms - for classification, regression, feature selection, clustering - and datasets - time series, images, graphs etc. - and each of them must be evaluated differently anyway. Nonetheless, an evaluation procedure may also support several different formats at the same time.
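To illustrate how an evaluation procedure might decide what a resource's raw bytes represent, here is a hedged, self-contained sketch. Real procedures obtain the bytes through ResourceLoader/StandardLoader from TunedIT/core.jar; the class name and sniffing heuristics below are illustrative assumptions, not TunedIT code.

```java
import java.nio.charset.StandardCharsets;

public class FormatSniffer {
    /**
     * Illustrative only: guesses how to interpret the raw bytes of a
     * downloaded resource, mirroring the choices mentioned in the text
     * (Java bytecode, ARFF text, ZIP archive).
     */
    public static String guessFormat(byte[] content) {
        if (content.length >= 2 && content[0] == 'P' && content[1] == 'K')
            return "zip";   // ZIP archives start with the bytes "PK"
        if (content.length >= 4 && (content[0] & 0xFF) == 0xCA
                                && (content[1] & 0xFF) == 0xFE)
            return "class"; // Java bytecode starts with 0xCAFEBABE
        String head = new String(content, StandardCharsets.US_ASCII).toLowerCase();
        if (head.contains("@relation"))
            return "arff";  // ARFF files declare @relation in their header
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(guessFormat("@relation iris".getBytes())); // arff
        System.out.println(guessFormat(new byte[]{'P', 'K', 3, 4}));  // zip
    }
}
```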
Data file formats and algorithm APIs that are most commonly used in TunedIT and are supported by standard evaluation procedures include:
- ARFF file format for data representation. Introduced by Weka, it has become one of the most popular formats in the ML community.
- Debellor's API defined by org.debellor.core.Cell class for implementation of algorithms.
- Weka's API defined by weka.classifiers.Classifier class.
- Rseslib's API defined by rseslib.processing.classification.Classifier interface.
It is also possible for a dataset to be represented by a Java class that exposes methods returning data samples on request. This is a way to overcome the problem of custom file formats: if a given dataset is stored in an atypical file format, one can put it into a JAR file as a Java resource and prepare a wrapper class that reads the data and returns samples in a common representation, for example as instances of Debellor's Sample class. This wrapper approach was used to give access to the MNIST database of hand-written digits, which is originally stored in a custom binary representation. See some results of classification accuracy on MNIST 10K collected in KB.
A test specification is a formal description, for TunedTester, of how the test should be set up. It is a combination of the names of three TunedIT resources: the evaluation procedure, the algorithm and the dataset that will be employed in the test:
Test specification = Evaluation procedure + Algorithm + Dataset
A TunedIT resource name is the full access path to the resource in Repository, as it appears on the Repository page, without the leading slash "/". For example, the name of the file containing the Iris data and located in the UCI folder is: UCI/iris.arff.
Java classes contained in JARs are also treated as resources, although they do not show up on Repository pages. The TunedIT name of a Java class is composed of the containing JAR's name, followed by a colon ":" and the full (package-qualified) name of the class. For instance, the ClassificationTT70 class, contained in TunedIT/base/ClassificationTT70.jar in the org.tunedit.base package, has the following name:
TunedIT/base/ClassificationTT70.jar:org.tunedit.base.ClassificationTT70
Note that resource names are case-sensitive.
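The naming rule above can be expressed as a one-line composition. This helper is a hypothetical sketch for illustration, not part of the TunedIT API.

```java
public class ResourceName {
    /**
     * Builds the TunedIT name of a Java class packed in a JAR:
     * the JAR's access path in Repository, a colon, then the
     * fully qualified (package-prefixed) class name.
     */
    public static String forClass(String jarPath, String className) {
        return jarPath + ":" + className;
    }

    public static void main(String[] args) {
        // The example from the text: ClassificationTT70 in package
        // org.tunedit.base, packed in TunedIT/base/ClassificationTT70.jar.
        System.out.println(forClass("TunedIT/base/ClassificationTT70.jar",
                                    "org.tunedit.base.ClassificationTT70"));
        // → TunedIT/base/ClassificationTT70.jar:org.tunedit.base.ClassificationTT70
    }
}
```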
Many algorithms expose parameters that can be set by the user to control and modify their behavior. Currently, a test specification does not include parameter values, so the algorithm is expected to apply its defaults. A user who wants to test an algorithm with non-default parameters should write a wrapper class which internally invokes the algorithm with the desired parameter values. The values must be hard-wired in the wrapper class, so that the wrapper itself does not expose any parameters.
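The wrapper pattern can be sketched as follows. BaseAlgorithm is a hypothetical stand-in for a real parameterized algorithm (e.g. one following Weka's API); only the idea of hard-wiring a non-default parameter value matters here.

```java
// Hypothetical stand-in for a real algorithm with a tunable parameter k.
class BaseAlgorithm {
    private final int k;                       // default value is 1
    public BaseAlgorithm() { this(1); }
    public BaseAlgorithm(int k) { this.k = k; }
    public int getK() { return k; }
}

/**
 * Parameterless wrapper: it fixes k = 5 internally and exposes no
 * parameters of its own, so it can be referenced in a test specification
 * under its own resource name and will always run with k = 5.
 */
public class BaseAlgorithmK5 extends BaseAlgorithm {
    public BaseAlgorithmK5() { super(5); }     // non-default value, hard-wired
}
```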
Users of TunedTester may safely execute tests of any algorithms present in Repository, even if the code cannot be fully trusted. TunedTester exploits advanced features of the Java Security Architecture to ensure that the code executed during tests does not perform any harmful operations, like deleting files on disk or connecting through the network. Code downloaded from Repository executes in a sandbox which blocks its ability to interact with the system environment. This is achieved through the use of a dedicated Java class loader and custom security policies. Similar mechanisms are used in web browsers to protect the system from potentially malicious applets found on websites.
Communication between TunedTester and the TunedIT server is efficient thanks to a cache directory which keeps local copies of resources from Repository. When a resource is needed for the first time and must be downloaded from the server, a copy is saved in the cache. In subsequent tests, when the resource is needed again, the cached copy is used instead, so each resource is downloaded from Repository only once. TunedTester detects when a resource has been updated in Repository and downloads the newest version in that case. Any changes introduced to the local copies of resources are also detected, so it is not possible to run a test with corrupted or intentionally faked resources.
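The caching scheme can be sketched as a version-keyed store: a resource is downloaded once and re-fetched only when its version in Repository changes. This is a minimal illustrative model, not the actual TunedTester implementation (which additionally verifies the integrity of local copies).

```java
import java.util.HashMap;
import java.util.Map;

public class ResourceCache {
    private final Map<String, String> versions = new HashMap<>();
    private final Map<String, byte[]> contents = new HashMap<>();
    public int downloads = 0; // counts actual downloads, for illustration

    /** Returns the resource, downloading it only if the cached copy is
     *  missing or belongs to an older version than the one on the server. */
    public byte[] fetch(String name, String serverVersion) {
        if (!serverVersion.equals(versions.get(name))) {
            byte[] data = downloadFromRepository(name);
            versions.put(name, serverVersion);
            contents.put(name, data);
            downloads++;
        }
        return contents.get(name);
    }

    private byte[] downloadFromRepository(String name) {
        return name.getBytes(); // placeholder for a real download from the server
    }

    public static void main(String[] args) {
        ResourceCache cache = new ResourceCache();
        cache.fetch("UCI/iris.arff", "v1");
        cache.fetch("UCI/iris.arff", "v1"); // served from the cache
        System.out.println(cache.downloads); // 1
        cache.fetch("UCI/iris.arff", "v2"); // new version -> re-download
        System.out.println(cache.downloads); // 2
    }
}
```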
TunedTester may be started in a special challenge mode, used to evaluate solutions submitted to a challenge. In this mode, TT repeatedly queries TunedIT for new submissions, then downloads and evaluates them. It runs as a background process and does not require user interaction. Challenge mode is activated by passing the challenge name as an argument to the command-line option -c when starting TT. The user must be the organizer of the challenge and must provide his TunedIT username and password with the -u and -p options to authenticate himself. If authentication fails, TT will have no access to the challenge resources.
In challenge mode, GUI is not available and TT reports its current operations to the console.
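An invocation might look as follows. This is a hypothetical sketch: only the -c, -u and -p options come from the text above, while the JAR file name and the argument values are assumptions.

```shell
# Hypothetical invocation - JAR name and argument values are illustrative:
java -jar TunedTester.jar -c MyChallenge -u organizer_login -p secret_password
```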
It is possible to run more than one instance of TT for a given challenge in parallel. This is particularly useful when evaluation of a single solution is time-consuming, e.g., lasts more than an hour. With parallel execution, the queue of pending tests becomes shorter.
Different instances of TT running in parallel are independent of each other. The organizer may start a new instance or stop a given one at any time. Job scheduling is coordinated by TunedIT server. The instances may run on the same machine or on different ones. When running several instances on a single machine, take into account that sharing of hardware resources (CPU time, memory limit) may lead to variable evaluation conditions for different tests.
Knowledge Base (KB) is a database of test results generated by TunedTester. It is located on TunedIT server.
To guarantee that the results collected in KB are always consistent with the contents of Repository, and that Repository can indeed serve as a context for interpretation of results, KB is automatically cleaned of all outdated results related to the old version of a resource whenever a new version is uploaded. Thus, there is no way for results in KB to become inconsistent with the contents of Repository.
Aggregated vs atomic results
An atomic result is the result of a single test executed by TunedTester. It is possible to execute many tests of the same specification and log all their results in KB, so KB may contain many atomic results corresponding to the same specification. Note that these results will usually differ from one another, because most tests include nondeterministic factors. For instance, the ClassificationTT70 and RegressionTT70 evaluation procedures split data randomly into training and test parts, which yields different splits in every trial and usually different test outcomes. Algorithms may also employ randomness; for example, neural networks randomly initialize their weights at the beginning of learning.
An aggregated result is the aggregation (arithmetic mean, standard deviation etc.) of all atomic results in KB related to a given test specification. There can be only one aggregated result per specification. Aggregated results are the ones presented on the Knowledge Base page; currently, users of TunedIT do not have direct access to atomic results.
If tests of a given specification are fully deterministic, they always produce the same outcome, so the aggregated result (mean) equals every atomic result and the standard deviation is zero. The presence of nondeterminism in tests is highly desirable, as it yields broader knowledge about the tested algorithm (a non-zero deviation measures how reliably and repeatably the algorithm behaves) and a more reliable estimate of its expected quality (the mean of multiple differing atomic results); the deviation of atomic results is a very good confidence measure for the aggregated result.
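The aggregation of atomic results into a mean and standard deviation can be sketched as below. The class and method names are illustrative; the text does not specify whether KB uses the population or sample deviation, so the population form is assumed here.

```java
public class Aggregate {
    /** Arithmetic mean of all atomic results for one test specification. */
    public static double mean(double[] results) {
        double sum = 0;
        for (double r : results) sum += r;
        return sum / results.length;
    }

    /** Population standard deviation of the atomic results; zero when
     *  all tests of the specification are deterministic. */
    public static double stdDev(double[] results) {
        double m = mean(results), sum = 0;
        for (double r : results) sum += (r - m) * (r - m);
        return Math.sqrt(sum / results.length);
    }

    public static void main(String[] args) {
        // Three atomic results (e.g. classification accuracies) of the same
        // specification, differing due to random train/test splits:
        double[] atomic = {0.90, 0.94, 0.92};
        System.out.println(mean(atomic));   // ≈ 0.92
        System.out.println(stdDev(atomic)); // ≈ 0.0163
    }
}
```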
Security issues. Validity of results
The user may assume that results generated by others and collected in KB are valid, in the sense that if the user ran the same tests himself, he would obtain the same expected results. In other words, results in KB can be trusted even if their authors - unknown users of TunedIT - cannot. This is possible thanks to numerous security measures built into Repository, TunedTester and KB, which ensure that KB contents can be polluted neither by accidental mistakes nor by intentional fakery of any user.