Examples


In the following examples we assume that a user 'John_Smith' is registered in TunedIT and that his password is 'pass'.

Example 1 - run test with TunedTester

The screenshot below shows how to evaluate the J48 algorithm (decision tree induction) from Weka with TunedTester. Here we use TunedTester's default evaluation procedure, ClassificationTT70. The test will be repeated 5 times on each of two data sets, audiology.arff and iris.arff from UCI. Remember that in TunedTester you must give the full names of algorithms, evaluation procedures and datasets, including their paths in the Repository, as well as full package names for Java classes.

tunedtester_02.png
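
If you prefer the command line, a similar test can be launched there as well. The invocation below is only a sketch for one of the two data sets: the -u, -p, -d and -a options mirror the examples later on this page, while the Repository path of the Weka jar (Weka/weka.jar) is hypothetical and may differ; the evaluation procedure is left at its default, ClassificationTT70.

./tunedtester.sh -g -s -u John_Smith -p pass -d UCI/iris.arff 
    -a Weka/weka.jar:weka.classifiers.trees.J48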

Example 2 - a classification algorithm suitable for ClassificationTT70 evaluation procedure

TunedTester's default evaluation procedure is ClassificationTT70: samples are randomly shuffled and then split in the ratio 70/30 into a train and a test set (for iris.arff, with 150 samples, this gives 105 training and 45 test samples). It was designed to support three kinds of classifier interfaces - those defined in the Debellor, Rseslib and Weka libraries. We will show how to write an algorithm which can then be evaluated by the ClassificationTT70 procedure.

Debellor interface

Debellor is an open source, extensible data mining framework which provides a common architecture for data processing algorithms of various types. The algorithms can be combined to build data processing networks of considerable complexity. The unique feature of Debellor is data streaming, which enables efficient processing of large volumes of data and is essential for the scalability of the algorithms. See www.debellor.org for more details.

We will proceed step by step, but if you run out of patience, a slightly modified version of the example below is available and ready for evaluation in the repository - see MajorityClassifier_debellor.jar in the Examples folder.

To be able to compile our Debellor-based classifier we will need a copy of the library.

Let's quote some important fragments of Debellor's Cell class:

package org.debellor.core;
 
(...)
 
/**
 * Guidelines for writing new cells
 * 
 * To implement new data processing algorithm, you have to write a subclass of Cell and override some or all of protected 
 * methods named "on...": onLearn(), onOpen(), onNext(), onClose(), onErase(). They are called during calls to similarly 
 * named public methods (learn, open, ...). If you do not need some method, leave its default implementation, which 
 * will throw exception when called. 
 * 
 * If your cell represents a decision system (classifier, clusterer etc.), the most important methods will be onLearn() 
 * and onNext(). Training algorithm of the decision system will be implemented in onLearn(), while onNext() will perform 
 * application of the trained system to the next input sample. You will also have to override onOpen() and onClose() 
 * to open and close input stream before and after calls to onNext(). Optionally, you may also override onErase() to erase 
 * trained decision model without deallocation of the whole cell.
 */
 
(...)
 
public class Cell {
 
        (...)   
 
        /** 
         * Learning procedure of the cell. For example, may train the internal decision model; read and buffer input data; 
         * calculate an evaluation measure of another cell; calculate data-driven parameters of a preprocessing algorithm
         * (e.g. attribute means for normalization algorithm) etc.
         * 
         * Must be overridden in all subclasses that implement trainable cells. If your cell is not trainable, you must 
         * provide this information to the Cell base class by calling Cell(boolean) instead of Cell() in your constructor.
         */
        protected void onLearn() throws Exception (...)
 
        /** 
         * Called by erase(). Must be overridden in subclasses if erasure is to be used. 
         */
        protected void onErase() throws Exception (...)
 
        /** 
         * Called by open(). Must be overridden in subclasses if open is to be used. 
         */
        protected MetaSample onOpen() throws Exception (...)
 
        /** 
         * Called by Stream.next(). Performs the actual generation of the next output sample. Must be overridden in the 
         * subclass if next is to be used, i.e. if the subclass should generate some output data.
         */
        protected Sample onNext() throws Exception (...)
 
        /** 
         * Called by Stream.close(). Performs the actual closing of the communication session. Must be overridden 
         * in subclasses if close is to be used. Usually the overrider will use onClose to release resources,
         * to let them be garbage-collected.
         */
        protected void onClose() throws Exception (...)
 
        (...)   
 
}
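
Before we write our own cell, here is a rough sketch of the lifecycle from the caller's perspective, based purely on the javadoc quoted above. We assume that the public learn() and open() methods mirror the protected hooks and that open() returns a Stream; how an input cell is attached is not shown, as that part of the API is beyond this quote.

Cell classifier = new MajorityClassifier();   // our cell, defined below
// ... attach a training data source to the classifier (not shown) ...
classifier.learn();                   // internally calls onLearn()
Stream output = classifier.open();    // internally calls onOpen()
Sample s;
while ((s = output.next()) != null) { // each next() calls onNext()
        // use s, e.g. read its decision
}
output.close();                       // internally calls onClose()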

The source code of our simple classifier:

import java.util.*;
import org.debellor.core.*;
import org.debellor.core.data.SymbolicFeature;
 
/**
 * Example implementation of a majority classifier in Debellor architecture.
 * The classifier always assigns the same decision - the most frequent one in the training data.
 */
public class MajorityClassifier extends Cell {
 
        private DataType decisionType;
        private SymbolicFeature decision;
        private Stream input;
 
        public MajorityClassifier() {
                super(true);    // yes, this cell is trainable
        }
 
        protected void onLearn() throws Exception {
                // Open stream of training samples. Check if data type is correct
                Stream input = openInputStream();
                decisionType = input.sampleType.decision;
                if(decisionType.dataClass != SymbolicFeature.class)
                        throw new Exception("MajorityClassifier can handle only symbolic decisions");
 
                Map<String, Integer> counts = new HashMap<String,Integer>();
 
                // Scan all training samples and count occurrences of different decisions.
                Sample s;
                while ((s = input.next()) != null) {
                        if (s.decision == null) 
                                continue;
                        SymbolicFeature symb = s.decision.asSymbolicFeature();
                        Integer count = counts.get(symb.value);
                        if (count == null) 
                                count = 0;
                        counts.put(symb.value, count + 1);
                }
                input.close();
 
                // Find decision with the biggest count.
                int bestCount = 0;
                String bestDecision = null;
                for (Map.Entry<String, Integer> stats : counts.entrySet()) {
                        if (stats.getValue() > bestCount) {
                                bestDecision = stats.getKey();
                                bestCount = stats.getValue();
                        }
                }
                decision = new SymbolicFeature(bestDecision, decisionType);
        }
 
        protected Sample.SampleType onOpen() throws Exception {
                input = openInputStream();
                return input.sampleType.setDecision(decisionType);
        }
 
        protected Sample onNext() throws Exception {
                Sample s = input.next();
                if(s == null) return null;
                return s.setDecision(decision);
        }
 
        protected void onClose() throws Exception {
                input.close();
        }
 
        protected void onErase() throws Exception {
                decisionType = null;
                decision = null;
        }
 
}

Copy the above code and save it as a MajorityClassifier.java file. Place it in the same directory as the previously downloaded debellor<version>.jar file. Then compile the classifier and pack it into a jar archive:

javac -cp debellor<version>.jar MajorityClassifier.java
jar cf DebellorMajorityClassifier.jar MajorityClassifier.class
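
To double-check that the class actually landed in the archive you may list its contents with the standard JDK jar tool:

jar tf DebellorMajorityClassifier.jar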

If everything went well you should have a DebellorMajorityClassifier.jar file in the current directory.

Now you can upload the classifier to the repository, e.g. into the John_Smith/Classifiers folder:

debellor_classifier_upload_jar.png

Evaluate its accuracy on some data sets using the TunedTester GUI, referring to our classifier (if you followed our example) as John_Smith/Classifiers/DebellorMajorityClassifier.jar:MajorityClassifier:

debellor_classifier_evaluation_01.png
debellor_classifier_evaluation_02.png

Or using the command line:

./tunedtester.sh -g -s -u John_Smith -p pass -d UCI/iris.arff 
    -a John_Smith/Classifiers/DebellorMajorityClassifier.jar:MajorityClassifier

And finally you can inspect its results in the context of other algorithms' results:

debellor_classifier_results.png

Rseslib interface

As previously, the prepared source code may be downloaded from the repository's Examples folder.

To be able to compile an Rseslib-based classifier we will need a copy of the library from the repository.

Let's quote Rseslib's Classifier interface:

package rseslib.processing.classification;
 
(...)
 
public interface Classifier {
 
    /**
     * Assigns a decision to a single test object.
     */
    public abstract double classify(DoubleData dObj) throws PropertyConfigurationException;
 
    /**
     * Calculates statistics.
     */
    public abstract void calculateStatistics();
 
    /**
     * Resets statistics.
     */
    public abstract void resetStatistics();
 
}

We will implement only the constructor and the classify method:

import java.util.Properties;
import rseslib.processing.classification.Classifier;
import rseslib.structure.data.DoubleData;
import rseslib.structure.table.DoubleDataTable;
import rseslib.system.*;
import rseslib.system.progress.Progress;
 
/**
 * Example implementation of a majority classifier (using the Rseslib architecture) which always assigns
 * the same decision - the most frequent one in the training set.
 */
 
public class MajorityClassifier extends ConfigurationWithStatistics implements Classifier {
 
        private double decision;
 
        public MajorityClassifier(Properties prop, DoubleDataTable trainTable, Progress prog) 
                        throws PropertyConfigurationException, InterruptedException {
                super(prop);
                prog.set("Training the majority classifier", 1);
                int[] decDistr = trainTable.getDecisionDistribution();
                int bestDecision = 0;
                for (int dec = 1; dec < decDistr.length; dec++)
                        if (decDistr[dec] > decDistr[bestDecision])
                                bestDecision = dec;
                decision = trainTable.attributes().nominalDecisionAttribute().globalValueCode(bestDecision);
                prog.step();
        }
 
        public double classify(DoubleData dObj) {
                return decision;
        }
 
        /**
         * Leaving it empty.
         */
        public void calculateStatistics() {}
 
        /**
         * Leaving it empty.
         */
        public void resetStatistics() {}
 
}

Copy the above code and save it as a MajorityClassifier.java file. Place it in the same directory as the previously downloaded rseslib<version>.jar file. Then compile the classifier and pack it into a jar archive:

javac -cp rseslib<version>.jar MajorityClassifier.java
jar cf RseslibMajorityClassifier.jar MajorityClassifier.class

If everything went well you should have an RseslibMajorityClassifier.jar file in the current directory.

As previously, upload the classifier's jar file into the repository's John_Smith/Classifiers folder and evaluate its accuracy on some data sets using either the TunedTester GUI or the command line, referring to the classifier as John_Smith/Classifiers/RseslibMajorityClassifier.jar:MajorityClassifier:

./tunedtester.sh -g -s -u John_Smith -p pass -d UCI/iris.arff 
    -a John_Smith/Classifiers/RseslibMajorityClassifier.jar:MajorityClassifier

Weka interface

The prepared source code may be downloaded from the repository's Examples folder.

To be able to compile a Weka-based classifier we will need a copy of the library from the repository.

Let's quote some important fragments from Weka's Classifier class:

package weka.classifiers;
 
(...)
 
public abstract class Classifier (...) {
 
        (...)
 
        /**
         * Generates a classifier. Must initialize all fields of the classifier that are not being set via options 
         * (ie. multiple calls of buildClassifier must always lead to the same result).
         * Must not change the dataset in any way.
         */
        public abstract void buildClassifier(Instances data) throws Exception;
 
        /**
         * Classifies the given test instance. The instance has to belong to a dataset when it's being classified. 
         * Note that a classifier MUST implement either this or distributionForInstance().
         */
        public double classifyInstance(Instance instance) throws Exception (...)
 
        /**
         * Predicts the class memberships for a given instance. If an instance is unclassified, the returned array 
         * elements must be all zero. If the class is numeric, the array must consist of only one element,
         * which contains the predicted value. Note that a classifier MUST implement either this or classifyInstance().
         */
        public double[] distributionForInstance(Instance instance) throws Exception (...)
 
        (...)
 
}

We will implement the buildClassifier and classifyInstance methods:

import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;
 
/**
 * Example implementation of a majority classifier (using the Weka architecture) which always assigns
 * the same decision - the most frequent one in the training set.
 */
public class MajorityClassifier extends Classifier {
 
        private double decision;
 
        public void buildClassifier(Instances instances) throws Exception {
                int decAttributeIndex = instances.classAttribute().index();
                int[] valuesCounts = instances.attributeStats(decAttributeIndex).nominalCounts;
                // Find the index with the highest count.
                int bestCountIndex = 0;
                for (int decisionIndex = 1; decisionIndex < valuesCounts.length; decisionIndex++)
                        if (valuesCounts[decisionIndex] > valuesCounts[bestCountIndex]) 
                                bestCountIndex = decisionIndex;
                decision = bestCountIndex;
        }
 
        public double classifyInstance(Instance instance) throws Exception {
                return decision;
        }
 
}

Copy the above code and save it as a MajorityClassifier.java file. Place it in the same directory as the previously downloaded weka<version>.jar file. Then compile the classifier and pack it into a jar archive:

javac -cp weka<version>.jar MajorityClassifier.java
jar cf WekaMajorityClassifier.jar MajorityClassifier.class

If everything went well you should have a WekaMajorityClassifier.jar file in the current directory.

As previously, upload the classifier's jar file into the repository's John_Smith/Classifiers folder and evaluate its accuracy on some data sets using either the TunedTester GUI or the command line, referring to the classifier as John_Smith/Classifiers/WekaMajorityClassifier.jar:MajorityClassifier:

./tunedtester.sh -g -s -u John_Smith -p pass -d UCI/iris.arff 
    -a John_Smith/Classifiers/WekaMajorityClassifier.jar:MajorityClassifier
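
As the Weka javadoc quoted earlier states, a classifier MUST implement either classifyInstance() or distributionForInstance(). For completeness, here is a minimal sketch of the distribution-based variant of our majority classifier; it assumes the decision field computed in buildClassifier() and is not part of the jar built above.

        /**
         * Sketch only: an alternative to classifyInstance() that returns a full
         * class distribution with all probability mass on the majority class.
         */
        public double[] distributionForInstance(Instance instance) throws Exception {
                double[] dist = new double[instance.numClasses()];
                dist[(int) decision] = 1.0;
                return dist;
        }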

Example 3 - writing an evaluation procedure

Will appear soon…
