February 16, 2017

Using the runnable jars

KeLP adopts a simple and intuitive JSON-based formalism for serializing and deserializing objects such as kernels and algorithms (more details in this page).
JSON (JavaScript Object Notation) is a de facto standard for exchanging data on the Web: it is easy for humans to read, and it composes naturally to represent object hierarchies.

This makes it possible to build stand-alone JAR files (runnable from the command line) that read the description of a learning algorithm and/or kernel function and operate on a dataset without requiring any new Java code. You are thus free to specify and parameterize your learning method through an expressive JSON parameter file.

KeLP runnable jars are meant to provide an easy way to, for example, train a classifier and use it to make predictions. Two jars are released: kelp-learn-x.x.x.jar trains a classifier and saves a KeLP model (more details about models are available here), while kelp-classify-x.x.x.jar makes predictions with an already trained classifier (i.e., a model):

  • kelp-learn-x.x.x.jar takes as input a training dataset in KeLP format, a learning algorithm specification in JSON, and the path where the model will be saved;
  • kelp-classify-x.x.x.jar takes as input a dataset, the model path, and the path where the predictions will be saved.
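The two invocations described in the list above can be assembled from a script. The following is a minimal sketch in Python, not part of KeLP: the helper names (learn_command, classify_command, run) are hypothetical, and the only assumption about the jars is that they take their arguments positionally in the order listed above.

```python
import subprocess


def learn_command(version, train_path, spec_path, model_path):
    """Argument list for kelp-learn: training dataset, JSON specification, model path."""
    return ["java", "-jar", f"kelp-learn-{version}.jar",
            train_path, spec_path, model_path]


def classify_command(version, test_path, model_path, predictions_path):
    """Argument list for kelp-classify: dataset, model path, predictions path."""
    return ["java", "-jar", f"kelp-classify-{version}.jar",
            test_path, model_path, predictions_path]


def run(command):
    """Run a command; raises CalledProcessError if java exits with a non-zero status."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the command here; call run(...) once the jar and data files are in place.
    print(" ".join(learn_command("x.x.x", "train.txt", "spec.json", "model")))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when a path contains spaces.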

The two jars, kelp-learn-x.x.x.jar and kelp-classify-x.x.x.jar, include all the functionalities of svm-light-tk and provide many more kernel functions and learning algorithms.

Use the following links to download the latest version of kelp-learn and kelp-classify.

To try out the KeLP runnable jars, download the training dataset, the testing dataset and the algorithm specification.

You can inspect the learning algorithm specification with any text editor.

In the JSON specification, you can see that the adopted algorithm is a BinaryCSVM with a linear kernel that operates on the first representation (index 0). The C parameter is set to 1.0 (cp and cn), and a One-Vs-All approach is adopted to handle multiple classes.

Training phase: java -jar kelp-learn-2.1.0.jar iris_train.txt learning_algorithm_specification.json model

After the training stage, you can inspect the model simply by opening it with a text editor. KeLP writes the model in JSON format as well, so it is easily readable. In this case, a list of support vectors can be recognized in the model file.

Testing phase: java -jar kelp-classify-2.1.0.jar iris_test.txt model predictions.txt

The output of kelp-classify should be Accuracy on test set: 0.9375.
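As a reminder of what this figure measures, accuracy is the fraction of test examples whose predicted label matches the gold label. The sketch below is a generic illustration, not KeLP code. The size of the test set is not stated here, so the count of 16 examples with a single error is purely an assumption chosen because it yields the same value.

```python
def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Illustrative data only: 16 examples, one of which is misclassified.
    gold = ["iris-setosa"] * 16
    predicted = ["iris-setosa"] * 15 + ["iris-versicolor"]
    print(accuracy(predicted, gold))  # 15 / 16 = 0.9375
```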

The predictions.txt file contains the predictions made by KeLP. This example is a multiclass problem with three classes. Each row of predictions.txt refers to an example of the iris_test.txt file (in the same order) and reports the score for each label; for example, the first row should be similar to:

iris-setosa:1.5446491 iris-versicolor:-1.8121223 iris-virginica:-9.402145
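A row in this format can be turned into a predicted class with a few lines of Python. This is a minimal sketch, not part of KeLP: the function names are hypothetical, and it assumes that each row consists of whitespace-separated label:score pairs and that the predicted class is the label with the highest score, as is usual for a One-Vs-All scheme.

```python
def parse_prediction_row(row):
    """Parse 'label:score label:score ...' into a dict mapping label to float score."""
    scores = {}
    for token in row.split():
        # Split on the last colon so that a label containing ':' would still parse.
        label, _, value = token.rpartition(":")
        scores[label] = float(value)
    return scores


def best_label(scores):
    """Return the label with the highest score (assumed One-Vs-All decision rule)."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    row = "iris-setosa:1.5446491 iris-versicolor:-1.8121223 iris-virginica:-9.402145"
    print(best_label(parse_prediction_row(row)))  # iris-setosa
```

Applying best_label to every row of predictions.txt gives one predicted class per test example, in the same order as iris_test.txt.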