February 17, 2017

What is KeLP?

KeLP is a Java Kernel-based Learning Platform that provides implementations of kernel-based learning algorithms, as well as kernel functions over generic data representations, e.g., vectorial data or discrete structures such as graphs, trees, and sequences.
The framework has been designed to decouple data structures, kernel functions, and learning algorithms in order to maximize the reuse of existing functionality: for example, a new kernel can be plugged into any existing algorithm, and a new algorithm can use any existing kernel.
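
The decoupling described above can be illustrated with a minimal sketch. All names here (`Kernel`, `LinearKernel`, `TokenOverlapKernel`, `NearestExample`) are hypothetical and chosen for illustration only; they are not KeLP's actual classes. The point is only that a component written against a kernel interface works unchanged with any data representation for which a kernel exists.

```java
import java.util.List;

// Hypothetical names for illustration only; these are NOT KeLP's actual classes.
interface Kernel<T> {
    // Returns a similarity score between two instances of type T.
    double apply(T a, T b);
}

// A kernel over dense vectors: the plain dot product.
final class LinearKernel implements Kernel<double[]> {
    @Override
    public double apply(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}

// A kernel over token sequences: counts the pairs of identical tokens,
// which equals the dot product of the two bag-of-words count vectors.
final class TokenOverlapKernel implements Kernel<List<String>> {
    @Override
    public double apply(List<String> a, List<String> b) {
        double count = 0.0;
        for (String x : a) {
            for (String y : b) {
                if (x.equals(y)) {
                    count++;
                }
            }
        }
        return count;
    }
}

// A toy kernel-based component that depends only on the Kernel interface,
// so it can be reused with vectors, sequences, or any other representation.
final class NearestExample<T> {
    private final Kernel<T> kernel;

    NearestExample(Kernel<T> kernel) {
        this.kernel = kernel;
    }

    // Returns the index of the candidate with the highest kernel value
    // with respect to the query, or -1 if there are no candidates.
    int mostSimilar(T query, List<T> candidates) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < candidates.size(); i++) {
            double score = kernel.apply(query, candidates.get(i));
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

In this sketch, `new NearestExample<>(new LinearKernel())` and `new NearestExample<>(new TokenOverlapKernel())` share the same selection logic: adding a kernel for a new data structure requires no change to the component that uses it.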

KeLP can effectively tackle a wide variety of learning problems, including (multi-class, multi-label) classification, regression and clustering. KeLP supports XML and JSON serialization of kernel functions and algorithms, enabling the agile definition of kernel-based learning systems without writing additional lines of code.

KeLP is written entirely in Java, which was chosen because it is widely used in enterprise development. Moreover, many NLP/IR tools are Java-based, such as Stanford CoreNLP, OpenNLP, and Lucene. KeLP can therefore be easily integrated into Java-based projects.

Why Kernel-based Learning?

In Machine Learning (ML), instances are often represented as vectors in feature spaces that have been defined beforehand: most existing ML platforms (e.g., Weka, LibSVM, scikit-learn) assume that instances have already been transformed into vectors. Defining a feature space often requires a complex feature engineering phase. For example, in Natural Language Processing, syntactic information is crucial in many tasks, e.g., Semantic Role Labeling (Carreras and Màrquez, 2005). Understanding which syntactic patterns should be captured is non-trivial, and the resulting feature vector model is usually a poor approximation.
A more natural approach is to operate directly on the parse trees of sentences. Kernel methods provide an efficient and effective solution: they allow data to be represented at a more abstract level, while the kernel computation still takes the informative properties of the data into account. For instance, Tree Kernels take two syntactic parse trees as input and compute a similarity measure based on the sub-structures they share.
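
As a rough illustration of how a tree kernel can count shared sub-structures, the sketch below implements a simplified recursion in the spirit of subset tree kernels, without the decay factor often used in practice. It is not KeLP's implementation; the class names `ParseNode` and `FragmentTreeKernel` are hypothetical. Two nodes match when they have the same label and the same sequence of child labels, and the number of fragments rooted at a matching pair is the product, over their children, of one plus the count for the corresponding child pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parse tree node; not part of KeLP.
final class ParseNode {
    final String label;
    final List<ParseNode> children;

    ParseNode(String label, ParseNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    boolean isLeaf() {
        return children.isEmpty();
    }

    // Returns all nodes of the subtree rooted here, in pre-order.
    List<ParseNode> nodes() {
        List<ParseNode> out = new ArrayList<>();
        collect(this, out);
        return out;
    }

    private static void collect(ParseNode node, List<ParseNode> out) {
        out.add(node);
        for (ParseNode child : node.children) {
            collect(child, out);
        }
    }
}

// Simplified tree kernel: sums, over all node pairs, the number of
// tree fragments rooted at both nodes.
final class FragmentTreeKernel {
    static double apply(ParseNode t1, ParseNode t2) {
        double total = 0.0;
        for (ParseNode a : t1.nodes()) {
            for (ParseNode b : t2.nodes()) {
                total += delta(a, b);
            }
        }
        return total;
    }

    // Number of common fragments rooted at both a and b.
    // Leaves (words) do not root any fragment on their own.
    private static double delta(ParseNode a, ParseNode b) {
        if (a.isLeaf() || b.isLeaf() || !sameProduction(a, b)) {
            return 0.0;
        }
        double product = 1.0;
        for (int i = 0; i < a.children.size(); i++) {
            product *= 1.0 + delta(a.children.get(i), b.children.get(i));
        }
        return product;
    }

    // True when both nodes have the same label and the same child labels in order.
    private static boolean sameProduction(ParseNode a, ParseNode b) {
        if (!a.label.equals(b.label) || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!a.children.get(i).label.equals(b.children.get(i).label)) {
                return false;
            }
        }
        return true;
    }
}
```

For the toy trees (S (NP (D the) (N dog)) (VP (V barks))) and (S (NP (D the) (N cat)) (VP (V barks))), this sketch returns 15: the two trees share every fragment that does not include the word under N. This version compares all node pairs and recomputes overlapping sub-problems, so it is meant to show the idea rather than to be efficient.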


References

Xavier Carreras and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005).