February 24, 2017

Data Manipulation

KeLP is a general-purpose machine learning platform and does not cover feature extraction. However, it provides some simple preprocessing facilities for manipulating the input data. Specific operations on data can be defined by implementing the Manipulator interface. Instances of an implementing class can then be passed to the manipulate method of the Dataset class, which applies the operations to the whole dataset.
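The pattern described above can be sketched in plain Java. This is a minimal illustration, not KeLP code: SketchManipulator, SketchDataset and ClipNegativesManipulator are hypothetical stand-ins that represent each example as a double[] array, and the real KeLP signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a manipulator: one operation applied to one example.
// This is NOT the KeLP interface; it only mirrors the pattern described in the text.
interface SketchManipulator {
    void manipulate(double[] example);
}

// Hypothetical stand-in for a dataset holding dense examples.
class SketchDataset {
    private final List<double[]> examples = new ArrayList<>();

    void add(double[] example) {
        examples.add(example);
    }

    List<double[]> getExamples() {
        return examples;
    }

    // Applies every manipulator, in the given order, to every example.
    void manipulate(SketchManipulator... manipulators) {
        for (SketchManipulator m : manipulators) {
            for (double[] example : examples) {
                m.manipulate(example);
            }
        }
    }
}

// Example operation (invented for illustration): clips negative values to zero in place.
class ClipNegativesManipulator implements SketchManipulator {
    @Override
    public void manipulate(double[] example) {
        for (int i = 0; i < example.length; i++) {
            if (example[i] < 0) {
                example[i] = 0.0;
            }
        }
    }
}

class ManipulatorSketchDemo {
    // Builds a two-example dataset and applies the clipping manipulator to it.
    static SketchDataset run() {
        SketchDataset dataset = new SketchDataset();
        dataset.add(new double[] {1.0, -2.0});
        dataset.add(new double[] {-0.5, 3.0});
        dataset.manipulate(new ClipNegativesManipulator());
        return dataset;
    }

    public static void main(String[] args) {
        for (double[] e : run().getExamples()) {
            System.out.println(java.util.Arrays.toString(e));
        }
    }
}
```

The point of the design is that the dataset knows nothing about the operation itself: any new preprocessing step only requires a new class implementing the interface.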

Some implementations of the Manipulator interface are:

  • NormalizationManipolator: it scales each vector representation so that it becomes a unit vector in its explicit feature space. This is useful when the orientation of the feature vectors is meaningful but their magnitude is not;
  • StandardizationManipulator: it standardizes the feature values of a vectorial representation. Let x_i be the value of the i-th feature, whose mean and standard deviation are \mu_i and \sigma_i respectively. Then the standardized value is \hat{x}_i = (x_i - \mu_i)/\sigma_i. This operation is useful for mapping all the features to a similar range;
  • VectorConcatenationManipulator: it concatenates vectors into a new SparseVector representation. It is useful when a linear approach must be applied to multiple vectorial representations;
  • PairSimilarityExtractor: it analyzes an ExamplePair, extracting similarity scores between the left and right examples of the pair. The extracted scores are stored in a DenseVector that is added to the set of representations of the ExamplePair;
  • TreePairRelTagger: given an ExamplePair whose left and right examples contain TreeRepresentations, it performs the REL tagging described in (Filice et al., 2015).
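The arithmetic behind the first three manipulators in the list can be illustrated on plain arrays. The sketch below does not use KeLP: VectorOpsSketch is a hypothetical helper class working on dense double[] vectors, whereas KeLP operates on its own representation classes. It assumes Euclidean (L2) normalization, the population standard deviation (dividing by the number of examples), a non-empty dataset, and that a constant feature is mapped to 0 to avoid dividing by zero. KeLP's own choices on these points may differ.

```java
// Hypothetical helper, not part of KeLP: dense-array versions of the operations above.
class VectorOpsSketch {

    // Scales x to unit Euclidean length; a zero vector is returned as all zeros (assumption).
    static double[] normalize(double[] x) {
        double sumOfSquares = 0.0;
        for (double v : x) {
            sumOfSquares += v * v;
        }
        double norm = Math.sqrt(sumOfSquares);
        double[] out = new double[x.length];
        if (norm == 0.0) {
            return out;
        }
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] / norm;
        }
        return out;
    }

    // Standardizes each feature (column) as (x_i - mu_i) / sigma_i, where mu_i and
    // sigma_i are computed over all rows. Assumes data has at least one row.
    static double[][] standardize(double[][] data) {
        int n = data.length;
        int d = data[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            double mean = 0.0;
            for (double[] row : data) {
                mean += row[j];
            }
            mean /= n;
            double variance = 0.0;
            for (double[] row : data) {
                variance += (row[j] - mean) * (row[j] - mean);
            }
            double sigma = Math.sqrt(variance / n); // population standard deviation
            for (int i = 0; i < n; i++) {
                // A constant feature has sigma == 0; it is mapped to 0 here (assumption).
                out[i][j] = (sigma == 0.0) ? 0.0 : (data[i][j] - mean) / sigma;
            }
        }
        return out;
    }

    // Concatenates two dense vectors: the features of a followed by the features of b.
    static double[] concatenate(double[] a, double[] b) {
        double[] out = new double[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }
}
```

For instance, normalizing (3, 4) divides both components by the norm 5, and standardizing the feature values 1 and 3 (mean 2, standard deviation 1) gives -1 and 1. Unlike normalization, standardization needs statistics gathered over the whole dataset before any single example can be transformed.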

 

References

Simone Filice, Giovanni Da San Martino and Alessandro Moschitti. Relational Information for Learning from Structured Text Pairs. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015.