{"id":1025,"date":"2017-07-20T09:53:33","date_gmt":"2017-07-20T09:53:33","guid":{"rendered":"http:\/\/www.kelp-ml.org\/?page_id=1025"},"modified":"2017-08-31T16:25:16","modified_gmt":"2017-08-31T16:25:16","slug":"generating-input-data","status":"publish","type":"page","link":"http:\/\/www.kelp-ml.org\/?page_id=1025","title":{"rendered":"Generating Input Data"},"content":{"rendered":"<p>To generate the input data for KeLP we developed a specific project: <a href=\"https:\/\/github.com\/SAG-KeLP\/kelp-input-generator\">kelp-input-generator<\/a>.<\/p>\n<p>This project relies on third-party software components, such as the <a href=\"https:\/\/nlp.stanford.edu\/software\/lex-parser.shtml\">Stanford Parser<\/a>, and provides the functionality to extract KeLP data structures from text snippets. Being a general-purpose machine learning platform, KeLP is not limited to Natural Language Processing tasks. However, for the moment, we do not provide any feature extraction capability for other fields.<\/p>\n<p>To keep the main KeLP project lightweight,\u00a0<a href=\"https:\/\/github.com\/SAG-KeLP\/kelp-input-generator\">kelp-input-generator<\/a>\u00a0is not included in <a href=\"https:\/\/github.com\/SAG-KeLP\/kelp-full\">kelp-full<\/a>. 
If you want to use the <a href=\"https:\/\/github.com\/SAG-KeLP\/kelp-input-generator\">kelp-input-generator<\/a> functionality in your Maven project, you can include it by adding the following Maven repositories:<\/p>\n<pre class=\"lang:xhtml decode:true\" title=\"repository\">\t\t\r\n\r\n\t<repositories>\r\n\t\t<repository>\r\n\t\t\t<id>kelp_repo_snap<\/id>\r\n\t\t\t<name>KeLP Snapshots Repository<\/name>\r\n\t\t\t<releases>\r\n\t\t\t\t<enabled>false<\/enabled>\r\n\t\t\t\t<updatePolicy>always<\/updatePolicy>\r\n\t\t\t\t<checksumPolicy>warn<\/checksumPolicy>\r\n\t\t\t<\/releases>\r\n\t\t\t<snapshots>\r\n\t\t\t\t<enabled>true<\/enabled>\r\n\t\t\t\t<updatePolicy>always<\/updatePolicy>\r\n\t\t\t\t<checksumPolicy>fail<\/checksumPolicy>\r\n\t\t\t<\/snapshots>\r\n\t\t\t<url>http:\/\/sag.art.uniroma2.it:8081\/artifactory\/kelp-snapshot\/<\/url>\r\n\t\t<\/repository>\r\n\t\t<repository>\r\n\t\t\t<id>kelp_repo_release<\/id>\r\n\t\t\t<name>KeLP Stable Repository<\/name>\r\n\t\t\t<releases>\r\n\t\t\t\t<enabled>true<\/enabled>\r\n\t\t\t\t<updatePolicy>always<\/updatePolicy>\r\n\t\t\t\t<checksumPolicy>warn<\/checksumPolicy>\r\n\t\t\t<\/releases>\r\n\t\t\t<snapshots>\r\n\t\t\t\t<enabled>false<\/enabled>\r\n\t\t\t\t<updatePolicy>always<\/updatePolicy>\r\n\t\t\t\t<checksumPolicy>fail<\/checksumPolicy>\r\n\t\t\t<\/snapshots>\r\n\t\t\t<url>http:\/\/sag.art.uniroma2.it:8081\/artifactory\/kelp-release\/<\/url>\r\n\t\t<\/repository>\r\n<\/repositories>\t\t\t\r\n\t\t\r\n\t\t\r\n<\/pre>\n<p>Then, the <a href=\"http:\/\/maven.apache.org\/\">Maven<\/a> dependency for the kelp-input-generator project is:<\/p>\n<pre class=\"lang:xhtml decode:true\" title=\"repository0\">\t\t\r\n\r\n<dependencies>\r\n\t\t<dependency>\r\n\t\t\t<groupId>it.uniroma2.sag.kelp<\/groupId>\r\n\t\t\t<artifactId>kelp-input-generator<\/artifactId>\r\n\t\t\t<version>1.0.1-SNAPSHOT<\/version>\r\n\t\t<\/dependency>\r\n<\/dependencies>\r\n\r\n<\/pre>\n<p>Currently, <a 
href=\"https:\/\/github.com\/SAG-KeLP\/kelp-input-generator\">kelp-input-generator<\/a> makes it easy to generate <a href=\"http:\/\/www.kelp-ml.org\/kelp-javadoc\/current-version\/it\/uniroma2\/sag\/kelp\/data\/representation\/tree\/TreeRepresentation.html\">TreeRepresentation<\/a>s from text snippets. In particular, it can extract the LOCT, LCT and GRCT representations, which are tree views of a dependency graph, as introduced in (Croce et al., 2011).<\/p>\n<div>KeLP uses its own format for representing graph data. However, a converter from the popular gSpan format is available in <a href=\"https:\/\/github.com\/SAG-KeLP\/kelp-input-generator\">kelp-input-generator<\/a>: it.uniroma2.sag.kelp.input.graph.GspanFormatConverter. The main method of the class can be invoked by passing the gSpan file as a parameter (and, optionally, a file with the target labels if they are available and not included in the gSpan file).<\/div>\n<div>If your input graphs are in a format supported by <a href=\"http:\/\/openbabel.org\/\" target=\"_blank\" rel=\"noopener\" data-saferedirecturl=\"https:\/\/www.google.com\/url?hl=en&amp;q=http:\/\/openbabel.org&amp;source=gmail&amp;ust=1501079599676000&amp;usg=AFQjCNHCcb6b9XHf6ZQCNFCRXXnUweFiFg\">Open Babel<\/a>, the following <a href=\"https:\/\/github.com\/axot\/GLP\/blob\/master\/tools\/sdf2gsp.py\" target=\"_blank\" rel=\"noopener\">script<\/a>\u00a0converts graphs from one of the Open Babel formats to gSpan. 
Therefore, all 111 Open Babel formats are indirectly supported as well.<\/div>\n<p>In the future we plan to extend <a href=\"https:\/\/github.com\/SAG-KeLP\/kelp-input-generator\">kelp-input-generator<\/a> with support for extracting shallow and constituency tree representations, as well as <a href=\"http:\/\/www.kelp-ml.org\/kelp-javadoc\/current-version\/it\/uniroma2\/sag\/kelp\/data\/representation\/sequence\/SequenceRepresentation.html\">SequenceRepresentation<\/a>s and <a href=\"http:\/\/www.kelp-ml.org\/kelp-javadoc\/current-version\/it\/uniroma2\/sag\/kelp\/data\/representation\/graph\/DirectedGraphRepresentation.html\">DirectedGraphRepresentation<\/a>s.<\/p>\n<h3>References<\/h3>\n<p>Danilo Croce, Alessandro Moschitti, and Roberto Basili.\u00a0<em>Structured lexical similarity via convolution kernels on dependency trees<\/em>. In Proceedings of EMNLP, Edinburgh, Scotland, UK, 2011.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>To generate the input data for KeLP we developed a specific project: kelp-input-generator. This project relies on third-party software components, such as the Stanford Parser, and provides the functionality to extract KeLP data structures from text snippets. Being a general-purpose machine learning platform, KeLP is not limited to Natural Language Processing tasks. 
<a href=\"http:\/\/www.kelp-ml.org\/?page_id=1025\" rel=\"nofollow\"><span class=\"sr-only\">Read more about Generating Input Data<\/span>[&hellip;]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"http:\/\/www.kelp-ml.org\/index.php?rest_route=\/wp\/v2\/pages\/1025"}],"collection":[{"href":"http:\/\/www.kelp-ml.org\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/www.kelp-ml.org\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/www.kelp-ml.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.kelp-ml.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1025"}],"version-history":[{"count":14,"href":"http:\/\/www.kelp-ml.org\/index.php?rest_route=\/wp\/v2\/pages\/1025\/revisions"}],"predecessor-version":[{"id":1062,"href":"http:\/\/www.kelp-ml.org\/index.php?rest_route=\/wp\/v2\/pages\/1025\/revisions\/1062"}],"wp:attachment":[{"href":"http:\/\/www.kelp-ml.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1025"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}