public class SimpleDataset extends Object implements Dataset
| Constructor | Description |
|---|---|
| `SimpleDataset()` | Initializes an empty dataset |
| Modifier and Type | Method | Description |
|---|---|---|
| `void` | `addExample(Example example)` | Adds an example to the dataset. |
| `void` | `addExamples(Dataset datasetToBeAdded)` | Adds all the examples contained in `datasetToBeAdded`. |
| `static Dataset` | `extractExamplesOfClasses(Dataset dataset, List<Label> labels)` | Extracts the examples with the given `labels` from `dataset`. |
| `List<Label>` | `getClassificationLabels()` | Returns all the classification labels in the dataset. |
| `Example` | `getExample(int exampleIndex)` | Returns the example stored at position `exampleIndex`. |
| `List<Example>` | `getExamples()` | Returns a list containing all the stored examples. |
| `Example` | `getNextExample()` | Returns the next `Example` stored in the dataset. |
| `List<Example>` | `getNextExamples(int n)` | Returns the next `n` `Example`s stored in the dataset, or fewer if `n` examples are not available. |
| `int` | `getNumberOfExamples()` | Returns the number of `Example`s in the dataset. |
| `int` | `getNumberOfNegativeExamples(Label positiveClass)` | Returns the number of negative `Example`s of a given class. |
| `int` | `getNumberOfPositiveExamples(Label positiveClass)` | Returns the number of positive `Example`s of a given class. |
| `Example` | `getRandExample()` | |
| `List<Example>` | `getRandExamples(int k)` | |
| `List<Label>` | `getRegressionProperties()` | Returns all the regression properties in the dataset. |
| `SimpleDataset` | `getShuffledDataset()` | |
| `Vector` | `getZeroVector(String representationIdentifier)` | Returns a zero vector compliant with the representation identified by `representationIdentifier`. |
| `boolean` | `hasNextExample()` | Returns a boolean declaring whether there are other `Example`s in the dataset. |
| `boolean` | `isConsistent()` | Evaluates whether the examples included in this dataset are compatible with each other. |
| `void` | `manipulate(Manipulator... manipulators)` | Manipulates all the examples in the dataset according to the strategies defined by the given `manipulators`. |
| `SimpleDataset[]` | `nFolding(int n)` | Returns `n` datasets. |
| `SimpleDataset[]` | `nFoldingClassDistributionInvariant(int n)` | Returns `n` datasets. |
| `void` | `populate(DatasetReader reader)` | Populates the dataset using the provided `reader`. |
| `void` | `populate(String filename)` | Populates the dataset by reading it from a KeLP compliant file. |
| `void` | `reset()` | Resets the reading pointer. |
| `void` | `save(String outputFilePath)` | Saves the dataset in a file. |
| `void` | `setSeed(long seed)` | Sets the seed of the random generator used for shuffling examples and getting random examples. |
| `void` | `shuffleExamples(Random randomGenerator)` | Shuffles the examples in the dataset. |
| `SimpleDataset[]` | `split(float percentage)` | Returns two datasets created by splitting this dataset according to `percentage`. |
| `SimpleDataset[]` | `splitClassDistributionInvariant(float percentage)` | Returns two datasets created by splitting this dataset according to `percentage`. |
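
The difference between `split` and `splitClassDistributionInvariant` may be easier to see in a small, self-contained sketch. The code below does not use KeLP and is not the library's implementation: `LabeledItem` and `SplitSketch` are hypothetical names, each item is assumed to carry a single label, and the cut point is assumed to be rounded down.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a labelled example; this is NOT a KeLP class.
class LabeledItem {
    final String label;
    final int id;

    LabeledItem(String label, int id) {
        this.label = label;
        this.id = id;
    }
}

class SplitSketch {
    // Ordered split: the leading fraction `percentage` (a value in [0,1])
    // goes to part 0, everything else to part 1.
    static <T> List<List<T>> split(List<T> items, float percentage) {
        // Assumption: the cut point is rounded down.
        int cut = (int) (items.size() * percentage);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(items.subList(0, cut)));
        parts.add(new ArrayList<>(items.subList(cut, items.size())));
        return parts;
    }

    // Class-distribution-invariant split: group the items by label
    // (keeping their order), apply the ordered split to each group,
    // then merge the per-class parts.
    static List<List<LabeledItem>> splitClassDistributionInvariant(
            List<LabeledItem> items, float percentage) {
        Map<String, List<LabeledItem>> byLabel = new LinkedHashMap<>();
        for (LabeledItem item : items) {
            byLabel.computeIfAbsent(item.label, k -> new ArrayList<>()).add(item);
        }
        List<LabeledItem> first = new ArrayList<>();
        List<LabeledItem> second = new ArrayList<>();
        for (List<LabeledItem> group : byLabel.values()) {
            List<List<LabeledItem>> parts = split(group, percentage);
            first.addAll(parts.get(0));
            second.addAll(parts.get(1));
        }
        List<List<LabeledItem>> result = new ArrayList<>();
        result.add(first);
        result.add(second);
        return result;
    }
}
```

With four items labelled `A` followed by two labelled `B` and a `percentage` of `0.5f`, the ordered split in this sketch puts three `A` items in the first part, whereas the class-distribution-invariant variant puts two `A` items and one `B` item in each part.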
`public void addExample(Example example)`
Add an example to the dataset.
Specified by: `addExample` in interface `Dataset`
Parameters: `example` - the example to be added

`public void addExamples(Dataset datasetToBeAdded)`
Add all the examples contained in `datasetToBeAdded`.
Parameters: `datasetToBeAdded` - the dataset containing all the examples to be added

`public Example getExample(int exampleIndex)`
Return the example stored in the `exampleIndex` position.
Parameters: `exampleIndex` - the index of the example to return
Returns: the example stored in the `exampleIndex` position

`public boolean hasNextExample()`
Description copied from interface: `Dataset`
Returns a boolean declaring whether there are other `Example`s in the dataset.
Specified by: `hasNextExample` in interface `Dataset`
Returns: `true` if and only if there is at least another `Example` in the dataset

`public Example getNextExample()`
Description copied from interface: `Dataset`
Returns the next `Example` stored in the Dataset.
Specified by: `getNextExample` in interface `Dataset`
Returns: the next `Example`

`public List<Example> getNextExamples(int n)`
Description copied from interface: `Dataset`
Returns the next `n` `Example`s stored in the Dataset, or fewer if `n` examples are not available.
Specified by: `getNextExamples` in interface `Dataset`
Parameters: `n` - the number of examples to be returned
Returns: the next `n` `Example`s

`public void reset()`
Description copied from interface: `Dataset`
Reset the reading pointer.

`public int getNumberOfPositiveExamples(Label positiveClass)`
Description copied from interface: `Dataset`
Returns the number of positive `Example`s of a given class.
Specified by: `getNumberOfPositiveExamples` in interface `Dataset`
Parameters: `positiveClass` - the class whose number of positive `Example`s is required
Returns: the number of positive `Example`s of `positiveClass`

`public int getNumberOfNegativeExamples(Label positiveClass)`
Description copied from interface: `Dataset`
Returns the number of negative `Example`s of a given class.
Specified by: `getNumberOfNegativeExamples` in interface `Dataset`
Parameters: `positiveClass` - the class whose number of negative `Example`s is required
Returns: the number of negative `Example`s of `positiveClass`

`public int getNumberOfExamples()`
Description copied from interface: `Dataset`
Returns the number of `Example`s in the dataset.
Specified by: `getNumberOfExamples` in interface `Dataset`
Returns: the number of `Example`s in the dataset

`public List<Label> getClassificationLabels()`
Description copied from interface: `Dataset`
Returns all the classification labels in the dataset.
Specified by: `getClassificationLabels` in interface `Dataset`

`public List<Label> getRegressionProperties()`
Description copied from interface: `Dataset`
Returns all the regression properties in the dataset.
Specified by: `getRegressionProperties` in interface `Dataset`

`public void shuffleExamples(Random randomGenerator)`
Shuffles the examples in the dataset.
Parameters: `randomGenerator` - a random number generator

`public SimpleDataset[] splitClassDistributionInvariant(float percentage)`
Returns two datasets created by splitting this dataset according to `percentage`. The original distribution of the examples among the classes is maintained in the two datasets. The examples are split according to their order: the first dataset consists of the leading fraction `percentage` of the examples of each class, while the second dataset consists of all the remaining examples.
Parameters: `percentage` - should be a number in [0,1]

`public SimpleDataset[] split(float percentage)`
Returns two datasets created by splitting this dataset according to `percentage`. The examples are split according to their order, without maintaining the original data distribution among the classes: the first dataset consists of the leading fraction `percentage` of the examples, while the second dataset consists of all the remaining examples.
Parameters: `percentage` - should be a number in [0,1]

`public SimpleDataset[] nFoldingClassDistributionInvariant(int n)`
Returns `n` datasets. Each dataset is a fold storing 1/n of the total examples. The folds do not overlap and maintain the original distribution of the examples among the classes. The examples in this dataset are split into `n` folds according to their order, so that, for instance, the first fold has all the first examples of each class.
Parameters: `n` - the number of folds to create
Returns: `n` datasets, each one consisting of 1/n of the examples

`public SimpleDataset[] nFolding(int n)`
Returns `n` datasets. Each dataset is a fold storing 1/n of the total examples. The folds do not overlap and do not maintain the original distribution of the examples among the classes. The examples in this dataset are split into `n` folds according to their order, so that, for instance, the first fold has all the first examples.
Parameters: `n` - the number of folds to create
Returns: `n` datasets, each one consisting of 1/n of the examples

`public List<Example> getExamples()`
Description copied from interface: `Dataset`
Returns a list containing all the stored examples.
Specified by: `getExamples` in interface `Dataset`

`public static Dataset extractExamplesOfClasses(Dataset dataset, List<Label> labels) throws InstantiationException, IllegalAccessException`
This method extracts the examples with the given `labels` from `dataset`.
Parameters: `dataset` - original dataset; `labels` - labels of interest
Throws: `IllegalAccessException`, `InstantiationException`

`public void populate(String filename) throws Exception`
Populate the dataset by reading it from a KeLP compliant file.
Parameters: `filename` - the path of the file to be read
Throws: `Exception`

`public void populate(DatasetReader reader) throws Exception`
Populate the dataset using the provided `reader`.
Parameters: `reader` - the reader
Throws: `Exception`

`public Example getRandExample()`
Specified by: `getRandExample` in interface `Dataset`

`public List<Example> getRandExamples(int k)`
Specified by: `getRandExamples` in interface `Dataset`
Parameters: `k` - the number of examples to be returned
Returns: `k` random examples.
NOTE: Duplicates are allowed
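
The note on duplicates suggests that the random examples are drawn with replacement. A minimal sketch of that idea on a plain Java list follows. It is an illustration only, not KeLP's code: `RandomSamplingSketch` and `randomExamples` are hypothetical names, and the input list is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RandomSamplingSketch {
    // Draws k items uniformly at random WITH replacement, so the same
    // item can appear more than once in the result.
    // Assumption: `items` is not empty (Random.nextInt(0) would throw).
    static <T> List<T> randomExamples(List<T> items, int k, Random rng) {
        List<T> sample = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            sample.add(items.get(rng.nextInt(items.size())));
        }
        return sample;
    }
}
```

Passing a `Random` built from a fixed seed, for example `new Random(42L)`, makes the draw reproducible, which is the role a seed setter such as `setSeed(long seed)` usually plays.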
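`public SimpleDataset getShuffledDataset()`
Specified by: `getShuffledDataset` in interface `Dataset`

`public void setSeed(long seed)`
Description copied from interface: `Dataset`
Sets the seed of the random generator used for shuffling examples and getting random examples.

`public Vector getZeroVector(String representationIdentifier)`
Description copied from interface: `Dataset`
Returns a zero vector compliant with the representation identified by `representationIdentifier`.
NOTE: it assumes that there is at least one example in the dataset and that the representation is directly available on the example using the `getRepresentation` method (i.e., the example is not an `ExamplePair` storing the representation in its left or right element).
Specified by: `getZeroVector` in interface `Dataset`
Parameters: `representationIdentifier` - the identifier of the representation
Returns: a zero vector compliant with the representation identified by `representationIdentifier`

`public void manipulate(Manipulator... manipulators)`
Description copied from interface: `Dataset`
Manipulates all the examples in the dataset according to the strategies defined by the given `manipulators`. [...] manipulator in the array
Specified by: `manipulate` in interface `Dataset`
Parameters: `manipulators` - the manipulators that must be applied to all the examples in the dataset

`public void save(String outputFilePath) throws FileNotFoundException, IOException`
Save the dataset in a file.
Parameters: `outputFilePath` - the file path
Throws: `FileNotFoundException`, `IOException`

`public boolean isConsistent()`
Evaluates whether the examples included in this dataset are compatible with each other.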
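
As a closing illustration of the order-based folding that `nFolding` describes, the sketch below cuts a plain Java list into `n` contiguous, non-overlapping folds. It is not the library's implementation: `FoldingSketch` is a hypothetical name, and how a remainder is distributed when the size is not divisible by `n` is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class FoldingSketch {
    // Cuts `items` into n contiguous folds, in order, with no overlap.
    // Assumption: fold i covers the index range [i*size/n, (i+1)*size/n),
    // so any remainder ends up spread over the later folds.
    static <T> List<List<T>> nFolding(List<T> items, int n) {
        List<List<T>> folds = new ArrayList<>();
        int size = items.size();
        for (int i = 0; i < n; i++) {
            int from = (int) ((long) i * size / n);
            int to = (int) ((long) (i + 1) * size / n);
            folds.add(new ArrayList<>(items.subList(from, to)));
        }
        return folds;
    }
}
```

In a cross-validation loop, each fold would typically serve once as the test set while the remaining folds are merged into the training set. A class-distribution-invariant variant could apply the same cut to each class separately and merge the per-class folds.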
Copyright © 2018 Semantic Analytics Group @ Uniroma2. All rights reserved.