HOWTO: Working with Decimated Sets of Ordered Amino Acids


  1. To generate decimated sets of ordered amino acids (singles, pairs, or triples) for testing classification with smaller sets of features, the toolkit provides the program GenerateSVMSubsetRenormalize.java.

  2. Make sure the base data you want to convert to a decimated data set is available. These may be the base data files generated for the cross-validation process, or the training and testing sets used in the bootstrapping tests.

  3. Copy the Java file GenerateSVMSubsetRenormalize.java into your current working directory and compile this program with the command:

    javac GenerateSVMSubsetRenormalize.java

  4. Converting a base data set to a decimated data set requires two files: a list of the features to extract, and the base data set itself. An example of a feature-list file is dataFiles/importantTriples.01 in the dataFiles/ directory.

  5. To generate the decimated data set, run the GenerateSVMSubsetRenormalize program as follows:

    java GenerateSVMSubsetRenormalize subsetFile svmFormattedInputFile svmFormattedOutputFile numberOfEntriesInSubsetFile numberOfSVMFeatures

    For example, suppose the subset file is a list of important triples called importantTriples.500 containing 500 indices of interest, the SVM file with all features is randomClassificationTestGroup (one of the outputs of the bootstrapping process) with 8000 features, and the new SVM file should be called randomClassificationTestGroup.decimated500. The command line would then be:

    java GenerateSVMSubsetRenormalize importantTriples.500 randomClassificationTestGroup randomClassificationTestGroup.decimated500 500 8000
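To illustrate what a decimation of this kind involves, the following is a minimal, hypothetical sketch; it is not the source of GenerateSVMSubsetRenormalize.java. It assumes that the subset file holds one 1-based feature index per line, that each input line uses the sparse SVM format "label index:value index:value ..." with no trailing comments, that the kept features are renumbered 1..k in subset-file order, and that "renormalize" means rescaling the kept values so they sum to 1. The class name DecimateSketch and its methods are illustrative only, and using numberOfEntriesInSubsetFile as a consistency check is likewise an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of decimating an SVM-formatted file; NOT the toolkit's program.
public class DecimateSketch {

    // Assumed subset-file format: one 1-based feature index per non-blank line.
    static int[] readSubset(Path subsetFile, int expectedCount, int numFeatures) throws IOException {
        List<Integer> indices = new ArrayList<>();
        for (String line : Files.readAllLines(subsetFile)) {
            String t = line.trim();
            if (t.isEmpty()) continue;
            int idx = Integer.parseInt(t);
            if (idx < 1 || idx > numFeatures) {
                throw new IllegalArgumentException("feature index out of range: " + idx);
            }
            indices.add(idx);
        }
        // Assumed use of numberOfEntriesInSubsetFile: a sanity check on the list length.
        if (indices.size() != expectedCount) {
            throw new IllegalArgumentException(
                "expected " + expectedCount + " indices, found " + indices.size());
        }
        int[] out = new int[indices.size()];
        for (int i = 0; i < out.length; i++) out[i] = indices.get(i);
        return out;
    }

    // Keeps only the subset features of one "label idx:val ..." line, renumbers them
    // 1..k in subset order, and rescales the kept values to sum to 1 (assumed meaning
    // of "renormalize"). Zero-valued features are omitted, as in sparse SVM files.
    static String decimateLine(String line, int[] subset, int numFeatures) {
        String[] tokens = line.trim().split("\\s+");
        double[] dense = new double[numFeatures + 1]; // slot 0 unused: indices are 1-based
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            int idx = Integer.parseInt(tokens[i].substring(0, colon));
            dense[idx] = Double.parseDouble(tokens[i].substring(colon + 1));
        }
        double sum = 0.0;
        for (int idx : subset) sum += dense[idx];
        StringBuilder sb = new StringBuilder(tokens[0]); // class label is kept unchanged
        for (int k = 0; k < subset.length; k++) {
            double v = dense[subset[k]];
            if (v == 0.0) continue; // also guarantees no division when sum == 0
            sb.append(' ').append(k + 1).append(':').append(v / sum);
        }
        return sb.toString();
    }

    // Same argument order as the toolkit's command line shown above.
    public static void main(String[] args) throws IOException {
        if (args.length != 5) {
            System.err.println("usage: java DecimateSketch subsetFile svmFormattedInputFile "
                + "svmFormattedOutputFile numberOfEntriesInSubsetFile numberOfSVMFeatures");
            System.exit(1);
        }
        int numFeatures = Integer.parseInt(args[4]);
        int[] subset = readSubset(Paths.get(args[0]), Integer.parseInt(args[3]), numFeatures);
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[1]));
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(Paths.get(args[2])))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) out.println(decimateLine(line, subset, numFeatures));
            }
        }
    }
}
```

Under these assumptions, the input line "1 3:0.25 5:0.5 7:0.25" with the subset {3, 7} becomes "1 1:0.5 2:0.5": feature 5 is dropped, features 3 and 7 are renumbered 1 and 2, and their values are rescaled to sum to 1. Renumbering by position in the subset list keeps the output indices in increasing order, which sparse SVM formats expect.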