Updated the kkmeans example to show how to use the new max sv settings of the kcentroid.

--HG--
extra : convert_revision : svn%3Afdd8eb12-d10e-0410-9acb-85c331704f74/trunk%402936
Davis King 2009-03-15 23:24:08 +00:00
parent 8432290011
commit cd6f196eb7
1 changed file with 20 additions and 22 deletions


@@ -41,28 +41,18 @@ int main()
 typedef radial_basis_kernel<sample_type> kernel_type;
 
-// Here we declare an instance of the kcentroid object. The first argument to the constructor
-// is the kernel we wish to use. The second is a parameter that determines the numerical
-// accuracy with which the object will perform part of the learning algorithm. Generally
-// smaller values give better results but cause the algorithm to run slower. You just have
-// to play with it to decide what balance of speed and accuracy is right for your problem.
-// Here we have set it to 0.01.
-//
-// Also, since we are using the radial basis kernel we have to pick the RBF width parameter.
-// Here we have it set to 0.1. But in general, a reasonable way of picking this value is
-// to start with some initial guess and to just run all the data through the resulting
-// kcentroid. Then print out kc.dictionary_size() to see how many support vectors the
-// kcentroid object is using. A good rule of thumb is that you should have somewhere
-// in the range of 10-100 support vectors (but this rule isn't carved in stone).
-// So if you aren't in that range then you can change the RBF parameter. Making it
-// smaller will decrease the dictionary size and making it bigger will increase the
-// dictionary size.
-//
-// So what I often do is I set the kcentroid's second parameter to 0.01 or 0.001. Then
-// I find an RBF kernel parameter that gives me the number of support vectors that I
-// feel is appropriate for the problem I'm trying to solve. Again, this just comes down
-// to playing with it and getting a feel for how things work.
-kcentroid<kernel_type> kc(kernel_type(0.1),0.01);
+// Here we declare an instance of the kcentroid object. It is the object used to
+// represent each of the centers used for clustering. The kcentroid has 4 parameters
+// you need to set. The first argument to the constructor is the kernel we wish to
+// use. The second is a parameter that determines the numerical accuracy with which
+// the object will perform part of the learning algorithm. Generally, smaller values
+// give better results but cause the algorithm to attempt to use more support vectors
+// (and thus run slower and use more memory). The third argument, however, is the
+// maximum number of support vectors a kcentroid is allowed to use. So you can use
+// it to control the complexity. Finally, the last argument should always be set to
+// false when using a kcentroid for clustering (see the kcentroid docs for details on
+// this parameter).
+kcentroid<kernel_type> kc(kernel_type(0.1),0.01, 8, false);
 
 // Now we make an instance of the kkmeans object and tell it to use kcentroid objects
 // that are configured with the parameters from the kc object we defined above.
@@ -145,6 +135,14 @@ int main()
 cout << test(samples[i+2*num]) << "\n";
 }
 
+// Now print out how many support vectors each center used. Note that
+// the maximum number of 8 was reached. If you went back to the kcentroid
+// constructor and changed the 8 to some bigger number you would see that these
+// numbers would go up. However, 8 is all we need to correctly cluster this dataset.
+cout << "num sv for center 0: " << test.get_kcentroid(0).dictionary_size() << endl;
+cout << "num sv for center 1: " << test.get_kcentroid(1).dictionary_size() << endl;
+cout << "num sv for center 2: " << test.get_kcentroid(2).dictionary_size() << endl;
 }