Comparison to R2W2


Comparison to R2W2

R2W2 [WMC+00] is the state-of-the-art feature selection algorithm for the Support Vector Machines (SVM) classifier. This algorithm is a sophisticated wrapper for SVM and therefore uses the maximal margin principle for feature selection indirectly. The goal of the algorithm is to find a weights vector over the features which will minimize the objective function of the SVM optimization problem. This objective function can be written as $R^2W^2$ where $R$ is the radius of a ball containing all the training data and $W$ is the norm of the linear separator. The optimization is done using gradient descent. After each gradient step a new SVM optimization problem is constructed and solved. Thus it becomes cumbersome for large scale data.

The derivation of R2W2 algorithm assumes that the data are linearly separable. Since this cannot be guaranteed in the general case we use the ``ridge'' trick of adding a constant value to the diagonal of the kernel matrix. Note also that R2W2 is designed for binary classification tasks only. There are several ways in which it can be extended to multi class problems. However, these extensions will make the algorithm even more demanding than its original version.

As in SVM, R2W2 can be used together with a kernel function. We chose to use the Radial Basis Function (RBF) kernel. The RBF kernel is defined to be

\begin{displaymath}K(\mathbf{x}_1,\mathbf{x}_2)=e^{-\frac{\Vert \mathbf{x}_1-\mathbf{x}_2 \Vert}{2\sigma^2}}\end{displaymath}

where $\sigma$ is a predefined parameter. The choice of the RBF kernel is due to the similarity between SVM with RBF kernel and the nearest-neighbor rule. Our implementation is based on the one in the Spider package [WEBS04].

Ran Gilad-Bachrach 2004-12-07