We begin by proving a simple lemma which shows that the class of nearest neighbor
classifiers is a subset of the class of
-Lipschitz functions.
Let
be a function such
that the sign of
is the label that the nearest neighbor
rule assigns to
, while the magnitude is the sample-margin, i.e.
the distance between
and the decision boundary.
Let
be a set of features and let
be a labeled sample. Then for any
:
The second case is when
and
have alternating
signs. Since
is continuous, there is a
point
on the line connecting
and
such that
is on the decision boundary. Hence,


Proof(of theorem 1):
Let
be a set of features such that
and let
. In order to use theorem 2
we need to compute the fat-shattering dimension of
the class of nearest neighbor classification rules which use the set
of features
. As we saw in lemma 1 this
class is a subset of the class of
-Lipschitz functions on these
features. Hence we can bound the fat-shattering dimension of the class
of NN rules by the dimension of Lipschitz functions.
Since
is supported in a ball of radius
and
,
we need to calculate the fat-shattering dimension of Lipschitz functions
acting on points in
with norm bounded by
. The fat
-dimension
of the 1-NN functions on the features
is thus bounded by the
largest
packing of a ball in
with radius
,
which in turn is bounded by
.
Therefore, for a fixed set of features
we can apply to theorem 2
and use the bound on the fat-shattering dimension just calculated.
Let
and we have according to theorem 2
with probability
over sample
of size
that
for any
Ran Gilad-Bachrach 2004-12-07