Jan Arpe and Ruediger Reischuk
Given n Boolean input variables representing a set of attritubes, we
consider Boolean functions f (i.e., binary classifications of tuples)
that actually depend only on a small but unknown subset of these
variables/attributes, in the following called relevant. The goal is
to
determine these relevant attributes given a sequence of examples -
input vectors X with their classification f(X).
We analyze two simple greedy strategies and prove that they are able
to achieve this goal for various kinds of Boolean functions and
various input distributions according to which the examples are drawn
at random. This generalizes results obtained by Akutsu, Miyano, and
Kuhara for the uniform distribution.
The analysis also provides explicit upper bounds on the number of
necessary examples. They depend on the distribution and combinatorial
properties of the function to be inferred.
Our second contribution is an extension of these results to a
situation where the examples contain errors, that means a certain
number of input bits x_i may be wrong. This is a typical situation,
for example in medical research, where not all attributes can be
measured reliably.
We show that even in such an error-prone situation reliable inference
of the relevant attributes can be performed by proving that our greedy
strategies are robust even against a linear number of errors.