Combinatorial characterization of exact learning with membership queries



Edit: Since I have received no answers or comments in a week, I'd like to add that I'm happy to hear anything about the problem. I don't work in this area, so even if it's a simple observation, I may not know it. Even a comment like "I work in this area, but I haven't seen a characterization like this" would be useful!

Background:

There are several well-studied learning models in learning theory (e.g., PAC learning, online learning, exact learning with membership/equivalence queries).

For example, in PAC learning, the sample complexity of a concept class has a nice combinatorial characterization in terms of the VC dimension of the class. So if we want to learn a class with constant accuracy and confidence, this can be done with $\Theta(d)$ samples, where $d$ is the VC dimension. (Note that we are talking about sample complexity, not time complexity.) There is also a more refined characterization in terms of the accuracy and confidence. Similarly, the mistake-bound model of online learning has a nice combinatorial characterization.

Question:

I want to know if a similar result is known for the model of exact learning with membership queries. The model is defined as follows: we have access to a black box that, on input $x$, outputs $f(x)$. We know that $f$ comes from a concept class $C$. We want to determine $f$ with as few queries as possible.
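To make the model concrete, here is a minimal Python sketch (the names and the naive strategy are mine, purely for illustration): the learner keeps the set of concepts consistent with the answers so far and queries any point on which the survivors disagree.

```python
def exact_learn(concepts, oracle):
    """Identify the hidden concept with membership queries.

    concepts: list of dicts mapping each domain point to 0/1 (assumed
              pairwise distinct).
    oracle:   the black box; oracle(x) returns f(x) for the unknown f.
    Returns the unique surviving concept and the number of queries used.
    """
    domain = list(concepts[0].keys())
    version_space = list(concepts)
    queries = 0
    while len(version_space) > 1:
        # Query any point on which two surviving concepts still disagree.
        x = next(p for p in domain if len({f[p] for f in version_space}) > 1)
        answer = oracle(x)
        queries += 1
        # Eliminate every concept inconsistent with the oracle's answer.
        version_space = [f for f in version_space if f[x] == answer]
    return version_space[0], queries

# Toy run: point functions on a 4-element domain; the target is hidden
# behind the oracle and unknown to the learner.
domain = range(4)
concepts = [{x: int(x == i) for x in domain} for i in domain]
target = concepts[2]
learned, q = exact_learn(concepts, lambda x: target[x])
assert learned == target
print(q)  # number of membership queries this naive strategy used
```

This naive strategy always terminates, but it can be far from query-optimal; the question below is about characterizing the optimal query count.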

Is there a combinatorial parameter of a concept class $C$ that characterizes the number of queries needed to learn a concept in the model of exact learning with membership queries?

What I know:

The best characterization I have found is in this paper by Servedio and Gortler, using a parameter they attribute to Bshouty, Cleve, Gavaldà, Kannan, and Tamon. They define a combinatorial parameter called $\gamma^C$, where $C$ is the concept class, which has the following properties. (Let $Q_C$ be the optimal number of queries needed to learn $C$ in this model.)

$Q_C = \Omega(1/\gamma^C), \qquad Q_C = \Omega(\log |C|), \qquad Q_C = O(\log |C| \,/\, \gamma^C)$

This characterization is almost tight. However, there could be a quadratic gap between the upper and lower bounds. For example, if $1/\gamma^C = \log|C| = k$, then the lower bound is $\Omega(k)$, but the upper bound is $O(k^2)$. (I also think this gap is achievable, i.e., there exists a concept class for which the lower bounds are both $\Omega(k)$, but the upper bound is $O(k^2)$.)
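For intuition, here is a brute-force sketch of this parameter on a toy class. It follows my reading of the definition (every subset of at least two concepts admits a query on which at least a $\gamma^C$ fraction of the subset is eliminated whatever the answer); consult the Servedio–Gortler paper for the exact statement.

```python
from itertools import chain, combinations

def gamma(concepts, domain):
    """Brute-force estimate of gamma^C under the reading above: for every
    subset S of the class with |S| >= 2, the best query x maximizes the
    minority answer side of S (which is what gets eliminated in the worst
    case); gamma^C is the worst such fraction over all S. Assumes the
    concepts are pairwise distinct. Exponential time -- toy classes only."""
    worst = 1.0
    n = len(concepts)
    for S in chain.from_iterable(combinations(range(n), k)
                                 for k in range(2, n + 1)):
        sub = [concepts[i] for i in S]
        guaranteed = max(
            min(sum(f[x] == b for f in sub) for b in (0, 1))
            for x in domain
        ) / len(sub)
        worst = min(worst, guaranteed)
    return worst

# Point functions on 4 points: gamma^C = 1/4. The bounds then read
# Q_C = Omega(4) and Q_C = O(log(4) / (1/4)) = O(8), consistent (up to
# constants) with the n - 1 = 3 queries actually needed.
domain = range(4)
concepts = [{x: int(x == i) for x in domain} for i in domain]
print(gamma(concepts, domain))  # 0.25
```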


"Haystack dimension" characterizes the query complexity of optimizing a function: cis.upenn.edu/~mkearns/papers/haystack.pdf , This is different than what you want, but you might enjoy the related work which discusses what is known about characterizing the query complexity of exact learning.
Aaron Roth

Answers:



To drive home the point of anonymous moose's example, consider the concept class consisting of the functions that output 1 on exactly one point of $\{0,1\}^n$. The class has size $2^n$, and $2^n - 1$ queries are needed in the worst case. Take a look at the worst-case Teaching Dimension (Goldman & Schapire), which provides something similar to what you're looking for.


Thanks! Searching for the Teaching Dimension led me to the Extended Teaching Dimension, which is similar to the combinatorial parameter I mentioned in the question, and which in turn led me to many other interesting papers on the topic.
Robin Kothari


I don't know of such a characterization. However, it's worth noting that for almost any concept class, one needs to query all points. To see this, consider the concept class consisting of all $n$-dimensional Boolean vectors with Hamming weight 1. This concept class requires $n - 1$ queries to learn in the worst case, essentially its cardinality. You can probably generalize this observation to show that almost any concept class also requires making nearly all queries.
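A short adversary-style sketch of this argument (my code, illustrative only): the adversary answers 0 to every query, so each query rules out at most one candidate concept, and no learner can finish before a single candidate remains.

```python
def adversary_lower_bound(n):
    """Hamming-weight-1 class over n points: concept i is the indicator of
    point i. The adversary answers 0 to every query, ruling out only the
    candidate sitting at the queried point, so any learner needs n - 1
    queries in the worst case before one candidate remains."""
    candidates = set(range(n))
    queries = 0
    while len(candidates) > 1:
        # Best case for the learner: query a still-possible point.
        candidates.pop()   # the adversary answers 0, killing that one concept
        queries += 1
    return queries

print(adversary_lower_bound(6))  # 5, i.e. n - 1
```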

I would suspect that, given a concept class $C$ as input, it is NP-hard to determine the complexity of exactly learning it with membership queries, or even to approximate that complexity up to, say, a constant factor. This would give some indication that a "good" combinatorial characterization does not exist. If you wish to prove such an NP-hardness result but try and fail, feel free to post here and I'll see if I can figure it out (I have some ideas).


Thanks for the response. Even if it is true that almost all concept classes (under some reasonable distribution over classes) are hard to learn, some classes are easy to learn, and it would be interesting to have a combinatorial parameter that characterizes this. I don't mind if the parameter is hard to compute; even the VC dimension is not known to be efficiently computable.
Robin Kothari


Although others have pointed out the answer, I thought I would make this self-contained and show why the teaching dimension is the answer.

Consider a concept class $C$ over an input space $X$. A set of elements $S \subseteq X$ is called a teaching set for a concept $f$ if $f$ is the only concept in $C$ consistent with $S$.

Let $T(f)$ be the set of all teaching sets for $f$, and define $TD(f,C) = \min\{\,|S| : S \in T(f)\,\}$ to be the teaching dimension of $f$, i.e., the cardinality of a smallest teaching set $TS_{\min}(f)$ in $T(f)$. Similarly, define $TD(C) = \max_{f \in C} TD(f,C)$ to be the teaching dimension of $C$.

The minimum number of queries needed to identify $f$ is $TD(f,C)$: this is achieved when the query strategy uses the sequence $TS_{\min}(f)$, since with any fewer queries at least two concepts remain consistent with the answers. And $TD(C)$ is this minimum in the worst case over $f$.
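To ground these definitions, here is a brute-force sketch (my code, exponential in $|X|$, for toy classes only) that computes $TD(f,C)$ and $TD(C)$ directly from the definitions above.

```python
from itertools import combinations

def td(f, concepts, domain):
    """TD(f, C): size of a smallest set S of points such that f is the only
    concept in C agreeing with f on every point of S (brute force)."""
    others = [g for g in concepts if g != f]
    for k in range(len(domain) + 1):
        for S in combinations(domain, k):
            # S is a teaching set iff every other concept disagrees on some x in S.
            if all(any(g[x] != f[x] for x in S) for g in others):
                return k
    return len(domain)

def td_class(concepts, domain):
    """TD(C) = max over f in C of TD(f, C)."""
    return max(td(f, concepts, domain) for f in concepts)

# Point functions on 4 points: the single positive example teaches each
# concept, so TD(C) = 1 (compare with the discussion in the comments below).
domain = list(range(4))
concepts = [{x: int(x == i) for x in domain} for i in domain]
print(td_class(concepts, domain))  # 1
```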


I don't understand why the teaching dimension upper bounds the query complexity of learning $f$. What does the algorithm look like? The function $f$ is unknown to the algorithm when we start, so it cannot simply query the teaching set for $f$.
Robin Kothari

@RobinKothari TD lower bounds the minimum number of queries in any MQ-algorithm. In practice, there may be no algorithm that blindly achieves this bound without cheating or code tricks. In Angluin's "Queries Revisited" paper, she discusses a parameter called MQ that represents the number of queries needed by the best MQ-algorithm in the worst case. I don't recall its details, but certainly $TD \le MQ$.
seteropere

What I was interested in (when I asked this question) was a parameter that characterizes exact learning with membership queries. It should be both an upper and a lower bound. I provided an example of a parameter that achieves this (up to a $\log |C|$ factor) in the question. My question was whether something better is known.
Robin Kothari
Licensed under cc by-sa 3.0 with attribution required.