Hopefully these examples clear up some confusion after reading the formal definitions. Pay attention not just to the algorithm itself, but more importantly to the proof of why it is PAC.
Positive Half-lines
The domain is X = ℝ, and C = H = {positive half-lines}. (Why C = H? By the definition of this problem, the hypothesis space is the same set of mappings as the concept class.) A positive half-line is a threshold (a real number) such that all points to the left are labeled − and all points to the right are labeled +.
Learning Algorithm
1. Sort the training data in ascending order.
2. Find the greatest −-labeled point and the smallest +-labeled point.
3. Set the threshold anywhere in this interval.
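A minimal sketch of this procedure in Python, assuming ±1 labels and (x, label) pairs (the encoding and the function name are my choices, not fixed by the notes):

```python
# Sketch of the half-line learner: threshold between the greatest - point
# and the smallest + point. Assumes (x, label) pairs with labels in {+1, -1}.

def learn_halfline(data):
    """Return a threshold t such that h(x) = +1 iff x >= t."""
    data = sorted(data)                      # sort by x in ascending order
    negatives = [x for x, y in data if y == -1]
    positives = [x for x, y in data if y == +1]
    if not positives:                        # degenerate case: no + examples
        return float("inf")
    if not negatives:                        # degenerate case: no - examples
        return positives[0]
    lo, hi = max(negatives), min(positives)  # greatest - point, smallest + point
    return (lo + hi) / 2                     # any threshold in (lo, hi) is consistent

print(learn_halfline([(0.1, -1), (0.4, -1), (0.6, +1), (0.9, +1)]))  # ~0.5
```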
Prove PAC
The generalization error of h is the probability mass of the region between h and c: points in this region are labeled differently by the concept and the hypothesis. We need to show that the generalization error is low with high probability. Equivalently (and more convenient to compute), we need to show Pr[err_D(h) > ϵ] ≤ δ.
There are two cases in which err_D(h) > ϵ can happen:
1. B+: h is more than ϵ (in probability mass) away from c, to the right.
2. B−: h is more than ϵ away from c, to the left.
Define R+ as the interval [c, r+], where r+ is a point to the right of c such that Pr[[c, r+]] = ϵ. Now we want to bound Pr[B+]. B+ occurs only when h lands to the right of r+, which can only happen when every x_i ∉ R+ (i.e., no training data falls in that interval, so the algorithm has no way to place the threshold inside it).
Since Pr[R+] = ϵ, we have Pr[x_i ∉ R+] ≤ 1 − ϵ. Then
Pr[B+] ≤ Pr[x_1 ∉ R+ ∧ x_2 ∉ R+ ∧ ... ∧ x_m ∉ R+] ≤ (1 − ϵ)^m,
because the x_i are drawn independently from D.
By symmetry, Pr[B−] ≤ (1 − ϵ)^m.
Now we can bound
Pr[err_D(h) > ϵ] ≤ Pr[B+ ∨ B−]
                ≤ Pr[B+] + Pr[B−]   (union bound)
                ≤ 2(1 − ϵ)^m
                ≤ 2e^{−ϵm}   (using 1 − x ≤ e^{−x})
Now set this bound to be at most δ: Pr[err_D(h) > ϵ] ≤ 2e^{−ϵm} ≤ δ. Taking ln on both sides and solving for m gives m ≥ (1/ϵ) ln(2/δ). This shows C is PAC-learnable: the required sample size m is polynomial in 1/ϵ and 1/δ, and with that many samples the error is at most ϵ with probability at least 1 − δ. QED
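To make the bound concrete, here is a tiny calculator for the sample size (the function name and the values ϵ = 0.1, δ = 0.05 are just an illustrative choice):

```python
from math import ceil, log

# Half-line sample size from the bound derived above: m >= (1/eps) * ln(2/delta).
def halfline_sample_size(eps, delta):
    return ceil(log(2 / delta) / eps)

print(halfline_sample_size(0.1, 0.05))   # 37 examples suffice for eps=0.1, delta=0.05
```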
Review
We want to bound Pr[err_D > ϵ].
Identify the error region, then apply the union bound.
Set the bound to be at most δ.
Solve for m as a function of ϵ and δ.
Positive Intervals
Again, the domain is X = ℝ, and the concept class equals the hypothesis space, C = H. Each c ∈ C specifies an interval inside which points are labeled +; all points outside the interval are labeled −.
Learning Algorithm
Pick a consistent h∈H.
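One simple consistent choice is the tightest interval around the + examples. A sketch, again assuming (x, label) pairs with ±1 labels (my encoding, not specified in the notes):

```python
# Pick a consistent hypothesis for positive intervals: the tightest interval
# containing every + example (it excludes all - examples if the data is realizable).

def learn_interval(data):
    """Return (left, right); h(x) = +1 iff left <= x <= right."""
    positives = [x for x, y in data if y == +1]
    if not positives:
        return (0.0, -1.0)          # empty interval: labels everything -1
    return (min(positives), max(positives))

print(learn_interval([(0.2, -1), (0.35, +1), (0.6, +1), (0.8, -1)]))  # (0.35, 0.6)
```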
Prove PAC
Again, we want to prove that C is PAC-learnable by H. To bound Pr[err_D(h) > ϵ], we break down the cases in which err_D(h) > ϵ.
Let c_l be the left boundary of c and c_r be the right boundary of c. Define:
1. L−: h's left boundary is more than ϵ/2 (in probability mass) away from c_l, to the left.
2. L+: h's left boundary is more than ϵ/2 away from c_l, to the right.
3. R−: h's right boundary is more than ϵ/2 away from c_r, to the left.
4. R+: h's right boundary is more than ϵ/2 away from c_r, to the right.
There are 2 x 2 = 4 combinations: {L−,R−}, {L−,R+}, {L+,R−}, {L+,R+}
We want the following bound:
Pr[err_D(h) > ϵ] ≤ Pr[(L−,R−) ∨ (L−,R+) ∨ (L+,R−) ∨ (L+,R+)]
                ≤ Pr[L−,R−] + Pr[L−,R+] + Pr[L+,R−] + Pr[L+,R+]   (union bound)
                ≤ 4(1 − ϵ/2)^m
                ≤ 4(e^{−ϵ/2})^m = 4e^{−ϵm/2}   (again, 1 − x ≤ e^{−x})
Now set this bound to be at most δ and solve for m: taking ln on both sides gives m ≥ (2/ϵ) ln(4/δ). QED
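As a sanity check on this bound, here is a small Monte Carlo experiment (a sketch only: the uniform distribution on [0,1], the target c = [0.3, 0.7], and the tightest-fit learner are illustrative choices, not part of the proof). The empirical frequency of err_D(h) > ϵ should sit below 4e^{−ϵm/2}.

```python
import math
import random

def trial(m, c=(0.3, 0.7)):
    """Draw m uniform points, learn the tightest consistent interval, return true error."""
    xs = [random.random() for _ in range(m)]
    pos = [x for x in xs if c[0] <= x <= c[1]]   # points labeled + by c
    if not pos:                                  # no + samples: h is empty
        return c[1] - c[0]                       # err = whole mass of c
    left, right = min(pos), max(pos)             # tightest consistent h (h is inside c)
    return (c[1] - c[0]) - (right - left)        # err = mass of c \ h

m, eps, trials = 100, 0.1, 20000
empirical = sum(trial(m) > eps for _ in range(trials)) / trials
print(f"empirical Pr[err > eps] ~ {empirical:.4f}")
print(f"bound 4*exp(-eps*m/2)   = {4 * math.exp(-eps * m / 2):.4f}")
```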
Learning Axis-aligned Rectangles
Learning Algorithm
Let c be the target concept. The algorithm outputs h, the smallest axis-aligned rectangle consistent with the training data, i.e., the smallest rectangle containing all the + points.
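A sketch of this tightest-fit rule, assuming 2-D points given as (x1, x2) pairs with ±1 labels (my encoding, not specified in the notes):

```python
# Smallest axis-aligned rectangle containing all + examples.
# Data format assumed: ((x1, x2), label) pairs with labels in {+1, -1}.

def learn_rectangle(data):
    """Return ((x1_min, x1_max), (x2_min, x2_max)); h labels a point + iff it is inside."""
    pos = [p for p, y in data if y == +1]
    if not pos:
        return None                               # empty hypothesis: everything is -
    x1s = [p[0] for p in pos]
    x2s = [p[1] for p in pos]
    return ((min(x1s), max(x1s)), (min(x2s), max(x2s)))

print(learn_rectangle([((0.2, 0.3), +1), ((0.6, 0.8), +1), ((0.9, 0.1), -1)]))
# ((0.2, 0.6), (0.3, 0.8))
```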
Prove PAC
Figure out the cases where err_D(h) > ϵ. Since h is the smallest consistent rectangle, it lies inside c. Grow each edge of h outward by probability mass ϵ/4, and let B_i (i = 1, ..., 4) be the event that h is more than ϵ/4 away from c along the i-th edge, i.e., the corresponding strip contains no training point. Why does the error region have to be inside c? Because every point inside h is labeled + and every point outside h is labeled −; outside c the true label is also −, so that is not an error. So we identify the error region first: in this problem, err = c − h, and err_D(h) > ϵ implies that at least one B_i occurs.
Pr[err_D(h) > ϵ] ≤ Pr[B1 ∨ B2 ∨ B3 ∨ B4]
                ≤ Pr[B1] + Pr[B2] + Pr[B3] + Pr[B4]   (union bound)
                ≤ 4(1 − ϵ/4)^m
                ≤ 4e^{−ϵm/4}   (1 − x ≤ e^{−x})
Set this to be at most δ and solve for m: m ≥ (4/ϵ) ln(4/δ).
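Plugging in example values (ϵ = 0.1, δ = 0.05, an arbitrary illustrative choice):

```python
from math import ceil, log

# Rectangle bound from above: m >= (4/eps) * ln(4/delta).
eps, delta = 0.1, 0.05
print(ceil(4 * log(4 / delta) / eps))   # 176 examples suffice
```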
Ending note: the proofs above are each tailored to a specific problem. There is a more general theorem for proving PAC learnability.