PAC Examples

Hopefully these examples can clear up some confusion after reading the formal definitions. Pay attention not just to the algorithm itself but, more importantly, to the proof of why the class is PAC-learnable.

Positive Half-lines

The domain is $X = \mathbb{R}$ and $C = H = \{\text{positive half lines}\}$. (Why $C = H$? By the definition of this problem, the hypothesis space is the same family of mappings as the concept class.) A positive half line is a threshold (a real number) such that all points to the left are labeled $-$ and all points to the right are labeled $+$.

Learning Algorithm

1. Sort the training data in ascending order.
2. Find the greatest $-$-labeled point and the smallest $+$-labeled point.
3. Set the threshold anywhere in this interval.
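
To make the algorithm concrete, here is a minimal Python sketch (my own illustration, not from the notes). It assumes the training data comes as `(x, y)` pairs with `y` in `{-1, +1}`, and the max/min calls stand in for the sort-then-scan described above:

```python
# Minimal sketch of the positive half-line learner described above.
# Assumes data is an iterable of (x, y) pairs with y in {-1, +1}.

def learn_half_line(data):
    """Return a threshold t; points with x > t are predicted +, all others -."""
    neg = [x for x, y in data if y == -1]
    pos = [x for x, y in data if y == +1]
    lo = max(neg, default=float("-inf"))   # greatest '-'-labeled point
    hi = min(pos, default=float("inf"))    # smallest '+'-labeled point
    # Any threshold in [lo, hi] is consistent with the sample; pick one.
    if lo == float("-inf"):
        return hi
    if hi == float("inf"):
        return lo
    return (lo + hi) / 2

def predict(t, x):
    return +1 if x > t else -1
```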

Prove PAC

The generalization error of $h$ is the probability mass of the region between $h$ and $c$: points in this region are labeled differently by the concept and the hypothesis. We need to show the generalization error is low with high probability. Equivalently (and more convenient to compute), we need to show $Pr[err_D(h) > \epsilon] \leq \delta$.

There are two cases for $err_D(h) > \epsilon$:

1. $B_+$: $h$ is more than $\epsilon$ (in probability mass) away from $c$, to the right.
2. $B_-$: $h$ is more than $\epsilon$ away from $c$, to the left.

Define $R_+$ as the interval $[c, r_+]$, where $r_+$ is a point to the right of $c$ such that the probability mass of $[c, r_+]$ is $\epsilon$. Now we want to find $Pr[B_+]$. $B_+$ occurs only when $h$ is to the right of $r_+$, and this happens only when every $x_i \notin R_+$ (i.e., no training data lands in that interval, so the algorithm cannot learn that the threshold lies there).

Since $Pr[R_+] = \epsilon$, $Pr[x_i \notin R_+] \leq 1 - \epsilon$. Then

$$Pr[B_+] = Pr[x_1 \notin R_+ \land x_2 \notin R_+ \land \dots \land x_m \notin R_+] \leq (1 - \epsilon)^m$$

The $x_i$ are all random and independent, so the probabilities of the individual events multiply.

By symmetry, $Pr[B_-] \leq (1 - \epsilon)^m$.

Now we can bound:

$$\begin{aligned} Pr[err_D > \epsilon] & \leq Pr[B_+ \lor B_-] \\ & \leq Pr[B_+] + Pr[B_-] \quad \text{(union bound)} \\ & \leq 2(1 - \epsilon)^m \\ & \leq 2\mathrm{e}^{-\epsilon m} \quad (1 - x \leq \mathrm{e}^{-x}) \end{aligned}$$

So we set this bound to be at most $\delta$: $Pr[err_D > \epsilon] \leq 2\mathrm{e}^{-\epsilon m} \leq \delta$. Taking the natural log of both sides and solving for $m$ gives $m \geq \frac{1}{\epsilon} \ln(\frac{2}{\delta})$. This shows $C$ is PAC-learnable: a sample size polynomial in $\frac{1}{\epsilon}$ and $\frac{1}{\delta}$ suffices, and as $m$ grows we can drive $\epsilon$ and $\delta$ down. $QED$
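
As a sanity check on this bound, here is a small simulation; it is entirely my own construction and assumes $D$ is uniform on $[0, 1]$ with target threshold $c = 0.5$, so that $err_D(h) = |h - c|$. It compares the empirical frequency of $err_D(h) > \epsilon$ against the $2\mathrm{e}^{-\epsilon m}$ bound:

```python
import math
import random

# Sanity check of the half-line bound under illustrative assumptions:
# D = Uniform[0, 1], target threshold c = 0.5, so err_D(h) = |h - c|.

def learn(sample):
    neg = [x for x, y in sample if y == -1]
    pos = [x for x, y in sample if y == +1]
    lo = max(neg, default=0.0)   # greatest '-'-labeled point
    hi = min(pos, default=1.0)   # smallest '+'-labeled point
    return (lo + hi) / 2

def one_trial(c, m, rng):
    sample = [(x, +1 if x > c else -1) for x in (rng.random() for _ in range(m))]
    return abs(learn(sample) - c)        # generalization error under Uniform[0, 1]

c, m, eps, trials = 0.5, 50, 0.1, 20_000
rng = random.Random(0)
freq = sum(one_trial(c, m, rng) > eps for _ in range(trials)) / trials
print(f"empirical Pr[err_D > eps] ~ {freq:.4f}")
print(f"bound 2 * exp(-eps * m)   = {2 * math.exp(-eps * m):.4f}")
```

For these settings the empirical frequency should come out below the theoretical bound.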

Review

  • We want to bound $Pr[err_D > \epsilon]$ first.

  • Identify the error region(s), then apply the union bound.

  • Set the bound to be at most $\delta$.

  • Solve for $m$; the resulting bound on $m$ is in terms of $\epsilon$ and $\delta$.

Positive Intervals

Again, the domain is $X = \mathbb{R}$, and the concept class is also the hypothesis space: $C = H$. Each $c \in C$ specifies an interval that will be labeled $+$; all points outside the interval will be labeled $-$.

Learning Algorithm

Pick a consistent $h \in H$.
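
One simple way to pick a consistent hypothesis, sketched below under the same illustrative `(x, y)` data format as before, is to take the tightest interval containing all the $+$-labeled points; every $-$-labeled point lies outside the target interval, hence also outside this tighter one, so the hypothesis is consistent:

```python
# Pick a consistent hypothesis: the tightest interval around the '+'-labeled points.
# Assumes data is an iterable of (x, y) pairs with y in {-1, +1}.

def learn_interval(data):
    """Return (left, right); points inside [left, right] are predicted +."""
    pos = [x for x, y in data if y == +1]
    if not pos:
        # No positive examples: return an empty interval (predict - everywhere).
        return (float("inf"), float("-inf"))
    return (min(pos), max(pos))

def predict(interval, x):
    left, right = interval
    return +1 if left <= x <= right else -1
```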

Prove PAC

Again, we want to prove $C$ is PAC-learnable by $H$. In order to bound $Pr[err_D > \epsilon]$, we need to break down all the cases where $err_D > \epsilon$.

Let $c_l$ be the left boundary of $c$ and $c_r$ be the right boundary of $c$. Define:

1. $L_-$: $h$'s left boundary is more than $\frac{\epsilon}{2}$ (in probability mass) away from $c_l$, to the left.
2. $L_+$: $h$'s left boundary is more than $\frac{\epsilon}{2}$ away from $c_l$, to the right.
3. $R_-$: $h$'s right boundary is more than $\frac{\epsilon}{2}$ away from $c_r$, to the left.
4. $R_+$: $h$'s right boundary is more than $\frac{\epsilon}{2}$ away from $c_r$, to the right.

There are $2 \times 2 = 4$ combinations: $\{L_-, R_-\}$, $\{L_-, R_+\}$, $\{L_+, R_-\}$, $\{L_+, R_+\}$.

We want to establish this bound:

$$\begin{aligned} Pr[err_D > \epsilon] & \leq Pr[\{L_-, R_-\} \lor \{L_-, R_+\} \lor \{L_+, R_-\} \lor \{L_+, R_+\}] \\ & \leq Pr[\{L_-, R_-\}] + Pr[\{L_-, R_+\}] + Pr[\{L_+, R_-\}] + Pr[\{L_+, R_+\}] \quad \text{(union bound)} \\ & \leq 4\left(1 - \frac{\epsilon}{2}\right)^m \\ & \leq 4\mathrm{e}^{-\frac{\epsilon m}{2}} \quad (\text{again } 1 - x \leq \mathrm{e}^{-x}) \end{aligned}$$

Now set this bound to be at most $\delta$ and solve for $m$: taking the natural log of both sides gives $m \geq \frac{2}{\epsilon} \ln(\frac{4}{\delta})$. $QED$
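
Spelling out the 'take ln then solve' step:

$$4\mathrm{e}^{-\frac{\epsilon m}{2}} \leq \delta \;\Longleftrightarrow\; -\frac{\epsilon m}{2} \leq \ln\frac{\delta}{4} \;\Longleftrightarrow\; m \geq \frac{2}{\epsilon} \ln\frac{4}{\delta}$$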

Learning Axis-aligned Rectangles

Learning Algorithm

$c$ is the target concept: an axis-aligned rectangle in the plane, inside which points are labeled $+$. The algorithm outputs $h$, the smallest rectangle consistent with the training data.
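
For concreteness, a minimal sketch of this learner (my own illustration, assuming points come as `((x1, x2), y)` pairs with `y` in `{-1, +1}`): the smallest consistent rectangle is simply the bounding box of the $+$-labeled training points.

```python
# Smallest consistent axis-aligned rectangle: the bounding box of the '+' points.
# Assumes data is an iterable of ((x1, x2), y) pairs with y in {-1, +1}.

def learn_rectangle(data):
    """Return ((x1_lo, x1_hi), (x2_lo, x2_hi)), or None if there are no + points."""
    pos = [p for p, y in data if y == +1]
    if not pos:
        return None                      # empty hypothesis: predict - everywhere
    x1s = [p[0] for p in pos]
    x2s = [p[1] for p in pos]
    return ((min(x1s), max(x1s)), (min(x2s), max(x2s)))

def predict(rect, point):
    if rect is None:
        return -1
    (x1_lo, x1_hi), (x2_lo, x2_hi) = rect
    x1, x2 = point
    return +1 if x1_lo <= x1 <= x1_hi and x2_lo <= x2 <= x2_hi else -1
```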

Prove PAC

Figure out the cases where $err_D > \epsilon$. Why must $h$ lie inside $c$? Because $h$ is the smallest consistent rectangle: it is the bounding box of the $+$-labeled training points, and all of those points lie inside $c$. Every point inside $h$ is predicted $+$ and every point outside $h$ is predicted $-$; anything outside $c$ is truly $-$, so that is not an error. So we have to identify the error region first: in this problem, $err = c - h$. Now grow each of the four edges of $h$ outward toward $c$ by probability mass $\frac{\epsilon}{4}$, and let $B_1, \dots, B_4$ be the events that the corresponding edge of $h$ is more than $\frac{\epsilon}{4}$ (in probability mass) away from the matching edge of $c$. If $err_D > \epsilon$, at least one $B_i$ must occur, and, as in the previous proofs, $B_i$ happens only when no training point falls in the corresponding region of mass $\frac{\epsilon}{4}$, so $Pr[B_i] \leq (1 - \frac{\epsilon}{4})^m$.

$$\begin{aligned} Pr[err_D > \epsilon] & \leq Pr[B_1 \lor B_2 \lor B_3 \lor B_4] \\ & \leq Pr[B_1] + Pr[B_2] + Pr[B_3] + Pr[B_4] \quad \text{(union bound)} \\ & \leq 4\left(1 - \frac{\epsilon}{4}\right)^m \\ & \leq 4\mathrm{e}^{-\frac{\epsilon m}{4}} \quad (1 - x \leq \mathrm{e}^{-x}) \end{aligned}$$

Set this bound to be at most $\delta$ and solve for $m$, which gives $m \geq \frac{4}{\epsilon} \ln(\frac{4}{\delta})$.

Ending note: the proofs above are each tailored to the specific problem. There is a more general theorem for proving PAC-learnability.
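
All three bounds above end up with the same shape, $k\,\mathrm{e}^{-\epsilon m / a} \leq \delta$ for constants $k$ and $a$, so the 'solve for $m$' step can be written once. Here is a small helper (my own, purely illustrative) that reproduces the three sample-complexity bounds derived above:

```python
import math

# Solve  k * exp(-eps * m / a) <= delta  for m, i.e.  m >= (a / eps) * ln(k / delta).
# This is the common shape of the three bounds derived in these notes.
def sample_complexity(eps, delta, k, a):
    return math.ceil((a / eps) * math.log(k / delta))

eps, delta = 0.1, 0.05
print(sample_complexity(eps, delta, k=2, a=1))  # positive half-lines:     m >= 37
print(sample_complexity(eps, delta, k=4, a=2))  # positive intervals:      m >= 88
print(sample_complexity(eps, delta, k=4, a=4))  # axis-aligned rectangles: m >= 176
```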
