Hopefully these examples clear up some confusion after reading the formal definitions. Pay attention not just to the algorithm itself, but more importantly to the proof of why it is PAC.
Positive Half-lines
The domain is X = ℝ, and C = H = {positive half-lines}. (Why C = H? By the definition of this problem, the hypothesis space is the same set of mappings as the concept class.) A positive half-line is a threshold (a real number) such that all points to the left are labeled − and all points to the right are labeled +.
Learning Algorithm
1. Sort the training data in ascending order.
2. Find the greatest −-labeled point and the smallest +-labeled point.
3. Set the threshold anywhere in this interval.
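A minimal sketch of this procedure in Python, assuming ±1 labels and (x, label) pairs (the encoding and the function name are my choices, not fixed by the notes):

```python
# Sketch of the half-line learner: threshold between the greatest - point
# and the smallest + point. Assumes (x, label) pairs with labels in {+1, -1}.

def learn_halfline(data):
    """Return a threshold t such that h(x) = +1 iff x >= t."""
    data = sorted(data)                      # sort by x in ascending order
    negatives = [x for x, y in data if y == -1]
    positives = [x for x, y in data if y == +1]
    if not positives:                        # degenerate case: no + examples
        return float("inf")
    if not negatives:                        # degenerate case: no - examples
        return positives[0]
    lo, hi = max(negatives), min(positives)  # greatest - point, smallest + point
    return (lo + hi) / 2                     # any threshold in (lo, hi) is consistent

print(learn_halfline([(0.1, -1), (0.4, -1), (0.6, +1), (0.9, +1)]))  # ~0.5
```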
Prove PAC
The generalization error of h is the probability mass of the region between h and c: points in this region are labeled differently by the concept and the hypothesis. We need to show that the generalization error is low with high probability. Equivalently (and more convenient to compute), we need to show Pr[err_D(h) > ϵ] ≤ δ.
There are two cases in which err_D(h) > ϵ can happen:
1. B+: h is more than ϵ (in probability mass) away from c, to the right.
2. B−: h is more than ϵ away from c, to the left.
Define R+ as the interval [c, r+], where r+ is a point to the right of c such that Pr[[c, r+]] = ϵ. Now we want to bound Pr[B+]. B+ occurs only when h lands to the right of r+, which can only happen when every x_i ∉ R+ (i.e., no training data falls in that interval, so the algorithm has no way to place the threshold inside it).
Since Pr[R+] = ϵ, we have Pr[x_i ∉ R+] ≤ 1 − ϵ. Then
Pr[B+] ≤ Pr[x_1 ∉ R+ ∧ x_2 ∉ R+ ∧ ... ∧ x_m ∉ R+] ≤ (1 − ϵ)^m,
because the x_i are drawn independently from D.
By symmetry, Pr[B−] ≤ (1 − ϵ)^m.
Now we can bound
Pr[err_D(h) > ϵ] ≤ Pr[B+ ∨ B−]
                ≤ Pr[B+] + Pr[B−]   (union bound)
                ≤ 2(1 − ϵ)^m
                ≤ 2e^{−ϵm}   (using 1 − x ≤ e^{−x})
Now set this bound to be at most δ: Pr[err_D(h) > ϵ] ≤ 2e^{−ϵm} ≤ δ. Taking ln on both sides and solving for m gives m ≥ (1/ϵ) ln(2/δ). This shows C is PAC-learnable: the required sample size m is polynomial in 1/ϵ and 1/δ, and with that many samples the error is at most ϵ with probability at least 1 − δ. QED
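To make the bound concrete, here is a tiny calculator for the sample size (the function name and the values ϵ = 0.1, δ = 0.05 are just an illustrative choice):

```python
from math import ceil, log

# Half-line sample size from the bound derived above: m >= (1/eps) * ln(2/delta).
def halfline_sample_size(eps, delta):
    return ceil(log(2 / delta) / eps)

print(halfline_sample_size(0.1, 0.05))   # 37 examples suffice for eps=0.1, delta=0.05
```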
Review
We want to bound Pr[err_D > ϵ].
Identify the error region, then apply the union bound.
Set the bound to be at most δ.
Solve for m as a function of ϵ and δ.
Positive Intervals
Again, the domain is X = ℝ, and the concept class equals the hypothesis space, C = H. Each c ∈ C specifies an interval inside which points are labeled +; all points outside the interval are labeled −.
Learning Algorithm
Pick a consistent h∈H.
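One simple consistent choice is the tightest interval around the + examples. A sketch, again assuming (x, label) pairs with ±1 labels (my encoding, not specified in the notes):

```python
# Pick a consistent hypothesis for positive intervals: the tightest interval
# containing every + example (it excludes all - examples if the data is realizable).

def learn_interval(data):
    """Return (left, right); h(x) = +1 iff left <= x <= right."""
    positives = [x for x, y in data if y == +1]
    if not positives:
        return (0.0, -1.0)          # empty interval: labels everything -1
    return (min(positives), max(positives))

print(learn_interval([(0.2, -1), (0.35, +1), (0.6, +1), (0.8, -1)]))  # (0.35, 0.6)
```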
Prove PAC
Again, we want to prove that C is PAC-learnable by H. To bound Pr[err_D(h) > ϵ], we break down the cases in which err_D(h) > ϵ.
Let c_l be the left boundary of c and c_r be the right boundary of c. Define:
1. L−: h's left boundary is more than ϵ/2 (in probability mass) away from c_l, to the left.
2. L+: h's left boundary is more than ϵ/2 away from c_l, to the right.
3. R−: h's right boundary is more than ϵ/2 away from c_r, to the left.
4. R+: h's right boundary is more than ϵ/2 away from c_r, to the right.
There are 2 x 2 = 4 combinations: {L−,R−}, {L−,R+}, {L+,R−}, {L+,R+}
We want the following bound:
Pr[err_D(h) > ϵ] ≤ Pr[(L−,R−) ∨ (L−,R+) ∨ (L+,R−) ∨ (L+,R+)]
                ≤ Pr[L−,R−] + Pr[L−,R+] + Pr[L+,R−] + Pr[L+,R+]   (union bound)
                ≤ 4(1 − ϵ/2)^m
                ≤ 4(e^{−ϵ/2})^m = 4e^{−ϵm/2}   (again, 1 − x ≤ e^{−x})
Now set this bound to be at most δ and solve for m: taking ln on both sides gives m ≥ (2/ϵ) ln(4/δ). QED
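As a sanity check on this bound, here is a small Monte Carlo experiment (a sketch only: the uniform distribution on [0,1], the target c = [0.3, 0.7], and the tightest-fit learner are illustrative choices, not part of the proof). The empirical frequency of err_D(h) > ϵ should sit below 4e^{−ϵm/2}.

```python
import math
import random

def trial(m, c=(0.3, 0.7)):
    """Draw m uniform points, learn the tightest consistent interval, return true error."""
    xs = [random.random() for _ in range(m)]
    pos = [x for x in xs if c[0] <= x <= c[1]]   # points labeled + by c
    if not pos:                                  # no + samples: h is empty
        return c[1] - c[0]                       # err = whole mass of c
    left, right = min(pos), max(pos)             # tightest consistent h (h is inside c)
    return (c[1] - c[0]) - (right - left)        # err = mass of c \ h

m, eps, trials = 100, 0.1, 20000
empirical = sum(trial(m) > eps for _ in range(trials)) / trials
print(f"empirical Pr[err > eps] ~ {empirical:.4f}")
print(f"bound 4*exp(-eps*m/2)   = {4 * math.exp(-eps * m / 2):.4f}")
```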
Learning Axis-aligned Rectangles
Learning Algorithm
Let c be the target concept. The algorithm outputs h, the smallest axis-aligned rectangle consistent with the training data, i.e., the smallest rectangle containing all the + points.
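A sketch of this tightest-fit rule, assuming 2-D points given as (x1, x2) pairs with ±1 labels (my encoding, not specified in the notes):

```python
# Smallest axis-aligned rectangle containing all + examples.
# Data format assumed: ((x1, x2), label) pairs with labels in {+1, -1}.

def learn_rectangle(data):
    """Return ((x1_min, x1_max), (x2_min, x2_max)); h labels a point + iff it is inside."""
    pos = [p for p, y in data if y == +1]
    if not pos:
        return None                               # empty hypothesis: everything is -
    x1s = [p[0] for p in pos]
    x2s = [p[1] for p in pos]
    return ((min(x1s), max(x1s)), (min(x2s), max(x2s)))

print(learn_rectangle([((0.2, 0.3), +1), ((0.6, 0.8), +1), ((0.9, 0.1), -1)]))
# ((0.2, 0.6), (0.3, 0.8))
```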
Prove PAC
Figure out the cases where err_D(h) > ϵ. Since h is the smallest consistent rectangle, it lies inside c. Grow each edge of h outward by probability mass ϵ/4, and let B_i (i = 1, ..., 4) be the event that h is more than ϵ/4 away from c along the i-th edge, i.e., the corresponding strip contains no training point. Why does the error region have to be inside c? Because every point inside h is labeled + and every point outside h is labeled −; outside c the true label is also −, so that is not an error. So we identify the error region first: in this problem, err = c − h, and err_D(h) > ϵ implies that at least one B_i occurs.
Pr[err_D(h) > ϵ] ≤ Pr[B1 ∨ B2 ∨ B3 ∨ B4]
                ≤ Pr[B1] + Pr[B2] + Pr[B3] + Pr[B4]   (union bound)
                ≤ 4(1 − ϵ/4)^m
                ≤ 4e^{−ϵm/4}   (1 − x ≤ e^{−x})
Set this to be at most δ and solve for m: m ≥ (4/ϵ) ln(4/δ).
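Plugging in example values (ϵ = 0.1, δ = 0.05, an arbitrary illustrative choice):

```python
from math import ceil, log

# Rectangle bound from above: m >= (4/eps) * ln(4/delta).
eps, delta = 0.1, 0.05
print(ceil(4 * log(4 / delta) / eps))   # 176 examples suffice
```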
Ending note: the proofs above are each tailored to a specific problem. There is a more general theorem for proving PAC learnability.