Unions of Intervals
Problem Setup
Let the domain be $X = \mathbb{R}$, and let $\mathcal{C} = \mathcal{C}_s$ be the class of concepts defined by unions of $s$ intervals. That is, each concept $c$ is defined by real numbers $a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_s \le b_s$, where $c(x) = 1$ iff $x \in [a_1, b_1] \cup [a_2, b_2] \cup \dots \cup [a_s, b_s]$.
Question
Describe an efficient algorithm that learns the class $\mathcal{C}_s$ for every $s$. Assume that $s$ is known ahead of time to the learner. You should describe a single algorithm that works for all $\mathcal{C}_s$. Since $s$ is known, the learner can choose the number of examples needed as a function of $\epsilon$, $\delta$, and $s$. You can use any hypothesis space you wish.
Learning Algorithm
Given a training set $S$, the learning algorithm returns the union of the smallest intervals that contain the positively labeled examples. The algorithm goes through the training data in ascending order. When it encounters the first positive label, that point becomes the beginning of the first interval; the end of the first interval is the last point in the unbroken run of positive labels that started there. The algorithm then finds the start and end of the second interval, and so forth.
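A minimal sketch of this procedure in Python (the function names, the `(x, label)` pair representation, and the 0/1 label encoding are assumptions of this sketch, not part of the original write-up):

```python
from typing import List, Tuple

def tightest_intervals(sample: List[Tuple[float, int]]) -> List[Tuple[float, float]]:
    """Return the tightest union of intervals covering all positive examples.

    `sample` is a list of (x, label) pairs with label in {0, 1}.
    """
    points = sorted(sample)        # process examples in ascending order of x
    intervals = []
    start = None                   # left endpoint of the current run of positives
    last_positive = None           # rightmost positive example seen in the current run
    for x, label in points:
        if label == 1:
            if start is None:      # a new run of positive labels begins here
                start = x
            last_positive = x      # extend the current run
        elif start is not None:    # a negative label closes the current run
            intervals.append((start, last_positive))
            start = None
    if start is not None:          # close a run that reaches the last example
        intervals.append((start, last_positive))
    return intervals

def predict(intervals: List[Tuple[float, float]], x: float) -> int:
    """Hypothesis h(x): 1 iff x lies inside one of the learned intervals."""
    return int(any(a <= x <= b for a, b in intervals))
```

For example, `tightest_intervals([(0.1, 0), (0.3, 1), (0.5, 1), (0.7, 0), (0.9, 1)])` returns `[(0.3, 0.5), (0.9, 0.9)]`.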
The hypothesis $h$ is consistent with the training data, but $h$ does not necessarily consist of exactly $s$ intervals: two adjacent intervals $I_1, I_2$ can merge into one when $I_1$'s right endpoint coincides with $I_2$'s left endpoint, or when no negative example separates them in the sample. Therefore, the hypothesis space $H$ is not the same as the concept class $\mathcal{C}$.
Prove PAC
The generalization error of $h$ is the probability mass of the region where $h$ and $c$ differ.
Let $c_{l_i}$ be the left boundary and $c_{r_i}$ be the right boundary of an arbitrary interval in $c$, and let $h_{l_i}$ and $h_{r_i}$ be the corresponding boundaries of the matching interval in $h$.

We have 4 possible bad events for each interval:

1. $L^-$: $h_{l_i}$ is more than $\frac{\epsilon}{2s}$ away from $c_{l_i}$ (to the left)
2. $L^+$: $h_{l_i}$ is more than $\frac{\epsilon}{2s}$ away from $c_{l_i}$ (to the right)
3. $R^-$: $h_{r_i}$ is more than $\frac{\epsilon}{2s}$ away from $c_{r_i}$ (to the left)
4. $R^+$: $h_{r_i}$ is more than $\frac{\epsilon}{2s}$ away from $c_{r_i}$ (to the right)
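For concreteness, the first of these events can be written out formally as follows, reading "away" as distance in probability mass under the distribution $\mathcal{D}$ (this reading is an interpretation consistent with the explanation below, not something stated explicitly above):

$$
L^- \;\equiv\; \Big[\, h_{l_i} < c_{l_i} \ \text{and} \ \Pr_{x \sim \mathcal{D}}\big[\, h_{l_i} \le x < c_{l_i} \,\big] > \tfrac{\epsilon}{2s} \,\Big],
$$

with $L^+$, $R^-$, and $R^+$ defined symmetrically.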
$$
\begin{aligned}
\Pr[\mathrm{err}_{\mathcal{D}}(h) > \epsilon]
&\le \sum_{i=1}^{s} \Pr[L^- \vee L^+ \vee R^- \vee R^+] \\
&\le s \cdot \Pr[L^- \vee L^+ \vee R^- \vee R^+] \\
&\le s \cdot \big(\Pr[L^-] + \Pr[L^+] + \Pr[R^-] + \Pr[R^+]\big) && \text{(union bound)} \\
&\le 4s \left(1 - \frac{\epsilon}{2s}\right)^{m} && \text{(see explanation)} \\
&\le 4s\, e^{-\epsilon m/(2s)} && (1 - x \le e^{-x})
\end{aligned}
$$
Explanation: the probability that all $m$ examples fall in the non-error region is at most $\left(1 - \frac{\epsilon}{2s}\right)^m$. Here we use the fact that the examples are drawn independently at random.
Now we set $4s\, e^{-\epsilon m/(2s)} \le \delta$ (because the probability of the good event must be at least $1 - \delta$) and solve for $m$.
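Solving this inequality for $m$ is straightforward algebra:

$$
4s\, e^{-\epsilon m/(2s)} \le \delta
\;\iff\;
e^{-\epsilon m/(2s)} \le \frac{\delta}{4s}
\;\iff\;
m \ge \frac{2s}{\epsilon} \ln \frac{4s}{\delta}.
$$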
Thus, with $m$ examples chosen this large, the hypothesis $h$ has error at most $\epsilon$ with probability at least $1 - \delta$. We have proved that $\mathcal{C}$ is PAC-learnable by $H$. QED.
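As a quick numeric illustration, here is a small helper that evaluates the resulting sample-size bound (the function name and the example values are my own, chosen purely for illustration):

```python
import math

def sample_size(epsilon: float, delta: float, s: int) -> int:
    """Smallest integer m with 4s * exp(-epsilon * m / (2s)) <= delta."""
    return math.ceil((2 * s / epsilon) * math.log(4 * s / delta))

# For instance, sample_size(0.1, 0.05, 3) returns 329.
```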
Edge Cases

Overlapping intervals: If two or more intervals "stick" together (so the algorithm detects fewer than $s$ intervals), that is still fine, because each error region still has probability mass at most $\frac{\epsilon}{2s}$.
All negative labels: The algorithm does not explicitly handle the case where the training data contains only negative labels; it then returns the empty hypothesis. Either the target's positive region has probability mass at most $\epsilon$ (so the empty hypothesis already has low error), or this case happens with very small probability (less than $\delta$).
All positive labels: The algorithm will detect only one interval in this case. But again, such a sample occurs only with very low probability, and a larger sample size makes it even less likely.
There are many more edge cases not covered here, but these degenerate cases all share the property shown above: they occur with low probability.
Run-time
Argue briefly that your algorithm runs in time polynomial in $\frac{1}{\epsilon}$, $\frac{1}{\delta}$, and $s$.
Since the cost of processing the sample is polynomial in its size (a sort followed by a single pass that checks whether each label is + or −), the running time is polynomial in $m$. We know $m$ is polynomial in $\frac{1}{\epsilon}$, $\frac{1}{\delta}$, and $s$. Therefore, the algorithm runs in time polynomial in $\frac{1}{\epsilon}$, $\frac{1}{\delta}$, and $s$, as required.