Unions of Intervals
Let the domain be $X = \mathbb{R}$, and let $C_s$ be the class of concepts defined by unions of $s$ intervals. That is, each concept $c$ is defined by real numbers $a_1 \le b_1 \le a_2 \le b_2 \le \cdots \le a_s \le b_s$, where $c(x) = 1$ iff $x \in [a_1, b_1] \cup [a_2, b_2] \cup \cdots \cup [a_s, b_s]$.
Describe an efficient algorithm that learns the class $C_s$ for every $s$. Assume that $s$ is known ahead of time to the learner. You should describe a single algorithm that works for all $s$. Since $s$ is known, the learner can choose the number of examples needed as a function of $s$, $\epsilon$, and $\delta$. You can use any hypothesis space you wish.
Given a training set $S = \{(x_1, y_1), \ldots, (x_m, y_m)\}$, the learning algorithm returns the union of the smallest closed intervals that contain all positively labeled examples. The algorithm goes through the training data in ascending order of $x$. When it encounters the first positive label, that point becomes the left endpoint of the first interval; the right endpoint is the last positive label in the unbroken run of positives that follows. The algorithm then finds the start and end of the second interval, and so forth.
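A minimal sketch of this procedure in Python (the function names and the `(x, label)` representation of examples are my own choices for illustration, not fixed by the problem):

```python
def learn_intervals(examples):
    """Return the smallest closed intervals covering the positive examples.

    `examples` is a list of (x, label) pairs with label in {0, 1}.
    The result is a list of (left, right) endpoint pairs.
    """
    # Scan the examples in ascending order of x.
    examples = sorted(examples, key=lambda e: e[0])
    intervals = []
    start = None      # left endpoint of the interval currently being built
    last_pos = None   # most recent positive x in the current run of positives
    for x, label in examples:
        if label == 1:
            if start is None:
                start = x          # first positive of a run: open a new interval
            last_pos = x           # extend the run of consecutive positives
        else:
            if start is not None:
                intervals.append((start, last_pos))  # close at the last positive
                start = None
    if start is not None:          # close a trailing interval, if any
        intervals.append((start, last_pos))
    return intervals


def predict(intervals, x):
    """Label x as 1 iff it falls inside one of the learned intervals."""
    return int(any(a <= x <= b for a, b in intervals))
```

For example, `learn_intervals([(0.1, 1), (0.2, 1), (0.5, 0), (0.7, 1)])` returns `[(0.1, 0.2), (0.7, 0.7)]`, and an all-negative training set yields an empty list (a hypothesis that predicts negative everywhere).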
The returned hypothesis $h$ is consistent with the training data, but $h$ does not necessarily have $s$ intervals: two adjacent intervals can share a boundary, with the same value for interval $i$'s close and interval $i+1$'s start, so they collapse into a single interval. Therefore, the hypothesis space $H$ is not the same as the concept class $C_s$.
The generalization error of $h$ is the probability of the region where $h$ and $c$ differ, $\mathrm{err}(h) = \Pr_{x \sim D}[h(x) \ne c(x)]$.
Let $a_i$ be the left boundary and $b_i$ be the right boundary of an arbitrary interval $[a_i, b_i]$ in $c$, and let $\hat{a}_i$ and $\hat{b}_i$ be the corresponding boundaries produced by $h$ around that interval.

We have 4 possible bad cases:

1. $\hat{a}_i$ is more than $\frac{\epsilon}{4s}$ (in probability mass) away from $a_i$ (to the left)
2. $\hat{a}_i$ is more than $\frac{\epsilon}{4s}$ away from $a_i$ (to the right)
3. $\hat{b}_i$ is more than $\frac{\epsilon}{4s}$ away from $b_i$ (to the left)
4. $\hat{b}_i$ is more than $\frac{\epsilon}{4s}$ away from $b_i$ (to the right)
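To connect each case to the sample, one convenient framing (the $\frac{\epsilon}{4s}$ region size is one consistent choice, made so the per-boundary errors add up to at most $\epsilon$): for each boundary and each direction, mark off the region of probability mass $\frac{\epsilon}{4s}$ adjacent to that boundary on that side. A bad case can occur only if its region contains no training example; for instance, for case 2,

$$\Pr[\text{case 2 for interval } i] \;\le\; \Pr\!\big[\text{no example lands in the mass-}\tfrac{\epsilon}{4s}\text{ region just inside } a_i\big].$$

When none of the $4s$ bad cases occurs, the erroneous region around each boundary has mass at most $\frac{\epsilon}{4s}$, so the total error is at most $\epsilon$.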
Explanation: the probability that all $m$ examples come from outside such a region (i.e., the region of mass $\frac{\epsilon}{4s}$ receives no example) is at most $\left(1 - \frac{\epsilon}{4s}\right)^m$. Here we use the fact that each example is drawn independently at random.
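Taking a union bound over the $4s$ candidate bad regions (four per true interval), and using $1 - x \le e^{-x}$, this gives a summary bound on the overall failure probability (this display is my restatement of the argument above, carrying the $\frac{\epsilon}{4s}$ region size through):

$$\Pr[\mathrm{err}(h) > \epsilon] \;\le\; 4s\left(1 - \frac{\epsilon}{4s}\right)^m \;\le\; 4s\, e^{-m\epsilon/(4s)}.$$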
All positive labels: the algorithm will only detect one interval in this case. But a training set like this occurs only with very low probability (every example would have to miss the negative region), and a larger sample size makes it even less likely.
There are more edge cases that are not covered here, but these trivial cases all share the property shown above: they occur only with low probability.
Now we set $4s\left(1 - \frac{\epsilon}{4s}\right)^m \le \delta$ (because our confidence in the good event must be at least $1 - \delta$) and solve for $m$.
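Spelling out the algebra, using $\left(1 - \frac{\epsilon}{4s}\right)^m \le e^{-m\epsilon/(4s)}$ (the constants follow the $\frac{\epsilon}{4s}$ region size chosen above):

$$4s\, e^{-m\epsilon/(4s)} \le \delta \;\Longleftrightarrow\; -\frac{m\epsilon}{4s} \le \ln\frac{\delta}{4s} \;\Longleftrightarrow\; m \ge \frac{4s}{\epsilon}\ln\frac{4s}{\delta}.$$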
Thus, this learning algorithm needs at least $m \ge \frac{4s}{\epsilon}\ln\frac{4s}{\delta}$ examples to ensure, with probability at least $1 - \delta$, that the hypothesis has error at most $\epsilon$. We have proved that $C_s$ is PAC-learnable by $H$.
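As a quick sanity check on the bound, a small helper (my own addition, not part of the original solution):

```python
import math

def sample_complexity(s, epsilon, delta):
    """Number of examples sufficient by the bound m >= (4s/epsilon) * ln(4s/delta)."""
    return math.ceil((4 * s / epsilon) * math.log(4 * s / delta))

# For example, s = 3, epsilon = 0.1, delta = 0.05 gives 658 examples.
print(sample_complexity(3, 0.1, 0.05))
```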
Overlapping Intervals: if two or more intervals "stick" together (so the algorithm detects fewer than $s$ intervals), that is still fine, because each error region still has probability mass at most $\frac{\epsilon}{4s}$.
All negative labels: the algorithm does not explicitly handle the case where the training data is all negative labels. But when the positive region of $c$ has probability mass at least $\epsilon$, this case happens with very small probability (less than $(1 - \epsilon)^m$).
Argue briefly that your algorithm runs in time polynomial in $s$, $\frac{1}{\epsilon}$, and $\frac{1}{\delta}$.
Sorting the training data takes $O(m \log m)$ time, and the cost of processing each example is constant (check whether it is labeled + or -), so the running time is polynomial in $m$. We know $m$ is polynomial in $s$, $\frac{1}{\epsilon}$, and $\frac{1}{\delta}$. Therefore, the algorithm runs in time polynomial in $s$, $\frac{1}{\epsilon}$, and $\frac{1}{\delta}$, as required.