Tutorial: Introduction to Machine Learning, Learning Paradigms, and PAC Learning
1. What is Machine Learning?
Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on building systems that learn from data to make decisions or predictions without being explicitly programmed.
Arthur Samuel's definition (1959):
"Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed."
2. Goals of Machine Learning
- Discover patterns in data
- Make predictions on unseen data
- Improve performance with experience
- Generalize well to new situations
3. Learning Paradigms in ML
Machine learning problems are commonly grouped into three main paradigms, based on the supervision available in the training data, plus two related variants (semi-supervised and self-supervised learning):
3.1 Supervised Learning
- Input: Labeled dataset (X, Y)
- Goal: Learn a function f : X → Y that maps inputs to outputs
- Examples (a short code sketch follows this list):
- Classification (e.g., spam detection)
- Regression (e.g., house price prediction)
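For concreteness, here is a minimal sketch of the supervised setting using scikit-learn (an assumption; any library would do), with a tiny made-up dataset standing in for labeled spam examples:

```python
# Minimal supervised-learning sketch: fit a classifier on labeled pairs (X, y).
# The toy data below is invented purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: feature vectors (e.g., counts of two suspicious words), y: labels (1 = spam)
X = np.array([[3, 0], [4, 1], [0, 2], [1, 3], [5, 0], [0, 4]])
y = np.array([1, 1, 0, 0, 1, 0])

model = LogisticRegression()   # hypothesis class: linear decision boundaries
model.fit(X, y)                # learn an approximation of f : X -> Y

print(model.predict([[2, 0]])) # predict the label of a previously unseen input
```

The essential point is that both the inputs and the desired outputs are given during training.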
3.2 Unsupervised Learning
- Input: Unlabeled dataset (X)
- Goal: Discover structure or patterns in the data
- Examples (a short sketch follows this list):
- Clustering (e.g., customer segmentation)
- Dimensionality reduction (e.g., PCA)
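As a quick illustration, the sketch below clusters a few synthetic 2-D points with scikit-learn's k-means; the data and the choice of two clusters are made up for demonstration:

```python
# Minimal unsupervised-learning sketch: k-means clustering on unlabeled data.
# The points are synthetic and form two obvious groups.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one group near (1, 1)
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])  # another group near (8, 8)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignments found without any labels
print(kmeans.cluster_centers_)  # the two discovered cluster centers
```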
3.3 Reinforcement Learning
- Input: Environment and rewards
- Goal: Learn to make sequences of decisions to maximize reward
- Examples (a tiny sketch follows this list):
- Game playing (e.g., AlphaGo)
- Robotics (e.g., walking, grasping)
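A full reinforcement-learning example is beyond this tutorial, but the sketch below shows the interaction loop in miniature: an epsilon-greedy agent choosing between two slot-machine arms whose reward probabilities are invented for illustration:

```python
# Tiny reinforcement-learning sketch: epsilon-greedy action selection on a
# two-armed bandit. The reward probabilities are invented for illustration.
import random

reward_prob = [0.3, 0.7]   # true reward rates, unknown to the agent
value = [0.0, 0.0]         # the agent's running value estimates per arm
counts = [0, 0]
epsilon = 0.1              # fraction of steps spent exploring

for step in range(1000):
    # Explore with probability epsilon, otherwise exploit the best estimate.
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = value.index(max(value))
    reward = 1.0 if random.random() < reward_prob[arm] else 0.0
    counts[arm] += 1
    value[arm] += (reward - value[arm]) / counts[arm]  # incremental average

print(value)   # estimates should end up close to [0.3, 0.7]
```

Unlike supervised learning, there is no fixed labeled dataset here; the agent generates its own experience by acting and observing rewards.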
3.4 Semi-Supervised Learning
- Mix of labeled and unlabeled data
- Useful when labeling is expensive
3.5 Self-Supervised Learning
- Learns supervision signal from the data itself (e.g., predicting missing words in a sentence)
4. Introduction to PAC Learning
Probably Approximately Correct (PAC) Learning is a foundational theoretical framework for understanding what it means to learn in ML.
Introduced by Leslie Valiant (1984), PAC learning provides a formal model that describes how many training examples are needed to learn a concept well.
5. PAC Learning: Key Concepts
5.1 Objective
To determine under what conditions a learner can probably (with high probability) learn a concept that is approximately correct.
5.2 Formal Setup
Let's assume:
- X: input space (e.g., images, texts)
- C: concept class (set of possible functions or hypotheses)
- c ∈ C: target concept to be learned
- D: unknown distribution over X
- L: learning algorithm
- ε (epsilon): accuracy parameter (0 < ε < 1); how much error we tolerate
- δ (delta): confidence parameter (0 < δ < 1); how confident we want to be

We want the learning algorithm to output a hypothesis h such that:

Pr_D[h(x) ≠ c(x)] ≤ ε

with probability at least 1 − δ.
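In code, this error is simply the probability mass on which h and c disagree. The sketch below estimates it by sampling from D; the threshold concept, the slightly misplaced hypothesis, and the uniform distribution are hypothetical choices made only to make the definition concrete:

```python
# Estimating the PAC error Pr_D[h(x) != c(x)] by sampling from D.
# The concept c, the hypothesis h, and D = Uniform[0, 1] are hypothetical.
import random

def c(x):            # target concept: a threshold at 0.5
    return x >= 0.5

def h(x):            # learned hypothesis: a slightly misplaced threshold
    return x >= 0.55

samples = [random.random() for _ in range(100_000)]
error = sum(h(x) != c(x) for x in samples) / len(samples)
print(error)         # about 0.05, the mass of the region where h and c disagree
```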
5.3 Probably Approximately Correct (PAC)
The goal of a PAC learner is to produce a hypothesis h from a hypothesis class H such that:
- h is approximately correct (error ≤ ε)
- This happens with high probability (≥ 1 − δ)
5.4 PAC Learnability
A concept class C is PAC-learnable if there exists a learning algorithm such that, for every distribution D, every ε, every δ, and every concept c ∈ C, the algorithm finds a hypothesis h satisfying the above conditions using time and a number of samples polynomial in 1/ε, 1/δ, and the size of the input.
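A standard textbook illustration of a PAC-learnable class (not specific to this tutorial) is the set of intervals [a, b] on the real line: the learner simply returns the tightest interval around the positively labeled points. A minimal sketch, with an arbitrary target interval and a uniform distribution:

```python
# Classic PAC-learnable concept class: intervals [a, b] on the real line.
# The learner outputs the tightest interval containing the positive examples.
# The target interval and the data distribution are arbitrary choices.
import random

a_true, b_true = 0.3, 0.7                        # unknown target concept c

def label(x):
    return a_true <= x <= b_true

train = [random.random() for _ in range(500)]    # samples from D = Uniform[0, 1]
positives = [x for x in train if label(x)]

# Learned hypothesis h: the tightest interval around the positive examples.
a_hat, b_hat = (min(positives), max(positives)) if positives else (0.0, 0.0)

# Estimate the generalization error of h on fresh samples from D.
test = [random.random() for _ in range(100_000)]
error = sum((a_hat <= x <= b_hat) != label(x) for x in test) / len(test)
print(a_hat, b_hat, error)   # the error shrinks as the training set grows
```

Larger training sets make the tightest interval hug the true one more closely, which is exactly the behavior the PAC definition formalizes.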
6. Sample Complexity in PAC Learning
The number of samples needed to guarantee PAC learning depends on the complexity of the hypothesis space and on the desired ε and δ.

For a finite hypothesis class H (in the realizable setting, where the target concept lies in H), the sample complexity m satisfies:

m ≥ (1/ε) (ln|H| + ln(1/δ))

Implications:
- As we want more accuracy (smaller ε) or more confidence (smaller δ), we need more data.
- A larger hypothesis space requires more samples to generalize well (see the calculation sketch after this list).
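Plugging numbers into this bound makes the trade-offs concrete. The helper below is a direct transcription of the inequality; the hypothesis-class size of 1000 is an arbitrary example:

```python
# Sample-complexity bound for a finite hypothesis class in the realizable
# PAC setting: m >= (1/eps) * (ln|H| + ln(1/delta)).
import math

def pac_sample_bound(h_size, eps, delta):
    """Smallest integer m satisfying the finite-class PAC bound."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# Halving the error tolerance roughly doubles the required sample size,
# while tightening delta costs only logarithmically.
print(pac_sample_bound(h_size=1000, eps=0.10, delta=0.05))  # 100
print(pac_sample_bound(h_size=1000, eps=0.05, delta=0.05))  # 199
print(pac_sample_bound(h_size=1000, eps=0.05, delta=0.01))  # 231
```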
7. PAC vs Real-World Learning
| Aspect | PAC Learning | Practical Learning |
|---|---|---|
| Assumes known distribution? | No | No |
| Deals with noise? | Initially no (later extended) | Yes |
| Focus | Theoretical bounds | Performance and accuracy |
| Guarantees | Formal (probabilistic) | Empirical |
8. Extensions to PAC Learning
- Agnostic PAC Learning: No assumption that the data fits any function in the hypothesis class.
- VC Dimension: A measure of the capacity of the hypothesis space. Tightly connected to PAC learnability.
- Noisy PAC Models: Deal with situations where labels may be incorrect.
9. Why is PAC Learning Important?
- Provides a solid mathematical foundation for ML
- Helps us understand the relationship between data size, complexity, accuracy, and confidence
- Justifies the feasibility of learning from finite data
- Useful in algorithm analysis, generalization theory, and model evaluation
10. Simple Example of PAC Learning
Suppose you're trying to learn whether an email is spam based on its features.
- Hypothesis space: All linear classifiers
- Accuracy ε = 0.05 (want ≤ 5% error)
- Confidence δ = 0.01 (99% confidence)
PAC learning tells us how many emails we need to train on so that with 99% probability, our learned classifier is within 5% error of the best possible one in the hypothesis class.
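As a rough back-of-the-envelope calculation, and only under the assumption that the linear classifiers are discretized into a finite set of, say, |H| = 10^6 hypotheses (a made-up figure; the true class is infinite and would call for a VC-dimension argument instead), the bound from Section 6 gives:

```python
# Back-of-the-envelope sample size for the spam example, pretending the
# linear classifiers form a finite set of 10**6 hypotheses (an assumption).
import math

eps, delta, h_size = 0.05, 0.01, 10**6
m = math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)
print(m)   # roughly 369 labeled emails under these assumptions
```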
Summary
| Concept | Description |
|---|---|
| Learning Paradigms | Supervised, Unsupervised, Reinforcement, Semi-Supervised, Self-Supervised |
| PAC Learning | A formal framework defining how to learn a concept approximately correctly with high probability |
| Parameters | ε (error tolerance), δ (confidence level) |
| Sample Complexity | Depends on hypothesis class size, ε, and δ |
| Key Takeaway | ML can work with finite data if the hypothesis class is not too complex and we accept "good enough" results |
Further Reading
- "A Theory of the Learnable" by Leslie Valiant (1984), the original paper on PAC learning
- Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David
- MIT OCW: Introduction to PAC Learning (lecture notes)