Tutorial: Introduction to Machine Learning, Learning Paradigms, and PAC Learning
1. What is Machine Learning?
Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on building systems that learn from data to make decisions or predictions without being explicitly programmed.
Arthur Samuel’s definition (1959):
"Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed."
2. Goals of Machine Learning
- Discover patterns in data
- Make predictions on unseen data
- Improve performance with experience
- Generalize well to new situations
3. Learning Paradigms in ML
Machine learning problems can be broadly classified into the following learning paradigms, based on the kind of supervision available in the training data:
3.1 Supervised Learning
- Input: A labeled dataset (X, Y)
- Goal: Learn a function f : X → Y that maps inputs to outputs (a short code sketch follows this list)
- Examples:
- Classification (e.g., spam detection)
- Regression (e.g., house price prediction)
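To make the supervised paradigm concrete, here is a minimal sketch using scikit-learn; the dataset (Iris) and the choice of logistic regression are illustrative assumptions, not part of the notes above.

```python
# A minimal supervised-learning sketch (assumes scikit-learn is installed).
# We train a classifier f : X -> Y on labeled data and evaluate it on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)            # labeled dataset (X, Y)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)     # hold out unseen data

model = LogisticRegression(max_iter=1000)    # one possible hypothesis class
model.fit(X_train, y_train)                  # learn f : X -> Y from examples

y_pred = model.predict(X_test)               # predictions on unseen data
print("Test accuracy:", accuracy_score(y_test, y_pred))
```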
3.2 Unsupervised Learning
- Input: An unlabeled dataset (X)
- Goal: Discover structure or patterns in the data (a short code sketch follows this list)
- Examples:
- Clustering (e.g., customer segmentation)
- Dimensionality reduction (e.g., PCA)
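A comparable unsupervised sketch, again assuming scikit-learn; k-means and PCA are just two common choices of clustering and dimensionality reduction.

```python
# A minimal unsupervised-learning sketch (assumes scikit-learn is installed).
# No labels are used: we look for structure via clustering and dimensionality reduction.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)      # the labels are deliberately ignored

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster assignments (first 10 points):", kmeans.labels_[:10])

pca = PCA(n_components=2).fit(X)       # project the data to 2 dimensions
print("Explained variance ratio:", pca.explained_variance_ratio_)
```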
3.3 Reinforcement Learning
- Input: Environment and rewards
- Goal: Learn to make sequences of decisions to maximize reward
- Examples:
- Game playing (e.g., AlphaGo)
- Robotics (e.g., walking, grasping)
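Reinforcement learning is usually run against an environment; the sketch below uses a made-up 5-state corridor and plain tabular Q-learning, so both the environment and the hyperparameters are assumptions for illustration.

```python
# A toy reinforcement-learning sketch: tabular Q-learning on a hypothetical
# 5-state corridor. The agent starts at state 0 and gets reward +1 for reaching state 4.
import random

N_STATES, ACTIONS = 5, [-1, +1]          # actions: move left or right
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount, exploration rate
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(500):
    s = 0
    while s != N_STATES - 1:             # episode ends at the rightmost state
        # epsilon-greedy action selection
        a = random.randrange(2) if random.random() < epsilon else Q[s].index(max(Q[s]))
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print("Greedy action per state (0 = left, 1 = right):",
      [q.index(max(q)) for q in Q])
```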
3.4 Semi-Supervised Learning
- Mix of labeled and unlabeled data
- Useful when labeling is expensive
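A small sketch of the semi-supervised setting, assuming scikit-learn's LabelPropagation; the 90% masking rate and the Iris data are arbitrary choices for illustration.

```python
# A minimal semi-supervised sketch (assumes scikit-learn is installed).
# Most labels are hidden (set to -1); label propagation infers them from the
# few labeled points plus the geometry of the unlabeled data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.9] = -1     # pretend ~90% of labels are missing

model = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, y_partial)
hidden = y_partial == -1
print("Accuracy on the originally hidden labels:",
      (model.transduction_[hidden] == y[hidden]).mean())
```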
3.5 Self-Supervised Learning
- Derives the supervision signal from the data itself (e.g., predicting missing words in a sentence)
4. PAC LEARNING
Concept / Term | Definition / Explanation |
---|---|
Instance Space (X) | The set of all possible input examples. In PAC learning, typically X = {0,1}ⁿ, the set of all binary vectors of length n. |
Concept / Target Function (c) | The unknown function we aim to learn. It maps instances to labels: c : X → {0,1}. |
Hypothesis Class (H) | A set of candidate functions (hypotheses) the learning algorithm can choose from; formally H can be viewed as a subset of 2^X, with each hypothesis h ∈ H mapping instances to {0,1}. |
Distribution (D) | An unknown probability distribution over the instance space X. The training data is drawn i.i.d. from this distribution. |
Training Examples | A sequence of labeled examples drawn i.i.d. from D: (x₁, c(x₁)), …, (xₘ, c(xₘ)). |
Learning Algorithm (A) | The algorithm that receives the training data and returns a hypothesis h ∈ H that approximates the target function c. |
PAC Guarantee | The output hypothesis h satisfies Pr_{x ∼ D}[h(x) ≠ c(x)] ≤ ε with probability at least 1 − δ. |
ε (epsilon) | The accuracy parameter: how close the hypothesis must be to the target function. Smaller ε means higher accuracy. |
δ (delta) | The confidence parameter: the algorithm must succeed with probability at least 1 − δ. Smaller δ means higher confidence. |
PAC (Probably Approximately Correct) | "Probably" = with probability ≥ 1 − δ; "Approximately Correct" = error ≤ ε. |
PAC Learnability | A concept class C is PAC-learnable if there exists an algorithm that, for any ε, δ ∈ (0,1) and any distribution D, returns a hypothesis satisfying the PAC guarantee using time and samples polynomial in 1/ε, 1/δ, and the problem size. |
Sample Complexity (m) | The number of training examples needed to achieve the PAC guarantee. With the VC dimension: m = O((1/ε) · (VCdim(H) · log(1/ε) + log(1/δ))). |
VC Dimension (VCdim(H)) | The Vapnik-Chervonenkis dimension: a measure of the capacity/complexity of the hypothesis class H. |
One-sided Error | The hypothesis may err only on one side, e.g., it never makes false positives (it always predicts 0 on negative instances). |
Two-sided Error | Errors are allowed on both positive and negative examples. |
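To make the Sample Complexity row concrete, the sketch below evaluates the standard finite-hypothesis-class bound for the realizable case, m ≥ (1/ε)(ln|H| + ln(1/δ)); the particular |H|, ε, and δ values are illustrative assumptions.

```python
# A small calculator for the finite-hypothesis-class PAC sample bound
# (realizable case, consistent learner):  m >= (1/eps) * (ln|H| + ln(1/delta)).
import math

def pac_sample_bound(h_size: int, eps: float, delta: float) -> int:
    """Sufficient number of i.i.d. examples for a consistent learner over a
    finite hypothesis class to be probably (1 - delta) approximately (eps) correct."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# Example: conjunctions over n = 10 Boolean attributes -> |H| = 3**10 + 1
print(pac_sample_bound(h_size=3**10 + 1, eps=0.1, delta=0.05))   # prints 140
```

For infinite hypothesis classes, the VC-dimension bound in the table plays the same role, with VCdim(H) replacing ln|H| up to logarithmic factors.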
Probably Approximately Correct (PAC) learning is a foundational framework in computational learning theory, introduced by Leslie Valiant, that formalizes the concept of learnability from a theoretical, algorithmic perspective. It defines learning as the process of inferring a hypothesis that closely approximates an unknown target concept based on randomly drawn labeled examples, without relying on explicit programming. A concept class is considered PAC-learnable if a learning algorithm can, with high probability, produce a hypothesis that performs well on unseen data, using only a feasible (polynomial) number of training examples and computational steps. The framework assumes the presence of an unknown data distribution and allows the learner to receive examples either passively (as positive instances) or actively through queries to an oracle.

Crucially, PAC learning emphasizes generalization: the learned hypothesis must not just fit the training data but should also perform well on new instances from the same distribution. The theory identifies specific classes of Boolean functions, such as bounded CNF and monotone DNF expressions, that are efficiently learnable, while also highlighting inherent limitations due to computational intractability and cryptographic hardness in learning more complex or unrestricted functions. Extensions to the PAC model address practical concerns like noise in data, real-valued outputs, and domain-specific biases, while the notion of VC dimension helps quantify the capacity of a hypothesis space and determine the number of examples needed for learning. Overall, PAC learning offers a rigorous, probability-based approach to understanding what can be learned, how efficiently it can be learned, and under what conditions learning is possible.
VC Dimension
Definition
The Vapnik-Chervonenkis (VC) Dimension is a fundamental measure of the capacity or expressiveness of a hypothesis class (denoted H). It quantifies how well the class can fit various labelings of data.
Key Concepts
Concept | Description |
---|---|
Shattering | A set of points is shattered by hypothesis class H if H can realize all possible labelings of those points (2ⁿ labelings for n points). |
VC Dimension | The maximum number of points that can be shattered by H. Denoted VC(H). |
Intuition | Measures the ability of H to fit any training data perfectly. Higher VC dimension means higher complexity. |
Zero Error Bound | If H shatters a set of N examples (i.e., some hypothesis in H attains zero error for every possible labeling of them), then N ≤ VC(H). |
Example Interpretation
- If VC(H) = 3, then:
  - There exists some set of 3 points that H can shatter (not necessarily every configuration of 3 points).
  - No set of 4 points can be shattered by H.
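Shattering can be checked by brute force on small examples. The sketch below uses a hypothetical hypothesis class, 1-D threshold functions h_t(x) = 1 iff x ≥ t, whose VC dimension is 1; the class and the test points are assumptions for illustration.

```python
# A brute-force shattering check (illustrative sketch). We test whether the class
# of 1-D threshold classifiers h_t(x) = 1 if x >= t else 0 can shatter a given
# point set, by trying to realize every possible labeling.
from itertools import product

def threshold_hypotheses(points):
    """Candidate thresholds: below all points, between adjacent points, above all points."""
    xs = sorted(points)
    return [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]

def shatters(points):
    for labeling in product([0, 1], repeat=len(points)):
        realized = any(
            all((1 if x >= t else 0) == y for x, y in zip(points, labeling))
            for t in threshold_hypotheses(points)
        )
        if not realized:
            return False
    return True

print(shatters([2.0]))        # True: any single point can be shattered
print(shatters([1.0, 3.0]))   # False: labeling (1, 0) is unrealizable, so VC dim of thresholds is 1
```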
Why It Matters
- A core concept in statistical learning theory.
- Helps balance model complexity vs. overfitting.
- Used to understand generalization in machine learning.
Historical Note
- Introduced by Vladimir Vapnik and Alexey Chervonenkis.
- Applicable to binary classifiers, geometric set families, and more.


Vapnik, V. N., & Chervonenkis, A. Y. (2015). On the uniform convergence of relative frequencies of events to their probabilities. In Measures of Complexity: Festschrift for Alexey Chervonenkis (pp. 11-30). Cham: Springer International Publishing.



FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS
(Ref: Mitchell, T. M. (1997). Machine Learning. McGraw-Hill.)
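As a minimal sketch of FIND-S (not Mitchell's exact code), the function below starts from the most specific hypothesis and generalizes it just enough to cover each positive example, ignoring negatives; the EnjoySport-style training tuples are the usual textbook illustration.

```python
# A minimal sketch of FIND-S for conjunctive hypotheses (after Mitchell, 1997).
def find_s(examples):
    """examples: list of (attribute_tuple, label) pairs; labels are 'Yes'/'No'."""
    h = None                              # most specific hypothesis: matches nothing yet
    for x, label in examples:
        if label != "Yes":                # FIND-S ignores negative examples
            continue
        if h is None:
            h = list(x)                   # first positive example: copy it verbatim
        else:
            # Generalize each attribute just enough to cover this positive example.
            h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
    return h

training_data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(training_data))
# -> ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```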

A Systematic Approach to Learning with the Candidate Elimination Algorithm

Revisiting the Candidate Elimination Algorithm: Simple Steps
Step 1: Initialization
- S = most specific hypothesis (matches nothing yet)
- G = most general hypothesis (matches everything)
Step 2: Process each training example
When the example is Positive (Yes)
- Adjust S so it becomes just general enough to include this example.
- Remove from G any hypothesis that does not include this example.
When the example is Negative (No)
- Remove from S any hypothesis that still includes this example.
- Specialize each hypothesis in G that includes this example, just enough to exclude it, while still including all positive examples so far.
- Remove from G any hypothesis that is more specific than another hypothesis in G, as well as any duplicates.
Step 3: Completion
- After all examples are processed, the version space is the set of hypotheses between S and G.
- S represents the narrowest possible rule consistent with the data.
- G represents the broadest possible rule consistent with the data.
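The steps above can be turned into a compact sketch for conjunctive hypotheses over the same EnjoySport-style data as the FIND-S example. The helper names, the use of a single hypothesis for S, and the assumption of noise-free, consistent training data are all simplifications for illustration rather than a full implementation.

```python
# A compact sketch of the Candidate Elimination algorithm for conjunctive
# hypotheses (after Mitchell, 1997). '?' means "any value"; the S boundary is
# kept as a single maximally specific hypothesis, as in the conjunctive case.
def covers(h, x):
    return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples, n_attrs):
    S = None                          # most specific boundary (matches nothing yet)
    G = [["?"] * n_attrs]             # most general boundary (matches everything)
    for x, label in examples:
        if label == "Yes":
            G = [g for g in G if covers(g, x)]       # drop g's that exclude the positive
            S = list(x) if S is None else [
                si if si == xi else "?" for si, xi in zip(S, x)]
        else:
            # (For consistent data, S never covers a negative, so that check is omitted.)
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                # Minimally specialize g so it excludes x but stays above S.
                for i in range(n_attrs):
                    if g[i] == "?" and S is not None and S[i] != "?" and S[i] != x[i]:
                        spec = list(g)
                        spec[i] = S[i]
                        new_G.append(spec)
            # Keep only maximally general, distinct hypotheses.
            new_G = [list(t) for t in dict.fromkeys(tuple(g) for g in new_G)]
            G = [g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
    return S, G

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
S, G = candidate_elimination(data, n_attrs=6)
print("S:", S)
print("G:", G)
```

With the four training examples shown, this yields S = [Sunny, Warm, ?, Strong, ?, ?] and G = {[Sunny, ?, ?, ?, ?, ?], [?, Warm, ?, ?, ?, ?]}, matching the textbook version space.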



