Machine Learning MCQs

PAC & VC Dim

Q1. Who introduced the PAC Learning model?
A) Geoffrey Hinton
B) Leslie Valiant
C) Yann LeCun
D) Andrew Ng
Answer: Leslie Valiant


Q2. In PAC learning, what does “Probably” refer to?
A) Accuracy of hypothesis
B) Training time
C) Confidence (1−δ)
D) Error rate
Answer: Confidence (1−δ)


Q3. In PAC learning, the “Approximately” refers to:
A) Training samples
B) Model complexity
C) Error bound (ε)
D) Model runtime
Answer: Error bound (ε)


Q4. The instance space in PAC learning is commonly defined as:
A) Real numbers
B) Text data
C) Binary vectors of length n
D) Images
Answer: Binary vectors of length n


Q5. In PAC learning, the concept function c maps the instance space X to which set?
A) {−1, 1}
B) {0, 1}
C) {0, ∞}
D) ℝ
Answer: {0, 1}


Q6. What kind of distribution are training examples drawn from in PAC learning?
A) Gaussian distribution
B) Unknown but fixed distribution
C) Uniform distribution
D) Zipf distribution
Answer: Unknown but fixed distribution


Q7. What is the meaning of a hypothesis being “PAC correct”?
A) Exact match with the concept
B) Zero error
C) Approximately correct with high probability
D) Derived through a neural net
Answer: Approximately correct with high probability


Q8. What is the full form of PAC in PAC learning?
A) Precise and Clear
B) Probably Accurate Concept
C) Probably Approximately Correct
D) Practically Accurate Classification
Answer: Probably Approximately Correct


Q9. Which of the following classes is known to be PAC-learnable?
A) General DNF
B) Boolean Circuits
C) Monotone DNF
D) RSA Functions
Answer: Monotone DNF


Q10. The VC dimension quantifies:
A) Runtime of algorithm
B) Number of training epochs
C) Complexity of hypothesis class
D) Dimensionality of features
Answer: Complexity of hypothesis class


Q11. What is the output of a PAC learning algorithm?
A) A neural network
B) A set of training data
C) A hypothesis from the hypothesis class
D) A random classifier
Answer: A hypothesis from the hypothesis class


Q12. The EXAMPLES routine provides:
A) Noisy examples
B) Human-labeled inputs
C) Random labeled examples
D) Negative-only examples
Answer: Random labeled examples
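The EXAMPLES routine of Q12, together with the fixed unknown distribution of Q6 and the i.i.d. assumption of Q15, can be mimicked in a few lines. The distribution, the target concept, and all function names below are illustrative assumptions of this sketch, not part of the quiz material.

```python
import random

def examples(n_bits, target, rng, n_samples):
    """Simulated EXAMPLES routine: draw i.i.d. labeled examples.
    Each instance is a random binary vector of length n_bits (the
    instance space of Q4), labeled 0/1 by the hidden target concept
    (the range from Q5)."""
    for _ in range(n_samples):
        x = tuple(rng.randint(0, 1) for _ in range(n_bits))
        yield x, target(x)

# Hypothetical target concept: the monotone conjunction x1 AND x3.
target = lambda x: int(x[0] == 1 and x[2] == 1)

rng = random.Random(0)
sample = list(examples(5, target, rng, 8))
```

The learner sees only the stream of (instance, label) pairs; the distribution and the target stay hidden, which is exactly the access model the PAC definition assumes.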


Q13. One-sided error means:
A) The model is always wrong
B) Errors are only on negatives
C) High recall
D) Zero error
Answer: Errors are only on negatives


Q14. Sample complexity in PAC learning is mainly influenced by:
A) Number of layers in model
B) VC dimension
C) Number of CPUs
D) Type of activation function
Answer: VC dimension


Q15. What is the key assumption in standard PAC learning?
A) Noise in labels
B) Non-linear data
C) i.i.d. sampled examples
D) Deep learning model
Answer: i.i.d. sampled examples


Q16. If the VC-dimension is 10, ε = 0.05, and δ = 0.01, then the sample complexity is approximately:
A) O(10)
B) O(100)
C) O(740)
D) O(1000)
Answer: O(740)
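The figure in Q16 can be sanity-checked with one common simplified sample-complexity bound, m ≈ (1/ε)(d·ln(1/ε) + ln(1/δ)). The exact constants differ between textbooks, so treat this as an order-of-magnitude sketch rather than the definitive formula.

```python
import math

def pac_sample_bound(d, eps, delta):
    """Simplified PAC sample-complexity bound:
    m >= (1/eps) * (d * ln(1/eps) + ln(1/delta)).
    Constants vary across textbooks; order-of-magnitude only."""
    return math.ceil((1 / eps) * (d * math.log(1 / eps) + math.log(1 / delta)))

m = pac_sample_bound(d=10, eps=0.05, delta=0.01)
print(m)  # 692: the same order as the O(740) answer above
```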


Q17. What does the ORACLE routine allow a learning algorithm to do?
A) Predict labels
B) Query whether an instance is positive
C) Delete hypotheses
D) Return VC dimension
Answer: Query whether an instance is positive


Q18. Which class requires both EXAMPLES and ORACLE to be PAC-learned?
A) Boolean circuits
B) Monotone DNF
C) k-CNF
D) General DNF
Answer: Monotone DNF


Q19. Which of the following is not a valid extension of PAC Learning?
A) Noisy PAC
B) Bayesian PAC
C) Logical PAC
D) Incremental PAC
Answer: Logical PAC


Q20. What happens to sample complexity as VC-dim increases?
A) Decreases
B) Remains constant
C) Increases
D) Becomes independent of error
Answer: Increases


Q21. Which condition implies that some Boolean functions are not PAC-learnable?
A) VC-dim = 0
B) Neural net is shallow
C) Cryptographic hardness assumptions
D) Finite sample space
Answer: Cryptographic hardness assumptions


Q22. Why is the general DNF class not known to be PAC-learnable?
A) Requires infinite data
B) Algorithm doesn’t exist
C) It’s NP-hard in general
D) No hypothesis class defined
Answer: It’s NP-hard in general


Q23. The presence of noise in data led to the development of which PAC extension?
A) Incremental PAC
B) Bayesian PAC
C) Noisy PAC
D) Approximate PAC
Answer: Noisy PAC


Q24. For PAC learning, which of the following guarantees is false?
A) Output hypothesis is always 100% accurate
B) Hypothesis has error ≤ ε with probability ≥ 1−δ
C) Uses i.i.d. samples
D) Time and sample bounds are polynomial
Answer: Output hypothesis is always 100% accurate


Q25. In a concept class with infinite VC-dimension:
A) Learning is guaranteed
B) Sample complexity becomes unbounded
C) Only one-sided error is allowed
D) ORACLE queries are unnecessary
Answer: Sample complexity becomes unbounded
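The VC dimension of Q10, Q20, and Q25 is concrete enough to check by brute force for tiny hypothesis classes. The sketch below (all names are my own, not from the quiz) tests whether 1-D threshold classifiers can shatter a point set: thresholds shatter any single point but no pair, so their VC dimension is 1.

```python
def shatters(points, hypotheses):
    """True iff the hypothesis class realizes every 0/1 labeling
    of `points`, i.e. the class shatters the point set."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

# Threshold classifiers h_t(x) = 1 iff x >= t, on a small grid of t.
thresholds = [t / 10 for t in range(-10, 21)]
hyps = [lambda x, t=t: int(x >= t) for t in thresholds]

print(shatters([0.5], hyps))       # True: one point gets both labels
print(shatters([0.3, 0.7], hyps))  # False: the labeling (1, 0) is unrealizable
```

Since no two points can be shattered, the largest shatterable set has size 1, which is what "VC dimension quantifies the complexity of the hypothesis class" means in practice.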

Regression


  1. What kind of regression is most widely used due to its simplicity and interpretability?
    A. Logistic Regression
    B. Polynomial Regression
    C. Linear Regression
    D. Ridge Regression
  2. Which of the following is a key visual tool used before applying regression?
    A. Line graph
    B. Scatterplot
    C. Pie chart
    D. Histogram
  3. Which element in regression is usually denoted by β₀?
    A. Slope
    B. Residual
    C. Intercept
    D. Mean
  4. What is the slope in the linear equation E(Y | X = x) = β₀ + β₁x?
    A. β₀
    B. β₁
    C. X
    D. Y
  5. Which type of points strongly influence the regression line due to extreme X-values?
    A. Outliers
    B. Leverage points
    C. Influential residuals
    D. Mean points
  6. Why was jittering used in the scatterplot of mother–daughter heights?
    A. To show non-linearity
    B. To reduce variance
    C. To prevent point overlapping
    D. To remove outliers
  7. In the bass growth study, what does the mean function describe?
    A. Maximum fish size
    B. Average length by age
    C. Fastest growing fish
    D. Fish with anomalies
  8. What kind of regression line would best represent a perfect positive correlation?
    A. Curved upward
    B. Horizontal
    C. Downward-sloping
    D. 45-degree upward sloping line
  9. In the regression formula, which variable is usually plotted on the Y-axis?
    A. Predictor
    B. Independent variable
    C. Response variable
    D. Constant
  10. Which historical dataset showed a poor linear fit until transformed?
    A. Galton height data
    B. Turkey weight data
    C. Forbes’s boiling point data
    D. Flagstaff snowfall data
  11. In the height study, what does a slope less than 1 in the regression line suggest?
    A. No variation
    B. Negative correlation
    C. Regression to the mean
    D. High randomness
  12. Which variable in E(Y | X = x) represents the expected average of Y?
    A. Y
    B. X
    C. E(Y | X = x)
    D. β₁
  13. What is the effect of assuming constant variance in a regression model?
    A. It increases complexity
    B. It avoids outliers
    C. It simplifies calculations
    D. It distorts predictions
  14. Which of the following is NOT a common goal of regression analysis?
    A. Making predictions
    B. Measuring central tendency
    C. Identifying relationships
    D. Testing hypotheses
  15. In regression, what does an outlier typically affect the most?
    A. Mean of predictor
    B. Linearity
    C. Slope accuracy
    D. Variance of X
  16. What does the variance function Var(Y | X = x) measure?
    A. The number of observations
    B. Spread of Y for fixed X
    C. Growth rate of Y
    D. Constant term in the model
  17. What is the term for visual representations showing each data pair as a point?
    A. Dot matrix
    B. Correlation map
    C. Scatterplot
    D. Area graph
  18. In which dataset was within-treatment variability not visible due to averaged values?
    A. Galton height data
    B. Turkey weight data
    C. Flagstaff snowfall data
    D. Forbes boiling point data
  19. What concept explains that tall mothers may have slightly shorter daughters?
    A. Linearity
    B. Equal scaling
    C. Regression to the mean
    D. High leverage
  20. What does “E” stand for in the expression E(Y | X = x)?
    A. Error
    B. Expected
    C. Equal
    D. Estimate
  21. What insight does a good scatterplot provide before modeling?
    A. Median
    B. Model accuracy
    C. Data coding
    D. Strength and direction of relationship
  22. In the snowfall study, what conclusion was drawn from the scatterplot?
    A. Strong positive correlation
    B. Seasonal interaction
    C. No meaningful trend
    D. Linear increase in snowfall
  23. Which kind of regression would you use if the data shows a curved trend?
    A. Simple linear regression
    B. Log-transformed regression
    C. Constant regression
    D. Slope-normalized regression
  24. What is one benefit of plotting equal axes when predictor and response have similar scales?
    A. Better coloring
    B. Better visibility of noise
    C. Fair visual comparison
    D. Removal of outliers
  25. Which statistical method forms the base for many advanced predictive models?
    A. Chi-square test
    B. Clustering
    C. Linear regression
    D. t-test
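Several of the questions above (the intercept β₀, the slope β₁, and regression to the mean) can be made concrete with a closed-form least-squares fit. The mother/daughter heights below are synthetic numbers invented for illustration in the spirit of the Galton data, not the historical dataset.

```python
def ols(xs, ys):
    """Closed-form simple linear regression: returns (b0, b1),
    the intercept and slope minimizing the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Synthetic mother/daughter heights (inches), invented for illustration:
# daughters of extreme mothers are less extreme, so the slope is below 1.
mothers   = [58, 60, 62, 64, 66, 68, 70]
daughters = [60, 61, 62.5, 64, 65, 66, 67]

b0, b1 = ols(mothers, daughters)
print(round(b1, 2))  # a slope < 1: regression to the mean (cf. Q11, Q19)
```

A fitted slope between 0 and 1 is exactly the pattern Q11 and Q19 describe: tall mothers tend to have tall but somewhat less tall daughters.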

Mean & Variance


  1. What does the mean function E(Y | X = x) represent?
    A) The maximum value of Y for given X
    B) The expected (average) value of Y given X = x
    C) The variance of Y given X = x
    D) The slope of the relationship between X and Y

  2. In the linear mean function E(Y | X = x) = β₀ + β₁x, what does β₀ represent?
    A) The slope of the line
    B) The intercept (value of Y when X = 0)
    C) The variance of Y
    D) The residual error

  3. In the Galton height data, if height were perfectly inherited, what slope would the mean function have?
    A) 0
    B) 0.5
    C) 1
    D) Greater than 1

  4. What does a slope less than 1 in the regression line between mother’s and daughter’s heights indicate?
    A) Perfect inheritance of height
    B) Regression to the mean
    C) No relationship between mother and daughter height
    D) Random variation

  5. What phenomenon explains why children of very tall or very short parents tend to be closer to average height?
    A) Genetic drift
    B) Regression to the mean
    C) Environmental effect
    D) Measurement error

  6. Which variable is typically called the predictor in a regression model?
    A) Y
    B) X
    C) β₀
    D) The residual

  7. What does the variance function Var(Y | X = x) describe?
    A) The average value of Y at X = x
    B) The spread or variability of Y around its mean at X = x
    C) The slope of the mean function
    D) The intercept of the mean function

  8. What assumption about variance is commonly made in simple linear regression models?
    A) Variance increases with X
    B) Variance decreases with X
    C) Variance remains constant for all X values
    D) Variance is zero

  9. If a fish’s length at age x is described by the mean function, what does the curve connecting average lengths at each age represent?
    A) The variance function
    B) The mean function E(length | age = x)
    C) The residuals
    D) The random noise

  10. What does the slope β₁ in the mean function indicate?
    A) The expected change in Y when X increases by one unit
    B) The expected change in X when Y increases by one unit
    C) The intercept value
    D) The variance of Y

  11. Why do we estimate β₀ and β₁ from data rather than knowing them beforehand?
    A) Because they are constants
    B) Because we never collect data
    C) Because the exact relationship is usually unknown and must be learned from data
    D) Because they represent random error

  12. Which of the following best describes “regression to the mean”?
    A) Extreme values become more extreme over time
    B) Extreme values tend to move closer to the average on subsequent measurements
    C) Mean values never change
    D) Variance always increases

  13. In the Galton height example, what does the dashed line with slope 1 represent?
    A) Actual data trend
    B) Perfect inheritance where daughters’ heights equal mothers’ heights exactly
    C) Random variation
    D) Regression line estimated from data

  14. What is the main purpose of the mean function in regression?
    A) To describe the spread of data
    B) To predict the average response Y for each predictor value X
    C) To calculate residuals
    D) To measure data skewness

  15. When variance is constant across all values of X, the model is said to have:
    A) Heteroscedasticity
    B) Homoscedasticity
    C) Nonlinearity
    D) Autocorrelation
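The mean function E(Y | X = x) and variance function Var(Y | X = x) above are just per-x summaries, which a short grouping sketch makes concrete. The fish-length numbers and all names below are synthetic choices of mine for illustration.

```python
from collections import defaultdict

def conditional_stats(pairs):
    """Group (x, y) pairs by x; return {x: (mean of y, variance of y)},
    the empirical versions of E(Y | X = x) and Var(Y | X = x)."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    stats = {}
    for x, ys in groups.items():
        m = sum(ys) / len(ys)
        v = sum((y - m) ** 2 for y in ys) / len(ys)
        stats[x] = (m, v)
    return stats

# Synthetic fish lengths (mm) at two ages, invented for illustration.
pairs = [(1, 100), (1, 110), (1, 90), (2, 150), (2, 160), (2, 140)]
stats = conditional_stats(pairs)
print(stats[1][0])  # 100.0: empirical mean length at age 1
print(stats[2][0])  # 150.0: empirical mean length at age 2
```

Here the means rise with age (the mean function) while the spread around each mean stays equal, which is the constant-variance, i.e. homoscedastic, situation asked about in questions 8 and 15.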
