Machine Learning MCQs

PAC & VC Dim

Q1. Who introduced the PAC Learning model?
A) Geoffrey Hinton
B) Leslie Valiant
C) Yann LeCun
D) Andrew Ng
Answer: Leslie Valiant


Q2. In PAC learning, what does “Probably” refer to?
A) Accuracy of hypothesis
B) Training time
C) Confidence (1−δ)
D) Error rate
Answer: Confidence (1−δ)


Q3. In PAC learning, the “Approximately” refers to:
A) Training samples
B) Model complexity
C) Error bound (ε)
D) Model runtime
Answer: Error bound (ε)


Q4. The instance space in PAC learning is commonly defined as:
A) Real numbers
B) Text data
C) Binary vectors of length n
D) Images
Answer: Binary vectors of length n


Q5. In PAC learning, the concept function c maps the instance space X to which set?
A) {−1, 1}
B) {0, 1}
C) {0, ∞}
D) ℝ
Answer: {0, 1}


Q6. What kind of distribution are training examples drawn from in PAC learning?
A) Gaussian distribution
B) Unknown but fixed distribution
C) Uniform distribution
D) Zipf distribution
Answer: Unknown but fixed distribution


Q7. What is the meaning of a hypothesis being “PAC correct”?
A) Exact match with the concept
B) Zero error
C) Approximately correct with high probability
D) Derived through a neural net
Answer: Approximately correct with high probability


Q8. What is the full form of PAC in PAC learning?
A) Precise and Clear
B) Probably Accurate Concept
C) Probably Approximately Correct
D) Practically Accurate Classification
Answer: Probably Approximately Correct


Q9. Which of the following classes is known to be PAC-learnable?
A) General DNF
B) Boolean Circuits
C) Monotone DNF
D) RSA Functions
Answer: Monotone DNF


Q10. The VC dimension quantifies:
A) Runtime of algorithm
B) Number of training epochs
C) Complexity of hypothesis class
D) Dimensionality of features
Answer: Complexity of hypothesis class


Q11. What is the output of a PAC learning algorithm?
A) A neural network
B) A set of training data
C) A hypothesis from the hypothesis class
D) A random classifier
Answer: A hypothesis from the hypothesis class


Q12. The EXAMPLES routine provides:
A) Noisy examples
B) Human-labeled inputs
C) Random labeled examples
D) Negative-only examples
Answer: Random labeled examples
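The EXAMPLES routine of Q12, together with the fixed unknown distribution of Q6 and the i.i.d. assumption of Q15, can be mimicked in a few lines. The distribution, the target concept, and all function names below are illustrative assumptions of this sketch, not part of the quiz material.

```python
import random

def examples(n_bits, target, rng, n_samples):
    """Simulated EXAMPLES routine: draw i.i.d. labeled examples.
    Each instance is a random binary vector of length n_bits (the
    instance space of Q4), labeled 0/1 by the hidden target concept
    (the range from Q5)."""
    for _ in range(n_samples):
        x = tuple(rng.randint(0, 1) for _ in range(n_bits))
        yield x, target(x)

# Hypothetical target concept: the monotone conjunction x1 AND x3.
target = lambda x: int(x[0] == 1 and x[2] == 1)

rng = random.Random(0)
sample = list(examples(5, target, rng, 8))
```

The learner sees only the stream of (instance, label) pairs; the distribution and the target stay hidden, which is exactly the access model the PAC definition assumes.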


Q13. One-sided error means:
A) The model is always wrong
B) Errors are only on negatives
C) High recall
D) Zero error
Answer: Errors are only on negatives


Q14. Sample complexity in PAC learning is mainly influenced by:
A) Number of layers in model
B) VC dimension
C) Number of CPUs
D) Type of activation function
Answer: VC dimension


Q15. What is the key assumption in standard PAC learning?
A) Noise in labels
B) Non-linear data
C) i.i.d. sampled examples
D) Deep learning model
Answer: i.i.d. sampled examples


Q16. If the VC-dimension is 10, ε = 0.05, and δ = 0.01, then the sample complexity is approximately:
A) O(10)
B) O(100)
C) O(740)
D) O(1000)
Answer: O(740)
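The figure in Q16 can be sanity-checked with one common simplified sample-complexity bound, m ≈ (1/ε)(d·ln(1/ε) + ln(1/δ)). The exact constants differ between textbooks, so treat this as an order-of-magnitude sketch rather than the definitive formula.

```python
import math

def pac_sample_bound(d, eps, delta):
    """Simplified PAC sample-complexity bound:
    m >= (1/eps) * (d * ln(1/eps) + ln(1/delta)).
    Constants vary across textbooks; order-of-magnitude only."""
    return math.ceil((1 / eps) * (d * math.log(1 / eps) + math.log(1 / delta)))

m = pac_sample_bound(d=10, eps=0.05, delta=0.01)
print(m)  # 692: the same order as the O(740) answer above
```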


Q17. What does the ORACLE routine allow a learning algorithm to do?
A) Predict labels
B) Query whether an instance is positive
C) Delete hypotheses
D) Return VC dimension
Answer: Query whether an instance is positive


Q18. Which class requires both EXAMPLES and ORACLE to be PAC-learned?
A) Boolean circuits
B) Monotone DNF
C) k-CNF
D) General DNF
Answer: Monotone DNF


Q19. Which of the following is not a valid extension of PAC Learning?
A) Noisy PAC
B) Bayesian PAC
C) Logical PAC
D) Incremental PAC
Answer: Logical PAC


Q20. What happens to sample complexity as VC-dim increases?
A) Decreases
B) Remains constant
C) Increases
D) Becomes independent of error
Answer: Increases


Q21. Which condition implies that some Boolean functions are not PAC-learnable?
A) VC-dim = 0
B) Neural net is shallow
C) Cryptographic hardness assumptions
D) Finite sample space
Answer: Cryptographic hardness assumptions


Q22. Why is the general DNF class not known to be PAC-learnable?
A) Requires infinite data
B) Algorithm doesn’t exist
C) It’s NP-hard in general
D) No hypothesis class defined
Answer: It’s NP-hard in general


Q23. The presence of noise in data led to the development of which PAC extension?
A) Incremental PAC
B) Bayesian PAC
C) Noisy PAC
D) Approximate PAC
Answer: Noisy PAC


Q24. For PAC learning, which of the following guarantees is false?
A) Output hypothesis is always 100% accurate
B) Hypothesis has error ≤ ε with probability ≥ 1−δ
C) Uses i.i.d. samples
D) Time and sample bounds are polynomial
Answer: Output hypothesis is always 100% accurate


Q25. In a concept class with infinite VC-dimension:
A) Learning is guaranteed
B) Sample complexity becomes unbounded
C) Only one-sided error is allowed
D) ORACLE queries are unnecessary
Answer: Sample complexity becomes unbounded
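The VC dimension of Q10, Q20, and Q25 is concrete enough to check by brute force for tiny hypothesis classes. The sketch below (all names are my own, not from the quiz) tests whether 1-D threshold classifiers can shatter a point set: thresholds shatter any single point but no pair, so their VC dimension is 1.

```python
def shatters(points, hypotheses):
    """True iff the hypothesis class realizes every 0/1 labeling
    of `points`, i.e. the class shatters the point set."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

# Threshold classifiers h_t(x) = 1 iff x >= t, on a small grid of t.
thresholds = [t / 10 for t in range(-10, 21)]
hyps = [lambda x, t=t: int(x >= t) for t in thresholds]

print(shatters([0.5], hyps))       # True: one point gets both labels
print(shatters([0.3, 0.7], hyps))  # False: the labeling (1, 0) is unrealizable
```

Since no two points can be shattered, the largest shatterable set has size 1, which is what "VC dimension quantifies the complexity of the hypothesis class" means in practice.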

Regression


  1. What kind of regression is most widely used due to its simplicity and interpretability?
    A. Logistic Regression
    B. Polynomial Regression
    C. Linear Regression
    D. Ridge Regression
  2. Which of the following is a key visual tool used before applying regression?
    A. Line graph
    B. Scatterplot
    C. Pie chart
    D. Histogram
  3. Which element in regression is usually denoted by β₀?
    A. Slope
    B. Residual
    C. Intercept
    D. Mean
  4. What is the slope in the linear equation E(Y | X = x) = β₀ + β₁x?
    A. β₀
    B. β₁
    C. X
    D. Y
  5. Which type of points strongly influence the regression line due to extreme X-values?
    A. Outliers
    B. Leverage points
    C. Influential residuals
    D. Mean points
  6. Why was jittering used in the scatterplot of mother–daughter heights?
    A. To show non-linearity
    B. To reduce variance
    C. To prevent point overlapping
    D. To remove outliers
  7. In the bass growth study, what does the mean function describe?
    A. Maximum fish size
    B. Average length by age
    C. Fastest growing fish
    D. Fish with anomalies
  8. What kind of regression line would best represent a perfect positive correlation?
    A. Curved upward
    B. Horizontal
    C. Downward-sloping
    D. 45-degree upward sloping line
  9. In the regression formula, which variable is usually plotted on the Y-axis?
    A. Predictor
    B. Independent variable
    C. Response variable
    D. Constant
  10. Which historical dataset showed a poor linear fit until transformed?
    A. Galton height data
    B. Turkey weight data
    C. Forbes’s boiling point data
    D. Flagstaff snowfall data
  11. In the height study, what does a slope less than 1 in the regression line suggest?
    A. No variation
    B. Negative correlation
    C. Regression to the mean
    D. High randomness
  12. Which variable in E(Y | X = x) represents the expected average of Y?
    A. Y
    B. X
    C. E(Y | X = x)
    D. β₁
  13. What is the effect of assuming constant variance in a regression model?
    A. It increases complexity
    B. It avoids outliers
    C. It simplifies calculations
    D. It distorts predictions
  14. Which of the following is NOT a common goal of regression analysis?
    A. Making predictions
    B. Measuring central tendency
    C. Identifying relationships
    D. Testing hypotheses
  15. In regression, what does an outlier typically affect the most?
    A. Mean of predictor
    B. Linearity
    C. Slope accuracy
    D. Variance of X
  16. What does the variance function Var(Y | X = x) measure?
    A. The number of observations
    B. Spread of Y for fixed X
    C. Growth rate of Y
    D. Constant term in the model
  17. What is the term for visual representations showing each data pair as a point?
    A. Dot matrix
    B. Correlation map
    C. Scatterplot
    D. Area graph
  18. In which dataset was within-treatment variability not visible due to averaged values?
    A. Galton height data
    B. Turkey weight data
    C. Flagstaff snowfall data
    D. Forbes boiling point data
  19. What concept explains that tall mothers may have slightly shorter daughters?
    A. Linearity
    B. Equal scaling
    C. Regression to the mean
    D. High leverage
  20. What does “E” stand for in the expression E(Y | X = x)?
    A. Error
    B. Expected
    C. Equal
    D. Estimate
  21. What insight does a good scatterplot provide before modeling?
    A. Median
    B. Model accuracy
    C. Data coding
    D. Strength and direction of relationship
  22. In the snowfall study, what conclusion was drawn from the scatterplot?
    A. Strong positive correlation
    B. Seasonal interaction
    C. No meaningful trend
    D. Linear increase in snowfall
  23. Which kind of regression would you use if the data shows a curved trend?
    A. Simple linear regression
    B. Log-transformed regression
    C. Constant regression
    D. Slope-normalized regression
  24. What is one benefit of plotting equal axes when predictor and response have similar scales?
    A. Better coloring
    B. Better visibility of noise
    C. Fair visual comparison
    D. Removal of outliers
  25. Which statistical method forms the base for many advanced predictive models?
    A. Chi-square test
    B. Clustering
    C. Linear regression
    D. t-test
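Several of the questions above (the intercept β₀, the slope β₁, and regression to the mean) can be made concrete with a closed-form least-squares fit. The mother/daughter heights below are synthetic numbers invented for illustration in the spirit of the Galton data, not the historical dataset.

```python
def ols(xs, ys):
    """Closed-form simple linear regression: returns (b0, b1),
    the intercept and slope minimizing the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Synthetic mother/daughter heights (inches), invented for illustration:
# daughters of extreme mothers are less extreme, so the slope is below 1.
mothers   = [58, 60, 62, 64, 66, 68, 70]
daughters = [60, 61, 62.5, 64, 65, 66, 67]

b0, b1 = ols(mothers, daughters)
print(round(b1, 2))  # a slope < 1: regression to the mean (cf. Q11, Q19)
```

A fitted slope between 0 and 1 is exactly the pattern Q11 and Q19 describe: tall mothers tend to have tall but somewhat less tall daughters.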

Mean & Variance


  1. What does the mean function E(Y | X = x) represent?
    A) The maximum value of Y for given X
    B) The expected (average) value of Y given X = x
    C) The variance of Y given X = x
    D) The slope of the relationship between X and Y

  2. In the linear mean function E(Y | X = x) = β₀ + β₁x, what does β₀ represent?
    A) The slope of the line
    B) The intercept (value of Y when X = 0)
    C) The variance of Y
    D) The residual error

  3. In the Galton height data, if height were perfectly inherited, what slope would the mean function have?
    A) 0
    B) 0.5
    C) 1
    D) Greater than 1

  4. What does a slope less than 1 in the regression line between mother’s and daughter’s heights indicate?
    A) Perfect inheritance of height
    B) Regression to the mean
    C) No relationship between mother and daughter height
    D) Random variation

  5. What phenomenon explains why children of very tall or very short parents tend to be closer to average height?
    A) Genetic drift
    B) Regression to the mean
    C) Environmental effect
    D) Measurement error

  6. Which variable is typically called the predictor in a regression model?
    A) Y
    B) X
    C) β₀
    D) The residual

  7. What does the variance function Var(Y | X = x) describe?
    A) The average value of Y at X = x
    B) The spread or variability of Y around its mean at X = x
    C) The slope of the mean function
    D) The intercept of the mean function

  8. What assumption about variance is commonly made in simple linear regression models?
    A) Variance increases with X
    B) Variance decreases with X
    C) Variance remains constant for all X values
    D) Variance is zero

  9. If a fish’s length at age x is described by the mean function, what does the curve connecting average lengths at each age represent?
    A) The variance function
    B) The mean function E(length | age = x)
    C) The residuals
    D) The random noise

  10. What does the slope β₁ in the mean function indicate?
    A) The expected change in Y when X increases by one unit
    B) The expected change in X when Y increases by one unit
    C) The intercept value
    D) The variance of Y

  11. Why do we estimate β₀ and β₁ from data rather than knowing them beforehand?
    A) Because they are constants
    B) Because we never collect data
    C) Because the exact relationship is usually unknown and must be learned from data
    D) Because they represent random error

  12. Which of the following best describes “regression to the mean”?
    A) Extreme values become more extreme over time
    B) Extreme values tend to move closer to the average on subsequent measurements
    C) Mean values never change
    D) Variance always increases

  13. In the Galton height example, what does the dashed line with slope 1 represent?
    A) Actual data trend
    B) Perfect inheritance where daughters’ heights equal mothers’ heights exactly
    C) Random variation
    D) Regression line estimated from data

  14. What is the main purpose of the mean function in regression?
    A) To describe the spread of data
    B) To predict the average response Y for each predictor value X
    C) To calculate residuals
    D) To measure data skewness

  15. When variance is constant across all values of X, the model is said to have:
    A) Heteroscedasticity
    B) Homoscedasticity
    C) Nonlinearity
    D) Autocorrelation
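The mean function E(Y | X = x) and variance function Var(Y | X = x) above are just per-x summaries, which a short grouping sketch makes concrete. The fish-length numbers and all names below are synthetic choices of mine for illustration.

```python
from collections import defaultdict

def conditional_stats(pairs):
    """Group (x, y) pairs by x; return {x: (mean of y, variance of y)},
    the empirical versions of E(Y | X = x) and Var(Y | X = x)."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    stats = {}
    for x, ys in groups.items():
        m = sum(ys) / len(ys)
        v = sum((y - m) ** 2 for y in ys) / len(ys)
        stats[x] = (m, v)
    return stats

# Synthetic fish lengths (mm) at two ages, invented for illustration.
pairs = [(1, 100), (1, 110), (1, 90), (2, 150), (2, 160), (2, 140)]
stats = conditional_stats(pairs)
print(stats[1][0])  # 100.0: empirical mean length at age 1
print(stats[2][0])  # 150.0: empirical mean length at age 2
```

Here the means rise with age (the mean function) while the spread around each mean stays equal, which is the constant-variance, i.e. homoscedastic, situation asked about in questions 8 and 15.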
