# STATS 244.3(01) Elementary Statistical Concepts

## Final Exam, December 21, 1998

1. The number of observations is 10.

2. The mean is 18.

3. The median is 19.

4. The sample variance is 36.

5. The sample standard deviation is 6.

6. The lower quartile is 16.

7. The 75th percentile (same as upper quartile) is 21.

8. The 90th percentile is 25.

9. The range is 18.

10. The inter-quartile range is 21-16 = 5.

11. I is the only one that displays marked negative skewness.

12. III is a stemplot or stem and leaf diagram.

13. There are 21 observations so the median would be the eleventh. This is 61 (choice (b)).

14. The quartiles are at 60 and 70 (or 69, depending on which definition you use), so the inter-quartile range is 70-60 = 10. (It would also have been correct to say 9 but that was not one of the choices.)

15. Display I is a histogram.

16. The mean value is the balance point of the histogram. If you try to balance the histogram to the left of 70, you would find that there is too much weight on the right. Thus the correct answer is (e), between 70 and 90.

17. Up to 70, we have encompassed 5+10+20=35% of the distribution. Up to 80, we have encompassed 5+10+20+30=65% of the distribution. Thus the 50th percentile (i.e., the median) must be somewhere between 70 and 80.
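
The cumulative reasoning above can be sketched in a few lines of Python. The percentages 5, 10, 20 and 30 and the edges 70 and 80 come from the answer; the lower bar edges (50 and 60) and the helper name `percentile_interval` are assumptions made only for illustration.

```python
from itertools import accumulate

# Upper edge of each histogram bar and the percentage of the data in that bar.
# Only the edges 70 and 80 are stated in the answer; 50 and 60 are assumed.
upper_edges = [50, 60, 70, 80]
percentages = [5, 10, 20, 30]

def percentile_interval(upper_edges, percentages, target=50):
    """Return the (lower, upper) edges of the bar in which the cumulative
    percentage first reaches the target percentile."""
    lower = None
    for edge, cumulative in zip(upper_edges, accumulate(percentages)):
        if cumulative >= target:
            return lower, edge
        lower = edge
    raise ValueError("target percentile lies beyond the listed bars")

median_interval = percentile_interval(upper_edges, percentages)
print(median_interval)  # (70, 80)
```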

18. This is a boxplot or box and whisker plot.

19. The median is indicated by the line in the middle of the box. In this case, the line is at 60.

20. The lower quartile is at the bottom of the box, which in this case is a bit above 50. The only choice that resembles this is (c) 53.5.

21. This is a cumulative distribution.

22. To find the median, draw a line across from the 50% point on the ordinate, and find the abscissa where this line meets the graph. This abscissa gives the median, which is around 60.

23. To find the lower quartile (which is the same as the 25th percentile), draw a line across from the 25% point. The abscissa where this line meets the graph is the lower quartile, which seems to be just a bit less than 55. The closest choice is 53.

24. This is a scatterplot or scatter diagram.

25. The correlation coefficient is clearly negative, since the response variable is lower for large values of the explanatory variable. Although the points appear clumped in three groups, these groups do fit inside a rather narrow ellipse, defining a fairly definite line, showing quite a strong correlation. A correlation coefficient of -0.5 would only vaguely show the direction of the relationship, thus (a) -0.9 is the best choice. (The correlation is actually -0.927.)

26. As the explanatory variable varies from 40 to 80, the response variable changes from around 32 to around 22, thus giving a slope of around -10/40 = -0.25. The best answer is thus (c). (The exact regression coefficient is -0.221.)

27. Display I is unique, since it is the only one that is negatively skewed. Displays II, III, IV and V (explanatory variable) all have medians around 60. However, display V has many data points around 40 and 80, something not present in the other three. Both II and IV have outliers somewhat bigger than 90, whereas III has no outliers. Thus II and IV are the only ones that resemble each other. Notice that the range of values and quartiles also agree. Thus the correct answer is (f) II and IV.

28. A simple random sample is (a) A random subset (of given size) of a population chosen such that every subset of that size has the same probability of being chosen.

29. A stratified sample is (b) A sample obtained by taking a simple random sample from selected subpopulations.

30. A standardized value is (c) The result of subtracting the mean value and dividing by the square root of the variance.

31. A test statistic is (g) A random variable used to determine whether to reject the null hypothesis.

32. A confidence interval is (d) A range of values of a population parameter, calculated from the sample, such that the true value of the parameter is contained in that range with a specified probability.

33. The power is (e) The probability of not making a Type II error.

34. A Type I error is (h) Rejecting the null hypothesis when it is true.

35. The standard error is (i) The standard deviation of the sampling distribution of an estimator.

36. Bias is (j) A systematic error in an estimator.

37. An unbiased estimator is (f) A statistic whose expected value is equal to a population parameter.

38. This is a contingency table.

39. To answer the next few questions, we need to compute the margins. The percentage of babies that were stillborn is 167/3010=5.55%.

40. Here we are talking about stillbirths, thus we are conditioning on the second column. The answer is 85/167=50.90%.

41. Here we are talking about what happens at General Hospital, thus we condition on the first row. The answer is 85/1232=6.90%.

42. The answer is (d) "the rate of stillbirths is the same at all locations". The null hypothesis is always the negation of what you are trying to prove.

43. The expected number under the hypothesis of independence is given by the product of row and column totals, divided by the grand total. The answer is

201 x 167 /3010 = 11.15
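
As a quick check of the arithmetic in answers 39 to 43, here is a minimal Python sketch. It uses only the counts quoted in those answers (the full contingency table is not reproduced in this key); the variable names and the helper `pct` are illustrative assumptions.

```python
# Counts quoted in answers 39-43; the remaining cells of the table are not given here.
grand_total = 3010         # all births in the table
stillbirth_total = 167     # column total for stillbirths
general_total = 1232       # row total for General Hospital
general_stillbirths = 85   # stillbirths at General Hospital
row_total = 201            # row total used in answer 43

def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)

overall_rate = pct(stillbirth_total, grand_total)              # marginal percentage
share_at_general = pct(general_stillbirths, stillbirth_total)  # conditioning on the column
rate_at_general = pct(general_stillbirths, general_total)      # conditioning on the row

# Expected count under independence: row total x column total / grand total.
expected = round(row_total * stillbirth_total / grand_total, 2)

print(overall_rate, share_at_general, rate_at_general, expected)
# 5.55 50.9 6.9 11.15
```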

44. This data consists of two categorical variables. The only choice that is appropriate for categorical data is (b) test for association.

45. The chi-squared distribution is the appropriate one.

46. The number of degrees of freedom is

(number of rows - 1) x (number of columns - 1)

which in this case is

(4-1) x (2-1) = 3.

1. Wrong. The P-value is extremely small, giving strong evidence against the null hypothesis. It is unreasonable to conclude there is no relationship.

2. Wrong. There is very strong evidence to demonstrate a relationship.

3. Wrong. Although there is evidence of a relationship, the evidence is not for a relationship as specific as this.

4. Wrong. This is a causal inference, but all we have is an observational study.

5. Correct. On the basis of this analysis, we conclude that the rates are not all the same, and we can say no more than that.

1. Wrong. This is a reasonable comment, since it would give insight about which hospitals were favoured by high-risk births.

2. Correct. This is an incorrect statement: The study is an observational one, not an experiment.

3. Wrong. This is a reasonable comment about any observational study.

4. Wrong. Simpson's paradox could be occurring in this situation, since the high-risk births could be going to one particular hospital.

5. Wrong. General inference demands a random sample from the target population. In this case, we are observing only data from this city.

47. The number of committees is the number of ways of choosing 2 people from 6, which is (6 x 5)/2 = 15. (This number will be the denominator for the probability questions below.)

48. The committee is a simple random sample.

49. The number of men is a random variable.

50. This is a little tricky. You could just try np = 2 x ½ = 1, which does give the correct answer, but this assumes the binomial distribution, which does not apply in this case because the sampling fraction (2 people out of 6) is too large. To do this properly, you have to compute the probabilities of finding 0, 1, or 2 men on the committee.

The denominator will be 15, since there are 15 possible committees. There are 3 committees consisting entirely of men (each leaving out one of the three men), and thus the probability of having a committee of two men is 3/15 = 1/5. By the same argument, the probability of two women (no men) is also 1/5. The probability of having exactly one man (and one woman) is what is left:

1 - 1/5 - 1/5 = 3/5.

Thus the expected number of men is:

0 x 1/5 + 1 x 3/5 + 2 x 1/5 = 1.

51. Here we have to use the same probabilities as above, and compute the mean squared deviation from the mean:

(0-1)^2 x 1/5 + (1-1)^2 x 3/5 + (2-1)^2 x 1/5 = 2/5.
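
The committee probabilities can also be checked by brute-force enumeration. The sketch below assumes three men and three women, as implied by the counts in the answers above; the labels `M1` to `W3` are hypothetical. It lists every two-person committee and computes the distribution, mean and variance of the number of men with exact fractions.

```python
from fractions import Fraction
from itertools import combinations

# Three men and three women (implied by the answers); the labels are hypothetical.
people = ["M1", "M2", "M3", "W1", "W2", "W3"]
committees = list(combinations(people, 2))  # every possible 2-person committee

# Probability distribution of the number of men, each committee being equally likely.
dist = {}
for committee in committees:
    men = sum(person.startswith("M") for person in committee)
    dist[men] = dist.get(men, 0) + Fraction(1, len(committees))

mean = sum(k * p for k, p in dist.items())
variance = sum((k - mean) ** 2 * p for k, p in dist.items())

print(len(committees))            # 15
print(dist[0], dist[1], dist[2])  # 1/5 3/5 1/5
print(mean, variance)             # 1 2/5
```

The same `dist` gives the later answers directly, for example `dist[0] + dist[2]` for two members of the same sex and `1 - dist[2]` for a committee that is not all men.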

52. As we saw above, this is 3/15 or 1/5.

53. This happens if they are both men (probability 1/5) or both women (probability 1/5). These events are mutually exclusive, so the probability of either is 1/5 + 1/5 = 2/5.

54. This happens if they are not both of the same sex. Thus we have

1 - 2/5 = 3/5.

55. This happens if they are not both men. Thus we have

1 - 1/5 = 4/5.

56. This means that the committee must be chosen from the four other people. Since this can be done in 6 ways, the probability will be 6/15 = 2/5.

57. We can choose the education student in two ways, and the engineering student in two ways, giving four possible committees. Thus the probability will be 4/15.

58. This is a tricky one. Note that the committee cannot include the female engineer, for then whoever the second person may be, the conditions of the event are not met. Thus the committee must include the male engineer, and a woman who is not an engineer. There are two such women, so there are two such committees. Thus the answer is 2/15.

59. If all the colours are equally represented, the probability of a red one is 1/4. The sample size is 490, so the expected number of red ones is 490/4. (Note that the question says "expected number", not "expected proportion".)

60. The variance of a binomial random variable is given by np(1-p). If n=490, p=1/4, and (1-p)=3/4, then the variance is

490 x 3/16 = 1470/16.
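
For reference, a small Python sketch of the binomial mean and variance used in answers 59 and 60, with n = 490 and p = 1/4 taken from the text. Exact fractions are used so the results can be compared with 490/4 and 1470/16; the variable names are illustrative.

```python
from fractions import Fraction

n = 490             # number of candies in the sample
p = Fraction(1, 4)  # probability that a candy is red under equal representation

expected_red = n * p            # binomial mean np
variance_red = n * p * (1 - p)  # binomial variance np(1-p)

print(expected_red, variance_red)                # 245/2 735/8
print(float(expected_red), float(variance_red))  # 122.5 91.875
```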

1. Wrong. The chi-squared table has nothing to do with this.

2. Wrong. Simpson's paradox is irrelevant in this context.

3. Correct. A binomial distribution is the reasonable model for this situation. We have either red candies ("success") or some other colour ("failure"), and the probability of success is 1/4, with 490 trials. More correctly we have a sample without replacement whose probabilities would be described by the hypergeometric distribution, but the binomial is a reasonable approximation under the conditions of the following question.

4. Wrong. The number of red candies does not follow a normal distribution, since it is a discrete random variable. It is true that this number can be approximated by the normal distribution, but this approximation does not provide for the computations of expected value and variance as given in the previous two questions.

5. Wrong. The candies might in fact be a cluster sample, if they were not well mixed, but in that case the computations of mean and variance are not justified.

1. Wrong. Degrees of freedom is not a relevant concept here.

2. Correct. Having well-mixed candies justifies considering your purchase to be a simple random sample. Sampling only a small fraction justifies using the binomial distribution.

3. Wrong. This makes no sense -- standard deviation is a measure of dispersion for a quantitative variable.

4. Wrong. Since there are four colours, you can't have a probability of 1/2 for each of them, since the probabilities have to add up to 1.

5. Wrong. If you purchased most of the candies, then you would be sampling a substantial proportion of the population, and the binomial distribution would not be a reasonable approximation.

1. Wrong. A parameter is an attribute of the population. The quantity 2/7 refers to the sample you actually obtained. A different sample might give a different value.

2. Correct. An estimate is an attribute of the sample. We have observed 2/7 of the candies in our sample to be red, and might estimate that that is the proportion in the population as well.

3. Wrong. An event is not a number. An event is always described by a complete sentence.

4. Wrong. It is a statistic, but the sample proportion (which is what this is) is an unbiased statistic, assuming the sample is unbiased.

5. Wrong. Type II errors are not of consequence here. Moreover, making a Type II error is an event, which 2/7 is not.

61. We estimate the standard error of a sample proportion p by sqrt(p(1-p)/n). Since p = 2/7 and n = 490, we get

sqrt((2/7 x 5/7)/490) = sqrt(1/2401) = 1/49.

62. The 95% confidence interval is approximately given by the estimate plus or minus two standard errors. This would give us the interval from

2/7-2/49 to 2/7+2/49

which works out to be (24.5%, 32.7%).
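
The interval can be reproduced with a short Python sketch. It assumes the usual large-sample standard error sqrt(p(1-p)/n) for a sample proportion, with p = 2/7 and n = 490 taken from the text; this gives the 1/49 implied by the 2/49 used above. The variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

n = 490                 # sample size
p_hat = Fraction(2, 7)  # observed proportion of red candies

# Squared standard error, kept as an exact fraction: p(1-p)/n.
se_squared = p_hat * (1 - p_hat) / n
se = sqrt(se_squared)   # standard error as a float, equal to 1/49

# Approximate 95% interval: estimate plus or minus two standard errors.
lower = float(p_hat) - 2 * se
upper = float(p_hat) + 2 * se

print(se_squared)                                    # 1/2401
print(round(100 * lower, 1), round(100 * upper, 1))  # 24.5 32.7
```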

63. The central limit theorem is the essential matter here, which allows us to approximate the sampling distribution of the estimator by a normal distribution, justifying the use of 2 standard errors for 95% probability.

64. (d) is the only one that makes sense. Of the others, only (b) is a meaningful statement, but it is not the meaning of a confidence interval. The other three are pure nonsense.

1. Wrong. An assumption is not based on data, so cannot be an estimate.

2. Correct. The null hypothesis is the opposite of what you are trying to prove. If you want to show that the colours are unequally distributed, you first assume the null hypothesis that they are equally likely.

3. Wrong. The alternative is what you are trying to prove, in this case that some colours are more common than others.

4. Wrong. A Type I error refers to the outcome of a test, not to an assumption prior to the test.

5. Wrong. A Type II error refers to the outcome of a test, not to an assumption prior to the test.

1. Correct. This is the formula for a chi-squared test.

2. Wrong. This is the formula for a confidence interval on the mean of a normal population. We are not calculating confidence intervals here.

3. Wrong. This is the estimator for the standard error of the difference between two estimated proportions. It is not used as a test statistic.

4. Wrong. This is the test statistic for a two sample t-test, appropriate for quantitative data. Here, we are dealing with qualitative data, and we only have one sample.

5. Wrong. This is the test statistic for comparing binomial proportions from two samples. Here we have only one sample.

65. This is the only question that required using a table. In the chi-squared table, with 3 df, a chi-squared value of 4.11 gives a P-value of 0.25, and a value of 4.64 gives a P-value of 0.20. The observed value of 4.188 is just slightly larger than 4.11 and hence gives a P-value of slightly less than 0.25.

66. The degrees of freedom are one less than the number of categories. Since we have 4 colours, we have 4-1=3 degrees of freedom.
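
The table lookup can be cross-checked numerically. The sketch below is not the exam's method (which uses a printed table); it relies on the closed form of the chi-squared upper-tail probability that holds specifically for 3 degrees of freedom, P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2). The function name `chi2_sf_3df` is an illustrative choice.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with
    exactly 3 degrees of freedom (closed form valid only for 3 df)."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

p_value = chi2_sf_3df(4.188)  # observed statistic from answer 65
print(round(p_value, 3))      # a little below 0.25

# The two table entries quoted in answer 65 bracket the observed value.
print(round(chi2_sf_3df(4.11), 2), round(chi2_sf_3df(4.64), 2))
```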

1. Wrong. The P-value for the goodness of fit test is not very small, indicating only weak evidence against the null hypothesis. Moreover, the chi-squared test does not specifically identify an excess of red candies as the alternative, but would just indicate some kind of unevenness. Even then, the confidence interval in question 67 (item 62 in this key) included the hypothesized value of 25% red candies.

2. Wrong. The evidence is not strong at all, as shown by the P-value of 0.25.

3. Correct. We do have a somewhat excessive number of red candies (2/7 instead of 1/4), but unevenness of such magnitude, as measured by the chi-squared statistic, would occur in almost 25% of samples of this size.

4. Wrong. Hypothesis tests cannot conclusively demonstrate such assertions. All they do is assess evidence. The confidence interval in question 67 (item 62 in this key) indicates what range of proportions is consistent with the data.

5. Wrong. There are no statistically significant differences.

67. This would be a Type II error. The null hypothesis would be false, but the null hypothesis would not be rejected.
