STATS 244.3(01) Elementary Statistical Concepts

Final Examination, 21 December 1998

© 1998 by Mikelis Bickis

Click on the [?] to find the answers

Consider the following sample: [sample not reproduced]

Choose from the following list the value of the quantities stated in questions 1-10.

1. 5
2. 6
3. 10
4. 16
5. 18
6. 19
7. 21
8. 25
9. 36
10. 180
1. Sample size. ------[?]
2. Mean. ------[?]
3. Median. ------[?]
4. (Sample) variance. ------[?]
5. (Sample) standard deviation. ------[?]
6. Lower quartile. ------[?]
7. 75th percentile. ------[?]
8. 90th percentile. ------[?]
9. Range. ------[?]
10. Inter-quartile range. ------[?]

Questions 11-27 refer to the following five graphical displays.

[Displays I-V are not reproduced here.]

11. Which display shows the most negative skewness?
1. I
2. II
3. III ------[?]
4. IV
5. V
12. The data display III is called
1. a histogram
2. a boxplot
3. a stemplot ------[?]
4. a cumulative distribution
5. a scatterplot
13. The median of this data is
1. 60
2. 61
3. 62 ------[?]
4. 62.4
5. 65
14. The interquartile range is
1. 10
2. 16
3. 25 ------[?]
4. 30
5. 57
15. The data display I is called
1. a histogram
2. a boxplot
3. a stemplot ------[?]
4. a cumulative distribution
5. a scatterplot
16. The mean value of this data is
1. around 25
2. less than 60
3. exactly 65 ------[?]
4. between 60 and 70
5. between 70 and 90
17. The median value of this data is
1. around 20
2. between 20 and 60
3. between 60 and 70 ------[?]
4. between 70 and 80
5. between 80 and 90
18. The data display II is called
1. a histogram
2. a boxplot
3. a stemplot ------[?]
4. a cumulative distribution
5. a scatterplot
19. The median value is
1. 40
2. 50
3. 53 ------[?]
4. 60
5. 70
20. The lower quartile is
1. 25
2. 40
3. 53.5 ------[?]
4. 60
5. 66
21. The data display IV is called
1. a histogram
2. a boxplot
3. a stemplot ------[?]
4. a cumulative distribution
5. a scatterplot
22. The median of these data is
1. 40
2. 50
3. 60 ------[?]
4. 65
5. 70
23. The lower quartile of these data is
1. 25
2. 40
3. 53 ------[?]
4. 60
5. 75
24. The data display V is called
1. a histogram
2. a boxplot
3. a stemplot ------[?]
4. a cumulative distribution
5. a scatterplot
25. The correlation coefficient would be around
1. -0.9
2. -0.5
3. 0 ------[?]
4. 0.3
5. 0.9
26. The slope of the regression line would be around
1. -1
2. -0.5
3. -0.25 ------[?]
4. 0.5
5. 1
27. Which pair of displays might be showing the same data?
1. I and II
2. I and III
3. I and IV
4. I and V
5. II and III ------[?]
6. II and IV
7. II and V
8. III and IV
9. III and V
10. IV and V

Questions 28-37 present some statistical terms. Choose from the following list the correct definition of these terms.

1. A random subset (of given size) of a population chosen such that every subset of that size has the same probability of being chosen.
2. A sample obtained by taking a simple random sample from selected subpopulations.
3. The result of subtracting the mean value and dividing by the square root of the variance.
4. A range of values of a population parameter, calculated from the sample, such that the true value of the parameter is contained in that range with a specified probability.
5. The probability of not making a Type II error.
6. A statistic whose expected value is equal to a population parameter.
7. A random variable used to determine whether to reject the null hypothesis.
8. Rejecting the null hypothesis when it is true.
9. The standard deviation of the sampling distribution of an estimator.
10. A systematic error in an estimator.
28. Simple random sample. ------[?]
29. Stratified sample. ------[?]
30. Standardized value. ------[?]
31. Test statistic. ------[?]
32. Confidence interval. ------[?]
33. Power. ------[?]
34. Type I error. ------[?]
35. Standard error. ------[?]
36. Bias. ------[?]
37. Unbiased estimator. ------[?]

A city has three hospitals in which babies are delivered. In addition, there are a certain number of home births. An epidemiologist has collected the following data about the number of live births and stillbirths in 1996 at the different locations.

[Table of births by location not reproduced.]

38. A data display of this kind is called
1. a box plot,
2. a contingency table,
3. a histogram, ------[?]
4. a scatter diagram,
5. a table.
39. What percentage of babies were stillborn?
1. 5.22
2. 5.55
3. 5.87 ------[?]
4. 41.75
5. 53.93
40. What percentage of stillbirths occurred at General Hospital?
1. 2.82
2. 6.90
3. 7.41 ------[?]
4. 50.90
5. 85.00
41. What was the rate of stillbirths at General Hospital?
1. 2.82%
2. 6.90%
3. 7.41% ------[?]
4. 50.90%
5. 85.00%
42. Suppose that the epidemiologist wants to demonstrate that the probability of stillbirths does depend on the location of birth. A formal statistical procedure for this demonstration might begin with a null hypothesis that
1. home births are more risky,
2. home births are less risky,
3. the number of births is the same at all hospitals, ------[?]
4. the rate of stillbirths is the same at all locations,
5. the rate of stillbirths depends on the location.
43. Under this null hypothesis, what would be the expected number of stillbirths among home births?
1. 1.49
2. 1.80
3. 3 ------[?]
4. 11.15
5. 167
44. To test the null hypothesis, the epidemiologist could use
1. a regression analysis
2. a test for association,
3. a one-sample t test, ------[?]
4. a two-sample t test,
5. a sign test.
45. The test statistic would be compared with the tabulated values for which distribution?
1. normal,
2. Student's t,
3. chi-squared, ------[?]
4. binomial,
5. binary.
46. The degrees of freedom would be
1. 1
2. 3
3. 7 ------[?]
4. 8
5. 3009.
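
The quantities asked for in questions 43-46 follow a standard recipe for a test of association, which can be sketched as below. This is only an illustrative sketch: the helper names `expected_counts` and `chi_squared` are made up for this example, and the `demo` counts are hypothetical placeholders, not the epidemiologist's data (which is not reproduced here).

```python
from fractions import Fraction

def expected_counts(table):
    """Expected cell counts if outcome is independent of location:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[Fraction(r * c, grand) for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Pearson chi-squared statistic and its degrees of freedom.
    Assumes every expected count is non-zero."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return float(stat), df

# HYPOTHETICAL counts, not the exam's table: rows = locations, columns = (live, still)
demo = [[90, 10], [80, 20]]
stat, df = chi_squared(demo)
print(expected_counts(demo), stat, df)
```

With four locations and two outcomes, the same function would report (4 - 1) x (2 - 1) degrees of freedom.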
47. The test statistic calculated in question 44 is equal to 68.103, which exceeds the 95th percentile of the appropriate distribution. A correct interpretation of this event would be
1. [?]there is no relationship between rates of stillbirth and location of birth,
2. [?] there is insufficient evidence to demonstrate a relationship between stillbirth and location of birth,
3. [?] home births are safer than hospital births,
4. [?] going to College Hospital for delivery increases the chances of stillbirth,
5. [?] rates of stillbirths are not the same for all the hospitals.
48. All but one of the following are a reasonable comment about this study. Select the one that is not justified.
1. [?] The study would be more informative if information could be obtained about the reasons why the various hospitals were chosen for delivery.
2. [?] Since the data were collected from a properly randomized experiment, one can conclude that procedures at College Hospital are causing an increased number of stillbirths.
3. [?] It is risky to make cause-and-effect inferences from observational studies since lurking variables could be responsible for apparent association.
4. [?] Because the data represent an aggregation of deliveries at various levels of risk, the observed effects could be an instance of Simpson's paradox.
5. [?] Since the births analyzed are not a random sample from a larger population, inferences cannot be made beyond the particular city on statistical grounds alone.

A difficult mathematics class contains six students, of whom two are math majors, two are education students, and two are engineering students. Both education students are female, as is one of the engineers. The other three students are male. The students are having problems with the professor and decide to pick a committee of two to speak to him about their concerns. The committee is chosen by putting the names of the six students into a hat, and after thorough mixing, randomly drawing two names.

49. How many possible committees are there?
1. 3
2. 6
3. 15 ------[?]
4. 30
5. 36
50. The committee could be considered to be
1. a simple random sample,
2. a sample with replacement,
3. a random variable, ------[?]
4. a sequence of Bernoulli trials,
5. an exclusive event.
51. Let X be the number of men on the committee. X is an example of
1. an event,
2. an outcome,
3. a random variable, ------[?]
4. a standardized value,
5. a parameter.
52. The expected value of X is
1. 0
2. ½
3. 1 ------[?]
4. 3
5. a meaningless concept.
53. The variance of X is
1. 1/3
2. 2/5
3. 1/2 ------[?]
4. 2/3
5. 1
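
One way to check the counting in questions 49-53 is to enumerate every committee by brute force. The sketch below assumes the six students can be represented by their sex alone, in the order math, math, education, education, engineering, engineering; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Sex of each student as described in the problem: two male math majors,
# two female education students, one female and one male engineer.
sexes = ["M", "M", "F", "F", "F", "M"]

# Every unordered pair of distinct students is one possible committee.
committees = list(combinations(range(len(sexes)), 2))
total = len(committees)

def men_on(committee):
    """X = number of men on the committee."""
    return sum(1 for i in committee if sexes[i] == "M")

# Probability mass function of X, each committee being equally likely.
pmf = {}
for c in committees:
    x = men_on(c)
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, total)

mean_x = sum(x * p for x, p in pmf.items())
var_x = sum((x - mean_x) ** 2 * p for x, p in pmf.items())

print(total, pmf, mean_x, var_x)
```

Exact fractions are used so that the results can be compared directly with the answer choices.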
Select from the following list the correct probability for each of the events in questions 54-60.
1. 2/15
2. 1/5
3. 4/15
4. 3/10
5. 2/5
6. 1/2
7. 3/5
8. 2/3
9. 4/5
10. cannot be determined from the information given.
54. The committee consists entirely of men. ------[?]
55. Both members of the committee are of the same sex. ------[?]
56. The committee contains one member of each sex. ------[?]
57. The committee includes at least one woman. ------[?]
58. The committee does not include a math major. ------[?]
59. The committee is made up of one education student and one engineering student. ------[?]
60. The committee consists of exactly one woman and one engineering student. ------[?]
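
The event probabilities in questions 54-60 can be checked the same way, by counting favourable committees among all equally likely pairs. This is a sketch under stated assumptions: the `(major, sex)` tuples and helper names are illustrative, and question 60 is read here as "exactly one woman and exactly one engineering student", which is one possible interpretation of the wording.

```python
from fractions import Fraction
from itertools import combinations

# (major, sex) for each of the six students in the problem statement.
students = [
    ("math", "M"), ("math", "M"),
    ("education", "F"), ("education", "F"),
    ("engineering", "F"), ("engineering", "M"),
]
committees = list(combinations(students, 2))

def n_sex(committee, sex):
    return sum(1 for major, s in committee if s == sex)

def n_major(committee, major):
    return sum(1 for m, sex in committee if m == major)

def prob(event):
    """Fraction of the equally likely committees for which event(committee) is true."""
    favourable = sum(1 for c in committees if event(c))
    return Fraction(favourable, len(committees))

results = {
    "all men": prob(lambda c: n_sex(c, "M") == 2),
    "same sex": prob(lambda c: n_sex(c, "M") in (0, 2)),
    "one of each sex": prob(lambda c: n_sex(c, "M") == 1),
    "at least one woman": prob(lambda c: n_sex(c, "F") >= 1),
    "no math major": prob(lambda c: n_major(c, "math") == 0),
    "one education, one engineering": prob(
        lambda c: n_major(c, "education") == 1 and n_major(c, "engineering") == 1
    ),
    # Assumed reading of question 60: exactly one woman AND exactly one engineer.
    "one woman, one engineer": prob(
        lambda c: n_sex(c, "F") == 1 and n_major(c, "engineering") == 1
    ),
}

for name, p in results.items():
    print(f"{name}: {p}")
```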

You are particularly fond of a type of candy that is sold in bulk at the local grocery store. The candy comes in four different colours: green, yellow, orange, and red, each having a different flavour. You like them all except the red ones, which you only eat if there are no others left. You have bought a scoopful of the candy from the bin, but when you get home, you get the uneasy feeling that the bag has too many red ones. You fear that the store manager (who was visibly annoyed when you complained about the stale biscotti) is maliciously dumping red candies into the bin. Before eating any, you sort the candy by colour, and carefully count the number of each kind. You find that you bought a total of 490 candies, of which 125 were green, 113 were yellow, 112 were orange, and 140 were red. Assume that the candies you bought are a simple random sample of candies in the bin.

61. Suppose that all four colours are equally common in the bin. What would be the expected number of red candies in your purchase?
1. 0
2. 1/4
3. 490/4 ------[?]
4. 140
5. 245
62. Under the same assumption, what would be the (approximate) variance of the number of red candies?
1. 1
2. 13.0767
3. 1470/16 ------[?]
4. 490/4
5. 171
63. These computations are based on
1. [?] the -table,
2. [?] Simpson's paradox,
3. [?] the approximation that the number of red candies follows a binomial distribution,
4. [?] the assumption that the number of red candies follows a normal distribution,
5. [?] the fact that the candies constituted a cluster sample,
64. which is reasonable provided that
1. [?] there aren't too many degrees of freedom,
2. [?] the candies were well mixed, and the ones you bought were only a small fraction of the number in the bin,
3. [?] you purchased less than two standard deviations of candy,
4. [?] the probability of getting any colour was exactly ½,
5. [?] there were only a few candies left in the bin after your purchase.
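
The mean and variance in questions 61-62 can be sketched with the binomial formulas np and np(1-p). The bin size used below is a purely hypothetical figure, chosen only to illustrate how close the exact without-replacement (hypergeometric) variance is to the binomial one when the purchase is a small fraction of the bin.

```python
from fractions import Fraction

n = 490              # candies purchased
p = Fraction(1, 4)   # assumed share of red candies if all colours are equally common

mean_red = n * p              # binomial mean, n*p
var_red = n * p * (1 - p)     # binomial variance, n*p*(1-p)

def hypergeometric_variance(n, p, bin_size):
    """Variance of the red count when n candies are drawn WITHOUT replacement
    from a bin of bin_size candies, a fraction p of which are red."""
    return n * p * (1 - p) * Fraction(bin_size - n, bin_size - 1)

# HYPOTHETICAL bin size; the problem does not say how many candies the bin holds.
hypothetical_bin = 100_000
ratio = hypergeometric_variance(n, p, hypothetical_bin) / var_red

print(mean_red, var_red, float(ratio))
```

The ratio is the finite population correction; it approaches 1 as the bin grows relative to the purchase.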
65. In fact, 2/7 of the candies you purchased were red. The figure 2/7=0.2857 is an example of
1. [?] a parameter,
2. [?] an estimate,
3. [?] an event,
4. [?] a biased statistic,
5. [?] a Type II error.
66. The (estimated) standard error of the estimator of the true proportion of red candies in the bin is
1. 1/49
2. 3/16
3. 1/4 ------[?]
4. 10
5.
67. A 95% confidence interval on the percentage of red candies in the bin is approximately
1. (18.6,38.6)
2. (21.2,28.8)
3. (23.1,26.9) ------[?]
4. (24.6,32.6)
5. (26.7,30.5)
68. This approximation is based on
1. the central limit theorem,
2. Student's t-distribution,
3. 3 degrees of freedom, ------[?]
4. the regression phenomenon,
5. a sign test.
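
The standard error and interval in questions 66-67 can be reproduced with the usual large-sample (normal approximation) formula. The sketch below assumes the Wald interval, p-hat plus or minus z times the estimated standard error, is the intended method; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 490
red = 140
p_hat = red / n                          # sample proportion, 2/7

# Estimated standard error of the sample proportion.
se = sqrt(p_hat * (1 - p_hat) / n)

# Two-sided 95% interval uses the 97.5th percentile of the standard normal.
z = NormalDist().inv_cdf(0.975)
lower = p_hat - z * se
upper = p_hat + z * se

print(f"se = {se:.4f}")
print(f"95% CI: ({100 * lower:.1f}%, {100 * upper:.1f}%)")
```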
69. A correct interpretation of this confidence interval is:
1. 95% of the candies are between 24.6% and 32.6% red,
2. 95% of the samples will have between 23.1% and 26.9% red candies,
3. The estimator of the percentage of red candies will be unbiased 19 times out of 20, ------[?]
4. The proportion of red candies in the bin is estimated to be in the indicated range, and such estimated ranges will contain the true value 95% of the time,
5. The hypothesis that the sampling distribution of the candies includes the 25th percentile is rejected 95% of the time, where α=0.05.
70. Instead of concentrating on the red candies, you could use a goodness-of-fit test to determine whether there was sufficient evidence to conclude that the four colours were not equally common. Such a procedure would consider the distribution of a test statistic assuming that the four colours were equally likely to occur. This assumption is called
1. [?] an unbiased estimate,
2. [?] the null hypothesis,
3. [?] the alternative hypothesis,
4. [?] a Type I error,
5. [?] a Type II error.
71. Which of the following would be the most appropriate test statistic in this situation?
1. [?]
2. [?]
3. [?]
4. [?]
5. [?]
72. The value of the test statistic is 4.188. This corresponds to a P-value of
1. around -0.21,
2. less than 0.0005,
3. between 0.01 and 0.02, ------[?]
4. slightly less than 0.25,
5. around 0.5.
73. The degrees of freedom for this test statistic are
1. 3
2. 4
3. 10 ------[?]
4. 489
5. irrelevant.
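
The test statistic quoted in question 72 can be reproduced from the colour counts, assuming the intended statistic is Pearson's chi-squared, the sum of (observed - expected)^2 / expected. The P-value below uses a closed form for the chi-squared upper tail that holds only for 3 degrees of freedom; the function name `chi2_sf_3df` is made up for this sketch.

```python
from math import erfc, exp, pi, sqrt

observed = {"green": 125, "yellow": 113, "orange": 112, "red": 140}
n = sum(observed.values())
expected = n / len(observed)   # equal shares under the null hypothesis

# Pearson goodness-of-fit statistic.
stat = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

def chi2_sf_3df(x):
    """Upper-tail probability P(chi-squared > x), valid for exactly 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

p_value = chi2_sf_3df(stat)
print(f"statistic = {stat:.3f}, df = {df}, P-value = {p_value:.3f}")
```

For other degrees of freedom a general routine (for example from a statistics library) would be needed instead of this special-case formula.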
74. On the basis of this data, one could reasonably state that
1. [?] there is overwhelming evidence that there are too many red candies,
2. [?] there is strong evidence that the different colours are not evenly distributed,
3. [?] although the number of red candies is somewhat more than expected by chance, there is not much evidence against the hypothesis that the four colours are equally likely,
4. [?] it has been conclusively shown that exactly 25% of the candies in the bin were red,
5. [?] although there are statistically significant differences between the proportions of colours among the candies, it has not been established whether there really are more red ones than green ones.
75. Suppose that the manager has indeed spiked the candy bin so that 31% of the candies are red. Then the answer to question would illustrate
1. the power of statistical calculation,
2. rejection of the null hypothesis,
3. a significant degree of freedom, ------[?]
4. a Type I error,
5. a Type II error.