i) Let $X = (X_1, \dots, X_n)$ be the blood pressure (measured in *mmHg*) and let $Y = (Y_1, \dots, Y_n)$ be the cortisol level (measured in *mcg/dL*) recorded for $n = 79$ patients recruited for a study in a hospital ($X_i$ and $Y_i$ are measurements for the same patient). What test is most appropriate to gather evidence towards the alternative hypothesis that blood pressure is associated with cortisol level? Please provide the reasoning in detail for your answer.

A) The two-sample paired *t*-test with the null hypothesis that the means of *X* and *Y* differ.

B) The test with the null hypothesis that the Pearson correlation coefficient between *X* and *Y* is zero.

C) The test with the null hypothesis that the regression coefficient is zero in a linear regression with response variable *X* (blood pressure) and explanatory variable *Y* (cortisol level).

(5 points)
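As a minimal sketch of the quantity named in option B, the sample Pearson correlation coefficient and the *t* statistic commonly used to test whether it is zero could be computed as follows. The function names and the five paired values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired measurements (blood pressure, cortisol), not study data.
x = [118.0, 125.0, 131.0, 140.0, 152.0]
y = [9.5, 11.0, 10.2, 13.1, 14.8]
r = pearson_r(x, y)
t = correlation_t_statistic(r, len(x))
print(f"r = {r:.3f}, t = {t:.3f}")
```

Under the usual normality assumptions, the statistic would be compared with a *t* distribution with *n* − 2 degrees of freedom.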

ii) Suppose that a treatment is proposed to reduce the duration from the date of infection to the time at which a first negative test is recorded in people with mild COVID-19 (call this time period *the duration*). Suppose that 27 people with mild COVID-19 (the study population) are administered the treatment and 73 people with mild COVID-19 (the control population) are not administered the treatment. Both populations are sampled from patients tested at the same clinic over the same period. Let *the durations* for the study sample be $X = (X_1, X_2, \dots)$, and *the durations* for the control sample be $Y = (Y_1, Y_2, \dots)$. What test is most appropriate to gather evidence towards the alternative hypothesis that the treatment reduces *the duration*? Please provide the reasoning in detail for your answer.

A) The *one-sided* two-sample *unpaired t*-test with $H_0$: the mean of $X$ is *greater than or equal to* the mean of $Y$.

B) The *one-sided* two-sample *unpaired t*-test with the null hypothesis that the mean of $X$ is *less than or equal to* the mean of $Y$.

C) The test against the null hypothesis that Spearman’s rank correlation coefficient between $X$ and $Y$ is zero.

D) The *one-sided* two-sample *paired t*-test against $H_0$: the mean difference between $X_i$ and $Y_i$ is *less than or equal to zero*.

E) The *two-sided* two-sample *paired t*-test with the null hypothesis that the mean difference between $X_i$ and $Y_i$ is *zero*.

(5 points)
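For illustration, a minimal sketch of how an unpaired two-sample *t* statistic could be computed for groups of different sizes. It assumes Welch's unequal-variance form, which the question does not specify, and the function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(xs, ys):
    """Welch's t statistic and approximate degrees of freedom.

    Assumption: the unequal-variance (Welch) form is used; a pooled-variance
    Student t-test would use a different standard error.
    """
    nx, ny = len(xs), len(ys)
    vx = variance(xs) / nx  # squared standard error of the mean of xs
    vy = variance(ys) / ny  # squared standard error of the mean of ys
    t = (mean(xs) - mean(ys)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical durations in days (treated, control), not study data.
treated = [10.0, 12.0, 11.0]
control = [14.0, 15.0, 13.0, 16.0, 17.0]
t, df = welch_t(treated, control)
print(f"t = {t:.3f}, df = {df:.2f}")
```

A negative statistic here means the treated sample mean is below the control sample mean; whether that counts as evidence depends on the direction of the one-sided alternative.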

iii) Road vehicle accidents involving ambulances have more detrimental outcomes than accidents involving other similarly sized vehicles (Ray and Kupas, 2005). Measures to avoid such accidents are continually being refined by organizations involved in emergency medical services. Suppose that a city council is interested in knowing whether adoption of such measures has led to an improvement over the last decade. Suppose that the ratio between the number of accidents involving ambulances (the numerator) and the number of kilometers driven by ambulances (the denominator) has been recorded ($r_t$, with units "number of accidents per kilometer year") for each year $t$ over the past decade. Which single one of the following statistical quantities is most relevant for investigating whether or not measures are leading to improvements? Please provide the reasoning in detail for your answer.

A) The sample standard deviation of $r_t$.

B) The sample mean of $r_t$.

C) The Pearson correlation coefficient $\rho$ between $r_t$ and $t$.

D) The regression coefficient for $t$ in a linear regression with $r_t$ as the response variable and $t$ as the explanatory variable.

E) The regression coefficient for $r_t$ in a linear regression with $r_t$ as the explanatory variable and $t$ as the response variable.
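As a minimal sketch of what a regression coefficient for a yearly rate looks like in practice, the least-squares slope of a rate series against the year index could be computed as below. The function name and the ten yearly values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def ols_slope(ts, rs):
    """Least-squares slope of the response rs regressed on the predictor ts."""
    n = len(ts)
    if n != len(rs) or n < 2:
        raise ValueError("need at least two paired observations")
    mt, mr = sum(ts) / n, sum(rs) / n
    s_tt = sum((t - mt) ** 2 for t in ts)
    s_tr = sum((t - mt) * (r - mr) for t, r in zip(ts, rs))
    return s_tr / s_tt


# Hypothetical yearly accident rates (arbitrary units) for years 1..10.
years = list(range(1, 11))
rates = [5.0 - 0.2 * t for t in years]
slope = ols_slope(years, rates)
print(f"estimated change in rate per year: {slope:.3f}")
```

The slope is expressed in units of the rate per year, so its sign and size describe how the rate changes over time, whereas a correlation coefficient is unitless.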

## Problem 2: Bayes’ rule

A study was conducted to assess the sensitivity and specificity of four different human immunodeficiency virus (HIV) serology tests (Koblavi-Dème et al. 2001). The *Determine* test, one of the four, was developed by Abbott Laboratories (an American provider of health care, medical devices and pharmaceuticals) and was found to have a true negative rate (also called specificity) of 99.4% and a true positive rate (also called sensitivity) of 100%. The true negative rate of a test for a disease is the probability that someone without the disease tests negative. The true positive rate of a test for a disease is the probability that someone with the disease tests positive. HIV may be transmitted from an expecting parent to their child during childbirth or to the fetus during pregnancy (throughout, assume that there is no other way for a newborn to be infected). Treatment with the drugs *zidovudine* or *nevirapine* has been shown to reduce the rate of these sorts of transmission of HIV by 38% to 50% in the absence of other intervention (Koblavi-Dème et al. 2001).
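The two definitions above can be illustrated with a minimal sketch that computes both rates from counts in a 2×2 table of test results. The function names and the counts are hypothetical and are not figures from the cited study.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: P(test positive | disease present)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """True negative rate: P(test negative | disease absent)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, not taken from the study.
sens = sensitivity(true_pos=45, false_neg=5)    # 45 of 50 infected test positive
spec = specificity(true_neg=190, false_pos=10)  # 190 of 200 uninfected test negative
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```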

a) Suppose that an expecting parent is infected with HIV and they are treated with *zidovudine* or *nevirapine* during pregnancy. Suppose that after they give birth, a *Determine* serology test reports a positive test for HIV. What is the probability that the child does not have HIV? Round your answer to the nearest tenth of a percent.

(6 points)

b) UNAIDS (an organization established by the United Nations Economic and Social Council) estimates the prevalence of HIV in Côte d’Ivoire among people aged 15-49 to be 2.6%. If a *Determine* serology test reported a positive test for HIV in someone selected uniformly at random among all people in Côte d’Ivoire aged 15-49, what is the probability that the person does not have HIV? Round your answer to the nearest tenth of a percent.

(4 points)
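As a minimal sketch of Bayes' rule for a diagnostic test, the function below returns the probability that a person who tests positive does not have the disease. The function name and the input values are hypothetical; they are deliberately not the values given in this problem.

```python
def prob_healthy_given_positive(prevalence, sensitivity, specificity):
    """P(no disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence                    # P(positive and diseased)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(positive and healthy)
    return false_pos / (true_pos + false_pos)


# Hypothetical inputs: 10% prevalence, 90% sensitivity, 95% specificity.
p = prob_healthy_given_positive(prevalence=0.10, sensitivity=0.90, specificity=0.95)
print(f"P(no disease | positive) = {100 * p:.1f}%")
```

With these made-up inputs, 0.045 of the population are false positives and 0.09 are true positives, so one third of the positive results come from people without the disease.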

c) In the USA, according to the Centers for Disease Control (a public health institute within the United States Department of Health and Human Services), if someone has a positive serology test for HIV they are not diagnosed as HIV-positive until a second follow-up test also yields a positive test result. What is the probability that someone is incorrectly diagnosed as HIV-positive (*i.e.*, if someone is *not* infected with HIV, what is the probability that their first test and also their second follow-up test are both positive)? Suppose that both tests are *Determine* serology tests, and also assume that the test results are statistically independent. Express your answer as an expected number of events in a million (*e.g.*, something like ‘a 36 in a million chance’ or ‘a one in a million chance’). Also: In one sentence, what is a *possible* argument as to why the assumption of independence of the two test results might be wrong? (Your argument does not have to be sound, but it must be valid without being tautological).

(3 points)
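For illustration, a minimal sketch of how independence turns a single-test false positive rate into a rate for repeated tests, expressed per million. The function name and the 99% specificity are hypothetical and differ from the value given in this problem.

```python
def repeated_false_positives_per_million(specificity, n_tests=2):
    """Expected number per million uninfected people who test positive on all
    n_tests tests, assuming the test results are independent."""
    false_positive_rate = 1.0 - specificity
    return (false_positive_rate ** n_tests) * 1_000_000


# Hypothetical specificity of 99%, so each test has a 1% false positive rate.
per_million = repeated_false_positives_per_million(specificity=0.99)
print(f"about {per_million:.0f} in a million")
```

The multiplication of the two rates is only justified under independence; if the same factor causes both tests to misfire, the joint probability could be much closer to the single-test rate.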

d) What is the probability that an HIV-infected expecting parent transmits HIV to their child either during childbirth or through transmitting HIV to the fetus during pregnancy, given that the parent has *not* received treatment with the drugs *zidovudine* or *nevirapine*, and in the absence of other intervention, according to the preamble of this problem (in concordance with Koblavi-Dème et al. 2001)?

(2 points)