1) Download the Nutrition study data and read it into RStudio. We will work with the entire data set for this assignment. Use the ifelse( ) function to create two new categorical variables. The variables are:
Alcohol_use: 1 (yes) if Alcohol > 0
0 (no) if Alcohol = 0
Age_retired: 1 if Age >= 65
0 if Age < 65
If you have trouble using the ifelse( ) function in R, you can create these new categorical variables in Excel instead and then read them into R along with the rest of the data set. Either approach works.
Report the counts for each value of these two new categorical variables.
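A minimal sketch of problem 1 in base R follows. It assumes the data contain columns named Age and Alcohol; those names, and the file name nutrition.csv in the comment, are assumptions. A small made-up data frame stands in for the real data so the example runs on its own.

```r
# Sketch for problem 1. ASSUMPTION: the real data would be read with
# something like
#   nutrition <- read.csv("nutrition.csv")   # hypothetical file name
# and contain columns named Age and Alcohol. Made-up values stand in here.
nutrition <- data.frame(
  Age     = c(45, 70, 65, 30, 82, 58),
  Alcohol = c(0, 3.5, 0, 12.0, 0.2, 1.0)
)

# ifelse(test, yes, no) works element by element over the whole column
nutrition$Alcohol_use <- ifelse(nutrition$Alcohol > 0, 1, 0)  # 1 = any alcohol
nutrition$Age_retired <- ifelse(nutrition$Age >= 65, 1, 0)    # 1 = age 65 or older

# Counts for each value of the two new variables
alcohol_counts <- table(nutrition$Alcohol_use)
retired_counts <- table(nutrition$Age_retired)
print(alcohol_counts)
print(retired_counts)
```

If Alcohol or Age has missing values, ifelse( ) returns NA for those rows, and table( ) leaves them out unless it is called with useNA = "ifany".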
2) For this problem, we are going to see if smoking (SMOKE) is related to body mass (QUETELET). Here, Quetelet is the continuous dependent response variable (Y) and Smoke is the categorical explanatory variable (X). Please complete the following:
a) Obtain descriptive statistics on Y for each group. In a table, report each group's sample size, mean, standard deviation, and variance.
b) Clearly state the null and alternative hypotheses in words and symbols.
c) Use R to obtain the test statistic and p-value for the classic pooled-variance two-sample t-test. Report the test statistic and p-value, and then state your decision about the null hypothesis.
d) Report the formula for the test statistic in part c), and verify R's result by computing the statistic by hand from the descriptive statistics in part a).
e) Calculate and report a confidence interval for the mean of each group. Interpret the result based on these confidence intervals. Is it consistent with the hypothesis test result? If the two disagree, which should you believe?
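Parts a), c), d) and e) can be sketched in base R as follows. This is a sketch under stated assumptions, not the required solution: the columns are assumed to be named QUETELET and SMOKE, SMOKE is assumed to have exactly two levels, and the numbers are made-up toy values. For part d), it uses the standard pooled-variance formula, t = (ybar1 - ybar2) / (sp * sqrt(1/n1 + 1/n2)) with sp^2 = ((n1 - 1)s1^2 + (n2 - 1)s2^2) / (n1 + n2 - 2) and n1 + n2 - 2 degrees of freedom, and checks that it reproduces t.test( ).

```r
# Sketch for problem 2, parts a), c), d) and e).
# ASSUMPTIONS: columns named QUETELET and SMOKE, SMOKE has two levels;
# the values below are made up, not the Nutrition study data.
nutrition <- data.frame(
  SMOKE    = c(0, 0, 0, 1, 1, 1),
  QUETELET = c(22, 24, 26, 20, 23, 26)
)
groups <- split(nutrition$QUETELET, nutrition$SMOKE)  # one vector per SMOKE level

# a) Per-group sample size, mean, SD and variance (one row per group)
describe <- function(y) c(n = length(y), mean = mean(y), sd = sd(y), var = var(y))
desc_table <- t(sapply(groups, describe))
print(desc_table)

# c) Classic pooled-variance two-sample t-test (var.equal = TRUE);
#    R's default, var.equal = FALSE, would give Welch's test instead
res <- t.test(QUETELET ~ SMOKE, data = nutrition, var.equal = TRUE)
print(res)
alpha    <- 0.05  # conventional level; not specified in the assignment
decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"

# d) The same statistic by hand from the part a) summaries:
#    sp^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)
#    t    = (ybar1 - ybar2) / (sp * sqrt(1/n1 + 1/n2)),  df = n1 + n2 - 2
pooled_t <- function(n1, m1, s1, n2, m2, s2) {
  df     <- n1 + n2 - 2
  sp2    <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df
  t_stat <- (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
  c(t = t_stat, df = df, p = 2 * pt(-abs(t_stat), df))  # two-sided p-value
}
hand <- pooled_t(desc_table[1, "n"], desc_table[1, "mean"], desc_table[1, "sd"],
                 desc_table[2, "n"], desc_table[2, "mean"], desc_table[2, "sd"])
print(hand)

# e) t-based 95% CI for each group mean: ybar +/- t_{0.975, n-1} * s / sqrt(n)
group_ci <- function(y, level = 0.95) {
  half <- qt(1 - (1 - level) / 2, df = length(y) - 1) * sd(y) / sqrt(length(y))
  c(lower = mean(y) - half, upper = mean(y) + half)
}
ci_table <- t(sapply(groups, group_ci))
print(ci_table)
```

When comparing the intervals with the test, keep in mind that the test is based on the standard error of the difference (roughly sqrt(se1^2 + se2^2)), which is smaller than se1 + se2, so two group intervals can overlap even when the difference in means is statistically significant.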
3) Moving into a more data-analytic framework, the next question is whether any two-group categorical variables exhibit differences in the Quetelet variable. Stated as an assignment: using Quetelet as the dependent response variable (Y), conduct hypothesis tests and obtain confidence intervals (for each group) to determine whether there are group mean differences for the following categorical variables:
Gender (male vs female)
Age_retired
Alcohol_use
You will need to clearly set up the null and alternative hypotheses, conduct the test with appropriate statistics, and interpret the individual group confidence intervals. Please use tables to summarize your findings. What decisions do you make from these results? How would you summarize the "story" that emerges from these analyses on the Quetelet (body mass) variable?
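Since problem 3 repeats the same pooled test for several grouping variables, a small helper can build one summary table in a single pass. pooled_tests( ) is a hypothetical helper written for this sketch, not a library function; the column names (QUETELET, GENDER, Age_retired, Alcohol_use) and the toy values are assumptions, and each grouping variable is assumed to have exactly two levels.

```r
# Sketch for problem 3: one pooled two-sample t-test per grouping variable,
# collected into a single summary table.
# ASSUMPTIONS: column names below, two levels per grouping variable;
# the values are made up, not the Nutrition study data.
nutrition <- data.frame(
  QUETELET    = c(22, 24, 26, 20, 23, 26, 25, 21),
  GENDER      = c(1, 1, 1, 1, 2, 2, 2, 2),
  Age_retired = c(0, 0, 1, 1, 0, 1, 0, 1),
  Alcohol_use = c(1, 1, 0, 0, 1, 0, 1, 0)
)

# pooled_tests() is a helper written for this sketch, not a library function
pooled_tests <- function(data, response, groups) {
  rows <- lapply(groups, function(g) {
    y  <- data[[response]]
    x  <- factor(data[[g]])
    lv <- levels(x)
    stopifnot(length(lv) == 2)  # the two-sample test needs exactly two groups
    y1 <- y[x == lv[1]]
    y2 <- y[x == lv[2]]
    res <- t.test(y1, y2, var.equal = TRUE)  # classic pooled-variance test
    data.frame(variable = g,
               level_1 = lv[1], mean_1 = mean(y1),
               level_2 = lv[2], mean_2 = mean(y2),
               t = unname(res$statistic), df = unname(res$parameter),
               p_value = res$p.value)
  })
  do.call(rbind, rows)
}

quetelet_tab <- pooled_tests(nutrition, "QUETELET",
                             c("GENDER", "Age_retired", "Alcohol_use"))
print(quetelet_tab, row.names = FALSE)
```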
4) Using the CHOLESTEROL variable as the dependent response variable (Y), conduct hypothesis tests and obtain confidence intervals (for each group) to determine if there are group mean differences relative to:
Gender (male vs female)
Smoke
Age_retired
Alcohol_use
You will need to clearly set up the null and alternative hypotheses, conduct the test with appropriate statistics, and interpret the individual group confidence intervals. How would you summarize the "story" that emerges from these analyses on the CHOLESTEROL variable?
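Problem 4 asks for "appropriate statistics" without saying whether the pooled test is always appropriate. One possible check, offered as an assumption rather than a requirement of the assignment, is to test the equal-variance assumption with var.test( ) and report Welch's t-test alongside the pooled one. The CHOLESTEROL values for the two hypothetical groups below are made up.

```r
# Sketch for problem 4: checking the equal-variance assumption behind the
# pooled test. ASSUMPTION: var.test() plus Welch's test is one possible
# reading of "appropriate statistics", not something the assignment requires.
# Made-up CHOLESTEROL values for two hypothetical groups.
chol_g1 <- c(180, 210, 195, 240, 205)
chol_g2 <- c(230, 260, 190, 300, 275)

f_res      <- var.test(chol_g1, chol_g2)                  # F test of equal variances
pooled_res <- t.test(chol_g1, chol_g2, var.equal = TRUE)  # classic pooled test
welch_res  <- t.test(chol_g1, chol_g2)                    # Welch test (R's default)

comparison <- data.frame(
  test    = c("pooled", "Welch"),
  t       = c(unname(pooled_res$statistic), unname(welch_res$statistic)),
  df      = c(unname(pooled_res$parameter), unname(welch_res$parameter)),
  p_value = c(pooled_res$p.value, welch_res$p.value)
)
print(comparison, row.names = FALSE)
cat("F test of equal variances: p =", format(f_res$p.value, digits = 4), "\n")
```

With equal group sizes, as in this toy example, the pooled and Welch t statistics coincide and only the degrees of freedom (and therefore the p-values) differ; with unequal sizes and unequal variances the two tests can disagree more noticeably.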
5) Typically, in an open-ended data-analytic project, the analyst would check whether any of the potential response variables are related to the explanatory categorical variables of interest. To limit the amount of analytical work, use 95% confidence intervals for the FAT, FIBER, and ALCOHOL variables to compare the group means across:
Gender (male vs female)
Smoke
Age_retired
Alcohol_use
Do NOT conduct or report formal hypothesis tests! How would you summarize the "story" that emerges from these analyses?
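For problem 5, the intervals for every response and grouping variable can be produced in one loop. This sketch assumes the column names shown and two levels per grouping variable, and it uses made-up values; the overlap column is only an informal screen, in keeping with the instruction not to run formal tests.

```r
# Sketch for problem 5: 95% confidence intervals only, no hypothesis tests.
# ASSUMPTIONS: column names below, two levels per grouping variable;
# all values are made up, not the Nutrition study data.
nutrition <- data.frame(
  FAT         = c(60, 75, 82, 55, 90, 70, 66, 80),
  FIBER       = c(12, 15, 9, 20, 11, 14, 18, 10),
  ALCOHOL     = c(0, 2.5, 0, 10, 1, 0, 4, 0),
  GENDER      = c(1, 1, 1, 1, 2, 2, 2, 2),
  SMOKE       = c(1, 2, 1, 2, 1, 2, 1, 2),
  Age_retired = c(0, 0, 1, 1, 0, 1, 0, 1)
)
nutrition$Alcohol_use <- ifelse(nutrition$ALCOHOL > 0, 1, 0)

# t-based confidence interval for one group's mean
ci_mean <- function(y, level = 0.95) {
  half <- qt(1 - (1 - level) / 2, df = length(y) - 1) * sd(y) / sqrt(length(y))
  c(mean = mean(y), lower = mean(y) - half, upper = mean(y) + half)
}

responses <- c("FAT", "FIBER", "ALCOHOL")
groups    <- c("GENDER", "SMOKE", "Age_retired", "Alcohol_use")
rows <- list()
for (resp in responses) {
  for (g in groups) {
    # One row per level of the grouping variable: mean, lower, upper
    cis <- t(sapply(split(nutrition[[resp]], nutrition[[g]]), ci_mean))
    # Informal screen: do the two groups' intervals overlap?
    overlap <- cis[1, "lower"] <= cis[2, "upper"] && cis[2, "lower"] <= cis[1, "upper"]
    rows[[length(rows) + 1]] <- data.frame(response = resp, variable = g,
                                           level = rownames(cis), cis,
                                           overlap = overlap, row.names = NULL)
  }
}
ci_grid <- do.call(rbind, rows)
print(ci_grid, row.names = FALSE)
```

In the toy data, the ALCOHOL interval for Alcohol_use = 0 collapses to [0, 0], since that group is defined by Alcohol = 0; the same will happen with the real data.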
6) Given what you have found so far comparing groups, what surprises you? What turned up that you did not expect, if anything? What might explain these results? What do you think the next steps in any further analysis of this Nutrition data should be?
Your write-up should address each task.