As displayed in the image about the scientific method above, there are several steps that go into taking a topic of research interest from the observation stage through experimentation and eventually being able to draw conclusions from your data (i.e., from the study that you intend to conduct). For a brief overview of the scientific method, please review this document:
Scientific Method.pdf
Watch the presentation about this information as well.
Before moving further, it’s important to ensure everyone understands the difference between reliability and validity. This is often covered in undergraduate research methods courses, but as a brief refresher, please review this website, which discusses each. Watch the presentation below, which talks through this information as well.
Next, read through this website. This is a bit more technical and longer, but explains several different components of research design that are essential to understand if you’re at the phase of beginning your capstone project.
This document highlights several types of research designs in a fairly understandable way and provides details about the type of data you can expect to collect from each approach.
Major-Types-of-Research-Designs.pdf
This document also provides similar information:
Research Designs-Exploratory.pdf
Watch this presentation, which talks through this as well. Knowing this will be extremely helpful as you craft a fully developed research proposal.
Sampling is the next area of importance. Before anything else, please take a moment to confirm you fully understand the difference between a sample and the larger population from which it is drawn, as well as the relationship between the two (including the limitations of sampling). The image below depicts this relationship, and this presentation talks through the topic in more depth.
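To make the sample/population relationship concrete, here is a minimal Python sketch. The population is entirely made up (10,000 simulated ages centered on 50 with an SD of 5), and the sample size of 100 is an assumption chosen to match the age example used later. It only illustrates that a sample mean approximates, but rarely equals, the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated ages (mean 50, SD 5). Made-up values.
population = [random.gauss(50, 5) for _ in range(10_000)]

# A simple random sample of 100 people drawn from that population.
sample = random.sample(population, 100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# The gap between the two is sampling error: one limitation of sampling.
sampling_error = sample_mean - population_mean

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"difference:      {sampling_error:+.2f}")
```

Rerunning with a different seed gives a different sample and a slightly different sample mean, which is why a single sample can only estimate the population value.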
Take a moment to review the two images below. In the first figure, a normal curve is shown to represent the distribution of values in a given sample. If you asked 100 people their age and the average age of people from this sample was 50, then this becomes the middle point of the curve, which we’ll refer to as the mean (this is an important distinction since there are multiple ways to measure central tendency, including using the median or the mode instead of the mean to represent the center of the data).
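As a quick illustration of the three measures of central tendency, the sketch below uses Python's standard `statistics` module on a small, hypothetical list of ages (the values are invented for the example and are not from any real study).

```python
import statistics

# Hypothetical ages from ten respondents (made-up data).
ages = [22, 25, 25, 31, 38, 44, 52, 67, 71, 75]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value once the ages are sorted
mode_age = statistics.mode(ages)      # most frequently occurring value

print("mean:  ", mean_age)
print("median:", median_age)
print("mode:  ", mode_age)
```

The three measures disagree here (45, 41, and 25), which is why it matters to state which one you are reporting.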
Moving along the X axis, you can see the numbers go from -3 through +3. These represent standard deviations (denoted by “SD” here); the SD describes how far, on average, the individual ages in this example differ from the mean. If the SD is 5, then 55 years old is +1 SD from the mean. If you’re 35 years old in our sample, then you’re -3 SD from the mean. Mathematically, 99.7% of a normally distributed sample falls within +/- 3 SD of the mean. There are often a few people outside of the +/- 3 SD range, such as one (or more) people in our sample who might be 75 (+5 SD from the mean), or one (or more) people who are 30 (-4 SD from the mean). Statistically, we might refer to these few people not included in the 99.7% as outliers. As you analyze data from a research study, you will have to decide what to do with the data from these individuals, since it may or may not actually represent the larger population from which the sample has been drawn.
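The arithmetic behind these SD distances can be sketched as follows. The helper names `z_score` and `is_outlier` are hypothetical, and the 3 SD cutoff is simply the convention used in the paragraph above, not a universal rule.

```python
def z_score(value, mean, sd):
    """Distance of a value from the mean, expressed in standard deviations."""
    return (value - mean) / sd


def is_outlier(value, mean, sd, cutoff=3.0):
    """Flag values lying more than `cutoff` SDs from the mean (assumed rule)."""
    return abs(z_score(value, mean, sd)) > cutoff


# Values from the age example: mean of 50, SD of 5.
mean_age, sd_age = 50, 5

for age in (55, 35, 75, 30):
    z = z_score(age, mean_age, sd_age)
    label = "outlier" if is_outlier(age, mean_age, sd_age) else "within +/- 3 SD"
    print(f"age {age}: {z:+.1f} SD ({label})")
```

With these numbers, 55 and 35 sit at +1 and -3 SD, while 75 and 30 sit at +5 and -4 SD and are flagged.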
As a researcher, you need to determine, based on your expertise and what other researchers have already published, whether the curve for your sample is comparable to that of the larger population. For example, if you wanted to assess what percentage of the US population was African American but you only collected data from a sample of people living in predominantly Caucasian zip codes, you would get a skewed view of what the actual population looks like.
Below is just another way to view the normal curve. The SD numbers are missing, but you already know the middle line is the mean and one line in either direction is +/- 1 SD. This figure breaks down the percentage of the sample that falls within a specific range of SDs. For example, the range of -1 SD through +1 SD covers about 68% of the overall sample, 34% in either direction. If you go out to +/- 2 SDs, you add another 27% (13.5% in either direction), which totals about 95% of the sample. Simply adding or removing one person from your sample, such as an outlier, will not change the percentages under the normal curve; however, it will change the mean and SD values.
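These percentages can be checked directly. The sketch below uses the standard identity that the share of a normal distribution within k SDs of the mean is erf(k / sqrt(2)); the function name `share_within` and the small list of ages are made up for illustration. The second half shows how a single outlier shifts the mean and SD.

```python
import math
import statistics


def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} SD: {share_within(k):.2%}")

# Hypothetical ages, before and after adding one outlier (made-up data).
ages = [45, 48, 50, 52, 55]
ages_with_outlier = ages + [90]

print("mean:", statistics.mean(ages), "->", round(statistics.mean(ages_with_outlier), 2))
print("SD:  ", round(statistics.stdev(ages), 2), "->", round(statistics.stdev(ages_with_outlier), 2))
```

The exact shares are roughly 68.27%, 95.45%, and 99.73%, which the text rounds to 68%, 95%, and 99.7%.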
The image below shows visual representations of positive and negative skew (which does not refer to being good or bad, but rather, the direction of the skew). Going back to the age example, let’s say you have a mean age of 50, but there are a bunch of 45 year olds in your sample and also a handful of 90 year olds. Your average (using the mean) could be 50, but there are many more people who are younger. Even though it’s not indicated on the image, the Y axis represents the number of people at each age. The whole sample is represented by the letter N, and a portion of the sample is represented by the lowercase letter n. In this example of positive skew, there will be a large n clustered at the younger ages and a small n at the higher end of the scale; hence the curve is not symmetrical (i.e., “normal”), it’s skewed.
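The positive-skew example can be reproduced with a small sketch. The counts (80 people aged 45 and 10 people aged 90) are assumptions chosen so that the mean works out to 50, as in the paragraph above. The skewness formula used here is the common moment-based one (third central moment divided by the 1.5 power of the second); other definitions exist.

```python
import statistics

# Hypothetical sample: many 45 year olds and a handful of 90 year olds.
ages = [45] * 80 + [90] * 10

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)


def skewness(values):
    """Moment-based skewness: positive means a longer tail to the right."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


print("mean:    ", mean_age)
print("median:  ", median_age)
print("skewness:", round(skewness(ages), 2))
```

The mean (50) is pulled above the median (45) by the few older respondents, and the skewness is positive, matching the direction of the tail.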
Sometimes skew is expected, such as if you sampled people living in a nursing home, where most would be elderly but there might be a few younger people mixed in for various reasons (negative skew). However, sometimes skew can be problematic, such as the earlier example of poor sampling of African Americans. If you know that the percentage of African Americans across the US is about 15% but in your sample it’s only 3%, you can conclude that African Americans are being under-sampled and Caucasians are being over-sampled. You may want to oversample purposefully; however, if it is unanticipated, it can be difficult to interpret your data meaningfully.
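A simple way to spot this kind of problem is to compare a group's share of the sample with its share of the population. The sketch below uses the 3% and 15% figures from the paragraph above; the function names and the 20% tolerance band are assumptions made for illustration, not an established standard.

```python
def representation_ratio(sample_share, population_share):
    """Sample share divided by population share (1.0 means proportional)."""
    return sample_share / population_share


def describe(ratio, tolerance=0.2):
    """Label a ratio using an arbitrary, assumed tolerance band around 1.0."""
    if ratio < 1 - tolerance:
        return "under-sampled"
    if ratio > 1 + tolerance:
        return "over-sampled"
    return "roughly proportional"


# Figures from the example: 3% of the sample versus about 15% of the population.
ratio = representation_ratio(0.03, 0.15)
print(f"ratio: {ratio:.2f} ({describe(ratio)})")
```

Here the group appears in the sample at about one fifth of the rate you would expect from the population figure.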
The final topic we need to cover is research questions and research hypotheses. The first step is to understand what an operational definition means. This simply means that the definition you use to represent something is the standard to be measured in your study. For example, if you give a survey to measure quality of life, the inferences you make when you analyze your data are meaningless without context. You may have great data about quality of life in this example, but what does quality of life represent in your study? What does the phrase mean? Since it could mean something different to different people, you need to operationally define the term first, such as quality of life being a representation of someone’s physical, social, and emotional well-being.
Even with your operational definition, you still lack any indication of how the term is being measured. In this example, you might note that quality of life is going to be measured by an existing questionnaire that is both valid and reliable, which has already been designed (by someone else) to measure this construct. You will need to identify your dependent and independent variables as well, for example quality of life as being dependent upon something that is otherwise independent, such as age or ethnicity. “Quality of life” in this example is now specific (by your operational definition) and is both measurable and quantifiable (by using an existing questionnaire, such as one that ranks someone’s opinion with “on a scale from 1 to 10, how much do you agree with the following statement…” type questions). This weblink provides a brief overview of independent variables (IV) and dependent variables (DV).
For the purposes of this course, the last step is to generate strong research questions and corresponding hypotheses (in a full research proposal, there would be additional steps to work through subsequent to this). This web article discusses finding inspiration for coming up with your research questions. The handout below provides a brief summary of how to narrow the scope of your topic into an appropriate research question format.
Developing a research question.pdf
This PDF helps you work through bringing clarity to your research questions.
Research questions.pdf
As this image demonstrates, there is a connection between your larger problem statement, your research questions, and the hypotheses. This brief PowerPoint slide presentation talks about research hypotheses in general. Returning to our example of quality of life and age, in such a study our research question might be to examine how age impacts quality of life for a specific group of people in the US, and we might hypothesize that as age increases, reported quality of life decreases (to connect this with information from earlier in this learning activity, we are hypothesizing a negative correlation between the independent variable of age and the dependent variable of quality of life).
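To show what a negative correlation looks like numerically, here is a minimal sketch that computes Pearson's r by hand. The ages and quality-of-life scores are invented for illustration (scores on an assumed 1 to 10 scale), and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical data: age (IV) and reported quality of life (DV, 1-10 scale).
ages = [25, 35, 45, 55, 65, 75]
quality_of_life = [8.5, 8.0, 7.2, 6.9, 6.1, 5.4]

r = pearson_r(ages, quality_of_life)
print(f"r = {r:.3f}")
```

An r close to -1 means that, in this made-up sample, higher ages go with lower reported quality of life; an r near 0 would mean no linear relationship.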
You represent your primary hypothesis as H1. Keep in mind that you are studying social science research, and in social science, you rarely (if ever) can definitively state that you’ve “proven” something. Rather, your research is typically about collecting enough data so that your evidence “suggests” (hopefully strongly suggests) whatever you indicated in your hypothesis. That evidence ultimately speaks to the larger research question, which can then be generalized back up to the original problem statement (so that you wind up adding a specific piece to the larger puzzle, which is the problem you identified in the first place). Therefore, your goal is not to confirm your hypothesis, but rather, it’s to “reject” the null hypothesis (H0). The null hypothesis in this example is that age and quality of life do not have a negative relationship to one another. Keep in mind that your H1 and H0 are tested against your unique sample, and the result may or may not generalize to the larger population. After you collect your data and analyze the results, this is where the stats don’t lie, but what they mean in real life (rather than in a lab setting) is up for debate.