This is the second in a series of posts that tell the behind-the-scenes story of the Chef Community Cookbooks Survey—how the project began, the science behind it, and what the results showed. In this post, we talk to Dr. Nicole Forsgren about how she developed and validated the survey. The first post in the series can be found here.
From discussions with Sean O’Meara and Nathen Harvey, Nicole knew the themes to include. Those themes became the latent constructs that the survey would measure. They were:
Along with the constructs, the survey would also need to include control variables, which would be used to help analyze the data.
Control variables represent things that are outside the control of the researcher. For example, it wasn’t possible for the cookbook survey to be randomized, or to have a “control group” included in the design of the experiment, so control variables were used to statistically account for these restrictions. Controls include information about the respondents (such as demographics) and about the environment. The community cookbooks survey included controls for gender, age, job history, and platforms that the participant used (for example, Linux).
It might be surprising to see that gender is one of the survey’s controls. Nicole says, “Gender is a standard control variable. For example, it wasn’t really possible to control for gender in the design of the survey or in its delivery. I couldn’t give one version of the survey to half of the women (or men, or those who identify otherwise) and another version to the remaining half. Sometimes we see differences in the responses to questions according to gender. These differences aren’t always explainable. As a sociological or anthropological phenomenon, we know that the way girls and boys are raised, particularly around technology, isn’t the same. Similarly, we also see generational differences. When I run analyses, the inclusion of control variables allows me to see more clearly the role of the key independent variables on the outcome variable, above and beyond the control variables like gender, or years of experience, or age.”
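To make this concrete, here is a minimal sketch of how controls might enter such an analysis, using Python and statsmodels. The column names (intent_to_use, perceived_usefulness, gender, age, years_experience) and the file name are hypothetical placeholders, not the survey’s actual variables.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical survey data: one row per respondent, with coded responses.
df = pd.read_csv("survey_responses.csv")

# Regress the outcome on the key independent variable while statistically
# accounting for the controls (gender, age, years of experience).
model = smf.ols(
    "intent_to_use ~ perceived_usefulness + C(gender) + age + years_experience",
    data=df,
).fit()

# The coefficient on perceived_usefulness now reflects its relationship to the
# outcome above and beyond the control variables.
print(model.summary())
```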
When it came to developing the survey, Nicole knew that she could use several questions that had been previously validated in the literature, since the model she was using to evaluate and investigate community cookbooks was very much like TAM (see blog post 01). “Nathen, Sean and I joked that this is legal plagiarism. There are some things we copy when we do research that are very, very good to copy. That’s because we want to use measures that are known to be valid and reliable. We just modify the questions to suit our particular context.”
In fact, “question” might be misleading (sometimes, researchers use the term “item”). Most questions in a well-designed survey are actually written as statements, such as “Community cookbooks are easy to use” rather than “Are community cookbooks easy to use?” The reason for using statements is that questions tend to yield only “yes” or “no” answers. A statement can more easily measure people’s attitudes because participants can rate each statement on a scale. In the case of the community cookbooks survey, the scale had five options, ranging from “Strongly disagree” to “Strongly agree.” (These are called Likert-type items, and the key is to offer an odd number of responses, so that a neutral response is possible. Most Likert-type survey items offer five or seven options.)
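As a small illustration, Likert-type responses are usually coded as ordered integers before analysis. The mapping below sketches that convention; the wording of the neutral option is an assumption, not necessarily the survey’s exact response labels.

```python
# Conventional numeric coding for a five-point Likert-type item.
# The neutral label is illustrative; the survey's exact wording may differ.
LIKERT_SCALE = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,  # the midpoint an odd-numbered scale allows
    "Agree": 4,
    "Strongly agree": 5,
}

responses = ["Agree", "Strongly agree", "Neither agree nor disagree"]
print([LIKERT_SCALE[r] for r in responses])  # [4, 5, 3]
```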
People often wonder why there are questions that seem quite similar to each other in surveys. Nicole explains, “That’s for the latent constructs. If you want a quick and dirty survey, usually because people can’t or won’t spend much time on it, you send out, maybe, a five-question survey, because you know that’s all the time you’ll get. In that case, you don’t use constructs, which need multiple questions for adequate measurement. In a short survey, you measure each attribute with a single question. This is risky because you might choose the wrong word and that question will be misread, misunderstood, or misconstrued.
“If you want to do a latent construct, if you really want to capture the underlying idea, then you find two to five questions and you word the questions differently so they are similar but not the same.”
For example, in the survey, the perceived usefulness construct has these questions:
Nicole says, “Those are three closely related items that capture the essence of ‘usefulness,’ but they don’t use the same wording. When I do factor analysis, if everyone misunderstands one question the exact same way, I will see it because one question will not load. The others will load together and one will just be off. We always try to have at least three items. If one of them is off, maybe I can use the other two and toss the one. Maybe I can rewrite the one. If I only have two items and one is off, which one is it?”
Note: Factor analysis is used to explore the data for patterns, confirm a hypothesis, or reduce many variables to a more manageable number. In this case, it’s the latter: factor analysis is used to see whether many items that we believe explain a latent construct can indeed be used to explain that construct. Factor loading can also be thought of as a statistical version of a card-sorting task. If someone has a stack of 3×5 cards, each with a question on it, and they sort the cards into piles, then similar questions should go into the same pile. If one question is consistently misinterpreted, it won’t fall into the same pile as the other, related questions.
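As a rough sketch of what “loading” looks like in practice, the snippet below fits a two-factor model to coded item responses and inspects the loadings. The item names, the file name, and the choice of two factors are hypothetical, purely for illustration.

```python
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# Hypothetical coded Likert responses (1-5), one column per survey item.
df = pd.read_csv("coded_responses.csv")
items = ["useful_1", "useful_2", "useful_3", "quality_1", "quality_2", "quality_3"]

# Fit two factors (one per hypothesized construct) with a varimax rotation
# (the rotation option requires scikit-learn >= 0.24).
fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(df[items])

# Rows are factors, columns are items; each value is that item's loading.
loadings = pd.DataFrame(fa.components_, columns=items)
print(loadings.round(2))

# If the items behave as expected, the "useful" items load strongly on one
# factor and the "quality" items on the other. An item that loads on neither
# is the statistical equivalent of a card landing in the wrong pile.
```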
Nicole goes on to say, “As a construct, ‘quality’ is actually tricky. We used the word ‘quality’ all three times in the survey because there’s not another good word. In contrast, in the section called ‘How do you use community cookbooks?’ we used the word ‘use’ twice and we also used ‘dependent on.’ All four of those questions loaded together so I’m fairly confident that everyone understood what I was saying.”
Once the survey was written, Nicole validated the design with a pilot run inside of Chef. The pilot ran over a two-week period, and 140 Chef employees participated (an awesome response). The purpose of the pilot was to test the wording, reliability, and validity of all the survey measures. Nicole wanted to make sure that all of the constructs behaved and loaded as she expected them to, which they largely did. For example, the questions that varied between using the words “use” and “dependent on” loaded together, so she could be fairly confident that everyone understood them. However, Sean had been interested in looking at differences between past and current perceptions of cookbooks, and those items loaded together. This meant that a cross-sectional survey probably wasn’t a great tool for teasing apart how perceptions of cookbooks change over time, so items about past perceptions were removed from the analysis.
Nicole described her assessment of the pilot. “All of the remaining items lumped together the way I expected them to. My constructs have the questions loading on the constructs I expected them to load on, and the questions did not load on the constructs I did not want them to load on. In other words, they exhibit good psychometric qualities. Good psychometric qualities are convergent validity, discriminant validity, and reliability. Convergent validity is when the questions load on the constructs they’re supposed to load on. Discriminant validity (also called divergent validity) is when the questions don’t load on the constructs they’re not supposed to load on. Reliability is when the questions load consistently.”
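Of the three, reliability is often summarized with Cronbach’s alpha, which measures how consistently a set of items for one construct hangs together. Below is a minimal sketch of the standard formula; the item names and file name are hypothetical.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of items measuring one construct.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical coded responses for three items of a single construct.
df = pd.read_csv("coded_responses.csv")
print(cronbach_alpha(df[["useful_1", "useful_2", "useful_3"]]))
# Values around 0.7 or higher are conventionally read as acceptable reliability.
```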
The survey was ready in time for ChefConf, where it was announced from the main stage.
Next: What we learned from the community cookbooks survey.