Many of Consumer Reports’ tests of food and drink involve the use of sensitive instruments. A liquid chromatograph determines how much caffeine is in coffee, and an atomic absorption spectrophotometer determines the amount of heavy metals in plastics and toys. A digital photometer measures the light and color of TV displays. To evaluate a food’s nutrition, we use sophisticated laboratory instruments. But how do we evaluate its sensory quality – the characteristics of its ingredients, the balance of its flavors? To do this, we use a very sensitive instrument called the human palate.
For food and sensory testing, Consumer Reports consults a panel of people who have been carefully screened. During their initial interview, they let us know, among other things, whether they’re willing to eat foods they dislike. (Visitors to our headquarters invariably volunteer to taste ice cream, but our testers are also required to taste buttery spreads – straight-up.) The people we hire have normal taste and odor acuity, but they’ve also shown the ability to recall and identify various flavors and textures, and to communicate – in precise terms – what their taste buds are telling them.
Our in-house sensory experts, who have studied food science, nutrition, statistics, and psychology, train the panelists in evaluating foods. The tasters learn that personal preference must play no part in taste testing and that they should ignore irrelevant cues such as color, which can make a bright red sauce seem more tomatoey than a dull orange sauce, even when it’s not. Some food categories require specific expertise or knowledge, so for those we bring in experts in that category, who adhere to the same testing principles as our in-house panel.
Taste-testing takes multidimensional concentration. In a few bites or sips, panelists have to identify flavor: “Does the peanut butter taste roasted or burnt?” And texture: “Is the cookie crisp or soggy?” Moreover, they have to gauge the intensity of flavor and texture: “How meaty-tasting is the hot dog? Is it very juicy, like an orange, or merely moist, like a raisin?”
At the start of each project, our in-house experts spend several days preparing panelists to discern subtle and not-so-subtle differences that they’re likely to encounter in the food to be tasted. First, panelists look at representative samples of the food and list important attributes. Next, they synchronize taste buds, sniffing and sampling ingredients they may find later. During training for soups, the smorgasbord includes chicken broth made various ways: with roasted chicken, boiled chicken, and bouillon cubes. Tasters try monosodium glutamate dissolved in water so that they can identify it in chicken noodle soup. And to appreciate the range of flavors in vegetable soup, they taste canned vegetables next to their fresh, boiled counterparts.
Finally, tasters learn to use standard point systems and descriptive terms to compare such attributes as a food’s hardness or crispness. Soup tasters use the texture of pastas cooked to varying degrees of firmness as reference points when gauging the firmness of the soup noodles – from mushy to al dente. References such as water and milk are used to compare the viscosity or thickness of the soup.
We buy each food product in several locations (sometimes we buy food samples from all across the country) to ensure a representative sampling. We prepare each soup according to the manufacturer’s instructions and stir it just before ladling so each panelist receives a typical amount of vegetables and noodles. We serve the soup very hot; when it has cooled to 160°F, we tell panelists to begin tasting. That way, all soups are tested at the same temperature.
With the same attention to detail, we test ice cream by removing it from the container and then shaving off all of the edges of the ice cream block so that no one gets a piece with “freezer burn.” And we serve it in odor-free cups (yes, we’ve sniffed them to make sure). To test bread, we discard the ends. For cereal, we pour the whole box into a bowl, mix lightly, then serve individual portions.
We control the lighting, sound, and ventilation of the testing room to allow tasters to focus solely on the food in front of them. During the soup tests, that means keeping the area free of cooking smells. Each testing booth has a breadbox-like compartment that can be opened, via wooden shutters, from the booth and the kitchen. We distribute samples from the kitchen side, then close the shutters. The testers then open their shutters and reach in. We further minimize the possibility that kitchen smells will escape to the testing area by pressurizing the air in the booths so that odors that waft in make a U-turn back into the kitchen.
Before every taste test, we develop a questionnaire about the appearance, texture, and flavor of the food being tested. When the soup panelists sit at a booth, for example, they find soup questionnaires, a computer for logging in answers, a cup of water for rinsing between samples, and a “spit cup” for expectoration after the soup has been tasted and evaluated. When panelists hear the wooden shutter close on the kitchen side, they open the compartment and take their first cup of soup. While awaiting word that the soup has cooled to the proper temperature, tasters answer questions about its appearance. Then they sip, answering questions about flavor and texture. In a morning, each panelist might evaluate 12 soups, with breaks after every four to refresh their palates.
Panelists try every soup at least three times. According to a plan developed by our statisticians, we switch up the order in which each soup is tasted. That helps us avoid the “context effect,” the tendency to compare a food to one tasted just before. Rotating the order serves another purpose: people tend to pay more attention to the first product than to the next.
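To illustrate how a serving order can be rotated, here is a minimal Python sketch. It is not the statisticians' actual plan, which the text does not describe; the function name `rotated_orders`, the product labels, and the panel size are all assumptions made for illustration. The sketch uses a simple cyclic rotation, so every product lands in every serving position equally often.

```python
import random

def rotated_orders(products, n_panelists, seed=0):
    """Return one serving order per panelist (hypothetical helper).

    Uses a cyclic rotation of a shuffled base order. When n_panelists is a
    multiple of len(products), each product appears in each serving
    position the same number of times.
    """
    rng = random.Random(seed)
    base = list(products)
    rng.shuffle(base)  # randomize the starting order
    k = len(base)
    orders = []
    for p in range(n_panelists):
        shift = p % k  # panelist p starts 'shift' places into the base order
        orders.append(base[shift:] + base[:shift])
    return orders

# Hypothetical example: four coded soups, eight panelists
orders = rotated_orders(["A", "B", "C", "D"], n_panelists=8)
for panelist, order in enumerate(orders, start=1):
    print(panelist, order)
```

A plain rotation like this balances serving position only. It does not balance which product comes immediately before which, so a design aimed squarely at the context effect would likely need something more elaborate, such as a Williams design.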
We also keep the samples unidentifiable (except by taste) from test to test by serving them in uniform containers identified only by random code numbers. And in case someone has a favorite two-digit number – a birthday, a child’s age – the codes are always three digits.
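As a rough sketch of the coding step, the snippet below draws a distinct random three-digit code for each product. The function name `assign_codes` and the brand names are hypothetical, and how codes are actually generated is not stated in the text.

```python
import random

def assign_codes(products, seed=None):
    """Map each product to a distinct random three-digit code (100-999)."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no two products share a code
    codes = rng.sample(range(100, 1000), k=len(products))
    return dict(zip(products, codes))

# Hypothetical products; calling again without a seed gives fresh codes,
# which keeps samples unidentifiable from one test to the next
codes = assign_codes(["Brand X", "Brand Y", "Brand Z"])
print(codes)
```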
Depending on the product category, sometimes it is more effective to use a roundtable test format instead of the booths. In that format, the panelists are served the samples while seated around a large conference table. They taste each sample and complete their individual ballots, and then the Sensory Project Leader leads a discussion about the sample. A consensus is reached and recorded for each sample.
Our ultimate goal is to answer two questions for each product: How does it differ from other brands? And how high is its quality?
Determining how a product differs from other brands is a matter of having a statistician take data from the panelists’ questionnaires and rank each soup by the intensity of each attribute, or by how often an attribute description was selected. Then we can talk about differences by highlighting products at the extremes. We may call a vegetable soup at the high end of the saltiness scale “very salty” or say a tomato soup that ranks low on the viscosity scale has a “thinner broth than most.”
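The ranking step described above can be sketched in a few lines of Python. The scores, soup names, and the choice of a simple mean are all assumptions for illustration; the text does not say which statistics are actually used.

```python
from statistics import mean

# Hypothetical saltiness scores, one per panelist tasting
ratings = {
    "Soup A": [7, 8, 6],
    "Soup B": [3, 4, 4],
    "Soup C": [5, 5, 6],
}

def rank_by_intensity(ratings):
    """Rank products from highest to lowest mean intensity for one attribute."""
    means = {name: mean(scores) for name, scores in ratings.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

ranking = rank_by_intensity(ratings)
most, least = ranking[0], ranking[-1]  # the extremes worth highlighting
print("Saltiest:", most[0], "| Least salty:", least[0])
```

Repeating the same ranking for each attribute (saltiness, viscosity, and so on) picks out the products at the extremes for that attribute.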
To assess the quality, our food experts develop “criteria for high quality” based on how high-quality ingredients subjected to careful processing and handling would – and wouldn’t – taste.
The criteria define a range of attributes acceptable for an excellent product. For example, an excellent chicken noodle soup may have long or short noodles, as long as they aren’t mushy. An excellent chocolate chip cookie may taste buttery or not. A garlicky beef hot dog may be excellent, but so may a smoky pork or poultry one. We don’t pretend to know our readers’ particular likes and dislikes. Rather, we make clear the standard by which we’re judging a food and provide, in the ratings, comments describing each product or groups of products. That way, consumers can choose a highly rated product that suits their preferences.
Ratings result when our statisticians rank the products from those closest to the criteria for excellence to those farthest away. At that point, Consumer Reports’ sensory experts step in again to decide where the best and worst foods fit on a 0-to-100 scale. The products are presented in rank order within quality groups – excellent to poor – and additional descriptions help shoppers make choices to meet their specific needs.
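One way to picture "closest to the criteria" is the sketch below. It is an illustrative assumption, not the method used by Consumer Reports: it treats each criterion as an acceptable range on a hypothetical intensity scale and measures how far a product falls outside those ranges. The attribute names, ranges, and scores are invented for the example.

```python
def distance_from_criteria(profile, criteria):
    """Sum how far each attribute falls outside its acceptable range.

    A product inside every range scores 0, meaning it meets the criteria.
    """
    total = 0.0
    for attribute, (low, high) in criteria.items():
        value = profile[attribute]
        if value < low:
            total += low - value
        elif value > high:
            total += value - high
    return total

# Hypothetical criteria: acceptable (low, high) range per attribute
criteria = {"noodle_firmness": (5, 8), "saltiness": (3, 6)}

# Hypothetical mean panel scores per product
profiles = {
    "Soup A": {"noodle_firmness": 6, "saltiness": 5},
    "Soup B": {"noodle_firmness": 2, "saltiness": 8},
    "Soup C": {"noodle_firmness": 4, "saltiness": 6},
}

# Rank from closest to the criteria to farthest away
ranked = sorted(profiles, key=lambda name: distance_from_criteria(profiles[name], criteria))
print(ranked)
```

Using ranges rather than single target values mirrors the idea that an excellent product may vary on some attributes. The sketch stops at the ranking; placing products on the 0-to-100 scale is a judgment the sensory experts make afterward.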