My column describing the research on gender discrimination in coffee shops has drawn a lot of comments via email (about a zillion people saw it on Slate, apparently) both to me and to the author of the paper, Caitlin Myers. The comments fall into several categories:
1. The behaviour described has a benign explanation (for instance, baristas take more care over women's drinks).
2. We don’t trust the results because the sample size is small.
3. This is not a very important finding: women wait only 20 seconds longer.
4. Tim Harford is an idiot.
Caitlin Myers is not qualified to pronounce on (4), but she has very kindly shared a thoughtful response to (1), (2) and (3). It’s a couple of pages long, so it’s below the fold, but it is recommended reading for anyone interested in the work.
I was happily surprised to discover over the weekend that my paper with a group of students had been written up by Mr. Harford. I’ve been even more surprised by the volume and vitriol of the comments posted on the many sites that covered the story. I don’t think that people who know me would describe me as anything like a crazed feminist looking for proof of conspiracies against women in every corner. (This characterization is a considerably toned-down summary of some of the accusations floating around out there in comments sections.) In any event, this study was designed by a team of both men and women, and none of us had a strong prior assumption about what the outcome would be.
When we had finished analyzing the data, we found that even after controlling for a number of other factors, women wait about 20 seconds (or 24 percent) longer for their coffee than men do. To those who argue that 20 seconds isn’t worth getting upset about: I agree that it isn’t a very long time. But that doesn’t mean the differential is unimportant. Despite anecdotal evidence that it exists, there is very little evidence on discrimination in small-ticket consumer markets. Our finding that women wait longer than men is statistically significant and robust to a large variety of specifications. The big deal isn’t really that women might wait longer for coffee, but that this suggests women may also be waiting longer in other "little" markets like this one. These cumulative effects (if they exist; the study mostly motivates looking at the issue) are likely to be important. In other words, I’m not up in arms about 20 seconds, but knowing whether there is routine discrimination of this sort matters. If women frequently wait 24 percent longer for service, that’s a bigger deal. And, as Mr. Harford points out, understanding why the difference persists despite competition is also important.
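As a rough consistency check on the two figures above, assume (this is an assumption, not necessarily how the paper defines its percentage) that both the 20 seconds and the 24 percent describe the same gap measured against men's average wait. The implied average waits are then roughly:

```latex
\bar{w}_{\text{men}} \approx \frac{20\ \text{s}}{0.24} \approx 83\ \text{s},
\qquad
\bar{w}_{\text{women}} \approx 83\ \text{s} + 20\ \text{s} \approx 103\ \text{s}
```

If the percentage is computed differently (for example, against another baseline or directly from the regression specification), the implied levels would shift, so these are ballpark figures only.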
I’ve been really interested in the alternative "non-discriminatory" explanations that people have put forward. We considered several of these explanations in the paper. The wait differential may reflect employees wishing to inflict a cost on women. It may also reflect employees wanting to chat with women more than with men. Or employees may expect women to be "easier" customers who are less likely to complain. It may also be that women are worse tippers. Economists would call all of these explanations "discrimination," although each represents a different type. (The latter two would be "statistical discrimination," or profiling based on expectations about the behavior of women. Moreover, it’s not so surprising that these would continue to exist in a competitive environment.) I thought a previous commenter’s point, that the gap may reflect employees taking more care with a woman’s order, was a great one; we hadn’t considered it.
The paper is clearly (and admittedly) based on a research project designed and conducted by undergraduate students for a class on how to define, identify, and measure discrimination in labor and consumer markets. Because of limited participants and time, we could not design a larger study with a bigger sample or greater geographical coverage. That said, I see no reason to expect that Boston or these particular shops are somehow the site of more discriminatory treatment than any other location. A sample of 277 transactions is large enough to allow for a variety of controls for factors other than gender that might have affected wait times. To the extent that the sample is too small, well, that’s what standard errors are for. I think the paper is fairly clear about the possible limitations of the study due to sampling technique and power issues. We are also careful to point out which unobserved factors may continue to bias the result, and we interpret the models with interactions as suggesting that there is more here than simple differences in drink orders. We only identify coefficients as significant if they have a p-value below 0.05. Some of our discussion of significance involves the joint significance of coefficients in models with interaction terms. Also, we did control for things like type of payment, and we didn’t start recording time until after the customer had completed the order.
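To make the point about standard errors concrete, here is a minimal Python sketch of how a raw wait-time gap, its standard error, and a two-sided p-value fit together, using the 0.05 threshold mentioned above. It is not the paper's method: it compares simple means with a Welch (unequal-variance) standard error and a normal approximation, includes no controls, and the function names and wait times are hypothetical, invented for illustration.

```python
import math
from statistics import mean, variance


def diff_in_means(women, men):
    """Return (gap, standard error, two-sided p-value) for mean(women) - mean(men).

    Uses the unequal-variance (Welch) standard error and a normal
    approximation for the p-value. Unlike a regression, it ignores
    any control variables.
    """
    gap = mean(women) - mean(men)
    se = math.sqrt(variance(women) / len(women) + variance(men) / len(men))
    z = gap / se
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return gap, se, p


def is_significant(p, alpha=0.05):
    # Significance rule described in the text: p-value below 0.05
    return p < alpha


# Hypothetical wait times in seconds (not the study's data)
women = [100, 110, 90, 105, 95]
men = [80, 85, 75, 82, 78]
gap, se, p = diff_in_means(women, men)
print(f"gap={gap:.1f}s  se={se:.2f}s  p={p:.2g}  significant={is_significant(p)}")
```

The standard error shrinks roughly with the square root of the sample size, so a small sample shows up as wider uncertainty around the estimate rather than being hidden.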
The issue of drink complexity is a serious one, and we address it throughout the paper and in the conclusion. We feel that the overall evidence suggests that drink complexity is an issue, but that it is not driving all of the results. First, we point out that although the sample sizes are too small to give much statistical power, the differential points to longer wait times for other minority groups as well as for women. Second, we try adding more detailed controls for the order by breaking orders down into latte, cappuccino, etc. The results remain the same. However, we recognize that this is not fully satisfactory, since it was impossible to reliably overhear and quickly record every detail of the order. As an additional check, we note that the coefficients in models with interaction terms suggest that the gender differential changes with the gender composition of employees and with line length. Neither interaction term is statistically significant (p-values of 0.31 and 0.13), but both are large in magnitude, so we are probably suffering from a small sample. We don’t place too much weight on them, but we conclude that this evidence also suggests the differential may vary in ways that drink complexity cannot explain.
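The role of an interaction term can be illustrated with a minimal sketch, assuming a simplified, hypothetical specification with no other controls: wait = b0 + b1*female + b2*line_length + b3*female*line_length. Once the interaction is included, the gender gap is no longer a single number; it is b1 + b3*line_length. The fit below is plain least squares via the normal equations, and the names fit_interaction and gender_gap, like the data, are illustrative only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_interaction(wait, female, line_len):
    """OLS of wait on [1, female, line_len, female*line_len]; returns b0..b3."""
    rows = [[1.0, f, n, f * n] for f, n in zip(female, line_len)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, wait)) for i in range(k)]
    return solve(xtx, xty)


def gender_gap(coefs, line_length):
    # With the interaction, the female-male gap depends on line length
    return coefs[1] + coefs[3] * line_length


# Hypothetical, noise-free data generated from b = (60, 15, 10, 5)
female = [f for f in (0, 1) for _ in range(4)]
line_len = [n for _ in (0, 1) for n in range(4)]
wait = [60 + 15 * f + 10 * n + 5 * f * n for f, n in zip(female, line_len)]
coefs = fit_interaction(wait, female, line_len)
print([round(c, 6) for c in coefs], round(gender_gap(coefs, 2), 6))
```

Because the synthetic data contain no noise, the fit recovers the coefficients used to generate them, so the gap at a line length of 2 comes out at about 25 seconds. With real, noisy data and a small sample, b3 can be large yet imprecisely estimated, which is the situation the paragraph above describes.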
Many of the remaining pertinent technical issues raised in comments are addressed in the actual paper, so I’ll simply direct folks who are interested in stuff like fixed effects (which control for systematic differences in enumerators or coffee shops) to the paper. Also, if anyone would like the data set (in Stata format) and/or our program ("do") file, I’m happy to share it.
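For readers unfamiliar with fixed effects, here is a minimal sketch of the idea, assuming a simple model in which each coffee shop has its own baseline wait and women's orders carry a common extra delay. Subtracting each shop's mean from both the wait and the female indicator (the "within" transformation) removes the shop baselines, and the slope on the demeaned data is the fixed-effects estimate. This is a generic illustration with made-up data and hypothetical function names, not the paper's Stata code.

```python
from collections import defaultdict
from statistics import mean


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the 'within' transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def fixed_effects_slope(wait, female, shop):
    """Coefficient on female after absorbing a separate intercept for every shop."""
    y = demean_by_group(wait, shop)
    x = demean_by_group(female, shop)
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical data: shop baselines of 60, 90 and 120 seconds, plus a 20-second delay for women
shop = ["A"] * 4 + ["B"] * 3 + ["C"] * 4
female = [1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0]
baseline = {"A": 60, "B": 90, "C": 120}
wait = [baseline[s] + 20 * f for s, f in zip(shop, female)]

raw_gap = mean(w for w, f in zip(wait, female) if f) - mean(w for w, f in zip(wait, female) if not f)
fe_gap = fixed_effects_slope(wait, female, shop)
print(f"raw gap: {raw_gap:.1f}s, fixed-effects gap: {fe_gap:.1f}s")
```

In this invented example, women happen to be concentrated in the slower shops, so the raw difference in means is 42 seconds, while the fixed-effects estimate recovers the 20-second delay built into the data.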
In summary, we’ll never be able to "prove" discrimination. But if the evidence points in that direction, it seems worth following up on.

Tim writes about the economics of everyday life.