Since the super-high-impact Science article, a lot of researchers have been discussing replicability and construct validity in psychology, and what can be done to improve them.
The efforts of the ManyBabies projects and others are, of course, completely essential. But personally, in keeping with many others, I think we need to be careful here. (I heard Koraly Perez-Edgar make this same point about the dot probe task recently, and I know Linda Smith thinks the same; see also here, here, here and here – and here for some counterarguments.)
To explain why, I’m going to use an experiment that I know well, and have used myself – looking at infants’ tendency to follow an adult’s gaze. The specific experiment I’m criticising is Atsushi Senju’s (Gaze Following in Human Infants Depends on Communicative Signals), which I know is a bit close to the bone – but Atsushi knows I think he’s brilliant, and we’ve talked about this; he says he agrees with me on this at least 50%!
Anyway – so say that my aim is to assess an infant’s tendency to follow gaze in a way that is tightly standardised and replicable, with high construct validity. My first step is to show exactly the same set of videos to lots of different infants. Then, to ensure test-retest reliability, we show the same (or a similar) set of events to each infant across many different trials – because averaging over repeated measurements reduces measurement error.
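The logic behind that last step can be sketched with a toy simulation (all the numbers here are made up for illustration, not taken from any real study): if each trial gives a noisy measurement of the same underlying tendency, the spread of the session average shrinks roughly as one over the square root of the number of trials.

```python
import random
import statistics

random.seed(0)

TRUE_SCORE = 0.6   # hypothetical "true" gaze-following tendency (made up)
NOISE_SD = 0.3     # hypothetical trial-to-trial measurement noise (made up)

def session_estimate(n_trials):
    """Average n_trials noisy measurements of the same underlying score."""
    trials = [TRUE_SCORE + random.gauss(0, NOISE_SD) for _ in range(n_trials)]
    return statistics.mean(trials)

# The spread of session-level estimates shrinks roughly as 1 / sqrt(n_trials).
spread = {}
for n in (1, 4, 16, 64):
    estimates = [session_estimate(n) for _ in range(2000)]
    spread[n] = statistics.stdev(estimates)
    print(n, round(spread[n], 3))
```

This is, of course, only a defence of averaging when every trial really does measure the same thing – which is exactly the assumption the rest of this post questions.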
Each trial might start with the actress looking down for 8000ms, then looking up directly at the camera for 8000ms, then down directly at one of the two objects for 15000ms. We present a series of these identical clips, with everything nicely counterbalanced between objects and sides. We present them in a room with tightly controlled lighting, using an eyetracker, and we analyse the results in a tightly standardised way. Infants who follow gaze tend to look at the object she is looking at within a particular time window after she looked down at it, with certain criteria for excluding trials based on poor tracking and so on (see another article on the importance of data quality in eyetracking).
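A scoring rule of the kind described above might look something like this in code – a minimal sketch, assuming a rectangular area of interest (AOI) around each object; the window, exclusion threshold, and AOI values are all hypothetical, not taken from any actual study:

```python
# Hypothetical parameters: the actress turns to the object after the two
# 8000ms segments, and we score a fixed window after that cue onset.
CUE_ONSET_MS = 16000          # 8000ms down + 8000ms at camera
WINDOW_MS = (200, 3000)       # assumed scoring window after cue onset
MIN_VALID_PROPORTION = 0.5    # assumed data-quality exclusion threshold

def in_aoi(sample, aoi):
    """Is a gaze sample (x, y) inside a rectangular AOI (x0, y0, x1, y1)?"""
    x, y = sample
    x0, y0, x1, y1 = aoi
    return x0 <= x <= x1 and y0 <= y <= y1

def score_trial(samples, target_aoi):
    """Score one trial from eyetracker samples.

    samples: list of (time_ms, x, y), with x = y = None when tracking is lost.
    Returns True/False for followed/not-followed, or None if the trial is
    excluded for poor tracking within the scoring window.
    """
    start = CUE_ONSET_MS + WINDOW_MS[0]
    end = CUE_ONSET_MS + WINDOW_MS[1]
    window = [s for s in samples if start <= s[0] <= end]
    valid = [s for s in window if s[1] is not None]
    if not window or len(valid) / len(window) < MIN_VALID_PROPORTION:
        return None  # exclude trial: too much lost tracking
    return any(in_aoi((x, y), target_aoi) for _, x, y in valid)
```

For example, with a hypothetical AOI of `(600, 300, 800, 500)`, a trial containing the sample `(16500, 700, 400)` would score as followed, while a trial whose window contains only lost-tracking samples would be excluded.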
So this is a great, highly replicable experiment, with strong construct validity – right? Well, sure, but…
The problem that I – and other researchers who use a lot of naturalistic experimental paradigms – see is this: what is this experiment really measuring? It claims to measure gaze following – but there are a number of ways in which the situations in which the infant is being asked to follow gaze are nothing like the situations in which infants actually have to follow gaze in the real world. First, there is the discrete, repeated trial structure: the infants learn that an identical series of events (actress looks down, looks up, looks to one object) happens again and again. Second, there is the setting, which is one a child rarely if ever encounters in the real world. Third, there are the timings, which again are completely different from the time-frame on which we actually follow gaze in real life.
Researchers who measure infants’ naturalistic gaze behaviour suggest that infants don’t actually follow gaze in some ‘real-world’ settings – this despite decades of research using reductionist screen-based paradigms such as the one I’ve described, suggesting that infants do follow gaze. I’ve got some (unpublished) data suggesting that if you measure gaze following in the same infants using both these videos and a table-top task from the Early Social Communication Scales, you get no cross-task validity at all between the screen and tabletop measures – even though they claim to be measuring the same thing.
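The kind of cross-task check I mean is just a convergent-validity correlation: score each infant on both tasks and correlate the two sets of scores. A toy sketch with fabricated per-infant scores (not my actual data) – if the two tasks measured the same construct, you would expect a substantial positive correlation, not a value near zero:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (len(xs) * statistics.pstdev(xs) * statistics.pstdev(ys))

# Fabricated per-infant gaze-following scores, for illustration only.
screen_scores = [0.2, 0.8, 0.5, 0.9, 0.3, 0.6]
tabletop_scores = [0.5, 0.4, 0.9, 0.5, 0.2, 0.8]

r = pearson_r(screen_scores, tabletop_scores)
print(round(r, 2))  # near zero: no evidence the tasks measure the same thing
```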
So if this screen-based task isn’t measuring gaze following, what is it measuring? Some people think it doesn’t matter – as long as they can run another experiment that replicates a colleague’s work and extends it slightly, they’re advancing science. But I think it does matter. It illustrates a real and important problem in experimental psychology: in attempting to standardise and ensure consistency by paring down and removing untracked variables, we risk throwing the baby out with the bathwater. That’s why I, and lots of other researchers, think it’s important to observe and analyse real-world behaviours – even though that approach has plenty of problems of its own – rather than spending all our time designing experiments.