Miles and guests unpack how the ONS is collecting prices from more than a billion supermarket checkout and online sales to measure UK inflation.

Scanner data podcast transcript

Miles: Hello and welcome to Statistically Speaking, the official podcast of the UK's Office for National Statistics. I'm Miles Fletcher, and in this episode we're taking an in-depth look at a very big change in how the ONS produces its estimates of inflation, which is no longer the sole preserve of clipboard-wielding price collectors roaming the supermarket aisles. The digital revolution has now fully arrived. From this month, the UK's inflation indices are partly based on millions of price records gathered directly from the tills, or scanners, to be precise. How is it all done? What is the role of Taylor Swift in all this? Yes, there is one. And what are the benefits for economists, decision makers and all of us ordinary folk who worry about the cost of living?

Here to unpack it all for us is Mike Hardy, who has led the project here at the ONS, and top economist and former member of the Bank of England Monetary Policy Committee, Jonathan Haskel, professor of economics at Imperial College London.

Professor, to start with you: to understand what's changed, it'd probably be helpful to remind ourselves how consumer prices inflation has until now been calculated. Essentially, it was the ONS and its agents checking the prices of thousands of items on a monthly basis to see how they changed.

Jonathan: Yeah, that's right, and the ONS has gone to an enormous amount of effort to make that collection representative and consistent. But of course, in the modern era of scanner data, computers, e-commerce and the like, there are other ways of doing it.
I guess the important point, which Mike can talk about some more, is that one of the things we know from statistics is that having a big sample isn't necessarily better than having a representative sample to start with. So I think one of the interesting points about all of this is that, while the scanner data is collecting many more data points, it's a fascinating check on the representativeness, or otherwise, of the ONS survey and procedures thus far: whether the average of all that data turns out to be very different from, or similar to, what was done before. It's a great advantage to have all of this extra data, but one shouldn't overstate the advantage or use it as a way of rubbishing what the ONS has been doing in the past.

Miles: So to put it this way, perhaps: what the ONS has traditionally been doing in collecting prices is taking a big monthly snapshot of retailing, prices and what people have been paying for items. What it's moving to in the digital age is from a still picture to a rolling 4K video, and from that it can find out exactly what has been missed out of the inflation calculations previously.

Jonathan: Well, I'll just put a little bit of a spin on that. One of the things that the price collectors do, and they're very, very careful to do, is make sure that the snapshot is consistent across snapshots, if you see what I mean. So there is a bit of a rolling element to those snapshots already, because, for example, if you're going to collect the price of, let us say, ladies' jeans, which is something I was doing with a price collector recently, you want to be sure you collect the price for the same good over time. And the point about the price collectors is they're extremely conscientious about making sure that, in the case of ladies' jeans, they are coloured blue, they've got either a flared leg or not a flared leg.
They've got the same number of pockets, the same amount of stitching, the same decorations on them. Making sure that those goods remain the same is actually very important, and that's something the hand collection can do. And as I say, I think that means the snapshot element is maybe not quite the right metaphor, if I may say, Miles. It's a consistent element over time, a consistent snapshot, if we're keeping the metaphor.

Miles: That said, it's more than just blindly following the same list of products every month. But nonetheless, the traditional way of doing things has had significant limitations. What do you think those are?

Jonathan: I guess the limitations are that when one is collecting a sample, any kind of sample at all, one is always doing one's best to ensure it is a representative sample, and having more data is going to help if it turns out that the sample is unrepresentative. So I think that's one part of it. The other part is, of course, that it's becoming increasingly costly to hand-collect these numbers, and, like any public agency, one wants to be as careful as one possibly can with taxpayers' money. That's the second thing. And the third thing is that, especially in the era of the internet and dynamic pricing, these prices change at dizzying rates as firms change their prices throughout the product cycle of the good, and therefore consistent snapshots may miss some of that variation.

Miles: And all that data, of course, is out there to be learned from, isn't it? Essentially, Mike, is that what the scanner data project has been all about: harvesting that data and using it to produce what's being described as a step change in how inflation is calculated?
Mike: So we've been transforming our consumer price statistics for some time, and we've been acquiring a wide range of data sources with the aim of improving the quality and granularity of those statistics. In recent years we've used administrative data for rail fares and second-hand cars, and we will be incorporating grocery scanner data for 50% of the grocery market, moving from around 25,000 prices for those retailers to 300 million, derived from the sale of over a billion products: much more granular and rich information on the prices within those retailers. Importantly, we now have the price of everything within a store, so we move away from the sample that Jonathan described, taking the price of a small number of products in each store, to collecting all of the prices in store from supermarket checkouts, and we also get a better understanding of how much of each product people are purchasing. So these large administrative datasets give us a much clearer picture of inflation.

Miles: Because the purpose here is to get a sense of how the cost of living is changing for people as well, isn't it? And presumably, if you're just checking the prices of the same goods month after month, you're not understanding how price changes are influencing people's purchasing decisions. Does it help with that?

Mike: Yeah, so the scanner data, as I said, gives us a complete picture of all the prices within a store, it gives us the underlying quantities, so how much of each particular product is being purchased, and it also gives us the price at the till rather than the price on the shelf. That captures a number of things we were unable to capture with the sample data. The first being if consumers switch from a premium to a value brand, for example, in response to the cost of living rising.
We pick that up in the scanner data. And also discounting: we can better reflect that, particularly store discount cards, because we now understand, for a particular product, the total spending on that product and the quantity of it sold, which allows us to derive an average price for that product. So we better capture the store discount cards available in many of the supermarkets.

Miles: And so, by getting a sense of the changing availability of products and what people are actually spending on them, it becomes much more useful as a cost of living index, as well as simply a measure of price change?

Mike: Yes. So the way we currently produce our inflation statistics is that we have a large virtual shopping basket of goods and services. There are 760 items in that basket. We set the weights and the basket at the start of the year, and then we track the prices of those 760 representative items throughout the year. What the scanner data allows us to do, at a very detailed level within what we describe as consumption segments (a consumption segment would be rice, for example), is reflect changes in consumer spending patterns within that segment. So, for example, if somebody changes the type of rice they're buying, deciding to buy microwave rice instead of basmati rice, or switching from a premium rice product to a value rice product, then we capture that in the scanner data on a monthly basis, whereas in the previous approach we would just monitor the price of a small number of products, maybe microwave rice and basmati rice, over the year. But now, with the scanner data, we have the price of all rice sold within a store, and we can reflect people's changing spending patterns when purchasing a particular consumption segment, which I've described here as rice.
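Mike's two points here, that an average price falls out of total spend divided by quantity sold (so discount-card prices are captured automatically), and that quantity data reveals switching between premium and value products, can be sketched with toy numbers. This is an illustrative Python sketch with invented figures, not ONS methodology; the product names and values are hypothetical:

```python
# Toy till data for a "rice" consumption segment over two months.
# Each entry: product -> (revenue_gbp, units_sold), net of any
# discount-card reductions, i.e. what was actually paid at the till.
month1 = {"premium_1kg": (3000.0, 1000), "value_1kg": (1000.0, 1000)}
month2 = {"premium_1kg": (1500.0,  500), "value_1kg": (1500.0, 1500)}

def unit_price(revenue, units):
    # Average price paid for one product: total spend / total quantity.
    # Discounts lower the revenue, so they lower the derived price too.
    return revenue / units

def avg_paid_per_unit(month):
    # Expenditure per unit across the whole segment; shifts in quantity
    # between premium and value products move this figure even when
    # individual product prices are static.
    total_rev = sum(rev for rev, _ in month.values())
    total_qty = sum(qty for _, qty in month.values())
    return total_rev / total_qty

# Individual product prices are unchanged between the two months...
assert unit_price(*month1["premium_1kg"]) == unit_price(*month2["premium_1kg"])
assert unit_price(*month1["value_1kg"]) == unit_price(*month2["value_1kg"])

# ...but shoppers switched toward the value brand, so the average
# price paid per unit of rice across the segment fell:
print(avg_paid_per_unit(month1))  # 2.0
print(avg_paid_per_unit(month2))  # 1.5
```

A collection that only reprices the premium and value packs individually would show no change here; it is the quantity information that reveals the switch. Real index construction compares like-for-like products with more sophisticated formulas, so treat this purely as an illustration of why the quantities matter.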
Miles: And that, in turn, I guess, can also influence the way you weight the index in future, because weighting is a fundamental part of calculating inflation that perhaps a lot of people don't fully appreciate.

Mike: Yeah, so it'll still be a fixed basket, but at the lowest level of aggregation, the most detailed data that we have, the data coming in from retailers, where we have the total sales plus the quantity, which allows us to derive a price or a unit cost within