A1385
Title: High-dimensional hedonic models for spatial price index construction in the accommodation market using web-scraped data
Authors: Ilaria Benedetti - University of Tuscia (Italy) [presenting]
Tiziana Laureti - University of Tuscia (Italy)
Niccolo Salvini - University of Tuscia (Italy)
Abstract: The analysis of spatial price disparities is often hampered by the limited availability of detailed information on product characteristics. The aim is to propose a comprehensive framework that overcomes this limitation by leveraging high-dimensional web-scraped data to construct robust spatial price indices (SPIs) for differentiated services. The focus is on the Italian accommodation market, where online platforms provide a rich source of granular data. The methodology integrates a scalable web scraping architecture with a multistage data processing pipeline. Data are collected from metasearch engines, capturing a wide array of features for each accommodation, including multiple price offers, customer ratings, and a high-dimensional set of amenities and service attributes. The statistical techniques used to transform these raw, complex data into an analysis-ready dataset are detailed, including spatial deduplication and robust feature engineering. Building on this dataset, the core contribution is the application of a high-dimensional hedonic pricing model. Using the detailed scraped characteristics, the model controls for quality differences and isolates price gaps across areas. The resulting spatial price indices are more accurate and stable than those obtained from conventional data. The replicable framework turns unstructured web data into indicators ready for policy analysis and consumer information.
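The abstract does not specify which high-dimensional estimator is used. As a purely illustrative sketch, assuming a region-dummy hedonic regression on log prices, ln p_i = sum_r delta_r D_ir + sum_k beta_k x_ik + e_i, where D_ir flags the area of accommodation i and x_ik its amenities, the spatial index for area r relative to a base area can be read off as SPI_r = 100 exp(delta_r - delta_base). Here a ridge penalty on the amenity coefficients stands in for whatever regularization the authors apply. The helper `hedonic_spi`, its parameters `lam` and `base`, the region labels, and all data below are hypothetical and simulated, not taken from the study.

```python
import numpy as np


def hedonic_spi(log_price, region, amenities, lam=1.0, base=0):
    """Hypothetical helper: region-dummy hedonic regression with ridge penalty.

    Model (assumed): ln p_i = sum_r delta_r * D_ir + sum_k beta_k * x_ik + e_i.
    Minimizes ||y - D delta - X beta||^2 + lam * ||beta||^2, i.e. only the
    amenity coefficients are penalized; region effects delta_r are not.
    Returns SPI_r = 100 * exp(delta_r - delta_base) keyed by region label.
    """
    y = np.asarray(log_price, dtype=float)
    region = np.asarray(region)
    X = np.asarray(amenities, dtype=float)
    regions = np.unique(region)  # sorted region labels
    # One dummy column per region and no separate intercept, so D has full rank.
    D = (region[:, None] == regions[None, :]).astype(float)
    Z = np.hstack([D, X])
    # Penalty matrix: zero for region dummies, lam for each amenity column.
    penalty = np.diag(np.concatenate([np.zeros(D.shape[1]),
                                      np.full(X.shape[1], lam)]))
    # Penalized normal equations: (Z'Z + P) coef = Z'y.
    coef = np.linalg.solve(Z.T @ Z + penalty, Z.T @ y)
    delta = coef[:D.shape[1]]
    spi = 100.0 * np.exp(delta - delta[base])
    return {str(r): float(s) for r, s in zip(regions, spi)}


# Synthetic example (simulated data, not results from the study):
# three hypothetical regions with known log-price premia 0, 0.10 and -0.20.
rng = np.random.default_rng(0)
n, k = 600, 20
region = np.repeat(np.array(["A", "B", "C"]), n // 3)
amenities = rng.integers(0, 2, size=(n, k))  # binary amenity flags
beta = rng.normal(0.0, 0.1, size=k)          # simulated amenity price effects
true_delta = {"A": 0.0, "B": 0.10, "C": -0.20}
log_price = (4.0 + np.array([true_delta[r] for r in region])
             + amenities @ beta + rng.normal(0.0, 0.01, size=n))

spi = hedonic_spi(log_price, region, amenities, lam=1.0)
print({r: round(v, 1) for r, v in spi.items()})
```

With this simulated setup the printed indices should be close to 100 for the base region A, roughly 100 exp(0.10) for B and 100 exp(-0.20) for C. Leaving the region dummies unpenalized is the key design choice in this sketch: shrinking them would pull the spatial price gaps toward zero, whereas shrinking the many amenity coefficients only stabilizes the quality adjustment. A lasso or elastic-net penalty could be substituted for the ridge term under the same structure.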