The views expressed are those of the authors and do not necessarily reflect those of the institutions to which they belong. This paper is the result of the collaboration between the privately owned company Datasinc and the Bank of Italy, within the first Call for Proposals by the Bank of Italy’s Milano Hub. The data utilized in this study were collected by Datasinc, a company acting under its sole responsibility. They were graciously provided to researchers at the Bank of Italy in an aggregated format. Neither the Bank of Italy nor its employees were involved in the data collection process or contributed to the creation of the dataset in any way. The policy brief is based on Braggiotti et al. (2024). “Predicting buildings’ EPC in Italy: a machine learning based-approach” Bank of Italy Occasional Papers, No. 850.
Abstract
EU member states have committed to reaching carbon neutrality by 2050. Since building-related activities are responsible for nearly 25% of the EU’s greenhouse gas emissions, it is crucial to reduce emissions in this sector. However, policymakers need to be careful when acting on this asset class, as buildings are a major part of household wealth and bank assets. To make informed decisions, it is important to have accurate data on buildings’ energy efficiency, such as the energy class reported in Energy Performance Certificates (EPCs). However, accurately assessing the energy efficiency of buildings remains a challenge, in Italy and elsewhere, given the limited availability of comprehensive data.
In this study, we developed a machine learning model to predict the energy class of Italian buildings using publicly available data. The model was trained on a specific geographic area in Italy and was able to correctly predict the energy class 37% of the time. However, if a margin of error of one class was allowed, the accuracy increased to 74%, improving upon standard techniques such as logistic regression. The results also raised concerns about the potential underreporting of buildings with the worst energy efficiency in the official EPC registry.
With the European Union’s commitment to achieving carbon neutrality by 2050, improving energy efficiency in buildings has become crucial. Buildings contribute significantly to greenhouse gas emissions (almost a quarter of all GHG emissions in the EU), and energy retrofitting is seen as a key strategy for reducing these emissions. In addition, energy consumption and energy prices affect the value of buildings. Changes in regulatory standards on the energy efficiency of buildings (e.g. the new Energy Performance of Buildings Directive in Europe) could reduce the value of less energy-efficient buildings, and banks’ secured portfolios could lose value as a result. In light of these potential transition risks, banks and supervisors consider the energy efficiency level (and certificate) to be crucial data in the risk assessment of credit secured by immovable property. However, accurately assessing the energy efficiency of buildings remains a challenge, given the limited availability of comprehensive data. In a recent study we address this gap in the context of Italy by developing a machine learning model to predict the energy performance certificates (EPCs) of buildings using publicly accessible data.
The study used an open dataset comprising approximately 700,000 residential entries from Lombardy, the most affluent and populous Italian region, and circa 130,000 entries from Piedmont as a robustness check. The dataset covers a wide range of building characteristics, such as age, size, value and floor, as well as geographical and climatic variables, such as altitude and average temperature. The data on the energy efficiency of the buildings (energy class) were obtained from the regional EPC registry, while building location, size and value were extracted from a revised, proprietary version of the Italian cadastre, the comprehensive public registry that documents property ownership, boundaries, and physical characteristics across the country.
The study employed a random forest classifier (RFC), a type of ensemble learning method that builds multiple decision trees and merges their outcomes to improve predictive accuracy. This model was chosen for its ability to capture complex, potentially non-linear relationships between building characteristics and energy performance. The RFC model achieved an accuracy of 37% in predicting the exact energy class of buildings, which rose to 74% when a one-class margin of error was allowed (i.e. a building predicted as class C counts as correct if its true class is B, C or D). More importantly, both figures are higher than those obtained with standard techniques such as logistic regression.
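The two accuracy figures above differ only in how a "hit" is defined. The following minimal sketch shows one way to compute both metrics; it is not the authors' code. The class ordering `EPC_ORDER`, the function name `tolerance_accuracy`, and the toy labels are all assumptions made for illustration, and the predictions could come from any classifier.

```python
# Assumed ordinal scale, from most to least efficient (illustrative only;
# the brief states that classes run from A4 down to F).
EPC_ORDER = ["A4", "A3", "A2", "A1", "B", "C", "D", "E", "F"]
RANK = {label: i for i, label in enumerate(EPC_ORDER)}


def tolerance_accuracy(y_true, y_pred, tolerance=0):
    """Share of predictions whose class rank is within `tolerance` steps of the truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    hits = sum(
        abs(RANK[t] - RANK[p]) <= tolerance for t, p in zip(y_true, y_pred)
    )
    return hits / len(y_true)


# Toy labels invented for illustration; they are not the study's data.
y_true = ["C", "F", "E", "B", "F", "D"]
y_pred = ["D", "F", "C", "B", "E", "F"]

exact = tolerance_accuracy(y_true, y_pred)                # exact-class accuracy
within_one = tolerance_accuracy(y_true, y_pred, tolerance=1)  # one-class margin
print(f"exact: {exact:.2f}, within one class: {within_one:.2f}")
```

Treating the classes as an ordered scale is what makes the one-class margin meaningful: a prediction of D for a true C is a near miss, whereas a prediction of F is not.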
The analysis revealed that surface area and market value were among the most significant predictors of energy performance. These insights underline the model’s ability to identify which characteristics most strongly influence a building’s energy efficiency, providing valuable guidance for future energy policy and retrofitting efforts.
The model output uncovered discrepancies when compared with official EPC records. In particular, the model assigned a much higher proportion of buildings to the least efficient class, F, than the official records report (82% vs 56%; Fig. 1). This suggests that official data might underrepresent less efficient buildings, probably due to biases in the data collection process. These findings point to a potential underestimation of energy inefficiency in Italian buildings, which could have serious implications for energy policy design and implementation.
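The comparison above amounts to computing the share of each energy class in the model's predictions and in the registry, then taking the difference. A minimal sketch, assuming two plain lists of class labels; the function names `class_shares` and `share_gap` and the toy data are hypothetical and do not reproduce the study's figures.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of entries falling in each energy class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}


def share_gap(predicted, registry, cls):
    """Predicted share minus registry share for one class (positive = model sees more)."""
    return class_shares(predicted).get(cls, 0.0) - class_shares(registry).get(cls, 0.0)


# Toy labels invented for illustration; they are not the study's data.
predicted = ["F"] * 8 + ["E"] * 1 + ["D"] * 1
registry = ["F"] * 5 + ["E"] * 3 + ["D"] * 2

gap = share_gap(predicted, registry, "F")
print(f"class F share gap: {gap:+.0%}")
```

A positive gap for the least efficient class is the pattern the brief describes; on its own it cannot tell model error apart from selection bias in the registry.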
This study hints at a potentially significant underestimation of energy inefficiency within Italy’s building stock, suggesting that the problem may be more widespread than current official records indicate. While the model is not perfectly accurate, it offers valuable insights that underscore the need for more aggressive and targeted energy policies. These could include incentivizing comprehensive energy retrofits, revising regulations for both new and existing buildings, and expanding financial support for upgrading older, less efficient properties. As the EU advances towards stricter energy standards, this more realistic picture of building efficiency can guide the development of policies that effectively address the true scale of the challenge, paving the way for a more sustainable future.
Figure 1. Deviations in predictions vs. SIAPE data for national residential properties
EPC classes in Italy range from F (the least energy-efficient class) to A4 (the most efficient). An EPC is compulsory when selling, renting, or renovating a home. Because only these buildings enter the registry, the national EPC registry (SIAPE) may not be representative of the whole building stock.