Give (water) data to data scientists!
Artesia is renowned for the high-quality data science analysis and modelling we deliver. One thing we notice is that the quality of data science products depends strongly on the quality and quantity of the data available to study.
To quote Guy Bevington from True North: “There is a naïve expectation … that there is a limitless jar of magic fairy dust that Machine Learning Engineers can sprinkle over the company’s data to uncover some revolutionary finding or create some gargantuan cost saving for the business. Is this possible? Well yes in theory but the truth is a Data Scientist can only ever be as good as the datasets they are working with.”
The water sector has come under criticism in the past for being slow to adopt emerging technologies and, with them, data-collecting solutions. The lack of good-quality data in some areas made it harder to react to the drastic and rapid change in customer consumption habits during the COVID-19 pandemic. To quantify consumption changes, the sector had to rely on a few water companies that had accelerated their data collection efforts in the years before the pandemic. This shock has probably accelerated a growing awareness, already present in the sector, of the need for long and reliable time series, high-quality datasets, and a deeper understanding of the customer base.
The non-household sector was liberalised in 2017, meaning that a large number of smaller retailers are now responsible for customer data collection. Although one body, Market Operator Services Ltd (MOSL), is responsible for collecting and collating the data from the individual retailers, the quality and quantity of available data have dropped since liberalisation: some retailers do not understand the importance of frequent and reliable data collection, do not see the economic value in collecting data at high frequency and quality, or simply do not have the means to do it. The recent pandemic has highlighted how problematic this approach is, and the water sector is urgently seeking solutions, including a strategic metering review for the non-household market set up by MOSL.
But what does effective data collection look like for the water sector?
- Metering: although the UK is still not the best example of metering coverage in the water sector, there is now a clear awareness that metering customers not only reduces their consumption by cutting waste, but also allows a more accurate estimate of household and non-household demand. Having a larger metered base allows our data scientists to build more robust models to estimate current and future consumption.
- Smart metering: if metering helps estimate average customer consumption in the long term, smart metering (metering at higher temporal resolution thanks to automatic reporting) gives water companies a much better understanding of seasonal and short-term changes in customer habits, as well as the wide variation in daily consumption between households. Additionally, smart metering can improve the understanding of leakage and plumbing losses: machine learning techniques can identify the specific markers for leakage and plumbing losses in high-resolution consumption time series.
- High-resolution logging (fast logging): fast logging in district metered areas (DMAs) involves the high-resolution metering and logging of portions of the distribution network, rather than individual customers, and it is extremely important for identifying leakage in the distribution network. The larger the portion of the network covered, the more accurate leakage estimates at company level can be. Fast logging also improves night use estimation and leakage repair targeting, saving water and money in the long run.
- Stable long-term monitors: smart meters are often used in monitors, which are groups of household or non-household properties selected to be representative of the whole company. Traditionally, monitors have not always been stable, particularly in the non-household sector, where it is common to use high-frequency metering for as little as a few weeks and to change the targeted properties often. This has proved problematic, as such short campaigns cannot show any longer-term trends. We have demonstrated the value of long-term household and non-household monitors by conducting studies on night use and leakage, seasonal trends, peak consumption, and, last but not least, the impact that COVID-19 has had on household and non-household consumption.
- Other advanced technologies: while the solutions above are quite standard in the water sector, and data quality improves mainly by extending their coverage, new emerging technologies are also being investigated and adopted. Acoustic loggers are an example: some UK companies are installing large numbers of small sensors on their distribution pipes to listen to the sounds transmitted within the pipes; machine learning techniques are being used to identify the sound of leakage and triangulate the position of the sound source to target repairs. Many other examples can be found among solutions that are not yet operational, such as micro-robots that travel within water pipes to identify leakage even more accurately, or sensors that detect a temperature increase on user pipes when continuous flow occurs, a sign of household leakage.
- Effective data warehousing: although this is not strictly a water-related technology, it is often the bottleneck in effective data usage by water companies. Being able to generate abundant and consistent data is just the first step. A system of hardware and software to effectively capture, store, collate, search, and manage data is essential. With the growth of data quantity and complexity, the IT tools we used to rely on (e.g., spreadsheets or simple relational databases) are no longer optimal, and more tailored solutions need to be sought. Cloud services today offer a vast range of solutions to operationalise the data pipeline, from acquisition, to storage, to management, without the need to build a data warehousing system in house.
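To make the smart-metering and continuous-flow ideas in the list above more concrete, here is a minimal Python sketch of how hourly meter readings might be screened for a possible plumbing loss. It is an illustration only, not a description of Artesia's models: the function names, the night window, the flow threshold, and the number of days are all assumptions chosen for the example, and the readings are synthetic.

```python
from datetime import datetime, timedelta
from statistics import median

# All three values below are assumptions for illustration; real thresholds
# would need calibrating against the meters and properties being studied.
NIGHT_HOURS = range(2, 5)    # assumed night-use window, 02:00 to 04:59
MIN_FLOW_L_PER_H = 1.0       # assumed: flow never below this suggests continuous flow
MIN_DAYS = 3                 # assumed: consecutive days required before flagging


def daily_minimum_flow(readings):
    """Return {date: smallest hourly volume in litres} from (timestamp, litres) pairs."""
    minima = {}
    for timestamp, litres in readings:
        day = timestamp.date()
        minima[day] = min(litres, minima.get(day, litres))
    return minima


def median_night_flow(readings):
    """Median hourly volume (litres) recorded inside the assumed night window."""
    night = [litres for timestamp, litres in readings if timestamp.hour in NIGHT_HOURS]
    return median(night) if night else None


def has_continuous_flow(readings):
    """Flag a meter whose hourly flow stays at or above the threshold
    for MIN_DAYS consecutive calendar days (partial days are not treated specially)."""
    minima = daily_minimum_flow(readings)
    streak = 0
    previous_day = None
    for day in sorted(minima):
        consecutive = previous_day is not None and day - previous_day == timedelta(days=1)
        if minima[day] >= MIN_FLOW_L_PER_H:
            streak = streak + 1 if consecutive and streak > 0 else 1
        else:
            streak = 0
        if streak >= MIN_DAYS:
            return True
        previous_day = day
    return False


def synthetic_readings(days, leak_litres_per_hour=0.0):
    """Build made-up hourly readings: morning and evening use plus an optional constant leak."""
    start = datetime(2021, 1, 1)
    readings = []
    for hour in range(days * 24):
        timestamp = start + timedelta(hours=hour)
        use = 30.0 if timestamp.hour in (7, 8, 18, 19) else 0.0
        readings.append((timestamp, use + leak_litres_per_hour))
    return readings


if __name__ == "__main__":
    normal = synthetic_readings(days=5)
    leaky = synthetic_readings(days=5, leak_litres_per_hour=4.0)
    print(has_continuous_flow(normal), has_continuous_flow(leaky))
```

A simple rule like this only works when readings are frequent enough to see the quiet hours of the day, which is why monthly or six-monthly meter reads cannot support it. In practice a fixed threshold would likely be replaced by a model trained on labelled consumption series.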
Even without turning to newly developing technologies, a wider adoption of existing data-collection practices, together with an effort to maintain the data collection, storage, and processing chain to a high standard, is a great investment that all companies are considering and often undertaking, within the boundaries of strict budgets.
Indeed, we are now able to develop models and data science solutions that would not have been possible a few years ago. The world of big data has opened up to the water sector too, meaning that we are adopting new technologies to handle large datasets and complex algorithms. We look forward to the new challenges that these datasets will bring to Artesia and the innovative solutions we will be able to deliver.