To accurately assess the performance of energy disaggregation algorithms, we need lots of high-quality,
disaggregated energy data. Ideally, we need data from hundreds of homes, from a diverse range of
geographies and demographic groups. Collecting this quantity of real data would be unfeasibly
expensive and time-consuming. Worse, even if we had millions of pounds to spend on recording a
dataset, even the best real data has many issues which make it problematic for running a competition:
mislabelled channels, un-submetered appliances, sampling bias (e.g. only rich neighbourhoods have
been sampled from), gaps, etc.
Furthermore, a challenging aspect of NILM research is that, in order to provide valuable
energy-saving advice to users, NILM must accurately detect rare scenarios. Indeed, it is precisely
because these scenarios are rare that they are so valuable to detect and report to the user. But, because these
scenarios are rare, they only appear a tiny number of times in real datasets; and even if the events of
interest do occur in a real dataset, they might not be labelled. Hence these important use-cases for
NILM are nearly impossible to rigorously test with real data. For example, how can we test if NILM can
detect when fridge seals need to be replaced if the dataset doesn’t label failing fridge seals? Or how can
we test if NILM can detect when an immersion heater is on 24/7 if there is only one example of this
scenario in the real dataset?
Hence we are considering building a “data augmentation system” to generate an effectively infinite
amount of realistic, disaggregated electricity data.
The data augmentation system would output disaggregated electricity data at around 1 Hz. This data
should look (almost) identical to data recorded from real homes. Users of the simulator would be able to
request simulated data for specific countries; or specific demographic groups; or to request data for
particular scenarios such as faulty fridges.
To build the data augmentation system, first we’d build simple statistical models of which appliances
appear in each house (e.g. House 1 has two fridges; one TV; one washer (which was replaced after 60
days); etc.). These statistical models could be trained from appliance survey data.
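As a concrete sketch, such an ownership model could be as simple as a set of per-appliance probabilities estimated from survey data. The appliance names and probabilities below are illustrative, not real survey figures:

```python
import random

# Hypothetical ownership frequencies, of the kind an appliance survey
# might give us (these numbers are illustrative, not real survey data).
OWNERSHIP_PROB = {"fridge": 0.98, "tv": 0.95, "washer": 0.90, "dishwasher": 0.45}

def sample_house_appliances(rng=random):
    """Draw one simulated house: each appliance is present with its
    survey-estimated probability (independent Bernoulli draws)."""
    return [name for name, p in OWNERSHIP_PROB.items() if rng.random() < p]
```

A real version would condition these probabilities on country, demographics and building type, and would also need second-fridge counts, replacement dates, and so on.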
Then we’d build simple statistical models of when appliances are used. These models would be trained
from long-duration but low resolution datasets like HES and Dataport. The models would capture
patterns like the time of day that appliances are used (e.g. toasters are often used in the morning); the
correlation between appliances (e.g. toasters and kettles are often used within a few minutes of each
other), the correlation between external temperature and appliances, etc.
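A minimal sketch of such a usage model: a categorical distribution over start hour, estimated from an hourly histogram of observed activations. The toaster counts below are made up for illustration, shaped to give the morning peak described above:

```python
import random

# Illustrative hourly activation counts for a toaster (strong morning
# peak), of the kind that could be estimated from HES or Dataport.
toaster_counts = [0, 0, 0, 0, 0, 1, 5, 20, 30, 15, 5, 2,
                  1, 1, 0, 0, 1, 2, 2, 1, 0, 0, 0, 0]

def sample_start_hour(counts, rng=random):
    """Sample an hour-of-day for one activation, weighted by the
    empirical hourly histogram (a categorical distribution over 24 hours)."""
    total = sum(counts)
    r = rng.random() * total
    cumulative = 0
    for hour, count in enumerate(counts):
        cumulative += count
        if r < cumulative:
            return hour
    return len(counts) - 1
```

Cross-appliance correlations (toaster then kettle) could then be layered on top, e.g. by sampling a kettle activation conditioned on a toaster activation having just been placed.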
We could build different statistical models for each country. And, for each country, it would probably be
necessary to create several statistical models, for several different usage patterns (e.g. homes with
young children; single-occupancy homes where the occupant is away during the weekdays etc.)
Then the system would extract the 1 Hz power demand from real appliances from datasets such as
Tracebase. These time series would be split up into single runs of each appliance. (There are now 20
disaggregated datasets listed on the NILM wiki.)
To manufacture power data at 1 Hz, the system would sample randomly from the statistical models of
which appliances appear in each home and the models of appliance usage, conditional on real
historical temperature data (from a time period which was not in the original data). Each artificial "house"
would be made up of a collection of appliances, taken from the high frequency datasets.
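The stitching step could be sketched like this: each sampled activation is a (start time, power series) pair, and the artificial house's aggregate signal is simply their sum on a shared 1 Hz timeline. Function and variable names here are hypothetical:

```python
def stitch(activations, day_length=86_400):
    """Place appliance activations on a shared 1 Hz timeline and sum
    them into one aggregate power series.

    `activations` is a list of (start_second, power_samples) pairs,
    where power_samples is a list of watt readings at 1 Hz."""
    aggregate = [0.0] * day_length
    for start, power in activations:
        for i, watts in enumerate(power):
            if start + i < day_length:
                aggregate[start + i] += watts
    return aggregate

# Toy example: a 3-second kettle burst overlapping a fridge cycle.
kettle = (10, [2000.0] * 3)
fridge = (11, [100.0] * 5)
agg = stitch([kettle, fridge], day_length=20)
```

The submetered channels of the simulated house fall out for free: each channel is just the stitched series for one appliance before summation, so the output is perfectly labelled by construction.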
To get even more sophisticated, you could imagine using something like Energy+ to model the building
physics in order to model the internal temperature of the home.
The augmented data generator would have plenty of additional uses beyond powering a NILM
competition.
Background
Patterns to capture:
● Probability distributions for appliance ownership. Split by geography and demographics and
building type.
● Appliances which are mutually exclusive. E.g. a home with a gas boiler is extremely unlikely to
have an oil-fired boiler or a heat pump. Likewise, a home with a gas hob is very unlikely to have
an electric hob (although you might have a gas hob and an electric plate warmer or something
like that).
● How frequently appliances are replaced.
● Learn long-term trends. E.g. the transition from incandescent lamps to compact fluorescent to
LED.
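The mutual-exclusivity pattern could be handled by sampling exactly one member from each exclusive group, rather than sampling those appliances independently. The groups and weights below are illustrative:

```python
import random

# Hypothetical mutually exclusive groups with illustrative weights:
# each simulated home gets exactly one heating system and one hob type.
EXCLUSIVE_GROUPS = {
    "heating": [("gas boiler", 0.80), ("oil boiler", 0.15), ("heat pump", 0.05)],
    "hob": [("gas hob", 0.55), ("electric hob", 0.45)],
}

def sample_exclusive(rng=random):
    """Pick one member per group, so e.g. a gas boiler and a heat pump
    never co-occur in the same simulated home."""
    chosen = {}
    for group, options in EXCLUSIVE_GROUPS.items():
        names = [name for name, _ in options]
        weights = [weight for _, weight in options]
        chosen[group] = rng.choices(names, weights=weights)[0]
    return chosen
```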
Record appliances in a lab
Advantages
● Clean data
● Realistic appliance signatures
Disadvantages
● Small number of appliances
● All appliances are new and healthy
● Expensive and time consuming to build the lab and to add each new appliance
Build a software simulator of appliances
Advantages
● Total control over “mode” of appliance operation
● Could simulate a wide range of appliances, including faulty appliances
Disadvantages
● Requires a lot of work to make it realistic.
● Each new appliance requires work
● Might never be perfectly realistic
Learn a generative model from data
Disadvantages
● Needs lots of training data
● Black box
Extract signatures from real data
Advantages
● Easy to engineer & understand
● Easy to add new appliances
● Realistic
● There are now at least 20 public datasets (see list on NILM wiki) from which we can “harvest”
appliance signatures.
Disadvantages
● Can't generate novel appliances (unless we allow some time stretching and other subtle
warping??)
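The "time stretching" idea might look like resampling an activation to a new duration by linear interpolation, so one recorded washer cycle can yield many slightly different cycles. This is only a sketch; convincing warping would need care not to distort characteristic transients:

```python
def time_stretch(power, factor):
    """Stretch or squash a 1 Hz activation by `factor` using linear
    interpolation, e.g. factor=1.2 makes a cycle 20% longer."""
    n_out = max(2, round(len(power) * factor))
    out = []
    for i in range(n_out):
        # Map output sample i back to a fractional input position.
        pos = i * (len(power) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(power) - 1)
        frac = pos - lo
        out.append(power[lo] * (1 - frac) + power[hi] * frac)
    return out
```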
Challenges
● It isn’t trivial to extract signatures from appliance-level data.
Design
File formats
Options:
● Provide a configuration file:
○ Per house or set of houses:
■ Set mean / max / min / constant daily energy consumption for individual
appliances and for an arbitrary set of houses. E.g. “I want the energy consumption
for the fridge to have a mean consumption of 1 kWh for the first 10 houses and 3
kWh for the last 10 houses”
■ Produce separate models for specific geographical groups
■ Produce separate models for specific demographic groups (per geographical
group)
■ Produce separate models per time period (e.g. Christmas; or four seasons; or
whenever!)
● Aggregate days? E.g. daily_models: [mon, tue, wed, thu, fri], [sat, sun]
would create two models: one for week days and one for the weekend.
■ Select pre-defined “anomalies” (which would be learnt from real data) such as:
● Faulty fridge seal
● Appliances on constantly (immersion heater, pool heater, pool pump, DVR)
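A hypothetical `model.yaml` matching the options above might look like the following. No schema has been fixed yet; all field names are illustrative:

```yaml
# Illustrative configuration sketch; field names are not final.
houses:
  - count: 10              # first 10 houses
    appliances:
      fridge:
        daily_energy_kwh: {mean: 1.0}
  - count: 10              # last 10 houses
    appliances:
      fridge:
        daily_energy_kwh: {mean: 3.0}
geography: UK
daily_models: [[mon, tue, wed, thu, fri], [sat, sun]]
anomalies:
  - faulty_fridge_seal
  - immersion_heater_always_on
```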
view_signatures.py signatures.h5
Browse through plots of each signature
simulate.py model.yaml signatures1.h5 signatures2.h5 -o simulation.h5
Randomly sample from one or more models and stitch together signatures to simulate a new dataset.
Options:
● Request output of appliance activity script rather than dataset.
● Make simulation conditional on real historical weather data.
To do
ipython prototype
● Fetch latest NILMTK code
● Create appliance usage “scripts” (on/off times) for each appliance and each house.
○ Use UK-DALE to start with
○ Plot all this data and visually sanity-check it. (Save plot for report)
○ Save scripts to disk (see File formats section)
■ Save all household metadata.
● Build simple appliance usage model (for entire UK-DALE dataset)
○ Load appliance scripts
○ Be careful to handle missing data (not all homes have data for all the year)
○ Create categorical probability distributions for:
■ Time of use
● User can define interval (hourly / 15-minute intervals / minutely / etc.)
● User can define temporal groupings (e.g. just one categorical dist for every
day? Or break into seasons? Or months? Or one dist per day of the
year? Or seasons + Easter break + Christmas break?)
○ Periodicity
■ Fridge compressor
■ Washing machine used, say, every 3 days
○ Correlations between appliances
○ Save models to disk?
■ Probably yes, so we can share models (which will be small files)
■ See File formats section
● Split UK-DALE into separate activations
○ Save to disk (see File formats section)
● Generate augmented data from usage models and activations
● Switch to using HES for usage models
○ Convert HES to NILMTK HDF5
○ Step through HES importer code
○ Re-run modelling code with HES HDF5
○ Visually check appliance scripts
● Add demographic data & geographical data
○ Modify HES importer to save this data into HDF5 metadata
○ Modify appliance usage model to build different models for different demographics /
geographies
● Build model of household appliances using HES.
○ Load “appliance usage scripts” HDF5 (this knows demographics and which appliances
are in each home)
○ Learn from data how frequently appliances are changed, and for which demographics and
geographies.
○ Learn appliance correlations (e.g. people with a heat pump rarely have a gas boiler)
○ Sample from model of household appliances
■ User can select
● demographic & geographical groupings
● Swap appliances within a home
○ Single appliance swaps (e.g. replace fridge)
○ Many-appliance swaps (e.g. a new tenant)
○ Save sampled “household specifications” to disk.
● Add weather
○ Model correlations between weather and appliance usage
○ Then use real temperature data to drive sampling
● Add appliance noise when off / on standby
Future extensions
● Sub-activation “granular synthesis”
● Simulate data from commercial buildings, using data from the BMS to build statistical models of
appliance usage.
● Use generative deep neural nets to generate synthetic NILM data?!? (e.g. see OpenAI’s blog
post on generative models)
Names
● DES - Disaggregated Electricity Simulator (but what if the simulator grows to include water /
gas?!)
● DEDAS - Disaggregated Electricity Data Augmentation System (“augmentation” might sound
better than “simulation” for people who are allergic to simulation!)
● NILMTK.Simulator
● Combine efforts with SmartSim?
● NILM-Gym
● NILM-Augment (except it’s not just for NILM)
● EDED - Engineered disaggregated electricity data
● Reconstructed
● RED - reconstructed electricity data (but clashes with REDD!)
● RDED - reconstructed disaggregated electricity data
● EDR - electricity data reconstructed
● Electric MDF
● ER - electric reconstructed
● SlicedPy
● IED - infinite_electricity_data (but “IED” reads as “improvised explosive device”!)
● RALED - Reconstructed, appliance-level electricity data
● RAD: Reconstructed appliance data
References
● Srinivasarengan, K.; Goutam, Y. G. & Chandra, M. G. Home Energy Simulation for Non-Intrusive
Load Monitoring Applications. International Workshop on Engineering Simulations for
Cyber-Physical Systems, ACM, 2014, 9:9-9:12
● Paatero, J. V. & Lund, P. D. A model for generating household electricity load profiles.
International Journal of Energy Research, Wiley-Blackwell, 2006, 30, 273-290
● Chen, D.; Irwin, D. & Shenoy, P. SmartSim: A Device-Accurate Smart Home Simulator for Energy
Analytics. IEEE International Conference on Smart Grid Communications, 2016
○ bottom-up simulation of appliances
○ Works with NILMTK :)
○ Github project page: “We will be launching our beta version soon”
○ See Iyengar et al. 2016 (same research group) for details on how they infer their
appliance energy models from data. And see Barker et al. 2014 for earlier work on the
same topic.
● Barker, S.; Kalra, S.; Irwin, D. & Shenoy, P. Empirical Characterization, Modeling, and Analysis
of Smart Meter Data. Journal on Selected Areas in Communications, IEEE, 2014, 32, 1312-1327
● Bilton, M. J. PhD Thesis: Electricity Demand: Measurement, modelling and management of UK
homes. Centre for Environmental Policy, Imperial College London, 2010
○ He built a simulator for appliances down to a remarkable level of detail. Appliances are
modelled pretty much from first principles. Appliances are described in terms of a FSM,
volumes of physical containers (to simulate heating and cooling), U-values of insulators
for fridges, etc etc. Very very detailed stuff.
● Giri, S. Blog post: Literature Review: Simulation of power signatures for appliances. 2014
● Pouresmaeil, E.; Gonzalez, J.; Bhattacharya, K. & Canizares, C. Development of a Smart
Residential Load Simulator for Energy Management in Smart Grids. IEEE Transactions on Power
Systems, 2013, 1-8
○ They developed a MATLAB toolkit
○ Code is available for download (a zip file from their website)
● Selin Yilmaz’s PhD is highly relevant; she works on occupant behaviour modelling.
● De Souza et al. 2016. Procedural Generation of Videos to Train Deep Action Recognition
Networks. “We generate a diverse, realistic, and physically plausible [synthetic] dataset of
human action videos, called PHAV for "Procedural Human Action Videos". … We introduce a
deep multi-task representation learning architecture to mix synthetic and real videos, even if the
action categories differ. Our experiments on the UCF101 and HMDB51 benchmarks suggest that
combining our large set of synthetic videos with small real-world datasets can boost recognition
performance, significantly outperforming fine-tuning state-of-the-art unsupervised generative
models of videos."
● Apple’s first AI paper: Learning from Simulated and Unsupervised Images through Adversarial
Training (Dec 2016)