Authorized licensed use limited to: University of New Orleans. Downloaded on February 08,2023 at 22:07:26 UTC from IEEE Xplore. Restrictions apply.
42 IEEE CANADIAN JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, VOL. 44, NO. 1, WINTER 2021
ALESHINLOYE et al.: EVALUATION OF DIMENSIONALITY REDUCTION TECHNIQUES FOR LOAD PROFILING APPLICATION 43
Fig. 1. Energy consumption of a residential building for a two-day period, sampled at 1 sample/h.

(fans/electricity), HVAC (electricity), fans (electricity), general interior lights (electricity), general exterior lights (electricity), appliance interior equipment (electricity), miscellaneous interior equipment (electricity), and water heater (water systems and gas). Similarly, the ten features of the commercial data are electricity (facility), fans (electricity), heating (electricity), cooling (electricity), general interior lights (electricity), interior equipment (electricity), gas (facility), heating (gas), interior equipment (gas), and water heater (water systems and gas). All measurements are aggregated on an hourly basis. We use all the parameters except timestamps in the process of dimensionality reduction. The output for each observation is the timestamp and the reduced 2-D vector. The timestamps are not considered for reduction because they are required in other operations, e.g., billing.

The TMY3 data set defines three categories for the residential load profile data and 16 categories for the commercial load profile data. Each observation in the data set is assigned to one of these categories. We use the same categorization and divide buildings in the residential data set into three classes based on their energy consumption, i.e., low, base, or high [19]. A two-day energy consumption profile is visualized in Fig. 1. The vertical axis shows the hourly electricity utilization (kW), and the horizontal axis shows the time. Similarly, each commercial building falls in one of 16 classes: midrise apartment, small hotel, large hotel, stand-alone retail, outpatient health care, hospital, secondary school, primary school, strip mall, supermarket, quick service restaurant, full-service restaurant, warehouse, large office, medium office, and small office [20]. The energy consumption for a building in each of the 16 classes is shown in Fig. 2. Due to differences in the utilization of different commercial buildings, we plot the hourly electricity utilization as natural logarithmic values. We also use different line styles for plotting electricity consumption for each category in addition to using a logarithmic scale for better interpretation.

In addition to the TMY3 data set, we also use the “Individual household electric power consumption Data Set” [21] to create load profiles for a single user based on seasonal patterns as another use case. The data set contains smart meter data for a single customer with the granularity of one minute. The data are collected over a period of around four years. Each observation in the data set consists of nine attributes, including timing information. They are global active power, global reactive power, voltage, global intensity, submetering 1 (electrical equipment in the kitchen), submetering 2 (electrical equipment in the laundry room), and submetering 3 (electric water heater and air conditioner). The observations are accumulated on a weekly basis, and these observations are used in load profiling. The data set contains some missing values (approximately 1.25%) that are filled by the average values of the respective column. The data set is divided into four load profiles based on seasons, i.e., spring, summer, autumn, and winter.

III. METHODOLOGY

Visualization of data is important in data analysis as it allows for the identification of outliers, observing clustered data points, and determining the underlying structure of the data. However, the increasing number of metering and sensing devices in the grid makes it difficult for industry professionals to visualize data in an intuitive manner. With the aid of dimensionality reduction tools, data from the phasor measurement unit (PMU), smart meters, and sensors can be reduced to a 2-D or 3-D representation in real time for quick visualization to aid the identification of anomalies or outliers in the data. We have converted the multidimensional input vector into a 2-D feature vector. The 2-D vector can easily be plotted and visualized. Furthermore, it is easy to interpret a 2-D plot and identify abnormal usage behaviors and outliers. These are the reasons to reduce the multidimensional input vector into a 2-D vector. Hyndman et al. [22] applied dimensionality reduction techniques to visualize smart meter and PMU data for detecting users with abnormal consumption patterns, identifying islanding events and postdisturbance voltage, and visualizing probability forecasts for renewable energy.

In this article, both linear and nonlinear dimensionality reduction techniques are used to transform the input data of dimension m to a reduced dimension k. Mathematically, the input $X \in \mathbb{R}^{n \times m}$ is reduced to $Y \in \mathbb{R}^{n \times k}$. In this article, PCA is used for the linear reduction since it is shown to outperform other linear reduction techniques in the smart grid environment [17], while the nonlinear dimensionality reduction techniques tested are Isometric Feature Mapping (Isomap), kernel PCA (KPCA), locally linear embedding (LLE), t-distributed stochastic neighbor embedding (t-SNE), autoencoders, and Laplacian eigenmaps.

Each technique is selected considering its performance in other domains, e.g., Isomap is successfully applied to various
Fig. 5. First feature plot against second feature of the PCA reduced data.
(a) Residential. (b) Commercial. (c) Individual household data set.
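As a rough illustration of the kind of two-component PCA projection plotted in Fig. 5, the following standard-library Python sketch maps an n × m input to an n × k output by power iteration on the sample covariance matrix. It is not the authors' implementation: the function name `pca_reduce`, the `iters` parameter, and the fixed start vector are assumptions made for this example, and a production analysis would more likely rely on a numerical library routine.

```python
import math


def pca_reduce(X, k=2, iters=500):
    """Project the rows of X (n samples x m features) onto the top-k principal axes.

    Returns (Y, ratio): Y is the n x k reduced data and ratio is the fraction
    of the total variance captured by the k retained components.
    """
    n, m = len(X), len(X[0])

    # Center each feature (column) at zero mean.
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]

    # Sample covariance matrix (m x m).
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(m)]
         for i in range(m)]
    total_var = sum(C[i][i] for i in range(m))

    components, eigvals = [], []
    for _component in range(k):
        # Power iteration from a fixed start vector (an illustrative choice;
        # it fails if this vector is orthogonal to the sought eigenvector).
        v = [1.0 / math.sqrt(m)] * m
        for _step in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left along any direction
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance along v).
        lam = sum(v[i] * C[i][j] * v[j] for i in range(m) for j in range(m))
        components.append(v)
        eigvals.append(lam)
        # Deflation: remove the component just found from the covariance.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]

    # Coordinates of each centered sample along the retained components.
    Y = [[sum(r[j] * v[j] for j in range(m)) for v in components] for r in Xc]
    ratio = sum(eigvals) / total_var if total_var > 0 else 0.0
    return Y, ratio
```

Calling `pca_reduce(X, k=2)` on a list of feature rows returns the 2-D coordinates that would be plotted as first feature against second feature, together with the share of variance retained by those two components.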
Fig. 6. First feature plot against second feature of the KPCA reduced data. (a) Residential. (b) Commercial. (c) Individual household data set.
Fig. 7. First feature plot against second feature of the Isomap reduced data. (a) Residential. (b) Commercial. (c) Individual household data set.
An autoencoder aims to reduce the data dimensionality by learning to ignore the signal noise. We train our architecture for 100 epochs with a batch size of 64 and a learning rate of 0.01 using the Adam optimizer [41]. The reduced dimensions are activated using ReLU activation to model the nonlinearity in the output. Fig. 11 shows the first and second component plots after dimensionality reduction.

The visualization provides a qualitative assessment of dimension reduction techniques for load profiling applications. The objective of the proposed method is to create a load profiling visualization that can accurately cluster different users. Well-separated clusters also make outliers easier to handle. Based on the visualization results, we clearly see that t-SNE is better suited for the smart meter data and can accurately separate various classes in all data sets. However, a few of the techniques can properly visualize only one of the data sets, and a few cannot handle any of the data sets. We can also conclude that nonlinear techniques can produce better results compared with linear methods. For the purpose of comparing the reduction techniques, the resulting clusters are then evaluated using the adjusted Rand index (ARI) [42]. The Rand index (RI) computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned to the same or different clusters in the predicted and true clusterings, and thus provides a quantitative measurement. ARI is computed using the following equation:

$$\mathrm{ARI} = \frac{\mathrm{RI} - \text{Expected RI}}{\max(\mathrm{RI}) - \text{Expected RI}}. \tag{1}$$

The resulting score is at most one, which indicates identical clusterings; a score near zero indicates chance-level agreement, and the score can be negative when the agreement is worse than chance. The result for each reduction technique using two components of the data sets is shown in Table II. From the result, it is seen that some of the nonlinear dimensionality reduction techniques outperform PCA for all data sets. This confirms that, for the application of load profiling, an indirect clustering technique using nonlinear dimensionality reduction techniques would perform better. However, the result of each nonlinear technique varies from run to run. This is solved by averaging the results of the projection over multiple runs.

Furthermore, all nonlinear techniques require tuning parameters to obtain the best result. This is a major challenge
Fig. 8. First feature plot against second feature of the LLE reduced data.
(a) Residential. (b) Commercial. (c) Individual household data set.
Fig. 9. First feature plot against second feature of the Laplacian eigenmaps reduced data. (a) Residential. (b) Commercial. (c) Individual household data set.
TABLE II
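The ARI in (1) is commonly evaluated in the pair-counting form of Hubert and Arabie [42], built from the contingency counts of the true and predicted labels. The sketch below is a minimal standard-library Python version under that assumption. The function name `adjusted_rand_index` and the handling of degenerate inputs are choices made for this example, not details taken from the article.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two labelings of the same samples, computed from pair counts."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_true)

    # Contingency counts: how many samples share each (true, predicted) label pair.
    joint = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)

    # Pairs grouped together in both labelings, in the true one, in the predicted one.
    index = sum(comb(c, 2) for c in joint.values())
    pairs_true = sum(comb(c, 2) for c in true_sizes.values())
    pairs_pred = sum(comb(c, 2) for c in pred_sizes.values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treated as perfect agreement (assumption)

    expected = pairs_true * pairs_pred / total_pairs  # plays the role of "Expected RI"
    maximum = 0.5 * (pairs_true + pairs_pred)         # plays the role of "max(RI)"
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (index - expected) / (maximum - expected)
```

Relabeling the clusters does not change the score, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0, whereas a labeling that splits every true cluster evenly, such as `[0, 1, 0, 1]` against `[0, 0, 1, 1]`, evaluates to -0.5.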
Fig. 10. First feature plot against second feature of the t-SNE reduced data. (a) Residential. (b) Commercial. (c) Individual household data set.
Fig. 11. First feature plot against second feature of the autoencoders reduced data. (a) Residential. (b) Commercial. (c) Individual household data set.
TABLE III
COMPUTATIONAL COMPLEXITY AND MEMORY REQUIREMENT OF DIMENSIONALITY REDUCTION TECHNIQUES [43]

effect of the weight is negligible because we use only the first two components of each reduced data set. This reduces the computation time and memory requirement relative to using the data samples without dimensionality reduction.

V. CONCLUSION

The traditional power grid usually required voltage, available power, and simple demand data. With the advent of smart grids, the need for information has also evolved into new dynamic visualization to incorporate the fuel-based energy sources along with renewable energy sources. The visualizations can empower decision-makers to better understand the incoming information and, consequently, make a better decision in response. This work investigated dimensionality reduction for load profile data visualization. We have employed both linear and nonlinear dimension reduction techniques. The chosen linear dimensionality reduction technique has no hyperparameters to be tuned aside from choosing the number of eigenvectors corresponding to high eigenvalues. It is evident from the experimentation that reducing the feature vectors to two features retains more than 90% of the original information content. In this article, we have provided two test cases: residential data and commercial data. Our results show that some of the nonlinear dimensionality reduction techniques outperform PCA (a widely used dimensionality reduction technique), which emphasizes that the intrinsic dimension of the energy consumption data for both the residential and commercial use cases is nonlinear. In the residential data,
we observed that KPCA outperforms other techniques, and in the case of commercial data, we observed that t-SNE outperforms other techniques. This shows that the three data sets have different distributions and that no single dimensionality reduction technique works for all data sets. Selecting a suitable dimensionality reduction technique is very important to retain the relationship between the data points in a lower dimensional space and allow for proper load profiling of the consumers. It can also be concluded from the experimentation that the nonlinear techniques can outperform the linear techniques.

REFERENCES

[1] C. Tu, X. He, Z. Shuai, and F. Jiang, “Big data issues in smart grid–a review,” Renew. Sustain. Energy Rev., vol. 79, pp. 1099–1107, 2017.
[2] U. EIA. (2016). International Energy Statistics. US Energy and Information Administration, Washington, DC, USA. Accessed: Sep. 6, 2018. [Online]. Available: http://www.eia.gov
[3] K. Balachandran, R. L. Olsen, and J. M. Pedersen, “Bandwidth analysis of smart meter network infrastructure,” in Proc. 16th Int. Conf. Adv. Commun. Technol., Feb. 2014, pp. 928–933.
[4] Y. Wang, Q. Chen, C. Kang, M. Zhang, K. Wang, and Y. Zhao, “Load profiling and its application to demand response: A review,” Tsinghua Sci. Technol., vol. 20, no. 2, pp. 117–129, Apr. 2015.
[5] K.-L. Zhou, S.-L. Yang, and C. Shen, “A review of electric load classification in smart grid environment,” Renew. Sustain. Energy Rev., vol. 24, pp. 103–110, Aug. 2013.
[6] G. J. Tsekouras, P. B. Kotoulas, C. D. Tsirekis, E. N. Dialynas, and N. D. Hatziargyriou, “A pattern recognition methodology for evaluation of load profiles and typical days of large electricity customers,” Electr. Power Syst. Res., vol. 78, no. 9, pp. 1494–1510, Sep. 2008.
[7] G. Chicco, R. Napoli, and F. Piglione, “Comparisons among clustering techniques for electricity customer classification,” IEEE Trans. Power Syst., vol. 21, no. 2, pp. 933–940, May 2006.
[8] G. Chicco, O.-M. Ionel, and R. Porumb, “Electrical load pattern grouping based on centroid model with ant colony clustering,” IEEE Trans. Power Syst., vol. 28, no. 2, pp. 1706–1715, May 2013.
[9] T. Zhang, G. Zhang, J. Lu, X. Feng, and W. Yang, “A new index and classification approach for load pattern analysis of large electricity customers,” IEEE Trans. Power Syst., vol. 27, no. 1, pp. 153–160, Feb. 2012.
[10] M. Koivisto, P. Heine, I. Mellin, and M. Lehtonen, “Clustering of connection points and load modeling in distribution systems,” IEEE Trans. Power Syst., vol. 28, no. 2, pp. 1255–1265, May 2013.
[11] E. D. Varga, S. F. Beretka, C. Noce, and G. Sapienza, “Robust real-time load profile encoding and classification framework for efficient power systems operation,” IEEE Trans. Power Syst., vol. 30, no. 4, pp. 1897–1904, Jul. 2015.
[12] S. Zhong and K.-S. Tam, “Hierarchical classification of load profiles based on their characteristic attributes in frequency domain,” IEEE Trans. Power Syst., vol. 30, no. 5, pp. 2434–2441, Sep. 2015.
[13] S. V. Verdu, M. O. Garcia, C. Senabre, A. G. Marin, and F. J. G. Franco, “Classification, filtering, and identification of electrical customer load patterns through the use of self-organizing maps,” IEEE Trans. Power Syst., vol. 21, no. 4, pp. 1672–1682, Nov. 2006.
[14] Y. Xiao, J. Yang, H. Que, M. Junjie Li, and Q. Gao, “Application of wavelet-based clustering approach to load profiling on AMI measurements,” in Proc. China Int. Conf. Electr. Distrib. (CICED), Sep. 2014, pp. 1537–1540.
[15] A. Albert and R. Rajagopal, “Smart meter driven segmentation: What your consumption says about you,” IEEE Trans. Power Syst., vol. 28, no. 4, pp. 4019–4030, Nov. 2013.
[16] S. Lhermitte, J. Verbesselt, W. W. Verstraeten, and P. Coppin, “A comparison of time series similarity measures for classification and change detection of ecosystem dynamics,” Remote Sens. Environ., vol. 115, no. 12, pp. 3129–3152, Dec. 2011.
[17] A. Aleshinloye, A. Bais, and I. Al-Anbagi, “Performance analysis of dimensionality reduction techniques for demand side management,” in Proc. IEEE Electr. Power Energy Conf. (EPEC), Oct. 2017, pp. 1–6.
[18] E. Wilson. (2014). Commercial and Residential Hourly Load Profiles for all TMY3 Locations in the United States. U.S. Department of Energy Open Data Catalog: U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy (EERE), National Renewable Energy Laboratory, Golden, CO, USA. Accessed: Aug. 28, 2020. [Online]. Available: https://openei.org/doe-opendata/dataset/commercial-and-residential-hourly-load-profiles-for-all-tmy3-locations-in-the-united-states
[19] R. Hendron and C. Engebrecht, “Building America house simulation protocols,” Office of Energy Efficiency and Renewable Energy (EERE), Washington, DC, USA, Tech. Rep. TP-550-49426, 2010. Accessed: Aug. 28, 2020. [Online]. Available: https://www.nrel.gov/docs/fy11osti/49246.pdf
[20] M. Deru et al., “U.S. Department of Energy commercial reference building models of the national building stock,” Nat. Renew. Energy Lab., Golden, CO, USA, Tech. Rep. NREL/TP-5500-46861, 2011. Accessed: Aug. 28, 2020. [Online]. Available: https://www.nrel.gov/docs/fy11osti/46861.pdf
[21] G. Hebrail and A. Berard. UCI Machine Learning Repository: Individual Household Electric Power Consumption Data Set. Accessed: Aug. 28, 2020. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption
[22] R. J. Hyndman, X. Liu, and P. Pinson, “Visualizing big energy data: Solutions for this crucial component of data analysis,” IEEE Power Energy Mag., vol. 16, no. 3, pp. 18–25, May 2018.
[23] M. Niskanen and O. Silvén, “Comparison of dimensionality reduction methods for wood surface inspection,” Proc. SPIE, vol. 5132, pp. 178–189, May 2003.
[24] H. Liu et al., “Dimensionality reduction for identification of hepatic tumor samples based on terahertz time-domain spectroscopy,” IEEE Trans. THz Sci. Technol., vol. 8, no. 3, pp. 271–277, May 2018.
[25] I. Soo Lim, P. de Heras Ciechomski, S. Sarni, and D. Thalmann, “Planar arrangement of high-dimensional biomedical data sets by isomap coordinates,” in Proc. 16th IEEE Symp. Comput.-Based Med. Syst., Jun. 2003, pp. 50–55.
[26] H. Dadkhahi, M. F. Duarte, and B. M. Marlin, “Out-of-sample extension for dimensionality reduction of noisy time series,” IEEE Trans. Image Process., vol. 26, no. 11, pp. 5435–5446, Nov. 2017.
[27] K. In Kim, K. Jung, and H. Joon Kim, “Face recognition using kernel principal component analysis,” IEEE Signal Process. Lett., vol. 9, no. 2, pp. 40–42, Feb. 2002.
[28] A. Lima, H. Zen, Y. Nankaku, C. Miyajima, K. Tokuda, and T. Kitamura, “On the use of kernel PCA for feature extraction in speech recognition,” IEICE Trans. Inf. Syst., vol. 87, no. 12, pp. 2802–2811, 2004.
[29] H. Hoffmann, “Kernel PCA for novelty detection,” Pattern Recognit., vol. 40, no. 3, pp. 863–874, Mar. 2007.
[30] F. Yuan, X. Xia, J. Shi, H. Li, and G. Li, “Non-linear dimensionality reduction and Gaussian process based classification method for smoke detection,” IEEE Access, vol. 5, pp. 6833–6841, 2017.
[31] H. Chang, D.-Y. Yeung, and Y. Xiong, “Super-resolution through neighbor embedding,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. CVPR, vol. 1, Jul. 2004, p. I.
[32] R. Duraiswami and V. C. Raykar, “The manifolds of spatial hearing,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., ICASSP, Mar. 2005, p. 285.
[33] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in Proc. Adv. Neural Inf. Process. Syst., 2002, pp. 849–856.
[34] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000.
[35] M. A. Oikawa, Z. Dias, A. de Rezende Rocha, and S. Goldenstein, “Manifold learning and spectral clustering for image phylogeny forests,” IEEE Trans. Inf. Forensics Security, vol. 11, no. 1, pp. 5–18, Jan. 2016.
[36] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang, “Face recognition using laplacianfaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 328–340, Mar. 2005.
[37] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.
[38] L. Theis, W. Shi, A. A. Cunningham, and F. Huszár, “Lossy image compression with compressive autoencoders,” 2017, arXiv:1703.00395. [Online]. Available: https://arxiv.org/abs/1703.00395
[39] A. Sriram, S. Kalra, H. R. Tizhoosh, and S. Rahnamayan, “Learning autoencoded radon projections,” in Proc. IEEE Symp. Ser. Comput. Intell. (SSCI), Nov. 2017, pp. 1–5.
[40] L. Xie, Y. Chen, and P. R. Kumar, “Dimensionality reduction of synchrophasor data for early event detection: Linearized analysis,” IEEE Trans. Power Syst., vol. 29, no. 6, pp. 2784–2794, Nov. 2014.
[41] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2015, pp. 1–15.
[42] L. Hubert and P. Arabie, “Comparing partitions,” J. Classification, vol. 2, no. 1, pp. 193–218, Dec. 1985.
[43] L. Van Der Maaten, E. Postma, and J. Van den Herik, “Dimensionality reduction: A comparative,” J. Mach. Learn. Res., vol. 10, pp. 66–71, Oct. 2009.