PROTEUS was designed to address fundamental scientific challenges related to the scalability and responsiveness of analytics capabilities. Completed.
The High Frequency Appliance Disaggregation Analysis project analysed real-world data collected by the ETI's Home Energy Management System in five homes, covering detailed water, gas and electricity use. Completed.
ExtremeXP aims to provide accurate, precise, fit-for-purpose and trustworthy data-driven insights by evaluating different complex analytics variants, taking end users' preferences and feedback into account in an automated way. Active.
Use case two (UC2) is one of the practical aspects of ExtremeXP that I am involved in. The objective is to distinguish malicious from benign spam, which is a binary classification problem. We ran experiments on a number of datasets. The main concerns were: (1) imbalance: the label frequencies are imbalanced; (2) scale: the features are on different scales; (3) selection: only a few features are useful.
We have worked on an online learning classifier with constant time, space and prediction complexity. is the arbitrary separator. The protocol of online learning can be summarised as follows: (1) the input is received; (2) the algorithm processes the input and makes a prediction; (3) it learns from the outcome using a surrogate loss.
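The protocol above can be sketched as a test-then-train loop. This is a minimal illustration, not the UC2 classifier: it assumes online logistic regression (the logistic loss standing in for the surrogate loss) and uses three common stand-ins for the concerns listed: running standardisation via Welford's algorithm for scale, inverse-frequency class weights for imbalance, and an L1 soft-thresholding step for feature selection. All names (`OnlineSparseClassifier`, `run_online`, `make_stream`), the hyperparameters, the synthetic stream and the choice of balanced accuracy as the metric are hypothetical.

```python
import math
import random


def _sigmoid(t):
    # numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class OnlineSparseClassifier:
    """Hypothetical online binary classifier for labels in {-1, +1}.

    State is O(d) and each step costs O(d) for d features, so time and
    memory per observation do not grow with the length of the stream.
    """

    def __init__(self, n_features, lr=0.1, l1=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr                     # step size (assumed constant)
        self.l1 = l1                     # L1 strength used for feature shrinkage
        self.n = 0                       # observations learned from so far
        self.mean = [0.0] * n_features   # Welford running mean (scale)
        self.m2 = [0.0] * n_features     # Welford running sum of squared deviations
        self.counts = {-1: 0, 1: 0}      # running label counts (imbalance)

    def _standardise(self, x):
        z = []
        for j, v in enumerate(x):
            var = self.m2[j] / self.n if self.n > 0 else 0.0
            std = math.sqrt(var) if var > 1e-12 else 1.0
            z.append((v - self.mean[j]) / std)
        return z

    def _update_stats(self, x):
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])

    def predict(self, x):
        z = self._standardise(x)
        score = sum(wj * zj for wj, zj in zip(self.w, z)) + self.b
        return 1 if score >= 0 else -1

    def learn(self, x, y):
        self._update_stats(x)
        self.counts[y] += 1
        # re-weight each class inversely to its running frequency
        weight = self.n / (2 * self.counts[y])
        z = self._standardise(x)
        margin = y * (sum(wj * zj for wj, zj in zip(self.w, z)) + self.b)
        # derivative of the weighted logistic loss w.r.t. the raw score
        g = -y * weight * _sigmoid(-margin)
        for j, zj in enumerate(z):
            wj = self.w[j] - self.lr * g * zj
            # proximal L1 step (soft-thresholding) shrinks weak features towards 0
            self.w[j] = math.copysign(max(abs(wj) - self.lr * self.l1, 0.0), wj)
        self.b -= self.lr * g


def run_online(model, stream):
    """Test-then-train loop; returns balanced accuracy (mean per-class recall)."""
    correct = {-1: 0, 1: 0}
    seen = {-1: 0, 1: 0}
    for x, y in stream:
        y_hat = model.predict(x)   # (1) receive input, (2) predict
        seen[y] += 1
        correct[y] += int(y_hat == y)
        model.learn(x, y)          # (3) label revealed, surrogate-loss update
    recalls = [correct[c] / seen[c] for c in (-1, 1) if seen[c]]
    return sum(recalls) / len(recalls)


def make_stream(n=3000, seed=0):
    # synthetic, imbalanced stream with one useful and two badly scaled noise features
    rng = random.Random(seed)
    for _ in range(n):
        y = 1 if rng.random() < 0.1 else -1     # roughly 10% minority class
        x = [3.0 * y + rng.gauss(0, 1),         # informative feature
             rng.gauss(0, 100),                 # noise on a large scale
             rng.gauss(500, 50)]                # noise with a large offset
        yield x, y


model = OnlineSparseClassifier(n_features=3)
bal_acc = run_online(model, make_stream())
print(round(bal_acc, 3), [round(w, 2) for w in model.w])
```

In this sketch only running statistics are stored, so the per-step cost depends on the number of features rather than on how many observations have been seen; the soft-thresholding step is one way to make uninformative weights shrink towards zero over time.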
Despite the label imbalance, the classifier achieved balanced performance. It maintains a score above 90% throughout the run from 103 observations onwards (Figure 1), and it progressively shrinks unnecessary features as it learns (Figure 2). Figure 3 shows the overall result at the end of the experiment, and Figure 4 helps to interpret Figure 3. The following features are selected: