Projects

PROTEUS

PROTEUS was designed to address fundamental scientific challenges related to the scalability and responsiveness of analytics capabilities. Completed.

ETI

The High Frequency Appliance Disaggregation Analysis project analysed real-world data from the ETI's Home Energy Management System in five homes, gathering detailed data on water, gas and electricity use. Completed.

ExtremeXP

ExtremeXP aims to provide accurate, precise, fit-for-purpose, and trustworthy data-driven insights by evaluating different complex analytics variants, taking end users' preferences and feedback into account in an automated way. Active.

Figure 1. F score
Figure 2. Coefficient loadings
Figure 3. Classification report
Figure 4. Confusion matrix

Binary Classification

Use case two (UC2) is one of the practical aspects of ExtremeXP that I am involved in. The objective is to classify spam as malicious or benign, which is a binary classification problem. We ran experiments on a number of datasets. The main concerns were:

  • imbalance: the label frequencies are imbalanced;
  • scale: the features are on different scales;
  • selection: only a few features are useful.

We have worked on an online learning classifier with constant time, space and prediction complexity per observation. is the arbitrary separator. The protocol of online learning can be summarised as follows: (1) an input is received; (2) the algorithm processes the input and predicts its label; (3) the true label is revealed and the algorithm learns from it using a surrogate loss.
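
The three concerns and the three-step protocol can be illustrated with a minimal Python sketch. It is not the UC2 algorithm: the logistic surrogate loss, the running (Welford) standardisation for scale, the inverse-frequency class weights for imbalance, the L1 soft-thresholding step for selection, the class name OnlineL1Logistic, its lr and l1 defaults, and the synthetic stream in run_demo are all assumptions chosen for illustration. Each update costs O(d) time and memory for d features, independent of how many observations have already been seen.

```python
import math
import random


def _sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


class OnlineL1Logistic:
    """Hypothetical online linear classifier; O(d) time and memory per step."""

    def __init__(self, n_features, lr=0.05, l1=0.001):
        self.lr, self.l1 = lr, l1          # assumed hyperparameters
        self.w = [0.0] * n_features        # weights of the linear separator
        self.b = 0.0                       # bias term
        self.n = 0                         # observations seen so far
        self.mean = [0.0] * n_features     # running feature means (Welford)
        self.m2 = [0.0] * n_features       # running sums of squared deviations
        self.counts = [0, 0]               # running label counts

    def _standardise(self, x):
        # Scale concern: centre and scale each feature with running statistics.
        z = []
        for j, v in enumerate(x):
            std = math.sqrt(self.m2[j] / self.n) if self.n > 1 else 1.0
            z.append((v - self.mean[j]) / std if std > 0 else 0.0)
        return z

    def _proba(self, z):
        return _sigmoid(self.b + sum(wj * zj for wj, zj in zip(self.w, z)))

    def predict(self, x):
        return int(self._proba(self._standardise(x)) >= 0.5)

    def update(self, x, y):
        # Fold the newly labelled observation into the running statistics.
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])
        self.counts[y] += 1
        # Imbalance concern: inverse-frequency weight n / (2 * n_y).
        weight = self.n / (2.0 * self.counts[y])
        # Gradient of the weighted logistic (surrogate) loss w.r.t. the score.
        z = self._standardise(x)
        err = weight * (self._proba(z) - y)
        shrink = self.lr * self.l1
        for j, zj in enumerate(z):
            wj = self.w[j] - self.lr * err * zj
            # Selection concern: soft-thresholding (the L1 proximal step)
            # pulls weights of uninformative features towards zero.
            self.w[j] = math.copysign(max(abs(wj) - shrink, 0.0), wj)
        self.b -= self.lr * err


def run_demo(n_steps=5000, seed=0):
    """Run the protocol on a synthetic, imbalanced stream (hypothetical data)."""
    rng = random.Random(seed)
    model = OnlineL1Logistic(n_features=3)
    tp = fn = tn = fp = 0
    for t in range(n_steps):
        y = int(rng.random() < 0.1)                # ~10% positive labels
        x = [rng.gauss(2.0 * y, 1.0) * 100.0,      # informative, large scale
             rng.gauss(0.0, 1.0),                  # noise
             rng.gauss(0.0, 1.0) * 0.01]           # noise, tiny scale
        y_hat = model.predict(x)                   # (1) receive input, (2) predict
        model.update(x, y)                         # (3) label revealed, learn
        if t >= n_steps // 2:                      # score the second half only
            if y == 1:
                tp, fn = tp + (y_hat == 1), fn + (y_hat == 0)
            else:
                tn, fp = tn + (y_hat == 0), fp + (y_hat == 1)
    balanced_acc = 0.5 * (tp / (tp + fn) + tn / (tn + fp))
    return model, balanced_acc


model, bacc = run_demo()
print("weights:", [round(w, 3) for w in model.w], "balanced accuracy:", round(bacc, 3))
```

Every prediction is made before its label is revealed, so scoring the second half of the stream is a simple test-then-train (prequential) evaluation. The weight on the informative feature should dominate the two noise weights even though the raw features differ in scale by four orders of magnitude.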


Despite the label imbalance, we were able to achieve balanced performance. The algorithm maintains an F score above 90% from 103 observations onwards (Figure 1), and it shrinks the coefficients of unnecessary features over time as it learns (Figure 2). Figure 3 shows the overall result at the end of the experiment, and the confusion matrix in Figure 4 gives the counts from which that report is derived. The following features are selected:

  • pkts_mean: mean packet count. A packet is a small segment of a larger message; data sent over networks, such as the internet, is divided into packets, which are then recombined by the computer or device that receives them.
  • bytes_mean: mean byte count. A byte is the basic unit of information in computer storage and processing.
  • mean_duration: mean duration, commonly measured by the mean time to contain (MTTC), which covers the time taken to detect, acknowledge, and fully contain a security incident.
  • udp_ratio: typically the proportion of network traffic that uses the User Datagram Protocol (UDP) relative to the Transmission Control Protocol (TCP).
  • conn_ratio: typically the ratio of connections established to a system to the number of legitimate users or devices that should be accessing it.
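
The running F score in Figure 1, the classification report in Figure 3 and the confusion matrix in Figure 4 all derive from the same four counts. The sketch below, with hypothetical names RunningConfusion and f1_curve, updates those counts one prediction at a time and records the F1 score after each observation. The page does not state which F score variant or averaging the figures use, so F1 of the positive class is assumed here, with label 1 standing for malicious.

```python
from dataclasses import dataclass


@dataclass
class RunningConfusion:
    """Binary confusion-matrix counts, updated one prediction at a time."""
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0

    def update(self, y_true, y_pred):
        if y_pred == 1:
            if y_true == 1:
                self.tp += 1
            else:
                self.fp += 1
        elif y_true == 1:
            self.fn += 1
        else:
            self.tn += 1

    def precision(self):
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self):
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self):
        # Harmonic mean of precision and recall (0.0 when both are zero).
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def f1_curve(y_true, y_pred):
    """Return the final counts and the F1 score after each observation."""
    cm = RunningConfusion()
    curve = []
    for yt, yp in zip(y_true, y_pred):
        cm.update(yt, yp)
        curve.append(cm.f1())
    return cm, curve


cm, curve = f1_curve([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(cm, [round(f, 3) for f in curve])
```

Because only four integers are stored, the curve can be tracked alongside an online learner at constant cost per observation.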



© Copyright 2025. All rights reserved.