Investigating the Effects of Dimensionality Reduction and Implementation of a Novel Feature Engineering Framework Towards Low Dimensional Medical Datasets / Muhammad Amirul Fahmiin bin Abdullah
Material type: Text
Publication details: Bandar Seri Begawan : Universiti Teknologi Brunei, ©2020.
Description: 117 pages : coloured illustrations, charts, tables ; 30 cm
Subject(s): Project Report Universiti Teknologi Brunei | Thesis Writing | Project Report, Academic | Medical informatics -- Data processing | Feature extraction (Computer science) | Dimensionality reduction (Statistics) | Medical records -- Data analysis
Other classification: RTDS 343 | UTB 120 REPORT THESIS & DISSERTATION, RTDS 343
| Item type | Current library | Call number | Copy number | Status | Notes | Date due | Barcode |
|---|---|---|---|---|---|---|---|
| Reports, Thesis & Dissertation Students | Universiti Teknologi Brunei Library - at level 2 | UTB 120 REPORT THESIS & DISSERTATION, RTDS 343 | 1 | Not for loan | Reg. No._UTB [RTDS 343] | | 850389 |
A thesis submitted to Universiti Teknologi Brunei in fulfilment of the requirements for the degree of Master of Science (MSc) in Electrical and Electronic Engineering.
Abstract
In machine learning applications for Electronic Patient Records (EPR), high-dimensional datasets are predominantly used to train reliable prediction models, whereas low-dimensional datasets are dismissed for their lack of features. However, for health institutions in less developed and developing countries, large digitised datasets are scarce, and Artificial Intelligence approaches have to rely on the available low-dimensional datasets, resulting in sub-par predictive models. This research aims to improve the reliability and accuracy of machine learning models trained on medical datasets, to benefit health institutions that have only low-dimensional datasets.
To realise this, a framework is constructed that combines feature preprocessing with selection of the classification algorithm that provides the best overall performance boost.
This research begins by identifying the datasets, dimensionality reduction methods and classification algorithms to be benchmarked on their evaluation metrics. In the first set of experiments, the dimensionality reduction methods Sequential Feature Selection (SFWS and SBS), Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA) were used in a variety of combinations with Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Forest (RF) algorithms. The outcome shows that the dimensionality reduction methods were able to identify the best subset of features from the original dataset. However, they produced negligible (less than 5% increase) or no performance improvements in the machine learning models.
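The sequential backward selection (SBS) idea mentioned above can be illustrated with a minimal sketch. It is not the thesis implementation: the toy data, the function names, and the nearest-centroid classifier (a dependency-free stand-in for the ANN, SVM and RF models named in the abstract) are all assumptions made for illustration.

```python
def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict the label of x from class centroids over the chosen features."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def squared_distance(label):
        return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))

    # sorted() makes tie-breaking between labels deterministic
    return min(sorted(centroids), key=squared_distance)


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the stand-in classifier on a feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]
    return hits / len(X)


def sequential_backward_selection(X, y, n_keep):
    """Start from all features; repeatedly drop the one whose removal scores best."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        candidates = [[f for f in features if f != drop] for drop in features]
        features = max(candidates, key=lambda subset: loo_accuracy(X, y, subset))
    return features


# Hypothetical toy data: columns 0 and 2 separate the classes, column 1 is noise.
X = [
    [1.0, 50.0, 0.2], [1.2, 10.0, 0.1], [0.9, 30.0, 0.3], [1.1, 70.0, 0.2],
    [3.0, 20.0, 0.9], [3.2, 60.0, 0.8], [2.9, 40.0, 1.0], [3.1, 80.0, 0.9],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = sequential_backward_selection(X, y, n_keep=2)
print(selected)
```

On this toy data the noisy column is discarded first, which mirrors the abstract's observation that such methods can identify a good feature subset even when the resulting accuracy gain is small.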
In the second part, the research introduces feature engineering (FE) into the framework as a means of constructing additional data instances to improve the quality of the datasets. This increased the size of each of the original three datasets after the addition of the engineered features. A set of experiments similar to those in the first part was run, with all other variables and hyperparameters held constant and only the train-test input data changed. Comprehensive analysis of the results shows consistent increases in accuracy and precision when implementing the FE-RFE-ANN approach, with an average increase of 2.89% in accuracy and 2.88% in precision across all datasets.
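As a rough illustration of how feature engineering can widen a low-dimensional dataset before selection and classification, the sketch below appends pairwise products of the original columns. The abstract does not say which engineered features the thesis constructs, so the choice of products, the column names, and the helper `add_pairwise_products` are all hypothetical.

```python
from itertools import combinations


def add_pairwise_products(rows, names):
    """Append the product of every pair of original features to each row."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = names + [f"{names[i]}*{names[j]}" for i, j in pairs]
    new_rows = [row + [row[i] * row[j] for i, j in pairs] for row in rows]
    return new_rows, new_names


# Hypothetical three-feature patient records (not taken from the thesis).
names = ["age", "bmi", "glucose"]
rows = [[50, 30.0, 100], [40, 22.0, 90]]

engineered_rows, engineered_names = add_pairwise_products(rows, names)
print(engineered_names)
print(engineered_rows[0])
```

Three original features become six here. In an FE-RFE-ANN arrangement, a widened table of this kind would then be passed to the feature elimination step and on to the classifier.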