000 03955nam a2200301 4500
008 250510t2020 |||ad||g m||| 00| 0 eng d
040 _aUniversiti Teknologi Brunei
_beng
_cUTB
084 _aRTDS 343
_aUTB 120 REPORT THESIS & DISSERTATION, RTDS 343
100 1 _aMuhammad Amirul Fahmiin bin Abdullah
_eauthor.
245 1 0 _aInvestigating the Effects of Dimensionality Reduction and Implementation of a Novel Feature Engineering Framework Towards Low Dimensional Medical Datasets /
_cMuhammad Amirul Fahmiin bin Abdullah
260 _aBandar Seri Begawan :
_bUniversiti Teknologi Brunei,
_c©2020.
300 _a117 pages :
_bcoloured illustrations, charts, tables ;
_c30 cm.
500 _aA thesis submitted to the Universiti Teknologi Brunei in fulfilment of the requirements for the degree of Master of Science (MSc) in Electrical and Electronic Engineering.
500 _aAbstract: In machine learning applications for Electronic Patient Records (EPR), mostly high-dimensional datasets are used to train reliable prediction models, whereas low-dimensional datasets are dismissed for their lack of features. However, in health institutions in less developed and developing countries, large digitised datasets are scarce, and Artificial Intelligence approaches have to rely on the available low-dimensional datasets, resulting in sub-par predictive models. This research aims to improve the reliability and accuracy of machine learning models trained on medical datasets, to benefit health institutions that only have low-dimensional datasets. To realise this, a framework is constructed based on feature preprocessing together with selection of the classification algorithm that provides the best overall performance boost. The research starts by identifying the datasets, dimensionality reduction methods and classification algorithms to be tested on evaluation metrics as a form of performance benchmarking. In the first set of experiments, the dimensionality reduction methods Sequential Feature Selection (SFWS and SBS), Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA) were used in a variety of combinations with Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Forest (RF) algorithms. The outcome shows that the dimensionality reduction methods were able to identify the best subset of features from the original dataset. However, they produced negligible (less than 5% increase) or no performance improvements in the machine learning models. In the second part, the research introduces feature engineering (FE) into the framework as a means of constructing additional data instances to improve the quality of the datasets. This resulted in an increase in dataset size from the original three sets after the addition of the engineered features.
A set of experiments similar to those performed in the first part was run, keeping all other variables and hyperparameters constant except for the train-test input data. Comprehensive analysis of the results shows consistent increases in accuracy and precision when implementing the FE-RFE-ANN approach, resulting in an average increase of 2.89% in accuracy and 2.88% in precision across all datasets.
610 4 _vProject Report
_aUniversiti Teknologi Brunei
650 4 _aThesis Writing.
650 4 _aProject Report, Academic.
650 4 _aProject Report Universiti Teknologi Brunei.
650 4 _aMedical informatics
_xData processing
650 4 _aFeature extraction (Computer science)
650 4 _aDimensionality reduction (Statistics)
650 4 _aMedical records
_xData analysis
700 1 _cDr.
_esupervisor.
_aLim Tiong Hoo
700 1 _cDr.
_esupervisor.
_aKenneth Siok Kiam Yeo
710 _aUniversiti Teknologi Brunei
_bFaculty of Engineering
942 _2lc
_cRTDS
998 _eBook
_s850389 : 002147 c.1_UTB
_xUniversiti Teknologi Brunei
999 _c23396
_d23396