MARC View

000			03955nam a2200301 4500
008			250510t2020 |||ad||g m||| 00| 0 eng d
040			_aUniversiti Teknologi Brunei _beng _cUTB
084			_aRTDS 343 _aUTB 120 REPORT THESIS & DISSERTATION, RTDS 343
100	1		_aMuhammad Amirul Fahmiin bin Abdullah _eauthor.
245	1	0	_aInvestigating the Effects of Dimensionality Reduction and Implementation of a Novel Feature Engineering Framework Towards Low Dimensional Medical Datasets / _cMuhammad Amirul Fahmiin bin Abdullah
260			_aBandar Seri Begawan : _bUniversiti Teknologi Brunei, _c©2020.
300			_a117 pages : _bcoloured illustrations, charts, tables ; _c30 cm.
500			_aA thesis submitted to the Universiti Teknologi Brunei in fulfilment of the requirements for the degree of Master of Science (MSc) in Electrical and Electronic Engineering.
500			_aAbstract: In machine learning applications for Electronic Patient Records (EPR), reliable predictive models are mainly trained on high-dimensional datasets, while low-dimensional datasets are dismissed for their lack of features. However, in health institutions in less developed and developing countries, large digitised datasets are scarce, and Artificial Intelligence approaches have to rely on the available low-dimensional datasets, resulting in sub-par predictive models. This research aims to improve the reliability and accuracy of machine learning models trained on medical datasets, to benefit health institutions that only have low-dimensional datasets. To this end, a framework is constructed that combines feature preprocessing with selection of the classification algorithm that provides the best overall performance boost. The research begins by identifying the datasets, dimensionality reduction methods and classification algorithms whose evaluation metrics are to be tested as a form of performance benchmarking. In the first set of experiments, the dimensionality reduction methods Sequential Feature Selection (SFWS and SBS), Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA) were used in a variety of combinations with Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Forest (RF) algorithms. The outcome shows that the dimensionality reduction methods were able to identify the best subset of features from the original dataset. However, they produced negligible (less than 5% increase) or no performance improvements in the machine learning models. In the second part, the research introduces feature engineering (FE) into the framework as a means of constructing additional data instances to improve the quality of the datasets. This increased the size of the original three datasets after the addition of the engineered features. A set of experiments similar to those in the first part was then run, keeping all other variables and hyperparameters constant except for the train-test input data. Comprehensive analysis of the results shows consistent increases in accuracy and precision when implementing the FE-RFE-ANN approach, with an average increase of 2.89% in accuracy and 2.88% in precision across all datasets.
610		4	_vProject Report _aUniversiti Teknologi Brunei
650		4	_aThesis Writing.
650		4	_aProject Report, Academic.
650		4	_aProject Report Universiti Teknologi Brunei.
650		4	_aMedical informatics _xData processing
650		4	_aFeature extraction (Computer science)
650		4	_aDimensionality reduction (Statistics)
650		4	_aMedical records _xData analysis
700	1		_cDr. _esupervisor. _aLim Tiong Hoo
700	1		_cDr. _esupervisor. _aKenneth Siok Kiam Yeo
710			_aUniversiti Teknologi Brunei _bFaculty of Engineering
942			_2lc _cRTDS
998			_eBook _s850389 : 002147 c.1_UTB _xUniversiti Teknologi Brunei
999			_c23396 _d23396