MARC details
| 000 -LEADER |
| fixed length control field |
03955nam a2200301 4500 |
| 008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
| fixed length control field |
250510t2020 |||ad||g m||| 00| 0 eng d |
| 040 ## - CATALOGING SOURCE |
| Original cataloging agency |
Universiti Teknologi Brunei |
| Language of cataloging |
eng |
| Transcribing agency |
UTB |
| 084 ## - BOOK Call Number |
| Classification number |
RTDS 343 |
| -- |
UTB 120 REPORT THESIS & DISSERTATION, RTDS 343 |
| 100 1# - MAIN ENTRY--PERSONAL NAME |
| Personal name |
Muhammad Amirul Fahmiin bin Abdullah |
| Relator term |
author. |
| 245 10 - TITLE STATEMENT |
| Title |
Investigating the Effects of Dimensionality Reduction and Implementation of a Novel Feature Engineering Framework Towards Low Dimensional Medical Datasets / |
| Statement of responsibility, etc. |
Muhammad Amirul Fahmiin bin Abdullah |
| 260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT) |
| Place of publication, distribution, etc. |
Bandar Seri Begawan : |
| Name of publisher, distributor, etc. |
Universiti Teknologi Brunei, |
| Date of publication, distribution, etc. |
©2020. |
| 300 ## - PHYSICAL DESCRIPTION |
| Extent |
117 pages : |
| Other physical details |
coloured illustrations, charts, tables ; |
| Dimensions |
30 cm. |
| 500 ## - GENERAL NOTE |
| General note |
A thesis submitted to the Universiti Teknologi Brunei in fulfillment of the requirements for the degree of Master of Science (MSc) in Electrical and Electronic Engineering. |
| 500 ## - GENERAL NOTE |
| General note |
Abstract: In machine learning applications for Electronic Patient Records (EPR), predominantly high-dimensional datasets are used to train reliable prediction models, while low-dimensional datasets are dismissed for their lack of features. However, in health institutions in less developed and developing countries, large digitised datasets are scarce, and Artificial Intelligence approaches have to rely on the available low-dimensional datasets, resulting in sub-par predictive models. This research aims to improve the reliability and accuracy of machine learning models trained on medical datasets, to benefit health institutions that only have low-dimensional datasets.
To realise this, a framework is constructed that combines feature preprocessing with selection of the classification algorithm that provides the best overall performance boost.
The research starts by identifying the datasets, dimensionality reduction methods and classification algorithms to be evaluated as a performance benchmark. In the first set of experiments, the dimensionality reduction methods Sequential Feature Selection (SFWS and SBS), Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA) were used in a variety of combinations with Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Forest (RF) algorithms. The outcome shows that the dimensionality reduction methods were able to identify the best subset of features from the original dataset. However, they produced negligible (less than 5%) or no performance improvement in the machine learning models.
In the second part, the research introduces feature engineering (FE) into the framework as a means of constructing additional data instances to improve the quality of the datasets. This increased the size of the three original datasets once the engineered features were added. A set of experiments similar to those in the first part was run, keeping all other variables and hyperparameters constant except for the train-test input data. Comprehensive analysis of the results shows consistent increases in accuracy and precision when implementing the FE-RFE-ANN approach, with an average increase of 2.89% in accuracy and 2.88% in precision across all datasets. |
| 610 #4 - SUBJECT ADDED ENTRY--CORPORATE NAME |
| Form subdivision |
Project Report |
| Corporate name or jurisdiction name as entry element |
Universiti Teknologi Brunei |
| 650 #4 - SUBJECT ADDED ENTRY--TOPICAL TERM |
| Topical term or geographic name entry element |
Thesis Writing. |
| 650 #4 - SUBJECT ADDED ENTRY--TOPICAL TERM |
| Topical term or geographic name entry element |
Project Report, Academic. |
| 650 #4 - SUBJECT ADDED ENTRY--TOPICAL TERM |
| Topical term or geographic name entry element |
Project Report Universiti Teknologi Brunei. |
| 650 #4 - SUBJECT ADDED ENTRY--TOPICAL TERM |
| Topical term or geographic name entry element |
Medical informatics |
| General subdivision |
Data processing |
| 650 #4 - SUBJECT ADDED ENTRY--TOPICAL TERM |
| Topical term or geographic name entry element |
Feature extraction (Computer science) |
| 650 #4 - SUBJECT ADDED ENTRY--TOPICAL TERM |
| Topical term or geographic name entry element |
Dimensionality reduction (Statistics) |
| 650 #4 - SUBJECT ADDED ENTRY--TOPICAL TERM |
| Topical term or geographic name entry element |
Medical records |
| General subdivision |
Data analysis |
| 700 1# - ADDED ENTRY--PERSONAL NAME |
| Titles and other words associated with a name |
Dr. |
| Relator term |
supervisor. |
| Personal name |
Lim Tiong Hoo |
| 700 1# - ADDED ENTRY--PERSONAL NAME |
| Titles and other words associated with a name |
Dr. |
| Relator term |
supervisor. |
| Personal name |
Kenneth Siok Kiam Yeo |
| 710 ## - ADDED ENTRY--CORPORATE NAME |
| Corporate name or jurisdiction name as entry element |
Universiti Teknologi Brunei |
| Subordinate unit |
Faculty of Engineering |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) |
| Source of classification or shelving scheme |
Local Classification |
| Koha item type |
Reports, Thesis & Dissertation Students |
| 998 ## - LOCAL CONTROL INFORMATION (RLIN) |
| Internal field |
Book |
| CC (RLIN) |
850389 : 002147 c.1_UTB |
| Internal field |
Universiti Teknologi Brunei |