<?xml version="1.0" encoding="UTF-8"?>
<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <leader>03459nam a22003137a 4500</leader>
  <controlfield tag="008">260608t2023    bx a|||g |||| 00| 0 eng d</controlfield>
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="q">hardback</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">Universiti Teknologi Brunei</subfield>
    <subfield code="b">eng</subfield>
    <subfield code="c">UTB</subfield>
  </datafield>
  <datafield tag="084" ind1=" " ind2=" ">
    <subfield code="a">UTB 120 REPORT, THESIS &amp; DISSERTATION </subfield>
    <subfield code="a">RTDS 410</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Nurulhidayati Haji Mohd Sani</subfield>
    <subfield code="e">author.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">A study of feedback signal calibration in reinforcement learning with sub-goals /</subfield>
    <subfield code="c">Nurulhidayati Haji Mohd Sani</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Bandar Seri Begawan :</subfield>
    <subfield code="b">Universiti Teknologi Brunei,</subfield>
    <subfield code="c">2023.</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">xvi, 148 pages :</subfield>
    <subfield code="b">illustrations ;</subfield>
    <subfield code="c">30 cm.</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">Submitted in fulfillment of the requirements for the degree of Doctor of Philosophy.</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">ABSTRACT Reinforcement learning (RL) is a machine learning technique that allows intelligent agents to learn a new task without being explicitly supervised. RL agents learn to perform a new task based on reinforcement signals. Traditional RL enumerates possible states (S) and associates actions (policy) to each state.
The main disadvantage of traditional RL is that the process is slower in a larger and more complex environment. Also, many decision-making processes are involved in a complex environment instead of focusing on just one goal.
In contrast to the traditional position-based approach, we investigate a visual-based Q-learning agent that uses the projection of rays to perceive its environment. Despite the increasing number of possible state value inputs due to the number of angles between rays that the agent can perceive, or the increasing number of objects in the environment when this approach is used, it allows the agent greater flexibility in reusing its strategy in different environments. This flexibility is very useful, especially in a real-world application where the environment is known to be very dynamic. Our preliminary study allows our agent to use visual perception to navigate different environment sizes and settings.
In our thesis, we also studied a Q-learning agent in a navigation problem with sub-goals, using amplified feedback signals to determine the most effective amplification strategies for solving the problem. We investigated these signals using two different problem configurations: sequential sub-goals and non-sequential sub-goals. In the problem with sequential sub-goals, the agent is forced to reach the sub-goals in a specific order to achieve the goal. In the problem with non-sequential sub-goals, the order of the sub-goals is irrelevant, but reaching them is necessary to achieve the optimal reward. The results show that although the agent can learn and achieve the goal with most of the feedback signals, consistent, incremental rewards and immediate rewards contribute most to the agent's performance in achieving the goal.</subfield>
  </datafield>
  <datafield tag="502" ind1=" " ind2=" ">
    <subfield code="a">Dissertation (Doctor of Philosophy) - Universiti Teknologi Brunei (2023)</subfield>
  </datafield>
  <datafield tag="504" ind1=" " ind2=" ">
    <subfield code="a">Includes bibliographical references.</subfield>
  </datafield>
  <datafield tag="610" ind1=" " ind2="4">
    <subfield code="a">Universiti Teknologi Brunei</subfield>
    <subfield code="v">Thesis</subfield>
  </datafield>
  <datafield tag="610" ind1=" " ind2="4">
    <subfield code="a">Universiti Teknologi Brunei</subfield>
    <subfield code="v">Final Year Report</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="4">
    <subfield code="a">Dissertation, Academic</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="4">
    <subfield code="a">Thesis writing</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="4">
    <subfield code="a">Dissertation Universiti Teknologi Brunei</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="4">
    <subfield code="a">Computing and Informatics</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Somnuk Phon-Amnuaisuk, Prof</subfield>
    <subfield code="e">advisor.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Thien Wan Au, Dr.</subfield>
    <subfield code="e">advisor.</subfield>
  </datafield>
  <datafield tag="710" ind1=" " ind2=" ">
    <subfield code="a">Universiti Teknologi Brunei</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="2">lc</subfield>
    <subfield code="c">RTDS</subfield>
  </datafield>
  <datafield tag="998" ind1=" " ind2=" ">
    <subfield code="e">Reports, Thesis &amp; Dissertation</subfield>
    <subfield code="s">850582 : 002454 c.1 UTB </subfield>
    <subfield code="x">Universiti Teknologi Brunei</subfield>
  </datafield>
  <datafield tag="999" ind1=" " ind2=" ">
    <subfield code="c">24218</subfield>
    <subfield code="d">24218</subfield>
  </datafield>
  <datafield tag="952" ind1=" " ind2=" ">
    <subfield code="0">0</subfield>
    <subfield code="1">0</subfield>
    <subfield code="2">lc</subfield>
    <subfield code="4">0</subfield>
    <subfield code="7">0</subfield>
    <subfield code="a">UTB</subfield>
    <subfield code="b">UTB</subfield>
    <subfield code="c">Level 2</subfield>
    <subfield code="d">2023-07-17</subfield>
    <subfield code="e">Universiti Teknologi Brunei</subfield>
    <subfield code="l">0</subfield>
    <subfield code="o">UTB 120 REPORT THESIS &amp; DISSERTATION, RTDS 410</subfield>
    <subfield code="p">850582</subfield>
    <subfield code="r">2026-06-08</subfield>
    <subfield code="t">c. 1</subfield>
    <subfield code="w">2026-06-08</subfield>
    <subfield code="y">RTDS</subfield>
    <subfield code="z">Reg. No. 002454_UTB [RTDS410]</subfield>
  </datafield>
</record>
