UBC Theses and Dissertations
Optimal control of dynamic systems through the reinforcement learning of transition points
Buckland, Kenneth M.
Abstract
This work describes the theoretical development and practical application of transition point dynamic programming (TPDP). TPDP is a memory-based, reinforcement learning, direct dynamic programming approach to adaptive optimal control that can reduce the learning time and memory usage required to control continuous stochastic dynamic systems. It does so by determining an ideal set of transition points (TPs), which specify, at various system states, only the control action changes necessary for optimal control. TPDP converges to an ideal TP set by using a variation of Q-learning to assess the merits of adding, swapping and removing TPs at states throughout the state space. This work first presents how optimal control is achieved using dynamic programming, in particular Q-learning. It then presents the basic TPDP concept and a proof that TPDP converges to an ideal set of TPs. After this formal presentation, a Practical TPDP Algorithm is described that facilitates the application of TPDP to practical problems. The compromises made to achieve good performance with the Practical TPDP Algorithm invalidate the TPDP convergence proofs, but near-optimal control policies were nevertheless learned in the practical problems considered. These policies were learned much faster than with conventional Q-learning, and less memory was required during the learning process. A neural network implementation of TPDP is also described, and the author speculates on whether this network could be a plausible model of biological movement control. Finally, the incorporation of TPDP into a complete hierarchical controller is discussed, and potential enhancements of TPDP are presented.
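The two ideas the abstract builds on can be illustrated with a small Python sketch. `q_update` is the standard one-step Q-learning backup. `tp_rollout` is a hypothetical toy of the transition-point idea: only a few states store an action change, and everywhere else the action currently in effect simply persists. The function names, the integer-line dynamics (s' = s + a) and the parameter values are all assumptions made for illustration. This is not the thesis's TPDP algorithm, which additionally learns which TPs to add, swap or remove.

```python
from collections import defaultdict


def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """Standard one-step Q-learning backup:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s', b) - Q(s,a)).
    `q` is a defaultdict(float) keyed by (state, action)."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


def tp_rollout(tps, state, action, n_steps):
    """Hypothetical TP-style controller on an assumed toy 1-D system.
    `tps` maps only a few states to a new action; at every other state
    the action currently in effect is kept unchanged."""
    actions_taken = []
    for _ in range(n_steps):
        action = tps.get(state, action)  # change the action only at a TP
        actions_taken.append(action)
        state = state + action           # assumed toy dynamics: s' = s + a
    return state, actions_taken


if __name__ == "__main__":
    q = defaultdict(float)
    print(q_update(q, s=0, a="R", r=1.0, s_next=1, actions=["L", "R"],
                   alpha=0.5, gamma=0.9))
    # Two stored TPs generate a seven-step action sequence.
    print(tp_rollout({0: 1, 5: 0}, state=0, action=0, n_steps=7))
```

Running it prints `0.5` and then `(5, [1, 1, 1, 1, 1, 0, 0])`. In this toy, two stored TPs stand in for seven per-state action entries, which loosely suggests the kind of memory saving the abstract attributes to representing only action changes.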
Item Metadata
Title | Optimal control of dynamic systems through the reinforcement learning of transition points
Creator | Buckland, Kenneth M.
Publisher | University of British Columbia
Date Issued | 1994
Extent | 4144678 bytes
Genre |
Type |
File Format | application/pdf
Language | eng
Date Available | 2009-04-08
Provider | Vancouver : University of British Columbia Library
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI | 10.14288/1.0065157
URI |
Degree |
Program |
Affiliation |
Degree Grantor | University of British Columbia
Graduation Date | 1994-05
Campus |
Scholarly Level | Graduate
Aggregated Source Repository | DSpace