QUT - Assignment-1

Research and write about an interesting new product or computer technology announced in the last year. Explain why it is interesting from a technical computer systems viewpoint.

Introduction

Artificial intelligence (AI) has already revolutionised traditional industries in recent years. For
example, Google DeepMind's AlphaFold can accelerate disease research and drug
development (AlphaFold, 2022).

This essay will discuss a major advance in the Natural Language Processing (NLP) field:
OpenAI's ChatGPT, a large language model (LLM) that can simulate human-like conversations
(Hetler, 2023).

Why it is interesting

ChatGPT is interesting from a technical computer systems viewpoint because two of its
features, the transformer architecture and the Reinforcement Learning from Human Feedback
(RLHF) technique, represent major advances and have real-life applications.

Transformer architecture

ChatGPT is a Large Language Model (LLM) built on the transformer architecture
(Radford et al., 2018), a type of deep learning model first introduced by Vaswani et al.
(2017). This reshaped machine learning: conventional methods require extensive manual
feature engineering, which is time-consuming and expensive. In comparison, deep learning
can learn and extract relevant information from complex datasets, for example
human-written text and pictures. Embedded in computer systems and applications, for
example image and video analysis, it can process data and perform work that traditionally
requires human intervention.
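The core operation of the transformer is scaled dot-product self-attention, which can be sketched in a few lines of NumPy. The matrix sizes and random weights below are illustrative assumptions chosen for this sketch, not the parameters of any real model.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (Vaswani et al., 2017), simplified."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # pairwise token similarities
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                         # context-aware mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # 4 tokens, 8-dim embeddings (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                               # one updated vector per token
```

Because every token attends to every other token in one matrix multiplication, the model captures long-range context without the sequential processing of earlier recurrent designs.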

Reinforcement Learning from Human Feedback (RLHF) technique

Secondly, ChatGPT is one of the first products to apply the Reinforcement Learning from
Human Feedback (RLHF) technique in real-world scenarios (Manikandan, 2023).

Traditionally, language models were developed with the next-token-prediction technique
(Chang, 2024), in which the model answers with the word that most commonly follows the
previous context it 'learned'. As a result, the answer may fit complex text and human
preferences poorly. RLHF addresses this issue in three steps: a Supervised Fine-Tuning (SFT)
model (Walker II, 2024), a Reward Model (RM) (Khandelwal, 2023) and Proximal Policy
Optimisation (PPO) (Mathew, 2023). The SFT model is trained on samples collected from
curated datasets under supervision. Then, the RM is employed to rank the model's outputs
from best to worst based on human feedback. Finally, the PPO learning algorithm updates
the policy using the SFT and RM models. This trains the model to provide more accurate,
reliable and human-like responses. A computer system with this technique, especially in
applications that require personalisation, can bring humans and technology closer together.
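The three steps above can be sketched as a toy loop. The replies, the scoring table and the update rule are loose stand-ins assumed for illustration only; real RLHF trains large neural networks at every stage.

```python
import random

# Step 1 (SFT): a policy that scores candidate replies, seeded by
# supervised fine-tuning (here, just a fixed preference table).
policy_scores = {"helpful reply": 0.6, "rude reply": 0.4}

# Step 2 (RM): a reward model distilled from human rankings -- in the
# labelled comparisons, annotators preferred helpful replies over rude ones.
def reward(reply):
    return 1.0 if "helpful" in reply else -1.0

# Step 3 (PPO-style update, heavily simplified): sample a reply from the
# policy and nudge its score in the direction of the reward it earns.
lr = 0.1
for _ in range(20):
    reply = max(policy_scores, key=lambda r: policy_scores[r] + random.gauss(0, 0.1))
    policy_scores[reply] += lr * reward(reply)

best = max(policy_scores, key=policy_scores.get)
print(best)  # the policy drifts toward the human-preferred reply
```

Even in this caricature, the key idea is visible: the reward signal comes from human preference rankings rather than from matching the statistically most common next token.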

Conclusion

ChatGPT is an ahead-of-the-curve product, and it is interesting to investigate because of its
advanced features and potential in real-life applications.

References

AlphaFold. (2022, August 12). In Wikipedia.

Chang, R. (2024, April 15). How far can "Next Token Prediction" go? Medium.
https://medium.com/@raymondchang_24290/how-far-can-next-token-
prediction-go-b330d72c7754

Hetler, A. (2023, December). What is ChatGPT? Everything you need to know. TechTarget.

Khandelwal, A. (2023, August 11). RLHF reward model training. Towards Generative AI.
https://medium.com/towards-generative-ai/reward-model-training-2209d1befb5f

Klu. (2023, July 4). What is supervised fine-tuning? Klu.ai.

Manikandan, B. (2023, February 2). Demystifying ChatGPT: A deep dive into reinforcement
learning with human feedback. Medium.

Mathew, S. (2023, February 26). The power of ChatGPT, InstructGPT, and proximal policy
optimization algorithms. LinkedIn.

Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language
understanding by generative pre-training.
https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_p
aper.pdf
