Research and write about an interesting new product or computer technology announced
in the last year. Explain why it’s interesting from a technical computer systems viewpoint.
Introduction
Artificial intelligence (AI) has transformed many traditional industries in recent years. For
example, Google DeepMind's AlphaFold can accelerate disease research and drug
development (Wikipedia, 2022).
This essay discusses a recent advance in the Natural Language Processing (NLP) field:
OpenAI’s ChatGPT, a large language model (LLM) that can simulate human-like conversations
(Hetler, 2023).
Why it is interesting
ChatGPT is interesting from a technical computer systems viewpoint because two of its
features, the transformer architecture and the Reinforcement Learning from Human Feedback
(RLHF) technique, represent major improvements and have real-life applications.
Transformer architecture
ChatGPT is a Large Language Model (LLM) built on the transformer architecture
(Radford et al., 2018), a type of deep learning model first introduced by
Vaswani et al. (2017). Deep learning has reshaped machine learning because conventional
methods require a lot of manual engineering, which is time-consuming and expensive. In
comparison, deep learning can learn and analyse relevant information directly from
complex datasets, for example, human-written text and pictures. Embedded in computer
systems and applications such as image and video analysis, it can process data and perform
work that traditionally required human intervention.
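The core operation of the transformer in Vaswani et al. (2017) is scaled dot-product attention, in which every token builds its output as a weighted mix of all tokens in the sequence. The following is a minimal sketch in plain Python, not ChatGPT's actual implementation; the function names and the toy two-dimensional token vectors are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q.K / sqrt(d_k)) applied to V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Toy "sentence" of three tokens, each a hypothetical 2-dimensional vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Self-attention: queries, keys and values all come from the same tokens.
mixed_tokens = attention(tokens, tokens, tokens)
```

In a real transformer the queries, keys and values are produced by learned weight matrices and many attention heads run in parallel; the sketch omits both to keep the idea visible.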
Reinforcement Learning from Human Feedback (RLHF) technique
Secondly, ChatGPT is one of the first widely used products to apply the Reinforcement
Learning from Human Feedback (RLHF) technique in real-world scenarios (Manikandan, 2023).
Traditionally, language models were developed using next-token prediction
(Chang, 2024), in which the model outputs the word most likely to come next, based on
patterns in the text that the model ‘learned’ during training. As a result, the answer may
not fit complex text or human preferences. RLHF addresses this issue in three steps:
Supervised Fine-Tuning (SFT) (Walker II, 2024), a Reward Model (RM)
(Khandelwal, 2023) and Proximal Policy Optimisation (PPO) (Mathew, 2023). In the SFT step,
samples are collected from datasets and the model is trained on them under supervision.
Then, the RM is trained on human feedback that ranks the model's outputs from best to
worst, so that it can score new outputs. Finally, the PPO learning algorithm updates the
SFT model so that its outputs earn higher scores from the RM. This trains the model to
provide more accurate, reliable and human-like responses. A computer system with this
technique, especially in applications that require personalisation, can bring humans and
technology closer together.
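The ideas in this section can be sketched with three small functions. This is a simplified illustration rather than OpenAI's implementation: the function names, the toy probabilities and the clipping value of 0.2 are assumptions, the pairwise ranking loss is one common way to train a reward model from human rankings, and the clipped objective is the standard form used by PPO.

```python
import math


def greedy_next_token(probabilities):
    """Next-token prediction: choose the most probable next word."""
    return max(probabilities, key=probabilities.get)


def reward_ranking_loss(reward_preferred, reward_rejected):
    """Pairwise ranking loss, -log(sigmoid(r_preferred - r_rejected)).

    The loss is small when the reward model scores the answer that humans
    preferred higher than the answer they rejected.
    """
    difference = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-difference))


def ppo_clipped_objective(new_prob, old_prob, advantage, clip_eps=0.2):
    """PPO clipped objective for a single action.

    The probability ratio is clipped so that one update cannot move the
    model too far from the version that generated the data.
    """
    ratio = new_prob / old_prob
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical model output for the prompt "The cat sat on the".
next_word = greedy_next_token({"mat": 0.6, "sofa": 0.3, "moon": 0.1})

# The reward model agrees with the human ranking, so the loss is low.
good_ranking = reward_ranking_loss(2.0, 0.0)
# The reward model disagrees with the human ranking, so the loss is high.
bad_ranking = reward_ranking_loss(0.0, 2.0)

# The new policy makes a well-rewarded answer much more likely, but the
# objective is capped at (1 + clip_eps) * advantage.
capped = ppo_clipped_objective(new_prob=0.9, old_prob=0.5, advantage=1.0)
```

In practice the advantage is derived from the reward model's score for a full response, and the update is averaged over many sampled responses rather than computed for a single action.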
Conclusion
ChatGPT is an ahead-of-the-curve product, and it is interesting to investigate because of its
advanced features and potential in real-life applications.
References
AlphaFold. (2022, August 12). Wikipedia.
Chang, R. (2024, April 15). How far can “Next Token Prediction” go? Medium.
https://medium.com/@raymondchang_24290/how-far-can-next-token-prediction-go-b330d72c7754
Hetler, A. (2023, December). What is ChatGPT? Everything You Need to Know. TechTarget.
Khandelwal, A. (2023, August 11). RLHF Reward Model Training. Towards Generative AI.
https://medium.com/towards-generative-ai/reward-model-training-2209d1befb5f.
Klu. (2023, July 4). What is supervised fine-tuning? Klu.ai.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving Language
Understanding by Generative Pre-Training.
https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Manikandan, B. (2023, February 2). Demystifying ChatGPT: A Deep Dive
into Reinforcement Learning with Human Feedback. Medium.
Mathew, S. (2023, February 26). The Power of ChatGPT, InstructGPT, and Proximal Policy
Optimization Algorithms. LinkedIn.