What Is ChatGPT Doing by Stephen Wolfram
Author: Stephen Wolfram
Language: eng
Format: epub
Publisher: Wolfram Media
Published: 2023-03-09T14:09:02+00:00
The input is a vector of n tokens (represented as in the previous section by integers from 1 to about 50,000). Each of these tokens is converted (by a single-layer neural net) into an embedding vector (of length 768 for GPT-2 and 12,288 for ChatGPT's GPT-3). Meanwhile, there's a "secondary pathway" that takes the sequence of (integer) positions for the tokens, and from these integers creates another embedding vector. And finally the embedding vectors from the token value and the token position are added together to produce the final sequence of embedding vectors from the embedding module.
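The two pathways and the final addition can be sketched in a few lines of Python. This is only an illustrative sketch, not GPT-2's actual code: the tables are filled with random numbers standing in for trained weights, the sizes (`VOCAB_SIZE`, `MAX_POSITIONS`, `EMBED_DIM`) are toy values chosen here, the token IDs 7 and 42 are hypothetical stand-ins for two distinct tokens, and indices are 0-based for convenience. It also assumes the single-layer net acts as a plain table lookup, which is what a linear layer applied to a one-hot token amounts to.

```python
import random

def make_table(rows, dim, rng):
    """Build a rows x dim lookup table of small random values
    (stand-ins for weights that would normally be learned)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]

def embed(token_ids, token_table, position_table):
    """Return one vector per token: the token-value embedding
    plus the token-position embedding, added elementwise."""
    out = []
    for pos, tok in enumerate(token_ids):
        tok_vec = token_table[tok]      # pathway 1: look up by token value
        pos_vec = position_table[pos]   # pathway 2: look up by position
        out.append([t + p for t, p in zip(tok_vec, pos_vec)])
    return out

# Toy sizes (assumptions); the text gives about 50,000 tokens and length 768 for GPT-2.
VOCAB_SIZE = 100
MAX_POSITIONS = 32
EMBED_DIM = 8

rng = random.Random(0)
token_table = make_table(VOCAB_SIZE, EMBED_DIM, rng)
position_table = make_table(MAX_POSITIONS, EMBED_DIM, rng)

# Hypothetical token IDs: ten copies of one token followed by ten of another.
token_ids = [7] * 10 + [42] * 10
vectors = embed(token_ids, token_table, position_table)

print(len(vectors), len(vectors[0]))
```

Because the position embedding differs at every index, two occurrences of the same token end up with different final vectors, even though their token-value embeddings are identical.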
Why does one just add the token-value and token-position embedding vectors together? I don't think there's any particular science to this. It's just that various different things have been tried, and this is one that seems to work. And it's part of the lore of neural nets that, in some sense, so long as the setup one has is "roughly right" it's usually possible to home in on details just by doing sufficient training, without ever really needing to "understand at an engineering level" quite how the neural net has ended up configuring itself.
Here's what the embedding module does, operating on the string hello hello hello hello hello hello hello hello hello hello bye bye bye bye bye bye bye bye bye bye:
