
Maarten Grootendorst: Notion

Uploaded by ChangYulLee
Copyright © All Rights Reserved

… Notion

A translation of a post by Maarten Grootendorst.

Author: Maarten Grootendorst

Original English post by Maarten Grootendorst:

A Visual Guide to Mamba and State Space Mod…


An Alternative to Transformers for Language Modeling

https://maartengrootendorst.substack.com/p/a-vi…

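The Transformer architecture has been a major component in the success of Large Language Models (LLMs). Nearly all LLMs in use today are built on it, from open-source models like Mistral to closed-source models like ChatGPT.

To further improve LLMs, new architectures are being developed that might even outperform the Transformer architecture. One of these methods is Mamba, a State Space Model. Mamba was proposed in the paper 'Mamba: Linear-Time Sequence Modeling with Selective State Spaces' [1]. Its official implementation and model checkpoints are available in its GitHub repo.

In this post, I will introduce the field of State Space Models in the context of language modeling. We will explore its concepts one by one to develop an intuition about the field. Then, we will cover how Mamba might challenge the Transformer architecture. As this is a visual guide, expect many visualizations to help build an intuition about Mamba and State Space Models!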
Part 1: The Problem with Transformers

Are RNNs a Solution?

Part 2: State Space Models (SSM)

Part 3: Mamba - A Selective SSM

Resources

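Part 1: The Problem with Transformers
To illustrate why Mamba is such an interesting architecture, let's first do a short recap of Transformers and explore one of their disadvantages.

A Transformer sees any textual input as a sequence of tokens. A major benefit of Transformers is that, whatever input they receive, they can look back at any earlier token in the sequence to derive its representation.

A Transformer consists of two structures: a set of encoder blocks for representing text and a set of decoder blocks for generating text. Together, these structures can be used for several tasks, including translation.

We can adopt this structure to create generative models by using only the decoders. This Transformer-based model, the Generative Pre-trained Transformer (GPT), uses decoder blocks to complete some input text.

A single decoder block consists of two main components: masked self-attention followed by a feed-forward neural network.

Self-attention is a major reason why these models work so well. It gives an uncompressed view of the entire sequence and allows fast training. So how does it work?

It creates a matrix comparing each token with every token that came before it. The weights in the matrix are determined by how relevant the token pairs are to one another.

During training, this matrix is created in one go. The attention between "My" and "name" does not need to be calculated before we calculate the attention between "name" and "is".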

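This enables parallelization, which speeds up training tremendously! There is a flaw, however. When generating the next token, we need to recalculate the attention for the entire sequence, even though we already generated some of its tokens.

Generating tokens for a sequence of length L needs roughly L² computations, which becomes costly as the sequence length increases.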
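The quadratic cost can be made concrete with a minimal pure-Python sketch of masked (causal) self-attention. The function name `causal_attention` and the toy vectors are assumptions for illustration. Real models also add learned query/key/value projections and multiple heads. As during training, every row of the attention pattern is computed in one pass. For L tokens that means L(L + 1)/2 query-key scores, which grows as roughly L².

```python
import math


def causal_attention(queries, keys, values):
    """Masked self-attention computed for the whole sequence in one pass.

    Each argument is a list of vectors (lists of floats), one per token.
    Returns the output vectors and the number of query-key scores computed.
    """
    dim = len(queries[0])
    outputs, num_scores = [], 0
    for i, q in enumerate(queries):
        # Causal mask: token i is only compared with tokens 0..i.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
                  for j in range(i + 1)]
        num_scores += len(scores)
        # Softmax over the allowed positions, shifted by the max for stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output i is the attention-weighted sum of the values of tokens 0..i.
        outputs.append([sum(w * values[j][k] for j, w in enumerate(weights))
                        for k in range(len(values[0]))])
    return outputs, num_scores


# Four toy 2-dimensional token vectors, reused as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
outputs, num_scores = causal_attention(tokens, tokens, tokens)
print(num_scores)  # 10, i.e. L * (L + 1) / 2 for L = 4
```

Generating one more token means scoring it against every earlier token as well. The work per generated token therefore keeps growing with the sequence length.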

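This need to recalculate the entire sequence is a major bottleneck of the Transformer architecture. Let's look at how a "classic" technique, Recurrent Neural Networks, addresses this problem of slow inference.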

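Are RNNs a Solution?
A Recurrent Neural Network (RNN) is a sequence-based network. At each time step in a sequence, it takes two inputs: the input at time step t and the hidden state of the previous time step t-1. It uses them to generate the next hidden state and predict the output.

RNNs have a looping mechanism that allows them to pass information from a previous step to the next. We can "unfold" this loop across the time steps to make it more explicit.
When generating the output, the RNN only needs to consider the previous hidden state and the current input. This avoids recalculating all previous hidden states, which is what a Transformer would do.
In other words, RNNs can perform inference fast, since the cost scales linearly with the sequence length! In theory, they can even have an infinite context length.
To illustrate, let's apply the RNN to the input text we have used before. Each hidden state is an aggregation of all previous hidden states and is typically a compressed view of them.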
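
There is a problem, however. When producing the name "Maarten", the last hidden state no longer contains information about the word "Hello". RNNs tend to forget information over time since they only consider one previous state.
This sequential nature of RNNs creates another problem: training cannot be done in parallel, since the network has to go through the steps one at a time.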
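Both properties can be seen in a minimal sketch of the recurrence. It assumes a scalar Elman-style RNN with hand-picked weights; `rnn_forward`, `w_x`, `w_h`, and `bias` are illustrative names, not trained values from the post. The loop processes one token at a time, which is why training cannot be parallelized across steps. The trace left by the first input also fades as later steps overwrite the state.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, bias=0.0):
    """Run a scalar Elman-style RNN over a sequence of scalar inputs.

    The weights are illustrative, not trained. Each step uses only the
    previous hidden state and the current input, so the work per step is
    constant and the total work grows linearly with the sequence length.
    """
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:  # strictly sequential: step t needs h from step t-1
        h = math.tanh(w_x * x + w_h * h + bias)
        states.append(h)
    return states


# Only the first token carries a signal; all later inputs are zero.
states = rnn_forward([1.0] + [0.0] * 9)
print([round(s, 3) for s in states])  # the first token's trace shrinks step by step
```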

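Compared to Transformers, RNNs have exactly the opposite problem! Their inference is incredibly fast, but their training is not parallelizable.

Can we find an architecture that parallelizes training like Transformers while still performing inference that scales linearly with sequence length?
Yes! This is what Mamba offers. Before diving into its architecture, however, let's explore the world of State Space Models first.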
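Part 2: State Space Models (SSM)
A State Space Model (SSM), like the Transformer and the RNN, processes sequences of information, such as text but also signals. In this section, we will go through the basics of SSMs and how they relate to textual data.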

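A state space contains the minimum number of variables that fully describe a system. It is a way to mathematically represent a problem by defining a system's possible states.
Let's simplify this a bit. Imagine we are navigating through a maze. The "state space" is the map of all possible locations (states). Each point represents a unique position in the maze with specific details, like how far you are from the exit.
The "state space representation" is a simplified description of this map. It shows where you are (current state), where you can go next (possible future states), and what changes take you to the next state (going right or left).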

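State Space Models use equations and matrices to track this behavior. Even so, they are simply a way to track where you are, where you can go, and how you can get there.
The variables that describe a state can be represented as "state vectors". In our example, these are the X and Y coordinates as well as the distance to the exit.
Sounds familiar?
That is because embeddings or vectors in language models are also frequently used to describe the "state" of an input sequence. For instance, a vector of your current position (the state vector) would hold these coordinates and the distance to the exit.

In terms of neural networks, the "state" of a system is typically its hidden state. In the context of Large Language Models, the hidden state is one of the most important aspects of generating a new token.
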
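What is a State Space Model?
SSMs are models used to describe these state representations and to predict what the next state could be, depending on some input.
Traditionally, at time t, SSMs: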

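• map an input sequence x(t) (e.g., moved left and down in the maze)

• to a latent state representation h(t) (e.g., distance to the exit and x/y coordinates)

• and derive a predicted output sequence y(t) (e.g., move left again to reach the exit sooner)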
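However, instead of using discrete sequences (like moving left once), an SSM takes a continuous sequence as input and predicts the output sequence.
SSMs assume that dynamic systems, such as an object moving in 3D space, can be predicted from their state at time t through two equations.

By solving these equations, we assume that we can uncover the statistical principles to predict the state of a system based on observed data (the input sequence and the previous state).

The goal is to find this state representation h(t) such that we can go from an input to an output sequence. These two equations are the core of the State Space Model.

The two equations will be referenced throughout this guide. To make them more intuitive, they are color-coded in the figures so you can quickly reference them.

The state equation describes how the state changes (through matrix A) based on how the input influences the state (through matrix B).

As we saw before, h(t) refers to our latent state representation at any given time t, and x(t) refers to some input. The output equation describes how the state is translated to the output (through matrix C) and how the input influences the output (through matrix D).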
NOTE: Matrices A, B, C, and D are also commonly referred to as parameters since they are learnable.
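The figures with the two equations did not survive extraction. They are written out below in the standard continuous-time form used by S4 and Mamba, where h'(t) is the time derivative of the state:

```latex
\begin{aligned}
h'(t) &= A\,h(t) + B\,x(t) && \text{(state equation)} \\
y(t)  &= C\,h(t) + D\,x(t) && \text{(output equation)}
\end{aligned}
```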

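Let's go through the general technique step by step to understand how these matrices influence the learning process.
Assume we have some input signal x(t). This signal first gets multiplied by matrix B, which describes how the inputs influence the system.

The updated state (akin to the hidden state of a neural network) is a latent space that contains the core "knowledge" of the environment. We multiply the state with matrix A, which describes how all the internal states are connected, as they represent the underlying dynamics of the system.

Note that matrix A acts on the state itself: it determines how the current state evolves, while B determines how the input feeds into it. Then, we use matrix C to describe how the state can be translated to an output.

Finally, we can make use of matrix D to provide a direct signal from the input to the output. This is also often referred to as a skip-connection. Because D only adds this shortcut, the SSM is often described without it, which leaves the output equation as y(t) = C h(t).

Going back to our simplified perspective, we can now focus on matrices A, B, and C as the core of the SSM.

The original equations can be updated (and color-coded) to signify the purpose of each matrix, as we did before.

Together, these two equations aim to predict the state of a system from observed data. Since the input is expected to be continuous, the main representation of the SSM is a continuous-time representation.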
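Finding the state representation h(t) analytically is challenging when you have a continuous signal. Moreover, since we generally have a discrete input (like a textual sequence), we want to discretize the model.
To do so, we make use of the Zero-order hold technique. It works as follows: every time we receive a discrete signal, we hold its value until we receive a new discrete signal. This process creates a continuous signal that the SSM can use.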

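How long we hold the value is represented by a new learnable parameter, called the step size Δ. It represents the resolution of the input.
Now that we have a continuous signal for our input, we can generate a continuous output and sample its values only at the time steps of the input.
These sampled values are our discretized output!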
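Mathematically, the Zero-order hold uses the step size Δ to turn the continuous matrices A and B into discretized matrices Ā and B̄.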
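For reference, below are the zero-order hold formulas as stated in the Mamba paper, followed by the discrete recurrence they produce with the D term dropped. Here exp is the matrix exponential and I is the identity matrix.

```latex
\begin{aligned}
\bar{A} &= \exp(\Delta A) \\
\bar{B} &= (\Delta A)^{-1}\left(\exp(\Delta A) - I\right) \cdot \Delta B \\
h_k &= \bar{A}\, h_{k-1} + \bar{B}\, x_k \\
y_k &= C\, h_k
\end{aligned}
```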

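Together, these formulas allow us to go from a continuous SSM to a discrete SSM. Instead of a function-to-function mapping, x(t) → y(t), the discrete SSM is a sequence-to-sequence mapping, xₖ → yₖ. In it, Ā and B̄ denote the discretized versions of matrices A and B.
We use k instead of t to represent discretized time steps and to make it clear when we refer to a continuous versus a discrete SSM.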

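NOTE: We still store the continuous form of matrix A, not the discretized version, and discretize it during training.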
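The sketch below shows the discretization step under one simplifying assumption: A is diagonal, so the matrix exponential reduces to an element-wise exp. That assumption and the name `discretize_zoh` are illustrative. Only the continuous a and b values are stored, and the discretized values are derived from them and the step size Δ whenever they are needed.

```python
import math


def discretize_zoh(a_diag, b, delta):
    """Zero-order-hold discretization, assuming a diagonal state matrix A.

    a_diag: diagonal entries of the continuous A (nonzero; negative = stable)
    b:      entries of the continuous B, one per state dimension
    delta:  the step size Δ
    For diagonal A the matrix formulas reduce to element-wise ones:
        A_bar = exp(Δa)
        B_bar = (Δa)^-1 * (exp(Δa) - 1) * Δb = (exp(Δa) - 1) / a * b
    """
    a_bar = [math.exp(delta * a) for a in a_diag]
    b_bar = [(math.exp(delta * a) - 1.0) / a * bi for a, bi in zip(a_diag, b)]
    return a_bar, b_bar


# The continuous parameters stay fixed; Δ decides the discretized values.
a_bar, b_bar = discretize_zoh([-1.0, -2.0], [1.0, 1.0], delta=0.1)
print(a_bar, b_bar)
```

For a very small Δ, B̄ is close to Δ·B, which is why the step size acts like the resolution of the input.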

,
.

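Our discretized SSM allows us to formulate the problem in specific time steps instead of continuous signals. A recurrent approach, as we saw before with RNNs, is quite useful here.
If we consider discrete time steps instead of a continuous signal, then at each time step we calculate how the current input (B̄xₖ) influences the previous state (Āhₖ₋₁). We then calculate the predicted output (Chₖ).
This representation might already feel a bit familiar! We can approach it the same way we did with the RNN.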

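We can unfold (or unroll) this recurrence over the time steps, just like the RNN.

Notice how this discretized version follows the underlying methodology of an RNN. This technique gives us both the advantages and disadvantages of an RNN, namely fast inference and slow training.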
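The recurrent view can be sketched in a few lines of Python. The sketch assumes a diagonal Ā and a single input channel; `ssm_recurrent` is a hypothetical helper and the parameter values are arbitrary. Each loop iteration is one time step, so every new token costs the same amount of work, but the loop cannot be parallelized across time steps.

```python
def ssm_recurrent(x, a_bar, b_bar, c):
    """Run a discretized SSM as a recurrence over a scalar input sequence.

    a_bar, b_bar, c hold one entry per state dimension (Ā assumed diagonal):
        h_k = Ā h_{k-1} + B̄ x_k
        y_k = C h_k
    """
    h = [0.0] * len(a_bar)  # the state before the first token is zero
    ys = []
    for xk in x:
        # Update every state dimension from its previous value and the input.
        h = [ai * hi + bi * xk for ai, hi, bi in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


print(ssm_recurrent([1.0, 0.0, 0.0], a_bar=[0.5], b_bar=[1.0], c=[2.0]))  # [2.0, 1.0, 0.5]
```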

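Another representation that we can use for SSMs is that of convolutions. Remember from classic image recognition tasks how we applied filters (kernels) to derive aggregate features.
Since we are dealing with text and not images, we need a 1-dimensional perspective instead.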
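
The kernel that we use to represent this "filter" is derived from the SSM formulation. Its k-th entry is CĀᵏB̄, giving K = (CB̄, CĀB̄, CĀ²B̄, ...).

Let's explore how this kernel works in practice. Like a convolution, we can use our SSM kernel to go over each set of tokens and calculate the output.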

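This also illustrates the effect padding might have on the output. In the original figure, the padding order was changed to make the visualization clearer, although padding is often applied at the end of a sentence.
In the next step, the kernel is moved over by one position to perform the next step in the calculation.

In the final step, we can see the full effect of the kernel.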

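A major benefit of representing the SSM as a convolution is that it can be trained in parallel, like Convolutional Neural Networks (CNNs). However, due to the fixed kernel size, its inference is not as fast and unbounded as that of RNNs.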
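The sketch below checks this equivalence numerically, assuming a diagonal Ā and a single input channel. All function names are hypothetical. The kernel's k-th entry is CĀᵏB̄, and convolving the input with it reproduces the recurrent outputs. This is why the same SSM can be trained as a convolution and run as a recurrence.

```python
def ssm_kernel(a_bar, b_bar, c, length):
    """SSM convolution kernel K = (C B̄, C Ā B̄, C Ā² B̄, ...), diagonal Ā assumed."""
    return [sum(ci * ai ** k * bi for ai, bi, ci in zip(a_bar, b_bar, c))
            for k in range(length)]


def ssm_convolution(x, a_bar, b_bar, c):
    """y_k = sum over j <= k of K_j * x_{k-j}: a causal 1D convolution."""
    kernel = ssm_kernel(a_bar, b_bar, c, len(x))
    return [sum(kernel[j] * x[k - j] for j in range(k + 1)) for k in range(len(x))]


def ssm_recurrent(x, a_bar, b_bar, c):
    """Reference recurrence h_k = Ā h_{k-1} + B̄ x_k, y_k = C h_k, for comparison."""
    h, ys = [0.0] * len(a_bar), []
    for xk in x:
        h = [ai * hi + bi * xk for ai, hi, bi in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


params = dict(a_bar=[0.9, 0.5], b_bar=[1.0, 0.2], c=[0.3, -1.0])
x = [1.0, 2.0, -1.0, 0.5]
# Each convolution output can be computed independently (parallel training),
# yet the results match the step-by-step recurrence (fast inference).
conv_out = ssm_convolution(x, **params)
rec_out = ssm_recurrent(x, **params)
print(all(abs(p - q) < 1e-12 for p, q in zip(conv_out, rec_out)))  # True
```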

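These three representations (continuous, recurrent, and convolutional) all have different sets of advantages and disadvantages.

Interestingly, we now have efficient inference with the recurrent SSM and parallelizable training with the convolutional SSM.
With these representations, there is a neat trick that we can use: choosing a representation depending on the task. During training, we use the convolutional representation, which can be parallelized. During inference, we use the efficient recurrent representation. This model is referred to as the Linear State-Space Layer (LSSL) [2].
These representations share an important property, namely Linear Time Invariance (LTI). LTI states that the SSM's parameters, A, B, and C, are fixed for all time steps. This means that matrices A, B, and C are the same for every token the SSM generates.
In other words, regardless of what sequence you give the SSM, the values of A, B, and C remain the same. We have a static representation that is not content-aware.
Before we explore how Mamba addresses this issue, let's explore the final piece of the puzzle: matrix A.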

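The Importance of Matrix A
Arguably one of the most important aspects of the SSM formulation is matrix A. As we saw with the recurrent representation, it captures information about the previous state to build the new state. In essence, matrix A produces the hidden state.

Creating matrix A can therefore be the difference between remembering only a few previous tokens and capturing every token we have seen thus far. This is especially true for the recurrent representation, since it only looks back at the previous state.
So how can we create matrix A in a way that retains a large memory (context size)?
We use Hungry Hungry Hippo! Or HiPPO, short for High-order Polynomial Projection Operators [3]. HiPPO attempts to compress all input signals it has seen thus far into a vector of coefficients.

It uses matrix A to build a state representation that captures recent tokens well and decays older tokens. HiPPO defines this square N×N matrix A through a closed-form formula over its entries.
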
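Building matrix A using HiPPO was shown to be much better than initializing it as a random matrix. As a result, it reconstructs newer signals (recent tokens) more accurately than older signals (initial tokens).
The idea behind the HiPPO matrix is that it produces a hidden state that memorizes its history.
Mathematically, it does so by tracking the coefficients of a Legendre polynomial, which allows it to approximate all of the previous history [4].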
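The formula itself did not survive extraction. The sketch below builds the HiPPO-LegS matrix in the negated form given in the S4 paper and used in The Annotated S4, written in pure Python; `make_hippo` is a hypothetical name. With this sign convention every diagonal entry is negative. Because the matrix is lower triangular, these diagonal entries are its eigenvalues, so the continuous system decays instead of blowing up.

```python
import math


def make_hippo(n):
    """N x N HiPPO-LegS matrix, in the negated form used to initialize A in S4.

    A[i][j] = -sqrt(2i + 1) * sqrt(2j + 1)   if i > j
              -(i + 1)                        if i == j
              0                               if i < j
    """
    return [[-math.sqrt(2 * i + 1) * math.sqrt(2 * j + 1) if i > j
             else -(i + 1.0) if i == j
             else 0.0
             for j in range(n)]
            for i in range(n)]


for row in make_hippo(4):
    print([round(v, 2) for v in row])
```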
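HiPPO was then applied to the recurrent and convolution representations that we saw before to handle long-range dependencies. The result was Structured State Space for Sequences (S4), a class of SSMs that can efficiently handle long sequences [5].
It consists of three parts:

• State Space Models
• HiPPO for handling long-range dependencies
• Discretization for creating recurrent and convolution representations

This class of SSMs has several benefits depending on the representation you choose (recurrent vs. convolution). It can also handle long sequences of text and store memory efficiently by building upon the HiPPO matrix.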

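NOTE: If you want to dive into more of the technical details on how to calculate the HiPPO matrix and build an S4 model yourself, I would highly advise going through the Annotated S4.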
Part 3: Mamba - A Selective SSM

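We have finally covered all the fundamentals necessary to understand what makes Mamba special. State Space Models can be used to model textual sequences, but they still have a set of disadvantages we want to prevent.

In this section, we will go through Mamba's two main contributions:
1. A selective scan algorithm, which allows the model to filter (ir)relevant information

2. A hardware-aware algorithm that allows for efficient storage of (intermediate) results through parallel scan, kernel fusion, and recomputation

Together, they create the selective SSM, or S6 model, which can be used, like self-attention, to create Mamba blocks.
Before exploring the two main contributions, let's first explore why they are necessary.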

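What Problem Does It Attempt to Solve?
State Space Models, and even S4 (Structured State Space Model), perform poorly on certain tasks that are vital in language modeling and generation. These tasks require the ability to focus on or ignore particular inputs.
We can illustrate this with two synthetic tasks: selective copying and induction heads.
In the selective copying task, the goal of the SSM is to copy parts of the input and output them in order.
However, a (recurrent or convolutional) SSM performs poorly in this task since it is Linear Time Invariant.
As we saw before, the matrices A, B, and C are the same for every token the SSM generates.
As a result, an SSM cannot perform content-aware reasoning, since the fixed A, B, and C matrices make it treat each token equally. This is a problem, as we want the SSM to reason about the input (prompt).
The second task an SSM performs poorly on is induction heads, where the goal is to reproduce patterns found in the input.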

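In an induction heads example, we are essentially performing one-shot prompting: we attempt to "teach" the model to provide an "A:" response after every "Q:".
However, since SSMs are time-invariant, they cannot select which previous tokens to recall from their history.
Let's illustrate this by focusing on matrix B. Regardless of what the input x is, matrix B remains exactly the same and is therefore independent of x.

Likewise, A and C also remain fixed regardless of the input. This demonstrates the static nature of the SSMs we have seen thus far.
In comparison, these tasks are relatively easy for Transformers, since they dynamically change their attention based on the input sequence. They can selectively "look" or "attend" at different parts of the sequence.
The poor performance of SSMs on these tasks illustrates the underlying problem with time-invariant SSMs: the static nature of matrices A, B, and C results in problems with content-awareness.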

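The recurrent representation of an SSM creates a small state that is quite efficient, as it compresses the entire history. However, it is much less powerful than a Transformer model, which does no compression of the history (through the attention matrix).

Mamba aims to have the best of both worlds: a small state that is as powerful as the state of a Transformer.

As teased above, it does so by compressing data selectively into the state. An input sentence often contains information, like stop words, that does not carry much meaning.

To selectively compress information, we need the parameters to depend on the input. To do so, let's first explore the dimensions of the input and output in an SSM during training.
In a Structured State Space Model (S4), the matrices A, B, and C are independent of the input, since their dimensions N and D are static and do not change.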

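Instead, Mamba makes matrices B and C, and even the step size Δ, dependent on the input by incorporating the sequence length and batch size of the input.

This means that for every input token, we now have different B and C matrices, which solves the problem with content-awareness!

NOTE: Matrix A remains the same, since we want the state itself to remain static but the way it is influenced (through B and C) to be dynamic.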

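Together, these input-dependent matrices selectively choose what to keep in the hidden state and what to ignore.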
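Here is a minimal sketch of the selection mechanism, assuming a single input channel, a diagonal A, and the zero-order hold from earlier. In Mamba, B, C, and Δ come from learned linear projections of the input. In this sketch they are replaced by hypothetical scalar weights (`w_b`, `w_c`, `w_delta`, `b_delta`). The softplus that keeps Δ positive follows the parameterization described in the Mamba paper. Because these values are recomputed for every token, the same weights yield a different Δ, B, and C per token, while A stays fixed.

```python
import math


def softplus(v):
    """Smooth positive function; keeps the step size Δ above zero."""
    return math.log1p(math.exp(v))


def selective_ssm(x, a_diag, w_b, w_c, w_delta, b_delta):
    """Selective SSM over one scalar input channel.

    Unlike the time-invariant SSM, B, C, and Δ are recomputed from every input
    x_k, while the continuous A (diagonal, negative) stays fixed. The scalar
    weights w_b, w_c, w_delta, b_delta are hypothetical stand-ins for the
    learned linear projections. Returns the outputs and the per-token Δ.
    """
    h = [0.0] * len(a_diag)
    ys, deltas = [], []
    for xk in x:
        b_k = [wb * xk for wb in w_b]              # B_k depends on x_k
        c_k = [wc * xk for wc in w_c]              # C_k depends on x_k
        delta = softplus(w_delta * xk + b_delta)   # Δ_k depends on x_k
        # Zero-order hold with this token's Δ (element-wise for diagonal A).
        a_bar = [math.exp(delta * a) for a in a_diag]
        b_bar = [(ab - 1.0) / a * b for ab, a, b in zip(a_bar, a_diag, b_k)]
        h = [ab * hi + bb * xk for ab, hi, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(c * hi for c, hi in zip(c_k, h)))
        deltas.append(delta)
    return ys, deltas


ys, deltas = selective_ssm([0.1, 2.0, 0.1], a_diag=[-1.0, -2.0],
                           w_b=[1.0, 0.5], w_c=[1.0, 1.0], w_delta=1.5, b_delta=-1.0)
print([round(d, 3) for d in deltas])  # the same weights give each token its own Δ
```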
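The input-dependent step size Δ plays a similar selective role. A smaller step size Δ results in ignoring specific words and relying more on the previous context. A larger step size Δ focuses more on the input words than on the context.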
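To see why Δ acts this way, the sketch below applies the scalar zero-order hold with a small and a large step size. It is minimal; `zoh_scalar` is a hypothetical helper, and A = -1 and B = 1 are arbitrary values. A small Δ keeps Ā near 1 and B̄ near 0, so the state (the previous context) is carried over and the current token barely enters. A large Δ pushes Ā toward 0, so the state is largely replaced by the current input.

```python
import math


def zoh_scalar(a, b, delta):
    """Scalar zero-order hold: returns (A_bar, B_bar) for step size delta."""
    a_bar = math.exp(delta * a)
    return a_bar, (a_bar - 1.0) / a * b


# A = -1 and B = 1 are arbitrary illustrative values.
for delta in (0.01, 5.0):
    a_bar, b_bar = zoh_scalar(-1.0, 1.0, delta)
    # A_bar scales the previous state (context); B_bar scales the current input.
    print(f"delta={delta}: keep {a_bar:.3f} of the state, take {b_bar:.3f} of the input")
# delta=0.01: keep 0.990 of the state, take 0.010 of the input
# delta=5.0: keep 0.007 of the state, take 0.993 of the input
```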

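Since these matrices are now dynamic, they cannot be calculated using the convolution representation, which assumes a fixed kernel. We can only use the recurrent representation, and we lose the parallelization the convolution provides.
To enable parallelization, let's explore how we compute the output with recurrence.

Each state is the sum of the previous state (multiplied by A) plus the current input (multiplied by B). This is called a scan operation and can easily be calculated with a for loop.
Parallelization, in contrast, seems impossible, since each state can only be calculated once we have the previous state.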

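Mamba, however, makes this possible through the parallel scan algorithm. It relies on the associative property: the grouping of the operations does not change the result. As a result, we can calculate the sequence in parts and iteratively combine them.
Together, the dynamic matrices B and C and the parallel scan algorithm create the selective scan algorithm. It captures the dynamic and fast nature of using the recurrent representation.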
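The trick can be sketched with a scalar state. One step of the recurrence can be written as the pair (a_k, b_k), meaning h ↦ a_k·h + b_k, with a_k = Ā_k and b_k = B̄_k·x_k. Two consecutive steps compose into another step of the same form, and this composition is associative. The sketch compares the for-loop scan with a Hillis-Steele prefix scan, which is one of several parallel scan schemes; the function names are hypothetical. It illustrates the principle only and is not the scan kernel Mamba uses on GPUs.

```python
def combine(left, right):
    """Compose two steps of h -> a*h + b: apply `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return a1 * a2, a2 * b1 + b2


def sequential_scan(a, b):
    """The plain for-loop scan: h_k = a_k * h_{k-1} + b_k, starting from h = 0."""
    h, out = 0.0, []
    for ak, bk in zip(a, b):
        h = ak * h + bk
        out.append(h)
    return out


def parallel_scan(a, b):
    """Hillis-Steele inclusive scan built on the associative `combine`.

    There are about log2(L) rounds; inside a round every position is updated
    independently from the previous round's values, so each round could run
    in parallel. This is only an illustration, not Mamba's GPU kernel.
    """
    elems = list(zip(a, b))
    offset = 1
    while offset < len(elems):
        elems = [combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
                 for i in range(len(elems))]
        offset *= 2
    # Each element now composes steps 0..k; applied to h = 0 it yields h_k = b.
    return [bk for _, bk in elems]


# a_k plays the role of Ā_k and b_k of B̄_k x_k for a scalar state.
a = [0.5, 0.9, 1.1, 0.2, 0.7]
b = [1.0, -2.0, 0.5, 3.0, 0.0]
print(sequential_scan(a, b))
print(parallel_scan(a, b))  # same values, up to floating-point rounding
```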

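A disadvantage of recent GPUs is their limited transfer (IO) speed between their small but highly efficient SRAM and their large but slightly less efficient DRAM. Frequently copying information between SRAM and DRAM becomes a bottleneck.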

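Mamba, like Flash Attention, attempts to limit the number of times we need to go from DRAM to SRAM and vice versa. It does so through kernel fusion, which lets the model avoid writing intermediate results and keep performing computations until it is done.

We can view the specific instances of DRAM and SRAM allocation by visualizing Mamba's base architecture. Here, the following are fused into one kernel:

• Discretization step with step size Δ

• Selective scan algorithm

• Multiplication with C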

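The last piece of the hardware-aware algorithm is recomputation.
The intermediate states are not saved, but they are necessary for the backward pass to compute the gradients. Instead, the authors recompute those intermediate states during the backward pass.
Although this might seem inefficient, it is much less costly than reading all those intermediate states from the relatively slow DRAM.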

Figure: The selective SSM. Retrieved from: Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." arXiv preprint arXiv:2312.00752 (2023).

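This architecture is often referred to as a selective SSM or S6 model, since it is essentially an S4 model computed with the selective scan algorithm.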

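The selective SSM that we have explored thus far can be implemented as a block, the same way we can represent self-attention in a decoder block.

Like decoders, we can stack multiple Mamba blocks and use the output of one as the input for the next. Each block starts with a linear projection to expand upon the input embeddings. Then, a convolution is applied before the selective SSM to prevent independent token calculations.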
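To tie the pieces together, here is a heavily simplified, single-channel sketch of a Mamba-style block, loosely following the block diagram in the Mamba paper. It has four stages:

1. A linear projection into an SSM branch and a gating branch.
2. A causal convolution with SiLU before the selective SSM.
3. The selective SSM itself.
4. A SiLU gate before the output projection.

With a single channel there is no real expansion. The scalar weights in `params` and all function names are hypothetical. The D skip term, normalization, and residual connections are left out. Stacking is shown by feeding one block's output into the next.

```python
import math


def silu(v):
    return v / (1.0 + math.exp(-v))


def softplus(v):
    return math.log1p(math.exp(v))


def causal_conv1d(x, kernel):
    """Output k mixes x_k with up to len(kernel) - 1 earlier inputs, never later ones."""
    return [sum(w * x[k - j] for j, w in enumerate(kernel) if k - j >= 0)
            for k in range(len(x))]


def selective_scan(x, a_diag, w_b, w_c, w_delta):
    """Single-channel selective SSM: Δ, B, C depend on x_k; diagonal A is fixed."""
    h, ys = [0.0] * len(a_diag), []
    for xk in x:
        delta = softplus(w_delta * xk)
        a_bar = [math.exp(delta * a) for a in a_diag]
        # Zero-order hold for B_k = w_b * x_k, then h_k = Ā h_{k-1} + B̄ x_k.
        h = [ab * hi + (ab - 1.0) / a * (wb * xk) * xk
             for ab, a, hi, wb in zip(a_bar, a_diag, h, w_b)]
        ys.append(sum(wc * xk * hi for wc, hi in zip(w_c, h)))  # C_k = w_c * x_k
    return ys


def mamba_block(tokens, p):
    """Simplified one-channel block; p holds hypothetical scalar weights."""
    x = [p["w_in_x"] * t for t in tokens]   # linear projection: SSM branch
    z = [p["w_in_z"] * t for t in tokens]   # linear projection: gate branch
    x = [silu(v) for v in causal_conv1d(x, p["conv"])]  # conv before the SSM
    y = selective_scan(x, p["a"], p["w_b"], p["w_c"], p["w_delta"])
    return [p["w_out"] * yk * silu(zk) for yk, zk in zip(y, z)]  # gate, project


params = {"w_in_x": 1.2, "w_in_z": 0.8, "conv": [0.6, 0.3, 0.1], "a": [-1.0, -2.0],
          "w_b": [1.0, 0.5], "w_c": [0.7, -0.4], "w_delta": 1.0, "w_out": 1.5}
seq = [0.2, -0.5, 1.0, 0.3]
for _ in range(2):  # stack two blocks: each block's output feeds the next
    seq = mamba_block(seq, params)
print(seq)
```

Because both the convolution and the scan only look backwards, changing the last token leaves all earlier outputs of a block untouched.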
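The selective SSM has the following properties:

• A recurrent SSM created through discretization

• HiPPO initialization on matrix A to capture long-range dependencies

• A selective scan algorithm to selectively compress information

• A hardware-aware algorithm to speed up computation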

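We can expand on this architecture a bit more by looking at the code implementation and an end-to-end example. These add a few components, like normalization layers and a softmax for choosing the output token.
When we put everything together, we get both fast inference and training, and even unbounded context!

Using this architecture, the authors found that it matches and sometimes even exceeds the performance of Transformer models of the same size!
This concludes our journey into State Space Models and the Mamba architecture with its selective State Space Model. Hopefully, this post gives you a better understanding of the potential of State Space Models, particularly Mamba.
Who knows whether this is going to replace Transformers? For now, it is truly amazing to see such different architectures getting well-deserved attention!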

Resources

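Hopefully, this was an accessible introduction to Mamba and State Space Models. If you want to go deeper, I would suggest the following resources: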

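• The Annotated S4 is a JAX guide through S4 and is highly advised!

• A great YouTube video introducing Mamba by building it up through foundational papers.

• The Mamba repository with checkpoints on Hugging Face.

• An amazing series of blog posts (1, 2, 3) introducing the S4 model.

• The Mamba No. 5 (A Little Bit Of...) blog post is a great next step to dive into more technical details about Mamba, while still taking an intuitive perspective.

• And of course, the Mamba paper! Mamba was even used for DNA modeling and speech generation.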

[1] Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." arXiv preprint arXiv:2312.00752 (2023).
[2] Gu, Albert, et al. "Combining recurrent, convolutional, and continuous-time models with linear state space layers." Advances in Neural Information Processing Systems 34 (2021): 572-585.
