Original author: Maarten Grootendorst
Original English post by Maarten Grootendorst:
A Visual Guide to Mamba and State Space Mod…
An Alternative to Transformers for Language Modeling
https://maartengrootendorst.substack.com/p/a-vi…
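The Transformer architecture has been a major component in the success of Large Language Models (LLMs). It is used in nearly all LLMs in use today, from open-source models such as Mistral to closed-source models such as ChatGPT.
To improve LLMs further, new architectures are being developed that may even outperform the Transformer. One of these is Mamba, a State Space Model. Mamba was proposed in the paper 'Mamba: Linear-Time Sequence Modeling with Selective State Spaces'. Its official implementation and model checkpoints are available in its GitHub repo.1
This post introduces State Space Models in the context of language modeling and works through the concepts one by one to build intuition. It then covers how Mamba might challenge the Transformer architecture.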
Part 1: The Problem with Transformers
Are RNNs a Solution?
Part 2: The State Space Model (SSM)
Part 3: Mamba - A Selective SSM
Additional Resources
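Part 1: The Problem with Transformers
To see why Mamba is such an interesting architecture, it helps to first recap Transformers and look at one of their disadvantages.
A Transformer treats any textual input as a sequence of tokens. Its major benefit is that, whatever input it receives, it can look back at any earlier token in the sequence to derive its representation.
A Transformer consists of two structures: a set of encoder blocks for representing text and a set of decoder blocks for generating text. Together they can be used for several tasks, including translation.
Using only decoders turns this structure into a generative model. Such a model, the Generative Pre-trained Transformer (GPT), uses decoder blocks to complete some input text.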
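A single decoder block has two main components: masked self-attention followed by a feed-forward neural network. Self-attention is a major reason these models work so well: it gives an uncompressed view of the entire sequence while still training fast.
How does it work? It creates a matrix that compares each token with every token that came before it. The weights in the matrix are determined by how relevant the token pairs are to one another.
During training, this matrix is created in one go. The attention between "My" and "name" does not have to be computed before the attention between "name" and "is". This enables parallelization, which speeds up training tremendously!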
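The attention matrix described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full Transformer layer: the random embeddings stand in for learned queries and keys, and the function name `causal_attention_weights` is hypothetical.

```python
import numpy as np

def causal_attention_weights(q, k):
    """Softmax attention weights where token i only attends to tokens j <= i."""
    L, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (L, L): every token against every token
    future = np.triu(np.ones((L, L), dtype=bool), k=1)  # True above the diagonal = future tokens
    scores = np.where(future, -np.inf, scores)          # mask out the future
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
tokens = ["My", "name", "is", "Maarten"]
x = rng.normal(size=(len(tokens), 8))  # stand-in embeddings (assumption, not learned values)
W = causal_attention_weights(x, x)     # all L x L entries are computed in one matrix product
```

The whole L x L matrix comes out of a single matrix product, which is why training parallelizes well and why the cost grows quadratically with the sequence length.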
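There is a flaw, however. When generating the next token, the attention has to be recalculated for the entire sequence, even for tokens that were already generated.
Generating tokens for a sequence of length L takes roughly L² computations, which becomes costly as the sequence length grows. This need to recalculate the entire sequence is a major bottleneck of the Transformer architecture.
A "classic" technique, Recurrent Neural Networks, shows how this problem of slow inference can be solved.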
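Are RNNs a Solution?
A Recurrent Neural Network (RNN) is a sequence-based network. At each time step it takes two inputs, the input at time step t and the hidden state of the previous time step t-1, to generate the next hidden state and predict the output.
RNNs have a looping mechanism that passes information from one step to the next. "Unfolding" this loop makes it explicit: the same cell is applied once per time step.
When generating the output, the RNN only needs the previous hidden state and the current input. It avoids recalculating all previous hidden states, which is what a Transformer would do.
In other words, RNNs can do inference fast because it scales linearly with the sequence length! In theory, they can even have an infinite context length.
To illustrate, consider applying the RNN to the input text used before. Each hidden state is an aggregation of all previous hidden states and is typically a compressed view.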
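As a rough sketch of the recurrence just described, the loop below runs a plain tanh RNN over a sequence. The weight names `W_h`, `W_x`, `W_y` and the tanh nonlinearity are illustrative assumptions; the text does not specify a particular RNN cell.

```python
import numpy as np

def rnn_forward(x, W_h, W_x, W_y):
    """Run a simple RNN over x of shape (L, d_in); returns outputs and the final state."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_t in x:                          # one step per token: cost grows linearly with length
        h = np.tanh(W_h @ h + W_x @ x_t)   # new state from previous state and current input only
        outputs.append(W_y @ h)
    return np.stack(outputs), h

rng = np.random.default_rng(0)
L, d_in, d_h, d_out = 5, 3, 4, 2
x = rng.normal(size=(L, d_in))
W_h = rng.normal(size=(d_h, d_h)) * 0.1
W_x = rng.normal(size=(d_h, d_in))
W_y = rng.normal(size=(d_out, d_h))
y, h_last = rnn_forward(x, W_h, W_x, W_y)
```

Each step depends on the state from the step before, so the loop cannot be run in parallel over time steps. That is the training bottleneck the text goes on to describe.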
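There is a problem, however. By the time the last hidden state produces the name "Maarten", it no longer contains information about the word "Hello". RNNs tend to forget information over time because they only consider one previous state.
The sequential nature of RNNs creates another problem: training cannot be done in parallel, since each step has to be processed in order.
Compared to Transformers, the problem with RNNs is the complete opposite! Inference is very fast, but training is not parallelizable.
Can we find an architecture that parallelizes training like a Transformer while still performing inference that scales linearly with sequence length?
Yes! This is what Mamba offers. Before diving into its architecture, the next part explores State Space Models.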
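Part 2: The State Space Model (SSM)
A State Space Model (SSM), like the Transformer and the RNN, processes sequences of information, such as text but also signals. This part goes through the basics of SSMs and how they relate to textual data.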
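A state space contains the minimum number of variables that fully describe a system. It is a way to represent a problem mathematically by defining the system's possible states.
To simplify, imagine navigating through a maze. The "state space" is the map of all possible locations (states). Each point represents a unique position in the maze with specific details, such as how far you are from the exit.
The "state space representation" is a simplified description of this map. It shows where you are (current state), where you can go next (possible future states), and which changes take you to the next state (going right or left).
State Space Models use equations and matrices to track this behavior, but in essence they track where you are, where you can go, and how you can get there.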
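The variables that describe a state, in this example the X and Y coordinates and the distance to the exit, can be represented as "state vectors".
Sounds familiar? Embeddings or vectors in language models are also frequently used to describe the "state" of an input sequence. For instance, the vector of your current position (the state vector) holds those coordinates and the distance to the exit.
In neural networks, the "state" of a system is typically its hidden state. In the context of Large Language Models, it is one of the most important aspects of generating a new token.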
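What is a State Space Model?
SSMs are models that describe these state representations and predict what the next state could be, depending on some input.
Traditionally, at time t, SSMs:
• map an input sequence x(t) (e.g., moved left and down in the maze)
• to a latent state representation h(t) (e.g., distance to the exit and x/y coordinates)
• and derive a predicted output sequence y(t) (e.g., move left again to reach the exit sooner)
However, instead of using discrete sequences (like moving left once), an SSM takes a continuous sequence as input and predicts the output sequence.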
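SSMs assume that dynamic systems, such as an object moving in 3D space, can be predicted from their state at time t through two equations: the state equation h'(t) = A h(t) + B x(t) and the output equation y(t) = C h(t) + D x(t).
By solving these equations, we assume that we can uncover the statistical principles to predict the state of a system from observed data (the input sequence and the previous state).
The goal is to find the state representation h(t) that takes us from an input sequence to an output sequence. These two equations are the core of the State Space Model and are referenced throughout this guide.
The state equation describes how the state changes (through matrix A) and how the input influences the state (through matrix B). As before, h(t) is the latent state representation at any given time t, and x(t) is some input.
The output equation describes how the state is translated to the output (through matrix C) and how the input influences the output (through matrix D).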
NOTE: Matrices A, B, C, and D are commonly referred to as parameters, since they are learnable.
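Taken together, the two equations describe an architecture that can be walked through step by step to see how each matrix influences the learning process.
Assume we have some input signal x(t). This signal is first multiplied by matrix B, which describes how the inputs influence the system.
The updated state (akin to the hidden state of a neural network) is a latent space that contains the core "knowledge" of the environment. The state is multiplied by matrix A, which describes how all the internal states are connected and so represents the underlying dynamics of the system.
Note that matrix A is applied before the state representation is created and is updated after the state representation has been updated.
Then, matrix C describes how the state can be translated to an output.
Finally, matrix D provides a direct signal from the input to the output. This is often referred to as a skip-connection. Because matrix D acts like a skip-connection, the SSM is often considered without it.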
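From this simplified perspective, matrices A, B, and C form the core of the SSM. The original equations then become h'(t) = A h(t) + B x(t) and y(t) = C h(t). Together, these two equations aim to predict the state of a system from observed data.
Since the input is expected to be continuous, the main representation of the SSM is a continuous-time representation.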
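Finding the state representation h(t) is analytically challenging for a continuous signal. Moreover, since the input is generally discrete (like a textual sequence), we want to discretize the model.
To do so, we use the Zero-order hold technique. It works as follows. Every time we receive a discrete signal, we hold its value until we receive a new discrete signal. This creates a continuous signal the SSM can use.
How long the value is held is represented by a new learnable parameter, the step size Δ. It represents the resolution of the input.
With a continuous input signal, we can generate a continuous output and sample its values only at the time steps of the input. These sampled values are the discretized output!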
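Mathematically, the Zero-order hold can be applied as Ā = exp(ΔA) and B̄ = (ΔA)⁻¹ (exp(ΔA) - I) · ΔB.
Together, they take us from a continuous SSM to a discrete SSM whose formulation is no longer function-to-function (x(t) → y(t)) but sequence-to-sequence (xₖ → yₖ): hₖ = Ā hₖ₋₁ + B̄ xₖ and yₖ = C hₖ.
Here, matrices Ā and B̄ represent the discretized parameters of the model. We use k instead of t to denote discrete timesteps and to make clear when we refer to a continuous SSM versus a discrete one.
NOTE: During training, the continuous form of matrix A is still what is stored, not the discretized version. The continuous representation is discretized during training.
With a discrete formulation in hand, the next question is how to actually compute the model.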
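The Zero-order hold step can be sketched numerically as follows. This is an illustrative sketch under two assumptions: A is diagonalizable (so the matrix exponential can be taken through an eigendecomposition) and invertible. The helper names `matrix_exp` and `discretize_zoh` are hypothetical.

```python
import numpy as np

def matrix_exp(M):
    """Matrix exponential via eigendecomposition (assumes M is diagonalizable)."""
    w, V = np.linalg.eig(M)
    return ((V * np.exp(w)) @ np.linalg.inv(V)).real

def discretize_zoh(A, B, delta):
    """Zero-order hold: A_bar = exp(delta*A), B_bar = (delta*A)^-1 (A_bar - I) (delta*B)."""
    N = A.shape[0]
    dA = delta * A
    A_bar = matrix_exp(dA)
    # Solve dA @ B_bar = (A_bar - I) @ (delta * B) instead of forming the inverse explicitly.
    B_bar = np.linalg.solve(dA, (A_bar - np.eye(N)) @ (delta * B))
    return A_bar, B_bar

A = np.diag([-1.0, -2.0])      # toy continuous-time state matrix
B = np.array([[1.0], [1.0]])   # toy input matrix
A_bar, B_bar = discretize_zoh(A, B, delta=0.1)
```

For this diagonal toy system the result can be checked by hand: each diagonal entry a of A becomes exp(0.1 · a), and each entry of B̄ is (exp(0.1 · a) - 1) / a.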
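The discretized SSM lets us formulate the problem in specific timesteps instead of continuous signals. A recurrent approach, as seen before with RNNs, is quite useful here.
At each timestep, we calculate how the current input (B̄xₖ) influences the previous state (Āhₖ₋₁) and then calculate the predicted output (Chₖ).
This representation may already look familiar! It can be approached the same way as the RNN seen before, and it can be unfolded (unrolled) over timesteps in the same way.
The discretized version thus uses the underlying methodology of an RNN. It brings both the advantages and the disadvantages of an RNN, namely fast inference and slow training.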
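A minimal sketch of this recurrent computation, assuming a single scalar input channel and already-discretized parameters. The function name `ssm_recurrent` and the toy values are illustrative only.

```python
import numpy as np

def ssm_recurrent(A_bar, B_bar, C, x):
    """Discrete SSM as a recurrence over a 1-D input sequence x."""
    h = np.zeros(A_bar.shape[0])
    ys = []
    for x_k in x:
        h = A_bar @ h + B_bar * x_k   # h_k = A_bar h_{k-1} + B_bar x_k
        ys.append(C @ h)              # y_k = C h_k
    return np.array(ys)

A_bar = np.array([[0.5, 0.1], [0.0, 0.9]])  # toy discretized state matrix
B_bar = np.array([1.0, 0.5])
C = np.array([1.0, -1.0])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = ssm_recurrent(A_bar, B_bar, C, x)
```

Only the current state `h` is kept between steps, so memory stays constant during inference regardless of sequence length.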
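Another representation that can be used for SSMs is that of convolutions. Recall classic image recognition tasks, where filters (kernels) are applied to derive aggregate features.
Since we are dealing with text rather than images, we need a 1-dimensional perspective instead.
The kernel that represents this "filter" is derived from the SSM formulation: K̄ = (CB̄, CĀB̄, ..., CĀᵏB̄, ...).
In practice the kernel works like any convolution: it slides over each set of tokens to calculate the output. This also illustrates the effect padding can have on the output. Padding is often applied at the end of a sentence.
In the next step, the kernel moves over by one position to perform the next calculation. In the final step, the full effect of the kernel is visible.
A major benefit of representing the SSM as a convolution is that it can be trained in parallel, like Convolutional Neural Networks (CNNs). However, due to the fixed kernel size, inference is not as fast and unbounded as with RNNs.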
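The kernel view can be checked against the recurrence with a small sketch. It assumes a single scalar input channel and fixed (time-invariant) parameters; the function names and toy values are illustrative.

```python
import numpy as np

def ssm_kernel(A_bar, B_bar, C, L):
    """K = (C B_bar, C A_bar B_bar, ..., C A_bar^(L-1) B_bar)."""
    kernel, v = [], B_bar.copy()
    for _ in range(L):
        kernel.append(C @ v)
        v = A_bar @ v
    return np.array(kernel)

def ssm_convolution(kernel, x):
    """Causal convolution y_k = sum_{j<=k} K_j x_{k-j}; np.convolve zero-pads implicitly."""
    return np.convolve(x, kernel)[: len(x)]

def ssm_recurrent(A_bar, B_bar, C, x):
    """Reference recurrence used only to compare against the convolution."""
    h, ys = np.zeros(A_bar.shape[0]), []
    for x_k in x:
        h = A_bar @ h + B_bar * x_k
        ys.append(C @ h)
    return np.array(ys)

A_bar = np.array([[0.5, 0.1], [0.0, 0.9]])
B_bar = np.array([1.0, 0.5])
C = np.array([1.0, -1.0])
x = np.array([1.0, 2.0, 3.0, 4.0])
kernel = ssm_kernel(A_bar, B_bar, C, len(x))
y_conv = ssm_convolution(kernel, x)
y_rec = ssm_recurrent(A_bar, B_bar, C, x)
```

Both paths produce the same outputs. The convolution needs the whole sequence up front, which suits training, while the recurrence processes one token at a time, which suits inference.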
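These three representations, continuous, recurrent, and convolutional, each have different advantages and disadvantages.
Interestingly, the recurrent SSM gives efficient inference and the convolutional SSM gives parallelizable training.
With these representations there is a neat trick: choose a representation depending on the task. During training, use the convolutional representation, which can be parallelized; during inference, use the efficient recurrent representation.
This model is referred to as the Linear State-Space Layer (LSSL).2
These representations share an important property, Linear Time Invariance (LTI). LTI states that the SSM's parameters A, B, and C are fixed for all timesteps. That is, matrices A, B, and C are the same for every token the SSM generates.
In other words, regardless of the sequence given to the SSM, the values of A, B, and C remain the same. The representation is static and not content-aware.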
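Before exploring how Mamba addresses this issue, there is one final piece of the puzzle: matrix A.
Matrix A is arguably one of the most important aspects of the SSM formulation. As seen with the recurrent representation, it captures information about the previous state to build the new state. In essence, matrix A produces the hidden state.
How matrix A is created can therefore be the difference between remembering only a few previous tokens and capturing every token seen so far. This matters especially for the recurrent representation, since it only looks back at the previous state.
So how can matrix A be created in a way that retains a large memory (context size)?
With Hungry Hungry Hippo! Or rather HiPPO (High-order Polynomial Projection Operators).3 HiPPO attempts to compress all input signals seen so far into a vector of coefficients.
It uses matrix A to build a state representation that captures recent tokens well and decays older tokens. For a square matrix A, the entry in row n and column k is (2n+1)^(1/2) (2k+1)^(1/2) below the diagonal, n+1 on the diagonal, and 0 above the diagonal.
Building matrix A with HiPPO was shown to work much better than initializing it as a random matrix. As a result, it reconstructs newer signals (recent tokens) more accurately than older signals (initial tokens).
The idea behind the HiPPO matrix is that it produces a hidden state that memorizes its history.
Mathematically, it does so by tracking the coefficients of a Legendre polynomial, which allows it to approximate all of the previous history.4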
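A small sketch of building such a HiPPO matrix, with sqrt(2n+1)·sqrt(2k+1) below the diagonal, n+1 on the diagonal, and 0 above it. The function name `hippo_matrix` is hypothetical. Sign conventions differ between write-ups, and S4-style implementations typically use the negated matrix as the state matrix, so treat the sign here as an assumption.

```python
import numpy as np

def hippo_matrix(N):
    """N x N HiPPO matrix: sqrt(2n+1)*sqrt(2k+1) for n > k, n+1 for n == k, 0 for n < k."""
    n = np.arange(N)
    A = np.sqrt(2 * n[:, None] + 1) * np.sqrt(2 * n[None, :] + 1)
    A = np.tril(A, k=-1)       # keep entries strictly below the diagonal, zero elsewhere
    A += np.diag(n + 1.0)      # n + 1 on the diagonal
    return A

A = hippo_matrix(4)
```

The lower-triangular structure means each coefficient is driven only by itself and lower-order coefficients.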
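HiPPO was then applied to the recurrent and convolution representations seen before in order to handle long-range dependencies. The result was Structured State Space for Sequences (S4), a class of SSMs that can efficiently handle long sequences.5
It consists of three parts:
• State Space Models
• HiPPO for handling long-range dependencies
• Discretization for creating recurrent and convolution representations
This class of SSMs has several benefits depending on the representation chosen (recurrent vs. convolution). It can also handle long sequences of text and store memory efficiently by building upon the HiPPO matrix.
NOTE: For more of the technical details on how to calculate the HiPPO matrix and build an S4 model yourself, the Annotated S4 is highly recommended.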
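Part 3: Mamba - A Selective SSM
We have now covered the fundamentals needed to understand what makes Mamba special. State Space Models can be used to model textual sequences but still have a set of disadvantages we want to avoid.
This part goes through Mamba's two main contributions:
1. A selective scan algorithm, which allows the model to filter (ir)relevant information
2. A hardware-aware algorithm that allows for efficient storage of (intermediate) results through parallel scan, kernel fusion, and recomputation
Together they create the selective SSM or S6 models, which can be used, like self-attention, to create Mamba blocks.
Before exploring the two main contributions, it helps to see why they are necessary.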
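What Problem Does It Attempt to Solve?
State Space Models, and even S4 (Structured State Space Model), perform poorly on certain tasks that are vital in language modeling and generation, namely the ability to focus on or ignore particular inputs.
Two synthetic tasks illustrate this: Selective Copying and induction heads.
In the selective copying task, the goal of the SSM is to copy parts of the input and output them in order.
However, a (recurrent/convolutional) SSM performs poorly on this task because it is Linear Time Invariant. As seen before, matrices A, B, and C are the same for every token the SSM generates.
As a result, an SSM cannot perform content-aware reasoning, since the fixed A, B, and C matrices make it treat every token equally. This is a problem because we want the SSM to reason about the input (prompt).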
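The second task an SSM performs poorly on is induction heads, where the goal is to reproduce patterns found in the input.
For example, one-shot prompting attempts to "teach" the model to provide an "A:" response after every "Q:". However, since SSMs are time-invariant, they cannot select which previous tokens to recall from their history.
Matrix B illustrates this. Regardless of what the input x is, matrix B remains exactly the same and is therefore independent of x.
Likewise, A and C remain fixed regardless of the input. This is the static nature of the SSMs seen so far.
In comparison, these tasks are relatively easy for Transformers, since they dynamically change their attention based on the input sequence. They can selectively "look" or "attend" at different parts of the sequence.
The poor performance of SSMs on these tasks illustrates the underlying problem with time-invariant SSMs: the static nature of matrices A, B, and C causes problems with content-awareness.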
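The recurrent representation of an SSM creates a small state that is quite efficient, as it compresses the entire history. However, compared to a Transformer, which does no compression of the history (through the attention matrix), it is much less powerful.
Mamba aims for the best of both worlds: a small state that is as powerful as the state of a Transformer.
It does so by compressing data selectively into the state. An input sentence often contains information, like stop words, that does not carry much meaning.
To compress information selectively, the parameters need to depend on the input. To see how, first consider the dimensions of the input and output of an SSM during training.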
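In a Structured State Space Model (S4), matrices A, B, and C are independent of the input, since their dimensions N and D are static and do not change.
Mamba instead makes matrices B and C, and even the step size Δ, dependent on the input by incorporating the sequence length and batch size of the input.
This means that for every input token there are now different B and C matrices, which solves the problem with content-awareness!
NOTE: Matrix A remains the same, since we want the state itself to remain static but the way it is influenced (through B and C) to be dynamic.
Together, they selectively choose what to keep in the hidden state and what to ignore, since they now depend on the input.
A smaller step size Δ results in ignoring specific words and relying more on the previous context, whilst a larger step size Δ focuses more on the input words than on the context.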
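The input dependence of B, C, and Δ can be sketched as below. This is a simplified, unbatched illustration and not the reference Mamba implementation. The projection matrices `W_B`, `W_C`, `W_delta`, the softplus on Δ, the diagonal per-channel A of shape (D, N), and the simplified Δ·B discretization of B are all assumptions made for the sketch.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)   # log(1 + exp(z)); keeps the step size positive

def selective_ssm(x, A, W_B, W_C, W_delta):
    """x: (L, D) tokens; A: (D, N) input-independent diagonal state matrix per channel."""
    L, D = x.shape
    N = A.shape[1]
    B = x @ W_B                    # (L, N): a different B for every token
    C = x @ W_C                    # (L, N): a different C for every token
    delta = softplus(x @ W_delta)  # (L, D): a different step size per token and channel
    h = np.zeros((D, N))
    y = np.zeros((L, D))
    for k in range(L):
        A_bar = np.exp(delta[k][:, None] * A)       # (D, N): zero-order hold for diagonal A
        B_bar = delta[k][:, None] * B[k][None, :]   # (D, N): simplified discretization of B
        h = A_bar * h + B_bar * x[k][:, None]       # state update with token-specific parameters
        y[k] = h @ C[k]                             # token-specific readout
    return y

rng = np.random.default_rng(0)
L, D, N = 5, 4, 3
x = rng.normal(size=(L, D))
A = -np.exp(rng.normal(size=(D, N)))   # negative entries so the state decays
W_B, W_C = rng.normal(size=(D, N)), rng.normal(size=(D, N))
W_delta = rng.normal(size=(D, D))
y = selective_ssm(x, A, W_B, W_C, W_delta)
```

With a small Δ, `A_bar` is close to 1 and `B_bar` close to 0, so the state mostly carries over the previous context. With a large Δ, the state decays faster and the current token contributes more.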
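Since these matrices are now dynamic, they cannot be calculated with the convolution representation, which assumes a fixed kernel. Only the recurrent representation can be used, and the parallelization that convolution provides is lost.
To enable parallelization, consider how the output is computed with recurrency: each state is the sum of the previous state (multiplied by A) plus the current input (multiplied by B). This is called a scan operation and can easily be calculated with a for loop.
Parallelization, in contrast, seems impossible, since each state can only be calculated once the previous state is available. Mamba, however, makes it possible through the parallel scan algorithm.
It relies on the associative property: the order in which the operations are grouped does not matter. As a result, the sequence can be calculated in parts that are iteratively combined.
Together, the dynamic matrices B and C and the parallel scan algorithm form the selective scan algorithm.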
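The associativity argument can be made concrete with a scalar sketch. Each step h -> a·h + b is an affine map, and composing two such maps gives another affine map, so partial results can be combined in any grouping. The version below is a Hillis-Steele style scan chosen for brevity. It is not the kernel Mamba uses, and the function names are illustrative.

```python
import numpy as np

def sequential_scan(a, b):
    """h_k = a_k * h_{k-1} + b_k with h_{-1} = 0, computed with a plain for loop."""
    h = np.zeros_like(b)
    prev = 0.0
    for k in range(len(a)):
        prev = a[k] * prev + b[k]
        h[k] = prev
    return h

def combine(left, right):
    """Compose two steps h -> a*h + b, applying `left` first and `right` second."""
    a_l, b_l = left
    a_r, b_r = right
    return a_l * a_r, a_r * b_l + b_r

def parallel_scan(a, b):
    """Inclusive scan in about log2(n) vectorized rounds."""
    a, b = a.copy(), b.copy()
    shift = 1
    while shift < len(a):
        # Every position k >= shift absorbs the partial result ending at k - shift.
        new_a, new_b = combine((a[:-shift], b[:-shift]), (a[shift:], b[shift:]))
        a = np.concatenate([a[:shift], new_a])
        b = np.concatenate([b[:shift], new_b])
        shift *= 2
    return b

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, size=7)
b = rng.normal(size=7)
h_seq = sequential_scan(a, b)
h_par = parallel_scan(a, b)
```

All positions are updated at once within each round, so the number of dependent steps drops from n to roughly log2(n), at the price of more total arithmetic.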
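A disadvantage of recent GPUs is the limited transfer (IO) speed between their small but highly efficient SRAM and their large but slightly less efficient DRAM. Frequently copying information between SRAM and DRAM becomes a bottleneck.
Mamba, like Flash Attention, tries to limit the number of times data moves from DRAM to SRAM and vice versa. It does so through kernel fusion, which lets the model avoid writing intermediate results and keep computing until it is done.
In Mamba's base architecture, the following steps are fused into one kernel:
• Discretization step with step size Δ
• Selective scan algorithm
• Multiplication with C
The last piece of the hardware-aware algorithm is recomputation.
The intermediate states are not saved, even though the backward pass needs them to compute the gradients. Instead, the authors recompute those intermediate states during the backward pass.
Although this might seem inefficient, it is much less costly than reading all those intermediate states from the relatively slow DRAM.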
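Figure: The selective SSM. Retrieved from: Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." arXiv preprint arXiv:2312.00752 (2023).
This architecture is often referred to as a selective SSM or S6 model, since it is essentially an S4 model computed with the selective scan algorithm.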
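The selective SSM explored so far can be implemented as a block, in the same way self-attention is represented in a decoder block.
Like decoder blocks, multiple Mamba blocks can be stacked, with the output of one block used as the input of the next.
A block starts with a linear projection that expands the input embeddings. A convolution is then applied before the selective SSM to prevent independent token calculations.
The Selective SSM has the following properties:
• Recurrent SSM created through discretization
• HiPPO initialization on matrix A to capture long-range dependencies
• Selective scan algorithm to selectively compress information
• Hardware-aware algorithm to speed up computation
An end-to-end example based on the code implementation adds a few details, such as normalization layers and a softmax for choosing the output token.
Putting everything together gives both fast inference and fast training, and even unbounded context! Using this architecture, the authors found that it matches and sometimes even exceeds the performance of Transformer models of the same size!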
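This concludes the journey through State Space Models and the Mamba architecture built on a selective State Space Model. Hopefully, this post gives a better understanding of the potential of State Space Models, particularly Mamba.
Additional Resources
For a deeper dive, the following resources are suggested:
• The Annotated S4, a JAX implementation of and guide through the S4 model, highly recommended!
• A YouTube video that introduces Mamba by building it up through foundational papers.
• The Mamba repository with checkpoints on Hugging Face (Hugging Face Repo).
• A series of blog posts (1, 2, 3) that introduces the S4 model.
• The Mamba No. 5 (A Little Bit Of...) blog post, a good next step into more technical details about Mamba from an intuitive perspective.
• And of course, the Mamba paper! It has even been used for DNA modeling.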
1 Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." arXiv preprint arXiv:2312.00752 (2023).
2 Gu, Albert, et al. "Combining recurrent, convolutional, and continuous-time models with linear state space layers." Advances in neural information processing systems 34 (2021): 572-585.