ah... the problem between imageftbbox() and imagefttext() lies in the mirroring of the y-axes.
Below you see, for a font-size 16 the boudingboxes of "b", "p" and "bp":
< b: w=9 h=15
b(0,-1)
b(9,-1)
b(9,-16)
b(0,-16)
< p: w=9 h=16
p(0,4)
p(9,4)
p(9,-12)
p(0,-12)
< bp: w=20 h=20
bp(0,4)
bp(20,4)
bp(20,-16)
bp(0,-16)
If drawing "bp" using imagefttext() at y=0, the the top of "bp" indeed is at y=-16, and the bottom of "bp" at y=4. (Plus or minus a pixel here and there, because at y=0 there actually is a vissible pixel.)