Description
In most of the cases I observed, the NEON-to-RVV mapping is one-to-many: a single NEON intrinsic expands into several RVV intrinsics. For example:
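```c
FORCE_INLINE float32x2_t vmulx_f32(float32x2_t a, float32x2_t b) {
  // Lanes where a (resp. b) is not NaN.
  vbool32_t a_non_nan_mask = __riscv_vmfeq_vv_f32m1_b32(a, a, 2);
  vbool32_t b_non_nan_mask = __riscv_vmfeq_vv_f32m1_b32(b, b, 2);
  // Lanes where at least one input is NaN; the NaN product must be kept there.
  vbool32_t ab_nan_mask = __riscv_vmnand_mm_b32(a_non_nan_mask, b_non_nan_mask, 2);
  vfloat32m1_t mul = __riscv_vfmul_vv_f32m1(a, b, 2);
  // A NaN product from non-NaN inputs can only come from 0 * inf.
  vbool32_t non_nan_mask = __riscv_vmfeq_vv_f32m1_b32(mul, mul, 2);
  vbool32_t non_two_mask = __riscv_vmor_mm_b32(non_nan_mask, ab_nan_mask, 2);
  // FMULX returns 2.0 with sign(a) XOR sign(b) for 0 * inf.
  vfloat32m1_t all_twos = __riscv_vfmv_v_f_f32m1(2.0f, 2);
  all_twos = __riscv_vfsgnjx_vv_f32m1(__riscv_vfsgnjx_vv_f32m1(all_twos, a, 2), b, 2);
  // Keep mul where non_two_mask is set, otherwise take the signed 2.0.
  return __riscv_vmerge_vvm_f32m1(all_twos, mul, non_two_mask, 2);
}
```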
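Before shortening a sequence like this, it can help to have a scalar per-lane model of what `vmulx_f32` is supposed to compute, so a rewritten RVV version can be checked against it. The sketch below assumes the Arm FMULX definition: the result is `a * b`, except that `0 * inf` (either order) gives 2.0 with sign `sign(a) XOR sign(b)`. The function name `mulx_ref` is hypothetical and not part of any library.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical scalar reference for one lane of vmulx_f32.
   Assumed FMULX semantics: a * b, except 0 * inf or inf * 0, which
   returns 2.0 with sign(a) XOR sign(b). NaN inputs propagate via a * b. */
static float mulx_ref(float a, float b) {
  bool zero_times_inf = (a == 0.0f && isinf(b)) || (isinf(a) && b == 0.0f);
  if (zero_times_inf) {
    /* XOR of the two sign bits decides the sign of the 2.0 result. */
    bool negative = (signbit(a) != 0) != (signbit(b) != 0);
    return negative ? -2.0f : 2.0f;
  }
  return a * b;
}
```

Running a handful of special values (signed zeros, infinities, NaN) through both this model and the vector code is a cheap way to catch mask mistakes.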
Because of this, the ELF size may increase and performance may drop. Can we avoid this, or what would be the best alternative to minimize the number of intrinsic calls?
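One possible way to trim the selection logic, offered as a sketch rather than a drop-in replacement: compute the "0 * inf" mask directly as `(a_ok AND b_ok) AND NOT mul_ok` with `vmand` and `vmandn`, then write 2.0 into those lanes with a single scalar-operand merge (`vfmerge_vfm`) instead of broadcasting a vector of 2.0 with `vfmv` and merging it with `vmerge`. The code below is a lane-wise C model of that sequence, not RVV code. Each comment names the RVV v1.0 intrinsic the step is assumed to correspond to, and `mulx_lane_short` is a hypothetical name. For brevity it writes +2.0 in `0 * inf` lanes and leaves out sign handling.

```c
#include <math.h>
#include <stdbool.h>

/* Lane-wise model of a shorter selection sequence for vmulx_f32.
   Comments name the RVV v1.0 intrinsic each step is assumed to map to.
   Sign handling of the 2.0 result is deliberately omitted here. */
static float mulx_lane_short(float a, float b) {
  float mul = a * b;            /* __riscv_vfmul_vv_f32m1(a, b, vl) */
  bool a_ok = (a == a);         /* true unless a is NaN: __riscv_vmfeq_vv_f32m1_b32(a, a, vl) */
  bool b_ok = (b == b);         /* true unless b is NaN: __riscv_vmfeq_vv_f32m1_b32(b, b, vl) */
  bool ab_ok = a_ok && b_ok;    /* __riscv_vmand_mm_b32(a_ok, b_ok, vl) */
  bool mul_ok = (mul == mul);   /* __riscv_vmfeq_vv_f32m1_b32(mul, mul, vl) */
  bool two = ab_ok && !mul_ok;  /* __riscv_vmandn_mm_b32(ab_ok, mul_ok, vl) */
  return two ? 2.0f : mul;      /* __riscv_vfmerge_vfm_f32m1(mul, 2.0f, two, vl) */
}
```

The design choice is to select the rare `0 * inf` lanes positively rather than building the complementary "keep mul" mask, which folds the negation into `vmandn` and lets the constant be passed as a scalar. Whether this actually reduces code size or cycle count on a given core would need to be measured.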