✅ [Research report replication] Kaiyuan Securities (开源证券): Smart Money Factor Model, Version 2.0
- **Improvements to the smart money factor**
- 1. $S_1 = V$
- 2. $S_2 = rank(|R|) + rank(|V|)$ (performs acceptably ✅)
- 3. $S_3 = \frac{|R|}{\ln(V)}$
- 4. $S = |R| \times \frac{taker\_buy\_volume}{volume}$
- 5. $S = \frac{|R|}{\sqrt{trade\_count}}$ (performs acceptably ✅)
- 6. $S = |R| \times e^{-\frac{\beta}{V}}$
- Modifying the factor to address anomalous returns in the tail (especially in recent data)
- **Discussion**: differences between cutoff values
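0. Original smart money factor ($S_t = \frac{|R_t|}{\sqrt{V_t}}$)
- The core idea is to use the price and volume information in minute-bar data to gauge how heavily institutions are participating in trading, and from that to build a stock-selection factor that tracks smart money.
- A large body of empirical research indicates that smart money tends to trade with "larger individual orders and more aggressive quotes".
- On this basis, an indicator S is constructed to measure how "smart" the trading in a given minute is. The calculation steps are:
  - For the selected stock, take the minute-bar data of the past 10 trading days.
  - Construct the indicator $S_t = \frac{|R_t|}{\sqrt{V_t}}$, where $R_t$ is the return in minute $t$ and $V_t$ is the volume in minute $t$. The larger $S_t$ is, the more smart money is assumed to have taken part in that minute's trading.
  - Sort the minute bars by $S_t$ in descending order and take the minutes that make up the first 20% of cumulative volume. These are treated as smart money trades.
  - Compute the volume-weighted average price of the smart money trades, $\text{VWAP}_{\text{smart}}$.
  - Compute the volume-weighted average price of all trades, $\text{VWAP}_{\text{all}}$.
  - The smart money factor is $Q = \frac{\text{VWAP}_{\text{smart}}}{\text{VWAP}_{\text{all}}}$.
- The factor Q reflects the relative price level at which smart money traded during the period.
- It is called a smart money sentiment factor because:
  - The larger Q is, the more the smart money trades cluster at higher prices. This is selling into strength and reflects a pessimistic view on the part of smart money.
  - The smaller Q is, the more the smart money trades cluster at lower prices. This is accumulating on dips and reflects optimistic sentiment.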
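To make the arithmetic of the steps above concrete, here is a minimal single-window sketch on five made-up bars. The helper name `smart_money_q` and all the numbers are hypothetical and only illustrate the definition of Q; they are not taken from the report.

```python
import numpy as np

def smart_money_q(price, volume, s, cutoff=0.2):
    """Q = VWAP_smart / VWAP_all for one window (hypothetical helper)."""
    price, volume, s = (np.asarray(a, dtype=float) for a in (price, volume, s))
    order = np.argsort(-s)                               # largest S first
    cum_share = np.cumsum(volume[order]) / volume.sum()  # cumulative volume share
    smart = order[cum_share <= cutoff]                   # bars inside the top `cutoff` share
    if smart.size == 0:
        return np.nan
    vwap_smart = np.sum(price[smart] * volume[smart]) / np.sum(volume[smart])
    vwap_all = np.sum(price * volume) / np.sum(volume)
    return vwap_smart / vwap_all

# Made-up window of five bars
price = np.array([10.0, 10.2, 9.8, 10.1, 10.0])
volume = np.array([100.0, 90.0, 110.0, 100.0, 100.0])
ret = np.array([0.001, 0.020, -0.002, 0.003, 0.000])

s = np.abs(ret) / np.sqrt(volume)   # S_t = |R_t| / sqrt(V_t)
q = smart_money_q(price, volume, s)
print(round(q, 4))
```

In this toy window only the second bar (18% of total volume) falls inside the 20% cutoff, so $\text{VWAP}_{\text{smart}} = 10.2$ against $\text{VWAP}_{\text{all}} = 10.012$, and Q comes out at about 1.019. A value above 1 means the bars flagged as smart money traded above the average price.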
Code implementation
import numpy as np
import pandas as pd

def factor(df, window_size=960, threshold_value=0.2):
    """
    Original smart money factor (S = |R| / sqrt(V)).

    Steps:
    1. Compute each bar's average price and absolute return.
    2. Look back over the past 10 days (960 15-minute bars).
    3. Compute S = |return| / sqrt(volume) for each bar.
    4. Sort bars by S in descending order and keep those within the
       first 20% of cumulative volume.
    5. Compute the smart money VWAP and the overall VWAP.
    6. Factor value = smart money VWAP / overall VWAP.

    Optimisations:
    1. Precompute every required column once (no repeated work).
    2. Use vectorised operations within each rolling window.
    3. Use numpy rather than pandas inside the window.
    4. Avoid creating temporary Series inside the loop.
    """
    # Precompute all required values
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    avg_price_vals = avg_price.values
    abs_ret = abs((df['close'] - df['open']) / df['open'])
    volume = df['volume'].values
    # Output array, NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    # Precompute S = |R| / sqrt(V); bars with zero volume give inf or NaN here
    S_values = abs_ret.values / np.sqrt(volume)
    # Rolling-window loop
    for i in range(window_size, len(df)):
        # Window bounds. The slice excludes end_idx, so the window holds
        # window_size - 1 bars, from i - window_size to i - 2.
        start_idx = i - window_size
        end_idx = i - 1
        # Extract the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S_values[start_idx:end_idx]
        # Indices that sort S in descending order
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]
        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * threshold_value
        # Smart money bars: the first 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan
        # Overall VWAP
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)
        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap
    return pd.Series(factor_values, index=df.index)
Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
IC (Pearson): 0.015930
Rank_IC (Spearman): 0.018679
📊 Information ratio:
IR: 0.132837
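The evaluation code behind these numbers is not shown in this post. As a rough guide to what IC and Rank_IC usually mean, the sketch below computes the Pearson and Spearman correlation between factor values and the forward return over a fixed horizon. The function names, the horizon and the toy data are all assumptions, and the IR definition used above is not reproduced because the post does not state it.

```python
import numpy as np

def forward_return(close, horizon):
    """Return from bar t to bar t + horizon; the last `horizon` bars are NaN."""
    close = np.asarray(close, dtype=float)
    fwd = np.full(close.shape, np.nan)
    fwd[:-horizon] = close[horizon:] / close[:-horizon] - 1.0
    return fwd

def ic_and_rank_ic(factor, fwd_ret):
    """Pearson IC and Spearman Rank IC over bars where both inputs are valid."""
    factor = np.asarray(factor, dtype=float)
    fwd_ret = np.asarray(fwd_ret, dtype=float)
    ok = ~(np.isnan(factor) | np.isnan(fwd_ret))
    f, r = factor[ok], fwd_ret[ok]
    ic = np.corrcoef(f, r)[0, 1]
    # Spearman = Pearson on ranks (ties get arbitrary distinct ranks here)
    rank = lambda x: np.argsort(np.argsort(x))
    rank_ic = np.corrcoef(rank(f), rank(r))[0, 1]
    return ic, rank_ic

# Toy data: the factor is perfectly ordered with the next-bar return
close = np.array([100.0, 101.0, 103.0, 106.0, 110.0, 115.0])
toy_factor = np.arange(6.0)
ic, rank_ic = ic_and_rank_ic(toy_factor, forward_return(close, horizon=1))
```

On this toy input the Rank IC is 1 because the factor ordering matches the forward-return ordering exactly. Values of the size reported above (around 0.01 to 0.02) indicate a much weaker relationship.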
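Improvements to the smart money factor
- Generalisation: turn the exponent on the minute volume V into a tunable parameter
  - $S = \frac{|R|}{V^\beta}$
  - Here R is the minute return, V is the minute volume, and $\beta$ is the exponent parameter applied to V.
- Different ways of constructing the S indicator
  - Different underlying logic leads to different constructions of S. Here we try 3 different constructions of the S indicator.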
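The generalised form $S = |R| / V^{\beta}$ can be sketched as a small helper. The name `smart_score` and the choice to return NaN for zero-volume bars are assumptions made for illustration. Setting $\beta = 0.5$ recovers the original $|R| / \sqrt{V}$, and $\beta = 0$ ranks bars by $|R|$ alone.

```python
import numpy as np

def smart_score(abs_ret, volume, beta=0.5):
    """S = |R| / V**beta. Zero-volume bars are set to NaN (an assumption)."""
    abs_ret = np.asarray(abs_ret, dtype=float)
    volume = np.asarray(volume, dtype=float)
    out = np.full(abs_ret.shape, np.nan)
    ok = volume > 0
    out[ok] = abs_ret[ok] / volume[ok] ** beta
    return out

abs_ret = np.array([0.01, 0.02, 0.005])
volume = np.array([400.0, 100.0, 0.0])

s_half = smart_score(abs_ret, volume, beta=0.5)  # original factor: |R| / sqrt(V)
s_zero = smart_score(abs_ret, volume, beta=0.0)  # volume ignored: S = |R|
```

Note that NaN scores sort to the end under `np.argsort(-s)`, so such bars are the last candidates for the smart money set.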
1. $S_1 = V$
- Meaning: minute volume
- Considers volume alone: minutes with larger volume are classified as smart money trades
Code implementation
import numpy as np
import pandas as pd

def factor(df):
    """
    Optimised calculate_smart_money_S1.
    Uses volume itself as the sorting key.
    """
    # Precompute all required values
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    volume = df['volume'].values
    avg_price_vals = avg_price.values
    # Output array, NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    window_size = 960
    for i in range(window_size, len(df)):
        # Window bounds. The slice excludes end_idx, so the window holds
        # window_size - 1 bars, from i - window_size to i - 2.
        start_idx = i - window_size
        end_idx = i - 1
        # Extract the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        # Use volume directly as S (this is the definition of S1)
        S = window_volume
        # Indices that sort S in descending order
        sorted_idx = np.argsort(-S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]
        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * 0.2
        # Smart money bars: the first 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan
        # Overall VWAP
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)
        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap
    return pd.Series(factor_values, index=df.index)
Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
IC (Pearson): 0.013049
Rank_IC (Spearman): 0.017434
📊 Information ratio:
IR: 0.012048
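2. $S_2 = rank(|R|) + rank(|V|)$ (performs acceptably ✅)
- Meaning: the sum of the rank of the minute's absolute return and the rank of the minute's volume
- Considers the volume rank and the absolute-return rank together: minutes with the highest rank sums are classified as smart money trades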
Code implementation
import numpy as np
import pandas as pd

def factor(df):
    """
    Optimised calculate_smart_money_S2.
    Uses the sum of the return rank and the volume rank as the sorting key.
    """
    # Precompute all required values
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    abs_ret = abs((df['close'] - df['open']) / df['open']).values
    volume = df['volume'].values
    avg_price_vals = avg_price.values
    # Output array, NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    window_size = 960
    for i in range(window_size, len(df)):
        # Window bounds. The slice excludes end_idx, so the window holds
        # window_size - 1 bars, from i - window_size to i - 2.
        start_idx = i - window_size
        end_idx = i - 1
        # Extract the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_abs_ret = abs_ret[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        # Ranks via a double argsort (numpy stand-in for pandas rank);
        # tied values receive distinct ranks rather than a shared one
        rank_ret = np.argsort(np.argsort(window_abs_ret)) + 1
        rank_volume = np.argsort(np.argsort(window_volume)) + 1
        S = rank_ret + rank_volume
        # Indices that sort S in descending order
        sorted_idx = np.argsort(-S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]
        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * 0.2
        # Smart money bars: the first 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan
        # Overall VWAP
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)
        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap
    return pd.Series(factor_values, index=df.index)
Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
IC (Pearson): 0.013273
Rank_IC (Spearman): 0.018782
📊 Information ratio:
IR: -0.218254
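One detail of the S2 code is worth a note. It ranks with `np.argsort(np.argsort(x)) + 1` in place of pandas `rank`, and the two only agree when there are no ties: the double argsort gives tied values different ranks, whereas pandas `rank` by default gives them their shared average rank. If ties matter (for example many bars with identical volume), an average-rank version can be sketched in plain numpy as below. The helper name `average_rank` is hypothetical.

```python
import numpy as np

def average_rank(x):
    """1-based ranks where tied values share the average of their ranks."""
    x = np.asarray(x)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)    # highest ordinal rank inside each tie group
    first = last - counts + 1   # lowest ordinal rank inside each tie group
    return ((first + last) / 2.0)[inverse]

values = np.array([3.0, 1.0, 3.0, 2.0])
ranks = average_rank(values)                    # the two 3.0 entries share rank 3.5
ordinal = np.argsort(np.argsort(values)) + 1    # the two 3.0 entries get ranks 3 and 4
```

Whether this changes the factor in practice depends on how often ties occur in the data, which this post does not report.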
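3. $S_3 = \frac{|R|}{\ln(V)}$
- Meaning: the minute's absolute return divided by the logarithm of the minute's volume
- A variant of the original S indicator that builds the smart money factor from a log transform of minute volume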
Code implementation
import numpy as np
import pandas as pd

def factor(df):
    """
    Optimised calculate_smart_money_S3.
    Uses |R| / ln(V) as the sorting key.
    """
    # Precompute all required values
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    abs_ret = abs((df['close'] - df['open']) / df['open']).values
    volume = df['volume'].values
    avg_price_vals = avg_price.values
    # Output array, NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    window_size = 960
    # Small offset to avoid log(0). Note that the log is negative for
    # volume below 1 and close to zero for volume near 1, so S changes
    # sign or becomes very large for such bars.
    log_volume = np.log(volume + 1e-5)
    for i in range(window_size, len(df)):
        # Window bounds. The slice excludes end_idx, so the window holds
        # window_size - 1 bars, from i - window_size to i - 2.
        start_idx = i - window_size
        end_idx = i - 1
        # Extract the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_abs_ret = abs_ret[start_idx:end_idx]
        window_log_volume = log_volume[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        # Compute S
        S = window_abs_ret / window_log_volume
        # Indices that sort S in descending order
        sorted_idx = np.argsort(-S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]
        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * 0.2
        # Smart money bars: the first 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan
        # Overall VWAP
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)
        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap
    return pd.Series(factor_values, index=df.index)
Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
IC (Pearson): 0.015834
Rank_IC (Spearman): 0.018512
📊 Information ratio:
IR: 0.036820
4. $S = |R| \times \frac{taker\_buy\_volume}{volume}$
Code implementation
import numpy as np
import pandas as pd

def factor(df, window_size=960, threshold_value=0.2):
    """
    S = |R| × (taker_buy_volume / volume)  (taker-buy ratio)
    """
    # Precompute all required values
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    avg_price_vals = avg_price.values
    abs_ret = abs((df['close'] - df['open']) / df['open'])
    volume = df['volume'].values
    # S = |R| × (taker_buy_volume / volume)
    # Taker-buy ratio; the small offset avoids division by zero
    buy_ratio = df['taker_buy_volume'] / (df['volume'] + 1e-9)
    S_values = abs_ret.values * buy_ratio.values
    # Output array, NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    # Rolling-window loop
    for i in range(window_size, len(df)):
        # Window bounds. The slice excludes end_idx, so the window holds
        # window_size - 1 bars, from i - window_size to i - 2.
        start_idx = i - window_size
        end_idx = i - 1
        # Extract the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S_values[start_idx:end_idx]
        # Indices that sort S in descending order
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]
        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * threshold_value
        # Smart money bars: the first 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan
        # Overall VWAP
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)
        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap
    return pd.Series(factor_values, index=df.index)
Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
IC (Pearson): 0.014563
Rank_IC (Spearman): 0.020014
📊 Information ratio:
IR: -0.271804
5. $S = \frac{|R|}{\sqrt{trade\_count}}$ (performs acceptably ✅)
Code implementation
import numpy as np
import pandas as pd

def factor(df, window_size=960, threshold_value=0.2):
    """
    S = |R| / √(trade_count)  (adjusted by number of trades)
    """
    # Precompute all required values
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    avg_price_vals = avg_price.values
    abs_ret = abs((df['close'] - df['open']) / df['open'])
    volume = df['volume'].values
    sqrt_trade = np.sqrt(df['trade_count'])
    S_values = abs_ret.values / (sqrt_trade.values + 1e-9)  # avoid division by zero
    # Output array, NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    # Rolling-window loop
    for i in range(window_size, len(df)):
        # Window bounds. The slice excludes end_idx, so the window holds
        # window_size - 1 bars, from i - window_size to i - 2.
        start_idx = i - window_size
        end_idx = i - 1
        # Extract the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S_values[start_idx:end_idx]
        # Indices that sort S in descending order
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]
        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * threshold_value
        # Smart money bars: the first 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan
        # Overall VWAP
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)
        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap
    return pd.Series(factor_values, index=df.index)
Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
IC (Pearson): 0.016723
Rank_IC (Spearman): 0.019661
📊 Information ratio:
IR: 0.377188
6. $S = |R| \times e^{-\frac{\beta}{V}}$
Code implementation
import numpy as np
import pandas as pd

def factor(df, window_size=960, threshold_value=0.2):
    """
    S = |R| × e^(-β/V)  (exponential damping function)
    """
    # Precompute all required values
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    avg_price_vals = avg_price.values
    abs_ret = abs((df['close'] - df['open']) / df['open'])
    volume = df['volume'].values
    # S = |R| × e^(-β/V)
    # β setting (suggested range 0.5 to 2)
    beta = 2
    # Exponential damping (more sensitive to small volumes);
    # the small offset avoids division by zero
    exp_factor = np.exp(-beta / (volume + 1e-9))
    S_values = abs_ret.values * exp_factor
    # Output array, NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    # Rolling-window loop
    for i in range(window_size, len(df)):
        # Window bounds. The slice excludes end_idx, so the window holds
        # window_size - 1 bars, from i - window_size to i - 2.
        start_idx = i - window_size
        end_idx = i - 1
        # Extract the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S_values[start_idx:end_idx]
        # Indices that sort S in descending order
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]
        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * threshold_value
        # Smart money bars: the first 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan
        # Overall VWAP
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)
        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap
    return pd.Series(factor_values, index=df.index)
Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
IC (Pearson): 0.015751
Rank_IC (Spearman): 0.019500
📊 Information ratio:
IR: -0.316391
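Modifying the factor to address anomalous returns in the tail (especially in recent data)
1. When the factor value is small or large
Small and large factor values mainly occur in the following situations.
When the factor value is small (smart money trading at low prices):
- Panic selling in the market (smart money accumulating against the trend)
- Dense trading near an important support level
- The quiet positioning period before unexpected good news
- The position-building phase of large funds (especially large orders split into many buys)
- Where the recent problem lies: anomalous tail behaviour tends to appear in sustained downtrends, where the price keeps falling after smart money has "accumulated on dips"
When the factor value is large (smart money trading at high prices):
- Euphoric rallies (smart money using the chance to distribute)
- Dense trading near an important resistance level
- The selling period after good news has been priced in
- The profit-taking phase of large funds
- Behaviour usually matches expectations: a high factor value signals a high probability of a subsequent decline
2. Diagnosing the tail-data problem
The main reasons the tail data (the part with small factor values) behaves anomalously:
- Misclassification in extreme markets: in a sustained downtrend, the so-called "smart money" trades may be passive absorption of selling rather than active accumulation
- Distorted price-volume relationship: during panic selling, the surge in volume distorts the calculation of $S_t$
- Time-lag effect: the 10-day lookback window contains too much stale information and cannot reflect changes in market state promptly
- Trade intent not distinguished: active buying is not separated from passive selling, so distribution by large funds is mistaken for accumulation
- Rigid threshold: a fixed 20% volume threshold does not adapt to different market conditions
3. Improvement: dynamic threshold and market-state adjustment
Based on the analysis above, I propose the following combined improvements:
Scheme 1: dynamic volume threshold (ATR-adjusted) (performs fairly well ✅)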
Code implementation
import numpy as np
import pandas as pd

def factor(df, window_size=960, base_threshold=0.2, volatility_window=96):
    """
    Improvements:
    1. Adjust the volume threshold dynamically according to market volatility.
    2. Use a stricter threshold in high-volatility markets
       (fewer minutes are selected as smart money).
    3. Add an adaptive mechanism to cope with extreme market moves.
    """
    # Market volatility: ATR smoothed over 96 bars (24 hours of 15-minute bars)
    high, low, close = df['high'].values, df['low'].values, df['close'].values
    atr = np.zeros(len(df))
    for i in range(1, len(df)):
        tr = max(high[i] - low[i],
                 abs(high[i] - close[i-1]),
                 abs(low[i] - close[i-1]))
        atr[i] = (atr[i-1] * (volatility_window-1) + tr) / volatility_window
    # Normalise volatility to the 0-1 range.
    # NOTE: min and max are taken over the full sample, so this step uses future data.
    norm_atr = (atr - np.min(atr)) / (np.max(atr) - np.min(atr) + 1e-7)
    # Dynamic threshold: lower when volatility is high (halved at maximum volatility)
    dynamic_threshold = base_threshold * (1 - norm_atr * 0.5)
    # The rest follows the original factor, with dynamic_threshold[i]
    # in place of the fixed threshold
    # Precompute all required values
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    avg_price_vals = avg_price.values
    abs_ret = abs((df['close'] - df['open']) / df['open'])
    volume = df['volume'].values
    # Output array, NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    # Precompute S = |R| / sqrt(V); bars with zero volume give inf or NaN here
    S_values = abs_ret.values / np.sqrt(volume)
    # Rolling-window loop
    for i in range(window_size, len(df)):
        # Window bounds. The slice excludes end_idx, so the window holds
        # window_size - 1 bars, from i - window_size to i - 2.
        start_idx = i - window_size
        end_idx = i - 1
        # Extract the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S_values[start_idx:end_idx]
        # Indices that sort S in descending order
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]
        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        # Use dynamic_threshold[i] instead of a fixed threshold
        threshold = total_volume * dynamic_threshold[i]
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan
        # Overall VWAP
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)
        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap
    return pd.Series(factor_values, index=df.index)
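Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
IC (Pearson): 0.011585
Rank_IC (Spearman): 0.016998
📊 Information ratio:
IR: 0.308638
Assessment
Long-horizon performance improves (it looks fairly good), but the overall short-horizon pattern is still unchanged.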
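One caveat about the Scheme 1 code: `norm_atr` is scaled with the minimum and maximum of the entire ATR series, so the threshold at bar i depends on volatility that occurs after bar i. A possible way to avoid that, sketched here as an assumption rather than as what the backtest above used, is to normalise each value against a trailing window only. The helper name `causal_minmax` and the lookback length are hypothetical.

```python
import numpy as np

def causal_minmax(x, lookback):
    """Scale x[i] into [0, 1] using only the window x[i - lookback + 1 .. i]."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for i in range(len(x)):
        w = x[max(0, i - lookback + 1): i + 1]
        lo, hi = w.min(), w.max()
        out[i] = (x[i] - lo) / (hi - lo) if hi > lo else 0.0
    return out

atr = np.array([1.0, 2.0, 3.0, 2.0, 10.0])   # made-up ATR values with a late spike
norm = causal_minmax(atr, lookback=3)
dynamic_threshold = 0.2 * (1 - 0.5 * norm)   # same mapping as in Scheme 1
```

With full-sample scaling, the spike in the last bar would compress every earlier value toward 0. With the trailing window, earlier values are unaffected by it. The reported IC and IR would need to be recomputed under this variant.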
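Scheme 2: composite price-volume confirmation factor (performs ✅✅✅!!)
Code implementation
import numpy as np
import pandas as pd

def factor(df):
    """
    Improvements:
    1. Double confirmation from the volume distribution and the price position.
    2. Adds trade-intent analysis (taker buying vs. selling).
    3. Uses a dynamic time window.
    """
    # Dynamic window: adjust the lookback according to volatility
    volatility = df['close'].pct_change().rolling(96).std()
    # Clean NaN and zero values before dividing.
    # NOTE: filling with the full-sample mean uses future data.
    volatility = volatility.fillna(volatility.mean())
    volatility = volatility.replace(0, 1e-6)   # avoid division by zero
    volatility = volatility.clip(lower=1e-6)   # enforce a minimum positive value
    # Plain arrays, so that indexing by i below is positional for any df.index
    vol_vals = volatility.values
    vol_max = vol_vals.max()                   # NOTE: full-sample maximum, also uses future data
    # Per-bar window size, as an integer in [480, 1440]
    window_size = 960 / (vol_vals * 100 + 1)
    window_size = np.nan_to_num(window_size, nan=480)  # default where NaN
    window_size = np.clip(window_size, 480, 1440).astype(int)
    # Precompute the key series
    avg_price = ((df['open'] + df['high'] + df['low'] + df['close']) / 4).values
    volume = df['volume'].values
    ret = ((df['close'] - df['open']) / df['open']).values
    abs_ret = np.abs(ret)
    buy_power = df['taker_buy_volume'].values / volume  # NaN or inf where volume is 0
    # Core indicator: price-volume confirmation score, combining the
    # direction of the price move, the taker-buy ratio and the trade size
    S = np.sign(ret) * abs_ret * buy_power * np.log1p(volume)
    # Output array, NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    # First index with enough history for any window size
    first_idx = max(np.max(window_size), 96) + 1
    for i in range(first_idx, len(df)):
        w_size = window_size[i]
        # The slice excludes end_idx, so the window holds w_size - 1 bars
        start_idx = i - w_size
        end_idx = i - 1
        # Extract the window
        window_avg_price = avg_price[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S[start_idx:end_idx]
        # Sort by S in descending order (smart money trades first)
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]
        # Dynamic threshold based on volatility (lower when volatility is high)
        threshold_ratio = 0.3 - 0.15 * (vol_vals[i] / vol_max)
        threshold = np.sum(window_volume) * threshold_ratio
        # Identify the smart money trades
        cum_volume = np.cumsum(sorted_volume)
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            # Smart money VWAP
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
            # Overall VWAP
            all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)
            # Market-state adjustment: window VWAP vs. mean price of the last 96 bars
            if all_vwap > window_avg_price[-96:].mean():  # labelled "recent rise" in the original
                factor_values[i] = smart_vwap / all_vwap
            else:  # labelled "recent fall" in the original
                factor_values[i] = 2 - (smart_vwap / all_vwap)  # flip the factor direction
    return pd.Series(-factor_values, index=df.index)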
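Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
IC (Pearson): 0.026600
Rank_IC (Spearman): 0.021094
📊 Information ratio:
IR: 0.461683
Assessment
The short-horizon data finally looks good!! The overall long-horizon trend is decent too~
(I also noticed that the longer the forward-return horizon, the better the overall results look...
Increasing the horizon from 5 all the way to 40:
- IC, IR and the other metrics keep rising
- The return plots generally look better, but to get clearer monotonicity I finally settled on 32
- By 40 the results seem to start fading
$\Rightarrow$ This may suggest that the factor is more informative about longer-horizon return changes (?)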
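A horizon scan of this kind can be sketched as follows. This is not the evaluation code used for the numbers above: the function name is hypothetical, the horizon is assumed to be measured in bars, the horizons 10 and 20 are added only for illustration, and the price and factor series are random placeholders.

```python
import numpy as np

def ic_by_horizon(factor, close, horizons):
    """Pearson IC between factor[t] and the return from t to t + h, for each h."""
    factor = np.asarray(factor, dtype=float)
    close = np.asarray(close, dtype=float)
    out = {}
    for h in horizons:
        fwd = close[h:] / close[:-h] - 1.0   # forward return over h bars
        f = factor[:-h]                      # align each factor value with its forward return
        ok = ~np.isnan(f)
        out[h] = np.corrcoef(f[ok], fwd[ok])[0, 1]
    return out

# Placeholder inputs: a random-walk price and random factor values
rng = np.random.default_rng(0)
close = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 500)))
toy_factor = rng.normal(size=500)

ics = ic_by_horizon(toy_factor, close, horizons=[5, 10, 20, 32, 40])
```

One thing to keep in mind when reading such a scan: with a horizon of h bars, consecutive forward returns overlap in h - 1 bars, so the observations are strongly autocorrelated and the IC estimate at long horizons is noisier than the sample size suggests.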
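Discussion: differences between cutoff values
- When constructing the original smart money factor, the minutes making up the first 20% of cumulative volume are treated as smart money trades. The 20% cutoff was chosen because institutional investors account for a relatively small share of trading in China's stock market.
- However, if the underlying asset changes (cryptocurrencies, for example), the appropriate cutoff may change as well.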
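To compare cutoffs on a given window without re-sorting for each one, the bars can be sorted by S once and Q read off the cumulative sums. The sketch below is an illustration under assumptions: the helper name `q_by_cutoff`, the candidate cutoffs and the five-bar window are all made up, and no claim is made here about which cutoff suits crypto data.

```python
import numpy as np

def q_by_cutoff(price, volume, s, cutoffs):
    """Q = VWAP_smart / VWAP_all for several cutoffs, sorting by S only once."""
    price, volume, s = (np.asarray(a, dtype=float) for a in (price, volume, s))
    order = np.argsort(-s)               # largest S first
    p, v = price[order], volume[order]
    cum_pv = np.cumsum(p * v)            # running sum of price * volume
    cum_v = np.cumsum(v)                 # running sum of volume
    vwap_all = cum_pv[-1] / cum_v[-1]
    result = {}
    for c in cutoffs:
        # Number of leading bars whose cumulative volume stays within the cutoff
        k = np.searchsorted(cum_v, c * cum_v[-1], side="right")
        result[c] = (cum_pv[k - 1] / cum_v[k - 1]) / vwap_all if k > 0 else np.nan
    return result

# Made-up window of five bars
price = np.array([10.0, 10.2, 9.8, 10.1, 10.0])
volume = np.array([100.0, 90.0, 110.0, 100.0, 100.0])
ret = np.array([0.001, 0.020, -0.002, 0.003, 0.000])
s = np.abs(ret) / np.sqrt(volume)

qs = q_by_cutoff(price, volume, s, cutoffs=[0.1, 0.2, 0.5, 1.0])
```

A cutoff of 1.0 always gives Q = 1, since every bar is then counted as smart money, and a cutoff too small to admit even the top-ranked bar gives NaN, which matches the `smart_vwap = np.nan` branch in the factor code.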