✅ [Research Report Replication] Kaiyuan Securities: Smart Money Factor Model, Version 2.0

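0. Original smart money factor ($S_t = \frac{|R_t|}{\sqrt{V_t}}$)

  • The core idea is to use the price and volume information in minute-bar data to estimate how heavily institutions are participating in trading, and from that to build a stock-selection factor that tracks smart money.
  • A large body of empirical research shows that smart money tends to trade with "larger single orders and more aggressive quotes".
  • With this in mind, we construct an indicator S that measures how "smart" the trading in each minute is. The steps are:
    1. For the selected stock, look back over the minute-bar data of the past 10 trading days.
    2. Construct the indicator $S_t = \frac{|R_t|}{\sqrt{V_t}}$, where $R_t$ is the return in minute t and $V_t$ is the volume in minute t.
      • The larger S is, the more smart money is assumed to have taken part in that minute's trading.
    3. Sort the minute bars by $S_t$ in descending order and take the minutes that make up the top 20% of cumulative volume; these are treated as smart money trades.
    4. Compute the volume-weighted average price of the smart money trades, $\text{VWAP}_{\text{smart}}$.
    5. Compute the volume-weighted average price of all trades, $\text{VWAP}_{\text{all}}$.
    6. Smart money factor: $Q = \frac{\text{VWAP}_{\text{smart}}}{\text{VWAP}_{\text{all}}}$
      • $\text{VWAP}_{\text{smart}}$ is the volume-weighted average price of smart money trades.
      • $\text{VWAP}_{\text{all}}$ is the volume-weighted average price of all trades.
      • Q reflects the relative price level at which smart money traded during the period.
      • It is called the smart money sentiment factor because:
        • The larger Q is, the more smart money trades cluster at higher prices. This is selling into strength and reflects a pessimistic stance.
        • The smaller Q is, the more smart money trades cluster at lower prices. This is accumulating on dips and reflects optimism.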
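Steps 4 to 6 can be written out explicitly. In the block below, $P_t$ is the price assigned to minute $t$ and $\mathcal{S}$ is the set of minutes selected in step 3; both symbols are introduced here for illustration and do not appear in the report excerpt.

```latex
\text{VWAP}_{\text{smart}} = \frac{\sum_{t \in \mathcal{S}} P_t V_t}{\sum_{t \in \mathcal{S}} V_t},
\qquad
\text{VWAP}_{\text{all}} = \frac{\sum_{t} P_t V_t}{\sum_{t} V_t},
\qquad
Q = \frac{\text{VWAP}_{\text{smart}}}{\text{VWAP}_{\text{all}}}
```

Because both averages are taken over the same window, $Q = 1$ when the selected minutes trade at the same average price as the window as a whole.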

Code implementation

import numpy as np
import pandas as pd


def factor(df, window_size=960, threshold_value=0.2):
    """
    Original smart money factor (S = |R| / sqrt(V)).

    Steps:
    1. Compute the average price and the absolute return of each bar.
    2. Look back over the past 10 days (960 15-minute bars).
    3. Compute S = |return| / sqrt(volume) for each bar.
    4. Sort by S in descending order and keep the bars that make up the
       top 20% of cumulative volume.
    5. Compute the smart money VWAP and the overall VWAP.
    6. Factor value = smart money VWAP / overall VWAP.

    Optimizations:
    1. Precompute every required column (no repeated work).
    2. Slide the window over precomputed arrays.
    3. Use numpy instead of pandas inside each window.
    4. Avoid creating temporary Series inside the loop.
    """
    # Precompute everything the loop needs
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    avg_price_vals = avg_price.values

    abs_ret = abs((df['close'] - df['open']) / df['open'])

    volume = df['volume'].values

    # Factor values stay NaN until enough history is available
    factor_values = np.full(len(df), np.nan)

    # Precompute S = |R| / sqrt(V); zero-volume bars give inf or NaN here
    S_values = abs_ret.values / np.sqrt(volume)

    # Slide the window over the data
    for i in range(window_size, len(df)):
        # Window bounds. The slices below are half-open, so the window holds
        # bars start_idx .. i - 2 (window_size - 1 bars), all before bar i
        start_idx = i - window_size
        end_idx = i - 1

        # Data for the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S_values[start_idx:end_idx]

        # Indices that sort the window by S, descending
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]

        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * threshold_value

        # Smart money bars: the top 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan

        # Overall VWAP of the window
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)

        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap

    return pd.Series(factor_values, index=df.index)

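Factor performance

📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
   IC (Pearson): 0.015930
   Rank_IC (Spearman): 0.018679
📊 Information ratio:
   IR: 0.132837

[Figures: returns, distribution, scatter plot]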
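The post reports IC, Rank IC and IR but does not show how they were computed. As a rough sketch only, the version below assumes that IC is the Pearson correlation between the factor and a forward return, that Rank IC is the same correlation taken on ranks, and that IR is the mean of per-chunk ICs divided by their standard deviation. The names `rank_average` and `evaluate`, the `n_chunks` parameter and the chunking scheme are all hypothetical, so its output need not match the figures above.

```python
import numpy as np


def rank_average(x):
    """1-based ranks; tied values share the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def evaluate(factor, fwd_ret, n_chunks=10):
    """Return (IC, Rank IC, IR) under the assumptions stated in the text."""
    factor = np.asarray(factor, dtype=float)
    fwd_ret = np.asarray(fwd_ret, dtype=float)
    ok = ~(np.isnan(factor) | np.isnan(fwd_ret))   # drop warm-up NaNs
    f, r = factor[ok], fwd_ret[ok]

    ic = np.corrcoef(f, r)[0, 1]
    rank_ic = np.corrcoef(rank_average(f), rank_average(r))[0, 1]

    # Assumed IR definition: mean / std of ICs over consecutive chunks.
    chunk_ics = np.array([
        np.corrcoef(fc, rc)[0, 1]
        for fc, rc in zip(np.array_split(f, n_chunks), np.array_split(r, n_chunks))
    ])
    ir = chunk_ics.mean() / chunk_ics.std(ddof=1)
    return ic, rank_ic, ir
```

Calling `evaluate(factor(df), forward_returns)` would give the three numbers for one asset, where `forward_returns` is whatever return series the evaluation targets.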

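Improvements to the smart money factor

  • Generalization: make the exponent on the minute volume V a tunable parameter
    • $S = \frac{|R|}{V^\beta}$
    • Here R is the minute return, V is the minute volume, and $\beta$ is the exponent applied to V.
  • Different ways to construct S
    • Different lines of reasoning lead to different constructions of S. We tried 3 different constructions, covered in the numbered sections that follow.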
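Before turning to those variants, the generalized form can be sketched in a few lines. This is a minimal illustration rather than the report's code: the function name `smartness` and the choice to leave zero-volume bars as NaN are assumptions made here.

```python
import numpy as np


def smartness(abs_ret, volume, beta=0.5):
    """S = |R| / V**beta; beta = 0.5 gives the original |R| / sqrt(V)."""
    abs_ret = np.asarray(abs_ret, dtype=float)
    volume = np.asarray(volume, dtype=float)
    s = np.full(abs_ret.shape, np.nan)
    positive = volume > 0                      # zero-volume bars stay NaN
    s[positive] = abs_ret[positive] / volume[positive] ** beta
    return s
```

With `beta=0` the indicator reduces to |R| alone, and larger values of `beta` penalize high-volume bars more heavily.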

1. $S_1 = V$

  • Meaning: minute volume
  • Considers volume alone: the minutes with the largest volume are classified as smart money trades

Code implementation

def factor(df):
    """
    Optimized calculate_smart_money_S1.
    Uses volume as the sort key.
    """
    # Precompute everything the loop needs
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    volume = df['volume'].values
    avg_price_vals = avg_price.values

    # Factor values stay NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    window_size = 960

    for i in range(window_size, len(df)):
        # Window bounds (half-open slice: bars start_idx .. i - 2)
        start_idx = i - window_size
        end_idx = i - 1

        # Data for the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]

        # Use volume itself as S
        S = window_volume  # S is defined right here!

        # Indices that sort the window by S, descending
        sorted_idx = np.argsort(-S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]

        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * 0.2

        # Smart money bars: the top 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan

        # Overall VWAP of the window
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)

        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap

    return pd.Series(factor_values, index=df.index)

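Factor performance

📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
   IC (Pearson): 0.013049
   Rank_IC (Spearman): 0.017434
📊 Information ratio:
   IR: 0.012048

[Figures: returns, distribution, scatter plot]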

2. $S_2 = \text{rank}(|R|) + \text{rank}(|V|)$ (performance is passable ✅)

  • Meaning: the sum of the percentile rank of the absolute minute return and the percentile rank of the minute volume
  • Considers both the volume rank and the absolute-return rank of each minute: the minutes with the highest combined rank are classified as smart money trades

Code implementation

def factor(df):
    """
    Optimized calculate_smart_money_S2.
    Uses the sum of the return rank and the volume rank as the sort key.
    """
    # Precompute everything the loop needs
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    abs_ret = abs((df['close'] - df['open']) / df['open']).values
    volume = df['volume'].values
    avg_price_vals = avg_price.values

    # Factor values stay NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    window_size = 960

    for i in range(window_size, len(df)):
        # Window bounds (half-open slice: bars start_idx .. i - 2)
        start_idx = i - window_size
        end_idx = i - 1

        # Data for the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_abs_ret = abs_ret[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]

        # Ranks within the window (double argsort instead of pandas rank);
        # these are ordinal ranks, so tied values get different ranks
        rank_ret = np.argsort(np.argsort(window_abs_ret)) + 1
        rank_volume = np.argsort(np.argsort(window_volume)) + 1
        S = rank_ret + rank_volume

        # Indices that sort the window by S, descending
        sorted_idx = np.argsort(-S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]

        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * 0.2

        # Smart money bars: the top 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan

        # Overall VWAP of the window
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)

        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap

    return pd.Series(factor_values, index=df.index)
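The `np.argsort(np.argsort(...)) + 1` idiom in the function above yields ordinal ranks: tied values receive distinct ranks, and with the default sort the order among ties is not guaranteed to be stable. Whether that matters depends on how many bars share the same |R| (bars with zero return are a likely case). The sketch below, on made-up numbers, contrasts it with dense ranks from `np.unique`. The dense variant is shown for comparison only and is not what the post uses.

```python
import numpy as np

# Made-up absolute returns with a three-way tie.
abs_ret = np.array([0.002, 0.002, 0.010, 0.002])

# Ordinal ranks via double argsort (stable sort so the result is reproducible):
# the tied bars are ranked by position.
ordinal = np.argsort(np.argsort(abs_ret, kind="stable"), kind="stable") + 1

# Dense ranks: equal values share a single rank.
dense = np.unique(abs_ret, return_inverse=True)[1] + 1

print(ordinal)  # [1 2 4 3]
print(dense)    # [1 1 2 1]
```

Under ordinal ranking the three identical bars end up with ranks 1, 2 and 3, so their contribution to $S_2$ differs even though their returns are equal.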

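Factor performance

📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
   IC (Pearson): 0.013273
   Rank_IC (Spearman): 0.018782
📊 Information ratio:
   IR: -0.218254

[Figures: returns, distribution, scatter plot]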

3. $S_3 = \frac{|R|}{\ln(V)}$

  • Meaning: the absolute minute return divided by the logarithm of the minute volume
  • A variant of the original S indicator that applies a log transform to the minute volume

Code implementation

def factor(df):
    """
    Optimized calculate_smart_money_S3.
    Uses |R| / ln(V) as the sort key.
    """
    # Precompute everything the loop needs
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    abs_ret = abs((df['close'] - df['open']) / df['open']).values
    volume = df['volume'].values
    avg_price_vals = avg_price.values

    # Factor values stay NaN until enough history is available
    factor_values = np.full(len(df), np.nan)
    window_size = 960

    # Add a small constant to avoid log(0)
    log_volume = np.log(volume + 1e-5)

    for i in range(window_size, len(df)):
        # Window bounds (half-open slice: bars start_idx .. i - 2)
        start_idx = i - window_size
        end_idx = i - 1

        # Data for the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_abs_ret = abs_ret[start_idx:end_idx]
        window_log_volume = log_volume[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]

        # Compute S
        S = window_abs_ret / window_log_volume

        # Indices that sort the window by S, descending
        sorted_idx = np.argsort(-S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]

        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * 0.2

        # Smart money bars: the top 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan

        # Overall VWAP of the window
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)

        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap

    return pd.Series(factor_values, index=df.index)
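One property of $|R| / \ln(V)$ is worth checking against the data's volume units: $\ln(V)$ is negative for $V < 1$ and zero at $V = 1$, so S changes sign or blows up there and the descending sort no longer orders bars by "smartness". The toy values below are made up to show the effect. `np.log1p`, which computes $\ln(1 + V)$, is included only as one possible guard; it is a different transform from the report's $\ln(V)$, not an equivalent.

```python
import numpy as np

# Made-up bars with identical |R| and volumes below, at and above 1.
abs_ret = np.array([0.01, 0.01, 0.01])
volume = np.array([0.5, 1.0, 50.0])

with np.errstate(divide="ignore"):
    s_ln = abs_ret / np.log(volume)       # negative for V < 1, inf at V = 1

s_log1p = abs_ret / np.log1p(volume)      # positive for every V > 0

print(s_ln)
print(s_log1p)
```

If volumes in the dataset are always well above 1, the two denominators rank bars similarly and the issue does not arise.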

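Factor performance

📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
   IC (Pearson): 0.015834
   Rank_IC (Spearman): 0.018512
📊 Information ratio:
   IR: 0.036820

[Figures: returns, distribution, scatter plot]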

4. $S = |R| \times \frac{taker\_buy\_volume}{volume}$

Code implementation

def factor(df, window_size=960, threshold_value=0.2):
    """
    S = |R| × (taker_buy_volume / volume)  (taker-buy share)
    """
    # Precompute everything the loop needs
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    avg_price_vals = avg_price.values

    abs_ret = abs((df['close'] - df['open']) / df['open'])
    volume = df['volume'].values

    # S = |R| × (taker_buy_volume / volume)
    # Taker-buy share of each bar; the small constant avoids division by zero
    buy_ratio = df['taker_buy_volume'] / (df['volume'] + 1e-9)
    S_values = abs_ret.values * buy_ratio.values

    # Factor values stay NaN until enough history is available
    factor_values = np.full(len(df), np.nan)

    # Slide the window over the data
    for i in range(window_size, len(df)):
        # Window bounds (half-open slice: bars start_idx .. i - 2)
        start_idx = i - window_size
        end_idx = i - 1

        # Data for the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S_values[start_idx:end_idx]

        # Indices that sort the window by S, descending
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]

        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * threshold_value

        # Smart money bars: the top 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan

        # Overall VWAP of the window
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)

        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap

    return pd.Series(factor_values, index=df.index)

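Factor performance

📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
   IC (Pearson): 0.014563
   Rank_IC (Spearman): 0.020014
📊 Information ratio:
   IR: -0.271804

[Figures: returns, distribution, scatter plot]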

5. $S = \frac{|R|}{\sqrt{trade\_count}}$ (performance is passable ✅)

Code implementation

def factor(df, window_size=960, threshold_value=0.2):
    """
    S = |R| / √(trade_count)  (adjusted by number of trades)
    """
    # Precompute everything the loop needs
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    avg_price_vals = avg_price.values

    abs_ret = abs((df['close'] - df['open']) / df['open'])
    volume = df['volume'].values

    sqrt_trade = np.sqrt(df['trade_count'])
    S_values = abs_ret.values / (sqrt_trade.values + 1e-9)  # avoid division by zero

    # Factor values stay NaN until enough history is available
    factor_values = np.full(len(df), np.nan)

    # Slide the window over the data
    for i in range(window_size, len(df)):
        # Window bounds (half-open slice: bars start_idx .. i - 2)
        start_idx = i - window_size
        end_idx = i - 1

        # Data for the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S_values[start_idx:end_idx]

        # Indices that sort the window by S, descending
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]

        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * threshold_value

        # Smart money bars: the top 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan

        # Overall VWAP of the window
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)

        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap

    return pd.Series(factor_values, index=df.index)

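Factor performance

📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
   IC (Pearson): 0.016723
   Rank_IC (Spearman): 0.019661
📊 Information ratio:
   IR: 0.377188

[Figures: returns, distribution, scatter plot]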

6. $S = |R| \times e^{-\frac{\beta}{V}}$

Code implementation

def factor(df, window_size=960, threshold_value=0.2):
    """
    S = |R| × e^(-β/V)   (exponential decay function)
    """
    # Precompute everything the loop needs
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    avg_price_vals = avg_price.values

    abs_ret = abs((df['close'] - df['open']) / df['open'])
    volume = df['volume'].values

    # S = |R| × e^(-β/V)
    # β parameter (suggested range 0.5 to 2)
    beta = 2
    # Exponential decay term (more sensitive to small volumes)
    exp_factor = np.exp(-beta / (volume + 1e-9))  # avoid division by zero
    S_values = abs_ret.values * exp_factor

    # Factor values stay NaN until enough history is available
    factor_values = np.full(len(df), np.nan)

    # Slide the window over the data
    for i in range(window_size, len(df)):
        # Window bounds (half-open slice: bars start_idx .. i - 2)
        start_idx = i - window_size
        end_idx = i - 1

        # Data for the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S_values[start_idx:end_idx]

        # Indices that sort the window by S, descending
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]

        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]
        threshold = total_volume * threshold_value

        # Smart money bars: the top 20% of cumulative volume
        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan

        # Overall VWAP of the window
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)

        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap

    return pd.Series(factor_values, index=df.index)

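Factor performance

📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
   IC (Pearson): 0.015751
   Rank_IC (Spearman): 0.019500
📊 Information ratio:
   IR: -0.316391

[Figures: returns, distribution, scatter plot]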

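Modifying the factor to address abnormal returns in the tail (especially recent) data

1. When the factor value is small or large

Small or large factor values mainly occur in the following situations:

When the factor value is small (smart money trades at low prices):

  • During panic selling (smart money accumulates against the trend)
  • Dense trading near important support levels
  • The quiet positioning period before unexpected good news
  • The phase in which major players build positions (especially by splitting large orders into scattered buys)
  • Where the recent problem lies: the abnormal tail behavior tends to appear in sustained downtrends, where prices keep falling after smart money has "bought the dip"

When the factor value is large (smart money trades at high prices):

  • During euphoric rallies (smart money takes the chance to distribute)
  • Dense trading near important resistance levels
  • The selling period after good news has been priced in
  • The phase in which major players take profits
  • Usually behaves as expected: a high factor value signals a higher probability of a subsequent decline

2. Diagnosing the tail-data problem

The main reasons the tail (the part with small factor values) behaves abnormally:

  1. Misjudgment in extreme markets: in a sustained downtrend, the so-called "smart money" trades may be passive absorption of selling rather than active accumulation
  2. Distorted price-volume relationship: during panic selling, the surge in volume distorts the calculation of $S_t$
  3. Time lag: the 10-day lookback window contains too much stale information and cannot reflect changes in market state promptly
  4. No distinction of trade intent: active buying and passive selling are not separated, so distribution by major players is mistaken for accumulation
  5. Rigid threshold: a fixed 20% volume threshold does not adapt to different market environments

3. Improvement: dynamic threshold and market-state adjustment

Based on the analysis above, I propose the following combined improvements:

Scheme 1: dynamic volume threshold (ATR-adjusted) (performs fairly well ✅)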

Code implementation
def factor(df, window_size=960, base_threshold=0.2, volatility_window=96):
    """
    Improvements:
    1. Adjust the volume threshold dynamically according to market volatility
    2. Use a stricter threshold in high-volatility markets (fewer bars count as smart money)
    3. Add an adaptive mechanism for extreme market conditions
    """
    # Market volatility: ATR over the past 24 hours (96 15-minute bars),
    # smoothed recursively starting from atr[0] = 0
    high, low, close = df['high'].values, df['low'].values, df['close'].values
    atr = np.zeros(len(df))
    for i in range(1, len(df)):
        tr = max(high[i] - low[i],
                 abs(high[i] - close[i-1]),
                 abs(low[i] - close[i-1]))
        atr[i] = (atr[i-1] * (volatility_window-1) + tr) / volatility_window

    # Min-max normalization to the 0-1 range, taken over the full series
    # (so the scaling at bar i also depends on bars after i)
    norm_atr = (atr - np.min(atr)) / (np.max(atr) - np.min(atr) + 1e-7)

    # Dynamic threshold: lower it when volatility is high
    dynamic_threshold = base_threshold * (1 - norm_atr * 0.5)  # halved at maximum volatility

    # The rest follows the original factor calculation, with a per-bar threshold
    # Precompute everything the loop needs
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    avg_price_vals = avg_price.values

    abs_ret = abs((df['close'] - df['open']) / df['open'])

    volume = df['volume'].values

    # Factor values stay NaN until enough history is available
    factor_values = np.full(len(df), np.nan)

    # Precompute S = |R| / sqrt(V)
    S_values = abs_ret.values / np.sqrt(volume)

    # Slide the window over the data
    for i in range(window_size, len(df)):
        # Window bounds (half-open slice: bars start_idx .. i - 2)
        start_idx = i - window_size
        end_idx = i - 1

        # Data for the current window
        window_avg_price = avg_price_vals[start_idx:end_idx]
        window_volume = volume[start_idx:end_idx]
        window_S = S_values[start_idx:end_idx]

        # Indices that sort the window by S, descending
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]

        # Cumulative volume in that order
        cum_volume = np.cumsum(sorted_volume)
        total_volume = cum_volume[-1]

        # Use dynamic_threshold[i] in place of the fixed threshold
        threshold = total_volume * dynamic_threshold[i]

        smart_mask = cum_volume <= threshold
        if np.any(smart_mask):
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])
        else:
            smart_vwap = np.nan

        # Overall VWAP of the window
        all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)

        # Factor value
        if not np.isnan(smart_vwap) and all_vwap != 0:
            factor_values[i] = smart_vwap / all_vwap

    return pd.Series(factor_values, index=df.index)
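The function above scales ATR with `np.min(atr)` and `np.max(atr)` taken over the whole series, so the threshold at bar i depends on volatility observed after bar i. In a backtest that can leak future information. A causal alternative is sketched below under stated assumptions: the helper name `rolling_minmax_norm` and its `lookback` parameter are hypothetical, and swapping it in would change the factor, so the performance figures in this post would not carry over.

```python
import numpy as np


def rolling_minmax_norm(x, lookback):
    """Scale x[i] to [0, 1] using only x[i - lookback + 1 .. i]."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)          # NaN until a full lookback exists
    for i in range(lookback - 1, len(x)):
        window = x[i - lookback + 1 : i + 1]
        lo, hi = window.min(), window.max()
        out[i] = (x[i] - lo) / (hi - lo + 1e-7)
    return out
```

Because each output uses only past and current values, appending new bars never changes values that were already computed.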
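Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
   IC (Pearson): 0.011585
   Rank_IC (Spearman): 0.016998
📊 Information ratio:
   IR: 0.308638

[Figures: returns, distribution, scatter plot]

Assessment

Long-term performance has improved (it looks better), but the overall short-term shape is still unchanged.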

Scheme 2: price-volume confirmation composite factor (performs ✅✅✅!!)

Code implementation
def factor(df):
    """
    Improvements:
    1. Double confirmation from the volume distribution and the price level
    2. Adds trade-intent analysis (taker buying vs. selling)
    3. Uses a dynamic time window
    """
    # Dynamic window: adjust the lookback period by volatility
    volatility = df['close'].pct_change().rolling(96).std()

    # Handle NaN and inf: fill missing values and avoid division by zero
    volatility = volatility.fillna(volatility.mean())  # fill NaN with the mean
    volatility = volatility.replace(0, 1e-6)           # avoid division by zero
    volatility = np.clip(volatility, 1e-6, None)       # enforce a small positive floor
    vol_vals = np.asarray(volatility, dtype=float)     # positional access in the loop
    vol_max = vol_vals.max()

    # Per-bar window size, as integers
    window_size = 960 / (vol_vals * 100 + 1)
    window_size = np.nan_to_num(window_size, nan=480)   # replace NaN with a default
    window_size = np.clip(window_size, 480, 1440).astype(int)

    # Precompute the key inputs
    avg_price = (df['open'] + df['high'] + df['low'] + df['close']) / 4
    ret = (df['close'] - df['open']) / df['open']
    abs_ret = np.abs(ret)
    buy_power = df['taker_buy_volume'] / df['volume']

    # Core indicator: price-volume confirmation score.
    # Combines return direction, taker-buy share and trade size
    # (np.sign(ret) * abs_ret is equal to ret itself)
    S = np.sign(ret) * abs_ret * buy_power * np.log1p(df['volume'])

    # Factor values stay NaN until enough history is available
    factor_values = np.full(len(df), np.nan)

    # First index with enough history for the largest window
    first_idx = max(np.max(window_size), 96) + 1

    for i in range(first_idx, len(df)):
        w_size = window_size[i]
        start_idx = i - w_size
        end_idx = i - 1

        # Data for the current window (half-open slice: bars start_idx .. i - 2)
        window_avg_price = avg_price.iloc[start_idx:end_idx].values
        window_volume = df['volume'].iloc[start_idx:end_idx].values
        window_S = S.iloc[start_idx:end_idx].values

        # Sort by S, descending (smart money bars first)
        sorted_idx = np.argsort(-window_S)
        sorted_volume = window_volume[sorted_idx]
        sorted_avg_price = window_avg_price[sorted_idx]

        # Dynamic threshold: lower it when volatility is high
        threshold_ratio = 0.3 - 0.15 * (vol_vals[i] / vol_max)
        threshold = np.sum(window_volume) * threshold_ratio

        # Identify smart money bars
        cum_volume = np.cumsum(sorted_volume)
        smart_mask = cum_volume <= threshold

        if np.any(smart_mask):
            # Smart money VWAP
            smart_vwap = np.sum(sorted_avg_price[smart_mask] *
                                sorted_volume[smart_mask]) / np.sum(sorted_volume[smart_mask])

            # Overall VWAP of the window
            all_vwap = np.sum(window_avg_price * window_volume) / np.sum(window_volume)

            # Market-state adjustment. The author labels the first branch
            # "recent rise"; note that all_vwap > recent mean holds when the
            # last 96 bars average BELOW the window VWAP
            if all_vwap > window_avg_price[-96:].mean():
                factor_values[i] = smart_vwap / all_vwap
            else:  # labeled "recent decline" by the author
                factor_values[i] = 2 - (smart_vwap / all_vwap)  # flip the factor direction

    return pd.Series(-factor_values, index=df.index)
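The dynamic-window line in the function above maps volatility σ to a window length via clip(960 / (100σ + 1), 480, 1440). A quick check shows which part of that range the mapping can actually reach. The σ values below are made up for illustration.

```python
import numpy as np

# Hypothetical per-bar return standard deviations, from calm to turbulent.
sigma = np.array([1e-6, 0.001, 0.005, 0.02, 0.05])

raw = 960 / (sigma * 100 + 1)                  # same formula as the function above
window = np.clip(raw, 480, 1440).astype(int)   # same bounds as the function above

print(window.min(), window.max())
```

Since 100σ + 1 ≥ 1, the raw value never exceeds 960, so the upper bound of 1440 never binds. The floor of 480 is reached once σ ≥ 0.01, because 960 / (100σ + 1) ≤ 480 exactly when 100σ + 1 ≥ 2.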
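Factor performance
📊 Single-coin (single) detailed evaluation results:
--------------------------------------------------
🔗 Correlation analysis:
   IC (Pearson): 0.026600
   Rank_IC (Spearman): 0.021094
📊 Information ratio:
   IR: 0.461683

[Figures: returns, distribution, scatter plot]

Assessment

The short-term data finally looks good!! The overall long-term trend is decent too~

(I also found that the longer the forward-return horizon, the better the overall performance looks…
raising it from 5 all the way up to 40:

  • IC, IR and the other metrics climb steadily
  • The return plot looks better overall, but to get a clearer monotonic pattern I settled on 32
  • 40 seems to show signs of weakening

$\Rightarrow$ This may suggest that the factor is more informative about longer-horizon returns (?))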
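A horizon scan of this kind could look like the sketch below. It is an illustration only: it assumes the horizon is counted in bars and that the forward return is close-to-close, neither of which the post states. The helper names `forward_return` and `ic_by_horizon` are hypothetical, and the intermediate horizons 8 and 16 are placeholders (the post only mentions 5, 32 and 40).

```python
import numpy as np


def forward_return(close, horizon):
    """Return over the next `horizon` bars; NaN where the future is missing."""
    close = np.asarray(close, dtype=float)
    out = np.full(len(close), np.nan)
    out[:-horizon] = close[horizon:] / close[:-horizon] - 1.0
    return out


def ic_by_horizon(factor, close, horizons=(5, 8, 16, 32, 40)):
    """Pearson IC between the factor and forward returns, per horizon."""
    factor = np.asarray(factor, dtype=float)
    result = {}
    for h in horizons:
        fwd = forward_return(close, h)
        ok = ~(np.isnan(factor) | np.isnan(fwd))   # skip warm-up and tail NaNs
        result[h] = np.corrcoef(factor[ok], fwd[ok])[0, 1]
    return result
```

Comparing the resulting dictionary across horizons gives the kind of trend described above. Note that longer horizons produce overlapping return windows, so consecutive observations are not independent.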

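Discussion: the effect of different cutoff values

  • In constructing the original smart money factor, we treat the minutes that make up the top 20% of cumulative volume as smart money trades. The 20% cutoff was chosen because institutional investors account for a relatively small share of trading in China's stock market.
  • However, if the underlying asset changes (for example, to cryptocurrency), this cutoff may need to change.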
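To see how sensitive Q is to the cutoff, the selection step can be isolated for a single window and evaluated at several cutoffs. The sketch below is a toy under stated assumptions: the function name `smart_money_q` is hypothetical and the four bars are made-up numbers. It uses the same rule as the code in this post, keeping bars while their cumulative volume share stays at or below the cutoff.

```python
import numpy as np


def smart_money_q(price, volume, s, cutoff=0.2):
    """Q = VWAP_smart / VWAP_all for one window of bars."""
    price, volume, s = (np.asarray(a, dtype=float) for a in (price, volume, s))
    order = np.argsort(-s, kind="stable")               # highest S first
    cum_share = np.cumsum(volume[order]) / volume.sum()
    smart = order[cum_share <= cutoff]                  # bars within the cutoff
    if smart.size == 0:
        return np.nan                                   # first bar alone exceeds the cutoff
    vwap_smart = np.sum(price[smart] * volume[smart]) / np.sum(volume[smart])
    vwap_all = np.sum(price * volume) / np.sum(volume)
    return vwap_smart / vwap_all


# Made-up window: four bars of equal volume, S already in descending order.
price = [10.0, 11.0, 12.0, 13.0]
volume = [10.0, 10.0, 10.0, 10.0]
s = [4.0, 3.0, 2.0, 1.0]

for cutoff in (0.2, 0.25, 0.5, 1.0):
    print(cutoff, smart_money_q(price, volume, s, cutoff))
```

With a cutoff of 0.2 no bar qualifies in this toy window, because the first bar already carries 25% of the volume, so the result is NaN. That edge case is worth keeping in mind when lowering the cutoff on assets with lumpy volume.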