Code Training (41): Substring with Concatenation of All Words

Author: Once Day Date: June 28, 2025

The long road has only just begun…

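1. Problem

You are given a string s and an array of strings words. All strings in words have the same length.

A concatenated substring of s is a substring that consists of all the strings in words joined together in some order.

For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated substrings. "acdbef" is not a concatenated substring, because it is not the concatenation of any permutation of words.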
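The definition above can be checked directly for a single candidate string. The following is a minimal sketch, not part of the original solution: it cuts the candidate into chunks of one word length, sorts both the chunks and the words, and compares the two sorted sequences. The names isConcatenation, compareChunks, and g_chunkLen are hypothetical helpers introduced only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Chunk length used by the qsort comparator (qsort passes no context). */
static size_t g_chunkLen;

/* Compare two fixed-length chunks byte by byte. */
static int compareChunks(const void *a, const void *b) {
    return memcmp(a, b, g_chunkLen);
}

/* Returns 1 if candidate is the words joined in some order, 0 otherwise.
 * All words are assumed to have the same length, as the problem guarantees. */
int isConcatenation(const char *candidate, const char **words, int wordsSize) {
    size_t wordLen = strlen(words[0]);
    size_t total = wordLen * (size_t)wordsSize;
    if (strlen(candidate) != total) return 0;

    /* Copy both sides into flat buffers of fixed-length chunks. */
    char *a = malloc(total);
    char *b = malloc(total);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return 0;
    }
    memcpy(a, candidate, total);
    for (int i = 0; i < wordsSize; i++) {
        memcpy(b + (size_t)i * wordLen, words[i], wordLen);
    }

    /* Sorting the chunks makes the comparison independent of word order. */
    g_chunkLen = wordLen;
    qsort(a, (size_t)wordsSize, wordLen, compareChunks);
    qsort(b, (size_t)wordsSize, wordLen, compareChunks);
    int same = memcmp(a, b, total) == 0;

    free(a);
    free(b);
    return same;
}
```

With words = ["ab","cd","ef"], calling isConcatenation("abefcd", words, 3) returns 1, while isConcatenation("acdbef", words, 3) returns 0 because the chunks "ac" and "db" are not in words.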

Return the starting indices of all concatenated substrings in s. You may return the answer in any order.

Constraints:

1 <= s.length <= 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 30
words[i] and s consist of lowercase English letters

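Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: Since words.length == 2 and words[i].length == 3, the concatenated substring must have length 6.
The substring "barfoo" starts at index 0. It is the concatenation of words in the order ["bar","foo"].
The substring "foobar" starts at index 9. It is the concatenation of words in the order ["foo","bar"].
The output order does not matter. Returning [9,0] is also fine.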

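Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: Since words.length == 4 and words[i].length == 4, the concatenated substring must have length 16.
No substring of s has length 16 and equals the concatenation of some permutation of words.
So we return an empty array.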

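Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation: Since words.length == 3 and words[i].length == 3, the concatenated substring must have length 9.
The substring "foobarthe" starts at index 6. It is the concatenation of words in the order ["foo","bar","the"].
The substring "barthefoo" starts at index 9. It is the concatenation of words in the order ["bar","the","foo"].
The substring "thefoobar" starts at index 12. It is the concatenation of words in the order ["the","foo","bar"].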

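2. Analysis

The task is to find the starting index of every concatenated substring. A concatenated substring is formed by joining all the strings in the array words in some order, and all strings in words have the same length.

The core idea is to combine a hash table (dictionary) with a sliding window.

First, we need a hash table that stores how many times each word occurs in words.
Second, because all words have the same length, we can traverse s with a fixed-length sliding window. The window size is the total length of all the words, that is, words.length multiplied by the length of one word.
Then, for each candidate window, we use a second hash table to count the words inside the window and compare it with the hash table built from words.
If the two hash tables match exactly, we record the starting index of the current window.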
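The first step, counting how often each word occurs, can be sketched as follows. To keep the example short, it swaps the hash table for a plain array of distinct words with a linear search, which gives the same counts but slower lookups. The WordCount type and the buildWordCounts function are hypothetical names used only for this illustration.

```c
#include <string.h>

/* One entry per distinct word: the word itself and how often it occurs. */
typedef struct {
    const char *word;
    int count;
} WordCount;

/* Fills table with the distinct words and their counts and returns the
 * number of distinct words. table must have room for wordsSize entries. */
int buildWordCounts(const char **words, int wordsSize, WordCount *table) {
    int distinct = 0;
    for (int i = 0; i < wordsSize; i++) {
        /* Look for an existing entry for words[i]. */
        int j = 0;
        while (j < distinct && strcmp(table[j].word, words[i]) != 0) j++;
        if (j == distinct) {
            /* First occurrence: append a new entry. */
            table[distinct].word = words[i];
            table[distinct].count = 0;
            distinct++;
        }
        table[j].count++;
    }
    return distinct;
}
```

For the words of Example 2, ["word","good","best","word"], this yields three entries: "word" with count 2, and "good" and "best" with count 1 each. The duplicate "word" is why a window must be compared by counts rather than by mere presence of each word.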

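Steps:

Build the hash table: count each word in the words array.
Sliding window: the window size is the total length of all the words, and the window advances one character at a time so that every starting index is checked.
Check the window: inside each window, split the window into pieces of one word length, count those pieces, and compare the counts with the hash table for words.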

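Worked example, using Example 1:

s = "barfoothefoobarman"
words = ["foo","bar"]

Each word in words has length 3, so the total length is 6. We need to check every substring of s with length 6 and see whether it can be obtained by rearranging the words in words.

The first window "barfoo" (starting at index 0) can be rearranged as ["bar", "foo"], so it qualifies.
The second window "arfoot" (starting at index 1) does not qualify.
…
The tenth window "foobar" (starting at index 9) can be rearranged as ["foo", "bar"], so it qualifies.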
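The walk-through above can be reproduced with a small trace program. This is a sketch under stated assumptions rather than the solution itself: instead of hash tables, it matches every chunk of a window against a still-unused word, which is slower but easy to follow. The functions windowMatches and traceWindows are hypothetical names introduced for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the window starting at start splits into exactly the words,
 * each used once. Every chunk is matched against a still-unused word. */
static int windowMatches(const char *start, const char **words,
                         int wordsSize, size_t wordLen) {
    char *used = calloc((size_t)wordsSize, 1);
    if (used == NULL) return 0;
    int ok = 1;
    for (int j = 0; j < wordsSize && ok; j++) {
        const char *chunk = start + (size_t)j * wordLen;
        int k = 0;
        while (k < wordsSize &&
               (used[k] || strncmp(words[k], chunk, wordLen) != 0)) {
            k++;
        }
        if (k == wordsSize) ok = 0;  /* no unused word equals this chunk */
        else used[k] = 1;
    }
    free(used);
    return ok;
}

/* Prints every window of s and whether it matches; returns the match count. */
int traceWindows(const char *s, const char **words, int wordsSize) {
    size_t wordLen = strlen(words[0]);
    size_t windowSize = wordLen * (size_t)wordsSize;
    size_t sLen = strlen(s);
    int matches = 0;
    for (size_t i = 0; i + windowSize <= sLen; i++) {
        int ok = windowMatches(s + i, words, wordsSize, wordLen);
        printf("index %2zu: %.*s %s\n", i, (int)windowSize, s + i,
               ok ? "match" : "-");
        matches += ok;
    }
    return matches;
}
```

Calling traceWindows("barfoothefoobarman", words, 2) with words = ["foo","bar"] prints one line for each of the 13 windows and reports a match only for the windows starting at index 0 and index 9.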

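Key points for performance:

Use a hash function with few collisions. If counts are keyed only on the hash value, two different words with the same hash are counted as one word, so a fully correct implementation also compares the strings when the hashes match.
Avoid unnecessary string comparisons by matching on counts: a window can be rejected as soon as one word's count exceeds its count in words.
Store the results in a dynamically allocated array rather than an oversized static array.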
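A further optimization, which the implementation in this article does not use, is to slide the window one whole word at a time for each of the wordLen possible starting offsets. Words then enter on the right and leave on the left, so a window is never recounted from scratch. The sketch below illustrates this variant. The names findSubstringStride and lookup are hypothetical, and the linear lookup is a simplification that a hash table would replace in practice.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the index of the distinct word equal to chunk, or -1 if none. */
static int lookup(const char **distinct, int n, const char *chunk, size_t len) {
    for (int i = 0; i < n; i++) {
        if (strncmp(distinct[i], chunk, len) == 0) return i;
    }
    return -1;
}

/* Returns a malloc'ed array of start indices; the caller frees it.
 * Indices are grouped by offset, so they are not necessarily sorted. */
int *findSubstringStride(const char *s, const char **words, int wordsSize,
                         int *returnSize) {
    int wordLen = (int)strlen(words[0]);
    int sLen = (int)strlen(s);
    int *result = malloc(((size_t)sLen + 1) * sizeof(int));
    const char **distinct = malloc((size_t)wordsSize * sizeof(*distinct));
    int *need = calloc((size_t)wordsSize, sizeof(int));  /* counts in words */
    int *have = calloc((size_t)wordsSize, sizeof(int));  /* counts in window */
    *returnSize = 0;
    if (!result || !distinct || !need || !have) {
        free(distinct); free(need); free(have);
        return result;
    }

    /* Give every distinct word an id and count its occurrences. */
    int n = 0;
    for (int i = 0; i < wordsSize; i++) {
        int id = lookup(distinct, n, words[i], (size_t)wordLen);
        if (id < 0) { id = n; distinct[n++] = words[i]; }
        need[id]++;
    }

    for (int offset = 0; offset < wordLen; offset++) {
        int left = offset;  /* start of the current window */
        int count = 0;      /* number of words inside the window */
        memset(have, 0, (size_t)n * sizeof(int));
        for (int right = offset; right + wordLen <= sLen; right += wordLen) {
            int id = lookup(distinct, n, s + right, (size_t)wordLen);
            if (id < 0) {
                /* Unknown chunk: no window may contain it, restart after it. */
                memset(have, 0, (size_t)n * sizeof(int));
                count = 0;
                left = right + wordLen;
                continue;
            }
            have[id]++;
            count++;
            /* Too many copies of this word: drop words from the left. */
            while (have[id] > need[id]) {
                int leftId = lookup(distinct, n, s + left, (size_t)wordLen);
                have[leftId]--;
                count--;
                left += wordLen;
            }
            if (count == wordsSize) result[(*returnSize)++] = left;
        }
    }

    free(distinct);
    free(need);
    free(have);
    return result;
}
```

For Example 3 this returns the indices 6, 9, and 12. Since the problem accepts the answer in any order, the grouping by offset does not need a final sort.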

3. Code Implementation

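#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Power of two above the maximum of 5000 words, so the table never fills up. */
#define TABLE_SIZE 16384

/* Polynomial hash of the first len characters of str, reduced to a slot. */
static unsigned hashWord(const char *str, int len) {
    unsigned hash = 0;
    for (int i = 0; i < len; i++) {
        hash = hash * 31 + (unsigned char)str[i];
    }
    return hash & (TABLE_SIZE - 1);
}

/* Return the slot that holds the word str, or the empty slot where it would
 * go. Linear probing plus strncmp keeps colliding words apart. */
static int findSlot(char **keys, const char *str, int len) {
    unsigned slot = hashWord(str, len);
    while (keys[slot] != NULL && strncmp(keys[slot], str, len) != 0) {
        slot = (slot + 1) & (TABLE_SIZE - 1);
    }
    return (int)slot;
}

int* findSubstring(char* s, char** words, int wordsSize, int* returnSize) {
    int wordLen = strlen(words[0]);
    int windowSize = wordsSize * wordLen;
    int sLen = strlen(s);
    int *result = malloc(sLen * sizeof(int));
    *returnSize = 0;

    /* No window fits, so there is nothing to search. */
    if (sLen < windowSize) return result;

    /* keys[slot] is a word from words; wordCount[slot] is its count in words. */
    char **keys = calloc(TABLE_SIZE, sizeof(char *));
    int *wordCount = calloc(TABLE_SIZE, sizeof(int));
    int *tempCount = malloc(TABLE_SIZE * sizeof(int));
    for (int i = 0; i < wordsSize; i++) {
        int slot = findSlot(keys, words[i], wordLen);
        keys[slot] = words[i];
        wordCount[slot]++;
    }

    /* Try every start index; split the window into wordLen-sized chunks. */
    for (int i = 0; i <= sLen - windowSize; i++) {
        memset(tempCount, 0, TABLE_SIZE * sizeof(int));
        int valid = 1;
        for (int j = 0; j < wordsSize; j++) {
            int slot = findSlot(keys, s + i + j * wordLen, wordLen);
            /* Unknown chunks land on an empty slot whose wordCount is 0. */
            if (++tempCount[slot] > wordCount[slot]) {
                valid = 0;
                break;
            }
        }
        if (valid) result[(*returnSize)++] = i;
    }

    free(keys);
    free(wordCount);
    free(tempCount);
    return result;
}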

int main(void) {
    /* Example 1 from the problem statement. */
    char* s = "barfoothefoobarman";
    char* words[] = {"foo", "bar"};
    int returnSize;
    int* indices = findSubstring(s, words, 2, &returnSize);
    for (int i = 0; i < returnSize; i++) {
        printf("%d ", indices[i]);
    }
    printf("\n");
    free(indices);  /* the caller owns the returned array */
    return 0;
}