```text
2025-03-09T19:46:32.300+08:00 ERROR 4464 --- [ CrawlerTask-1] c.x.backend.service.PriceCrawlerService : [source1] Network request exception: HTTP error fetching URL. Status=404, URL=[https://example.com/price?id=P123]
```

```java
package com.xymzsfxy.backend.service;

import com.xymzsfxy.backend.entity.PriceHistory;
import com.xymzsfxy.backend.entity.Product;
import com.xymzsfxy.backend.repository.PriceHistoryRepository;
import com.xymzsfxy.backend.repository.ProductRepository;
import org.jsoup.Jsoup;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.boot.test.mock.mockito.SpyBean;
import org.springframework.context.annotation.Bean;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.test.context.TestPropertySource;

import java.math.BigDecimal;
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.io.IOException;
import org.jsoup.nodes.Document;

import static org.awaitility.Awaitility.await;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.*;

// Test class annotations
@SpringBootTest
@EnableAsync
@TestPropertySource(properties = {
        "crawler.sources.source1=https://example.com/price?id=%s",
        "crawler.selectors.source1=div.price"
})
class PriceCrawlerServiceTest {

    @Autowired
    private PriceCrawlerService priceCrawlerService;

    @MockBean
    private ProductRepository productRepository;

    @MockBean
    private PriceHistoryRepository priceHistoryRepository;

    @SpyBean
    private PriceCrawlerService spyPriceCrawlerService;

    @TestConfiguration
    static class TestConfig {
        @Bean
        public JsoupWrapper jsoupWrapper() {
            return new JsoupWrapper();
        }
    }

    @Test
    void shouldSavePriceWhenCrawlSuccess() throws Exception {
        // Mock test data
        Product mockProduct = new Product();
        mockProduct.setId(1L);
        mockProduct.setExternalId("P123");

        // Configure mock behavior
        when(productRepository.findById(1L)).thenReturn(Optional.of(mockProduct));
        doReturn(new BigDecimal("299.00")).when(spyPriceCrawlerService).extractPrice(any(), anyString());

        // Run the method under test
        priceCrawlerService.crawlPrices(1L);

        // Verify the result of the asynchronous execution
        await().atMost(5, TimeUnit.SECONDS).untilAsserted(() -> {
            verify(priceHistoryRepository, times(1)).save(any(PriceHistory.class));
            verify(productRepository, times(1)).save(argThat(p ->
                    p.getLatestPrice().equals(new BigDecimal("299.00"))
            ));
        });
    }

    @Test
    void shouldHandleMissingProduct() {
        when(productRepository.findById(999L)).thenReturn(Optional.empty());

        // Call the method directly; if it throws, the test fails automatically
        priceCrawlerService.crawlPrices(999L);

        verify(priceHistoryRepository, never()).save(any());
    }
}

// Helper class that simulates web page requests
class JsoupWrapper {
    Document fetchDocument(String url) throws IOException {
        // Simulate returning an HTML document that contains a price
        String html = "<html><div class='price'>¥299.00</div></html>";
        return Jsoup.parse(html);
    }
}
```

How do I verify this with the correct URL?

Time: 2025-06-19 09:18:44

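<think>The user is asking how to verify that a URL is correct, specifically for the 404 error in the Java test code they provided. First I need to understand the code context. They have a PriceCrawlerServiceTest class that uses MockBean and SpyBean to mock dependencies such as ProductRepository and PriceHistoryRepository. The test mocks the product data and the price crawl, but an HTTP 404 error occurs, which means the requested URL is wrong. The user probably wants to make sure the URL built during testing is correct so the 404 goes away.

I need to look at how the URL is generated. In @TestPropertySource they set crawler.sources.source1 to https://example.com/price?id=%s, and the test substitutes the externalId P123 into the placeholder, producing the final URL https://example.com/price?id=P123.

First, check whether the URL is constructed correctly. The %s placeholder should be replaced correctly, but maybe the parameter is not URL-encoded, or the target server does not accept this format. Does the parameter need URL encoding? Does the server expect a different parameter name, such as product_id instead of id?

Next, consider whether the mocked JsoupWrapper handles the URL correctly. In the test it may simply return static HTML without sending a request, so a real 404 would go unnoticed. The actual request URL needs to be verified, which may require a real HTTP client or making sure the mock correctly simulates the network request.

Then, the test code could be changed so URL construction problems surface during testing, for example by logging the generated URL or asserting on it. Mockito can verify that fetchDocument was called with the expected URL.

Also consider exception handling for network requests. Does PriceCrawlerService catch HTTP exceptions and handle them properly, for example with retries or error logging? Do the tests cover these cases, so that a 404 is handled gracefully instead of crashing the system?

The test configuration properties should be checked too: is the URL template in crawler.sources.source1 correct, with no typos? The user's code uses "crawler.sources.source1=https://example.com/price?id=%s", but the real server may require a different path or parameter.

Finally, I could suggest opening the constructed URL manually before running the crawler to confirm it returns data. If manual access also returns 404, either the URL template is wrong or the problem is on the server, and the template must be adjusted or the correct endpoint confirmed with the provider.

In summary, the user needs to make sure the generated URL is correct, verify URL construction in tests, and handle network-request exceptions, probably with concrete code changes such as logging, stronger test cases, and URL encoding.</think>

Below is an analysis of how to verify the URL behind the **HTTP 404 error** in your Java test code, followed by steps to fix it:

---

### I. Locating the Problem

Log error:

```text
2025-03-09T19:46:32.300+08:00 ERROR 4464 --- [ CrawlerTask-1] c.x.backend.service.PriceCrawlerService : [source1] Network request exception: HTTP error fetching URL. Status=404, URL=[https://example.com/price?id=P123]
```

**Core problem**: the constructed URL `https://example.com/price?id=P123` does not exist on the target server, which returns status code 404. `example.com` is a domain reserved for documentation (RFC 2606) and hosts no price API, so this request can never succeed. The log also shows that the test sent a real HTTP request. This suggests the service calls Jsoup directly rather than going through the `JsoupWrapper` bean defined in the test configuration.

---

### II. Key Steps for Verifying the URL

#### 1. **Check the URL construction logic**

Your test configuration sets this property through `@TestPropertySource`:

```properties
crawler.sources.source1=https://example.com/price?id=%s
```

When `externalId="P123"` is passed in, the generated URL should be:

$$ \text{URL} = \text{String.format(source1, externalId)} = \text{https://example.com/price?id=P123} $$

**How to verify**:
- Add logging: print the final request URL in `PriceCrawlerService`. This line only appears if DEBUG logging is enabled for that logger.
  ```java
  logger.debug("Requesting URL: {}", fullUrl);
  ```
- Test manually: open `https://example.com/price?id=P123` in a browser and check whether it returns valid data.

---

#### 2. **Check parameter encoding**

If the target site requires parameter values to be encoded, check whether `externalId` needs encoding. Note that `URLEncoder` applies `application/x-www-form-urlencoded` rules, so a space becomes `+` rather than `%20`. Query strings normally accept `+`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets; // encode(String, Charset) requires Java 10+

String encodedId = URLEncoder.encode(externalId, StandardCharsets.UTF_8);
String fullUrl = String.format(sourceUrl, encodedId);
```

---

#### 3. **Make the mock test cover the URL**

In the current test code, `JsoupWrapper` returns mock HTML directly and never checks whether the URL is valid:

```java
class JsoupWrapper {
    Document fetchDocument(String url) throws IOException {
        // Returns fixed HTML; the URL is never actually requested
        String html = "<html><div class='price'>¥299.00</div></html>";
        return Jsoup.parse(html);
    }
}
```

**Improvement**: verify in the test that the correct URL is passed. This assumes the service fetches pages through an injected `JsoupWrapper`:

```java
@MockBean
private JsoupWrapper jsoupWrapper; // replaces the JsoupWrapper bean from TestConfig with a mock

@Test
void shouldBuildCorrectUrl() throws Exception {
    // Mock the product
    Product mockProduct = new Product();
    mockProduct.setExternalId("P123");
    when(productRepository.findById(1L)).thenReturn(Optional.of(mockProduct));

    // Invoke the crawler logic
    priceCrawlerService.crawlPrices(1L);

    // crawlPrices runs asynchronously, so wait before checking the generated URL
    await().atMost(5, TimeUnit.SECONDS).untilAsserted(() ->
            verify(jsoupWrapper).fetchDocument("https://example.com/price?id=P123"));
}
```

---

#### 4. **Verify on the server side**

If the target site is a real service rather than an example domain:
1. **Check the API documentation**: confirm that the path and parameter names are correct. For example, the endpoint might expect `/price?productId=P123`.
2. **Check request headers**: some sites reject requests that lack headers such as `User-Agent` or `Referer`, usually with 403 but sometimes with 404.

---

### III. Adjusted Code After the Fix

Add URL encoding and logging to `PriceCrawlerService`. The sketch below returns early for a missing product instead of throwing. This matches `shouldHandleMissingProduct`, which expects the call not to throw.

```java
// Assumes injected fields: Environment env, ProductRepository productRepository,
// JsoupWrapper jsoupWrapper, and an SLF4J Logger logger.
// Requires java.net.URLEncoder and java.nio.charset.StandardCharsets (Java 11+ for Optional.isEmpty).
public void crawlPrices(Long productId) {
    Optional<Product> found = productRepository.findById(productId);
    if (found.isEmpty()) {
        logger.warn("Product not found: {}", productId); // skip instead of throwing
        return;
    }
    Product product = found.get();
    try {
        String sourceUrl = env.getProperty("crawler.sources.source1");
        String encodedId = URLEncoder.encode(product.getExternalId(), StandardCharsets.UTF_8);
        String fullUrl = String.format(sourceUrl, encodedId);
        logger.info("Constructed request URL: {}", fullUrl); // key log line
        Document doc = jsoupWrapper.fetchDocument(fullUrl);
        // ...subsequent parsing logic
    } catch (IOException e) {
        logger.error("[{}] Network request exception: {}", "source1", e.getMessage());
    }
}
```

---

### IV. Security and Stability Enhancements

1. **Retry mechanism**: retry transient failures such as 5xx responses or timeouts. A 404 will not go away on retry, so fail fast on 4xx responses:
   ```java
   // HttpStatusException is org.jsoup.HttpStatusException (a subclass of IOException);
   // the enclosing method must handle IOException and InterruptedException
   Document doc = null;
   int maxRetries = 3;
   for (int i = 0; i < maxRetries; i++) {
       try {
           doc = jsoupWrapper.fetchDocument(fullUrl);
           break;
       } catch (HttpStatusException e) {
           if (e.getStatusCode() < 500 || i == maxRetries - 1) throw e; // 4xx: retrying will not help
           Thread.sleep(1000);
       } catch (IOException e) {
           if (i == maxRetries - 1) throw e;
           Thread.sleep(1000);
       }
   }
   ```
2. **Timeout settings**: configure a request timeout in `JsoupWrapper`:
   ```java
   Document fetchDocument(String url) throws IOException {
       return Jsoup.connect(url)
               .userAgent("Mozilla/5.0") // some sites reject requests without a browser-like User-Agent
               .timeout(10_000)          // request timeout in milliseconds
               .get();
   }
   ```

---

### V. Summary

Follow these steps to make sure the URL is correct:
1. **Check construction locally**: log the generated URL and test it manually.
2. **Encode parameters**: check whether special characters need escaping.
3. **Strengthen mock tests**: verify that the correct URL is passed.
4. **Confirm server-side compatibility**: check the API documentation or contact the data provider.

If the problem persists, share the target site's real domain or API documentation for further analysis!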
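The construction and encoding checks from the URL-verification steps above can be collected into one helper that is easy to unit-test without Spring or the network. This is a minimal sketch, not the project's code. The class name `CrawlerUrls` and the method `buildUrl` are hypothetical. It assumes the template contains exactly one `%s` placeholder and no other `%` characters, as with `crawler.sources.source1` here.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a crawl URL from a template with a single %s placeholder.
// Assumes the template contains no other '%' characters (String.format would reject them).
final class CrawlerUrls {
    private CrawlerUrls() {}

    static String buildUrl(String template, String externalId) {
        if (template == null || !template.contains("%s")) {
            throw new IllegalArgumentException("URL template must contain %s: " + template);
        }
        if (externalId == null || externalId.isBlank()) {
            throw new IllegalArgumentException("externalId must not be blank");
        }
        // Form-encode the id so characters such as spaces or '&' cannot break the query string
        String encodedId = URLEncoder.encode(externalId, StandardCharsets.UTF_8);
        String url = String.format(template, encodedId);
        // URI.create throws IllegalArgumentException if the result is not a valid URI
        URI uri = URI.create(url);
        if (!"http".equals(uri.getScheme()) && !"https".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Unsupported scheme: " + url);
        }
        return url;
    }

    public static void main(String[] args) {
        String template = "https://example.com/price?id=%s";
        System.out.println(buildUrl(template, "P123"));  // https://example.com/price?id=P123
        System.out.println(buildUrl(template, "P 1&2")); // https://example.com/price?id=P+1%262
    }
}
```

This check only proves that the URL is well formed. It cannot tell whether the server actually serves that path, so a well-formed URL on `example.com` still returns 404.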

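In an automated test, one way to check the constructed URL against a server that actually answers is to avoid `example.com`, which has no price endpoint, and instead start a throwaway local HTTP server. This is a minimal sketch using only the JDK's built-in `com.sun.net.httpserver.HttpServer` and `java.net.http.HttpClient` (Java 11+). The class name `StubPriceServer`, the `/price` path, and the `P123` id are assumptions that mirror the question. They are not part of the project.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical local stub: answers /price?id=P123 with price HTML and any other id with 404.
final class StubPriceServer {

    static HttpServer start() throws IOException {
        // Port 0 lets the OS pick a free port; read it back with getAddress().getPort()
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/price", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // decoded query, e.g. "id=P123"
            boolean known = "id=P123".equals(query);
            byte[] body = (known ? "<html><div class='price'>\u00A5299.00</div></html>" : "not found")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(known ? 200 : 404, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Returns the HTTP status code for a URL, e.g. to check a constructed URL before crawling it
    static int statusOf(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // the JDK stub server speaks HTTP/1.1 only
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            System.out.println(statusOf(base + "/price?id=P123")); // 200
            System.out.println(statusOf(base + "/price?id=P999")); // 404
        } finally {
            server.stop(0);
        }
    }
}
```

In the Spring test, the stub's base URL could replace the fixed `@TestPropertySource` value. For example, a `@DynamicPropertySource` method (Spring Framework 5.2.5+) could register `crawler.sources.source1` as `base + "/price?id=%s"`. The crawler would then hit a server whose valid and invalid paths you control.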