Hadoop报错Caused by: java.io.IOException: Stream closed问题

本文详细介绍了在使用Hadoop和Hive时遇到的'java.io.IOException: Stream closed'错误的分析过程和解决方案。问题定位为Hadoop 2.7版本的一个已知问题,该问题在2.8版本得到修复。通过编译源码并替换Hive的lib目录下的jar包,成功解决了问题,避免了升级Hadoop带来的复杂性。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

【一、问题现象】

这几天离线计算平台有个别计算作业运行失败,排查原因,查看平台日志,报错关键信息如下:

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.io.IOException: Stream closed
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: Stream closed
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2639)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2007)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:479)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:469)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:188)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:580)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:578)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:578)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:596)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:559)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:424)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
        ... 11 more
Caused by: java.io.IOException: Stream closed
at java.util.zip.InflaterInputStream.ensureOpen(InflaterInputStream.java:67)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:142)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
        ... 37 more (state=08S01,code=1)
Closing: 0: jdbc:hive2://60.8.1.23:10000

【二、问题定位】

1、根据报错日志,指向为hadoop问题,于是在apache官网选择ALL ISSUES,搜索日志报错关键字:java.io.IOException: Stream closed,记录数较多,一 一比对,发现HADOOP-12404这个ISSUE与问题日志匹配

 

我们生产环境所用Hadoop版本为2.7,该ISSUE在Hadoop 2.8已被fixed,问题描述为:

从Configuration类中的URL加载资源时,请禁用JarURLConnection的缓存,以避免与其他用户共享JarFile。

Configuration类的parse方法将调用url.openStream来获取InputStream供DocumentBuilder进行解析。

根据JDK源代码,调用顺序为 url.openStream => handler.openConnection.getInputStream => new JarURLConnection => JarURLConnection.connect => factory.get(getJarFileURL(),getUseCaches())=> URLJarFile.getInputStream => JarFile.getInputStream => ZipFile.getInputStream

如果URLConnection类的getUseCaches方法返回值为true(默认情况下),则URLJarFile将为同一URL共享。 如果共享的URLJarFile被其他用户关闭,则URLJarFile类的getInputStream方法返回的所有InputStream都将基于文档关闭。

因此,在集群负载较高时,可能会发生该异常。

 

版本2.8对该问题进行了修复,设置用户缓存为false,同时修改了parse函数的返回参数。

 

diff --git hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java

index 0b45429..8801c6c 100644

--- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java

+++ hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java

@@ -34,7 +34,9 @@

 import java.io.Writer;

 import java.lang.ref.WeakReference;

 import java.net.InetSocketAddress;

+import java.net.JarURLConnection;

 import java.net.URL;

+import java.net.URLConnection;

 import java.util.ArrayList;

 import java.util.Arrays;

 import java.util.Collection;

@@ -2531,7 +2533,14 @@ private Document parse(DocumentBuilder builder, URL url)

     if (url == null) {

       return null;

     }

-    return parse(builder, url.openStream(), url.toString());

+

+    URLConnection connection = url.openConnection();

+    if (connection instanceof JarURLConnection) {

+      // Disable caching for JarURLConnection to avoid sharing JarFile

+      // with other users.

+      connection.setUseCaches(false);

+    }

+    return parse(builder, connection.getInputStream(), url.toString());

   }

 

### 解决 IntelliJ IDEA 中 `@Autowired` 注解导致的红色波浪线错误 在使用 Spring 框架时,如果遇到 `@Autowired` 注解下的依赖注入对象显示为红色波浪线错误或者黄色警告的情况,通常是由以下几个原因引起的: #### 1. **Spring 插件未启用** 如果 Spring 支持插件未被激活,则可能导致 IDE 无法识别 `@Autowired` 或其他 Spring 特定的功能。可以通过以下方式解决问题: - 打开设置菜单:`File -> Settings -> Plugins`。 - 确认已安装并启用了名为 “Spring Framework Support” 的官方插件[^1]。 #### 2. **项目配置文件缺失或不正确** Spring 需要通过 XML 文件、Java Config 类或其他形式来定义 Bean 定义。如果没有正确加载这些配置文件,可能会导致 `@Autowired` 报错。 - 确保项目的 `applicationContext.xml` 或者基于 Java 的配置类(带有 `@Configuration` 和 `@Bean` 注解)已被正确定义和引入。 - 对于 Spring Boot 项目,确认是否存在 `spring.factories` 文件以及是否包含了必要的组件扫描路径[^3]。 #### 3. **模块依赖关系问题** 当前模块可能缺少对 Spring Core 或 Context 组件库的有效引用。这可能是由于 Maven/Gradle 构建工具中的依赖项声明不足造成的。 - 检查 `pom.xml` (Maven) 或 `build.gradle` (Gradle),确保包含如下核心依赖之一: ```xml <!-- For Maven --> <dependency> <groupId>org.springframework</groupId> <artifactId>spring-context</artifactId> <version>${spring.version}</version> </dependency> ``` ```gradle // For Gradle implementation 'org.springframework:spring-context:${springVersion}' ``` - 更新项目依赖树以应用更改:右键点击项目根目录 -> `Maven -> Reload Project` 或运行命令 `./gradlew build --refresh-dependencies`。 #### 4. **IDE 缓存损坏** Intellij IDEA 的缓存机制有时会因各种因素而失效,从而引发误报错误。清除缓存可以有效缓解此类情况。 - 使用快捷组合键 `Ctrl + Alt + Shift + S` 进入项目结构对话框;也可以尝试执行操作序列:`File -> Invalidate Caches / Restart... -> Invalidate and Restart`. #### 5. **启动异常影响正常解析** 若之前存在类似 `com.intellij.diagnostic.PluginException` 的严重初始化失败日志记录,则表明某些关键服务未能成功加载,进而干扰到后续功能表现[^2]。建议重新下载最新稳定版本的 IDEA 并按照标准流程完成初次部署工作。 ```java // 示例代码片段展示如何正确运用 @Autowired 注解实现自动装配 @Service public class StudentService { private final Repository repository; public StudentService(@Qualifier("specificRepository") Repository repo){ this.repository = repo; } } @Component class SpecificComponent{ @Autowired private transient StudentService studentService; // 此处应无任何编译期告警现象发生 } ```
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值