Picking up from the previous post on map-side input: the record reader's nextKeyValue() decides whether another record exists, reads it, advances the offset, and assigns the key (pos, the current line's position within the whole file) and the value (the current line's contents). That completes the input and re-definition part of the map stage. The next topic is the map output stage, which is more involved and splits into two branches. The first branch is the one without a reduce phase: in day-to-day MR work the simplest optimization is to avoid reduce wherever possible, because a reduce phase implies a shuffle phase, and shuffle is the most expensive part of the entire MR pipeline; most present-day MR tuning aims at reducing or eliminating the cost shuffle imposes on the job. I will not go into detail here; a later article will cover MR tuning properly, so refer to that if you need it. The second branch is the one with a reduce phase, and it is both the most important and the most complex.
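To make the first branch concrete, here is a minimal sketch of a map-only job; it uses only standard MapReduce driver calls, and the class name MapOnlyJobDriver is just a placeholder of mine:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyJobDriver.class);
        job.setMapperClass(Mapper.class);            // identity mapper; substitute your own
        // Zero reducers: no shuffle and no sort; map output is written straight to HDFS
        // through the direct-output path analyzed below.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);   // TextInputFormat (the default) yields offset/line pairs
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}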
Now to the main topic: the source-level analysis of the map output stage. The entry point is the context.write() call inside your own map() method.
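For reference, a typical user Mapper looks like the sketch below (a word-count style mapper of my own, not code taken from this walkthrough); the context.write() call at the end is where we step into the framework:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// A stand-in mapper: emits (word, 1) for every token in the input line.
public class WordSplitMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Text word = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) continue;
            word.set(token);
            context.write(word, ONE);   // <-- the entry point analyzed below
        }
    }
}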
Stepping into that code, you find it actually calls mapContext's write method.
That write method was wired in when the MapContext was initialized (covered in the previous article): depending on whether a reduce phase exists, one of two output implementations is chosen and handed to the mapContext.
So the question becomes: what exactly is this output at this point?
When there is no reduce phase, the chosen implementation is NewDirectOutputCollector: map output is written straight out to HDFS through the output format's record writer, with no sorting involved. The source for this direct-output path follows:
/**
* Source for the direct-output path:
* initialization of the direct output collector (NewDirectOutputCollector)
*/
NewDirectOutputCollector(MRJobConfig jobContext,
JobConf job, TaskUmbilicalProtocol umbilical, TaskReporter reporter)
throws IOException, ClassNotFoundException, InterruptedException {
/***************** cluster reporting / counter plumbing; not our focus here *******************/
this.reporter = reporter;
mapOutputRecordCounter = reporter
.getCounter(TaskCounter.MAP_OUTPUT_RECORDS);
fileOutputByteCounter = reporter
.getCounter(FileOutputFormatCounter.BYTES_WRITTEN);
List<Statistics> matchedStats = null;
if (outputFormat instanceof org.apache.hadoop.mapreduce.lib.output.FileOutputFormat) {
matchedStats = getFsStatistics(org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
.getOutputPath(taskContext), taskContext.getConfiguration());
}
fsStats = matchedStats;
/*****************end*******************/
long bytesOutPrev = getOutputBytes(fsStats);
/**
* Get the record writer from the current output format class.
* Just as the input format class has a default record reader, the output format class
* has a default record writer; for TextOutputFormat that is:
* out = new LineRecordWriter<K, V>(fileOut, keyValueSeparator);
*/
out = outputFormat.getRecordWriter(taskContext);
long bytesOutCurr = getOutputBytes(fsStats);
fileOutputByteCounter.increment(bytesOutCurr - bytesOutPrev);
}
/**
* The actual write path.
* @param key
* @param value
* @throws IOException
* @throws InterruptedException
*/
@Override
@SuppressWarnings("unchecked")
public void write(K key, V value)
throws IOException, InterruptedException {
reporter.progress();
long bytesOutPrev = getOutputBytes(fsStats);
/**
* For TextOutputFormat this calls LineRecordWriter's write method.
*/
out.write(key, value);
long bytesOutCurr = getOutputBytes(fsStats);
fileOutputByteCounter.increment(bytesOutCurr - bytesOutPrev);
mapOutputRecordCounter.increment(1);
}
//Next, step into LineRecordWriter to look at its write implementation. Note that on this path the data never passes through the map-side sort buffer: the key and value are written out directly.
public synchronized void write(K key, V value)
throws IOException {
boolean nullKey = key == null || key instanceof NullWritable;
boolean nullValue = value == null || value instanceof NullWritable;
if (nullKey && nullValue) {
return;
}
if (!nullKey) {
writeObject(key);
}
/**
* Either side may legitimately be null or NullWritable (e.g. a job that emits only values), so the separator is written only when both key and value are present.
*/
if (!(nullKey || nullValue)) {
//here out is a DataOutputStream (which extends FilterOutputStream)
out.write(keyValueSeparator);
}
if (!nullValue) {
writeObject(value);
}
out.write(newline);
}
Going any deeper is not really necessary: below this point it is simply the HDFS output stream doing the writing, so I will stop here for the direct-output branch.
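For completeness, the RecordWriter used by this direct path comes from whatever output format the job configures; a small sketch, assuming TextOutputFormat (which is also the default) and an illustrative output directory of my own:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class DirectOutputConfig {
    // TextOutputFormat is what hands back the LineRecordWriter shown above;
    // the key/value separator defaults to a tab and can be overridden.
    public static void configure(Job job) {
        job.setOutputFormatClass(TextOutputFormat.class);
        job.getConfiguration().set("mapreduce.output.textoutputformat.separator", "\t");
        FileOutputFormat.setOutputPath(job, new Path("/tmp/direct-output"));  // illustrative path
    }
}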
Now for the main event: what does the output path look like when a reduce phase does exist?
First, be clear that the output here has changed:
output = new NewOutputCollector(taskContext, job, umbilical, reporter);
The next piece of code shows what actually determines the number of reduces, and what the partitioner is for:
/**
* Initialize a sortable collector; sorting is supported here so that the map output comes out ordered.
* Initialize the partitioner: if no custom partitioner is configured, HashPartitioner is used by default.
* If the number of reduces is not set it defaults to one, and the partitioner then always returns partition 0.
*/
NewOutputCollector(org.apache.hadoop.mapreduce.JobContext jobContext,
JobConf job,
TaskUmbilicalProtocol umbilical,
TaskReporter reporter
) throws IOException, ClassNotFoundException {
//the container that stores key, value and partition: MapOutputCollector -> MapOutputBuffer
collector = createSortingCollector(job, reporter);
/**
* The number of partitions equals the configured number of reduce tasks,
* and the number of reduce tasks can be set explicitly by the user,
* so reduce-side parallelism is effectively under the user's control.
* Overriding the Partitioner lets you steer how keys are distributed across partitions and so avoid data skew.
*/
partitions = jobContext.getNumReduceTasks();
if (partitions > 1) {
//custom partitioner: extend org.apache.hadoop.mapreduce.Partitioner<K,V>; a partitioner built from a data sample can go a long way toward preventing skew during MR
//the default is HashPartitioner
partitioner = (org.apache.hadoop.mapreduce.Partitioner<K, V>)
ReflectionUtils.newInstance(jobContext.getPartitionerClass(), job);
} else {
//single reducer: the anonymous partitioner always returns partition 0 (partitions - 1)
partitioner = new org.apache.hadoop.mapreduce.Partitioner<K, V>() {
@Override
public int getPartition(K key, V value, int numPartitions) {
return partitions - 1;
}
};
}
}
Remember this line: partitions = jobContext.getNumReduceTasks();
The number of partitions is determined by the number of reduces, and the number of reduces is whatever we set in our own driver code. In other words, reduce parallelism is under our control, but it pays to sample the data before choosing it: too many partitions and many of them sit empty and useless; too few and reduce parallelism is too low, leaving the job running for a long time.
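To make both points concrete, here is a hedged sketch of setting the reduce count and plugging in a custom partitioner; the class and the "hot key" are my own illustration, and the fallback formula is the same one the default HashPartitioner uses, (key.hashCode() & Integer.MAX_VALUE) % numPartitions:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// A stand-in partitioner: routes one known hot key to its own reducer and
// hashes everything else, a simple way of spreading out skewed data.
public class SkewAwarePartitioner extends Partitioner<Text, IntWritable> {
    private static final String HOT_KEY = "the";   // assumed hot key, purely for illustration

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (numPartitions == 1) {
            return 0;
        }
        if (HOT_KEY.equals(key.toString())) {
            return numPartitions - 1;              // dedicate the last partition to the hot key
        }
        // same formula as the default HashPartitioner, restricted to the remaining partitions
        return (key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
    }
}

// In the driver:
//   job.setNumReduceTasks(8);                          // partitions == 8
//   job.setPartitionerClass(SkewAwarePartitioner.class);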
The next important piece is the creation of the sortable collector:
private <KEY, VALUE> MapOutputCollector<KEY, VALUE>
createSortingCollector(JobConf job, TaskReporter reporter)
throws IOException, ClassNotFoundException {
MapOutputCollector.Context context =
new MapOutputCollector.Context(this, job, reporter);
/**
* The collector's default implementation is MapOutputBuffer.
*/
Class<?>[] collectorClasses = job.getClasses(
JobContext.MAP_OUTPUT_COLLECTOR_CLASS_ATTR, MapOutputBuffer.class);
int remainingCollectors = collectorClasses.length;
for (Class clazz : collectorClasses) {
try {
if (!MapOutputCollector.class.isAssignableFrom(clazz)) {
throw new IOException("Invalid output collector class: " + clazz.getName() +
" (does not implement MapOutputCollector)");
}
Class<? extends MapOutputCollector> subclazz =
clazz.asSubclass(MapOutputCollector.class);
LOG.debug("Trying map output collector class: " + subclazz.getName());
MapOutputCollector<KEY, VALUE> collector =
ReflectionUtils.newInstance(subclazz, job);
//collector initialization
/**
 * Several things happen in init():
 * 1. The size of the ring buffer is determined (default 100 MB) along with the spill threshold (default 0.8).
 * 2. The sort algorithm is chosen; unless the user specifies otherwise, QuickSort is used to sort by key.
 * 3. A sort comparator must be determined: lexicographic, numeric, or user-defined.
 *    If no comparator is configured, the comparator registered for the key's type is used,
 *    so a custom (non built-in) key type must come with its own comparator.
 */
collector.init(context);
LOG.info("Map output collector class = " + collector.getClass().getName());
return collector;
} catch (Exception e) {
String msg = "Unable to initialize MapOutputCollector " + clazz.getName();
if (--remainingCollectors > 0) {
msg += " (" + remainingCollectors + " more collector(s) to try)";
}
LOG.warn(msg, e);
}
}
throw new IOException("Unable to initialize any output collector");
}
Next comes collector initialization, i.e. the initialization of MapOutputBuffer.
There are a few highlights here: the size of the ring buffer, the spill percentage, the sort comparator, and the combiner.
The ring buffer size is one direction for later MR tuning: the larger the buffer, the larger each spill file, which cuts down the number of small files produced.
The second highlight is the combiner, essentially a map-side reduce: it pre-aggregates the current map's output before shuffle pulls the data away, and map-side aggregation is another standard MR optimization.
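A hedged sketch of those knobs as they would appear in a driver, using the current property names; the values are illustrative rather than recommendations, and the combiner/comparator classes are placeholders:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MapOutputTuning {
    public static Job newTunedJob(Configuration conf) throws Exception {
        conf.setInt("mapreduce.task.io.sort.mb", 200);            // ring buffer size (JobContext.IO_SORT_MB, default 100)
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.8f);  // spill threshold (MAP_SORT_SPILL_PERCENT, default 0.8)
        Job job = Job.getInstance(conf, "map-output tuning example");
        // Map-side pre-aggregation; ExampleReducer would be your own reducer class.
        // job.setCombinerClass(ExampleReducer.class);
        // Custom ordering of map output keys; ExampleComparator would extend WritableComparator.
        // job.setSortComparatorClass(ExampleComparator.class);
        return job;
    }
}
With that in mind, here is the init() source: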
public void init(MapOutputCollector.Context context
) throws IOException, ClassNotFoundException {
job = context.getJobConf();
reporter = context.getReporter();
mapTask = context.getMapTask();
mapOutputFile = mapTask.getMapOutputFile();
sortPhase = mapTask.getSortPhase();
spilledRecordsCounter = reporter.getCounter(TaskCounter.SPILLED_RECORDS);
partitions = job.getNumReduceTasks();
rfs = ((LocalFileSystem) FileSystem.getLocal(job)).getRaw();
//sanity checks
/**
* Determine the ring buffer size and the spill threshold.
*/
final float spillper =
job.getFloat(JobContext.MAP_SORT_SPILL_PERCENT, (float) 0.8); // spill threshold (default 0.8)
final int sortmb = job.getInt(JobContext.IO_SORT_MB, 100); // ring buffer size in MB (default 100)
indexCacheMemoryLimit = job.getInt(JobContext.INDEX_CACHE_MEMORY_LIMIT,
INDEX_CACHE_MEMORY_LIMIT_DEFAULT);
if (spillper > (float) 1.0 || spillper <= (float) 0.0) {
throw new IOException("Invalid \"" + JobContext.MAP_SORT_SPILL_PERCENT +
"\": " + spillper);
}
if ((sortmb & 0x7FF) != sortmb) {
throw new IOException(
"Invalid \"" + JobContext.IO_SORT_MB + "\": " + sortmb);
}
/**
* In-memory sort. This is the only place in the entire MR pipeline where data goes from
* unordered to ordered; every later sort is a merge built on top of this one.
* The algorithm can be chosen via map.sort.class; if unspecified, QuickSort is the default.
*/
sorter = ReflectionUtils.newInstance(job.getClass("map.sort.class",
QuickSort.class, IndexedSorter.class), job);
// buffers and accounting
int maxMemUsage = sortmb << 20;
maxMemUsage -= maxMemUsage % METASIZE;
kvbuffer = new byte[maxMemUsage];
bufvoid = kvbuffer.length;
kvmeta = ByteBuffer.wrap(kvbuffer)
.order(ByteOrder.nativeOrder())
.asIntBuffer();
setEquator(0);
bufstart = bufend = bufindex = equator;
kvstart = kvend = kvindex;
maxRec = kvmeta.capacity() / NMETA;
softLimit = (int) (kvbuffer.length * spillper);
bufferRemaining = softLimit;
LOG.info(JobContext.IO_SORT_MB + ": " + sortmb);
LOG.info("soft limit at " + softLimit);
LOG.info("bufstart = " + bufstart + "; bufvoid = " + bufvoid);
LOG.info("kvstart = " + kvstart + "; length = " + maxRec);
// k/v serialization
/**
* Define the sort comparator for keys.
* If none is configured explicitly, getOutputKeyComparator() falls back to
* WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class), this)
* with the key class coming from getClass(JobContext.MAP_OUTPUT_KEY_CLASS, null, Object.class),
* i.e. the comparator registered for the key's own type.
* This is why a custom key class must define its ordering (or register a comparator).
*/
comparator = job.getOutputKeyComparator();
keyClass = (Class<K>) job.getMapOutputKeyClass();
valClass = (Class<V>) job.getMapOutputValueClass();
serializationFactory = new SerializationFactory(job);
keySerializer = serializationFactory.getSerializer(keyClass);
keySerializer.open(bb);
valSerializer = serializationFactory.getSerializer(valClass);
valSerializer.open(bb);
// output counters
mapOutputByteCounter = reporter.getCounter(TaskCounter.MAP_OUTPUT_BYTES);
mapOutputRecordCounter =
reporter.getCounter(TaskCounter.MAP_OUTPUT_RECORDS);
fileOutputByteCounter = reporter
.getCounter(TaskCounter.MAP_OUTPUT_MATERIALIZED_BYTES);
// compression
if (job.getCompressMapOutput()) {
Class<? extends CompressionCodec> codecClass =
job.getMapOutputCompressorClass(DefaultCodec.class);
codec = ReflectionUtils.newInstance(codecClass, job);
} else {
codec = null;
}
/**
* Check whether a combiner is configured; if so, instantiate it.
*/
// combiner
final Counters.Counter combineInputCounter =
reporter.getCounter(TaskCounter.COMBINE_INPUT_RECORDS);
combinerRunner = CombinerRunner.create(job, getTaskID(),
combineInputCounter,
reporter, null);
if (combinerRunner != null) {
final Counters.Counter combineOutputCounter =
reporter.getCounter(TaskCounter.COMBINE_OUTPUT_RECORDS);
combineCollector = new CombineOutputCollector<K, V>(combineOutputCounter, reporter, job);
} else {
combineCollector = null;
}
spillInProgress = false;
minSpillsForCombine = job.getInt(JobContext.MAP_COMBINE_MIN_SPILLS, 3);
/**
* Start the spill thread as a daemon thread.
* Once started it loops forever: wait, detect whether a spill is needed, spill.
* During a spill it locks the part of the ring buffer being spilled and releases the lock when the spill finishes.
*/
spillThread.setDaemon(true);
spillThread.setName("SpillThread");
spillLock.lock();
try {
spillThread.start();
while (!spillThreadRunning) {
spillDone.await();
}
} catch (InterruptedException e) {
throw new IOException("Spill thread failed to initialize", e);
} finally {
spillLock.unlock();
}
if (sortSpillException != null) {
throw new IOException("Spill thread failed to initialize",
sortSpillException);
}
}
That completes the collector initialization. The next stage is writing into the buffer and spilling. I have not annotated this part in as much detail, so I will mostly let the source speak for itself.
The method entry point: the important thing is that three values go into the buffer for every record: key, value, and partition.
/**
 * context.write() ends up calling output's write method; when the number of reduces is
 * non-zero, output is NewOutputCollector, and its write hands three things to the collector:
 * 1. the key
 * 2. the value
 * 3. the partition the key belongs to, computed by the partitioner chosen at initialization
 * The collector here is the in-memory ring buffer: collector == MapOutputBuffer.
 *
 * @param key
 * @param value
 * @throws IOException
 * @throws InterruptedException
 */
@Override
public void write(K key, V value) throws IOException, InterruptedException {
collector.collect(key, value,
partitioner.getPartition(key, value, partitions));
}
The next piece of code is the process of writing data into the buffer:
/**
* Serialize the key, value to intermediate storage.
* When this method returns, kvindex must refer to sufficient unused
* storage to store one METADATA.
*/
public synchronized void collect(K key, V value, final int partition
) throws IOException {
reporter.progress();
if (key.getClass() != keyClass) {
throw new IOException("Type mismatch in key from map: expected "
+ keyClass.getName() + ", received "
+ key.getClass().getName());
}
if (value.getClass() != valClass) {
throw new IOException("Type mismatch in value from map: expected "
+ valClass.getName() + ", received "
+ value.getClass().getName());
}
if (partition < 0 || partition >= partitions) {
throw new IOException("Illegal partition for " + key + " (" +
partition + ")");
}
checkSpillException();
bufferRemaining -= METASIZE;
if (bufferRemaining <= 0) {
// start spill if the thread is not running and the soft limit has been
// reached
spillLock.lock();
try {
do {
if (!spillInProgress) {
final int kvbidx = 4 * kvindex;
final int kvbend = 4 * kvend;
// serialized, unspilled bytes always lie between kvindex and
// bufindex, crossing the equator. Note that any void space
// created by a reset must be included in "used" bytes
final int bUsed = distanceTo(kvbidx, bufindex);
final boolean bufsoftlimit = bUsed >= softLimit;
if ((kvbend + METASIZE) % kvbuffer.length !=
equator - (equator % METASIZE)) {
// spill finished, reclaim space
resetSpill();
bufferRemaining = Math.min(
distanceTo(bufindex, kvbidx) - 2 * METASIZE,
softLimit - bUsed) - METASIZE;
continue;
} else if (bufsoftlimit && kvindex != kvend) {
// spill records, if any collected; check latter, as it may
// be possible for metadata alignment to hit spill pcnt
startSpill();
final int avgRec = (int)
(mapOutputByteCounter.getCounter() /
mapOutputRecordCounter.getCounter());
// leave at least half the split buffer for serialization data
// ensure that kvindex >= bufindex
final int distkvi = distanceTo(bufindex, kvbidx);
final int newPos = (bufindex +
Math.max(2 * METASIZE - 1,
Math.min(distkvi / 2,
distkvi / (METASIZE + avgRec) * METASIZE)))
% kvbuffer.length;
setEquator(newPos);
bufmark = bufindex = newPos;
final int serBound = 4 * kvend;
// bytes remaining before the lock must be held and limits
// checked is the minimum of three arcs: the metadata space, the
// serialization space, and the soft limit
bufferRemaining = Math.min(
// metadata max
distanceTo(bufend, newPos),
Math.min(
// serialization max
distanceTo(newPos, serBound),
// soft limit
softLimit)) - 2 * METASIZE;
}
}
} while (false);
} finally {
spillLock.unlock();
}
}
try {
// serialize key bytes into buffer
int keystart = bufindex;
keySerializer.serialize(key);
if (bufindex < keystart) {
// wrapped the key; must make contiguous
bb.shiftBufferedKey();
keystart = 0;
}
// serialize value bytes into buffer
final int valstart = bufindex;
valSerializer.serialize(value);
// It's possible for records to have zero length, i.e. the serializer
// will perform no writes. To ensure that the boundary conditions are
// checked and that the kvindex invariant is maintained, perform a
// zero-length write into the buffer. The logic monitoring this could be
// moved into collect, but this is cleaner and inexpensive. For now, it
// is acceptable.
bb.write(b0, 0, 0);
// the record must be marked after the preceding write, as the metadata
// for this record are not yet written
int valend = bb.markRecord();
mapOutputRecordCounter.increment(1);
mapOutputByteCounter.increment(
distanceTo(keystart, valend, bufvoid));
// write accounting info
kvmeta.put(kvindex + PARTITION, partition);
kvmeta.put(kvindex + KEYSTART, keystart);
kvmeta.put(kvindex + VALSTART, valstart);
kvmeta.put(kvindex + VALLEN, distanceTo(valstart, valend));
// advance kvindex
kvindex = (kvindex - NMETA + kvmeta.capacity()) % kvmeta.capacity();
} catch (MapBufferTooSmallException e) {
LOG.info("Record too large for in-memory buffer: " + e.getMessage());
spillSingleRecord(key, value, partition);
mapOutputRecordCounter.increment(1);
return;
}
}
Next comes the sort-and-spill (sortAndSpill) stage:
protected class SpillThread extends Thread {
@Override
public void run() {
spillLock.lock();
spillThreadRunning = true;
try {
while (true) {
spillDone.signal();
while (!spillInProgress) {
spillReady.await();
}
try {
spillLock.unlock();
sortAndSpill();
} catch (Throwable t) {
sortSpillException = t;
} finally {
spillLock.lock();
if (bufend < bufstart) {
bufvoid = kvbuffer.length;
}
kvstart = kvend;
bufstart = bufend;
spillInProgress = false;
}
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} finally {
spillLock.unlock();
spillThreadRunning = false;
}
}
}
In this code the only thing we need to focus on is sortAndSpill():
/**
 * The sort-and-spill stage:
 * an in-memory sort that turns unordered data into ordered data
 * (quicksort here, merge sort later during mergeParts),
 * ordered first by partition and then by key within each partition.
 * @throws IOException
 * @throws ClassNotFoundException
 * @throws InterruptedException
 */
private void sortAndSpill() throws IOException, ClassNotFoundException,
InterruptedException {
//approximate the length of the output file to be the length of the
//buffer + header lengths for the partitions
final long size = distanceTo(bufstart, bufend, bufvoid) +
partitions * APPROX_HEADER_LENGTH;
FSDataOutputStream out = null;
try {
// create spill file
final SpillRecord spillRec = new SpillRecord(partitions);
/**
* Spill file name: String.format(SPILL_FILE_PATTERN, conf.get(JobContext.TASK_ATTEMPT_ID), spillNumber)
* with SPILL_FILE_PATTERN = "%s_spill_%d.out"
*/
final Path filename =
mapOutputFile.getSpillFileForWrite(numSpills, size);
/**
* create the spill file ("%s_spill_%d.out") on the local file system
*/
out = rfs.create(filename);
final int mstart = kvend / NMETA;
final int mend = 1 + // kvend is a valid record
(kvstart >= kvend
? kvstart
: kvmeta.capacity() + kvstart) / NMETA;
/**
* In-memory sort of the record metadata: records compare first by partition,
* then by key using the configured comparator.
*/
sorter.sort(MapOutputBuffer.this, mstart, mend, reporter);
int spindex = mstart;
final IndexRecord rec = new IndexRecord();
final InMemValBytes value = new InMemValBytes();
for (int i = 0; i < partitions; ++i) {
IFile.Writer<K, V> writer = null;
try {
long segmentStart = out.getPos();
FSDataOutputStream partitionOut = CryptoUtils.wrapIfNecessary(job, out);
/**
* after the sort, write the records out partition by partition
*/
writer = new Writer<K, V>(job, partitionOut, keyClass, valClass, codec,
spilledRecordsCounter);
if (combinerRunner == null) {
// spill directly
DataInputBuffer key = new DataInputBuffer();
while (spindex < mend &&
kvmeta.get(offsetFor(spindex % maxRec) + PARTITION) == i) {
final int kvoff = offsetFor(spindex % maxRec);
int keystart = kvmeta.get(kvoff + KEYSTART);
int valstart = kvmeta.get(kvoff + VALSTART);
key.reset(kvbuffer, keystart, valstart - keystart);
getVBytesForOffset(kvoff, value);
writer.append(key, value);
++spindex;
}
} else {
int spstart = spindex;
while (spindex < mend &&
kvmeta.get(offsetFor(spindex % maxRec)
+ PARTITION) == i) {
++spindex;
}
// Note: we would like to avoid the combiner if we've fewer
// than some threshold of records for a partition
//(i.e. skip the combiner when a partition has very few records)
if (spstart != spindex) {
/**
* Run the combiner on this partition's records before they are written to the spill file.
* (The related minSpillsForCombine setting, JobContext.MAP_COMBINE_MIN_SPILLS with default 3,
* controls whether the combiner runs again during the final merge in mergeParts.)
*/
combineCollector.setWriter(writer);
RawKeyValueIterator kvIter =
new MRResultIterator(spstart, spindex);
combinerRunner.combine(kvIter, combineCollector);
}
}
// close the writer
writer.close();
// record offsets
rec.startOffset = segmentStart;
rec.rawLength = writer.getRawLength() + CryptoUtils.cryptoPadding(job);
rec.partLength = writer.getCompressedLength() + CryptoUtils.cryptoPadding(job);
spillRec.putIndex(rec, i);
writer = null;
} finally {
if (null != writer) writer.close();
}
}
The sortAndSpill listing above is cut off partway through; the remainder (writing or caching the spill index and incrementing the spill count) is omitted here. Once all records have been processed, output.close() is called, and under the hood its core work is this flush() method:
public void flush() throws IOException, ClassNotFoundException,
InterruptedException {
LOG.info("Starting flush of map output");
spillLock.lock();
try {
while (spillInProgress) {
reporter.progress();
spillDone.await();
}
checkSpillException();
final int kvbend = 4 * kvend;
if ((kvbend + METASIZE) % kvbuffer.length !=
equator - (equator % METASIZE)) {
// spill finished
resetSpill();
}
if (kvindex != kvend) {
kvend = (kvindex + NMETA) % kvmeta.capacity();
bufend = bufmark;
LOG.info("Spilling map output");
LOG.info("bufstart = " + bufstart + "; bufend = " + bufmark +
"; bufvoid = " + bufvoid);
LOG.info("kvstart = " + kvstart + "(" + (kvstart * 4) +
"); kvend = " + kvend + "(" + (kvend * 4) +
"); length = " + (distanceTo(kvend, kvstart,
kvmeta.capacity()) + 1) + "/" + maxRec);
sortAndSpill();
}
} catch (InterruptedException e) {
throw new IOException("Interrupted while waiting for the writer", e);
} finally {
spillLock.unlock();
}
assert !spillLock.isHeldByCurrentThread();
// shut down spill thread and wait for it to exit. Since the preceding
// ensures that it is finished with its work (and sortAndSpill did not
// throw), we elect to use an interrupt instead of setting a flag.
// Spilling simultaneously from this thread while the spill thread
// finishes its work might be both a useful way to extend this and also
// sufficient motivation for the latter approach.
try {
spillThread.interrupt();
spillThread.join();
} catch (InterruptedException e) {
throw new IOException("Spill failed", e);
}
// release sort buffer before the merge
kvbuffer = null;
//merge the spill files
mergeParts();
Path outputPath = mapOutputFile.getOutputFile();
fileOutputByteCounter.increment(rfs.getFileStatus(outputPath).getLen());
}
In other words, the merge process, mergeParts(), kicks off right before the final close.
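One setting worth knowing before reading the source: the number of spill segments merged in a single pass is read as JobContext.IO_SORT_FACTOR in the code below; a small, hedged fragment of my own for raising it:
import org.apache.hadoop.conf.Configuration;

public class MergeFactorConfig {
    // Assumed helper: raise the merge fan-in so more spill segments are merged per pass.
    public static Configuration withMergeFactor(int factor) {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.task.io.sort.factor", factor);  // JobContext.IO_SORT_FACTOR
        return conf;
    }
}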
//merge the spilled data into the final intermediate output file, for the reduce-side shuffle to fetch later
private void mergeParts() throws IOException, InterruptedException,
ClassNotFoundException {
// get the approximate size of the final output/index files
long finalOutFileSize = 0;
long finalIndexFileSize = 0;
final Path[] filename = new Path[numSpills];
final TaskAttemptID mapId = getTaskID();
for (int i = 0; i < numSpills; i++) {
filename[i] = mapOutputFile.getSpillFile(i);
finalOutFileSize += rfs.getFileStatus(filename[i]).getLen();
}
if (numSpills == 1) { //the spill is the final output
sameVolRename(filename[0],
mapOutputFile.getOutputFileForWriteInVolume(filename[0]));
if (indexCacheList.size() == 0) {
sameVolRename(mapOutputFile.getSpillIndexFile(0),
mapOutputFile.getOutputIndexFileForWriteInVolume(filename[0]));
} else {
indexCacheList.get(0).writeToFile(
mapOutputFile.getOutputIndexFileForWriteInVolume(filename[0]), job);
}
sortPhase.complete();
return;
}
// read in paged indices
for (int i = indexCacheList.size(); i < numSpills; ++i) {
Path indexFileName = mapOutputFile.getSpillIndexFile(i);
indexCacheList.add(new SpillRecord(indexFileName, job));
}
//make correction in the length to include the sequence file header
//lengths for each partition
finalOutFileSize += partitions * APPROX_HEADER_LENGTH;
finalIndexFileSize = partitions * MAP_OUTPUT_INDEX_RECORD_LENGTH;
Path finalOutputFile =
mapOutputFile.getOutputFileForWrite(finalOutFileSize);
Path finalIndexFile =
mapOutputFile.getOutputIndexFileForWrite(finalIndexFileSize);
//The output stream for the final single output file
FSDataOutputStream finalOut = rfs.create(finalOutputFile, true, 4096);
if (numSpills == 0) {
//create dummy files
IndexRecord rec = new IndexRecord();
SpillRecord sr = new SpillRecord(partitions);
try {
for (int i = 0; i < partitions; i++) {
long segmentStart = finalOut.getPos();
FSDataOutputStream finalPartitionOut = CryptoUtils.wrapIfNecessary(job, finalOut);
Writer<K, V> writer =
new Writer<K, V>(job, finalPartitionOut, keyClass, valClass, codec, null);
writer.close();
rec.startOffset = segmentStart;
rec.rawLength = writer.getRawLength() + CryptoUtils.cryptoPadding(job);
rec.partLength = writer.getCompressedLength() + CryptoUtils.cryptoPadding(job);
sr.putIndex(rec, i);
}
sr.writeToFile(finalIndexFile, job);
} finally {
finalOut.close();
}
sortPhase.complete();
return;
}
{
sortPhase.addPhases(partitions); // Divide sort phase into sub-phases
IndexRecord rec = new IndexRecord();
final SpillRecord spillRec = new SpillRecord(partitions);
for (int parts = 0; parts < partitions; parts++) {
//create the segments to be merged
List<Segment<K, V>> segmentList =
new ArrayList<Segment<K, V>>(numSpills);
for (int i = 0; i < numSpills; i++) {
IndexRecord indexRecord = indexCacheList.get(i).getIndex(parts);
Segment<K, V> s =
new Segment<K, V>(job, rfs, filename[i], indexRecord.startOffset,
indexRecord.partLength, codec, true);
segmentList.add(i, s);
if (LOG.isDebugEnabled()) {
LOG.debug("MapId=" + mapId + " Reducer=" + parts +
"Spill =" + i + "(" + indexRecord.startOffset + "," +
indexRecord.rawLength + ", " + indexRecord.partLength + ")");
}
}
int mergeFactor = job.getInt(JobContext.IO_SORT_FACTOR, 100);
// sort the segments only if there are intermediate merges
boolean sortSegments = segmentList.size() > mergeFactor;
//merge
@SuppressWarnings("unchecked")
RawKeyValueIterator kvIter = Merger.merge(job, rfs,
keyClass, valClass, codec,
segmentList, mergeFactor,
new Path(mapId.toString()),
job.getOutputKeyComparator(), reporter, sortSegments,
null, spilledRecordsCounter, sortPhase.phase(),
TaskType.MAP);
//write merged output to disk
long segmentStart = finalOut.getPos();
FSDataOutputStream finalPartitionOut = CryptoUtils.wrapIfNecessary(job, finalOut);
Writer<K, V> writer =
new Writer<K, V>(job, finalPartitionOut, keyClass, valClass, codec,
spilledRecordsCounter);
if (combinerRunner == null || numSpills < minSpillsForCombine) {
Merger.writeFile(kvIter, writer, reporter, job);
} else {
combineCollector.setWriter(writer);
combinerRunner.combine(kvIter, combineCollector);
}
//close
writer.close();
sortPhase.startNextPhase();
// record offsets
rec.startOffset = segmentStart;
rec.rawLength = writer.getRawLength() + CryptoUtils.cryptoPadding(job);
rec.partLength = writer.getCompressedLength() + CryptoUtils.cryptoPadding(job);
spillRec.putIndex(rec, parts);
}
spillRec.writeToFile(finalIndexFile, job);
finalOut.close();
for (int i = 0; i < numSpills; i++) {
rfs.delete(filename[i], true);
}
}
}
That wraps up the map output stage, which means the map phase itself is essentially done.