split

The split command is useful to divide the input into smaller parts based on the number of lines, bytes, file size, etc. You can also execute another command on the divided parts before saving the results. An example use case is sending a large file as multiple parts as a workaround for online transfer size limits.

info Since a lot of output files will be generated in this chapter (often with the same filenames), remove these files after every illustration.

Default split

By default, the split command divides the input 1000 lines at a time. The newline character is the default line separator. You can pass a single file or stdin data as the input. Use cat if you need to concatenate multiple input sources.

By default, the output files will be named xaa, xab, xac and so on (where x is the prefix). If the filenames are exhausted, two more letters will be appended and the pattern will continue as needed. If the number of input lines is not evenly divisible, the last file will contain fewer than 1000 lines.

# divide input 1000 lines at a time
$ seq 10000 | split

# output filenames
$ ls x*
xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj

# preview of some of the output files
$ head -n1 xaa xab xae xaj
==> xaa <==
1

==> xab <==
1001

==> xae <==
4001

==> xaj <==
9001

$ rm x*
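As a small sketch of the uneven case described above, the following splits 2500 lines with the default settings. It assumes GNU coreutils and uses a scratch directory created with mktemp (an assumption made here so that cleanup is a single step).

```shell
# work in a scratch directory so the x* output files are easy to remove
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# 2500 lines, default of 1000 lines per output file
seq 2500 | split

# number of output files and the line count of the last one
num_files=$(ls x* | wc -l)
last_count=$(wc -l < xac)
echo "$num_files files, last file has $last_count lines"

# clean up
cd - > /dev/null && rm -r "$tmpdir"
```

The first two files hold 1000 lines each and the third holds the remaining 500.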

info warning As mentioned earlier, remove the output files after every illustration.

Change number of lines

You can use the -l option to change the number of lines to be saved in each output file.

# maximum of 3 lines at a time
$ split -l3 purchases.txt

$ head x*
==> xaa <==
coffee
tea
washing powder

==> xab <==
coffee
toothpaste
tea

==> xac <==
soap
tea

Split by byte count

The -b option allows you to split the input by the number of bytes. Similar to line based splitting, you can always reconstruct the input by concatenating the output files. This option also accepts suffixes such as K for 1024 bytes, KB for 1000 bytes, M for 1024 * 1024 bytes and so on.

# maximum of 15 bytes at a time
$ split -b15 greeting.txt

$ head x*
==> xaa <==
Hi there
Have a
==> xab <==
 nice day

# when you concatenate the output files, you'll get the original input
$ cat x*
Hi there
Have a nice day
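Here's a hedged sketch of the K suffix and of reconstructing the input. It assumes GNU coreutils; the data.bin filename and the 3000 byte input size are made up for illustration.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# 3000 bytes of sample data (NUL bytes from /dev/zero)
head -c 3000 /dev/zero > data.bin

# 1K means 1024 bytes per output file
split -b1K data.bin

# sizes in bytes: 1024, 1024 and the remaining 952
sizes=$(wc -c < xaa; wc -c < xab; wc -c < xac)
echo $sizes

# concatenating the parts in order gives back the original input
roundtrip=failed
cat x* | cmp - data.bin && roundtrip=ok
echo "$roundtrip"

# clean up
cd - > /dev/null && rm -r "$tmpdir"
```

The shell expands x* in sorted order, which matches the order in which split creates the files.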

The -C option is similar to the -b option, but it will try to break on line boundaries if possible. Each output file gets as many complete lines as will fit within the given byte limit. Here's an example where input lines do not exceed the given byte limit:

$ split -C20 purchases.txt

$ head x*
==> xaa <==
coffee
tea

==> xab <==
washing powder

==> xac <==
coffee
toothpaste

==> xad <==
tea
soap
tea

$ wc -c x*
11 xaa
15 xab
18 xac
13 xad
57 total

If a line exceeds the given limit, it will be broken down into multiple parts:

$ printf 'apple\nbanana\n' | split -C4

$ head x*
==> xaa <==
appl
==> xab <==
e

==> xac <==
bana
==> xad <==
na

$ cat x*
apple
banana
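The following sketch checks the two properties of -C described above for an input whose lines are all shorter than the limit: no part exceeds the byte limit and no line is broken. It assumes GNU coreutils; the nums.txt filename and the limit of 50 bytes are arbitrary choices for illustration.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# every line is at most 4 bytes (including the newline), well under the limit
seq 100 > nums.txt
split -C50 nums.txt

max_size=0
all_complete=yes
for f in x*; do
    # track the largest part
    size=$(wc -c < "$f")
    if [ "$size" -gt "$max_size" ]; then
        max_size=$size
    fi
    # command substitution strips a trailing newline, so the result is
    # empty only when the last byte of the file is a newline
    if [ -n "$(tail -c1 "$f")" ]; then
        all_complete=no
    fi
done
echo "largest part: $max_size bytes, all lines intact: $all_complete"

# clean up
cd - > /dev/null && rm -r "$tmpdir"
```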

Divide based on file size

The -n option has several features. If you pass only a numeric argument N, the given input file will be divided into N chunks. The output files will be roughly the same size.

# divide the file into 2 parts
$ split -n2 purchases.txt
$ head x*
==> xaa <==
coffee
tea
washing powder
co
==> xab <==
ffee
toothpaste
tea
soap
tea

# the two output files are roughly the same size
$ wc x*
 3  5 28 xaa
 5  5 29 xab
 8 10 57 total

warning Since the division is based on file size, stdin data cannot be used. Newer versions of the coreutils package support this use case by creating a temporary file before splitting.

$ seq 6 | split -n2
split: -: cannot determine file size
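If your version reports the above error, one possible workaround is to save the stdin data to a regular file yourself and split that file. This is only a sketch: saved_input.txt is a hypothetical filename and GNU coreutils is assumed.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# save the stdin data to a regular file first, then split that file
seq 6 > saved_input.txt
split -n2 saved_input.txt

# the 12 byte input is divided into two chunks of 6 bytes each
first=$(cat xaa)
second=$(cat xab)
printf '%s\n---\n%s\n' "$first" "$second"

# clean up
cd - > /dev/null && rm -r "$tmpdir"
```

Here the first file gets the lines 1 to 3 and the second file gets the lines 4 to 6, since the chunk boundary happens to fall on a line boundary.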

By using K/N as the argument, you can view the Kth chunk of N parts on stdout. No output file will be created in this scenario.

# divide the input into 2 parts
# view only the 1st chunk on stdout
$ split -n1/2 greeting.txt
Hi there
Hav

To avoid splitting a line, use l/ as a prefix. Quoting from the manual:

For l mode, chunks are approximately input size / N. The input is partitioned into N equal sized portions, with the last assigned any excess. If a line starts within a partition it is written completely to the corresponding file. Since lines or records are not split even if they overlap a partition, the files written can be larger or smaller than the partition size, and even empty if a line/record is so long as to completely overlap the partition.

# divide input into 2 parts, but don't split lines
$ split -nl/2 purchases.txt
$ head x*
==> xaa <==
coffee
tea
washing powder
coffee

==> xab <==
toothpaste
tea
soap
tea

Here's an example to view the Kth chunk without splitting lines:

# 2nd chunk of 3 parts without splitting lines
$ split -nl/2/3 sample.txt
7) Believe it
8)
9) banana
10) papaya
11) mango
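The l/ behavior can be checked with a short sketch like the one below, which confirms that every part ends on a line boundary and that the parts concatenate back to the input. It assumes GNU coreutils; nums.txt is a made-up filename.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

seq 10 > nums.txt

# 3 parts, lines are kept whole
split -nl/3 nums.txt
num_parts=$(ls x* | wc -l)

# a non-empty part must end with a newline if no line was split
lines_intact=yes
for f in x*; do
    if [ -s "$f" ] && [ -n "$(tail -c1 "$f")" ]; then
        lines_intact=no
    fi
done

# the parts still add up to the original input
roundtrip=failed
cat x* | cmp - nums.txt && roundtrip=ok
echo "$num_parts parts, lines intact: $lines_intact, roundtrip: $roundtrip"

# clean up
cd - > /dev/null && rm -r "$tmpdir"
```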

Interleaved lines

The -n option will also help you create output files with interleaved lines. Since this is based on the line separator and not file size, stdin data can also be used. Use the r/ prefix to enable this feature.

# two parts, lines distributed in round robin fashion
$ seq 5 | split -nr/2
$ head x*
==> xaa <==
1
3
5

==> xab <==
2
4

Here's an example to view the Kth chunk:

$ split -nr/1/3 sample.txt
1) Hello World
4) How are you
7) Believe it
10) papaya
13) Much ado about nothing
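Here's one more sketch of the round robin distribution, this time with three parts. It assumes GNU coreutils; nums.txt is a made-up filename, and paste is used only to show each part on a single line.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

seq 7 > nums.txt

# three parts, lines distributed in round robin fashion
split -nr/3 nums.txt
part1=$(paste -sd, xaa)   # 1,4,7
part2=$(paste -sd, xab)   # 2,5
part3=$(paste -sd, xac)   # 3,6
echo "$part1 | $part2 | $part3"

# view only the 2nd of the 3 parts on stdout
second_only=$(split -nr/2/3 nums.txt | paste -sd, -)
echo "$second_only"

# clean up
cd - > /dev/null && rm -r "$tmpdir"
```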

Custom line separator

You can use the -t option to specify a single byte character as the line separator. Use \0 to specify NUL as the separator. Depending on your shell you can use ANSI-C quoting to use escapes like \t instead of a literal tab character.

$ printf 'apple\nbanana\n;mango\npapaya\n' | split -t';' -l1
$ head x*
==> xaa <==
apple
banana
;
==> xab <==
mango
papaya
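As a sketch of the NUL separator case mentioned above, the example below splits NUL separated records two at a time. It assumes GNU coreutils; tr is used only to make the NUL characters visible as commas.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# three NUL terminated records, two records per output file
printf 'apple\0banana\0cherry\0' | split -t '\0' -l2

# translate NUL to comma for display purposes
part1=$(tr '\0' ',' < xaa)   # apple,banana,
part2=$(tr '\0' ',' < xab)   # cherry,
echo "$part1"
echo "$part2"

# clean up
cd - > /dev/null && rm -r "$tmpdir"
```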

Customize filenames

As seen earlier, x is the default prefix for output filenames. To change this prefix, pass an argument after the input source.

# choose prefix as 'op_' instead of 'x'
$ split -l1 greeting.txt op_
$ head op_*
==> op_aa <==
Hi there

==> op_ab <==
Have a nice day

The -a option controls the length of the suffix. You'll get an error if this length isn't enough to cover all the output files. In such a case, you'll still get output files that can fit within the given length.

$ seq 10 | split -l1 -a1
$ ls x*
xa  xb  xc  xd  xe  xf  xg  xh  xi  xj
$ rm x*

$ seq 10 | split -l1 -a3
$ ls x*
xaaa  xaab  xaac  xaad  xaae  xaaf  xaag  xaah  xaai  xaaj
$ rm x*

$ seq 100 | split -l1 -a1
split: output file suffixes exhausted
$ ls x*
xa  xc  xe  xg  xi  xk  xm  xo  xq  xs  xu  xw  xy
xb  xd  xf  xh  xj  xl  xn  xp  xr  xt  xv  xx  xz
$ rm x*

You can use the -d option to use numeric suffixes, starting from 00 (length can be changed using the -a option). You can use the long option --numeric-suffixes to specify a different starting number.

$ seq 10 | split -l1 -d
$ ls x*
x00  x01  x02  x03  x04  x05  x06  x07  x08  x09
$ rm x*

$ seq 10 | split -l2 --numeric-suffixes=10
$ ls x*
x10  x11  x12  x13  x14
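Here's a brief sketch of combining -d with -a to get longer numeric suffixes, assuming GNU coreutils:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# numeric suffixes that are three digits long
seq 3 | split -l1 -d -a3

# the glob expands in sorted order: x000 x001 x002
names=$(echo x*)
echo "$names"

# clean up
cd - > /dev/null && rm -r "$tmpdir"
```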

Use the -x and --hex-suffixes options for hexadecimal numbering.

$ seq 10 | split -l1 --hex-suffixes=8
$ ls x*
x08  x09  x0a  x0b  x0c  x0d  x0e  x0f  x10  x11

You can use the --additional-suffix option to add a constant string at the end of filenames.

$ seq 10 | split -l2 -a1 --additional-suffix='.log'
$ ls x*
xa.log  xb.log  xc.log  xd.log  xe.log
$ rm x*

$ seq 10 | split -l2 -a1 -d --additional-suffix='.txt' - num_
$ ls num_*
num_0.txt  num_1.txt  num_2.txt  num_3.txt  num_4.txt

Exclude empty files

You can sometimes end up with empty files, for example, when you try to split the input into more parts than possible with the given criteria. In such cases, you can use the -e option to prevent empty files in the output. The split command will ensure that the filenames are sequential even if files in the middle are empty.

# 'xac' is empty in this example
$ split -nl/3 greeting.txt
$ head x*
==> xaa <==
Hi there

==> xab <==
Have a nice day

==> xac <==
$ rm x*

# prevent empty files
$ split -e -nl/3 greeting.txt
$ head x*
==> xaa <==
Hi there

==> xab <==
Have a nice day

Process parts through another command

The --filter option will allow you to apply another command on the intermediate split results before saving the output files. Use $FILE to refer to the output filename of the intermediate parts. Here's an example of compressing the results:

$ split -l1 --filter='gzip > $FILE.gz' greeting.txt

$ ls x*
xaa.gz  xab.gz

$ zcat xaa.gz
Hi there
$ zcat xab.gz
Have a nice day

Here's an example of ignoring the first line of the results:

$ cat body_sep.txt
%=%=
apple
banana
%=%=
red
green

$ split -l3 --filter='tail -n +2 > $FILE' body_sep.txt

$ head x*
==> xaa <==
apple
banana

==> xab <==
red
green
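Here's one more sketch of the --filter option, where the command saves only the line count of each part. It assumes GNU coreutils, and the .count extension is an arbitrary choice. The filter is placed within single quotes so that $FILE is passed literally to split instead of being expanded by your shell.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# each part is passed to the filter command via stdin
# only the files created by the filter are saved
seq 10 | split -l4 --filter='wc -l > $FILE.count'

# line counts of the three parts: 4 4 2
counts=$(cat xaa.count xab.count xac.count | paste -sd' ' -)
echo "$counts"

# clean up
cd - > /dev/null && rm -r "$tmpdir"
```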

Exercises

info The exercises directory has all the files used in this section.

info Remove the output files after every exercise.

1) Split the s1.txt file 3 lines at a time.

##### add your solution here

$ head xa?
==> xaa <==
apple
coffee
fig

==> xab <==
honey
mango
pasta

==> xac <==
sugar
tea

$ rm xa?

2) Use appropriate options to get the output shown below.

$ echo 'apple,banana,cherry,dates' | ##### add your solution here

$ head xa?
==> xaa <==
apple,
==> xab <==
banana,
==> xac <==
cherry,
==> xad <==
dates

$ rm xa?

3) What do the -b and -C options do?

4) Display the 2nd chunk of the ip.txt file after dividing it into 4 parts as shown below.

##### add your solution here
come back before the sky turns dark

There are so many delights to cherish

5) What does the r prefix do when used with the -n option?

6) Split the ip.txt file 2 lines at a time. Customize the output filenames as shown below.

##### add your solution here

$ head ip_*
==> ip_0.txt <==
it is a warm and cozy day
listen to what I say

==> ip_1.txt <==
go play in the park
come back before the sky turns dark

==> ip_2.txt <==

There are so many delights to cherish

==> ip_3.txt <==
Apple, Banana and Cherry
Bread, Butter and Jelly

==> ip_4.txt <==
Try them all before you perish

$ rm ip_*

7) Which option would you use to prevent empty files in the output?

8) Split the items.txt file 5 lines at a time. Additionally, remove lines starting with a digit character as shown below.

$ cat items.txt
1) fruits
apple 5
banana 10
2) colors
green
sky blue
3) magical beasts
dragon 3
unicorn 42

##### add your solution here

$ head xa?
==> xaa <==
apple 5
banana 10
green

==> xab <==
sky blue
dragon 3
unicorn 42

$ rm xa?