I/O with NumPy



I/O in NumPy refers to input/output operations, allowing you to save and load arrays to and from files. Functions like np.save() and np.load() handle binary files, while np.savetxt() and np.loadtxt() work with text files.

Reading and Writing Text Files

Text files are one of the most common ways to store data. NumPy provides functions to read from and write to text files. The primary functions for these operations are numpy.loadtxt() and numpy.savetxt().

The numpy.loadtxt() Function

The numpy.loadtxt() function loads data from a text file into a NumPy array. It can handle different data types and allows customization of delimiters and skipping of header lines.

Example: Reading a Text File

In the following example, we are loading data from a text file in NumPy using the np.loadtxt() function −

import numpy as np

# Create a sample text file
with open('data.txt', 'w') as f:
   f.write("1.0 2.0 3.0\n4.0 5.0 6.0\n7.0 8.0 9.0")

# Load the data from the text file
data = np.loadtxt('data.txt')
print("Loaded data from text file:")
print(data)

Following is the output obtained −

Loaded data from text file:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
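
The customization options mentioned above can be sketched as follows. This is a minimal illustration, assuming a comma-separated input with a single header line; the sample data and the in-memory io.StringIO buffer (used here in place of a file on disk) are assumptions made for the example.

```python
import io
import numpy as np

# In-memory stand-in for a text file with one header line (hypothetical data)
text = io.StringIO("x,y,z\n1.0,2.0,3.0\n4.0,5.0,6.0\n")

# Skip the header line, split fields on commas, and keep only columns 0 and 2
data = np.loadtxt(text, delimiter=",", skiprows=1, usecols=(0, 2))
print(data)
```

Here delimiter sets the field separator, skiprows drops leading lines, and usecols selects which columns are read.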

The numpy.savetxt() Function

The numpy.savetxt() function saves a NumPy array to a text file. It allows customization of the delimiter, format, and header information for the output file.

Example: Writing to a Text File

In this example, we are saving a NumPy array to a text file using the np.savetxt() function −

import numpy as np

# Create an array
data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

# Save the array to a text file
np.savetxt('output.txt', data, delimiter=' ', fmt='%1.1f')
print("Data saved to text file 'output.txt'")

The data is saved to the file 'output.txt' with a space as the delimiter and one digit after the decimal point −

Data saved to text file 'output.txt'
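
To see what the header and format options do to the written text, the sketch below writes to an in-memory io.StringIO buffer instead of a file, so the result can be printed directly. The array values and the header text "a b" are assumptions chosen for illustration.

```python
import io
import numpy as np

data = np.array([[1.0, 2.0], [3.0, 4.0]])

# Write to an in-memory buffer so the resulting text can be inspected
buffer = io.StringIO()
np.savetxt(buffer, data, delimiter=" ", fmt="%.2f", header="a b")
print(buffer.getvalue())

# The header line is written with a leading "# ", which loadtxt treats as a comment
buffer.seek(0)
restored = np.loadtxt(buffer)
print(restored)
```

Because the header is written as a comment line, the same text can be read back with np.loadtxt() without any extra arguments.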

Reading and Writing Binary Files

Binary files are generally more efficient than text files for storing large datasets because they are more compact and faster to read and write. NumPy provides the numpy.save() and numpy.load() functions for binary I/O operations.

The numpy.save() Function

The numpy.save() function saves a NumPy array to a binary file in .npy format. This function preserves the array's data, shape, and dtype for efficient storage and retrieval.

Example: Writing to a Binary File

In the example below, we are saving a NumPy array to a binary file using the np.save() function −

import numpy as np

# Create an array
data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

# Save the array to a binary file
np.save('data.npy', data)
print("Data saved to binary file 'data.npy'")

Following is the output of the above code −

Data saved to binary file 'data.npy'

The numpy.load() Function

The numpy.load() function loads arrays from binary files saved in .npy or .npz format. It efficiently restores the saved data, including its shape and dtype.

Example: Reading a Binary File

In this example, we are loading an array from a binary file in NumPy using the np.load() function −

import numpy as np

# Load the array from the binary file
data = np.load('data.npy')
print("Loaded data from binary file:")
print(data)

The output obtained is as shown below −

Loaded data from binary file:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
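
The claim that shape and dtype survive a save/load round trip can be checked with a short sketch. It assumes a small int16 array and a temporary directory for the file; both are choices made for this illustration.

```python
import os
import tempfile
import numpy as np

# A small array with a non-default dtype (chosen for illustration)
original = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)

# Save to a temporary location and load it back
path = os.path.join(tempfile.mkdtemp(), "roundtrip.npy")
np.save(path, original)
restored = np.load(path)

# The dtype and shape are stored in the .npy file alongside the data
print(restored.dtype, restored.shape)
print(np.array_equal(original, restored))
```

A text file written with np.savetxt() would not retain the int16 dtype; np.loadtxt() reads values back as float64 unless told otherwise.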

Handling CSV Files

Comma-Separated Values (CSV) files are widely used for data storage and exchange. NumPy can read and write CSV files using the numpy.genfromtxt() and numpy.savetxt() functions.

The numpy.genfromtxt() Function

The numpy.genfromtxt() function loads data from a text file, handling missing values and non-numeric data. It is more flexible than the loadtxt() function, allowing for advanced customization such as filling missing values and specifying data types.

Example: Reading a CSV File

In the following example, we are creating a CSV file with sample data and saving it. Then, we use np.genfromtxt() to load the data from the CSV file into a NumPy array and print it −

import numpy as np

# Create a sample CSV file
with open('data.csv', 'w') as f:
   f.write("1.0,2.0,3.0\n4.0,5.0,6.0\n7.0,8.0,9.0")

# Load the data from the CSV file
data = np.genfromtxt('data.csv', delimiter=',')
print("Loaded data from CSV file:")
print(data)

After executing the above code, we get the following output −

Loaded data from CSV file:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
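
The example above has no gaps in its data. The following sketch shows how genfromtxt() might be used when some fields are empty; the sample data with two missing entries and the fill value of 0.0 are assumptions made for illustration.

```python
import io
import numpy as np

# In-memory CSV with two empty fields (hypothetical data)
text = io.StringIO("1.0,2.0,3.0\n4.0,,6.0\n7.0,8.0,\n")

# By default, empty fields in float columns are read as nan
with_nan = np.genfromtxt(text, delimiter=",")
print(with_nan)

# Rewind the buffer and replace missing entries with a chosen value instead
text.seek(0)
filled = np.genfromtxt(text, delimiter=",", filling_values=0.0)
print(filled)
```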

Example: Writing to a CSV File

In this example, we are creating a NumPy array and saving it to a CSV file using np.savetxt(). The data is saved with a delimiter of commas and formatted to one decimal place −

import numpy as np

# Create an array
data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

# Save the array to a CSV file
np.savetxt('output.csv', data, delimiter=',', fmt='%1.1f')
print("Data saved to CSV file 'output.csv'")

We get the output as shown below −

Data saved to CSV file 'output.csv'

Working with NumPy's NPZ Files

NumPy's NPZ format allows you to save multiple arrays in a single file, making it convenient to store related data together. You can use the numpy.savez() and numpy.load() functions for these operations.

The numpy.savez() Function

The numpy.savez() function saves multiple NumPy arrays into a single uncompressed .npz file. It allows you to store the arrays under custom names and retrieve them later by name. To compress the archive, use numpy.savez_compressed() instead.

Example: Writing to an NPZ File

In the example below, we are creating two NumPy arrays and saving them to an .npz file using the np.savez() function. The arrays are stored with custom names, array1 and array2, in the file −

import numpy as np

# Create arrays
array1 = np.array([1, 2, 3])
array2 = np.array([[4, 5, 6], [7, 8, 9]])

# Save arrays to an NPZ file
np.savez('data.npz', array1=array1, array2=array2)
print("Arrays saved to NPZ file 'data.npz'")

Following is the output obtained −

Arrays saved to NPZ file 'data.npz'

Example: Reading from an NPZ File

In this example, we are loading the arrays stored in the .npz file using the np.load() function. We access and print the arrays "array1" and "array2" by referencing their names from the loaded file −

import numpy as np

# Load the arrays from the NPZ file
data = np.load('data.npz')
print("Loaded array1 from NPZ file:")
print(data['array1'])
print("Loaded array2 from NPZ file:")
print(data['array2'])

The result produced is as follows −

Loaded array1 from NPZ file:
[1 2 3]
Loaded array2 from NPZ file:
[[4 5 6]
 [7 8 9]]

Handling Large Datasets with Memory Mapping

Memory mapping allows you to work with large datasets that do not fit into memory by mapping a file on disk to an array-like object, so that only the parts you access are read. NumPy provides numpy.memmap for this purpose.

Example: Using Memory Mapping

In the following example, we create a large NumPy array and write its raw bytes to a binary file using the tofile() method. We then use np.memmap() to memory-map the file, which lets us access the data without loading it entirely into memory, and print the first 10 elements. The file is written with tofile() rather than np.save() because a .npy file begins with a header, which np.memmap() would misinterpret as array data −

import numpy as np

# Create a large array and write its raw bytes to a binary file
data = np.arange(1e7)
data.tofile('large_data.dat')

# Memory map the binary file
mmapped_data = np.memmap('large_data.dat', dtype='float64', mode='r', shape=(int(1e7),))
print("Memory-mapped data:")
print(mmapped_data[:10])

The output of the above code is shown below −

Memory-mapped data:
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
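
For files written by np.save(), np.load() accepts an mmap_mode argument that memory-maps the array while accounting for the .npy header. The sketch below is one way this might look; the temporary path and the array size of one million elements are assumptions made for illustration.

```python
import os
import tempfile
import numpy as np

# Save an array in .npy format to a temporary location
path = os.path.join(tempfile.mkdtemp(), "large_data.npy")
np.save(path, np.arange(1e6))

# Memory-map the .npy file in read-only mode; the header is skipped automatically
mapped = np.load(path, mmap_mode="r")
print(type(mapped).__name__)
print(mapped[:5])
```

The returned object is a numpy.memmap, so slices are read from disk on demand rather than loading the whole file.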