NumPy - Writing Data to Files



Writing Data to Files with NumPy

Writing data to files involves saving information from a program to a storage medium, such as a text file, CSV, or binary file. This process allows data to be preserved, shared, and reused later.

In this tutorial, we will explore various methods to write data to files using NumPy, including saving data in text and binary formats, using different delimiters, and handling large datasets.

Why Save Data to Files?

We save data to files for the following reasons −

  • Preserve Data: Store your data for future analysis or record-keeping.
  • Share Data: Share your data with others in a standard format.
  • Efficiency: Avoid recomputing results by saving intermediate or final results.
  • Backup: Create backups of important data to prevent data loss.

Writing Data to Text Files

Text files are a common format for storing data because they are human-readable and easy to share. NumPy provides several functions to write arrays to text files.

Using numpy.savetxt() Function

The numpy.savetxt function is used to write a NumPy array to a text file. It allows customization of the delimiter, format, and header information for the output file.

Example

In the following example, we are creating a sample 2D NumPy array and writing it to a text file using np.savetxt() function. The array's contents are saved in the file 'data.txt' −

import numpy as np

# Creating a sample NumPy array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Writing the array to a text file
np.savetxt('data.txt', data)
print("Data written to 'data.txt'.")

Following is the output obtained −

Data written to 'data.txt'.
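The example above prints only a confirmation message, so it can help to look at what actually lands in the file. The following is a minimal sketch, assuming a temporary directory and a made-up file name 'data_header.txt' (neither comes from the example above). It uses the header parameter of np.savetxt() and reads the file back as plain text to show the header line and the default number format −

```python
import os
import tempfile

import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

# Write into a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    # 'data_header.txt' is a hypothetical file name used only for this sketch
    path = os.path.join(tmp_dir, 'data_header.txt')

    # The header string becomes the first line, prefixed with '# ' by default
    np.savetxt(path, data, header='a b c')

    # Read the file back as plain text to inspect it
    with open(path) as f:
        lines = f.read().splitlines()

print(lines[0])  # the header line
print(lines[1])  # the first data row, written with the default '%.18e' format
```

Because the default format is scientific notation with 18 decimal places, even integer arrays are written as long floating-point strings unless fmt is set.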

Specifying a Delimiter

You can specify a delimiter to separate the elements in the text file, making it easier to parse the data later. For example, setting delimiter=',' will use commas to separate the values.

Example

In the following example, we are creating a 2D NumPy array and writing it to a text file using np.savetxt() function with a comma delimiter −

import numpy as np

# Create a 2D array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Writing the array to a text file with a comma delimiter
np.savetxt('data_comma.txt', data, delimiter=',')
print("Data written to 'data_comma.txt'.")

The output obtained is as shown below −

Data written to 'data_comma.txt'.
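Whatever delimiter is used for writing must also be given when the file is read back. The sketch below illustrates this with a semicolon delimiter and np.loadtxt(); the temporary directory and the file name 'data_semicolon.txt' are assumptions made for the illustration −

```python
import os
import tempfile

import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

with tempfile.TemporaryDirectory() as tmp_dir:
    # 'data_semicolon.txt' is a hypothetical file name for this sketch
    path = os.path.join(tmp_dir, 'data_semicolon.txt')

    # Write with a semicolon between values
    np.savetxt(path, data, delimiter=';')

    # Read back using the same delimiter
    restored = np.loadtxt(path, delimiter=';')

print(restored)
print(restored.dtype)  # loadtxt returns floating-point values by default
```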

Formatting Data

You can also format the data using the fmt parameter. This is useful when you want to control the appearance of the data, such as the number of decimal places, the width of the columns, or the specific formatting of each element.

Example

In this example, we are creating a 2D NumPy array and writing it to a text file using np.savetxt() with formatted data. The array's elements are saved in the file 'data_formatted.txt' with two decimal places −

import numpy as np

# Create a 2D array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Writing the array to a text file with formatted data
np.savetxt('data_formatted.txt', data, fmt='%0.2f')
print("Data written to 'data_formatted.txt'.")

After executing the above code, we get the following output −

Data written to 'data_formatted.txt'.
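The fmt parameter also accepts a sequence with one format per column. The following sketch, which uses made-up sample values and a hypothetical file name 'data_columns.txt', writes the first column as integers and the other two with different numbers of decimal places −

```python
import os
import tempfile

import numpy as np

# Sample values chosen only for illustration
data = np.array([[1, 2.5, 3.14159], [4, 5.5, 6.28318]])

with tempfile.TemporaryDirectory() as tmp_dir:
    # 'data_columns.txt' is a hypothetical file name for this sketch
    path = os.path.join(tmp_dir, 'data_columns.txt')

    # One format string per column: integer, one decimal, three decimals
    np.savetxt(path, data, fmt=['%d', '%.1f', '%.3f'], delimiter=',')

    with open(path) as f:
        rows = f.read().splitlines()

for row in rows:
    print(row)
```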

Writing Data to CSV Files

CSV (Comma-Separated Values) files are a popular format for storing tabular data. The numpy.savetxt() function writes CSV files when given a comma delimiter, and the numpy.genfromtxt() function reads them back into NumPy arrays, with extra options for handling missing values.

Example: Using numpy.savetxt() Function

In the example below, we are creating a 2D NumPy array and writing it to a CSV file using np.savetxt() function with a comma delimiter −

import numpy as np

# Create a 2D array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Writing the array to a CSV file
np.savetxt('data.csv', data, delimiter=',')
print("Data written to 'data.csv'.")

The result produced is as follows −

Data written to 'data.csv'.

Example: Using numpy.genfromtxt() Function

In the following example, we are creating a 2D NumPy array and saving it to a CSV file using np.savetxt() function. We then read the CSV file back into a NumPy array using np.genfromtxt() function −

import numpy as np

# Create a 2D array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Writing the array to a CSV file
np.savetxt('data.csv', data, delimiter=',')

# Reading the CSV file into a NumPy array
data_from_csv = np.genfromtxt('data.csv', delimiter=',')
print(data_from_csv)

We get the output as shown below −

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
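One reason to reach for np.genfromtxt() is its handling of missing fields. The sketch below is an illustration only: it reads from an in-memory string (csv_text, invented for this sketch) instead of a file, and compares the default behaviour with the filling_values parameter −

```python
import io

import numpy as np

# Made-up CSV content with an empty field in the second row
csv_text = "1,2,3\n4,,6\n7,8,9\n"

# By default, a missing numeric field becomes nan
with_nan = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

# filling_values replaces missing fields with a chosen value
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=',', filling_values=0)

print(with_nan)
print(filled)
```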

Writing Data to Binary Files

Binary files are more efficient for storing large datasets as they consume less space and can be read and written faster. NumPy provides functions to write arrays to binary files in both .npy and .npz formats.

Using numpy.save() Function

The numpy.save function is used to write a single NumPy array to a binary file in .npy format. This format is compact and preserves the data types and structure of the array.

Example

In this example, the numpy.save() function writes the array to a binary .npy file, and the numpy.load() function loads it back −

import numpy as np

# Create a 2D array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Save the array to a binary file
np.save('data.npy', data)

# Load the data from the binary file
loaded_data = np.load('data.npy')

print("Loaded data from binary file:")
print(loaded_data)

Following is the output obtained −

Loaded data from binary file:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
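Notice that the loaded values are integers, unlike the floating-point values returned when reading text files. The following sketch makes the comparison explicit, assuming an int16 array and hypothetical file names in a temporary directory −

```python
import os
import tempfile

import numpy as np

# An array with a non-default integer type, chosen for illustration
data = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names used only in this sketch
    npy_path = os.path.join(tmp_dir, 'typed.npy')
    txt_path = os.path.join(tmp_dir, 'typed.txt')

    # Binary round trip: the dtype is stored in the .npy header
    np.save(npy_path, data)
    from_npy = np.load(npy_path)

    # Text round trip: only the printed values are stored
    np.savetxt(txt_path, data, fmt='%d')
    from_txt = np.loadtxt(txt_path)

print(from_npy.dtype)  # same dtype as the original array
print(from_txt.dtype)  # loadtxt falls back to its default float type
```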

Using numpy.savez() Function

The numpy.savez function is used to write multiple arrays to a single .npz file. The archive it creates is not compressed; use numpy.savez_compressed() when you want compression. Each array is stored under a name you choose and can be retrieved by that name.

Example

In the example below, we are creating two NumPy arrays and saving them to a single .npz file using np.savez() function, with the arrays stored under custom names "array1" and "array2" −

import numpy as np

# Create arrays
array1 = np.array([1, 2, 3])
array2 = np.array([[4, 5, 6], [7, 8, 9]])

# Save arrays to an NPZ file
np.savez('data.npz', array1=array1, array2=array2)
print("Arrays saved to 'data.npz'.")

This will produce the following result −

Arrays saved to 'data.npz'.
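When file size matters, np.savez_compressed() takes the same arguments as np.savez() and compresses the archive. The sketch below compares the two on an array of zeros, which is an assumption chosen because it compresses very well; real data will usually shrink less. The file names are hypothetical −

```python
import os
import tempfile

import numpy as np

# Highly repetitive data, chosen so that compression has a visible effect
zeros = np.zeros((1000, 1000))

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names used only in this sketch
    plain_path = os.path.join(tmp_dir, 'plain.npz')
    compressed_path = os.path.join(tmp_dir, 'compressed.npz')

    np.savez(plain_path, zeros=zeros)
    np.savez_compressed(compressed_path, zeros=zeros)

    plain_size = os.path.getsize(plain_path)
    compressed_size = os.path.getsize(compressed_path)

print("Uncompressed size in bytes:", plain_size)
print("Compressed size in bytes:", compressed_size)
```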

Using numpy.load() Function

The numpy.load() function loads NumPy arrays from a binary file, typically in .npy or .npz format. It allows you to restore saved arrays and access their data for further processing.

Example

In the following example, we are loading arrays from the data.npz file using np.load() function and accessing them by their custom names "array1" and "array2" −

import numpy as np

# Load arrays from an NPZ file
data = np.load('data.npz')

# Access and print the arrays
print("Loaded array1:")
print(data['array1'])
print("Loaded array2:")
print(data['array2'])

Following is the output of the above code −

Loaded array1:
[1 2 3]
Loaded array2:
[[4 5 6]
 [7 8 9]]
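For .npz files, np.load() returns an archive object that keeps the file open until it is closed. A minimal sketch of one way to handle this is shown below: it uses a with statement so the file is closed automatically, and the files attribute to list the stored names. The temporary directory and the file name 'arrays.npz' are assumptions for the illustration −

```python
import os
import tempfile

import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([[4, 5, 6], [7, 8, 9]])

with tempfile.TemporaryDirectory() as tmp_dir:
    # 'arrays.npz' is a hypothetical file name for this sketch
    path = os.path.join(tmp_dir, 'arrays.npz')
    np.savez(path, array1=array1, array2=array2)

    # The with statement closes the archive when the block ends
    with np.load(path) as archive:
        names = list(archive.files)   # names of the stored arrays
        first = archive['array1']     # arrays are read when accessed

print(names)
print(first)
```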

Handling Large Datasets

When working with large datasets, you might encounter memory limitations. To handle such scenarios, NumPy offers various methods, including the use of memory-mapped files.

Memory-mapped files allow you to access small segments of large files stored on disk, without the need to load the entire file into memory at once. This approach enables more efficient memory management when dealing with large datasets.

Example

In the following example, we create a memory-mapped file to store a large array, write data to it, and then read the data back without loading the entire file into memory at once −

import numpy as np

# Create a memory-mapped file to store a large array
large_array = np.memmap('large_data.dat', dtype='float32', mode='w+', shape=(10000, 10000))

# Write random data in blocks of 1000 rows, so the full array
# is never held in memory at once
for start in range(0, 10000, 1000):
    large_array[start:start + 1000] = np.random.rand(1000, 10000)

# Flush pending changes to disk before reopening the file
large_array.flush()

# Access the data from the memory-mapped file
large_array_read = np.memmap('large_data.dat', dtype='float32', mode='r', shape=(10000, 10000))

# Print the shape of the array read from the memory-mapped file
print("Shape of the memory-mapped array:", large_array_read.shape)

The output obtained is as shown below −

Shape of the memory-mapped array: (10000, 10000)
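Memory mapping is also available for .npy files through the mmap_mode parameter of np.load(), in which case the shape and data type come from the file header and do not have to be repeated. The sketch below is an illustration on a small array with a hypothetical file name; only the requested slice is copied into memory −

```python
import os
import tempfile

import numpy as np

# A modest array stands in for a large dataset in this sketch
data = np.arange(1_000_000, dtype='float32').reshape(1000, 1000)

with tempfile.TemporaryDirectory() as tmp_dir:
    # 'mapped.npy' is a hypothetical file name for this sketch
    path = os.path.join(tmp_dir, 'mapped.npy')
    np.save(path, data)

    # Open the file as a read-only memory map
    mapped = np.load(path, mmap_mode='r')
    is_memmap = isinstance(mapped, np.memmap)

    # Copy only a small slice into an ordinary in-memory array
    block = np.array(mapped[10:12, :5])

    # Release the mapping before the temporary directory is removed
    del mapped

print(is_memmap)
print(block)
```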

Writing Custom Data Formats

In many situations, working with data requires saving it in a specific format that is tailored to the needs of the application or system you are working with.

NumPy provides the numpy.ndarray.tofile() function and the numpy.fromfile() function for writing custom data formats, which allows you to define how your data should be stored and retrieved.

Using numpy.ndarray.tofile() Function

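The numpy.ndarray.tofile() function is used to write the contents of a NumPy array to a file as raw bytes, or as text when a separator string is passed through the sep parameter. The file stores only the element values, not the array's shape, data type, or byte order, so you need to keep track of these yourself to read the data back correctly.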

This function is useful for saving large arrays in a binary format for efficient storage and retrieval.

Example

In the following example, we are creating a simple NumPy array with integer values and writing it to a binary file using the tofile() function −

import numpy as np

# Create a sample NumPy array
data = np.array([1, 2, 3, 4, 5], dtype='int32')

# Write the array to a binary file
data.tofile('data_binary.dat')
print("Array written to binary file:", data)

The result produced is as follows −

Array written to binary file: [1 2 3 4 5]
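A file written by tofile() contains nothing but the element bytes. The following sketch checks this by comparing file sizes with a .npy file of the same array, which carries a header; the temporary directory and file names are assumptions made for the illustration −

```python
import os
import tempfile

import numpy as np

data = np.array([1, 2, 3, 4, 5], dtype='int32')

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names used only in this sketch
    raw_path = os.path.join(tmp_dir, 'raw.dat')
    npy_path = os.path.join(tmp_dir, 'with_header.npy')

    data.tofile(raw_path)   # raw bytes only
    np.save(npy_path, data) # bytes plus a header describing dtype and shape

    raw_size = os.path.getsize(raw_path)
    npy_size = os.path.getsize(npy_path)

print("Bytes in the array:", data.nbytes)
print("Size of the raw file:", raw_size)
print("Size of the .npy file:", npy_size)
```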

Using numpy.fromfile() Function

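The numpy.fromfile function is used to read binary data from a file and load it into a one-dimensional NumPy array. It allows you to specify the data type (dtype) and, optionally, the number of items to read (count). The shape is not stored in the file, so you reshape the result yourself if needed.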

This method is commonly used for loading large datasets stored in binary format.

Example

In this example, we are reading the binary file 'data_binary.dat' into a NumPy array with the specified data type "int32". Since the binary file contains 5 elements, we load them into a 1D array without reshaping it −

import numpy as np

# Read the array from the binary file using the same dtype it was written with
data_from_binary = np.fromfile('data_binary.dat', dtype='int32')

# Print the loaded array (it is 1D, so no reshaping is needed)
print("Array read from binary file:", data_from_binary)

We get the output as shown below −

Array read from binary file: [1 2 3 4 5]
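For arrays with more than one dimension, the shape has to be restored by hand after reading. The sketch below shows one way to do this, assuming a small 2 x 3 array and a hypothetical file name; it also uses the count parameter to read only the first few items −

```python
import os
import tempfile

import numpy as np

matrix = np.arange(6, dtype='int32').reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    # 'matrix.dat' is a hypothetical file name for this sketch
    path = os.path.join(tmp_dir, 'matrix.dat')
    matrix.tofile(path)

    # fromfile returns a flat array; the dtype must match the one written
    flat = np.fromfile(path, dtype='int32')

    # Restore the original shape, which we have to know in advance
    restored = flat.reshape(2, 3)

    # count limits how many items are read from the start of the file
    first_four = np.fromfile(path, dtype='int32', count=4)

print(restored)
print(first_four)
```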