NumPy - Reading Data from Files



File Reading in NumPy

Reading data from a file means opening it and extracting its contents for further use. Data may be stored in several formats, including plain text, CSV, and binary.

NumPy provides functions that load data from these formats directly into arrays, which can then be used for analysis or processing.

NumPy Functions for Reading Data

Following are the functions used in NumPy to read data from a file −

  • numpy.loadtxt(): Reads data from text files where the values are separated by spaces, commas, or other delimiters.
  • numpy.genfromtxt(): Similar to loadtxt() but more flexible; it can handle missing values and columns of different data types.
  • numpy.load(): Reads binary data from .npy or .npz files.
  • numpy.memmap(): Efficiently maps large binary files to memory without loading the entire file into memory.

Reading Data from Text Files

Text files are simple and widely used for storing data. These files may contain numerical data separated by spaces, tabs, or commas. Let us explore how to read data from text files using NumPy.

Reading Simple Text Files with loadtxt() Function

The numpy.loadtxt() function is used to read simple, well-structured text files. By default, it expects numeric data, parses each value as a float, and splits values on whitespace. A different separator can be given through the delimiter argument.

Example: Reading Data from a Text File

Here, we have created a file with three rows of numbers. The numpy.loadtxt() function reads the file and returns a 2D array, where each row corresponds to a line in the text file −

import numpy as np

# Create a sample text file
with open('data.txt', 'w') as f:
    f.write("1 2 3\n4 5 6\n7 8 9\n")

# Read the data from the text file
data = np.loadtxt('data.txt')

print("Loaded data from text file:")
print(data)

Following is the output obtained −

Loaded data from text file:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
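
Real files often carry a header line or columns you do not need. The following is a minimal sketch of how loadtxt() can skip rows, select columns, and return integers instead of floats. The file contents are hypothetical and are supplied through io.StringIO, which stands in for a file on disk −

```python
import io
import numpy as np

# In-memory stand-in for a text file with a header line (hypothetical contents)
text = io.StringIO("x y z\n1 2 3\n4 5 6\n7 8 9\n")

# Skip the header row, keep only the first and third columns, read as integers
subset = np.loadtxt(text, skiprows=1, usecols=(0, 2), dtype=int)

print(subset)
```

Here, skiprows, usecols, and dtype are standard loadtxt() arguments; the result is a 3x2 integer array holding the x and z columns.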

Custom Delimiters with loadtxt() Function

If your data is separated by commas, tabs, or other characters, pass that character to numpy.loadtxt() through the delimiter argument.

Example

In this example, the file uses commas as separators, and we specify the ',' delimiter in the loadtxt() function −

import numpy as np

# Create a CSV-like text file
with open('data.csv', 'w') as f:
    f.write("1,2,3\n4,5,6\n7,8,9\n")

# Load data with comma as delimiter
data = np.loadtxt('data.csv', delimiter=',')

print("Loaded data from CSV file:")
print(data)

This will produce the following result −

Loaded data from CSV file:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
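
The same argument works for other separators. As a small sketch, the tab-separated data below (hypothetical contents, held in memory with io.StringIO) is read with delimiter='\t', and unpack=True returns one array per column instead of a single 2D array −

```python
import io
import numpy as np

# In-memory stand-in for a tab-separated file (hypothetical contents)
tsv = io.StringIO("1\t2\t3\n4\t5\t6\n")

# unpack=True transposes the result so each column can be assigned to its own name
col_a, col_b, col_c = np.loadtxt(tsv, delimiter='\t', unpack=True)

print(col_a)
print(col_c)
```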

Handling Missing Data with genfromtxt() Function

Sometimes, datasets contain missing or incomplete values. The numpy.genfromtxt() function is more flexible than loadtxt() function and can handle missing data or more complex file structures.

Example: Reading Data with Missing Values

Here, the missing value in the second row is replaced with nan (Not a Number). This is useful when working with real-world datasets where missing data is common −

import numpy as np

# Create a text file with missing values
with open('data_with_missing.csv', 'w') as f:
    f.write("1,2,3\n4,,6\n7,8,9\n")

# Load data, filling missing fields with NaN
data = np.genfromtxt('data_with_missing.csv', delimiter=',', filling_values=np.nan)

print("Loaded data with missing values:")
print(data)

Following is the output of the above code −

Loaded data with missing values:
[[ 1.  2.  3.]
 [ 4. nan  6.]
 [ 7.  8.  9.]]
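
Once the missing entries are stored as nan, they usually need to be located or excluded before computing statistics. The sketch below assumes the same three rows as above (supplied in memory rather than from disk) and uses np.isnan() and np.nanmean(), which is one possible way to handle them −

```python
import io
import numpy as np

# Same contents as the example above, held in memory for illustration
csv_text = io.StringIO("1,2,3\n4,,6\n7,8,9\n")
data = np.genfromtxt(csv_text, delimiter=',')

# Boolean mask marking the missing entries
missing = np.isnan(data)
print("Number of missing values:", missing.sum())

# Column means that ignore nan entries
col_means = np.nanmean(data, axis=0)
print("Column means ignoring nan:", col_means)
```

A plain np.mean() would return nan for the second column, which is why the nan-aware variant is used here.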

Reading Data from Binary Files

Binary files are often used to store data because they are more efficient in terms of space and speed. NumPy supports reading and writing binary files using the numpy.load() and numpy.save() functions. These functions are optimized for storing NumPy arrays in a binary format with .npy extension.

Example

In this example, the numpy.save() function writes the array to a binary .npy file, and the numpy.load() function loads it back. This format is compact and preserves the array's data types and structure −

import numpy as np

# Create a sample array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Save the array to a binary file
np.save('data.npy', data)

# Load the data from the binary file
loaded_data = np.load('data.npy')

print("Loaded data from binary file:")
print(loaded_data)

The output obtained is as shown below −

Loaded data from binary file:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
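
The numpy.load() function also reads .npz archives, which hold several arrays in one file. The following is a minimal sketch using numpy.savez() to create such an archive; the array names and the temporary file path are chosen only for illustration −

```python
import os
import tempfile
import numpy as np

features = np.arange(6).reshape(2, 3)
labels = np.array([0, 1])

# Hypothetical location for the archive
path = os.path.join(tempfile.mkdtemp(), 'arrays.npz')

# Each keyword argument becomes a named array inside the archive
np.savez(path, features=features, labels=labels)

# np.load() returns a dictionary-like object for .npz files
with np.load(path) as archive:
    names = sorted(archive.files)
    loaded_features = archive['features']
    loaded_labels = archive['labels']

print(names)
print(loaded_features)
```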

Memory-Mapped Files with memmap() Function

For working with large datasets that don't fit into memory, NumPy provides memory-mapped arrays using the numpy.memmap() function. This function allows you to read and write large binary files without loading the entire file into memory.

Example: Using Memory-Mapped Files

Memory mapping is ideal for large datasets as it allows you to access parts of the file directly without loading the entire file into memory −

import numpy as np

# Create a large raw binary file (array bytes only, no header)
# Note: files written by np.save() start with a header, so they should be
# opened with np.load(..., mmap_mode='r') rather than mapped as raw bytes
data = np.arange(1e7)
data.tofile('large_data.bin')

# Memory-map the binary file; dtype and shape must match what was written
mmapped_data = np.memmap('large_data.bin', dtype='float64', mode='r', shape=(int(1e7),))

# Access a slice of the data
print("First 10 elements of the memory-mapped data:")
print(mmapped_data[:10])

After executing the above code, we get the following output −

First 10 elements of the memory-mapped data:
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
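
Files written by numpy.save() begin with a small header describing the array, so mapping them as raw bytes would interpret that header as data. For .npy files, numpy.load() accepts mmap_mode='r', which reads the header and memory-maps only the array contents. A minimal sketch, assuming a temporary file path chosen for illustration −

```python
import os
import tempfile
import numpy as np

# Hypothetical location for the .npy file
path = os.path.join(tempfile.mkdtemp(), 'large_data.npy')
np.save(path, np.arange(1_000_000, dtype='float64'))

# The header is parsed by np.load(); the array data stays on disk until accessed
mapped = np.load(path, mmap_mode='r')
print(type(mapped).__name__)
print(mapped.shape)

# Copy a small slice into an ordinary in-memory array
first = np.array(mapped[:5])
print(first)
```

With this approach the dtype and shape come from the file itself, so they do not have to be repeated by hand.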

Working with CSV Files

CSV (Comma-Separated Values) files are commonly used for storing tabular data. NumPy can both read and write them: numpy.loadtxt() and numpy.genfromtxt() read CSV files when delimiter=',' is specified, and numpy.savetxt() writes an array to CSV.
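
CSV files often start with a header row and mix text and numeric columns. As a hedged sketch of how genfromtxt() can handle this, the example below reads hypothetical contents from memory, takes field names from the first row with names=True, and lets NumPy infer a type per column with dtype=None −

```python
import io
import numpy as np

# In-memory stand-in for a CSV file with a header row (hypothetical contents)
csv_text = io.StringIO("name,score\nalice,90.5\nbob,82.0\n")

# names=True uses the first row as field names;
# dtype=None asks genfromtxt to infer a type for each column
records = np.genfromtxt(csv_text, delimiter=',', names=True,
                        dtype=None, encoding='utf-8')

print(records.dtype.names)
print(records['name'])
print(records['score'])
```

The result is a structured array, so each column is accessed by its field name rather than by a numeric index.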

Example: Writing Data to CSV

In the following example, we are creating a 2D NumPy array and writing it to a CSV file using np.savetxt() function. The data is saved with a comma delimiter and formatted as integers −

import numpy as np

# Create a 2D array
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Write data to a CSV file
np.savetxt('output.csv', data, delimiter=',', fmt='%d')

print("Data written to 'output.csv'.")

The result produced is as follows −

Data written to 'output.csv'.
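
To confirm that a file written this way can be read back, the sketch below performs a round trip with np.savetxt() and np.loadtxt(). The temporary file path is an assumption made for illustration −

```python
import os
import tempfile
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Hypothetical location for the CSV file
path = os.path.join(tempfile.mkdtemp(), 'output.csv')
np.savetxt(path, data, delimiter=',', fmt='%d')

# Read the file back, asking for integers instead of the default floats
restored = np.loadtxt(path, delimiter=',', dtype=int)

print(np.array_equal(restored, data))
```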