
NumPy - Reading Data from Files
File Reading in NumPy
Reading data from files involves opening a file and extracting its contents for further use. In Python, files can be of various types, including text files, CSV files, and binary files, and libraries like NumPy and Pandas provide functions to load data from them. NumPy loads the contents directly into arrays, which can then be used for analysis or processing.
NumPy Functions for Reading Data
Following are the functions used in NumPy to read data from a file −
- numpy.loadtxt(): Reads data from text files where the values are separated by spaces, commas, or other delimiters.
- numpy.genfromtxt(): Similar to the loadtxt() function but more flexible, allowing you to handle missing values and different data types.
- numpy.load(): Reads binary data from .npy or .npz files.
- numpy.memmap(): Maps a large raw binary file to memory so that parts of it can be accessed without loading the entire file.
Reading Data from Text Files
Text files are simple and widely used for storing data. These files may contain numerical data separated by spaces, tabs, or commas. Let us explore how to read data from text files using NumPy.
Reading Simple Text Files with loadtxt() Function
The numpy.loadtxt() function is used to read simple, well-structured text files in which every row has the same number of values. By default, it parses each value as a floating-point number and splits fields on whitespace; a different separator can be set with the delimiter parameter.
Example: Reading Data from a Text File
Here, we have created a file with three rows of numbers. The numpy.loadtxt() function reads the file and returns a 2D array, where each row corresponds to a line in the text file −
    import numpy as np

    # Create a sample text file
    with open('data.txt', 'w') as f:
        f.write("1 2 3\n4 5 6\n7 8 9\n")

    # Read the data from the text file
    data = np.loadtxt('data.txt')

    print("Loaded data from text file:")
    print(data)
Following is the output obtained −
    Loaded data from text file:
    [[1. 2. 3.]
     [4. 5. 6.]
     [7. 8. 9.]]
Custom Delimiters with loadtxt() Function
You can also specify a custom delimiter if your data is separated by commas, tabs, or other characters using the numpy.loadtxt() function.
Example
In this example, the file uses commas as separators, and we specify the ',' delimiter in the loadtxt() function −
    import numpy as np

    # Create a CSV-like text file
    with open('data.csv', 'w') as f:
        f.write("1,2,3\n4,5,6\n7,8,9\n")

    # Load data with comma as delimiter
    data = np.loadtxt('data.csv', delimiter=',')

    print("Loaded data from CSV file:")
    print(data)
This will produce the following result −
    Loaded data from CSV file:
    [[1. 2. 3.]
     [4. 5. 6.]
     [7. 8. 9.]]
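Real files often start with a header line or contain columns we do not need. The following is a minimal sketch of how the skiprows, usecols, and dtype parameters of loadtxt() could be combined for such a file. The file name scores.csv, its column names, and the temporary directory are assumptions made up for illustration −

```python
import os
import tempfile

import numpy as np

# Write a small CSV file with a header row into a temporary directory
# (hypothetical file name and columns, used only for this sketch)
path = os.path.join(tempfile.mkdtemp(), 'scores.csv')
with open(path, 'w') as f:
    f.write("id,math,physics\n1,90,85\n2,75,80\n3,60,95\n")

# Skip the header line, keep only the two score columns,
# and parse the values as integers instead of the default float
scores = np.loadtxt(path, delimiter=',', skiprows=1, usecols=(1, 2), dtype=int)

print("Selected columns:")
print(scores)
```

Here skiprows=1 ignores the header, and usecols=(1, 2) selects the second and third columns by their zero-based index.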
Handling Missing Data with genfromtxt() Function
Sometimes, datasets contain missing or incomplete values. The numpy.genfromtxt() function is more flexible than the loadtxt() function and can handle missing data or more complex file structures.
Example: Reading Data with Missing Values
Here, the missing value in the second row is replaced with nan (Not a Number). This is useful when working with real-world datasets where missing data is common −
    import numpy as np

    # Create a text file with missing values
    with open('data_with_missing.csv', 'w') as f:
        f.write("1,2,3\n4,,6\n7,8,9\n")

    # Load data, specifying the missing value
    data = np.genfromtxt('data_with_missing.csv', delimiter=',', filling_values=np.nan)

    print("Loaded data with missing values:")
    print(data)
Following is the output of the above code −
    Loaded data with missing values:
    [[ 1.  2.  3.]
     [ 4. nan  6.]
     [ 7.  8.  9.]]
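Once the data is loaded, the missing entries can be located with numpy.isnan(), and the filling_values parameter can substitute a placeholder other than nan. The sketch below assumes the same three-row file as above, written to a temporary directory, and uses 0 as an arbitrary placeholder chosen for illustration −

```python
import os
import tempfile

import numpy as np

# Recreate the sample file with one empty field (temporary path for this sketch)
path = os.path.join(tempfile.mkdtemp(), 'data_with_missing.csv')
with open(path, 'w') as f:
    f.write("1,2,3\n4,,6\n7,8,9\n")

# With float data, empty fields are loaded as nan
data = np.genfromtxt(path, delimiter=',')

# Boolean mask that is True wherever a value is missing
missing = np.isnan(data)
print("Number of missing values:", missing.sum())

# Load again, this time replacing empty fields with 0 (an arbitrary choice)
filled = np.genfromtxt(path, delimiter=',', filling_values=0)
print(filled)
```

Whether 0, nan, or another value is the right placeholder depends on how the data is analyzed afterwards.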
Reading Data from Binary Files
Binary files are often used to store data because they are more efficient in terms of space and speed. NumPy supports reading and writing binary files using the numpy.load() and numpy.save() functions. These functions are optimized for storing NumPy arrays in a binary format with the .npy extension.
Example
In this example, the numpy.save() function writes the array to a binary .npy file, and the numpy.load() function loads it back. This format is compact and preserves the array's data types and structure −
    import numpy as np

    # Create a sample array
    data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

    # Save the array to a binary file
    np.save('data.npy', data)

    # Load the data from the binary file
    loaded_data = np.load('data.npy')

    print("Loaded data from binary file:")
    print(loaded_data)
The output obtained is as shown below −
    Loaded data from binary file:
    [[1 2 3]
     [4 5 6]
     [7 8 9]]
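The numpy.load() function also reads .npz archives, which hold several arrays in one file. The following sketch assumes the archive is created with numpy.savez(). The array names features and labels, and the file name arrays.npz, are hypothetical −

```python
import os
import tempfile

import numpy as np

# Temporary location for the archive (hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), 'arrays.npz')

# Two sample arrays with made-up names
features = np.arange(6).reshape(2, 3)
labels = np.array([0, 1])

# Store both arrays in a single .npz archive under keyword names
np.savez(path, features=features, labels=labels)

# np.load() returns a dictionary-like object; arrays are read when accessed
with np.load(path) as archive:
    print("Stored arrays:", archive.files)
    loaded_features = archive['features']
    loaded_labels = archive['labels']

print(loaded_features)
print(loaded_labels)
```

Each array is retrieved by the keyword name it was saved under, and the with statement closes the archive file once the arrays have been read.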
Memory-Mapped Files with memmap() Function
For working with large datasets that don't fit into memory, NumPy provides memory-mapped arrays using the numpy.memmap() function. This function allows you to read and write large binary files without loading the entire file into memory.
Example: Using Memory-Mapped Files
Memory mapping is ideal for large datasets as it allows you to access parts of the file directly without loading the entire file into memory. Note that numpy.memmap() treats the file as raw bytes, so the array is written here with tofile() rather than numpy.save(), whose .npy header would otherwise be read as data −
    import numpy as np

    # Create a large raw binary file (no .npy header)
    data = np.arange(1e7)
    data.tofile('large_data.dat')

    # Memory-map the binary file
    mmapped_data = np.memmap('large_data.dat', dtype='float64', mode='r', shape=(int(1e7),))

    # Access a slice of the data
    print("First 10 elements of the memory-mapped data:")
    print(mmapped_data[:10])
After executing the above code, we get the following output −
    First 10 elements of the memory-mapped data:
    [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
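A .npy file begins with a header that records the data type and shape, so mapping it as raw bytes from the start of the file would interpret the header as array values. For files written with numpy.save(), one option is to pass mmap_mode to numpy.load(), which reads the header and memory-maps only the data. The following is a minimal sketch under that assumption. The array size and the temporary file location are chosen only for illustration −

```python
import os
import tempfile

import numpy as np

# Save an array in .npy format (temporary path, smaller size for this sketch)
path = os.path.join(tempfile.mkdtemp(), 'large_data.npy')
np.save(path, np.arange(1_000_000, dtype='float64'))

# Memory-map the .npy file in read-only mode; the header is parsed by NumPy,
# so dtype and shape do not need to be given by hand
mmapped = np.load(path, mmap_mode='r')

print("Type:", type(mmapped).__name__)
print("Shape:", mmapped.shape)
print("First 5 elements:", mmapped[:5])
```

Only the slices that are actually accessed are read from disk, just as with numpy.memmap().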
Working with CSV Files
CSV (Comma-Separated Values) files are commonly used for storing tabular data. NumPy provides functions to read from and write to CSV files. The numpy.genfromtxt() function can read CSV files, and the numpy.savetxt() function can be used to write data to CSV.
Example: Writing Data to CSV
In the following example, we are creating a 2D NumPy array and writing it to a CSV file using np.savetxt() function. The data is saved with a comma delimiter and formatted as integers −
    import numpy as np

    # Create a 2D array
    data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

    # Write data to a CSV file
    np.savetxt('output.csv', data, delimiter=',', fmt='%d')

    print("Data written to 'output.csv'.")
The result produced is as follows −
Data written to 'output.csv'.
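A CSV file written this way can be read back with numpy.genfromtxt(). The sketch below assumes the same 3x3 array, written to a temporary directory, and checks that the values survive the round trip. Passing dtype=int is an illustrative choice; without it the values are loaded as floats −

```python
import os
import tempfile

import numpy as np

# Write the sample array to a CSV file (temporary path for this sketch)
path = os.path.join(tempfile.mkdtemp(), 'output.csv')
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
np.savetxt(path, data, delimiter=',', fmt='%d')

# Read the CSV file back, parsing the values as integers
restored = np.genfromtxt(path, delimiter=',', dtype=int)

print("Restored data:")
print(restored)
print("Round trip preserved the values:", np.array_equal(data, restored))
```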