
How To Read Space-Delimited Files In Pandas

Last Updated : 11 Jul, 2024

In this article, we'll learn how to read and process space-delimited files, including files with a variable number of spaces between fields, using Pandas in Python.

What is a Space-Delimited File?

Space-delimited files are a type of text file where data is organized into records (rows) and fields (columns), separated by spaces instead of other common delimiters like commas or tabs. Each record typically occupies one line, with spaces acting as invisible boundaries between individual data points within the record. Example of a space-delimited file:

Syam 25 New York
Sundar 30 Los Angeles
Hari 28 Chicago
Hemanth 35 Houston
Phani 22 Seattle

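Each line represents a record with three fields: Name, Age, and City, separated by spaces. Note that some city names (New York, Los Angeles) themselves contain a space, so splitting on every space would break such a city into two fields.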
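As a minimal sketch of one way to load the sample above while keeping multi-word cities intact, the snippet below splits each line at most twice and builds the DataFrame by hand. It assumes the first two fields (Name and Age) never contain spaces. The column names are supplied in code because the sample has no header line, and the in-memory string stands in for a real file.

```python
import pandas as pd

# Sample records from the article, held in a string instead of a file.
raw = """Syam 25 New York
Sundar 30 Los Angeles
Hari 28 Chicago
Hemanth 35 Houston
Phani 22 Seattle
"""

# Split each non-empty line on whitespace at most twice, so everything
# after the second field stays together as the city.
rows = [line.strip().split(maxsplit=2) for line in raw.splitlines() if line.strip()]

# Column names are an assumption; the sample file carries no header row.
df = pd.DataFrame(rows, columns=["Name", "Age", "City"])
df["Age"] = df["Age"].astype(int)  # fields arrive as strings, so convert Age

print(df)
```

This approach only works when the free-text field comes last; if spaces can appear in earlier fields, a different delimiter or quoting is the safer choice.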

Reading Space-Delimited Files with Pandas

Pandas, a powerful Python library for data analysis and manipulation, offers straightforward methods to handle space-delimited files efficiently. Here's how:

Using pandas.read_csv() with delimiter parameter

pandas.read_csv() is the standard function for reading CSV files. Despite its name, it is not limited to comma-separated values: it can also handle other delimiters, such as spaces and tabs.

By setting sep=' ', we explicitly specify that space is the delimiter.

Python
import pandas as pd

# Read space-delimited file using pd.read_csv()
df = pd.read_csv('space_delimited_file.txt', sep=' ')

# Display the DataFrame
print(df)

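Output (assuming space_delimited_file.txt begins with the header line "Name Age", followed by the three records shown):

      Name  Age
0     Syam   25
1     Hari   22
2  Hemanth   30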
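By default, read_csv() treats the first line as column names. If a file has no header row, a sketch like the following may help. It passes header=None together with names= so that the first record is kept as data. The in-memory string and the column labels are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical file contents without a header row, held in memory for the demo.
data = "Syam 25\nHari 22\nHemanth 30\n"

# header=None stops pandas from using the first record as column names;
# names= supplies the labels instead.
df = pd.read_csv(io.StringIO(data), sep=" ", header=None, names=["Name", "Age"])

print(df)
```

With a real file, replace io.StringIO(data) with the file path.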

Using pd.read_table()

The pd.read_table() function works like pd.read_csv() but uses a tab as its default separator.

As with pd.read_csv(), specify sep=' ' to handle space-delimited files.

Python
import pandas as pd

# Read space-delimited file using pd.read_table()
df = pd.read_table('space_delimited_file.txt', sep=' ')

# display the data frame
print(df)

Output:

      Name  Age
0     Syam   25
1     Hari   22
2  Hemanth   30
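To check that the two functions agree once the separator is given explicitly, the sketch below parses the same in-memory text with both and compares the results. The sample text is an assumption standing in for space_delimited_file.txt.

```python
import io
import pandas as pd

# Assumed file contents: a header line followed by three records.
data = "Name Age\nSyam 25\nHari 22\nHemanth 30\n"

# Parse the same text with both functions, passing the same separator.
df_csv = pd.read_csv(io.StringIO(data), sep=" ")
df_table = pd.read_table(io.StringIO(data), sep=" ")

# DataFrame.equals() compares shape, labels, dtypes, and values.
print(df_csv.equals(df_table))
```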

Handling Multiple Spaces

Some files are spaced irregularly: fields may be separated by two or three spaces in some places and one in others. We can handle this by passing a regular expression, '\s+', as the separator.

  • sep=r'\s+' controls how the function splits each line into values. It matters here because the file uses runs of whitespace, not commas, as delimiters. The r prefix makes the string raw, so the backslash reaches the regex engine unchanged.
  • The value '\s+' is a regular expression pattern rather than a literal separator string.
  • \s matches any single whitespace character (space, tab, newline, etc.).
  • The + quantifier means "one or more," so \s+ matches one or more consecutive whitespace characters.
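To see what the pattern does on its own, here is a small sketch using Python's standard re module rather than pandas. The sample line is made up for illustration and mixes multiple spaces with a tab.

```python
import re

# A made-up record with three spaces, then a tab followed by a space.
line = "Hemanth   30\t Houston"

# Each run of one or more whitespace characters counts as a single separator.
fields = re.split(r"\s+", line)

print(fields)  # ['Hemanth', '30', 'Houston']
```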
Python
import pandas as pd

# Read file with inconsistent/multiple spaces using regex separator
# (raw string, so '\s' is not treated as a Python escape sequence)
df = pd.read_csv('multiple_space_delimited_file.txt', sep=r'\s+')

# Display the DataFrame
print(df)

Output:

      Name  Age
0     Syam   25
1     Hari   22
2  Hemanth   30
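The following self-contained sketch reproduces this without needing a file on disk. The irregular spacing in the sample text, including one tab, is an assumption about what multiple_space_delimited_file.txt might look like.

```python
import io
import pandas as pd

# Assumed contents: fields separated by varying runs of spaces and one tab.
data = "Name   Age\nSyam  25\nHari\t22\nHemanth     30\n"

# r'\s+' treats any run of whitespace as a single delimiter.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

print(df)
```

Because \s also matches tabs, the same call covers files that mix tabs and spaces.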

Conclusion

In conclusion, space-delimited files are a straightforward way to store data, and Pandas provides flexible, powerful tools for reading and manipulating this data in Python. Whether dealing with neatly organized or irregularly spaced data, Pandas can handle the task efficiently, making it an invaluable tool for data analysis projects.

