Lab 1


To load a dataset from a CSV file using Pandas, first make sure that the file exists in the specified directory. The complete example below loads the dataset, performs some basic operations, visualizes the data using Matplotlib, and fits a simple linear regression with scikit-learn.

Let's assume that `Salary.csv` contains columns `YearsExperience` and `Salary`.


Step-by-Step Example

#### 1. Import Libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#### 2. Load the Dataset

Make sure `Salary.csv` is in the same directory as your script, or provide the full path to
the file.
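As a minimal sketch of that check, the helper below verifies that the path exists before calling `pd.read_csv`, so a missing file produces a message naming the location that was searched. The name `load_csv_checked` is hypothetical (it is not part of Pandas), and the snippet writes a tiny made-up CSV to a temporary directory so that it runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_checked(path):
    """Load a CSV with pandas, failing early with a clear message if it is missing."""
    csv_path = Path(path)
    if not csv_path.is_file():
        # resolve() shows the absolute location that was actually searched
        raise FileNotFoundError(f"CSV file not found: {csv_path.resolve()}")
    return pd.read_csv(csv_path)


# Demo with made-up values in a temporary directory (not the real Salary.csv)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "Salary.csv"
    demo_path.write_text("YearsExperience,Salary\n1.0,40000\n2.0,45000\n")
    demo = load_csv_checked(demo_path)
    print(demo.shape)

    try:
        load_csv_checked(Path(tmp_dir) / "missing.csv")
    except FileNotFoundError as err:
        print(err)
```

Passing an absolute path to the same helper covers the case where the file lives outside the script's directory.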

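# Load the dataset
dataset = pd.read_csv('Salary.csv')

# Display the first few rows of the dataset
print(dataset.head())

In an interactive session, `dataset.head()` on its own displays the same rows. The related methods `tail()`, `info()` and `describe()`, and the attributes `shape` and `size`, are also useful for a first look at the data.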
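The inspection helpers mentioned above can be tried on a small stand-in table. This is only a sketch: the `sample` DataFrame and its values are made up for illustration and are not the contents of `Salary.csv`.

```python
import pandas as pd

# Made-up stand-in for Salary.csv; the values are illustrative only
sample = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Salary": [40000, 45000, 50000, 55000, 60000, 65000],
})

print(sample.head(3))     # first 3 rows (head() shows 5 by default)
print(sample.tail(2))     # last 2 rows
print(sample.shape)       # attribute, no parentheses: (rows, columns)
print(sample.size)        # attribute: total number of cells, rows * columns
sample.info()             # prints column dtypes and non-null counts itself
print(sample.describe())  # count, mean, std, min, quartiles, max per numeric column
```

Note that `shape` and `size` are attributes, while `head`, `tail`, `info` and `describe` are methods and need parentheses.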

#### 3. Explore the Dataset

# Display basic information about the dataset
# (info() prints its report itself and returns None, so it is not wrapped in print())
dataset.info()

# Display summary statistics
print(dataset.describe())

#### 4. Visualize the Data

Create a scatter plot to visualize the relationship between `YearsExperience` and `Salary`.

# Scatter plot of YearsExperience vs Salary

plt.scatter(dataset['YearsExperience'], dataset['Salary'], color='blue')


# Adding title and labels
plt.title('Years of Experience vs Salary')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')

# Display the plot


plt.show()

#### 5. Perform Regression Analysis

Let's perform a simple linear regression to predict Salary based on Years of Experience.
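For a single feature, the ordinary least-squares line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept is mean(y) minus slope times mean(x). The sketch below computes both in plain Python to show what fitting a line amounts to. The function name `fit_line` and the numbers are made up for illustration; they are not taken from `Salary.csv`.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative values only: salary rises by a fixed amount per year
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salaries = [40000.0, 45000.0, 50000.0, 55000.0, 60000.0]

slope, intercept = fit_line(years, salaries)
print("slope:", slope)
print("intercept:", intercept)
```

Because these points lie exactly on a line, the fitted slope and intercept reproduce it; real salary data will scatter around the fitted line instead.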

# train_test_split splits the dataset into training and testing sets
from sklearn.model_selection import train_test_split

# LinearRegression is the model class: it is fitted on the training data
# and then used to make predictions
from sklearn.linear_model import LinearRegression

# mean_squared_error and r2_score evaluate the performance of a regression model
from sklearn.metrics import mean_squared_error, r2_score

The two evaluation metrics are:

- Mean Squared Error (MSE): Measures the average squared difference between the actual and predicted values. Lower values are better.
- R-squared (R²) score: Represents the proportion of variance in the dependent variable that is predictable from the independent variable(s). Higher values (closer to 1) are better.
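To make the two metrics concrete, the sketch below computes them by hand in plain Python, following the standard definitions (MSE as the mean of squared errors, R² as one minus the ratio of residual to total sum of squares). The function names and the numbers are hypothetical and chosen only for illustration.

```python
def mse_manual(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_manual(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative values only
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]

print("MSE:", mse_manual(actual, predicted))
print("R^2:", r2_manual(actual, predicted))
```

A model that always predicts the mean of the actual values gets an R² of 0, which is why values close to 1 indicate a useful fit.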

# Define the features (X) and target (y)

X = dataset[['YearsExperience']]
y = dataset['Salary']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

The arguments to `train_test_split` are:

- X: The feature(s) of the dataset. In this case, it is YearsExperience.
- y: The target variable. In this case, it is Salary.
- test_size=0.2: 20% of the data will be used as the test set.
- random_state=42: Ensures reproducibility of the split. Using the same random state will always produce the same split.
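The effect of `test_size` and `random_state` can be checked on a small made-up array. In this sketch, `X_demo` and `y_demo` are hypothetical stand-ins for the real features and target; splitting twice with the same `random_state` should give identical test sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with one feature each; the target is twice the feature
X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.arange(10) * 2

# Two independent calls with the same random_state
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(len(X_train_a), len(X_test_a))       # sizes of the 80% / 20% parts
print(np.array_equal(X_test_a, X_test_b))  # whether both splits match
```

Rows of `X` and `y` stay paired through the shuffle, so each test target still belongs to its test feature.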
# Create a Linear Regression model
model = LinearRegression()

# Train the model


model.fit(X_train, y_train)

# Make predictions on the test set


y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print('Mean Squared Error:', mse)
print('R-squared:', r2)
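Beyond the two scores, a fitted `LinearRegression` exposes the line it learned through its `coef_` and `intercept_` attributes, and it can predict for values not in the data. The sketch below uses a made-up DataFrame (`demo`, not the real `Salary.csv`) so that it runs on its own; the names `demo_model`, `fitted_slope` and `fitted_intercept` are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only
demo = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Salary": [40000.0, 45000.0, 50000.0, 55000.0, 60000.0],
})

demo_model = LinearRegression()
demo_model.fit(demo[["YearsExperience"]], demo["Salary"])

# coef_ holds one slope per feature; intercept_ is the value at zero experience
fitted_slope = demo_model.coef_[0]
fitted_intercept = demo_model.intercept_
print("Salary change per year of experience:", fitted_slope)
print("Intercept:", fitted_intercept)

# Predict for a new value, passed as a DataFrame with the same column name
new_point = pd.DataFrame({"YearsExperience": [6.0]})
predicted_salary = demo_model.predict(new_point)[0]
print("Predicted salary at 6 years:", predicted_salary)
```

The same attributes are available on `model` after `model.fit(X_train, y_train)` in the main example.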

```python
# Plot the regression line
plt.scatter(X, y, color='blue')
plt.plot(X, model.predict(X), color='red', linewidth=2)

# Adding title and labels
plt.title('Years of Experience vs Salary (with Regression Line)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')

# Display the plot
plt.show()
```
