Assignment 2

Uploaded by

aryan540787
Copyright
© © All Rights Reserved

ASSIGNMENT 2

Q1. CONVERT A CONTINUOUS VARIABLE INTO QUANTILE BINS AND
PLOT THEM USING COUNTPLOTS. USE THE DIAMONDS DATASET AND
DO THIS FOR COLUMNS X, Y, Z.
ANS1. To achieve this, you can use the seaborn library in Python, along with
pandas for data manipulation. Here's a step-by-step guide to convert continuous
variables into quantile bins and plot them using count plots for columns X, Y, and
Z in the diamonds dataset:

```python
import seaborn as sns
import pandas as pd

# Load the diamonds dataset
diamonds = sns.load_dataset('diamonds')

# Columns to bin
columns = ['x', 'y', 'z']

# Create quantile bins (quintiles) for each column
for col in columns:
    diamonds[f'{col}_quantile'] = pd.qcut(diamonds[col], q=5, labels=False)

# Melt the dataset to long format for countplot
diamonds_melted = pd.melt(
    diamonds, value_vars=[f'{col}_quantile' for col in columns]
)

# Plot using countplot
sns.countplot(data=diamonds_melted, x='value', hue='variable')
```

This code creates count plots for the quantile bins of columns x, y, and z in the
diamonds dataset. Each bin covers a range of values for its column, and the
count plot shows how many data points fall in each bin. A more concise version:

```python
import seaborn as sns
import pandas as pd

diamonds = sns.load_dataset('diamonds')
columns = ['x', 'y', 'z']

for col in columns:
    diamonds[f'{col}_quantile'] = pd.qcut(diamonds[col], q=5, labels=False)

sns.countplot(data=pd.melt(diamonds, value_vars=[f'{col}_quantile' for col in columns]),
              x='value', hue='variable')
```

This code accomplishes the same task in a more condensed form.
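
One thing worth knowing before reading these count plots: `pd.qcut` places its bin
edges at quantiles, so each bin holds roughly the same number of rows and the bars
come out nearly level. The sketch below checks this on synthetic data (the
exponential sample and the seed are assumptions for illustration, not part of the
diamonds dataset).

```python
import numpy as np
import pandas as pd

# Synthetic, skewed sample (an assumption for illustration only)
rng = np.random.default_rng(0)
values = pd.Series(rng.exponential(scale=2.0, size=1000), name="x")

# retbins=True also returns the bin edges that qcut chose
bin_ids, edges = pd.qcut(values, q=5, labels=False, retbins=True)

# Rows per bin: equal here because the sample has no tied values
counts = bin_ids.value_counts().sort_index()
print(edges)
print(counts)
```

With real data such as the diamonds columns, repeated values can make the counts
differ slightly between bins. The more informative output is then the bin edges
rather than the bar heights.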


Q2. PLOT A BIVARIATE RELATIONSHIP BETWEEN ALL THE
CATEGORICAL AND NUMERIC COLUMNS OF HOUSING DATA TO
UNDERSTAND THE RELATIONSHIP BETWEEN THE VARIABLES.
WHAT APPROACHES DID YOU TAKE?
ANS 2. To plot a bivariate relationship between all the categorical and numeric
columns of the housing data, you can use seaborn's `pairplot` function. This
function creates a grid of scatterplots and histograms for each pair of variables,
allowing you to visualize the relationships between them. Here's a step-by-step
approach:

1. Load the housing dataset.
2. Separate the categorical and numeric columns.
3. Use `pairplot` to plot the bivariate relationships between all pairs of variables.

Here's how you can do it:

```python
import seaborn as sns
import pandas as pd

# Load the housing dataset (replace 'housing_data.csv' with the actual file name)
housing_data = pd.read_csv('housing_data.csv')

# Separate categorical and numeric columns
numeric_columns = housing_data.select_dtypes(include=['int64', 'float64'])
categorical_columns = housing_data.select_dtypes(include=['object'])

# Encode each categorical column as integer codes so pairplot can use it
encoded_categoricals = categorical_columns.apply(lambda x: x.astype('category').cat.codes)

# Plot bivariate relationships using pairplot
sns.pairplot(data=numeric_columns.join(encoded_categoricals), diag_kind='hist')
```

In this approach:

- We load the housing dataset.
- We separate the numeric and categorical columns.
- We convert the categorical columns to numeric using `.apply(lambda x:
x.astype('category').cat.codes)` so that we can include them in the `pairplot`.
- Finally, we use `pairplot` to visualize the relationships between all pairs of
numeric and converted categorical variables. The `diag_kind='hist'` parameter
specifies that histograms should be plotted along the diagonal.
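
Encoding categories as integer codes lets `pairplot` run, but the codes impose an
arbitrary order, so the resulting scatterplots can be hard to read. An alternative
is one box plot per (categorical, numeric) pair. The following is a minimal sketch
on a small synthetic frame; the column names `furnishing`, `mainroad`, `price`, and
`area` are hypothetical stand-ins for whatever the housing file actually contains.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the housing data (column names are hypothetical)
rng = np.random.default_rng(1)
n = 200
housing = pd.DataFrame({
    "furnishing": rng.choice(["furnished", "semi", "unfurnished"], size=n),
    "mainroad": rng.choice(["yes", "no"], size=n),
    "price": rng.normal(5_000_000, 1_000_000, size=n),
    "area": rng.normal(5000, 1500, size=n),
})

numeric_cols = housing.select_dtypes(include="number").columns
categorical_cols = housing.select_dtypes(exclude="number").columns

# One subplot per (categorical, numeric) pair
fig, axes = plt.subplots(len(categorical_cols), len(numeric_cols),
                         figsize=(10, 8), squeeze=False)
for i, cat in enumerate(categorical_cols):
    for j, num in enumerate(numeric_cols):
        housing.boxplot(column=num, by=cat, ax=axes[i, j])
        axes[i, j].set_title(f"{num} by {cat}")
fig.suptitle("")  # drop the automatic "Boxplot grouped by ..." title
fig.tight_layout()
plt.show()
```

Each panel then shows how the distribution of one numeric column shifts across the
levels of one categorical column, which is usually easier to interpret than a
scatterplot against integer codes.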

Q3. GENERATE A HEATMAP BETWEEN ALL THE CONTINUOUS
VARIABLES OF THE HOUSING DATASET. MAKE SURE YOU
UNDERSTAND WHAT THE CONTINUOUS VARIABLES ARE IN THE GIVEN
HOUSING INPUT DATASET.
ANS 3. Continuous variables in a dataset are those that can take any value within a
certain range. In the housing dataset, these would typically be features such as
square footage and price; a count such as the number of bedrooms is numeric but
discrete. To generate a heatmap between the continuous variables, you can use the
`corr()` method to calculate the correlation matrix and then visualize it using
seaborn's `heatmap`. Note that the code below selects columns by numeric dtype, so
discrete counts are included as well unless they are dropped. Here's how you can
do it:

```python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load the housing dataset (replace 'housing_data.csv' with the actual file name)
housing_data = pd.read_csv('housing_data.csv')

# Select the numeric columns
continuous_variables = housing_data.select_dtypes(include=['int64', 'float64'])

# Calculate the correlation matrix
correlation_matrix = continuous_variables.corr()

# Plot heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap of Continuous Variables')
plt.show()
```

In this code:

- We load the housing dataset.
- We select the numeric columns using `select_dtypes`.
- We calculate the correlation matrix using the `corr()` method.
- Finally, we plot the heatmap using seaborn's `heatmap` function, with
`annot=True` to display the correlation values on the heatmap and
`cmap='coolwarm'` to choose the color palette.

The same code with a comment on each step:

```python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load the housing dataset (replace 'housing_data.csv' with the actual file name)
housing_data = pd.read_csv('housing_data.csv')

# Select the numeric columns
continuous_variables = housing_data.select_dtypes(include=['int64', 'float64'])

# Calculate the correlation matrix
correlation_matrix = continuous_variables.corr()

# Plot heatmap
plt.figure(figsize=(10, 8))  # Set the size of the figure
# annot=True shows correlation values, cmap='coolwarm' selects the color
# palette, fmt=".2f" formats values to 2 decimal places
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap of Continuous Variables')  # Set the title
plt.show()  # Display the heatmap
```

This code provides a more detailed explanation of each step:

1. **Load the dataset**: Reads the housing dataset into a pandas DataFrame.
Replace `'housing_data.csv'` with the actual filename.

2. **Filter continuous variables**: Selects only the continuous variables from
the dataset using `select_dtypes`. This ensures that only numerical variables are
included in the correlation analysis.

3. **Calculate the correlation matrix**: Computes the correlation coefficients
between all pairs of continuous variables using the `corr()` method. This generates
a symmetric matrix where each element represents the correlation between two
variables.

4. **Plot the heatmap**: Creates a heatmap using seaborn's `heatmap` function.
The heatmap visualizes the correlation matrix, with each cell color representing the
strength and direction of the correlation between two variables. Parameters such as
`annot=True`, `cmap='coolwarm'`, and `fmt=".2f"` are used to enhance the
visualization by displaying correlation values, selecting a color palette, and
formatting values to two decimal places, respectively.

5. **Display the plot**: Finally, the heatmap is displayed using `plt.show()`.

This heatmap allows you to quickly identify patterns of association between pairs
of continuous variables in the housing dataset. With the `coolwarm` palette, higher
correlation values appear redder and lower values appear bluer, with pale shades in
the middle of the scale. Passing `vmin=-1, vmax=1` to `heatmap` fixes the scale so
that a correlation of zero falls on the neutral midpoint.
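
Because `select_dtypes` picks columns by storage type, it also returns integer
counts and 0/1 flags, which are numeric but not continuous. One rough way to
separate them is to look at how many distinct values each numeric column has. The
sketch below assumes a hypothetical cutoff of 10 unique values and hypothetical
column names; the right cutoff depends on the actual housing file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data (column names are hypothetical)
rng = np.random.default_rng(2)
n = 300
housing = pd.DataFrame({
    "price": rng.normal(5_000_000, 1_000_000, size=n),
    "area": rng.normal(5000, 1500, size=n),
    "bedrooms": rng.integers(1, 6, size=n),   # discrete count
    "parking": rng.integers(0, 3, size=n),    # discrete count
})

numeric = housing.select_dtypes(include="number")

# Heuristic (an assumption): treat a numeric column as continuous
# only if it has more than `max_levels` distinct values
max_levels = 10
continuous_cols = [c for c in numeric.columns if numeric[c].nunique() > max_levels]

correlation_matrix = housing[continuous_cols].corr()
print(continuous_cols)
print(correlation_matrix)
```

The resulting `correlation_matrix` can be passed to `sns.heatmap` in the same way
as in the code above.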

Q4. FIND THE ROOTS OF THE EQUATION x^2 - 6x + 5 = 0 AND
POINT OUT THE ROOTS ON A MATPLOTLIB GRAPH.
ANS 4. To find the roots of the equation \(x^2 - 6x + 5 = 0\), you can use the
quadratic formula:

\[x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\]

where \(a\), \(b\), and \(c\) are the coefficients of the quadratic equation
\(ax^2 + bx + c = 0\).

For the given equation, \(a = 1\), \(b = -6\), and \(c = 5\).
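
Substituting these coefficients by hand gives the roots directly, which is a useful
check on the program's output:

\[b^2 - 4ac = (-6)^2 - 4(1)(5) = 36 - 20 = 16\]

\[x = \frac{6 \pm \sqrt{16}}{2} = \frac{6 \pm 4}{2} \quad\Rightarrow\quad x = 5 \ \text{or}\ x = 1\]

The same result follows from factoring: \(x^2 - 6x + 5 = (x - 1)(x - 5)\).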

Now, let's calculate the roots and then plot them on a Matplotlib graph:

```python
import numpy as np
import matplotlib.pyplot as plt

# Coefficients of the quadratic equation
a = 1
b = -6
c = 5

# Calculate the discriminant
discriminant = b**2 - 4*a*c

# Check if the discriminant is positive, zero, or negative
if discriminant > 0:
    # Two real and distinct roots
    root1 = (-b + np.sqrt(discriminant)) / (2*a)
    root2 = (-b - np.sqrt(discriminant)) / (2*a)
    print("Roots are real and distinct.")
    print("Root 1:", root1)
    print("Root 2:", root2)

    # Plot the quadratic equation
    x = np.linspace(min(root1, root2) - 1, max(root1, root2) + 1, 100)
    y = a*x**2 + b*x + c
    plt.plot(x, y, label='y = x^2 - 6x + 5')

    # Plot the roots
    plt.scatter([root1, root2], [0, 0], color='red', label='Roots')

elif discriminant == 0:
    # One real root (repeated)
    root = -b / (2*a)
    print("Root is real and repeated.")
    print("Root:", root)

    # Plot the quadratic equation
    x = np.linspace(root - 1, root + 1, 100)
    y = a*x**2 + b*x + c
    plt.plot(x, y, label='y = x^2 - 6x + 5')

    # Plot the root
    plt.scatter(root, 0, color='red', label='Root')

else:
    # Complex roots
    real_part = -b / (2*a)
    imaginary_part = np.sqrt(abs(discriminant)) / (2*a)
    root1 = complex(real_part, imaginary_part)
    root2 = complex(real_part, -imaginary_part)
    print("Roots are complex.")
    print("Root 1:", root1)
    print("Root 2:", root2)

    # Complex roots cannot be plotted on a real number line,
    # but we can still plot the quadratic equation
    x = np.linspace(real_part - 1, real_part + 1, 100)
    y = a*x**2 + b*x + c
    plt.plot(x, y, label='y = x^2 - 6x + 5')

# Add labels and legend
plt.xlabel('x')
plt.ylabel('y')
plt.title('Quadratic Equation and Roots')
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid(color='gray', linestyle='--', linewidth=0.5)
plt.legend()
plt.show()
```
This code calculates the roots of the quadratic equation and plots the equation
along with the roots on a Matplotlib graph. Depending on the nature of the roots
(real and distinct, real and repeated, or complex), it plots accordingly.
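
As a cross-check on the hand-written formula, NumPy's `np.roots` returns the roots
of a polynomial given its coefficients in descending order of power. This is a
short sketch for verification, not a replacement for the branching code above:

```python
import numpy as np

# Coefficients of x^2 - 6x + 5, highest power first
coefficients = [1, -6, 5]
roots = np.roots(coefficients)

# Evaluate the polynomial at each root; the values should be close to 0
residuals = np.polyval(coefficients, roots)
print(sorted(roots))
print(residuals)
```

If the two approaches disagree, the usual culprit is a sign error on `b` in the
manual formula.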

Q5. GRAPH SIN(X) AND COS(X) IN MATPLOTLIB. MAKE SURE BOTH
CURVES ARE DRAWN IN ONE PLOT ONLY.
ANS 5. To graph both \( \sin(x) \) and \( \cos(x) \) on the same plot using
Matplotlib, you can simply compute the values of both functions over a range of
\( x \) values and then plot them together. Here's how you can do it:

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate x values
x = np.linspace(0, 2*np.pi, 100)

# Compute sin(x) and cos(x) values


sin_values = np.sin(x)
cos_values = np.cos(x)

# Plot sin(x) and cos(x) on the same plot


plt.plot(x, sin_values, label='sin(x)')
plt.plot(x, cos_values, label='cos(x)')

# Add labels and legend


plt.xlabel('x')
plt.ylabel('y')
plt.title('Plot of sin(x) and cos(x)')
plt.legend()

# Show plot
plt.show()
```

This code generates \( x \) values ranging from 0 to \( 2\pi \) (a full period of the
sine and cosine functions), computes the corresponding \( \sin(x) \) and \( \cos(x) \)
values, and plots them on the same graph using `plt.plot()`. It then adds axis
labels, a title, and a legend using `plt.xlabel()`, `plt.ylabel()`, `plt.title()`,
and `plt.legend()` respectively. A more concise version:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2*np.pi, 100)


plt.plot(x, np.sin(x), label='sin(x)')
plt.plot(x, np.cos(x), label='cos(x)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Plot of sin(x) and cos(x)')
plt.legend()
plt.show()
```

This code accomplishes the same task in a more condensed form. It generates \(x\)
values from 0 to \(2\pi\), computes \( \sin(x) \) and \( \cos(x) \) values, plots them
on the same graph, adds labels, a title, and a legend, and finally displays the plot.

Q6. GENERATE A 3D PLOT USING NUMPY MESHGRID TO REPRESENT
A PEAK AND A VALLEY IN A GRAPH.
ANS 6. You can use NumPy's `meshgrid` to create a grid of points and
then plot a function that represents a peak and a valley in 3D using Matplotlib.
Here's an example:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Generate grid points


x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)

# Define the peak and valley function


Z_peak = np.sin(np.sqrt(X**2 + Y**2)) # Peak function
Z_valley = -np.sin(np.sqrt(X**2 + Y**2)) # Valley function

# Create 3D plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Plot the peak


ax.plot_surface(X, Y, Z_peak, cmap='viridis', edgecolor='none')

# Plot the valley


ax.plot_surface(X, Y, Z_valley, cmap='magma', edgecolor='none')

# Set labels and title


ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.set_title('Peak and Valley')

# Show plot
plt.show()
```

In this code:

- We create a grid of points in the \(xy\)-plane using `np.meshgrid`.
- We define two functions, one representing a peak and the other a valley.
- We create a 3D plot using Matplotlib and plot the surfaces of the peak and valley
functions.
- Finally, we set labels for the axes and a title for the plot and display it.

You can adjust the parameters of the functions and the range of the grid points to
visualize different peaks and valleys. A more concise version:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)

Z_peak = np.sin(np.sqrt(X**2 + Y**2))


Z_valley = -np.sin(np.sqrt(X**2 + Y**2))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.plot_surface(X, Y, Z_peak, cmap='viridis', edgecolor='none')


ax.plot_surface(X, Y, Z_valley, cmap='magma', edgecolor='none')

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.set_title('Peak and Valley')

plt.show()
```

This code generates a grid of points, calculates the values for a peak and a valley
function, and plots them in 3D using matplotlib.
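
The ripple function above oscillates, so each surface contains several crests and
troughs. If a single, clearly separated peak and valley is wanted, one option (an
assumption about intent, not part of the original answer) is the difference of two
Gaussian bumps on the same meshgrid. The centers at \(x = \pm 2\) are arbitrary
choices:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 101)
y = np.linspace(-5, 5, 101)
X, Y = np.meshgrid(x, y)

# Positive bump centered at (2, 0) minus a bump centered at (-2, 0)
peak = np.exp(-((X - 2) ** 2 + Y ** 2))
valley = np.exp(-((X + 2) ** 2 + Y ** 2))
Z = peak - valley

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z, cmap="coolwarm", edgecolor="none")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
ax.set_title("One peak and one valley")
plt.show()
```

Drawing both features on one surface also avoids two overlapping surfaces hiding
each other.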

Q7. WHY DO YOU THINK IT IS IMPORTANT TO MELT THE
DATAFRAME BEFORE WE DO SOME PLOTTING? IN WHAT
SCENARIOS WILL THIS BE IMPORTANT?
ANS 7. Melting a DataFrame is often important when the data is in a wide format
(one column per variable) and needs to be converted into a long format (one column
holding the variable names and one holding the values) for visualization or
analysis. Here are a few reasons why melting data might be important and the
scenarios where it's particularly useful:

1. **Plotting Multiple Variables**: When you have multiple columns
representing different variables, melting the DataFrame allows you to plot these
variables against each other more easily. For example, when you want to create a
line plot or scatter plot with multiple lines or markers representing different
variables, melting the data makes it simpler to achieve this.

2. **Facet Plots**: In scenarios where you want to create facet plots (plots with
multiple subplots arranged in a grid, each showing a different subset of the data),
melting the data is often a prerequisite. Libraries like Seaborn provide functions
like `sns.catplot()` or `sns.relplot()` which expect data in a long format for creating
facet plots.

3. **Statistical Analysis**: Some statistical analysis techniques, such as ANOVA
or regression analysis, require data to be in a long format. Melting the data allows
you to reshape it for these analyses.

4. **Database Structure**: Sometimes, you might receive data in a wide format
because it's easier to store or manipulate in a database. However, for analytical
purposes or visualization, it's often more convenient to work with the data in a long
format. Melting the data helps in converting it to a format that's more suitable for
analysis and visualization.

5. **Machine Learning**: In machine learning tasks like feature engineering,
you might need to reshape the data to create new features or to fit certain models.
Melting can be a useful preprocessing step in such cases.

6. **Time Series Analysis**: In time series analysis, you might have data in a
wide format where each column represents a time point. Melting the DataFrame
can transform it into a long format where each row represents a unique time point,
making it easier to perform time-based analysis or plotting.

7. **Hierarchical Data**: If you have hierarchical or nested data where multiple
variables are grouped within different levels of a hierarchy, melting the DataFrame
can help flatten the hierarchy and make the data more amenable to analysis and
visualization.

8. **Data Integration**: When integrating data from multiple sources with
different structures, melting can help standardize the data format and make it easier
to combine or merge datasets.

9. **Interactive Visualizations**: For interactive visualizations using tools like
Plotly or Bokeh, melting the data can be beneficial as it allows for more flexibility
in creating interactive plots with dynamic features like tooltips or hover
information.

10. **Longitudinal Studies**: In longitudinal studies where data is collected
from the same subjects over multiple time points, melting the DataFrame can
organize the data by subject and time point, facilitating longitudinal analysis or
plotting.

In summary, melting a DataFrame is an important data transformation technique
that can improve the flexibility, usability, and interpretability of the data,
especially in scenarios involving complex or multi-dimensional datasets.
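
To make the wide-versus-long distinction concrete, the sketch below melts a small
made-up sales table (the city names and figures are invented for illustration).
Each city column becomes rows in a single `sales` column, labeled by a new `city`
column:

```python
import pandas as pd

# Wide format: one column per city (made-up numbers)
wide = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "Delhi": [120, 135, 150],
    "Mumbai": [200, 190, 210],
})

# Long format: one row per (month, city) observation
long_df = wide.melt(id_vars="month", var_name="city", value_name="sales")
print(long_df)
```

In this shape, a call such as
`sns.lineplot(data=long_df, x='month', y='sales', hue='city')` can draw one line per
city without looping over columns.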
