QB 1

Uploaded by

ksaikrishna5601
Copyright
© All Rights Reserved

Write a snippet to download data from different repositories

External source

import pandas as pd

# Read a CSV file directly from a URL on an external web host
url = "https://forstoringfiles.000webhostapp.com/vault/uploads/Iris.csv"
iris = pd.read_csv(url)
iris.head()  # first five rows

GitHub

import pandas as pd

# Load the dataset
url = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv"
housing = pd.read_csv(url)
housing.head()
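Both snippets read the CSV straight from the URL on every run. A common refinement is to download the file once and reuse the local copy afterwards. The sketch below is one way to do that under stated assumptions: fetch_csv is a hypothetical helper name and the local path is your choice; it relies only on urllib.request.urlretrieve and pandas.read_csv, so it should work for any repository that serves the raw file over HTTP(S).

```python
import os
import urllib.request

import pandas as pd


def fetch_csv(url, local_path):
    """Hypothetical helper: download `url` to `local_path` once, then load it with pandas.

    Later calls reuse the local copy instead of downloading again.
    """
    if not os.path.exists(local_path):
        # Create the target folder if the path has one
        folder = os.path.dirname(local_path)
        if folder:
            os.makedirs(folder, exist_ok=True)
        urllib.request.urlretrieve(url, local_path)
    return pd.read_csv(local_path)


# Usage (needs network access):
# housing = fetch_csv(
#     "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv",
#     os.path.join("datasets", "housing", "housing.csv"),
# )
```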
Write a snippet to load and read the data

import pandas as pd

url = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv"
housing = pd.read_csv(url)
housing.head()
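After loading, a few pandas calls give a quick overview of the data: head() shows the first rows, info() lists column types and non-null counts (revealing missing values), describe() summarizes the numeric columns, and value_counts() counts the values of a categorical column. The sketch below runs them on a few made-up rows that only borrow some of the housing.csv column names, so it works offline; with the real file you would call the same methods on the housing DataFrame loaded above.

```python
import io

import pandas as pd

# Made-up rows using some housing.csv column names (one income value is missing)
csv_text = """longitude,latitude,median_income,median_house_value,ocean_proximity
-122.2,37.9,8.0,450000,NEAR BAY
-122.3,37.8,6.5,350000,NEAR BAY
-118.3,34.1,3.0,200000,<1H OCEAN
-121.0,38.5,,120000,INLAND
"""
housing = pd.read_csv(io.StringIO(csv_text))

print(housing.head())                             # first rows
housing.info()                                    # dtypes and non-null counts per column
print(housing.describe())                         # count, mean, std, min, quartiles, max of numeric columns
print(housing["ocean_proximity"].value_counts())  # number of rows per category
missing = housing.isnull().sum()                  # missing values per column
print(missing)
```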

Code for custom transformers

Although Scikit-Learn provides many useful transformers, you will need to write your
own for tasks such as custom cleanup operations or combining specific attributes. You
will want your transformer to work seamlessly with Scikit-Learn functionalities (such
as pipelines), and since Scikit-Learn relies on duck typing (not inheritance), all you
need is to create a class and implement three methods: fit() (returning self), transform(),
and fit_transform(). You can get the last one for free by simply adding TransformerMixin
as a base class. Also, if you add BaseEstimator as a base class (and avoid *args and
**kwargs in your constructor), you will get two extra methods (get_params() and
set_params()) that are useful for automatic hyperparameter tuning. For example,
here is a small transformer class that adds the combined attributes rooms per household,
population per household and, optionally, bedrooms per room:

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# Positions of total_rooms, total_bedrooms, population and households in housing.csv
rooms_ix, bedrooms_ix, population_ix, households_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):  # no *args or **kwargs
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # nothing else to do

    def transform(self, X):
        rooms_per_household = X[:, rooms_ix] / X[:, households_ix]
        population_per_household = X[:, population_ix] / X[:, households_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)

In this example the transformer has one hyperparameter, add_bedrooms_per_room, set
to True by default (it is often helpful to provide sensible defaults). This hyperparameter
will allow you to easily find out whether adding this attribute helps the Machine Learning
algorithms or not. More generally, you can add a hyperparameter to gate any data
preparation step that you are not 100% sure about. The more you automate these data
preparation steps, the more combinations you can automatically try out, making it
much more likely that you will find a great combination (and saving you a lot of time).
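To see what the two base classes contribute, here is a minimal sketch on a toy NumPy array. RatioAdder, num_ix, den_ix and add_ratio are hypothetical names made up for this illustration, not part of Scikit-Learn: the class appends one ratio column, and add_ratio plays the same gating role as add_bedrooms_per_room. The mixin is listed before BaseEstimator here, which is the order the Scikit-Learn developer guide recommends; the order used above also works.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class RatioAdder(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: appends X[:, num_ix] / X[:, den_ix] as a new column."""

    def __init__(self, num_ix=0, den_ix=1, add_ratio=True):
        self.num_ix = num_ix
        self.den_ix = den_ix
        self.add_ratio = add_ratio  # hyperparameter gating this preparation step

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        if not self.add_ratio:
            return X
        ratio = X[:, self.num_ix] / X[:, self.den_ix]
        return np.c_[X, ratio]


X = np.array([[6.0, 2.0], [9.0, 3.0], [8.0, 4.0]])
adder = RatioAdder()
print(adder.get_params())          # provided by BaseEstimator
X_new = adder.fit_transform(X)     # provided by TransformerMixin (fit, then transform)
print(X_new)
adder.set_params(add_ratio=False)  # provided by BaseEstimator; grid search uses it
print(adder.fit_transform(X).shape)
```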

Code for transformer pipeline

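from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Numerical attributes only: drop the text column ocean_proximity
housing_num = housing.drop("ocean_proximity", axis=1)

num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),  # fill missing values with the median
    ('attribs_adder', CombinedAttributesAdder()),   # add the combined attributes
    ('std_scaler', StandardScaler()),               # standardize every feature
])

housing_num_tr = num_pipeline.fit_transform(housing_num)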

The Pipeline constructor in Scikit-Learn takes a list of name/estimator pairs that defines a
sequence of steps; all but the last estimator must be transformers (that is, they must have a
fit_transform() method). When you call the pipeline's fit() method, it calls fit_transform()
sequentially on all transformers, passing the output of each call as the input to the next,
and then calls fit() on the final estimator. The pipeline exposes the same methods as the
final estimator: here the last step is StandardScaler, a transformer, so the pipeline has a
transform() method. To handle both categorical and numerical columns with a single
transformer, Scikit-Learn's ColumnTransformer can be used. Introduced in version 0.20,
ColumnTransformer works well with Pandas DataFrames and applies the appropriate
transformation to each column of the dataset.

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

num_attribs = list(housing_num)    # names of the numerical columns
cat_attribs = ["ocean_proximity"]  # the categorical column

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])

housing_prepared = full_pipeline.fit_transform(housing)

Here's how to use the ColumnTransformer. First, import the ColumnTransformer class.
Then, get the lists of numerical and categorical column names. Next, construct a
ColumnTransformer from a list of tuples, where each tuple contains a name, a
transformer, and a list of column names (or indices) that the transformer should be
applied to. In this example, the numerical columns use the previously defined
num_pipeline, and the categorical column uses a OneHotEncoder. Finally, apply the
ColumnTransformer to the housing data: it applies each transformer to the appropriate
columns and concatenates the outputs along the second axis (the transformers must
return the same number of rows).
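The same two building blocks can be tried on a tiny DataFrame to check what the output looks like. This sketch assumes made-up data with two numeric columns (one value missing) and one text column; it leaves out CombinedAttributesAdder because the toy frame lacks the columns that transformer indexes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows: two numeric columns (one NaN) and one text column
df = pd.DataFrame({
    "median_income": [2.0, 4.0, np.nan, 6.0],
    "total_rooms": [100.0, 300.0, 200.0, 400.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "ISLAND"],
})

num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # NaN becomes the median of 2, 4, 6, i.e. 4.0
    ("std_scaler", StandardScaler()),               # zero mean, unit variance per column
])

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, ["median_income", "total_rooms"]),
    ("cat", OneHotEncoder(), ["ocean_proximity"]),
])

prepared = full_pipeline.fit_transform(df)
print(prepared.shape)  # 4 rows: 2 scaled numeric columns + 3 one-hot columns
```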
Train and test data code

import pandas as pd
from sklearn.model_selection import train_test_split

url = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv"
housing = pd.read_csv(url)

# Hold out 20% of the rows as the test set; random_state makes the split reproducible
train_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)
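train_test_split returns two DataFrames; before training, the target column is usually separated from the predictors so that it can be passed to fit() on its own. The sketch below uses a made-up stand-in for housing.csv so that it runs offline; housing_features is a hypothetical name, while housing_labels matches the name passed to fit() in the model-building snippets of this document.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for housing.csv: 1,000 rows, one feature and the target column
rng = np.random.default_rng(0)
housing = pd.DataFrame({
    "median_income": rng.uniform(0.5, 15.0, size=1000),
    "median_house_value": rng.uniform(15_000, 500_000, size=1000),
})

train_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)
print(len(train_set), len(test_set))

# Separate the predictors from the labels (the value the model must predict)
housing_features = train_set.drop("median_house_value", axis=1)
housing_labels = train_set["median_house_value"].copy()
```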


Explain performance measures. Explain RMSE and MSE.

When building and evaluating machine learning models, especially regression models,
performance measurement is critical. It helps us understand how well the model
predicts continuous target variables (e.g., housing prices, sales figures).

Here are two popular performance measures:

Mean Square Error:

Mean Square Error (MSE) is a common measure used to evaluate the accuracy of a model. It
measures the average of the squares of the errors, which are the differences between the
observed and predicted values. The formula for MSE is given by:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

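Where:

• n is the number of observations.
• y_i is the actual value of the i-th observation.
• ŷ_i is the predicted value of the i-th observation.
• ∑ denotes the summation over all observations from i = 1 to n.

Root Mean Square Error:

Root Mean Square Error (RMSE) is the square root of the MSE. Taking the square root brings
the error back to the units of the target (for example, dollars for house prices), which makes
it easier to interpret than MSE. Because each error is squared before averaging, both
measures penalize large errors more heavily than small ones.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
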
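As a small numeric check, the sketch below computes MSE directly from the formula on three made-up observations, takes its square root to get RMSE, and compares both with Scikit-Learn's mean_squared_error. The values are invented purely for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted values for three observations
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.0, 5.0, 4.0])

errors = y_true - y_pred    # [1, 0, -2]
mse = np.mean(errors ** 2)  # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)         # same units as y

# Scikit-Learn computes the same MSE; RMSE is its square root
mse_sk = mean_squared_error(y_true, y_pred)
rmse_sk = np.sqrt(mse_sk)

print(mse, rmse)
```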
Code snippet for building a model for linear regression, decision tree, and random forest

Linear Regression

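from sklearn.linear_model import LinearRegression

# housing_prepared: feature matrix produced by full_pipeline
# housing_labels: target values (median_house_value) of the training set
lin_reg = LinearRegression()
lin_reg.fit(housing_prepared, housing_labels)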

Decision tree

from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor()

tree_reg.fit(housing_prepared, housing_labels)

Random forest

from sklearn.ensemble import RandomForestRegressor

forest_reg = RandomForestRegressor()

forest_reg.fit(housing_prepared, housing_labels)
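Once the models are trained, they can be compared with RMSE. The sketch below is an illustration under assumptions: a synthetic dataset from make_regression stands in for housing_prepared and housing_labels, and random_state is fixed for reproducibility. It reports each model's RMSE on the training data and the mean RMSE from 5-fold cross-validation; a decision tree that scores near zero on the training data but much worse under cross-validation is overfitting.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for housing_prepared / housing_labels
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

results = {}
for name, model in [("linear", LinearRegression()),
                    ("tree", DecisionTreeRegressor(random_state=42))]:
    model.fit(X, y)
    train_rmse = np.sqrt(mean_squared_error(y, model.predict(X)))
    # Scorers follow "greater is better", so the MSE comes back negated
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
    cv_rmse = np.sqrt(-scores)
    results[name] = (train_rmse, cv_rmse)
    print(f"{name}: train RMSE={train_rmse:.2f}, CV RMSE={cv_rmse.mean():.2f}")
```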
Explain fine-tuning of the model and give a code snippet to find the best parameters (grid search)

Fine-tuning a Model

Fine-tuning involves adjusting a model's hyperparameters to optimize its performance for a
specific task. These hyperparameters are settings that control the model's behavior but aren't
directly learned from the data. Examples include the number of trees in a Random Forest or
the learning rate in a neural network. By tweaking these hyperparameters, you can
significantly improve a model's ability to make accurate predictions on data it has not seen.

Finding Best Parameters with Grid Search

Manually trying out different hyperparameter combinations can be tedious and time-
consuming. Grid search automates this process by systematically evaluating a predefined set
of hyperparameter values. Here's how it works:

1. Define the Hyperparameter Grid: You specify a dictionary, or a list of dictionaries
(param_grid), where each key represents a hyperparameter and the corresponding value is
a list of values to try. In the example, the first dictionary explores every combination of
n_estimators (number of trees) and max_features (maximum number of features considered
when splitting a node) for the Random Forest model, and the second dictionary tries a few
more combinations with bootstrap set to False.
2. Create the Model: You define the machine learning model you want to fine-tune
(e.g., RandomForestRegressor).
3. Perform Grid Search: Scikit-Learn's GridSearchCV class is used to perform the grid
search. You provide the model, the hyperparameter grid (param_grid), the number of
folds for cross-validation (cv), a scoring metric (scoring), and optionally, a flag to
return training scores (return_train_score).
4. Fit the Grid Search: Call the fit method of grid_search on your training data
(housing_prepared) and target labels (housing_labels). This trains the model with
all defined hyperparameter combinations using cross-validation and evaluates their
performance based on the scoring metric.
5. Access Results: After fitting, you can access the best hyperparameter combination
using grid_search.best_params_. This provides a dictionary containing the
hyperparameter names and their corresponding best values identified by the grid
search.
6. Retrieve Best Model: The grid_search.best_estimator_ attribute stores the best model.
By default (refit=True), GridSearchCV retrains it on the whole training set using the best
hyperparameters found during the search.

Code Snippet:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Define the hyperparameter grid (refer to the explanation above)
param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

# Create a RandomForestRegressor model
forest_reg = RandomForestRegressor()

# Create a GridSearchCV object: 5-fold cross-validation, scored by negative MSE
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error', return_train_score=True)

# Fit the grid search to the training data
grid_search.fit(housing_prepared, housing_labels)

# Access the best hyperparameters
print(grid_search.best_params_)

# Access the best model
print(grid_search.best_estimator_)
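Two details are easy to miss when reading the grid search results. First, the number of fits: the grid has 3 × 4 + 2 × 3 = 18 combinations, and with cv=5 that is 90 training runs (plus one final refit). Second, because the scoring is 'neg_mean_squared_error', every score is a negative MSE, so it must be negated before taking the square root to get RMSE. The sketch below checks the count with ParameterGrid and prints the RMSE of every combination from cv_results_; the small synthetic dataset (8 features, so that max_features=8 is valid) is an assumption standing in for housing_prepared and housing_labels.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

# Count the combinations and the resulting number of cross-validation fits
n_combinations = len(ParameterGrid(param_grid))
print(n_combinations, n_combinations * 5)

# Synthetic stand-in for housing_prepared / housing_labels
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)

grid_search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid, cv=5,
                           scoring="neg_mean_squared_error", return_train_score=True)
grid_search.fit(X, y)

# Scores are negative MSEs: negate, then take the square root to get RMSE
cvres = grid_search.cv_results_
for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]):
    print(np.sqrt(-mean_score), params)
```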
Data visualization and gaining insights code snippet

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import scatter_matrix

# Load the dataset (a local copy of housing.csv)
housing = pd.read_csv('housing.csv')

# Visualizing geographical data: circle size = population, color = median house value
# (figsize is passed to plot() because DataFrame.plot opens its own figure)
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"] / 100, label="population", figsize=(10, 7),
             c="median_house_value", cmap="jet", colorbar=True)
plt.legend()
plt.show()

# Calculate the correlation matrix (numeric columns only; ocean_proximity is text)
corr_matrix = housing.corr(numeric_only=True)

# Display the correlation matrix as a heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()

# Visualize correlations with a scatter matrix
attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
plt.show()

# Zoom in on the most promising correlation: median_income vs median_house_value
housing.plot(kind="scatter", x="median_income", y="median_house_value",
             alpha=0.1, figsize=(10, 7))
plt.title('Correlation between Median Income and Median House Value')
plt.show()
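The heatmap shows every pair of attributes, but often you only need to know which attributes correlate most with the target; sorting one column of the correlation matrix gives that ranking. The sketch below uses made-up data in which house value is built to rise with income while housing age is pure noise, so the expected ranking is known in advance; numeric_only=True (available since pandas 1.5) drops the text column, which pandas 2.x would otherwise refuse to correlate.

```python
import numpy as np
import pandas as pd

# Made-up data: value rises with income, age is unrelated noise
rng = np.random.default_rng(42)
income = rng.uniform(1.0, 10.0, size=500)
housing = pd.DataFrame({
    "median_income": income,
    "housing_median_age": rng.uniform(1, 52, size=500),
    "median_house_value": 40_000 * income + rng.normal(0, 20_000, size=500),
    "ocean_proximity": rng.choice(["INLAND", "NEAR BAY"], size=500),
})

# Pearson correlations of the numeric columns only
corr_matrix = housing.corr(numeric_only=True)

# Correlation of every numeric attribute with the target, strongest first
print(corr_matrix["median_house_value"].sort_values(ascending=False))
```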
