QB 1
External source
import pandas as pd

# Note: this URL points to the Iris dataset, not the housing data
url = "https://forstoringfiles.000webhostapp.com/vault/uploads/Iris.csv"
iris = pd.read_csv(url)
iris.head()
GitHub
import pandas as pd
url = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv"
housing = pd.read_csv(url)
housing.head()
Write a snippet to load and read the data
import pandas as pd
url = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv"
housing = pd.read_csv(url)
housing.head()
Although Scikit-Learn provides many useful transformers, you will need to write your
own for tasks such as custom cleanup operations or combining specific attributes. You
will want your transformer to work seamlessly with Scikit-Learn functionalities (such
as pipelines), and since Scikit-Learn relies on duck typing (not inheritance), all you
need is to create a class and implement three methods: fit() (returning self), transform(),
and fit_transform(). You can get the last one for free by simply adding TransformerMixin
as a base class. Also, if you add BaseEstimator as a base class (and avoid *args and
**kwargs in your constructor) you will get two extra methods (get_params() and
set_params()) that will be useful for automatic hyperparameter tuning. For example,
here is a small transformer class that adds the combined attributes we discussed
earlier:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# Column indices in the housing NumPy array
rooms_ix, bedrooms_ix, population_ix, households_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):
        self.add_bedrooms_per_room = add_bedrooms_per_room
    def fit(self, X, y=None):
        return self  # nothing to learn
    def transform(self, X):
        rooms_per_household = X[:, rooms_ix] / X[:, households_ix]
        population_per_household = X[:, population_ix] / X[:, households_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

housing_num = housing.drop("ocean_proximity", axis=1)  # numerical columns only

num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])
housing_num_tr = num_pipeline.fit_transform(housing_num)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

full_pipeline = ColumnTransformer([
    ('num', num_pipeline, num_attribs),
    ('cat', OneHotEncoder(), cat_attribs),
])
housing_prepared = full_pipeline.fit_transform(housing)
Here's how to use the ColumnTransformer. First, import the ColumnTransformer class.
Then, get lists of the numerical and categorical column names. Construct a
ColumnTransformer with a list of tuples, where each tuple contains a name, a
transformer, and a list of column names (or indices) that the transformer applies to. In
this example, the numerical columns use the pre-defined num_pipeline, and the
categorical columns use a OneHotEncoder. Finally, apply the ColumnTransformer to the
housing data: it applies each transformer to the appropriate columns and concatenates
the outputs along the second axis (so each transformer must return the same number of
rows).
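The steps above can be sketched end to end on a tiny made-up DataFrame (the column
names and values here are illustrative, not the real housing data):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame standing in for the housing data (hypothetical values)
df = pd.DataFrame({
    "rooms": [3.0, 4.0, None, 5.0],
    "income": [2.5, 3.1, 4.0, 1.8],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "ISLAND"],
})

num_attribs = ["rooms", "income"]
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fill the missing room count
    ("std_scaler", StandardScaler()),               # zero mean, unit variance
])

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),   # numeric columns
    ("cat", OneHotEncoder(), cat_attribs),  # one-hot the category
])

prepared = full_pipeline.fit_transform(df)
print(prepared.shape)  # (4, 5): 2 scaled numeric + 3 one-hot columns
```

Note that the output has one column per numeric attribute plus one per category
level, all with the same four rows.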
Train and test data code

import pandas as pd
from sklearn.model_selection import train_test_split

url = "https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.csv"
housing = pd.read_csv(url)

# Split off 20% of the rows as a test set, with a fixed seed for reproducibility
train_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)
When building and evaluating machine learning models, especially regression models,
performance measurement is critical. It helps us understand how well the model
predicts continuous target variables (e.g., housing prices, sales figures).
Mean Squared Error (MSE) is a common measure used to evaluate the accuracy of a model. It
measures the average of the squares of the errors, which are the differences between the
observed and predicted values. The formula for MSE is:

MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

Where:
- n is the number of observations,
- yᵢ is the observed (true) value of the i-th observation,
- ŷᵢ is the model's predicted value for the i-th observation.
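To make the formula concrete, here is a small sketch (the numbers are made up) that
computes MSE by hand and checks it against Scikit-Learn's mean_squared_error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # observed values y_i
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # predicted values y_hat_i

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual)  # 0.875
```

Both values agree; taking the square root gives the RMSE, which is in the same units
as the target variable.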
Linear Regression

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(housing_prepared, housing_labels)

Decision Tree

from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor()
tree_reg.fit(housing_prepared, housing_labels)

Random Forest

from sklearn.ensemble import RandomForestRegressor

forest_reg = RandomForestRegressor()
forest_reg.fit(housing_prepared, housing_labels)
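The fit-then-score pattern above can be sketched end to end on synthetic data
(random arrays stand in for housing_prepared and housing_labels here, so the snippet
is self-contained):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(42)
X = rng.rand(200, 3)                                        # stand-in features
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.randn(200)   # stand-in labels

lin_reg = LinearRegression()
lin_reg.fit(X, y)

predictions = lin_reg.predict(X)
rmse = np.sqrt(mean_squared_error(y, predictions))
print(round(rmse, 3))  # close to the 0.1 noise level
```

The same predict-and-score step works unchanged for the decision tree and random
forest models, since all Scikit-Learn regressors share the fit/predict interface.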
Explain fine-tuning of the model and give a code snippet to find parameters (grid search)
Fine-tuning a Model
Manually trying out different hyperparameter combinations can be tedious and
time-consuming. Grid search automates this process by training and cross-validating
the model for every combination in a predefined set of hyperparameter values, then
reporting the combination that scores best. Here's how it works:
Code Snippet:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# Try 3 x 4 = 12 combinations of hyperparameter values
param_grid = {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]}

forest_reg = RandomForestRegressor()
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(housing_prepared, housing_labels)
print(grid_search.best_params_)
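As a self-contained sketch, here is a grid search run on synthetic data (the
parameter values are illustrative, not tuned for the housing task), including how to
inspect the score of every combination that was tried:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(42)
X = rng.rand(100, 4)                              # stand-in features
y = X[:, 0] + 2 * X[:, 1] + 0.05 * rng.randn(100)  # stand-in labels

# 2 x 2 = 4 combinations, each cross-validated 3 times
param_grid = {"n_estimators": [5, 10], "max_features": [2, 4]}

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
grid_search.fit(X, y)

# Every tried combination with its cross-validated RMSE
for params, score in zip(grid_search.cv_results_["params"],
                         grid_search.cv_results_["mean_test_score"]):
    print(params, np.sqrt(-score))
print(grid_search.best_params_)
```

Because scoring uses the negative MSE (Scikit-Learn maximizes scores), the sign is
flipped back before taking the square root.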
Visualizing the data

import matplotlib.pyplot as plt

# Scatter plot of the districts, with marker size proportional to population
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"]/100, label="population",
             figsize=(10,7))
plt.legend()
plt.show()

# Correlation matrix as a heatmap (numeric_only skips ocean_proximity)
corr_matrix = housing.corr(numeric_only=True)
plt.figure(figsize=(12, 8))
plt.imshow(corr_matrix, cmap="coolwarm")
plt.colorbar()
plt.title('Correlation Matrix')
plt.show()