BIDA practical print
Practical 1
a.
Output:
[Excel chart: value axis from ₹ 0.00 to ₹ 1,400.00; categories 1–6 labelled India; series 10th–15th]
b.
Output:
[Excel chart: value axis from 0 to 100000; categories 1–6; series Dim Customer Count and Dim Time Count]
Practical 2
Apply What-If Analysis for data visualization. Design and generate the necessary reports based on the data warehouse data. Use Excel.
4. Type a name (60% highest), select cell C4 (% sold for the highest price) as the Changing cells, and click OK.
5. Enter the corresponding value 0.6 and click OK again.
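For reference, the scenario computation can be sketched in Python; every figure below is assumed for illustration except the 0.6 changing-cell value from the steps above:
# Hypothetical what-if sketch for the "% sold for the highest price" scenario
units_sold = 500                      # assumed total units
highest_price, lower_price = 50, 20   # assumed unit prices
unit_cost = 15                        # assumed unit cost

for pct_highest in (0.4, 0.5, 0.6):  # scenarios, including the 60% case above
    revenue = units_sold * (pct_highest * highest_price + (1 - pct_highest) * lower_price)
    profit = revenue - units_sold * unit_cost
    print(f"{pct_highest:.0%} at highest price -> profit = {profit:,.2f}")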
OR
Practical 3
Perform data classification using a classification algorithm in R/Python.
Code:
> # Monthly rainfall readings
> rainfall <- c(799,1174.8,865.1,1334.6,635.4,918.5,685.5,784.2,985,882.8,1071)
> # Convert to a monthly time series starting January 2021
> rainfall.timeseries <- ts(rainfall, start = c(2021,1), frequency = 12)
> print(rainfall.timeseries)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov
2021 799.0 1174.8 865.1 1334.6 635.4 918.5 685.5 784.2 985.0 882.8 1071.0
> # Save the plot to a file
> png(file = "rainfall.png")
> plot(rainfall.timeseries)
> dev.off()
null device
1
> # Plot again on screen
> plot(rainfall.timeseries)
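Since the practical allows R or Python, a minimal pandas equivalent of the transcript above (same rainfall values, assumed monthly from January 2021) could be:
import pandas as pd
import matplotlib.pyplot as plt

# Same rainfall readings as above, as a monthly series from January 2021
rainfall = [799, 1174.8, 865.1, 1334.6, 635.4, 918.5, 685.5, 784.2, 985, 882.8, 1071]
series = pd.Series(rainfall, index=pd.date_range("2021-01-01", periods=len(rainfall), freq="MS"))
print(series)
series.plot(title="Rainfall Time Series")
plt.savefig("rainfall.png")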
Output:
Practical 4
Perform data clustering using a clustering algorithm in R/Python.
Code:
> # Work on a copy of iris with the species label removed
> newiris <- iris
> newiris$Species <- NULL
> # k-means clustering with 3 clusters; the outer parentheses print the result
> (kc <- kmeans(newiris, 3))
> # Cross-tabulate the true species against the assigned clusters
> table(iris$Species, kc$cluster)
> # Plot sepal measurements coloured by cluster, with cluster centres marked
> plot(newiris[c("Sepal.Length","Sepal.Width")], col = kc$cluster)
> points(kc$centers[,c("Sepal.Length","Sepal.Width")], col = 1:3, pch = 8, cex = 2)
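Since the practical allows R or Python, a minimal scikit-learn equivalent of the transcript above (assuming scikit-learn is installed) might look like:
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Iris measurements without the species labels, as in the R transcript
X = load_iris().data
kc = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
# Sepal length vs sepal width, coloured by assigned cluster
plt.scatter(X[:, 0], X[:, 1], c=kc.labels_)
# Mark the cluster centres
plt.scatter(kc.cluster_centers_[:, 0], kc.cluster_centers_[:, 1], marker="*", s=200, c="red")
plt.show()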
Output:
Practical 5
Perform linear regression on the given data warehouse data using R/Python.
Code:
> # Heights in cm (x) and weights in kg (y)
> x <- c(151,174,138,186,128,136,179,163,152,131)
> y <- c(63,81,56,91,47,57,76,72,62,48)
> # Fit a linear model of weight on height
> relation <- lm(y~x)
> png(file = "linearregression.png")
> # Scatter plot of weight vs height with the fitted height~weight line
> plot(y, x, col = "blue", main = "Height & Weight Regression", abline(lm(x~y)), cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")
> dev.off()
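A minimal Python equivalent of the same regression (same values as the R transcript, using numpy's least-squares fit) could be:
import numpy as np
import matplotlib.pyplot as plt

# Same heights (x) and weights (y) as the R transcript
x = np.array([151, 174, 138, 186, 128, 136, 179, 163, 152, 131])
y = np.array([63, 81, 56, 91, 47, 57, 76, 72, 62, 48])
# Least-squares fit of height on weight, matching abline(lm(x~y))
slope, intercept = np.polyfit(y, x, 1)
plt.scatter(y, x, color="blue")
plt.plot(y, slope * y + intercept)
plt.title("Height & Weight Regression")
plt.xlabel("Weight in Kg")
plt.ylabel("Height in cm")
plt.savefig("linearregression.png")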
Output:
Practical 6
Perform logistic regression on the given data warehouse data using R/Python.
Code:
> # Read the quality-of-care data set
> quality <- read.csv("C:/quality.csv")
> str(quality)
> table(quality$PoorCare)
> install.packages("caTools")
> library(caTools)
> # Split 75/25 into training and test sets, stratified on the outcome
> set.seed(88)
> split = sample.split(quality$PoorCare, SplitRatio = 0.75)
> qualityTrain = subset(quality, split == TRUE)
> qualityTest = subset(quality, split == FALSE)
> nrow(qualityTrain)
> nrow(qualityTest)
> # Logistic regression: poor care as a function of office visits and narcotics
> QualityLog = glm(PoorCare ~ OfficeVisits + Narcotics, data = qualityTrain, family = binomial)
> summary(QualityLog)
> # Predicted probabilities on the training set
> predictTrain = predict(QualityLog, type = "response")
> summary(predictTrain)
> install.packages("ROCR")
> library(ROCR)
> # ROC curve: true positive rate vs false positive rate
> ROCRpred = prediction(predictTrain, qualityTrain$PoorCare)
> ROCRperf = performance(ROCRpred, "tpr", "fpr")
> plot(ROCRperf)
> plot(ROCRperf, colorize = TRUE)
> plot(ROCRperf, colorize = TRUE, print.cutoffs.at = seq(0,1,by=0.1), text.adj = c(-0.2,1.7))
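Since the practical allows R or Python, a minimal scikit-learn sketch of the same workflow (assuming quality.csv has the PoorCare, OfficeVisits and Narcotics columns used above) could be:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay

quality = pd.read_csv("C:/quality.csv")
X = quality[["OfficeVisits", "Narcotics"]]
y = quality["PoorCare"]
# 75/25 split, stratified on the outcome as in sample.split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, stratify=y, random_state=88)
model = LogisticRegression().fit(X_train, y_train)
# ROC curve on the training data, as in the ROCR steps above
RocCurveDisplay.from_estimator(model, X_train, y_train)
plt.show()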
Practical 7
Write a Python program to read data from a CSV file, perform simple data analysis, and generate basic insights. (Use Pandas, a Python library.)
Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the CSV (file name assumed; a wine-quality data set with
# 'quality', 'alcohol' and 'density' columns is expected)
df = pd.read_csv('winequality.csv')

# Summary statistics
summary_stats = df.describe()
print("Summary Statistics")
print(summary_stats)

# Correlation matrix (numeric columns only)
correlation_matrix = df.corr(numeric_only=True)
print("\nCorrelation Matrix: ")
print(correlation_matrix)

# Quality distribution
quality_distribution = df['quality'].value_counts().sort_index()
print("\nQuality Distribution: ")
print(quality_distribution)

# Plotting
plt.figure(figsize=(10, 6))

# Quality distribution plot
plt.subplot(2, 2, 1)
sns.countplot(x='quality', data=df, palette='viridis')
plt.title('Quality Distribution')

# Alcohol vs Quality
plt.subplot(2, 2, 3)
sns.boxplot(x='quality', y='alcohol', data=df, palette='viridis')
plt.title('Alcohol vs Quality')

# Density vs Quality
plt.subplot(2, 2, 4)
sns.boxplot(x='quality', y='density', data=df, palette='viridis')
plt.title('Density vs Quality')

plt.tight_layout()
plt.show()
Output:
Practical 8
Perform data visualization
a. Perform data visualization using Python on any sales data.
Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def load_data(csv_file):
    # Load the sales data; columns Date, Product and Sales_Amount are assumed
    return pd.read_csv(csv_file, parse_dates=['Date'])

# Main function to load data and generate visualizations
def main():
    csv_file = 'sales_data.csv'  # Replace with your actual CSV file path
    data = load_data(csv_file)

    print("\nData Structure:")
    print(data.info())  # Column types and non-null counts

    # Aggregate data by Date and calculate the total sales for each day
    sales_by_date = data.groupby('Date')['Sales_Amount'].sum().reset_index()
    plt.figure(figsize=(10, 6))
    sns.lineplot(x='Date', y='Sales_Amount', data=sales_by_date, marker='o')
    plt.title('Total Sales Over Time')
    plt.xlabel('Date')
    plt.ylabel('Total Sales ($)')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

    # Top 10 products by total sales amount
    top_products = data.groupby('Product')['Sales_Amount'].sum().nlargest(10).reset_index()
    plt.figure(figsize=(10, 6))
    sns.barplot(x='Sales_Amount', y='Product', data=top_products, palette='Blues_d')
    plt.title('Top 10 Products by Sales Amount')
    plt.xlabel('Sales Amount ($)')
    plt.ylabel('Product')
    plt.tight_layout()
    plt.show()

main()
Output:
b. Perform data visualization using PowerBI on any sales data.
3. Sales by Country
Use Replace Values to fill nulls with "Unknown" (for text) or 0 (for numeric fields).
Use Fill Down (for fields where previous values should be carried forward).
D. Change Data Types
Ensure that each column has the correct data type:
Dates: Convert to Date type (Order_Date, Ship_Date).
Numbers: Convert Quantity, Price, Total_Amount to Decimal Number or Whole Number.
Text: Convert Customer_Name, Product_Name, Region to Text.
E. Remove Duplicates
Identify and remove duplicates based on Order_ID or Invoice_ID (whichever is the unique identifier).
Dimension Tables:
DimCustomers: Extract unique Customer_ID, Customer_Name, Region.
DimProducts: Extract unique Product_Code, Product_Name, Category.
DimDates: Create a Date Table if not present (Order_Date, Year, Month, Quarter).
To create a Date Table in Power Query:
Go to New Source > Blank Query.
Open Advanced Editor and paste:
let
    StartDate = #date(2020, 1, 1),
    EndDate = #date(2030, 12, 31),
    DateList = List.Dates(StartDate, Number.From(EndDate - StartDate) + 1, #duration(1, 0, 0, 0)),
    DateTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
    ChangedType = Table.TransformColumnTypes(DateTable, {{"Date", type date}}),
    // Year, Month and Quarter columns for the DimDates fields listed above
    AddedYear = Table.AddColumn(ChangedType, "Year", each Date.Year([Date]), Int64.Type),
    AddedMonth = Table.AddColumn(AddedYear, "Month", each Date.Month([Date]), Int64.Type),
    AddedQuarter = Table.AddColumn(AddedMonth, "Quarter", each Date.QuarterOfYear([Date]), Int64.Type)
in
    AddedQuarter
Click Close & Apply.
4. Define Relationships in Power BI Model
Click Close & Apply to load the transformed data into Power BI.
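For reference, a minimal pandas sketch of the same cleaning and dimension steps (file name assumed; column names taken from the steps above):
import pandas as pd

df = pd.read_csv("sales_data.csv")  # assumed file name

# Replace nulls: "Unknown" for text fields, 0 for numeric fields
df["Customer_Name"] = df["Customer_Name"].fillna("Unknown")
df["Quantity"] = df["Quantity"].fillna(0)
df["Region"] = df["Region"].ffill()  # Fill Down equivalent

# Change data types
df["Order_Date"] = pd.to_datetime(df["Order_Date"])
df["Total_Amount"] = df["Total_Amount"].astype(float)

# Remove duplicates on the unique identifier
df = df.drop_duplicates(subset="Order_ID")

# Dimension tables
dim_customers = df[["Customer_ID", "Customer_Name", "Region"]].drop_duplicates()
dim_products = df[["Product_Code", "Product_Name", "Category"]].drop_duplicates()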
Practical 10
Create a cube with suitable dimension and fact tables based on the ROLAP, MOLAP and HOLAP models.
Step 1. Click File -> New -> Project -> Business Intelligence Projects -> select Analysis Services Project -> assign a project name -> click OK.
Step 2. In Solution Explorer, right-click on Data Sources -> click New Data Source.
Step 9. In Solution Explorer, right-click on Data Source Views -> click New Data Source View.
Step 10. Click Next.
Step 11. Select the relational data source created previously (Sales_DW) -> click Next.
Step 12. First move your fact table to the right side to include it in the object list.
Step 13. Select the fact table in the right pane (FactProductSales) -> click Add Related Tables.
Step 16. In Solution Explorer, right-click on Cubes -> click New Cube.
Step 17. Click Next.
Step 18. Select the option Use existing tables -> click Next.
Step 19. Select the fact table name from Measure Group Tables (FactProductSales) -> click Next.
Step 20. Choose the measures from the list that you want to place in your cube -> click Next.
Step 21. Select all dimensions associated with your fact table -> click Next.
Step 23. Your cube is now ready; you can see the newly created cube and dimensions added in Solution Explorer.
Step 24. In Solution Explorer, double-click on the dimension Dim Product -> drag and drop Product Name from the table in the Data Source View into the Attributes pane on the left side.
Step 25. Double-click on the Dim Date dimension -> drag and drop fields from the table shown in the Data Source View to Attributes -> drag and drop attributes from the leftmost Attributes pane to the middle Hierarchy pane.
Step 26. In Solution Explorer, right-click on the project name (Analysis Services Project3) -> click Properties.
Step 27. In Configuration Properties, select Deployment -> assign your SQL Server instance name where Analysis Services is installed (mubin-pc\fairy) (Machine Name\Instance Name) -> choose Deployment Mode Deploy All for now -> select Processing Option Do Not Process -> click OK.
Step 28. Click Deploy.
Step 29. Once deployment finishes, you can see the message Deployment Completed in the Deployment Properties window.
Step 30. Open SQL Server Configuration Manager -> click on SQL Server Analysis Services -> copy the account name and close the window.
Step 33. Once processing is complete, you can see the status Process Succeeded -> click Close to close both open processing windows one after the other.
Step 34. Now go to SQL Server Management Studio, disconnect the existing database connection and connect to Analysis Services -> in the Databases folder you will see that the first cube has been successfully created.