
BitcoinAnalysis - Ipynb - Colaboratory

The document presents a case study that aims to develop a model for forecasting Bitcoin closing prices from 2017 to 2019 minute-by-minute trading data. It details integrating the datasets, exploring and cleaning the data, converting it to a daily format, and beginning to analyze trends to build a predictive model.

Uploaded by

ramihameed2000
Copyright
© © All Rights Reserved

1/13/24, 10:19 PM BitcoinAnalysis.ipynb - Colaboratory

"An Analysis of Bitcoin Trading Data from 2017-2019: A Brief Case Study Demonstrating Expertise in Predictive Modeling of Closing Prices"

This case study presents a streamlined approach to modeling minute-by-minute Bitcoin trading data from 2017 to 2019. The primary objective is to develop a simple yet effective model for forecasting Bitcoin's closing prices, showcasing both an understanding of the cryptocurrency market and proficiency in data analysis techniques.

from google.colab import drive
drive.mount('/content/drive')

"First, we integrate the 2017, 2018, and 2019 datasets into a single comprehensive dataset."

https://colab.research.google.com/drive/1YzxhLK6bzcIR-j5btF8wVutr8BZsToAR#scrollTo=th-qpph1JIDH&printMode=true 1/12
import pandas as pd

# File paths
file_2017 = '/content/drive/MyDrive/BTC-2017min.csv'
file_2018 = '/content/drive/MyDrive/BTC-2018min.csv'
file_2019 = '/content/drive/MyDrive/BTC-2019min.csv'

# Load the datasets
data_2017 = pd.read_csv(file_2017)
data_2018 = pd.read_csv(file_2018)
data_2019 = pd.read_csv(file_2019)

# Merge the datasets (each file's original row labels are kept)
merged_data = pd.concat([data_2017, data_2018, data_2019])

# Save the merged dataset, using the same Drive path as the inputs
merged_data.to_csv('/content/drive/MyDrive/BTC_2017-2019_merged.csv', index=False)
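
A note on the merge above: `pd.concat` keeps each input's own row labels, so the merged frame contains repeated index values. The following is a minimal sketch with two tiny made-up frames (hypothetical values, not taken from the dataset) showing the effect, and how `ignore_index=True` would produce a fresh 0..n-1 index if that were preferred.

```python
import pandas as pd

# Tiny stand-ins for the yearly files (hypothetical values)
part_a = pd.DataFrame({"close": [1.0, 2.0]})
part_b = pd.DataFrame({"close": [3.0, 4.0]})

kept = pd.concat([part_a, part_b])                      # row labels 0, 1, 0, 1
fresh = pd.concat([part_a, part_b], ignore_index=True)  # row labels 0, 1, 2, 3

print(kept.index.tolist())
print(fresh.index.tolist())
```

Repeated labels are harmless for the steps that follow here, because the data is later re-indexed by date, but they can surprise label-based lookups with `.loc`.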

Data Exploration and Data Cleaning


Column Data Types:
merged_data.dtypes

unix int64
date object
symbol object
open float64
high float64
low float64
close float64
Volume BTC float64
Volume USD float64
dtype: object

Changing data types for usability:

# Convert 'unix' to datetime (assuming 'unix' is in seconds)
merged_data['unix'] = pd.to_datetime(merged_data['unix'], unit='s')

# Convert 'date' to datetime
merged_data['date'] = pd.to_datetime(merged_data['date'])

# Convert 'symbol' to string
merged_data['symbol'] = merged_data['symbol'].astype('string')

# Check the data types again
merged_data.dtypes

unix datetime64[ns]
date datetime64[ns]
symbol string
open float64
high float64
low float64
close float64
Volume BTC float64
Volume USD float64
dtype: object
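
To illustrate the epoch conversion, here is a small sketch of `pd.to_datetime` with `unit='s'`. The two integers are hypothetical epoch values chosen to correspond to the last two minutes of 2017 (UTC); they are not read from the dataset.

```python
import pandas as pd

# Hypothetical epoch-second values for 2017-12-31 23:59:00 and 23:58:00 (UTC)
epoch_seconds = pd.Series([1514764740, 1514764680])

# unit="s" tells pandas the integers count seconds since 1970-01-01
as_datetime = pd.to_datetime(epoch_seconds, unit="s")
print(as_datetime)
```

If the raw values were actually in milliseconds, `unit='s'` would not give sensible dates, which is why the comparison against the `date` column in the next step is a useful check.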

Check whether 'unix' and 'date' hold the same timestamps:

# Compare 'unix' and 'date'
# Create a new column 'is_same' that is True where 'unix' and 'date' match (up to seconds)
merged_data['is_same'] = merged_data['unix'].dt.floor('S') == merged_data['date'].dt.floor('S')

# Check the comparison results
print(merged_data[['unix', 'date', 'is_same']].head())

# Unique values of the comparison
merged_data['is_same'].unique()

                 unix                date  is_same
0 2017-12-31 23:59:00 2017-12-31 23:59:00     True
1 2017-12-31 23:58:00 2017-12-31 23:58:00     True
2 2017-12-31 23:57:00 2017-12-31 23:57:00     True
3 2017-12-31 23:56:00 2017-12-31 23:56:00     True
4 2017-12-31 23:55:00 2017-12-31 23:55:00     True
array([ True])

Since 'unix' and 'date' match exactly, drop 'unix':


# Drop 'unix' and 'is_same' columns
merged_data = merged_data.drop(columns=['unix', 'is_same'])
merged_data.head()


                 date   symbol      open      high       low     close  Volume BTC    Volume USD
0 2017-12-31 23:59:00  BTC/USD  13913.28  13913.28  13867.18  13880.00    0.591748   8213.456549
1 2017-12-31 23:58:00  BTC/USD  13913.26  13953.83  13884.69  13953.77    1.398784  19518.309658
2 2017-12-31 23:57:00  BTC/USD  13908.73  13913.26  13874.99  13913.26    0.775012  10782.944294

Value Counts for symbol:


merged_data['symbol'].value_counts()

BTC/USD 1576797
Name: symbol, dtype: Int64

Unique Values in a Column:


merged_data['symbol'].nunique()

Correlation Matrix: To check the correlation between different numerical columns:

# numeric_only=True restricts the calculation to numeric columns,
# leaving out 'date' and 'symbol'
merged_data.corr(numeric_only=True)

                open      high       low     close  Volume BTC  Volume USD
open        1.000000  0.999997  0.999996  0.999995    0.027315    0.252212
high        0.999997  1.000000  0.999994  0.999997    0.028008    0.253042
low         0.999996  0.999994  1.000000  0.999996    0.026497    0.251236
close       0.999995  0.999997  0.999996  1.000000    0.027233    0.252119
Volume BTC  0.027315  0.028008  0.026497  0.027233    1.000000    0.831629
Volume USD  0.252212  0.253042  0.251236  0.252119    0.831629    1.000000


Sample of Data:
merged_data.sample(5)

                      date   symbol     open     high      low    close  Volume BTC    Volume USD
183380 2018-08-26 15:39:00  BTC/USD  6693.25  6693.40  6691.06  6691.06    1.736926  11621.875212
137022 2019-09-27 20:17:00  BTC/USD  8008.00  8008.00  8008.00  8008.00    0.013155    105.347082
452575 2017-02-20 17:04:00  BTC/USD  1060.95  1060.95  1059.92  1059.92    0.028804     30.529830

Check for duplicate and null entries in the data:


merged_data.duplicated().sum()

merged_data.isnull().sum()

date 0
symbol 0
open 0
high 0
low 0
close 0
Volume BTC 0
Volume USD 0
dtype: int64

merged_data['date'].head(5)

0 2017-12-31 23:59:00
1 2017-12-31 23:58:00
2 2017-12-31 23:57:00
3 2017-12-31 23:56:00
4 2017-12-31 23:55:00
Name: date, dtype: datetime64[ns]

merged_data['date'].nunique()

1576797
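
The unique-timestamp count can also hint at gaps. Three non-leap years contain 1095 × 1440 = 1,576,800 minutes, so a count of 1,576,797 suggests a few minutes may be absent, assuming the data is meant to cover every minute of 2017 through 2019. A minimal sketch of how such gaps could be located, using a small synthetic index rather than the real data:

```python
import pandas as pd

# Synthetic minute-level timestamps with one minute deliberately left out
timestamps = pd.to_datetime([
    "2017-01-01 00:00:00",
    "2017-01-01 00:01:00",
    "2017-01-01 00:03:00",
    "2017-01-01 00:04:00",
])

# Every minute between the first and last observation
expected = pd.date_range(timestamps.min(), timestamps.max(), freq="min")

# Minutes that should exist but are absent from the data
missing = expected.difference(timestamps)
print(missing)
```

On the real frame, `timestamps` would be replaced by the `date` column; whether gaps should be filled or left alone depends on the analysis.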


"Next, we sort the minute-level data in chronological order. Sorting by date is crucial for maintaining the integrity of the time series."
merged_data_sorted = merged_data.sort_values(by='date')

"We then convert the minute-level data to a daily format. Aggregating by day gives a clearer, more manageable view of trends and patterns and simplifies the analysis."
import pandas as pd

# Convert 'date' to datetime if not already done
merged_data_sorted['date'] = pd.to_datetime(merged_data_sorted['date'])

# Set the 'date' column as the index
merged_data_sorted.set_index('date', inplace=True)

# Resample to daily data and aggregate
daily_data = merged_data_sorted.resample('D').agg({
    'open': 'mean',        # mean of open prices
    'high': 'mean',        # mean of high prices
    'low': 'mean',         # mean of low prices
    'close': 'mean',       # mean of close prices
    'Volume BTC': 'sum',   # sum of BTC volumes
    'Volume USD': 'sum'    # sum of USD volumes
})
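
The aggregation above averages every price column over the day, so the daily 'close' is a mean of minute closes rather than the last traded price. For comparison, a conventional daily OHLC bar takes the first open, the maximum high, the minimum low, and the last close. The sketch below shows that alternative on made-up minute bars; it is not what this notebook does, and the values are illustrative only.

```python
import pandas as pd

# Synthetic minute bars spanning two days (values are made up)
idx = pd.to_datetime([
    "2017-01-01 00:00", "2017-01-01 12:00", "2017-01-01 23:59",
    "2017-01-02 00:00", "2017-01-02 23:59",
])
minute_bars = pd.DataFrame(
    {
        "open": [10.0, 11.0, 12.0, 12.5, 13.0],
        "high": [10.5, 11.5, 12.5, 13.5, 13.2],
        "low": [9.5, 10.5, 11.5, 12.0, 12.8],
        "close": [10.2, 11.2, 12.2, 13.0, 13.1],
        "Volume BTC": [1.0, 2.0, 3.0, 4.0, 5.0],
    },
    index=idx,
)

# Conventional OHLC aggregation per calendar day
daily_ohlc = minute_bars.resample("D").agg({
    "open": "first",      # first open of the day
    "high": "max",        # highest high of the day
    "low": "min",         # lowest low of the day
    "close": "last",      # last close of the day
    "Volume BTC": "sum",  # total traded volume
})
print(daily_ohlc)
```

Which form is appropriate depends on the goal: averaged prices smooth intraday noise, while true OHLC bars preserve the daily range and the actual closing price.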

Reset the index and inspect the data:



daily_data.reset_index(inplace=True)
daily_data.head(5)

date open high low close Volume BTC Volume USD

0 2017-01-01 977.256602 977.385233 977.132620 977.276060 6850.593309 6.765936e+06

1 2017-01-02 1012.267604 1012.517181 1011.988826 1012.273903 8167.381030 8.276031e+06

2 2017-01-03 1020.001535 1020.226840 1019.794437 1020.040472 9089.658025 9.276735e+06

3 2017-01-04 1076.558840 1077.271167 1075.572542 1076.553639 21562.456972 2.347651e+07

4 2017-01-05 1043.608646 1044.905549 1042.094125 1043.547951 36018.861120 3.619081e+07

The next step is to store this refined dataset by exporting 'daily_data' to a CSV file.
daily_data.to_csv('/content/drive/MyDrive/daily_data.csv', index = False)

Adding a moving average (or moving mean) to the dataset is a common technique in time series analysis, especially for financial data. It smooths out short-term fluctuations and highlights longer-term trends or cycles.

import pandas as pd

# Load the daily dataset
daily_data = pd.read_csv('/content/drive/MyDrive/daily_data.csv')

# Use 'date' as the index while computing the rolling statistic
daily_data.set_index('date', inplace=True)

# Window size for the moving average: 20 days
window_size = 20

# Calculate the 20-day moving average of the 'close' price
daily_data['moving_average_close'] = daily_data['close'].rolling(window=window_size).mean()

# Restore 'date' as a regular column
daily_data.reset_index(inplace=True)

# The first 19 rows have no full 20-day window and are NaN;
# fall back to that day's close price for them.
# Assigning the result avoids a chained in-place fillna, which newer pandas versions warn about.
daily_data['moving_average_close'] = daily_data['moving_average_close'].fillna(daily_data['close'])

print(daily_data.head(25))  # first 25 rows, including the first full-window averages

date open high low close \


0 2017-01-01 977.256602 977.385233 977.132620 977.276060
1 2017-01-02 1012.267604 1012.517181 1011.988826 1012.273903
2 2017-01-03 1020.001535 1020.226840 1019.794437 1020.040472
3 2017-01-04 1076.558840 1077.271167 1075.572542 1076.553639
4 2017-01-05 1043.608646 1044.905549 1042.094125 1043.547951
5 2017-01-06 934.455278 935.419188 933.269312 934.416729
6 2017-01-07 869.618951 870.700465 868.904215 869.738333
7 2017-01-08 914.224917 914.637931 913.597944 913.966083
8 2017-01-09 893.495403 893.856319 893.047132 893.471535
9 2017-01-10 902.637313 902.858104 902.343042 902.638375
10 2017-01-11 846.285701 847.306472 845.153167 846.173313
11 2017-01-12 782.970292 783.542347 782.386604 782.961688
12 2017-01-13 807.222361 807.674451 806.778986 807.177507
13 2017-01-14 827.433958 827.573681 827.271819 827.412431
14 2017-01-15 817.127076 817.298757 816.911528 817.081007
15 2017-01-16 827.983313 828.105757 827.839722 827.977958
16 2017-01-17 876.174264 876.640868 875.760896 876.181472
17 2017-01-18 885.380229 885.695486 885.050083 885.345653
18 2017-01-19 893.326090 893.591750 893.053882 893.294389
19 2017-01-20 895.566618 895.695965 895.415535 895.552688
20 2017-01-21 917.654201 917.797965 917.532965 917.679382
21 2017-01-22 922.678097 922.897590 922.452764 922.689111
22 2017-01-23 920.178285 920.327340 919.997403 920.188507
23 2017-01-24 907.212306 907.418618 906.928014 907.186326
24 2017-01-25 894.505562 894.616382 894.376438 894.509347

Volume BTC Volume USD moving_average_close


0 6850.593309 6.765936e+06 977.276060
1 8167.381030 8.276031e+06 1012.273903
2 9089.658025 9.276735e+06 1020.040472
3 21562.456972 2.347651e+07 1076.553639
4 36018.861120 3.619081e+07 1043.547951
5 27916.703099 2.553144e+07 934.416729
6 20401.113591 1.761907e+07 869.738333
7 8937.492708 8.164011e+06 913.966083

8 8716.182941 7.782149e+06 893.471535
9 8535.521688 7.706384e+06 902.638375
10 35893.768368 2.945219e+07 846.173313
11 17400.141555 1.363246e+07 782.961688
12 11409.520330 9.224971e+06 807.177507
13 6614.718992 5.469742e+06 827.412431
14 4231.463903 3.454909e+06 817.081007
15 6166.043977 5.107435e+06 827.977958
16 12264.169385 1.077497e+07 876.181472
17 11181.898878 9.830026e+06 885.345653
18 11094.603298 9.928565e+06 893.294389
19 6618.627764 5.915721e+06 905.154059
20 5865.632031 5.373761e+06 902.174225
21 7166.665479 6.566289e+06 897.694986
22 3514.741429 3.234650e+06 892.702387
23 9405.046565 8.497003e+06 884.234022
24 5291.554742 4.725942e+06 876.782092
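
As the output shows, rows 0 to 18 simply repeat the close price, and the first true 20-day average appears at row 19. One alternative for the warm-up rows, sketched below on a short made-up series with a 3-day window, is `min_periods=1`, which averages however many observations are available so far. This is an illustration of the option, not the approach used in this notebook.

```python
import pandas as pd

close = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])  # made-up prices
window_size = 3

# NaN until the window is full
strict = close.rolling(window=window_size).mean()

# Averages whatever observations exist so far
partial = close.rolling(window=window_size, min_periods=1).mean()

# The notebook's approach: fall back to the close price for warm-up rows
filled = strict.fillna(close)

print(pd.DataFrame({"strict": strict, "partial": partial, "filled": filled}))
```

The two variants differ only in the warm-up rows; from the first full window onward they are identical.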

daily_data['date'].head()

0 2017-01-01
1 2017-01-02
2 2017-01-03
3 2017-01-04
4 2017-01-05
Name: date, dtype: object

daily_data.head()

         date         open         high          low        close    Volume BTC    Volume USD  moving_ave...
0  2017-01-01   977.256602   977.385233   977.132620   977.276060   6850.593309  6.765936e+06  ...
1  2017-01-02  1012.267604  1012.517181  1011.988826  1012.273903   8167.381030  8.276031e+06  1...
2  2017-01-03  1020.001535  1020.226840  1019.794437  1020.040472   9089.658025  9.276735e+06  1...
3  2017-01-04  1076.558840  1077.271167  1075.572542  1076.553639  21562.456972  2.347651e+07  1...
4  2017-01-05  1043.608646  1044.905549  1042.094125  1043.547951  36018.861120  3.619081e+07  1...

"After preparing and saving the daily data, we turn to developing a predictive model. A Linear Regression model is chosen because it captures linear relationships between variables. Here the model is used to understand and predict Bitcoin closing prices from the daily data observed across 2017 to 2019."
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

# Separate features and target
X = daily_data[['open', 'high', 'low', 'Volume BTC', 'Volume USD', 'moving_average_close']]
y = daily_data['close']

# Split the data (random 80/20 split with a fixed seed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))

print(f'Mean Absolute Error: {mae}')
print(f'Root Mean Squared Error: {rmse}')

Mean Absolute Error: 0.33095838248461557


Root Mean Squared Error: 0.540670528986689
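
Two properties of the setup above are worth keeping in mind when reading these errors: the features include the same day's open, high, and low, and the split is random rather than chronological. For an actual forecast, one common arrangement uses only information available before the day begins and holds out the most recent period for testing. The sketch below illustrates that arrangement on a synthetic random-walk series; the column names `prev_close` and `prev_ma20` are hypothetical, and the numbers it prints say nothing about Bitcoin.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic random-walk "close" series standing in for daily_data (not real prices)
rng = np.random.default_rng(0)
frame = pd.DataFrame({"close": 1000 + rng.normal(0, 10, size=300).cumsum()})

# Features known before the day starts: yesterday's close and
# the 20-day moving average as of yesterday
frame["prev_close"] = frame["close"].shift(1)
frame["prev_ma20"] = frame["close"].rolling(20).mean().shift(1)
frame = frame.dropna()

X = frame[["prev_close", "prev_ma20"]]
y = frame["close"]

# shuffle=False keeps the final 20% of days as the test period
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Out-of-sample MAE: {mae:.2f}")
```

With this arrangement every test day lies after every training day, so the reported error reflects how the model would have performed on data it could not have seen.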

Feature scaling is a common preprocessing step in regression analysis when features have different scales and units. It can significantly affect many machine learning algorithms, particularly regularized or gradient-based ones; the predictions of ordinary least squares linear regression, by contrast, should be essentially unchanged by standardization. Here the scaler is fitted on the training data only and then applied to the test data.
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

# Initialize the StandardScaler
scaler = StandardScaler()

# Fit the scaler on the training data and apply the same transformation to the test data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize the Linear Regression model
linear_model = LinearRegression()

# Train the model using the scaled training data
linear_model.fit(X_train_scaled, y_train)

# Predict using the scaled test data
scaled_predictions = linear_model.predict(X_test_scaled)

# Calculate Mean Absolute Error and Root Mean Squared Error
scaled_mae = mean_absolute_error(y_test, scaled_predictions)
scaled_rmse = np.sqrt(mean_squared_error(y_test, scaled_predictions))

# Print the performance metrics
print(f'Scaled Mean Absolute Error: {scaled_mae}')
print(f'Scaled Root Mean Squared Error: {scaled_rmse}')

Scaled Mean Absolute Error: 0.33095838193443516


Scaled Root Mean Squared Error: 0.5406705297316674

In summary, the unscaled and scaled Linear Regression models produce almost identical errors: the MAE and RMSE values agree to at least eight decimal places. Feature scaling therefore did not alter the model's predictive accuracy in this scenario. That is the expected behavior for ordinary least squares, where standardizing the features re-expresses the coefficients without changing the fitted predictions. The very low errors themselves likely reflect that the same-day open, high, and low prices are almost perfectly correlated with the close (about 0.99999 in the correlation matrix).
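
The insensitivity of ordinary least squares to standardization can be checked directly. The following is a minimal sketch on synthetic data with features on very different scales (all values are made up): it fits `LinearRegression` once on raw features and once on standardized features, then compares the predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (made-up data)
rng = np.random.default_rng(42)
X = np.column_stack([rng.normal(1000, 200, 200), rng.normal(0, 1e6, 200)])
y = 0.5 * X[:, 0] + 2e-6 * X[:, 1] + rng.normal(0, 1, 200)

# Fit on raw features
raw_pred = LinearRegression().fit(X, y).predict(X)

# Fit on standardized features
X_scaled = StandardScaler().fit_transform(X)
scaled_pred = LinearRegression().fit(X_scaled, y).predict(X_scaled)

# Largest disagreement between the two sets of predictions
max_diff = np.max(np.abs(raw_pred - scaled_pred))
print(max_diff)
```

Any difference that remains is floating-point rounding. Scaling would matter if the model were regularized (for example Ridge or Lasso) or trained by gradient descent.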

"This concludes the current demonstration of my analytical skills. It was an exercise aimed at
