
"An Analysis of Bitcoin Trading Data


from 2017-2019: A Brief Case Study
Demonstrating Expertise in Predictive
Modeling of Closing Prices" This case study
aims to present a streamlined approach for modeling Bitcoin
trading data on a minute-by-minute
basis, spanning the years 2017 to 2019. The primary
objective is to develop a straightforward yet effective model
to forecast Bitcoin's closing prices, showcasing
both a deep understanding of the cryptocurrency market and
proficiency in data analysis techniques.
from google.colab import drive

drive.mount('/content/drive')

"Initially, we commence by integrating the datasets from


2017, 2018, and 2019 into a singular comprehensive
dataset."

import pandas as pd

# File paths
file_2017 = '/content/drive/MyDrive/BTC-2017min.csv'
file_2018 = '/content/drive/MyDrive/BTC-2018min.csv'
file_2019 = '/content/drive/MyDrive/BTC-2019min.csv'

# Load the datasets
data_2017 = pd.read_csv(file_2017)
data_2018 = pd.read_csv(file_2018)
data_2019 = pd.read_csv(file_2019)

# Merge the datasets
merged_data = pd.concat([data_2017, data_2018, data_2019])

# Save the merged dataset
merged_data.to_csv('/content/drive/MyDrive/BTC_2017-2019_merged.csv', index=False)

Data exploration and data cleaning


Column Data Types:
merged_data.dtypes

unix int64
date object
symbol object
open float64
high float64
low float64
close float64
Volume BTC float64
Volume USD float64
dtype: object

Changing data types for usability:

# Convert 'unix' to datetime (assuming 'unix' is in seconds)
merged_data['unix'] = pd.to_datetime(merged_data['unix'], unit='s')

# Convert 'date' to datetime
merged_data['date'] = pd.to_datetime(merged_data['date'])

# Convert 'symbol' to string
merged_data['symbol'] = merged_data['symbol'].astype('string')

# Check the data types again
merged_data.dtypes

unix datetime64[ns]
date datetime64[ns]
symbol string
open float64
high float64
low float64
close float64
Volume BTC float64
Volume USD float64
dtype: object

Check whether 'unix' and 'date' contain the same timestamps:


# Compare 'unix' and 'date': create a new column 'is_same' to check
# if they are the same (up to seconds)
merged_data['is_same'] = merged_data['unix'].dt.floor('S') == merged_data['date'].dt.floor('S')

# Check the comparison results
print(merged_data[['unix', 'date', 'is_same']].head())

# Unique values
merged_data['is_same'].unique()

unix date is_same


0 2017-12-31 23:59:00 2017-12-31 23:59:00 True
1 2017-12-31 23:58:00 2017-12-31 23:58:00 True
2 2017-12-31 23:57:00 2017-12-31 23:57:00 True
3 2017-12-31 23:56:00 2017-12-31 23:56:00 True
4 2017-12-31 23:55:00 2017-12-31 23:55:00 True
array([ True])

Since 'unix' and 'date' are an exact match, drop 'unix':


# Drop the 'unix' and 'is_same' columns
merged_data = merged_data.drop(columns=['unix', 'is_same'])

merged_data.head()


                  date   symbol      open      high       low     close  Volume BTC    Volume USD
0  2017-12-31 23:59:00  BTC/USD  13913.28  13913.28  13867.18  13880.00    0.591748   8213.456549
1  2017-12-31 23:58:00  BTC/USD  13913.26  13953.83  13884.69  13953.77    1.398784  19518.309658
2  2017-12-31 23:57:00  BTC/USD  13908.73  13913.26  13874.99  13913.26    0.775012  10782.944294

Value Counts for symbol:


merged_data['symbol'].value_counts()

BTC/USD 1576797
Name: symbol, dtype: Int64

Unique Values in a Column:


merged_data['symbol'].nunique()

Correlation Matrix: to check the correlation between the different numerical columns:

merged_data.corr()

<ipython-input-102-cc54846d37e8>:1: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated.
  merged_data.corr()
open high low close Volume BTC Volume USD

open 1.000000 0.999997 0.999996 0.999995 0.027315 0.252212

high 0.999997 1.000000 0.999994 0.999997 0.028008 0.253042

low 0.999996 0.999994 1.000000 0.999996 0.026497 0.251236

close 0.999995 0.999997 0.999996 1.000000 0.027233 0.252119

Volume BTC 0.027315 0.028008 0.026497 0.027233 1.000000 0.831629

Volume USD 0.252212 0.253042 0.251236 0.252119 0.831629 1.000000
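
The FutureWarning above comes from newer pandas versions, where corr() will no longer silently drop non-numeric columns. A minimal way to keep this cell future-proof (a sketch assuming pandas >= 1.5, not part of the original notebook) is to restrict the call to numeric columns explicitly:

# Compute the correlation matrix only over the numeric columns,
# which avoids the FutureWarning about numeric_only
merged_data.select_dtypes(include='number').corr()

# Equivalent on pandas versions that accept the keyword:
# merged_data.corr(numeric_only=True)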


Sample of Data:
merged_data.sample(5)

                       date   symbol     open     high      low    close  Volume BTC    Volume USD
183380  2018-08-26 15:39:00  BTC/USD  6693.25  6693.40  6691.06  6691.06    1.736926  11621.875212
137022  2019-09-27 20:17:00  BTC/USD  8008.00  8008.00  8008.00  8008.00    0.013155    105.347082
452575  2017-02-20 17:04:00  BTC/USD  1060.95  1060.95  1059.92  1059.92    0.028804     30.529830

Check for duplicate and null entries in the data:


merged_data.duplicated().sum()

merged_data.isnull().sum()

date 0
symbol 0
open 0
high 0
low 0
close 0
Volume BTC 0
Volume USD 0
dtype: int64
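
If the duplicate check above had reported a nonzero count, a minimal cleanup sketch (not in the original notebook) would be:

# Drop exact duplicate rows, keeping the first occurrence
merged_data = merged_data.drop_duplicates(keep='first')
print(merged_data.duplicated().sum())  # should now be 0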

merged_data['date'].head(5)

0 2017-12-31 23:59:00
1 2017-12-31 23:58:00
2 2017-12-31 23:57:00
3 2017-12-31 23:56:00
4 2017-12-31 23:55:00
Name: date, dtype: datetime64[ns]

merged_data['date'].nunique()

1576797


"Next, we proceed to meticulously organize the minutely


data in chronological order. This sorting by date is crucial
for maintaining the integrity of the time series."
merged_data_sorted = merged_data.sort_values(by='date')

"we then transform the minutely data into a daily format. This
conversion is aimed at refining the analysis process.
Aggregating the data on a daily basis allows for a clearer, more
manageable overview of trends and patterns, which is
particularly beneficial for more effective and insightful
analysis."
import pandas as pd

# Convert 'date' to datetime, if not already done
merged_data_sorted['date'] = pd.to_datetime(merged_data_sorted['date'])

# Set the 'date' column as the index
merged_data_sorted.set_index('date', inplace=True)

# Resample to daily data and aggregate
daily_data = merged_data_sorted.resample('D').agg({
    'open': 'mean',        # mean of open prices
    'high': 'mean',        # mean of high prices
    'low': 'mean',         # mean of low prices
    'close': 'mean',       # mean of close prices
    'Volume BTC': 'sum',   # sum of BTC volumes
    'Volume USD': 'sum'    # sum of USD volumes
})
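
Averaging all four price columns is a simplification. A more conventional OHLC aggregation, shown here only as an alternative sketch (not what this case study uses), keeps the first open, the maximum high, the minimum low, and the last close of each day:

# Alternative daily aggregation preserving OHLC semantics (sketch)
daily_ohlc = merged_data_sorted.resample('D').agg({
    'open': 'first',       # first trade of the day
    'high': 'max',         # daily high
    'low': 'min',          # daily low
    'close': 'last',       # last trade of the day
    'Volume BTC': 'sum',
    'Volume USD': 'sum'
})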

Reset the index and inspect the data:


daily_data.reset_index(inplace=True)
daily_data.head(5)

date open high low close Volume BTC Volume USD

0 2017-01-01 977.256602 977.385233 977.132620 977.276060 6850.593309 6.765936e+06

1 2017-01-02 1012.267604 1012.517181 1011.988826 1012.273903 8167.381030 8.276031e+06

2 2017-01-03 1020.001535 1020.226840 1019.794437 1020.040472 9089.658025 9.276735e+06

3 2017-01-04 1076.558840 1077.271167 1075.572542 1076.553639 21562.456972 2.347651e+07

4 2017-01-05 1043.608646 1044.905549 1042.094125 1043.547951 36018.861120 3.619081e+07

The next step is to store this refined dataset, which we accomplish by exporting 'daily_data' to a CSV file.

daily_data.to_csv('/content/drive/MyDrive/daily_data.csv', index=False)

Adding a moving average (or rolling mean) to the dataset is a common technique in time series analysis, especially with financial data. It smooths out short-term fluctuations and highlights longer-term trends or cycles.

import pandas as pd

# Load the dataset
daily_data = pd.read_csv('/content/drive/MyDrive/daily_data.csv')

# Set the index
daily_data.set_index('date', inplace=True)

# Choose a window size for the moving average: 20 days
window_size = 20

# Calculate the moving average for the 'close' price
daily_data['moving_average_close'] = daily_data['close'].rolling(window=window_size).mean()

# daily_data now has an additional column with the 20-day moving average of the close price
daily_data.reset_index(inplace=True)

# Replace the first 19 rows, which are null because of the 20-day window,
# using the close price as the default value
daily_data['moving_average_close'].fillna(daily_data['close'], inplace=True)

print(daily_data.head(25))  # Display the first 25 rows to see some of the moving averages

date open high low close \


0 2017-01-01 977.256602 977.385233 977.132620 977.276060
1 2017-01-02 1012.267604 1012.517181 1011.988826 1012.273903
2 2017-01-03 1020.001535 1020.226840 1019.794437 1020.040472
3 2017-01-04 1076.558840 1077.271167 1075.572542 1076.553639
4 2017-01-05 1043.608646 1044.905549 1042.094125 1043.547951
5 2017-01-06 934.455278 935.419188 933.269312 934.416729
6 2017-01-07 869.618951 870.700465 868.904215 869.738333
7 2017-01-08 914.224917 914.637931 913.597944 913.966083
8 2017-01-09 893.495403 893.856319 893.047132 893.471535
9 2017-01-10 902.637313 902.858104 902.343042 902.638375
10 2017-01-11 846.285701 847.306472 845.153167 846.173313
11 2017-01-12 782.970292 783.542347 782.386604 782.961688
12 2017-01-13 807.222361 807.674451 806.778986 807.177507
13 2017-01-14 827.433958 827.573681 827.271819 827.412431
14 2017-01-15 817.127076 817.298757 816.911528 817.081007
15 2017-01-16 827.983313 828.105757 827.839722 827.977958
16 2017-01-17 876.174264 876.640868 875.760896 876.181472
17 2017-01-18 885.380229 885.695486 885.050083 885.345653
18 2017-01-19 893.326090 893.591750 893.053882 893.294389
19 2017-01-20 895.566618 895.695965 895.415535 895.552688
20 2017-01-21 917.654201 917.797965 917.532965 917.679382
21 2017-01-22 922.678097 922.897590 922.452764 922.689111
22 2017-01-23 920.178285 920.327340 919.997403 920.188507
23 2017-01-24 907.212306 907.418618 906.928014 907.186326
24 2017-01-25 894.505562 894.616382 894.376438 894.509347

Volume BTC Volume USD moving_average_close


0 6850.593309 6.765936e+06 977.276060
1 8167.381030 8.276031e+06 1012.273903
2 9089.658025 9.276735e+06 1020.040472
3 21562.456972 2.347651e+07 1076.553639
4 36018.861120 3.619081e+07 1043.547951
5 27916.703099 2.553144e+07 934.416729
6 20401.113591 1.761907e+07 869.738333
7 8937.492708 8.164011e+06 913.966083

8 8716.182941 7.782149e+06 893.471535
9 8535.521688 7.706384e+06 902.638375
10 35893.768368 2.945219e+07 846.173313
11 17400.141555 1.363246e+07 782.961688
12 11409.520330 9.224971e+06 807.177507
13 6614.718992 5.469742e+06 827.412431
14 4231.463903 3.454909e+06 817.081007
15 6166.043977 5.107435e+06 827.977958
16 12264.169385 1.077497e+07 876.181472
17 11181.898878 9.830026e+06 885.345653
18 11094.603298 9.928565e+06 893.294389
19 6618.627764 5.915721e+06 905.154059
20 5865.632031 5.373761e+06 902.174225
21 7166.665479 6.566289e+06 897.694986
22 3514.741429 3.234650e+06 892.702387
23 9405.046565 8.497003e+06 884.234022
24 5291.554742 4.725942e+06 876.782092

daily_data['date'].head()

0 2017-01-01
1 2017-01-02
2 2017-01-03
3 2017-01-04
4 2017-01-05
Name: date, dtype: object
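
Note that after the CSV round-trip, 'date' comes back as dtype object (plain strings). If datetime semantics are needed in later steps, a small fix (an addition to the original notebook, equivalent to passing parse_dates=['date'] to read_csv) is to convert the column in place:

# Convert the 'date' strings back to datetime64 without disturbing other columns
daily_data['date'] = pd.to_datetime(daily_data['date'])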

daily_data.head()

        date         open         high          low        close    Volume BTC    Volume USD  moving_average_close
0 2017-01-01   977.256602   977.385233   977.132620   977.276060   6850.593309  6.765936e+06            977.276060
1 2017-01-02  1012.267604  1012.517181  1011.988826  1012.273903   8167.381030  8.276031e+06           1012.273903
2 2017-01-03  1020.001535  1020.226840  1019.794437  1020.040472   9089.658025  9.276735e+06           1020.040472
3 2017-01-04  1076.558840  1077.271167  1075.572542  1076.553639  21562.456972  2.347651e+07           1076.553639
4 2017-01-05  1043.608646  1044.905549  1042.094125  1043.547951  36018.861120  3.619081e+07           1043.547951
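
As a quick visual check on the 20-day moving average (an illustrative sketch added here, assuming matplotlib is available in the Colab environment), the daily close and its moving average can be plotted together:

import matplotlib.pyplot as plt
import pandas as pd

# Ensure 'date' is a datetime for a readable x-axis
dates = pd.to_datetime(daily_data['date'])

plt.figure(figsize=(12, 5))
plt.plot(dates, daily_data['close'], label='Daily close')
plt.plot(dates, daily_data['moving_average_close'], label='20-day moving average')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.title('BTC/USD daily close vs. 20-day moving average (2017-2019)')
plt.legend()
plt.show()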

"After preparing and saving the daily data, we shift our


focus to developing a predictive model. For this purpose, a
Linear Regression model is chosen due to its

https://colab.research.google.com/drive/1YzxhLK6bzcIR-j5btF8wVutr8BZsToAR#scrollTo=th-qpph1JIDH&printMode=true 9/12
1/13/24, 10:19 PM BitcoinAnalysis.ipynb - Colaboratory

effectiveness in capturing linear relationships between


variables. In this context, the model will be employed to
understand and predict the closing prices of Bitcoin
based on the daily data trends observed over the previous
years."
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

# Separate features and target
X = daily_data[['open', 'high', 'low', 'Volume BTC', 'Volume USD', 'moving_average_close']]
y = daily_data['close']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))

print(f'Mean Absolute Error: {mae}')
print(f'Root Mean Squared Error: {rmse}')

Mean Absolute Error: 0.33095838248461557


Root Mean Squared Error: 0.540670528986689
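
Beyond MAE and RMSE, the coefficient of determination gives a scale-free view of fit quality. A minimal sketch (an addition to the original notebook, reusing the y_test and predictions defined above):

from sklearn.metrics import r2_score

# R^2: proportion of variance in the closing price explained by the model
r2 = r2_score(y_test, predictions)
print(f'R^2 score: {r2}')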

Scaling the dataset is an important preprocessing step, especially in regression analysis where features have different scales and units. This can significantly impact the performance of many machine learning algorithms, including linear regression.
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

# Initialize the StandardScaler
scaler = StandardScaler()

# Scale the training data and apply the same transformation to the test data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize the Linear Regression model
linear_model = LinearRegression()

# Train the model using the scaled training data
linear_model.fit(X_train_scaled, y_train)

# Predict using the scaled test data
scaled_predictions = linear_model.predict(X_test_scaled)

# Calculate Mean Absolute Error and Root Mean Squared Error
scaled_mae = mean_absolute_error(y_test, scaled_predictions)
scaled_rmse = np.sqrt(mean_squared_error(y_test, scaled_predictions))

# Print the performance metrics
print(f'Scaled Mean Absolute Error: {scaled_mae}')
print(f'Scaled Root Mean Squared Error: {scaled_rmse}')

Scaled Mean Absolute Error: 0.33095838193443516


Scaled Root Mean Squared Error: 0.5406705297316674

In summary, both the unscaled and scaled Linear Regression models were implemented correctly. The results indicate that feature scaling did not significantly alter the model's predictive accuracy in this specific scenario, which is itself a useful insight into the nature of the dataset and the model's behavior with respect to feature scaling.
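
One way to make the scaling step harder to get wrong (for example, accidentally fitting the scaler on test data) is to bundle it with the regressor in a pipeline. This is a sketch of that design choice, not part of the original notebook:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Scaler and regressor are fitted together; the scaler only ever sees training data
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train, y_train)

pipeline_mae = mean_absolute_error(y_test, pipeline.predict(X_test))
print(f'Pipeline Mean Absolute Error: {pipeline_mae}')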

"This concludes the current


demonstration of my analytical skills.
