feat: add ml.metrics.pairwise.cosine_similarity function #374
Input data. X and Y are aligned by index, so they must have the same index.

Returns:
    DataFrame with columns of X, Y and cosine_similarity
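To make the "aligned by index" contract concrete, here is a minimal sketch in plain Python. It is not the bigframes implementation: plain dicts keyed by index stand in for the two DataFrames, and `paired_cosine_similarity` is a hypothetical name. Each row of X is compared only with the row of Y that has the same index.

```python
import math
from typing import Dict, List


def paired_cosine_similarity(
    x: Dict[int, List[float]], y: Dict[int, List[float]]
) -> Dict[int, float]:
    """Cosine similarity of each row of x with the row of y sharing its index."""
    if x.keys() != y.keys():
        # Mirrors the docstring's requirement that X and Y have the same index.
        raise ValueError("X and Y must have the same index")
    result: Dict[int, float] = {}
    for idx, a in x.items():
        b = y[idx]
        if len(a) != len(b):
            raise ValueError(f"row {idx}: vectors have different lengths")
        dot = sum(ai * bi for ai, bi in zip(a, b))
        norm = math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b))
        # An all-zero vector has no direction; report NaN instead of dividing by zero.
        result[idx] = dot / norm if norm else math.nan
    return result


# One value per shared index label, not a matrix.
print(paired_cosine_similarity({0: [1.0, 0.0], 1: [1.0, 1.0]}, {0: [0.0, 1.0], 1: [2.0, 2.0]}))
```

The output has one entry per index label, which matches a result frame holding X, Y and a single cosine_similarity column.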
Add the type hint as well: bigframes.dataframe.DataFrame: DataFrame with columns of X, Y and cosine_similarity
Done.
@@ -84,6 +62,59 @@ def _apply_sql(

return df


def distance(
Can we add docstring here too?
Done.
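For reference, a docstring for a `distance` helper could look like the hypothetical sketch below. The function name `build_distance_sql` and its parameters are illustrative assumptions, not the PR's actual signature. The only real API it relies on is BigQuery ML's `ML.DISTANCE(vector1, vector2, type)` with the distance types `EUCLIDEAN`, `MANHATTAN` and `COSINE`. It only builds a SQL string and does not run a query.

```python
# Distance types accepted by BigQuery ML's ML.DISTANCE.
_DISTANCE_TYPES = frozenset({"EUCLIDEAN", "MANHATTAN", "COSINE"})


def build_distance_sql(
    source: str, x_col: str, y_col: str, distance_type: str, name: str
) -> str:
    """Build a query that applies ML.DISTANCE to each row (hypothetical helper).

    Args:
        source: Table reference to read from, e.g. "dataset.table".
        x_col: Array column holding the first vector of each pair.
        y_col: Array column holding the second vector of each pair.
        distance_type: "EUCLIDEAN", "MANHATTAN" or "COSINE" (case-insensitive).
        name: Name of the output distance column.

    Returns:
        str: SQL selecting x_col, y_col and the per-row distance as `name`.
    """
    distance_type = distance_type.upper()
    if distance_type not in _DISTANCE_TYPES:
        raise ValueError(f"unsupported distance type: {distance_type!r}")
    # Identifiers are interpolated as-is; this sketch assumes they are trusted.
    return (
        f"SELECT `{x_col}`, `{y_col}`, "
        f"ML.DISTANCE(`{x_col}`, `{y_col}`, '{distance_type}') AS `{name}` "
        f"FROM {source}"
    )


print(build_distance_sql("my_dataset.pairs", "x", "y", "cosine", "cosine_distance"))
```

Passing the output column name in lets one helper serve several metric functions, each choosing its own distance type and column name.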
import bigframes.pandas as bpd


def test_cosine_similarity():
I assume ML.DISTANCE with COSINE in BQML is different from sklearn's sklearn.metrics.pairwise.cosine_similarity. Are these two comparable?
from sklearn.metrics.pairwise import cosine_similarity
X = [[4.1, 0.5, 1.0]]
Y = [[3.0, 0, 2.5]]
cosine_similarity(X, Y)
Totally different: sklearn takes matrices and returns matrices, which would be hard for us to support.
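The shape difference can be shown with a small plain-Python sketch. Both functions are simplified stand-ins written for illustration, not sklearn's code: they accept only dense lists and fail on an all-zero row. `pairwise_cosine` compares every row of X with every row of Y, the way sklearn's `cosine_similarity` returns an `(n_samples_X, n_samples_Y)` matrix. `paired_cosine` compares row i only with row i, which is the row-wise shape a per-row SQL function produces.

```python
import math
from typing import List

Vector = List[float]


def _cos(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (raises on a zero vector)."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


def pairwise_cosine(X: List[Vector], Y: List[Vector]) -> List[List[float]]:
    """Every row of X against every row of Y: a len(X) x len(Y) matrix."""
    return [[_cos(a, b) for b in Y] for a in X]


def paired_cosine(X: List[Vector], Y: List[Vector]) -> List[float]:
    """Row i of X against row i of Y only: len(X) values."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of rows")
    return [_cos(a, b) for a, b in zip(X, Y)]


X = [[4.1, 0.5, 1.0]]
Y = [[3.0, 0, 2.5]]
print(pairwise_cosine(X, Y))  # nested list: a 1 x 1 matrix
print(paired_cosine(X, Y))    # flat list: one value per row pair
```

With single-row inputs, like the example above, the two shapes hold the same number. With n rows each, the paired result equals only the diagonal of the n x n pairwise matrix.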