Open
Description
SynapseML version
1.0.10
System information
- Language version (e.g. python 3.8, scala 2.12): python 3.9
- Spark Version (e.g. 3.2.3): 3.5.1
- Spark Platform (e.g. Synapse, Databricks): AWS EMR Release 7.3.1
Describe the problem
I am trying to install SynapseML on EMR for PySpark. Running the following configuration in a Jupyter notebook works:
%%configure -f
{
"name": "synapseml",
"conf": {
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.9-spark3.5",
"spark.jars.repositories": "https://mmlspark.azureedge.net/maven"
}
}
In production, however, we do not use Jupyter notebooks. Instead, we downloaded the corresponding jars from the Maven repository and copied them to /usr/lib/spark/jars on EMR. That does not work and fails with: com.microsoft.azure.synapse.ml.isolationforest.IsolationForest does not exist in the JVM
Does anyone know the root cause of this? Thank you.
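For comparison, the notebook settings can also be passed to a non-notebook job as `--conf` arguments to `spark-submit`. The following is a minimal sketch, not a confirmed fix: `to_spark_submit_args` is a hypothetical helper, and `job.py` is a placeholder script name that does not appear in this issue.

```python
import json

# Same settings as the %%configure cell shown above.
NOTEBOOK_CONFIG = json.loads("""
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.9-spark3.5",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven"
  }
}
""")


def to_spark_submit_args(config):
    """Turn a %%configure-style dict into a spark-submit argument list.

    Hypothetical helper for illustration only.
    """
    args = ["spark-submit", "--name", config["name"]]
    # Every entry under "conf" becomes one "--conf key=value" pair.
    for key, value in sorted(config.get("conf", {}).items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "job.py" is a placeholder for the production PySpark script.
    print(" ".join(to_spark_submit_args(NOTEBOOK_CONFIG) + ["job.py"]))
```

With this approach Spark resolves the package and its transitive dependencies itself, which is what the notebook configuration does, rather than relying on a hand-copied set of jars.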
Code to reproduce issue
from synapse.ml.isolationforest import IsolationForest
# print(type(IsolationForest))
hyper_params = {
'n_estimators': 100,
'max_samples': 32,
'max_features': 1,
'bootstrap': False,
'contamination': 0.1,
}
isolation_forest_model = (
IsolationForest()
.setNumEstimators(hyper_params["n_estimators"])
.setBootstrap(hyper_params["bootstrap"])
.setMaxSamples(hyper_params["max_samples"])
.setMaxFeatures(hyper_params["max_features"])
.setFeaturesCol("features")
.setPredictionCol("predictedLabel")
.setScoreCol("outlierScore")
.setContamination(hyper_params["contamination"])
.setContaminationError(0.01 * hyper_params["contamination"])
)
Other info / logs
An error was encountered:
com.microsoft.azure.synapse.ml.isolationforest.IsolationForest does not exist in the JVM
Traceback (most recent call last):
File "/mnt1/yarn/usercache/livy/appcache/application_1742368398137_0002/container_1742368398137_0002_01_000001/pyspark.zip/pyspark/__init__.py", line 139, in wrapper
return func(self, **kwargs)
File "/mnt1/yarn/usercache/livy/appcache/application_1742368398137_0002/container_1742368398137_0002_01_000001/com.microsoft.azure_synapseml-core_2.12-1.0.9-spark3.5.jar/synapse/ml/isolationforest/IsolationForest.py", line 78, in __init__
self._java_obj = self._new_java_obj("com.microsoft.azure.synapse.ml.isolationforest.IsolationForest", self.uid)
File "/mnt1/yarn/usercache/livy/appcache/application_1742368398137_0002/container_1742368398137_0002_01_000001/pyspark.zip/pyspark/ml/wrapper.py", line 84, in _new_java_obj
java_obj = getattr(java_obj, name)
File "/mnt1/yarn/usercache/livy/appcache/application_1742368398137_0002/container_1742368398137_0002_01_000001/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1664, in __getattr__
raise Py4JError("{0} does not exist in the JVM".format(new_fqn))
py4j.protocol.Py4JError: com.microsoft.azure.synapse.ml.isolationforest.IsolationForest does not exist in the JVM
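The traceback shows the Python wrapper being loaded from the synapseml-core jar, while the JVM cannot find the matching Scala class. One way to narrow this down is to check which jars in a directory actually contain the class file. The sketch below uses only the standard library; `jars_containing_class` is a hypothetical helper, and the directory is the one named in this issue.

```python
import zipfile
from pathlib import Path


def jars_containing_class(jar_dir, class_name):
    """Return names of jars under jar_dir that contain the given class.

    Hypothetical diagnostic helper; class_name is a fully qualified
    JVM class name such as "com.example.Foo".
    """
    directory = Path(jar_dir)
    if not directory.is_dir():
        return []
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(directory.glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip truncated or corrupt downloads instead of failing.
            continue
    return hits


if __name__ == "__main__":
    print(jars_containing_class(
        "/usr/lib/spark/jars",
        "com.microsoft.azure.synapse.ml.isolationforest.IsolationForest",
    ))
```

An empty result would suggest that the jar holding the class was never copied, or was copied incompletely. A non-empty result would point instead at how the driver and executor classpaths are assembled.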
What component(s) does this bug affect?
- area/cognitive: Cognitive project
- area/core: Core project
- area/deep-learning: DeepLearning project
- area/lightgbm: Lightgbm project
- area/opencv: Opencv project
- area/vw: VW project
- area/website: Website
- area/build: Project build system
- area/notebooks: Samples under notebooks folder
- area/docker: Docker usage
- area/models: models related issue
What language(s) does this bug affect?
- language/scala: Scala source code
- language/python: Pyspark APIs
- language/r: R APIs
- language/csharp: .NET APIs
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/synapse: Azure Synapse integrations
- integrations/azureml: Azure ML integrations
- integrations/databricks: Databricks integrations