
Microsoft.DP-100.v2022-01-04.q114

Exam Code: DP-100


Exam Name: Designing and Implementing a Data Science Solution on Azure
Certification Provider: Microsoft
Free Question Number: 114
Version: v2022-01-04
https://www.freepdfdumps.com/Microsoft.DP-100.v2022-01-04.q114.html

NEW QUESTION: 1
You are developing deep learning models to analyze semi-structured, unstructured, and structured data
types.
You have the following data available for model building:
* Video recordings of sporting events
* Transcripts of radio commentary about events
* Logs from related social media feeds captured during sporting events
You need to select an environment for creating the model.
Which environment should you use?
A. Azure Cognitive Services
B. Azure Data Lake Analytics
C. Azure HDInsight with Spark MLlib
D. Azure Machine Learning Studio
Answer: A
Explanation
Azure Cognitive Services expand on Microsoft's evolving portfolio of machine learning APIs and enable
developers to easily add cognitive features - such as emotion and video detection; facial, speech, and vision
recognition; and speech and language understanding - into their applications. The goal of Azure Cognitive
Services is to help developers create applications that can see, hear, speak, understand, and even begin to
reason. The catalog of services within Azure Cognitive Services can be categorized into five main pillars -
Vision, Speech, Language, Search, and Knowledge.
References:
https://docs.microsoft.com/en-us/azure/cognitive-services/welcome

NEW QUESTION: 2
You use Azure Machine Learning to train and register a model.
You must deploy the model into production as a real-time web service to an inference cluster named service-
compute that the IT department has created in the Azure Machine Learning workspace.
Client applications consuming the deployed web service must be authenticated based on their Azure Active
Directory service principal.
You need to write a script that uses the Azure Machine Learning SDK to deploy the model. The necessary
modules have been imported.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: AksCompute
Example:
aks_target = AksCompute(ws,"myaks")
# If deploying to a cluster configured for dev/test, ensure that it was created with enough
# cores and memory to handle this deployment configuration. Note that memory is also used by
# things such as dependencies and AML components.
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(ws, "myservice", [model], inference_config, deployment_config, aks_target)
Box 2: AksWebservice
Box 3: token_auth_enabled=True
Whether or not token auth is enabled for the Webservice.
Note: A Service principal defined in Azure Active Directory (Azure AD) can act as a principal on which
authentication and authorization policies can be enforced in Azure Databricks.
The Azure Active Directory Authentication Library (ADAL) can be used to programmatically get an Azure AD
access token for a user.
Incorrect Answers:
auth_enabled (bool): Whether or not to enable key auth for this Webservice. Defaults to True.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-azure-kubernetes-service
https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/service-prin-aad-token
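Putting the three boxes together, the completed script would follow this pattern. This is a minimal sketch, assuming ws, model, and inference_config already exist; the service name is illustrative:

from azureml.core.model import Model
from azureml.core.compute import AksCompute
from azureml.core.webservice import AksWebservice

# Attach to the existing inference cluster created by the IT department
aks_target = AksCompute(ws, "service-compute")

# Enable Azure AD token auth; key auth must be disabled when token auth is on
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1,
                                                       token_auth_enabled=True,
                                                       auth_enabled=False)

service = Model.deploy(ws, "my-service", [model], inference_config,
                       deployment_config, aks_target)
service.wait_for_deployment(show_output=True)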

NEW QUESTION: 3
You need to identify the methods for dividing the data according to the testing requirements.
Which properties should you select? To answer, select the appropriate options in the answer area. NOTE:
Each correct selection is worth one point.

Answer:
NEW QUESTION: 4
You create a binary classification model using Azure Machine Learning Studio.
You must use a Receiver Operating Characteristic (ROC) curve and an F1 score to evaluate the model.
You need to create the required business metrics.
How should you complete the experiment? To answer, select the appropriate options in the dialog box in the
answer area.
NOTE: Each correct selection is worth one point.
Answer:
NEW QUESTION: 5
You need to define a modeling strategy for ad response.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.

Answer:
1 - Implement a K-Means Clustering model.
2 - Use the cluster as a feature in a Decision Jungle model.
3 - Use the raw score as a feature in a Score Matchbox Recommender model.

NEW QUESTION: 6
You are developing a hands-on workshop to introduce Docker for Windows to attendees.
You need to ensure that workshop attendees can install Docker on their devices.
Which two prerequisite components should attendees install on the devices? Each correct answer presents
part of the solution.
NOTE: Each correct selection is worth one point.
A. Microsoft Hardware-Assisted Virtualization Detection Tool
B. Kitematic
C. BIOS-enabled virtualization
D. VirtualBox
E. Windows 10 64-bit Professional
Answer: C, E
Explanation
C: Make sure your Windows system supports Hardware Virtualization Technology and that virtualization is
enabled.
Ensure that hardware virtualization support is turned on in the BIOS settings. For example:

E: To run Docker, your machine must have a 64-bit operating system running Windows 7 or higher.
References:
https://docs.docker.com/toolbox/toolbox_install_windows/
https://blogs.technet.microsoft.com/canitpro/2015/09/08/step-by-step-enabling-hyper-v-for-use-on-windows-10/
NEW QUESTION: 7
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You are using Azure Machine learning Studio to perform feature engineering on a dataset. You need to
normalize values to produce a feature column grouped into bins.
Solution: Apply an Entropy Minimum Description Length (MDL) binning mode.
Does the solution meet the goal?
A. Yes
B. No
Answer:

NEW QUESTION: 8
You are performing a classification task in Azure Machine learning Studio.
You must prepare balanced testing and training samples based on a provided data set.
You need to split the data with a 0.75:0.25 ratio.
Which value should you use for each parameter? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.

Answer:
NEW QUESTION: 9
You have a dataset that includes home sales data for a city. The dataset includes the following columns.

Each row in the dataset corresponds to an individual home sales transaction.


You need to use automated machine learning to generate the best model for predicting the sales price based
on the features of the house.
Which values should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:

Explanation:
Box 1: Regression
Regression is a supervised machine learning technique used to predict numeric values.
Box 2: Price
Reference:
https://docs.microsoft.com/en-us/learn/modules/create-regression-model-azure-machine-learning-designer

NEW QUESTION: 10
You have a model with a large difference between the training and validation error values.
You must create a new model and perform cross-validation.
You need to identify a parameter set for the new model using Azure Machine Learning Studio.
Which module you should use for each step? To answer, drag the appropriate modules to the correct steps.
Each module may be used once or more than once, or not at all. You may need to drag the split bar between
panes or scroll to view content.
NOTE: Each correct selection is worth one point.

Answer:

References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample

NEW QUESTION: 11
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You create an Azure Machine Learning service datastore in a workspace. The datastore contains the
following files:
* /data/2018/Q1.csv
* /data/2018/Q2.csv
* /data/2018/Q3.csv
* /data/2018/Q4.csv
* /data/2019/Q1.csv
All files store data in the following format:
id,f1,f2,l
1,1,2,0
2,1,1,1
3,2,1,0
You run the following code:
You need to create a dataset named training_data and load the data from all files into a single data frame by
using the following code:

Solution: Run the following code:

Does the solution meet the goal?


A. No
B. Yes
Answer:

NEW QUESTION: 12
You are hired as a data scientist at a winery. The previous data scientist used Azure Machine Learning.
You need to review the models and explain how each model makes decisions.
Which explainer modules should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:
Explanation:
Meta explainers automatically select a suitable direct explainer and generate the best explanation info based
on the given model and data sets. The meta explainers leverage all the libraries (SHAP, LIME, Mimic, etc.)
that we have integrated or developed. The following are the meta explainers available in the SDK:
Tabular Explainer: Used with tabular datasets.
Text Explainer: Used with text datasets.
Image Explainer: Used with image datasets.
Box 1: Tabular
Box 2: Text
Box 3: Image
Reference:
https://medium.com/microsoftazure/automated-and-interpretable-machine-learning-d07975741298
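As a hedged illustration of the tabular case, a sketch using the azureml-interpret (interpret-community) package; model, x_train, x_test, feature_names, and class_names are placeholders:

from interpret.ext.blackbox import TabularExplainer

# Wrap a fitted model together with the training data used to initialize the explainer
explainer = TabularExplainer(model,
                             x_train,
                             features=feature_names,
                             classes=class_names)

# Aggregate global feature importances over the evaluation examples
global_explanation = explainer.explain_global(x_test)
print(global_explanation.get_feature_importance_dict())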

NEW QUESTION: 13
You are working on a classification task. You have a dataset indicating whether a student would like to play
soccer and associated attributes. The dataset includes the following columns:

You need to classify variables by type.


Which variable should you add to each category? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.

Answer:

References:
https://www.edureka.co/blog/classification-algorithms/

NEW QUESTION: 14
You are building a binary classification model by using a supplied training set.
The training set is imbalanced between two classes.
You need to resolve the data imbalance.
What are three possible ways to achieve this goal? Each correct answer presents a complete solution NOTE:
Each correct selection is worth one point.
A. Penalize the classification
B. Resample the data set using under sampling or oversampling
C. Generate synthetic samples in the minority class.
D. Use accuracy as the evaluation metric of the model.
E. Normalize the training feature set.
Answer: A, B, C
References:
https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
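A minimal sketch of options B and C using the imbalanced-learn library (an assumption; Azure Machine Learning Studio's SMOTE module achieves the same effect); the dataset here is synthetic:

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Build an imbalanced two-class dataset (roughly 90% / 10%)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y))

# Oversample the minority class with synthetic examples
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))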

NEW QUESTION: 15
You plan to build a team data science environment. Data for training models in machine learning pipelines will
be over 20 GB in size.
You have the following requirements:
* Models must be built using Caffe2 or Chainer frameworks.
* Data scientists must be able to use a data science environment to build the machine learning pipelines
and train models on their personal devices in both connected and disconnected network environments.
* Personal devices must support updating machine learning pipelines when connected to a network.
You need to select a data science environment.
Which environment should you use?
A. Azure Machine Learning Service
B. Azure Machine Learning Studio
C. Azure Databricks
D. Azure Kubernetes Service (AKS)
Answer: A
Explanation:
The Data Science Virtual Machine (DSVM) is a customized VM image on Microsoft's Azure cloud built
specifically for doing data science. Caffe2 and Chainer are supported by DSVM.
DSVM integrates with Azure Machine Learning.
Incorrect Answers:
B: Use Machine Learning Studio when you want to experiment with machine learning models quickly and
easily, and the built-in machine learning algorithms are sufficient for your solutions.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview

NEW QUESTION: 16
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You train a classification model by using a logistic regression algorithm.
You must be able to explain the model's predictions by calculating the importance of each feature, both as an
overall global relative importance value and as a measure of local importance for a specific set of predictions.
You need to create an explainer that you can use to retrieve the required global and local feature importance
values.
Solution: Create a TabularExplainer.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
Instead, use the Permutation Feature Importance Explainer (PFI).
Note: Permutation Feature Importance is a technique
used to explain classification and regression models. At a high level, the way it works is by randomly shuffling
data one feature at a time for the entire dataset and calculating how much the performance metric of interest
changes. The larger the change, the more important that feature is. PFI can explain the overall behavior of
any underlying model but does not explain individual predictions.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability
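A minimal PFI sketch with the azureml-interpret (interpret-community) package, assuming a fitted model and held-out test data; note that PFI produces global importances only:

from interpret.ext.blackbox import PFIExplainer

pfi_explainer = PFIExplainer(model, features=feature_names)

# PFI needs the true labels to measure how the metric changes after shuffling each feature
global_explanation = pfi_explainer.explain_global(x_test, true_labels=y_test)
print(global_explanation.get_feature_importance_dict())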

NEW QUESTION: 17
You create a multi-class image classification deep learning experiment by using the PyTorch framework. You
plan to run the experiment on an Azure Compute cluster that has nodes with GPUs.
You need to define an Azure Machine Learning service pipeline to perform the monthly retraining of the image
classification model. The pipeline must run with minimal cost and minimize the time required to train the
model.
Which three pipeline steps should you run in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:
Step 1: Configure a DataTransferStep() to fetch new image data...
Step 2: Configure a PythonScriptStep() to run image_resize.py on the cpu-compute compute target.
Step 3: Configure the EstimatorStep() to run the training script on the gpu_compute compute target.
The PyTorch estimator provides a simple way of launching a PyTorch training job on a compute target.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch

NEW QUESTION: 18
You need to obtain the output from the pipeline execution. Where will you find the output?
A. the digit_identification.py script
B. the Inference Clusters tab in Machine Learning studio
C. the debug log
D. the Activity Log in the Azure portal for the Machine Learning workspace
E. a file named parallel_run_step.txt located in the output folder
Answer: E

NEW QUESTION: 19
You have a dataset that contains 2,000 rows. You are building a machine learning classification model by
using Azure Machine Learning Studio. You add a Partition and Sample module to the experiment.
You need to configure the module. You must meet the following requirements:
Divide the data into subsets
Assign the rows into folds using a round-robin method
Allow rows in the dataset to be reused
How should you configure the module? To answer, select the appropriate options in the dialog box in the
answer area.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Use the Split data into partitions option when you want to divide the dataset into subsets of the data. This
option is also useful when you want to create a custom number of folds for cross-validation, or to split rows
into several groups.
Add the Partition and Sample module to your experiment in Studio (classic), and connect the dataset.
For Partition or sample mode, select Assign to Folds.
Use replacement in the partitioning: Select this option if you want the sampled row to be put back into the pool
of rows for potential reuse. As a result, the same row might be assigned to several folds.
If you do not use replacement (the default option), the sampled row is not put back into the pool of rows for
potential reuse. As a result, each row can be assigned to only one fold.
Randomized split: Select this option if you want rows to be randomly assigned to folds.
If you do not select this option, rows are assigned to folds using the round-robin method.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample

NEW QUESTION: 20
You have a model with a large difference between the training and validation error values.
You must create a new model and perform cross-validation.
You need to identify a parameter set for the new model using Azure Machine Learning Studio.
Which module you should use for each step? To answer, drag the appropriate modules to the correct steps.
Each module may be used once or more than once, or not at all. You may need to drag the split bar between
panes or scroll to view content.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: Split data
Box 2: Partition and Sample
Box 3: Two-Class Boosted Decision Tree
Box 4: Tune Model Hyperparameters
Integrated train and tune: You configure a set of parameters to use, and then let the module iterate over
multiple combinations, measuring accuracy until it finds a "best" model. With most learner modules, you can
choose which parameters should be changed during the training process, and which should remain fixed.
We recommend that you use Cross-Validate Model to establish the goodness of the model given the specified
parameters. Use Tune Model Hyperparameters to identify the optimal parameters.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample
NEW QUESTION: 21
You create a script for training a machine learning model in Azure Machine Learning service.
You create an estimator by running the following code:

For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: Yes
Parameter source_directory is a local directory containing experiment configuration and code files needed for
a training job.
Box 2: Yes
script_params is a dictionary of command-line arguments to pass to the training script specified in
entry_script.
Box 3: No
Box 4: Yes
The conda_packages parameter is a list of strings representing conda packages to be added to the Python
environment for the experiment.
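Since the exhibit is not reproduced here, the following is a hedged reconstruction of an estimator using those parameters; all values are placeholders:

from azureml.train.estimator import Estimator

estimator = Estimator(source_directory='./scripts',        # experiment config and code files
                      script_params={'--reg-rate': 0.01},  # command-line args for entry_script
                      compute_target=compute_target,
                      entry_script='train.py',
                      conda_packages=['scikit-learn'])     # conda packages added to the environment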

NEW QUESTION: 22
You need to select a feature extraction method.
Which method should you use?
A. Spearman correlation
B. Mutual information
C. Mann-Whitney test
D. Pearson's correlation
Answer: A
Spearman's rank correlation coefficient assesses how well the relationship between two variables can be
described using a monotonic function.
Note: Both Spearman's and Kendall's can be formulated as special cases of a more general correlation
coefficient, and they are both appropriate in this scenario.
Scenario: The MedianValue and AvgRoomsInHouse columns both hold data in numeric format. You need to
select a feature selection algorithm to analyze the relationship between the two columns in more detail.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/feature-selection-modules

NEW QUESTION: 23
You need to define a process for penalty event detection.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Answer:

NEW QUESTION: 24
You are using a decision tree algorithm. You have trained a model that generalizes well at a tree depth equal
to 10.
You need to select the bias and variance properties of the model with varying tree depth values.
Which properties should you select for each tree depth? To answer, select the appropriate options in the
answer area.

Answer:

Explanation:
In decision trees, the depth of the tree determines the variance. A complicated decision tree (e.g. deep) has
low bias and high variance.
Note: In statistics and machine learning, the bias-variance tradeoff is the property of a set of predictive
models whereby models with a lower bias in parameter estimation have a higher variance of the parameter
estimates across samples, and vice versa. Increasing the bias will decrease the variance. Increasing the
variance will decrease the bias.
References:
https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/

NEW QUESTION: 25
You need to implement a new cost factor scenario for the ad response models as illustrated in the
performance curve exhibit.
Which technique should you use?
A. Set the threshold to 0.5 and retrain if weighted Kappa deviates +/- 5% from 0.45.
B. Set the threshold to 0.05 and retrain if weighted Kappa deviates +/- 5% from 0.5.
C. Set the threshold to 0.2 and retrain if weighted Kappa deviates +/- 5% from 0.6.
D. Set the threshold to 0.75 and retrain if weighted Kappa deviates +/- 5% from 0.15.
Answer: A
Explanation:
Scenario:
Performance curves of current and proposed cost factor scenarios are shown in the following diagram:

The ad propensity model uses a cut threshold of 0.45, and retraining occurs if weighted Kappa deviates from
0.1 +/- 5%.

NEW QUESTION: 26
You are evaluating a Python NumPy array that contains six data points defined as follows:
data = [10, 20, 30, 40, 50, 60]
You must generate the following output by using the k-fold algorithm implementation in the Python Scikit-learn
machine learning library:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
You need to implement a cross-validation to generate the output.
How should you complete the code segment? To answer, select the appropriate code segment in the dialog
box in the answer area.
NOTE: Each correct selection is worth one point.
Answer:

Explanation
Box 1: k-fold
Box 2: 3
K-Folds cross-validator provides train/test indices to split data in train/test sets. Split dataset into k
consecutive folds (without shuffling by default).
The parameter n_splits (int, default=3) is the number of folds. Must be at least 2.
Box 3: data
Example:
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
References:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
NEW QUESTION: 27
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You create an Azure Machine Learning service datastore in a workspace. The datastore contains the
following files:
* /data/2018/Q1.csv
* /data/2018/Q2.csv
* /data/2018/Q3.csv
* /data/2018/Q4.csv
* /data/2019/Q1.csv
All files store data in the following format:
id,f1,f2,l
1,1,2,0
2,1,1,1
3,2,1,0
You run the following code:

You need to create a dataset named training_data and load the data from all files into a single data frame by
using the following code:

Solution: Run the following code:

Does the solution meet the goal?


A. Yes
B. No
Answer: A
Explanation
Use two file paths.
Use Dataset.Tabular.from_delimited_files, as the data isn't cleansed.
Note:
A TabularDataset represents data in a tabular format by parsing the provided file or list of files. This provides
you with the ability to materialize the data into a pandas or Spark DataFrame so you can work with familiar
data preparation and training libraries without having to leave your notebook. You can create a
TabularDataset object from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
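A minimal sketch of that approach, assuming datastore references the registered datastore; the wildcard paths mirror the folder layout in the question:

from azureml.core import Dataset

# Two file paths, one per year folder, covering all five CSV files
paths = [(datastore, 'data/2018/*.csv'),
         (datastore, 'data/2019/*.csv')]

training_data = Dataset.Tabular.from_delimited_files(path=paths)
df = training_data.to_pandas_dataframe()   # all five files in a single DataFrame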

NEW QUESTION: 28
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Principal Components Analysis (PCA) sampling mode.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
Explanation
Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for machine
learning.
SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.

NEW QUESTION: 29
You are using the Azure Machine Learning Service to automate hyperparameter exploration of your neural
network classification model.
You must define the hyperparameter space to automatically tune hyperparameters using random sampling
according to the following requirements:
* Learning rate must be selected from a normal distribution with a mean value of 10 and a standard deviation
of 3.
* Batch size must be 16, 32, or 64.
* Keep probability must be a value selected from a uniform distribution between the range of 0.05 and 0.1.
You need to use the param_sampling method of the Python API for the Azure Machine Learning Service.
How should you complete the code segment? To answer, select the appropriate Options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
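A minimal sketch of the three requirements expressed with the azureml.train.hyperdrive parameter expressions:

from azureml.train.hyperdrive import RandomParameterSampling, normal, choice, uniform

param_sampling = RandomParameterSampling({
    "learning_rate": normal(10, 3),          # normal distribution: mean 10, std dev 3
    "batch_size": choice(16, 32, 64),
    "keep_probability": uniform(0.05, 0.1)   # uniform between 0.05 and 0.1
})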

NEW QUESTION: 30
You have a feature set containing the following numerical features: X, Y, and Z.
The Pearson correlation coefficient (r-value) of X, Y, and Z features is shown in the following image:

Use the drop-down menus to select the answer choice that answers each question based on the information
presented in the graphic.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: 0.859122
Box 2: a positively linear relationship
+1 indicates a strong positive linear relationship
-1 indicates a strong negative linear correlation
0 denotes no linear relationship between the two variables.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-linear-correlation

NEW QUESTION: 31
You are analyzing the asymmetry in a statistical distribution.
The following image contains two density curves that show the probability distribution of two datasets.
Use the drop-down menus to select the answer choice that answers each question based on the information
presented in the graphic.
NOTE: Each correct selection is worth one point.

Answer:

Explanation
Box 1: Positive skew
Positive skew values means the distribution is skewed to the right.
Box 2: Negative skew
Negative skewness values mean the distribution is skewed to the left.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-elementary-statistics


NEW QUESTION: 32
You need to build a feature extraction strategy for the local models.
How should you complete the code segment? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:

Explanation
NEW QUESTION: 33
You create an Azure Machine Learning workspace.
You must create a custom role named DataScientist that meets the following requirements:
* Role members must not be able to delete the workspace.
* Role members must not be able to create, update, or delete compute resource in the workspace.
* Role members must not be able to add new users to the workspace.
You need to create a JSON file for the DataScientist role in the Azure Machine Learning workspace.
The custom role must enforce the restrictions specified by the IT Operations team.
Which JSON code segment should you use?
A)

B)

C)
D)

A. Option A
B. Option B
C. Option C
D. Option D
Answer: A
Explanation
The following custom role can do everything in the workspace except for the following actions:
* It can't create or update a compute resource.
* It can't delete a compute resource.
* It can't add, delete, or alter role assignments.
* It can't delete the workspace.
To create a custom role, first construct a role definition JSON file that specifies the permission and scope for
the role. The following example defines a custom role named "Data Scientist Custom" scoped at a specific
workspace level:
data_scientist_custom_role.json :
{
  "Name": "Data Scientist Custom",
  "IsCustom": true,
  "Description": "Can run experiment but can't create or delete compute.",
  "Actions": ["*"],
  "NotActions": [
    "Microsoft.MachineLearningServices/workspaces/*/delete",
    "Microsoft.MachineLearningServices/workspaces/write",
    "Microsoft.MachineLearningServices/workspaces/computes/*/write",
    "Microsoft.MachineLearningServices/workspaces/computes/*/delete",
    "Microsoft.Authorization/*/write"
  ],
  "AssignableScopes": [
    "/subscriptions/<subscription_id>/resourceGroups/<resource_group_name>/providers/Microsoft.MachineLearningServices/workspaces/<workspace_name>"
  ]
}
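Assuming the Azure CLI is used, a definition file like this can then be registered with: az role definition create --role-definition data_scientist_custom_role.json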
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-assign-roles

NEW QUESTION: 34
A coworker registers a datastore in a Machine Learning services workspace by using the following code:

You need to write code to access the datastore from a notebook.

Answer:

Explanation:
Box 1: DataStore
To get a specific datastore registered in the current workspace, use the get() static method on the Datastore
class:
# Get a named datastore from the current workspace
datastore = Datastore.get(ws, datastore_name='your datastore name')
Box 2: ws
Box 3: demo_datastore
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data
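Putting the three boxes together, a minimal sketch of the notebook code (demo_datastore is the name from the answer above; the workspace is assumed to be loaded from its config file):

from azureml.core import Workspace, Datastore

ws = Workspace.from_config()
datastore = Datastore.get(ws, datastore_name='demo_datastore')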

NEW QUESTION: 35
You are using a decision tree algorithm. You have trained a model that generalizes well at a tree depth equal
to 10.
You need to select the bias and variance properties of the model with varying tree depth values.
Which properties should you select for each tree depth? To answer, select the appropriate options in the
answer area.

Answer:

Explanation

In decision trees, the depth of the tree determines the variance. A complicated decision tree (e.g. deep) has
low bias and high variance.
Note: In statistics and machine learning, the bias-variance tradeoff is the property of a set of predictive
models whereby models with a lower bias in parameter estimation have a higher variance of the parameter
estimates across samples, and vice versa. Increasing the bias will decrease the variance. Increasing the
variance will decrease the bias.
References:
https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/

NEW QUESTION: 36
You use Data Science Virtual Machines (DSVMs) for Windows and Linux in Azure.
You need to access the DSVMs.
Which utilities should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:
NEW QUESTION: 37
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You have a Python script named train.py in a local folder named scripts. The script trains a regression model
by using scikit-learn. The script includes code to load a training data file which is also located in the scripts
folder.
You must run the script as an Azure ML experiment on a compute cluster named aml-compute.
You need to configure the run to ensure that the environment includes the required packages for model
training. You have instantiated a variable named aml-compute that references the target compute cluster.
Solution: Run the following code:

Does the solution meet the goal?


A. Yes
B. No
Answer: A
The scikit-learn estimator provides a simple way of launching a scikit-learn training job on a compute target. It
is implemented through the SKLearn class, which can be used to support single-node CPU training.
Example:
from azureml.train.sklearn import SKLearn

estimator = SKLearn(source_directory=project_folder,
                    compute_target=compute_target,
                    entry_script='train_iris.py')
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn

NEW QUESTION: 38
You are performing feature scaling by using the scikit-learn Python library for x1, x2, and x3 features.
Original and scaled data is shown in the following image.

Use the drop-down menus to select the answer choice that answers each question based on the information
presented in the graphic.
NOTE: Each correct selection is worth one point.

Answer:
Explanation

Box 1: StandardScaler
The StandardScaler assumes your data is normally distributed within each feature and will scale them such
that the distribution is now centred around 0, with a standard deviation of 1.
Example:
All features are now on the same scale relative to one another.
Box 2: Min Max Scaler
Notice that the skewness of the distribution is maintained but the 3 distributions are brought into the same
scale so that they overlap.
Box 3: Normalizer
References:
http://benalexkeen.com/feature-scaling-with-scikit-learn/
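A minimal sketch of the three scalers named above, side by side; X is a placeholder feature matrix:

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer

X = np.array([[1.0, 200.0, 0.5],
              [2.0, 300.0, 0.7],
              [3.0, 400.0, 0.9]])

X_std = StandardScaler().fit_transform(X)    # each column: zero mean, unit variance
X_minmax = MinMaxScaler().fit_transform(X)   # each column rescaled to [0, 1]
X_norm = Normalizer().fit_transform(X)       # each row scaled to unit norm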

NEW QUESTION: 39
The finance team asks you to train a model using data in an Azure Storage blob container named finance-
data.
You need to register the container as a datastore in an Azure Machine Learning workspace and ensure that
an error will be raised if the container does not exist.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: register_azure_blob_container
Register an Azure Blob Container to the datastore.
Box 2: create_if_not_exists = False
Create the blob container if it does not exist; defaults to False.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore
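A hedged sketch of the completed code; the datastore name and account details are placeholders:

from azureml.core import Datastore

datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='finance_datastore',       # name is an assumption
    container_name='finance-data',
    account_name='<storage-account-name>',
    account_key='<storage-account-key>',
    create_if_not_exists=False)               # raise an error if the container does not exist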

NEW QUESTION: 40
You need to define a process for penalty event detection.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.

Answer:

1 - Import the global model and build 1 the local model using PyTorch.
2 - Build the global model using PyTorch.
3 - Build the global model using TensorFlow
4 - Import the global model and build the local model using TensorFlow.
NEW QUESTION: 41
You create an experiment in Azure Machine Learning Studio. You add a training dataset that contains 10,000
rows. The first 9,000 rows represent class 0 (90 percent). The remaining 1,000 rows represent class 1 (10 percent).
The training set is unbalanced between two classes. You must increase the number of training examples for
class 1 to 4,000 by using data rows. You add the Synthetic Minority Oversampling Technique (SMOTE)
module to the experiment.
You need to configure the module.
Which values should you use? To answer, select the appropriate options in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

NEW QUESTION: 42
You are tuning a hyperparameter for an algorithm. The following table shows a data set with different
hyperparameter values, training errors, and validation errors.
Use the drop-down menus to select the answer choice that answers each question based on the information
presented in the graphic.

Answer:
Explanation:
Box 1: 4
Choose the one which has lower training and validation error and also the closest match.
Minimize variance (difference between validation error and train error).
Box 2: 5
Minimize variance (difference between validation error and train error).
Reference:
https://medium.com/comet-ml/organizing-machine-learning-projects-project-management-guidelines-2d2b85651bbd

NEW QUESTION: 43
You plan to use the Hyperdrive feature of Azure Machine Learning to determine the optimal hyperparameter
values when training a model.
You must use Hyperdrive to try combinations of the following hyperparameter values. You must not apply an
early termination policy.
* learning_rate: any value between 0.001 and 0.1
* batch_size: 16, 32, or 64
You need to configure the sampling method for the Hyperdrive experiment.
Which two sampling methods can you use? Each correct answer is a complete solution.
NOTE: Each correct selection is worth one point.
A. No sampling
B. Grid sampling
C. Bayesian sampling
D. Random sampling
Answer: C, D
Explanation:
C: Bayesian sampling is based on the Bayesian optimization algorithm and makes intelligent choices on the
hyperparameter values to sample next. It picks the sample based on how the previous samples performed,
such that the new sample improves the reported primary metric.
Bayesian sampling does not support any early termination policy
Example:
from azureml.train.hyperdrive import BayesianParameterSampling
from azureml.train.hyperdrive import uniform, choice
param_sampling = BayesianParameterSampling( {
"learning_rate": uniform(0.05, 0.1),
"batch_size": choice(16, 32, 64, 128)
}
)
D: In random sampling, hyperparameter values are randomly selected from the defined search space.
Random sampling allows the search space to include both discrete and continuous hyperparameters.
Incorrect Answers:
B: Grid sampling can be used if your hyperparameter space can be defined as a choice among discrete
values and if you have sufficient budget to exhaustively search over all values in the defined search space.
Additionally, one can use automated early termination of poorly performing runs, which reduces wastage of
resources.
For example, the following space has a total of six samples:
from azureml.train.hyperdrive import GridParameterSampling
from azureml.train.hyperdrive import choice
param_sampling = GridParameterSampling( {
"num_hidden_layers": choice(1, 2, 3),
"batch_size": choice(16, 32)
}
)
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters

NEW QUESTION: 44
You use the Azure Machine Learning service to create a tabular dataset named training_data. You plan to use
this dataset in a training script.
You create a variable that references the dataset using the following code:
training_ds = workspace.datasets.get("training_data")
You define an estimator to run the script.
You need to set the correct property of the estimator to ensure that your script can access the training_data
dataset. Which property should you set?
A)

B)

C)

D)

A. Option A
B. Option B
C. Option C
D. Option D
Answer:
Example:
# Get the training dataset
diabetes_ds = ws.datasets.get("Diabetes Dataset")
# Create an estimator that uses the remote compute
hyper_estimator = SKLearn(source_directory=experiment_folder,
                          inputs=[diabetes_ds.as_named_input('diabetes')],  # Pass the dataset as an input
                          compute_target=cpu_cluster,
                          conda_packages=['pandas','ipykernel','matplotlib'],
                          pip_packages=['azureml-sdk','argparse','pyarrow'],
                          entry_script='diabetes_training.py')
Reference:
https://notebooks.azure.com/GraemeMalcolm/projects/azureml-primers/html/04%20-%20Optimizing%20Model%20Training.ipynb

NEW QUESTION: 45
You are building an intelligent solution using machine learning models.
The environment must support the following requirements:
* Data scientists must build notebooks in a cloud environment
* Data scientists must use automatic feature engineering and model building in machine learning pipelines.
* Notebooks must be deployed to retrain using Spark instances with dynamic worker allocation.
* Notebooks must be exportable to be version controlled locally.
You need to create the environment.
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list of
actions to the answer area and arrange them in the correct order.
Answer:

Explanation
Step 1: Create an Azure HDInsight cluster to include the Apache Spark MLlib library.
Step 2: Install Microsoft Machine Learning for Apache Spark.
You install AzureML on your Azure HDInsight cluster.
Microsoft Machine Learning for Apache Spark (MMLSpark) provides a number of deep learning and data
science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with
Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly-scalable
predictive and analytical models for large image and text datasets.
Step 3: Create and execute the Zeppelin notebooks on the cluster
Step 4: When the cluster is ready, export Zeppelin notebooks to a local environment.
Notebooks must be exportable to be version controlled locally.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-zeppelin-notebook
https://azuremlbuild.blob.core.windows.net/pysparkapi/intro.html

NEW QUESTION: 46
You need to define an evaluation strategy for the crowd sentiment models.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:
Step 1: Define a cross-entropy function activation
When using a neural network to perform classification and prediction, it is usually better to use cross-entropy
error than classification error, and somewhat better to use cross-entropy error than mean squared error to
evaluate the quality of the neural network.
Step 2: Add cost functions for each target state.
Step 3: Evaluated the distance error metric.
References:
https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/

Valid DP-100 Dumps shared by Actual4test.com for Helping Passing DP-100 Exam! Actual4test.com now
offer the newest DP-100 exam dumps, the Actual4test.com DP-100 exam questions have been
updated and answers have been corrected get the newest Actual4test.com DP-100 dumps with Test
Engine here: https://www.actual4test.com/exam/DP-100-questions (266 Q&As Dumps, 30%OFF Special
Discount: Freepdfdumps)

NEW QUESTION: 47
You plan to build a team data science environment. Data for training models in machine learning pipelines will
be over 20 GB in size.
You have the following requirements:
* Models must be built using Caffe2 or Chainer frameworks.
* Data scientists must be able to use a data science environment to build the machine learning pipelines and
train models on their personal devices in both connected and disconnected network environments.
* Personal devices must support updating machine learning pipelines when connected to a network.
You need to select a data science environment.
Which environment should you use?
A. Azure Machine Learning Service
B. Azure Machine Learning Studio
C. Azure Databricks
D. Azure Kubernetes Service (AKS)
Answer: A
The Data Science Virtual Machine (DSVM) is a customized VM image on Microsoft's Azure cloud built
specifically for doing data science. Caffe2 and Chainer are supported by DSVM.
DSVM integrates with Azure Machine Learning.
Incorrect Answers:
B: Use Machine Learning Studio when you want to experiment with machine learning models quickly and
easily, and the built-in machine learning algorithms are sufficient for your solutions.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview

NEW QUESTION: 48
You need to select a prebuilt development environment for a series of data science experiments. You must
use the R language for the experiments.
Which three environments can you use? Each correct answer presents a complete solution. NOTE: Each
correct selection is worth one point.
A. Data Science Virtual Machine (DSVM)
B. Azure Machine Learning Studio
C. ML.NET Library on a local environment
D. Azure Databricks
E. Azure Cognitive Services
Answer: B, C, D

NEW QUESTION: 49
You are performing sentiment analysis using a CSV file that includes 12,000 customer reviews written in a
short sentence format. You add the CSV file to Azure Machine Learning Studio and configure it as the starting
point dataset of an experiment. You add the Extract N-Gram Features from Text module to the experiment to
extract key phrases from the customer review column in the dataset.
You must create a new n-gram dictionary from the customer review text and set the maximum n-gram size to
trigrams.
What should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation
Vocabulary mode: Create
For Vocabulary mode, select Create to indicate that you are creating a new list of n-gram features.
N-Grams size: 3
For N-Grams size, type a number that indicates the maximum size of the n-grams to extract and store. For
example, if you type 3, unigrams, bigrams, and trigrams will be created.
Weighting function: Leave blank
The option, Weighting function, is required only if you merge or update vocabularies. It specifies how terms in
the two vocabularies and their scores should be weighted against each other.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/extract-n-gram-features-from-text

NEW QUESTION: 50
You need to implement a new cost factor scenario for the ad response models as illustrated in the
performance curve exhibit.
Which technique should you use?
A. Set the threshold to 0.5 and retrain if weighted Kappa deviates +/- 5% from 0.45.
B. Set the threshold to 0.05 and retrain if weighted Kappa deviates +/- 5% from 0.5.
C. Set the threshold to 0.2 and retrain if weighted Kappa deviates +/- 5% from 0.6.
D. Set the threshold to 0.75 and retrain if weighted Kappa deviates +/- 5% from 0.15.
Answer: A
Explanation
Scenario:
Performance curves of current and proposed cost factor scenarios are shown in the following diagram:

The ad propensity model uses a cut threshold of 0.45, and retraining occurs if weighted Kappa deviates from 0.1
+/- 5%.

NEW QUESTION: 51
You are analyzing the asymmetry in a statistical distribution.
The following image contains two density curves that show the probability distribution of two datasets.
Use the drop-down menus to select the answer choice that answers each question based on the information
presented in the graphic.
NOTE: Each correct selection is worth one point.

Answer:

Explanation
Box 1: Positive skew
Positive skew values means the distribution is skewed to the right.
Box 2: Negative skew
Negative skewness values mean the distribution is skewed to the left.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-elementary-statistics

NEW QUESTION: 52
You need to obtain the output from the pipeline execution. Where will you find the output?
A. the Activity Log in the Azure portal for the Machine Learning workspace
B. a file named parallel_run_step.txt located in the output folder
C. the digit_identification.py script
D. the Inference Clusters tab in Machine Learning studio
E. the debug log
Answer: B
output_action (str): How the output is to be organized. Currently supported values are 'append_row' and
'summary_only'.
'append_row' - All values output by run() method invocations will be aggregated into one unique file named
parallel_run_step.txt that is created in the output location.
'summary_only'
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-steps/azureml.contrib.pipeline.steps.parallelrunconfig
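A minimal sketch of a ParallelRunConfig that produces parallel_run_step.txt; the script, environment, and cluster names are placeholders:

from azureml.pipeline.steps import ParallelRunConfig

parallel_run_config = ParallelRunConfig(
    source_directory='scripts',
    entry_script='digit_identification.py',
    mini_batch_size='5',
    error_threshold=10,
    output_action='append_row',   # aggregate all run() outputs into parallel_run_step.txt
    environment=batch_env,
    compute_target=compute_target,
    node_count=2)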
Topic 1, Overview
Current environment
Requirements
* Media used for penalty event detection will be provided by consumer devices. Media may include images
and videos captured during the sporting event and shared using social media. The images and videos will
have varying sizes and formats.
* The data available for model building comprises seven years of sporting event media. The sporting event
media includes: recorded videos, transcripts of radio commentary, and logs from related social media feeds
captured during the sporting events.
* Crowd sentiment will include audio recordings submitted by event attendees in both mono and stereo
formats.
Advertisements
* Ad response models must be trained at the beginning of each event and applied during the sporting event.
* Market segmentation models must optimize for similar ad response history.
* Sampling must guarantee mutual and collective exclusivity between local and global segmentation models that share
the same features.
* Local market segmentation models will be applied before determining a user's propensity to respond to an
advertisement.
* Data scientists must be able to detect model degradation and decay.
* Ad response models must support non-linear boundary features.
* The ad propensity model uses a cut threshold of 0.45, and retraining occurs if weighted Kappa deviates from 0.1
+/-5%.
* The ad propensity model uses cost factors shown in the following diagram:

The ad propensity model uses proposed cost factors shown in the following diagram:

Performance curves of current and proposed cost factor scenarios are shown in the following diagram:
Penalty detection and sentiment
Findings
* Data scientists must build an intelligent solution by using multiple machine learning models for penalty event
detection.
* Data scientists must build notebooks in a local environment using automatic feature engineering and model
building in machine learning pipelines.
* Notebooks must be deployed to retrain by using Spark instances with dynamic worker allocation
* Notebooks must execute with the same code on new Spark instances to recode only the source of the data.
* Global penalty detection models must be trained by using dynamic runtime graph computation during
training.
* Local penalty detection models must be written by using BrainScript.
* Experiments for local crowd sentiment models must combine local penalty detection data.
* Crowd sentiment models must identify known sounds such as cheers and known catch phrases. Individual
crowd sentiment models will detect similar sounds.
* All shared features for local models are continuous variables.
* Shared features must use double precision. Subsequent layers must have aggregate running mean and
standard deviation metrics Available.
segments
During the initial weeks in production, the following was observed:
* Ad response rates declined.
* Drops were not consistent across ad styles.
* The distribution of features across training and production data are not consistent.
Analysis shows that of the 100 numeric features on user location and behavior, the 47 features that come
from location sources are being used as raw features. A suggested experiment to remedy the bias and
variance issue is to engineer 10 linearly uncorrelated features.
Penalty detection and sentiment
* Initial data discovery shows a wide range of densities of target states in training data used for crowd
sentiment models.
* All penalty detection models show inference phases using a Stochastic Gradient Descent (SGD) that are running
too slow.
* Audio samples show that the length of a catch phrase varies between 25%-47%, depending on region.
* The performance of the global penalty detection models show lower variance but higher bias when
comparing training and validation sets. Before implementing any feature changes, you must confirm the bias
and variance using all training and validation cases.

NEW QUESTION: 53
You need to identify the methods for dividing the data according to the testing requirements.
Which properties should you select? To answer, select the appropriate options in the answer area. NOTE:
Each correct selection is worth one point.

Answer:
NEW QUESTION: 54
You create machine learning models by using Azure Machine Learning.
You plan to train and score models by using a variety of compute contexts. You also plan to create a new
compute resource in Azure Machine Learning studio.
You need to select the appropriate compute types.
Which compute types should you select? To answer, drag the appropriate compute types to the correct
requirements. Each compute type may be used once, more than once, or not at all. You may need to drag the
split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.

Answer:
Explanation:
Box 1: Attached compute

Box 2: Inference cluster


Box 3: Training cluster
Box 4: Attached compute

NEW QUESTION: 55
You are producing a multiple linear regression model in Azure Machine Learning Studio.
Several independent variables are highly correlated.
You need to select appropriate methods for conducting effective feature engineering on all the data.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Answer:

Explanation

Step 1: Use the Filter Based Feature Selection module


Filter Based Feature Selection identifies the features in a dataset with the greatest predictive power.
The module outputs a dataset that contains the best feature columns, as ranked by predictive power. It also
outputs the names of the features and their scores from the selected metric.
Step 2: Build a counting transform
A counting transform creates a transformation that turns count tables into features, so that you can apply the
transformation to multiple datasets.
Step 3: Test the hypothesis using t-Test
References:
https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/studio-module-reference/filter-based-feature-
selec
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/build-counting-transform

NEW QUESTION: 56
You need to implement a scaling strategy for the local penalty detection data.
Which normalization type should you use?
A. Streaming
B. Weight
C. Batch
D. Cosine
Answer: C
Post batch normalization statistics (PBN) is the Microsoft Cognitive Toolkit (CNTK) approach to evaluating
the population mean and variance of Batch Normalization, which can be used in inference.
In CNTK, custom networks are defined using the BrainScriptNetworkBuilder and described in the CNTK
network description language "BrainScript."
Scenario:
Local penalty detection models must be written by using BrainScript.
References:
https://docs.microsoft.com/en-us/cognitive-toolkit/post-batch-normalization-statistics

NEW QUESTION: 57
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the
feature set.
You need to analyze a full dataset to include all values.
Solution: Replace each missing value using the Multiple Imputation by Chained Equations (MICE) method.
Does the solution meet the goal?
A. Yes
B. NO
Answer: A
Explanation
Replace using MICE: For each missing value, this option assigns a new value, which is calculated by using a
method described in the statistical literature as "Multivariate Imputation using Chained Equations" or
"Multiple Imputation by Chained Equations". With a multiple imputation method, each variable with missing
data is modeled conditionally using the other variables in the data before filling in the missing values.
Note: Multivariate imputation by chained equations (MICE), sometimes called "fully conditional specification"
or "sequential regression multiple imputation" has emerged in the statistical literature as one principled
method of addressing missing data. Creating multiple imputations, as opposed to single imputations,
accounts for the statistical uncertainty in the imputations. In addition, the chained equations approach is very
flexible and can handle variables of varying types (e.g., continuous or binary) as well as complexities such as
bounds or survey skip patterns.
References:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data
NEW QUESTION: 58
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
An IT department creates the following Azure resource groups and resources:

The IT department creates an Azure Kubernetes Service (AKS)-based inference compute target named aks-
cluster in the Azure Machine Learning workspace. You have a Microsoft Surface Book computer with a GPU.
Python 3.6 and Visual Studio Code are installed.
You need to run a script that trains a deep neural network (DNN) model and logs the loss and accuracy
metrics.
Solution: Install the Azure ML SDK on the Surface Book. Run Python code to connect to the workspace. Run
the training script as an experiment on the aks-cluster compute target.
Does the solution meet the goal?
A. Yes
B. No
Answer: B

NEW QUESTION: 59
You are using a decision tree algorithm. You have trained a model that generalizes well at a tree depth equal
to
10.
You need to select the bias and variance properties of the model with varying tree depth values.
Which properties should you select for each tree depth? To answer, select the appropriate options in the
answer area.
Answer:

Explanation

In decision trees, the depth of the tree determines the variance. A complicated decision tree (e.g. deep) has
low bias and high variance.
Note: In statistics and machine learning, the bias-variance tradeoff is the property of a set of predictive
models whereby models with a lower bias in parameter estimation have a higher variance of the parameter
estimates across samples, and vice versa. Increasing the bias will decrease the variance. Increasing the
variance will decrease the bias.
References:
https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/
NEW QUESTION: 60
You have a feature set containing the following numerical features: X, Y, and Z.
The Poisson correlation coefficient (r-value) of X, Y, and Z features is shown in the following image:

Use the drop-down menus to select the answer choice that answers each question based on the information
presented in the graphic.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: 0.859122
Box 2: a positively linear relationship
+1 indicates a strong positive linear relationship
-1 indicates a strong negative linear correlation
0 denotes no linear relationship between the two variables.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-linear-correlation
NEW QUESTION: 61
You have a comma-separated values (CSV) file containing data from which you want to train a classification
model.
You are using the Automated Machine Learning interface in Azure Machine Learning studio to train the
classification model. You set the task type to Classification.
You need to ensure that the Automated Machine Learning process evaluates only linear models.
What should you do?
A. Add all algorithms other than linear ones to the blocked algorithms list.
B. Set the Exit criterion option to a metric score threshold.
C. Clear the option to perform automatic featurization.
D. Clear the option to enable deep learning.
E. Set the task type to Regression
Answer: C
Explanation
Automatic featurization can fit non-linear models.
Reference:
https://econml.azurewebsites.net/spec/estimation/dml.html
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-automated-ml-for-ml-models


NEW QUESTION: 62
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?
A. scatter plot
B. coefficient of determination
C. Receiver Operating Characteristic (ROC) curve
D. Gradient descent
Answer: C
Explanation
Receiver operating characteristic (or ROC) is a plot of the correctly classified labels vs. the incorrectly
classified labels for a particular model.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#confusion-matrix
NEW QUESTION: 63
You are retrieving data from a large datastore by using Azure Machine Learning Studio.
You must create a subset of the data for testing purposes using a random sampling seed based on the
system clock.
You add the Partition and Sample module to your experiment.
You need to select the properties for the module.
Which values should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:
Box 1: Sampling
Create a sample of data
This option supports simple random sampling or stratified random sampling. This is useful if you want to
create a smaller representative sample dataset for testing.
1. Add the Partition and Sample module to your experiment in Studio, and connect the dataset.
2. Partition or sample mode: Set this to Sampling.
3. Rate of sampling. See box 2 below.
Box 2: 0
3. Rate of sampling. Random seed for sampling: Optionally, type an integer to use as a seed value.
This option is important if you want the rows to be divided the same way every time. The default value is 0,
meaning that a starting seed is generated based on the system clock. This can lead to slightly different results
each time you run the experiment.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample

NEW QUESTION: 64
You have a multi-class image classification deep learning model that uses a set of labeled photographs. You
create the following code to select hyperparameter values when training the model.
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: Yes
Hyperparameters are adjustable parameters you choose to train a model that govern the training process
itself. Azure Machine Learning allows you to automate hyperparameter exploration in an efficient manner,
saving you significant time and resources. You specify the range of hyperparameter values and a maximum
number of training runs. The system then automatically launches multiple simultaneous runs with different
parameter configurations and finds the configuration that results in the best performance, measured by the
metric you choose. Poorly performing training runs are automatically early terminated, reducing wastage of
compute resources. These resources are instead used to explore other hyperparameter configurations.
Box 2: Yes
uniform(low, high) - Returns a value uniformly distributed between low and high.
Box 3: No
Bayesian sampling does not currently support any early termination policy.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters

NEW QUESTION: 65
You are a lead data scientist for a project that tracks the health and migration of birds. You create a multi-
class image classification deep learning model that uses a set of labeled bird photographs collected by
experts.
You have 100,000 photographs of birds. All photographs use the JPG format and are stored in an Azure blob
container in an Azure subscription.
You need to access the bird photograph files in the Azure blob container from the Azure Machine Learning
service workspace that will be used for deep learning model training. You must minimize data movement.
What should you do?
A. Create an Azure Data Lake store and move the bird photographs to the store.
B. Create an Azure Cosmos DB database and attach the Azure Blob containing bird photographs storage to
the database.
C. Create and register a dataset by using TabularDataset class that references the Azure blob storage
containing bird photographs.
D. Register the Azure blob storage containing the bird photographs as a datastore in Azure Machine Learning
service.
E. Copy the bird photographs to the blob datastore that was created with your Azure Machine Learning
service workspace.
Answer: D
Explanation
We recommend creating a datastore for an Azure Blob container. When you create a workspace, an Azure
blob container and an Azure file share are automatically registered to the workspace.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data

NEW QUESTION: 66
You have a model with a large difference between the training and validation error values.
You must create a new model and perform cross-validation.
You need to identify a parameter set for the new model using Azure Machine Learning Studio.
Which module you should use for each step? To answer, drag the appropriate modules to the correct steps.
Each module may be used once or more than once, or not at all. You may need to drag the split bar between
panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Answer:

Explanation:
Box 1: Split data
Box 2: Partition and Sample
Box 3: Two-Class Boosted Decision Tree
Box 4: Tune Model Hyperparameters
Integrated train and tune: You configure a set of parameters to use, and then let the module iterate over
multiple combinations, measuring accuracy until it finds a "best" model. With most learner modules, you can
choose which parameters should be changed during the training process, and which should remain fixed.
We recommend that you use Cross-Validate Model to establish the goodness of the model given the specified
parameters. Use Tune Model Hyperparameters to identify the optimal parameters.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample

NEW QUESTION: 67
You need to correct the model fit issue.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Answer:
Explanation:
Step 1: Augment the data
Scenario: Columns in each dataset contain missing and null values. The datasets also contain many outliers.
Step 2: Add the Bayesian Linear Regression module.
Scenario: You produce a regression model to predict property prices by using the Linear Regression and
Bayesian Linear Regression modules.
Step 3: Configure the regularization weight.
Regularization typically is used to avoid overfitting. For example, in L2 regularization weight, type the value to
use as the weight for L2 regularization. We recommend that you use a non-zero value to avoid overfitting.
Scenario:
Model fit: The model shows signs of overfitting. You need to produce a more refined regression model that
reduces the overfitting.
Incorrect Answers:
Multiclass Decision Jungle module:
Decision jungles are a recent extension to decision forests. A decision jungle consists of an ensemble of
decision directed acyclic graphs (DAGs).
L-BFGS:
L-BFGS stands for "limited memory Broyden-Fletcher-Goldfarb-Shanno". It can be found in the Two-Class
Logistic Regression module, which is used to create a logistic regression model that can be used to predict
two (and only two) outcomes.
References:
<https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/linear-regr ession>

NEW QUESTION: 68
You have a dataset that includes home sales data for a city. The dataset includes the following columns.

Each row in the dataset corresponds to an individual home sales transaction.


You need to use automated machine learning to generate the best model for predicting the sales price based
on the features of the house.
Which values should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:
Reference:
https://docs.microsoft.com/en-us/learn/modules/create-regression-model-azure-machine-learning-designer

NEW QUESTION: 69
You create a multi-class image classification deep learning experiment by using the PyTorch framework. You
plan to run the experiment on an Azure Compute cluster that has nodes with GPU's.
You need to define an Azure Machine Learning service pipeline to perform the monthly retraining of the image
classification model. The pipeline must run with minimal cost and minimize the time required to train the
model.
Which three pipeline steps should you run in sequence? To answer, move the appropriate actions from the
list of actions to the answer area and arrange them in the correct order.
Answer:

Explanation
Step 1: Configure a DataTransferStep() to fetch new image data...
Step 2: Configure a PythonScriptStep() to run image_resize.py on the cpu-compute compute target.
Step 3: Configure the EstimatorStep() to run the training script on the gpu_compute compute target.
The PyTorch estimator provides a simple way of launching a PyTorch training job on a compute target.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch

NEW QUESTION: 70
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
An IT department creates the following Azure resource groups and resources:

The IT department creates an Azure Kubernetes Service (AKS)-based inference compute target named aks-
cluster in the Azure Machine Learning workspace.
You have a Microsoft Surface Book computer with a GPU. Python 3.6 and Visual Studio Code are installed.
You need to run a script that trains a deep neural network (DNN) model and logs the loss and accuracy
metrics.
Solution: Install the Azure ML SDK on the Surface Book. Run Python code to connect to the workspace and
then run the training script as an experiment on local compute.
Does the solution meet the goal?
A. Yes
B. No
Answer: A

NEW QUESTION: 71
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You have a Python script named train.py in a local folder named scripts. The script trains a regression model
by using scikit-learn. The script includes code to load a training data file which is also located in the scripts
folder.
You must run the script as an Azure ML experiment on a compute cluster named aml-compute.
You need to configure the run to ensure that the environment includes the required packages for model
training. You have instantiated a variable named aml-compute that references the target compute cluster.
Solution: Run the following code:

Does the solution meet the goal?


A. Yes
B. No
Answer: A
The scikit-learn estimator provides a simple way of launching a scikit-learn training job on a compute target. It
is implemented through the SKLearn class, which can be used to support single-node CPU training.
Example:
from azureml.train.sklearn import SKLearn
estimator = SKLearn(source_directory=project_folder,
compute_target=compute_target,
entry_script='train_iris.py'
)
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn

NEW QUESTION: 72
You use Data Science Virtual Machines (DSVMs) for Windows and Linux in Azure.
You need to access the DSVMs.
Which utilities should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

NEW QUESTION: 73
You create an experiment in Azure Machine Learning Studio. You add a training dataset that contains 10,000
rows. The first 9,000 rows represent class 0 (90 percent). The remaining 1,000 rows represent class 1 (10
percent). The training set is unbalanced between the two classes. You must increase the number of training
examples for class 1 to 4,000 by using data rows. You add the Synthetic Minority Oversampling Technique
(SMOTE) module to the experiment.
You need to configure the module.
Which values should you use? To answer, select the appropriate options in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

NEW QUESTION: 74
You are building a binary classification model by using a supplied training set.
The training set is imbalanced between two classes.
You need to resolve the data imbalance.
What are three possible ways to achieve this goal? Each correct answer presents a complete solution. NOTE:
Each correct selection is worth one point.
A. Normalize the training feature set.
B. Generate synthetic samples in the minority class.
C. Resample the data set using under sampling or oversampling
D. Use accuracy as the evaluation metric of the model.
E. Penalize the classification
Answer: B,C,E
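As one concrete illustration of generating synthetic minority-class samples, the sketch below uses the
imbalanced-learn package on synthetic data; the library choice is an assumption, since the question itself
names no library.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Build an imbalanced two-class dataset (roughly 90% / 10%).
X, y = make_classification(n_samples=10000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))

# SMOTE synthesizes new minority-class rows rather than duplicating them.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))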

NEW QUESTION: 75
A set of CSV files contains sales records. All the CSV files have the same data schema.
Each CSV file contains the sales record for a particular month and has the filename sales.csv. Each file is
stored in a folder that indicates the month and year when the data was recorded. The folders are in an Azure
blob container for which a datastore has been defined in an Azure Machine Learning workspace. The folders
are organized in a parent folder named sales to create the following hierarchical structure:

At the end of each month, a new folder with that month's sales file is added to the sales folder.
You plan to use the sales data to train a machine learning model based on the following requirements:
* You must define a dataset that loads all of the sales data to date into a structure that can be easily
converted to a dataframe.
* You must be able to create experiments that use only data that was created before a specific previous
month, ignoring any data that was added after that month.
* You must register the minimum number of datasets possible.
You need to register the sales data as a dataset in Azure Machine Learning service workspace.
What should you do?
A. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/
sales.csv' file every month. Register the dataset with the name sales_dataset each month, replacing the
existing dataset and specifying a tag named month indicating the month and year it was registered. Use this
dataset for all experiments.
B. Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register
the dataset with the name sales_dataset and a tag named month indicating the month and year it was
registered, and use this dataset for all experiments.
C. Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/
sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with
appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for
experiments.
D. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/
sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag
named month indicating the month and year it was registered. Use this dataset for all experiments, identifying
the version to be used based on the month tag as necessary.
Answer: B (LEAVE A REPLY)
Specify the path.
Example:
The following code gets the workspace existing workspace and the desired datastore by name. And then
passes the datastore and file locations to the path parameter to create a new TabularDataset, weather_ds.
from azureml.core import Workspace, Datastore, Dataset
datastore_name = 'your datastore name'
# get existing workspace
workspace = Workspace.from_config()
# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)
# create a TabularDataset from 3 file paths in datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
(datastore, 'weather/2018/12.csv'),
(datastore, 'weather/2019/*.csv')]
weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)

NEW QUESTION: 76
You are working on a classification task. You have a dataset indicating whether a student would like to play
soccer and associated attributes. The dataset includes the following columns:
You need to classify variables by type.
Which variable should you add to each category? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.

Answer:
References:
https://www.edureka.co/blog/classification-algorithms/


NEW QUESTION: 77
You have an Azure Machine Learning workspace that contains a CPU-based compute cluster and an Azure
Kubernetes Services (AKS) inference cluster. You create a tabular dataset containing data that you plan to
use to create a classification model.
You need to use the Azure Machine Learning designer to create a web service through which client
applications can consume the classification model by submitting new data and getting an immediate
prediction as a response.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.
Answer:

Explanation:
Step 1: Create and start a Compute Instance
To train and deploy models using Azure Machine Learning designer, you need compute on which to run the
training process, test the model, and host the model in a deployed service.
There are four kinds of compute resource you can create:
Compute Instances: Development workstations that data scientists can use to work with data and models.
Compute Clusters: Scalable clusters of virtual machines for on-demand processing of experiment code.
Inference Clusters: Deployment targets for predictive services that use your trained models.
Attached Compute: Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks
clusters.
Step 2: Create and run a training pipeline.
After you've used data transformations to prepare the data, you can use it to train a machine learning model.
Step 3: Create and run a real-time inference pipeline.
After creating and running a pipeline to train the model, you need a second pipeline that performs the same
data transformations for new data, and then uses the trained model to infer (in other words, predict) label
values based on its features. This pipeline will form the basis for a predictive service that you can publish for
applications to use.
Reference:
https://docs.microsoft.com/en-us/learn/modules/create-classification-model-azure-machine-learning-designer/

NEW QUESTION: 78
You are evaluating a Python NumPy array that contains six data points defined as follows:
data = [10, 20, 30, 40, 50, 60]
You must generate the following output by using the k-fold algorithm implementation in the Python Scikit-learn
machine learning library:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
You need to implement a cross-validation to generate the output.
How should you complete the code segment? To answer, select the appropriate code segment in the dialog
box in the answer area.
NOTE: Each correct selection is worth one point.

Answer:
Explanation:
Box 1: k-fold
Box 2: 3
K-Folds cross-validator provides train/test indices to split data in train/test sets. Split dataset into k
consecutive folds (without shuffling by default).
The parameter n_splits ( int, default=3) is the number of folds. Must be at least 2.
Box 3: data
Example:
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
... print("TRAIN:", train_index, "TEST:", test_index)
... X_train, X_test = X[train_index], X[test_index]
... y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
References:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
NEW QUESTION: 79
You need to identify the methods for dividing the data according to the testing requirements.
Which properties should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation
Scenario: Testing
You must produce multiple partitions of a dataset based on sampling using the Partition and Sample module
in Azure Machine Learning Studio.
Box 1: Assign to folds
Use Assign to folds option when you want to divide the dataset into subsets of the data. This option is also
useful when you want to create a custom number of folds for cross-validation, or to split rows into several
groups.
Not Head: Use Head mode to get only the first n rows. This option is useful if you want to test a pipeline on a
small number of rows, and don't need the data to be balanced or sampled in any way.
Not Sampling: The Sampling option supports simple random sampling or stratified random sampling. This is
useful if you want to create a smaller representative sample dataset for testing.
Box 2: Partition evenly
Specify the partitioner method: Indicate how you want data to be apportioned to each partition, using these
options:
* Partition evenly: Use this option to place an equal number of rows in each partition. To specify the number
of output partitions, type a whole number in the Specify number of folds to split evenly into text box.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/partition-and-sample

NEW QUESTION: 80
You are a data scientist building a deep convolutional neural network (CNN) for image classification.
The CNN model you built shows signs of overfitting.
You need to reduce overfitting and converge the model to an optimal fit.
Which two actions should you perform? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Reduce the amount of training data.
B. Add an additional dense layer with 512 input units.
C. Use training data augmentation
D. Add L1/L2 regularization.
E. Add an additional dense layer with 64 input units
Answer: C,D
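A hedged Keras sketch of the two selected techniques, training-data augmentation and L2 weight
regularization, with an illustrative input shape; this is an example pattern, not code from the question.
import tensorflow as tf

# Training-data augmentation layers (active only during training).
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    data_augmentation,
    # L2 weight regularization penalizes large weights to curb overfitting.
    tf.keras.layers.Conv2D(32, 3, activation="relu",
                           kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])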

NEW QUESTION: 81
You need to use the Python language to build a sampling strategy for the global penalty detection models.
How should you complete the code segment? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation
Box 1: import pytorch as deeplearninglib
Box 2: ..DistributedSampler(Sampler)..
DistributedSampler(Sampler):
Sampler that restricts data loading to a subset of the dataset.
It is especially useful in conjunction with class:`torch.nn.parallel.DistributedDataParallel`. In such case, each
process can pass a DistributedSampler instance as a DataLoader sampler, and load a subset of the original
dataset that is exclusive to it.
Scenario: Sampling must guarantee mutual and collective exclusivity between local and global segmentation
models that share the same features.
Box 3: optimizer = deeplearninglib.train.GradientDescentOptimizer(learning_rate=0.10)
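A minimal PyTorch sketch of the DistributedSampler pattern the boxes describe; the dataset is a placeholder,
and the replica count and rank would normally come from the initialized process group.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.randn(1000, 16))  # placeholder data

# Each worker receives a mutually exclusive subset of the dataset,
# which is what provides the exclusivity the scenario requires.
sampler = DistributedSampler(dataset, num_replicas=4, rank=0)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
for (batch,) in loader:
    pass  # training step would go here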

NEW QUESTION: 82
You are performing clustering by using the K-means algorithm.
You need to define the possible termination conditions.
Which three conditions can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Centroids do not change between iterations.
B. The residual sum of squares (RSS) rises above a threshold.
C. The residual sum of squares (RSS) falls below a threshold.
D. A fixed number of iterations is executed.
E. The sum of distances between centroids reaches a maximum.
Answer: A,C,D
AD: The algorithm terminates when the centroids stabilize or when a specified number of iterations are
completed.
C: A measure of how well the centroids represent the members of their clusters is the residual sum of squares
or RSS, the squared distance of each vector from its centroid summed over all vectors. RSS is the objective
function and our goal is to minimize it.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/k-means-clustering
https://nlp.stanford.edu/IR-book/html/htmledition/k-means-1.html

NEW QUESTION: 83
You need to implement early stopping criteria as suited in the model training requirements.
Which three code segments should you use to develop the solution? To answer, move the appropriate code
segments from the list of code segments to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders
you select.

Answer:
Explanation:
You need to implement an early stopping criterion on models that provides savings without terminating
promising jobs.
Truncation selection cancels a given percentage of lowest performing runs at each evaluation interval. Runs
are compared based on their performance on the primary metric and the lowest X% are terminated.
Example:
from azureml.train.hyperdrive import TruncationSelectionPolicy
early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1, truncation_percentage=20,
delay_evaluation=5)
Incorrect Answers:
Bandit is a termination policy based on slack factor/slack amount and evaluation interval. The policy early
terminates any runs where the primary metric is not within the specified slack factor / slack amount with
respect to the best performing training run.
Example:
from azureml.train.hyperdrive import BanditPolicy
early_termination_policy = BanditPolicy(slack_factor=0.1, evaluation_interval=1, delay_evaluation=5)
References:
https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters

NEW QUESTION: 84
You have a dataset that includes home sales data for a city. The dataset includes the following columns.

Each row in the dataset corresponds to an individual home sales transaction.


You need to use automated machine learning to generate the best model for predicting the sales price based
on the features of the house.
Which values should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: Regression
Regression is a supervised machine learning technique used to predict numeric values.
Box 2: Price
Reference:
https://docs.microsoft.com/en-us/learn/modules/create-regression-model-azure-machine-learning-designer

NEW QUESTION: 85
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You create an Azure Machine Learning service datastore in a workspace. The datastore contains the
following files:
* /data/2018/Q1.csv
* /data/2018/Q2.csv
* /data/2018/Q3.csv
* /data/2018/Q4.csv
* /data/2019/Q1.csv
All files store data in the following format:
id,f1,f2,I
1,1,2,0
2,1,1,1
3,2,1,0
4,2,2,1
You run the following code:

You need to create a dataset named training_data and load the data from all files into a single data frame by
using the following code:

Solution: Run the following code:

Does the solution meet the goal?


A. Yes
B. No
Answer: A
Use two file paths.
Use Dataset.Tabular.from_delimited_files as the data isn't cleansed.
Note:
A TabularDataset represents data in a tabular format by parsing the provided file or list of files. This provides
you with the ability to materialize the data into a pandas or Spark DataFrame so you can work with familiar
data preparation and training libraries without having to leave your notebook. You can create a
TabularDataset object from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets
NEW QUESTION: 86
You need to implement early stopping criteria as suited in the model training requirements.
Which three code segments should you use to develop the solution? To answer, move the appropriate code
segments from the list of code segments to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders
you select.

Answer:

Explanation

You need to implement an early stopping criterion on models that provides savings without terminating
promising jobs.
Truncation selection cancels a given percentage of lowest performing runs at each evaluation interval. Runs
are compared based on their performance on the primary metric and the lowest X% are terminated.
Example:
from azureml.train.hyperdrive import TruncationSelectionPolicy
early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1, truncation_percentage=20,
delay_evaluation=5)

NEW QUESTION: 87
You are creating a deep learning model to identify cats and dogs. You have 25,000 color images.
You must meet the following requirements:
* Reduce the number of training epochs.
* Reduce the size of the neural network.
* Reduce over-fitting of the neural network.
You need to select the image modification values.
Which values should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:

NEW QUESTION: 88
You create a binary classification model using Azure Machine Learning Studio.
You must use a Receiver Operating Characteristic (ROC) curve and an F1 score to evaluate the model.
You need to create the required business metrics.
How should you complete the experiment? To answer, select the appropriate options in the dialog box in the
answer area.
NOTE: Each correct selection is worth one point.
Answer:
NEW QUESTION: 89
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the
feature set.
You need to analyze a full dataset to include all values.
Solution: Use the Last Observation Carried Forward (LOCF) method to impute the missing data points.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
Explanation
Instead use the Multiple Imputation by Chained Equations (MICE) method.
Replace using MICE: For each missing value, this option assigns a new value, which is calculated by using a
method described in the statistical literature as "Multivariate Imputation using Chained Equations" or
"Multiple Imputation by Chained Equations". With a multiple imputation method, each variable with missing
data is modeled conditionally using the other variables in the data before filling in the missing values.
Note: Last observation carried forward (LOCF) is a method of imputing missing data in longitudinal studies. If
a person drops out of a study before it ends, then his or her last observed score on the dependent variable is
used for all subsequent (i.e., missing) observation points. LOCF is used to maintain the sample size and to
reduce the bias caused by the attrition of participants in a study.
References:
https://methods.sagepub.com/reference/encyc-of-research-design/n211.xml
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/

NEW QUESTION: 90
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You create a model to forecast weather conditions based on historical data.
You need to create a pipeline that runs a processing script to load data from a datastore and pass the
processed data to a machine learning model training script.
Solution: Run the following code:

Does the solution meet the goal?


A. Yes
B. No
Answer: B
Explanation
Note: Data used in pipeline can be produced by one step and consumed in another step by providing a
PipelineData object as an output of one step and an input of one or more subsequent steps.
Compare with this example, the pipeline train step depends on the process_step_output output of the pipeline
process step:
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep
datastore = ws.get_default_datastore()
process_step_output = PipelineData("processed_data", datastore=datastore)
process_step = PythonScriptStep(script_name="process.py",
    arguments=["--data_for_train", process_step_output], outputs=[process_step_output],
    compute_target=aml_compute, source_directory=process_directory)
train_step = PythonScriptStep(script_name="train.py",
    arguments=["--data_for_train", process_step_output], inputs=[process_step_output],
    compute_target=aml_compute, source_directory=train_directory)
pipeline = Pipeline(workspace=ws, steps=[process_step, train_step])
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinedata?
view=azu

NEW QUESTION: 91
You have a dataset that contains 2,000 rows. You are building a machine learning classification model by
using Azure Machine Learning Studio. You add a Partition and Sample module to the experiment.
You need to configure the module. You must meet the following requirements:
* Divide the data into subsets.
* Assign the rows into folds using a round-robin method.
* Allow rows in the dataset to be reused.
How should you configure the module? To answer, select the appropriate options in the dialog box in the
answer area.
NOTE: Each correct selection is worth one point.

Answer:

Explanation

NEW QUESTION: 92
You are evaluating a Python NumPy array that contains six data points defined as follows:
data = [10, 20, 30, 40, 50, 60]
You must generate the following output by using the k-fold algorithm implementation in the Python Scikit-learn
machine learning library:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
You need to implement a cross-validation to generate the output.
How should you complete the code segment? To answer, select the appropriate code segment in the dialog
box in the answer area.
NOTE: Each correct selection is worth one point.
Answer:

Explanation:
Box 1: k-fold
Box 2: 3
K-Folds cross-validator provides train/test indices to split data in train/test sets. Split dataset into k
consecutive folds (without shuffling by default).
The parameter n_splits ( int, default=3) is the number of folds. Must be at least 2.
Box 3: data
Example:
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
... print("TRAIN:", train_index, "TEST:", test_index)
... X_train, X_test = X[train_index], X[test_index]
... y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
References:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

NEW QUESTION: 93
You need to modify the inputs for the global penalty event model to address the bias and variance issue.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.

Answer:
NEW QUESTION: 94
You are using a decision tree algorithm. You have trained a model that generalizes well at a tree depth equal
to
10.
You need to select the bias and variance properties of the model with varying tree depth values.
Which properties should you select for each tree depth? To answer, select the appropriate options in the
answer area.

Answer:

Explanation
In decision trees, the depth of the tree determines the variance. A complicated decision tree (e.g. deep) has
low bias and high variance.
Note: In statistics and machine learning, the bias-variance tradeoff is the property of a set of predictive
models whereby models with a lower bias in parameter estimation have a higher variance of the parameter
estimates across samples, and vice versa. Increasing the bias will decrease the variance. Increasing the
variance will decrease the bias.
References:
https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/

NEW QUESTION: 95
You are producing a multiple linear regression model in Azure Machine Learning Studio.
Several independent variables are highly correlated.
You need to select appropriate methods for conducting effective feature engineering on all the data.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list
of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:
Step 1: Use the Filter Based Feature Selection module
Filter Based Feature Selection identifies the features in a dataset with the greatest predictive power.
The module outputs a dataset that contains the best feature columns, as ranked by predictive power. It also
outputs the names of the features and their scores from the selected metric.
Step 2: Build a counting transform
A counting transform creates a transformation that turns count tables into features, so that you can apply the
transformation to multiple datasets.
Step 3: Test the hypothesis using t-Test
References:
https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/studio-module-reference/filter-based-feature-
selection
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/build-counting-transform

NEW QUESTION: 96
You are retrieving data from a large datastore by using Azure Machine Learning Studio.
You must create a subset of the data for testing purposes using a random sampling seed based on the
system clock.
You add the Partition and Sample module to your experiment.
You need to select the properties for the module.
Which values should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample

NEW QUESTION: 97
You define a datastore named ml-data for an Azure Storage blob container. In the container, you have a
folder named train that contains a file named data.csv. You plan to use the file to train a model by using the
Azure Machine Learning SDK.
You plan to train the model by using the Azure Machine Learning SDK to run an experiment on local
compute.
You define a DataReference object by running the following code:

You need to load the training data.


Which code segment should you use?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: E
Example:
data_folder = args.data_folder
# Load Train and Test data
train_data = pd.read_csv(os.path.join(data_folder, 'data.csv'))
Reference:
https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai

NEW QUESTION: 98
Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You are using Azure Machine Learning to run an experiment that trains a classification model.
You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You configure a
HyperDriveConfig for the experiment by running the following code:
You plan to use this configuration to run a script that trains a random forest model and then tests it with
validation data. The label values for the validation data are stored in a variable named y_test, and the
predicted probabilities from the model are stored in a variable named y_predicted.
You need to add logging to the script to allow Hyperdrive to optimize hyperparameters for the AUC metric.
Solution: Run the following code:

Does the solution meet the goal?


A. Yes
B. No
Answer: B
Explanation
Use a solution with logging.info(message) instead.
Note: Python printing/logging example:
logging.info(message)
Destination: Driver logs, Azure Machine Learning designer
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-debug-pipelines

NEW QUESTION: 99

For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each
correct selection is worth one point.

Answer:
NEW QUESTION: 100
You create a binary classification model to predict whether a person has a disease.
You need to detect possible classification errors.
Which error type should you choose for each description? To answer, select the appropriate options in the
answer area.
NOTE: Each correct selection is worth one point.
Answer:

Explanation:
Box 1: True Positive
A true positive is an outcome where the model correctly predicts the positive class.
Box 2: True Negative
A true negative is an outcome where the model correctly predicts the negative class.
Box 3: False Positive
A false positive is an outcome where the model incorrectly predicts the positive class.
Box 4: False Negative
A false negative is an outcome where the model incorrectly predicts the negative class.
Note: Let's make the following definitions:
"Wolf" is a positive class.
"No wolf" is a negative class.
We can summarize our "wolf-prediction" model using a 2x2 confusion matrix that depicts all four possible
outcomes:
Reference:
https://developers.google.com/machine-learning/crash-course/classification/true-false-positive-negative

NEW QUESTION: 101


You are creating a machine learning model in Python. The provided dataset contains several numerical
columns and one text column. The text column represents a product's category. The product category will
always be one of the following:
Bikes
Cars
Vans
Boats
You are building a regression model using the scikit-learn Python package.
You need to transform the text data to be compatible with the scikit-learn Python package.
How should you complete the code segment? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:
Explanation:
Box 1: pandas as df
Pandas takes data (like a CSV or TSV file, or a SQL database) and creates a Python object with rows and
columns called a data frame, which looks very similar to a table in statistical software (think Excel or SPSS,
for example).
Box 2: transpose[ProductCategoryMapping]
Reshape the data from the pandas Series to columns.
Reference:
https://datascienceplus.com/linear-regression-in-python/

NEW QUESTION: 102


You are retrieving data from a large datastore by using Azure Machine Learning Studio.
You must create a subset of the data for testing purposes using a random sampling seed based on the
system clock.
You add the Partition and Sample module to your experiment.
You need to select the properties for the module.
Which values should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Sampling
Create a sample of data
This option supports simple random sampling or stratified random sampling. This is useful if you want to
create a smaller representative sample dataset for testing.
1. Add the Partition and Sample module to your experiment in Studio, and connect the dataset.
2. Partition or sample mode: Set this to Sampling.
3. Rate of sampling: Specify the fraction of rows to include in the sample. See box 2 below for the seed.
Box 2: 0
Random seed for sampling: Optionally, type an integer to use as a seed value. This option is important if
you want the rows to be divided the same way every time. The default value is 0, meaning that a starting
seed is generated based on the system clock. This can lead to slightly different results each time you run
the experiment.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample
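Outside Studio, the effect of the seed can be illustrated with pandas, as a loose analogy to the module
rather than its implementation: omitting random_state corresponds to a clock-based starting seed, while
fixing it reproduces the same sample on every run.

import pandas as pd

data = pd.DataFrame({'value': range(100)})

# No random_state: a fresh seed each run, so the sample can differ
sample_a = data.sample(frac=0.1)

# Fixed random_state: the same 10 rows are returned every time
sample_b = data.sample(frac=0.1, random_state=42)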

NEW QUESTION: 103


You need to configure the Edit Metadata module so that the structure of the datasets match.
Which configuration options should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Floating point
Need floating point for Median values.
Scenario: An initial investigation shows that the datasets are identical in structure apart from the MedianValue
column. The smaller Paris dataset contains the MedianValue in text format, whereas the larger London
dataset contains the MedianValue in numerical format.
Box 2: Unchanged
Note: Select the Categorical option to specify that the values in the selected columns should be treated as
categories.
For example, you might have a column that contains the numbers 0, 1, and 2, but you know that the numbers
actually mean "Smoker", "Non smoker", and "Unknown". In that case, flagging the column as categorical
ensures that the values are not used in numeric calculations, but only to group data.
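The fix the scenario calls for, parsing the text MedianValue column as floating point so the two datasets
match, can be sketched in pandas as a rough analogy to the Edit Metadata module (the frame below is
made-up):

import pandas as pd

paris = pd.DataFrame({'MedianValue': ['21.5', '18.2', '30.1']})  # text format

# Convert the text column to floating point to match the London dataset
paris['MedianValue'] = pd.to_numeric(paris['MedianValue'])
print(paris.dtypes)  # MedianValue is now float64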

NEW QUESTION: 104


Note: This question is part of a series of questions that present the same scenario. Each question in the
series contains a unique solution that might meet the stated goals. Some question sets might have more than
one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions
will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the
feature set.
You need to analyze a full dataset to include all values.
Solution: Remove the entire column that contains the missing data point.
Does the solution meet the goal?
A. Yes
B. No
Answer: B
Explanation
Removing the entire column reduces the dimensionality of the feature set, which the requirement forbids.
Use the Multiple Imputation by Chained Equations (MICE) method instead.
References:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data
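A minimal sketch of MICE-style imputation using scikit-learn, whose IterativeImputer is modeled on the MICE
approach (the array is made-up sample data; note the explicit experimental import the library requires):

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # required before the import below
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [np.nan, 8.0]])

# Each feature with missing values is modeled as a function of the others,
# so every column is kept and the dimensionality of the feature set is preserved
imputer = IterativeImputer(random_state=0)
X_full = imputer.fit_transform(X)
print(X_full.shape)  # (4, 2) - same shape as the input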

NEW QUESTION: 105


You have a multi-class image classification deep learning model that uses a set of labeled photographs. You
create the following code to select hyperparameter values when training the model.

For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.

Answer:
Explanation:
Box 1: Yes
Hyperparameters are adjustable parameters you choose to train a model that govern the training process
itself. Azure Machine Learning allows you to automate hyperparameter exploration in an efficient manner,
saving you significant time and resources. You specify the range of hyperparameter values and a maximum
number of training runs. The system then automatically launches multiple simultaneous runs with different
parameter configurations and finds the configuration that results in the best performance, measured by the
metric you choose. Poorly performing training runs are automatically early terminated, reducing wastage of
compute resources. These resources are instead used to explore other hyperparameter configurations.
Box 2: Yes
uniform(low, high) - Returns a value uniformly distributed between low and high
Box 3: No
Bayesian sampling does not currently support any early termination policy.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
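Because the question's code is not shown, here is a hedged sketch of the pattern it describes: random
sampling over a search space that includes uniform, plus a bandit early-termination policy. The parameter
names, ranges, and the script_config variable are illustrative assumptions:

from azureml.train.hyperdrive import (BanditPolicy, HyperDriveConfig,
                                      PrimaryMetricGoal, RandomParameterSampling,
                                      choice, uniform)

param_sampling = RandomParameterSampling({
    'learning_rate': uniform(0.01, 0.1),  # uniformly distributed between low and high
    'batch_size': choice(16, 32, 64)      # one of a discrete set of values
})

# Stop runs whose best metric falls outside the slack factor of the leader
early_termination_policy = BanditPolicy(slack_factor=0.1, evaluation_interval=2)

hyperdrive_config = HyperDriveConfig(
    run_config=script_config,  # assumed ScriptRunConfig for the training script
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    primary_metric_name='AUC',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20)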

NEW QUESTION: 106


You deploy a model in Azure Container Instance.
You must use the Azure Machine Learning SDK to call the model API.
You need to invoke the deployed model using native SDK classes and methods.
How should you complete the command? To answer, select the appropriate options in the answer areas.
NOTE: Each correct selection is worth one point.
Answer:

Reference:
https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/how-to-deploy-azure-container-instance
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-deployment
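A hedged sketch of invoking an ACI-deployed model with native SDK classes; the workspace config, service
name, and input schema are illustrative assumptions:

import json
from azureml.core import Workspace
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()  # assumes a config.json for the workspace
service = AciWebservice(workspace=ws, name='my-service')  # hypothetical service name

# Serialize the input in the shape the scoring script's run() function expects
input_payload = json.dumps({'data': [[0.1, 2.3, 4.5]]})

predictions = service.run(input_data=input_payload)
print(predictions)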

NEW QUESTION: 107


You create a binary classification model.
You need to evaluate the model performance.
Which two metrics can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. relative absolute error
B. precision
C. accuracy
D. mean absolute error
E. coefficient of determination
Answer: B,C
The evaluation metrics available for binary classification models are: Accuracy, Precision, Recall, F1 Score,
and AUC.
Note: A very natural question is: "Out of the individuals whom the model predicted to be positive, how many
were classified correctly (TP)?" This question can be answered by looking at the Precision of the model,
which is the proportion of positives that are classified correctly.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio/evaluate-model-performance
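A short scikit-learn sketch of both metrics (the label vectors are made-up sample data):

from sklearn.metrics import accuracy_score, precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Accuracy: the fraction of all predictions that are correct
print(accuracy_score(y_true, y_pred))   # 0.75

# Precision: the fraction of predicted positives that are truly positive
print(precision_score(y_true, y_pred))  # 0.75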

NEW QUESTION: 108


You are developing a linear regression model in Azure Machine Learning Studio. You run an experiment to
compare different algorithms.
The following image displays the results dataset output:

Use the drop-down menus to select the answer choice that answers each question based on the information
presented in the image.
NOTE: Each correct selection is worth one point.
Answer:

Explanation

Box 1: Boosted Decision Tree Regression


Mean absolute error (MAE) measures how close the predictions are to the actual outcomes; thus, a lower
score is better.
Box 2:
Online Gradient Descent: If you want the algorithm to find the best parameters for you, set the Create
trainer mode option to Parameter Range. You can then specify multiple values for the algorithm to try.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/linear-regression
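To make the MAE comparison concrete, a small scikit-learn sketch with made-up predictions from two
hypothetical models:

from sklearn.metrics import mean_absolute_error

y_actual  = [3.0, 5.0, 2.5, 7.0]
y_model_a = [2.8, 5.3, 2.9, 7.3]
y_model_b = [2.0, 6.0, 3.5, 8.0]

# Lower MAE means the predictions are closer to the actual outcomes
print(mean_absolute_error(y_actual, y_model_a))  # ~0.3
print(mean_absolute_error(y_actual, y_model_b))  # 1.0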

NEW QUESTION: 109


You create a new Azure subscription. No resources are provisioned in the subscription.
You need to create an Azure Machine Learning workspace.
What are three possible ways to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Run Python code that uses the Azure ML SDK library and calls the Workspace.create method with name,
subscription_id, resource_group, and location parameters.
B. Use the Azure Command Line Interface (CLI) with the Azure Machine Learning extension to call the az
group create function with -name and -location parameters, and then the az ml workspace create function,
specifying -w and -g parameters for the workspace name and resource group.
C. Navigate to Azure Machine Learning studio and create a workspace.
D. Run Python code that uses the Azure ML SDK library and calls the Workspace.get method with name,
subscription_id, and resource_group parameters.
E. Use an Azure Resource Management template that includes a
Microsoft.MachineLearningServices/workspaces resource and its dependencies.
Answer: A,C,D
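A minimal sketch of option A, with placeholder names; by default Workspace.create also provisions the
resource group if it does not already exist:

from azureml.core import Workspace

ws = Workspace.create(name='myworkspace',              # hypothetical names
                      subscription_id='<subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2')

ws.write_config()  # optionally save config.json for later Workspace.from_config() calls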

NEW QUESTION: 110


You use the Azure Machine Learning SDK in a notebook to run an experiment using a script file in an
experiment folder.
The experiment fails.
You need to troubleshoot the failed experiment.
What are two possible ways to achieve this goal? Each correct answer presents a complete solution.
A. Use the get_metrics() method of the run object to retrieve the experiment run logs.
B. Use the get_details_with_logs() method of the run object to display the experiment run logs.
C. View the log files for the experiment run in the experiment folder.
D. View the logs for the experiment run in Azure Machine Learning studio.
E. Use the get_output() method of the run object to retrieve the experiment run logs.
Answer: B,D
Explanation
Use get_details_with_logs() to fetch the run details and logs created by the run.
You can monitor Azure Machine Learning runs and view their logs with the Azure Machine Learning studio.
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.steprun
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-monitor-view-training-logs
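A hedged sketch of option B, assuming the experiment and script names shown are placeholders:

from azureml.core import Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name='my-experiment')  # hypothetical name
config = ScriptRunConfig(source_directory='./experiment-folder', script='train.py')

run = experiment.submit(config)
run.wait_for_completion(raise_on_error=False)  # do not raise if the run fails

# Returns the run details together with the contents of the log files
details = run.get_details_with_logs()
print(details['logFiles'])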

NEW QUESTION: 111


You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset that has
missing values in many columns. The data does not require the application of predictors for each column. You
plan to use the Clean Missing Data module to handle the missing data.
You need to select a data cleaning method.
Which method should you use?
A. Synthetic Minority Oversampling Technique (SMOTE)
B. Replace using Probabilistic PCA
C. Replace using MICE
D. Normalization
Answer: B
Explanation
Replace using Probabilistic PCA estimates a low-dimensional approximation of the data to fill in missing
values. It is well suited to small datasets with missing values in many columns, and it does not require
the application of predictors for each column.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data

NEW QUESTION: 112


You need to identify the methods for dividing the data according to the testing requirements.
Which properties should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer:
NEW QUESTION: 113
You are tuning a hyperparameter for an algorithm. The following table shows a data set with different
hyperparameter, training error, and validation errors.

Use the drop-down menus to select the answer choice that answers each question based on the information
presented in the graphic.
Answer:

Explanation:
Box 1: 4
Choose the value that yields both low training and validation error and the closest match between the two;
that is, minimize the variance (the difference between the validation error and the training error).
Box 2: 5
Minimize the variance (the difference between the validation error and the training error).
Reference:
https://medium.com/comet-ml/organizing-machine-learning-projects-project-management-guidelines-2d2b85651bbd
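A tiny sketch of the selection rule described above, using a made-up candidate table:

# (hyperparameter value, training error, validation error) - made-up numbers
candidates = [(1, 0.10, 0.35), (2, 0.08, 0.30), (4, 0.05, 0.07), (5, 0.04, 0.25)]

# Prefer the lowest validation error, then the smallest train/validation gap
best = min(candidates, key=lambda c: (c[2], c[2] - c[1]))
print(best[0])  # 4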

NEW QUESTION: 114


You need to identify the methods for dividing the data according to the testing requirements.
Which properties should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

Answer: