CSC 309 Project 10 K-Nearest Neighbors With Scikit-Learn

This document provides instructions for Project 10, which involves implementing a K-Nearest Neighbors classifier using Scikit-Learn on the Iris dataset. Students will read training (x1, y1) and test (x2) data from files, train a KNN classifier using x1 and y1, predict values for y2 given x2, and submit their predictions and a Python program to iLearn for grading. The document provides sample code and instructions on loading and preprocessing the data from the files into NumPy arrays for use in Scikit-Learn.

Uploaded by

wizbizphd

CSc 309 Project 10

K-Nearest Neighbors with Scikit-Learn


Due Monday, 5/14/2018, 5:55pm (7% of your grade)
(Standard late policy: 10% per day late up to 3 days, then 50% max)
(This document is available from the CSc 309 iLearn site.)

You must work by yourself on this project.

For this project you will implement a Machine Learning classifier called K-Nearest
Neighbors (KNN) and use it on the Iris dataset.

For more information on the Iris data set, please see the Wikipedia page.

https://en.wikipedia.org/wiki/Iris_flower_data_set

Your program will read training (x1, y1) and test (x2) data sets from files provided on the
website, run KNN, and predict the labels for the test set.

http://unixlab.sfsu.edu/~ats/csc309s18/projects/p10/

At that link, you will find three files: x1, y1, and x2.

Your program must train your KNN classifier with x1 and y1, then predict the values
for y2 given x2. I will then compare your answers with the actual answers for
correctness.

See the class notes on Scikit-Learn, NumPy and SciPy to help.

Your program will read each file's data into NumPy arrays. Hint: Use np.genfromtxt().

https://docs.scipy.org/doc/numpy/reference/routines.io.html
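
As a rough illustration of the hint above, here is a minimal sketch of loading
data with np.genfromtxt(). The file layout is an assumption (this handout does not
describe it): comma-separated features with one sample per row, and one integer
label per line. So that the sketch runs without the course files, it first writes
tiny made-up stand-in files to a temporary directory; the real x1, y1 and x2 may
need a different delimiter or dtype.

```python
import os
import tempfile

import numpy as np

# Stand-in files so the sketch runs on its own. The real x1, y1 and x2
# come from the class website and their exact layout may differ.
tmp_dir = tempfile.mkdtemp()
x1_path = os.path.join(tmp_dir, "x1")
y1_path = os.path.join(tmp_dir, "y1")

with open(x1_path, "w") as f:
    f.write("5.1,3.5,1.4,0.2\n7.0,3.2,4.7,1.4\n6.3,3.3,6.0,2.5\n")
with open(y1_path, "w") as f:
    f.write("0\n1\n2\n")

# Assumption: features are comma-separated, one sample per row.
x1 = np.genfromtxt(x1_path, delimiter=",")
# Assumption: labels are stored as one integer per line.
y1 = np.genfromtxt(y1_path, dtype=int)

print(x1.shape)  # (3, 4)
print(y1)        # [0 1 2]
```

Printing the shape of each array right after loading is a quick way to confirm
that the delimiter guess matches the actual files.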

Then, using those NumPy arrays, you simply need to run an example similar to the one
we went over in the class slides for Scikit-Learn. Instead of loading the iris data from
the datasets module, you will read the data from the files provided on the class
website, as previously mentioned.

# Example from class, using Scikit-Learn

from sklearn import neighbors, datasets
from sklearn.model_selection import train_test_split

# Load the iris dataset into a variable
iris = datasets.load_iris()
k = 15

# Split the dataset into x_train, x_test, y_train, y_test
x1, x2, y1, y2 = train_test_split(iris.data, iris.target,
                                  test_size=0.1)

# Instantiate the KNN classifier with k neighbors
clf = neighbors.KNeighborsClassifier(n_neighbors=k)

# Fit the classifier with the training data
clf.fit(x1, y1)

# Print the predicted and actual labels
print("Predicted labels: ", clf.predict(x2))
print("Actual labels:    ", y2)

SAMPLE OUTPUT:

Predicted labels:  [2 2 2 2 2 0 2 0 0 0 0 1 0 0 1]
Actual labels:     [2 2 2 2 2 0 2 0 0 0 0 1 0 0 1]
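
For the project itself there is no train_test_split step, because the data already
arrives split into x1, y1 and x2. The following is only a sketch of that variation,
not a reference solution: the small arrays are made-up stand-ins for the data you
would load from the files, and k = 3 is chosen just because the stand-in training
set has only six samples.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up stand-ins for the arrays loaded from the x1, y1 and x2 files.
x1 = np.array([[5.1, 3.5, 1.4, 0.2],
               [4.9, 3.0, 1.4, 0.2],
               [4.7, 3.2, 1.3, 0.2],
               [7.0, 3.2, 4.7, 1.4],
               [6.4, 3.2, 4.5, 1.5],
               [6.9, 3.1, 4.9, 1.5]])
y1 = np.array([0, 0, 0, 1, 1, 1])
x2 = np.array([[5.0, 3.4, 1.5, 0.2],
               [6.5, 3.0, 4.6, 1.4]])

# k must not exceed the number of training samples.
k = 3
clf = KNeighborsClassifier(n_neighbors=k)

# Train on the labeled data, then predict labels for the unlabeled test set.
clf.fit(x1, y1)
y2_pred = clf.predict(x2)

print("Predicted labels: ", y2_pred)  # Predicted labels:  [0 1]
```

There are no actual labels to print here, since y2 is what the program is asked
to predict.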

If you do not have scikit-learn, try running: “sudo pip3 install scikit-learn”

This project is intended to be very easy and to expose you to your first Machine
Learning application. You are encouraged to try different values for K, inspect
variables, and try other classifiers.
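
One possible way to try different values for K is sketched below. It uses 5-fold
cross-validation via cross_val_score, which is not part of this handout and is
swapped in here only as a convenient way to score each K without a labeled test
set. The K values tried are arbitrary, and the sketch uses the built-in iris data
rather than the project files.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Mean cross-validated accuracy for each candidate K (values are arbitrary).
scores_by_k = {}
for k in (1, 5, 15, 25):
    clf = KNeighborsClassifier(n_neighbors=k)
    # cv=5 splits the data into 5 folds and scores each held-out fold.
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)
    scores_by_k[k] = scores.mean()
    print(f"k={k:2d}  mean accuracy={scores_by_k[k]:.3f}")
```

The same loop could be run on x1 and y1 once they are loaded, since those are the
only labeled data available for the project.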

Submission: Submit two files to the iLearn submission link: your main Python program
and an output.txt file that contains your y2 answers. You can use Python functions or
copy and paste to create the output.txt file. 90 points for correctness, 10 points for
headers, documentation and clarity.
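
If you create output.txt with Python functions, one option is np.savetxt(). This is
a sketch under assumptions: the handout does not fix a file format, so one integer
label per line is a guess, and the array below is a made-up stand-in for the result
of clf.predict(x2). The file is written to a temporary directory here only to keep
the sketch self-contained.

```python
import os
import tempfile

import numpy as np

# Made-up predictions standing in for clf.predict(x2).
y2_pred = np.array([2, 0, 1, 1, 0])

# Assumption: one integer label per line; the handout does not fix a format.
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
np.savetxt(out_path, y2_pred, fmt="%d")

# Read the file back to confirm what was written.
with open(out_path) as f:
    written = f.read()
print(written)
```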
