Course Title: Big Data for Business Applications
Semester: Spring, 2019
Course Prefix, Number and Section: ISM 6562
Credit Hours: 3
Instructor: Kaushik Dutta
Meeting Location (delivery method): Lecture
Office: ISDS Department (CIS Building)
Meeting Time: Friday 1:00 PM – 4:45 PM
Email: [email protected]
Phone: 813-974-6338
Teaching Assistant: Chaitanya Sharma ([email protected])
Office Hours: Monday 10 AM – 12 Noon
Course Description:
With the advent of social media and the IoT (Internet of Things), the volume of data in organizations
has increased rapidly in recent years. As data volumes grow to terabytes and even
petabytes, traditional database and analytical techniques are no longer sufficient. In this course,
students will learn various big data technologies and how they can be used for data
management and data analytics on such massive datasets. The first half of the
course will focus on big data storage technologies such as NoSQL databases and distributed file
systems. The second half of the course will focus on big data computational platforms such as
Hadoop MapReduce and Spark, with in-depth coverage of Spark programming on a big data
platform.
Course Objectives:
Identify the various components of big data technologies;
Understand how big data technologies can be used;
Design innovative architectures and online applications;
Apply knowledge of big data technologies to design and scale applications.
Learning Outcomes:
Upon completion of this course students should be able to:
Design and manage large business applications with millions of users
Design analytic applications using several tens to hundreds of terabytes of data
Develop business applications running on top of several tens to hundreds of terabytes of data
Prerequisites:
Relational databases
Java or Python
Web application development
Willingness to work hard with the big data technology stack
Course Materials:
Students do not need to purchase any software or book for this class. However, the instructor
will direct students to various online materials, books, manuals, videos and software to
supplement the class lecture.
Books
1. Frank Kane's Taming Big Data with Apache Spark and Python – eBook freely available
from the library
2. Ullman – Mining of Massive Datasets – http://www.mmds.org/
Grading Scale and Basis for Grades:
Grades will be based on the following components:
Cassandra Quiz 5%
Assignment (Individual) 20%
eCommerce Project (Group) 25%
In Class Quizzes 10%
Final Exam 40%
Students will be given an “Incomplete” grade only in accordance with university policy, without
exception.
The final grading will be based roughly on the following scale –
95% and above A/A+
90–95% A-
80–90% B+
70–80% B
60–70% B-
Below 60% F
Please note that this grading scale is tentative. The instructor reserves the right to adjust
the grading scale as he deems appropriate based on the overall performance of the class.
BONUS POINTS FOR PARTICIPATING IN AN IRB APPROVED STUDY ON CLOUD
SECURITY (5%)
Name of the study:
SCARF: A Secured Cloud Filesystem (USF IRB #Pro00035037)
Short description of study:
The purpose of this study is to evaluate the new design of a secured cloud storage
system. The experiment will take approximately 15-20 minutes of your time.
You will receive 5 points of extra credit for your participation.
Contact Vivek Kumar Singh at [email protected] or call 813-580-9131.
If you participate by February 17, Vivek will assign the bonus points in
Canvas.
Attendance Policy:
Students are required to attend all classes; absences will result in a loss of points.
A student who is absent from more than 3 classes may fail the course regardless of his/her
performance in the other graded components.
Course Outline and Schedule:
Week 1
Discussion Topic: Introduction to Big Data Applications; Big Data Storage; Big Data Computation

Week 2
Discussion Topic: NoSQL Database Basics and Concepts; Various NoSQL Database Systems; Cassandra
Reading/Assignment:
Cassandra Overview – http://dl.acm.org/citation.cfm?id=1773922
Cassandra Tutorial – https://academy.datastax.com/courses/ds201-cassandra-core-concepts
NoSQL DB – http://arxiv.org/ftp/arxiv/papers/1307/1307.0191.pdf
DynamoDB – http://dl.acm.org/citation.cfm?id=1294281
Google File System – http://dl.acm.org/citation.cfm?id=945450
Big Table – http://dl.acm.org/citation.cfm?id=1365816

Week 3
Discussion Topic: Cassandra, MongoDB and HBase

Week 4
Discussion Topic: Distributed File Systems – HDFS, Cloudera Platform
Reading/Assignment: HDFS Tutorials – http://hortonworks.com/products/hortonworks-sandbox/

Week 5
Discussion Topic: Big Data Search System, Distributed Cache – Solr, ElasticSearch

Week 6
Discussion Topic: Distributed Computing – Map-Reduce, Hadoop Map-Reduce, MongoDB Map-Reduce, Hive, Kudu, Impala
Reading/Assignment:
Spark – http://static.usenix.org/legacy/events/hotcloud10/tech/full_papers/Zaharia.pdf
MapReduce – http://dl.acm.org/citation.cfm?id=1773922
Ullman – Chapter 2

Week 7
Discussion Topic: Spark Platform
Reading/Assignment: Spark – http://static.usenix.org/legacy/events/hotcloud10/tech/full_papers/Zaharia.pdf

Week 8
Discussion Topic: Spark Programming Basics – Resilient Distributed Datasets (RDDs), Jupyter on the Cloudera Platform
Reading/Assignment: Chapter 2 – Frank Kane's Taming Big Data

Week 9
Discussion Topic: Advanced Spark Programming
Reading/Assignment: Chapter 3 – Frank Kane's Taming Big Data

Week 10
Discussion Topic: Spark SQL – DataFrame, Dataset; Spark Machine Learning Libraries
Reading/Assignment: Chapters 5, 6 – Frank Kane's Taming Big Data

Week 11
Discussion Topic: Other Distributed Platforms – Messaging Platforms (Kafka, Kinesis), Asynchronous Application Platform (Node.JS)

Week 12
Discussion Topic: Final Exam – Spark Programming
Reading/Assignment: Final Exam Week
Group Project: Application Development or Data Analytics
The final project will be a group project, with each group consisting of 4 members. There are
two project options:
1. Each group is expected to create a fully functional, end-to-end eCommerce application
using available hardware and software resources. The development and implementation
can be done on a single workstation or on multiple workstations connected in a distributed
manner.
Required components for this application include (but are not limited to):
A suitable NoSQL database
A cache system
A CDN
A textual search system
2. Each group is expected to take a publicly available dataset and perform analysis, visualization,
and model building on it using Jupyter Notebook and HDFS on the Cloudera
platform.
COURSE POLICIES
Late Work Policy:
There are no make-up opportunities for in-class exams. Topical assignments turned in
late will be assessed a penalty of 20% for each day the assignment is late.
Assignments will not be accepted if late by more than 5 days.
Extra Credit Policy:
There are no opportunities for extra credit in this course. Students’ focus should be on
the primary work in the course.
Grades of "Incomplete":
An “I” grade may be awarded to a student when 1) arrangements are made prior to the
end of the semester, 2) in the judgment of the instructor a valid reason is offered for
granting an Incomplete, and 3) a clear path to a standard grade is agreed to by the
instructor and the student which will result in successful completion of course
requirements by the end of the succeeding semester. “I” grades not removed by the end of the next semester will be
changed to “IF”.
Email:
The primary means of communication between instructor and students between live
class meetings will be email. “Blast emails” will occasionally be sent by the instructor to
all students via Canvas. Students should feel free to email their instructor with questions at
any time. Please allow up to 24 hours for a response to email queries.
Canvas:
Canvas will be used in this course to disseminate materials, collect weekly assignments,
and return graded assignments. If you need help learning how to perform various tasks
related to this course or other courses being offered in Canvas, please consult the
Canvas help videos and guides. You may also contact USF's IT
department at (813) 974-1222 or [email protected].
Laptop Usage:
Laptop/Tablet usage is encouraged in this course given the nature of the material.
Classroom Recording:
Audio and/or video recordings of lectures are prohibited, as is the live streaming of
lectures or dissemination of lectures via conference calling technologies. The instructor will
provide recordings from time to time, as necessary and technically feasible.
Phone Usage:
Students are asked to place their mobile phones on “silent” and to step outside the
classroom to take any important calls.
Academic Integrity:
Academic integrity is the foundation of the University of South Florida System’s
commitment to the academic honesty and personal integrity of its university
community. Academic integrity is grounded in certain fundamental values, which
include honesty, respect, and fairness. Broadly defined, academic honesty is the
completion of all academic endeavors and claims of scholarly knowledge as
representative of one’s own efforts. The final decision on an academic integrity
violation and related academic sanction at any USF System institution shall affect and
be applied to the academic status of the student throughout the USF System, unless
otherwise determined by the independently accredited institution.
Disruption to Academic Process:
Disruptive students in the academic setting hinder the educational process. Disruption
of the academic process is defined as the act, words, or general conduct of a student
in a classroom or other academic environment which in the reasonable estimation of
the instructor: (a) directs attention away from the academic matters at hand, such as
noisy distractions, persistent, disrespectful or abusive interruption of lecture, exam,
academic discussion, or general University operations, or (b) presents a danger to the
health, safety, or well-being of self or other persons.
Student Academic Grievance Procedures:
The purpose of these procedures is to provide all undergraduate and graduate
students taking courses within the University of South Florida System an opportunity
for objective review of facts and events pertinent to the cause of the academic
grievance. An “academic grievance” is a claim that a specific academic decision or
action that affects that student’s academic record or status has violated published
policies and procedures, or has been applied to the grievant in a manner different
from that used for other students.
Disability Access:
Students with disabilities are responsible for registering with Students with Disabilities
Services (SDS) in order to receive academic accommodations. SDS encourages
students to notify instructors of accommodation needs at least 5 business days prior to
needing the accommodation. A letter from SDS must accompany this request.
Sexual Misconduct/Sexual Harassment Reporting:
USF is committed to providing an environment free from sex discrimination, including
sexual harassment and sexual violence (USF System Policy 0-004). The USF Center for
Victim Advocacy and Violence Prevention is a confidential resource where you can talk
about incidents of sexual harassment and gender-based crimes including sexual
assault, stalking, and domestic/relationship violence. This confidential resource can
help you without having to report your situation to either the Office of Student Rights
and Responsibilities (OSSR) or the Office of Diversity, Inclusion, and Equal Opportunity
(DIEO), unless you request that they make a report. Please be aware that in
compliance with Title IX and under the USF System Policy, educators must report
incidents of sexual harassment and gender-based crimes including sexual assault,
stalking, and domestic/relationship violence. If you disclose any of these situations in
class, in papers, or to me personally, I am required to report it to OSSR or DIEO for
investigation. Contact the USF Center for Victim Advocacy and Violence Prevention:
(813) 974-5757.
Attendance Policy:
Students are expected to exhibit professionalism through regular attendance and on-
time arrivals to class lectures.
Religious Observances:
All students have a right to expect that the University will reasonably accommodate
their religious observances, practices and beliefs. If you observe religious holidays, you
should plan your allowed absences to include those dates.