You're reading from Big Data Analytics with Hadoop 3: Build highly effective analytics solutions to gain valuable insight into your big data

Product type: Paperback

Published in: May 2018

Publisher: Packt

ISBN-13: 9781788628846

Length: 482 pages

Edition: 1st Edition

Languages: Python

Tools: Hadoop

Concepts: Big Data

Author (1): Sridhar Alla

Table of Contents (13) Chapters

Preface

1. Introduction to Hadoop

2. Overview of Big Data Analytics

3. Big Data Processing with MapReduce

4. Scientific Computing and Big Data Analysis with Python and Hadoop

5. Statistical Big Data Computing with R and Hadoop

6. Batch Analytics with Apache Spark

7. Real-Time Analytics with Apache Spark

8. Batch Analytics with Apache Flink

9. Stream Processing with Apache Flink

10. Visualizing Big Data

11. Introduction to Cloud Computing

12. Using Amazon Web Services

Schema – structure of data

A schema describes the structure of your data and can be either implicit or explicit. Because DataFrames are internally based on RDDs, there are two main ways to convert an existing RDD into a dataset:

Using reflection to infer the schema of the RDD
Using a programmatic interface that lets you construct a schema and apply it to an existing RDD, converting the RDD into a dataset with that schema
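
The two approaches listed above can be sketched roughly as follows. This is a minimal illustration rather than the book's own code: the `State` case class, the column names, the sample values, and the `RddSchemaSketch` object are all hypothetical, and it assumes a local Spark 2.x or later installation on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical record type, used only for this illustration.
case class State(name: String, population: Int)

object RddSchemaSketch {

  // Approach 1: reflection. The schema is derived from the case class fields.
  def byReflection(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val stateRDD = spark.sparkContext.parallelize(
      Seq(State("StateA", 100), State("StateB", 200))) // made-up sample values
    stateRDD.toDF()
  }

  // Approach 2: programmatic. Build a StructType and apply it to an RDD[Row].
  def byProgrammaticSchema(spark: SparkSession): DataFrame = {
    val rowRDD = spark.sparkContext.parallelize(
      Seq(Row("StateA", 100), Row("StateB", 200))) // made-up sample values
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("population", IntegerType, nullable = false)))
    spark.createDataFrame(rowRDD, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-schema-sketch")
      .master("local[*]") // assumption: run locally for the demo
      .getOrCreate()

    byReflection(spark).printSchema()
    byProgrammaticSchema(spark).printSchema()

    spark.stop()
  }
}
```

Reflection is the shorter route when the record type is known at compile time; the programmatic route is useful when column names and types are only known at runtime.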

Implicit schema

Let's look at an example of loading a comma-separated values (CSV) file into a DataFrame. When a text file contains a header, the read API can take the column names from the header line; column types are inferred from the data when schema inference is enabled. We can also specify the separator used to split each line of the file.

We read the CSV file, inferring the schema from the header line and using the comma (,) as the separator. We also use the schema and printSchema commands to verify the schema of the input file:

scala> val statesDF = spark.read.option...
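
Since the listing above is truncated, here is a separate, self-contained sketch of a header-based CSV read. It is not a reconstruction of the author's command: the file contents, the column names, and the `ImplicitSchemaSketch` object are hypothetical, and the options shown (`header`, `inferSchema`, `sep`) are simply the standard CSV reader options that match the description.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ImplicitSchemaSketch {

  // Read a CSV file, taking column names from the header line and
  // letting Spark infer column types by scanning the data.
  def readWithHeader(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // infer column types from the values
      .option("sep", ",")            // field separator (comma is the default)
      .csv(path)

  // Write a tiny, made-up CSV file so the example needs no external data.
  def writeSampleCsv(): String = {
    val tmp = Files.createTempFile("states-sample", ".csv")
    val content = "State,Year,Population\nStateA,2010,100\nStateB,2010,200\n"
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
    tmp.toUri.toString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("implicit-schema-sketch")
      .master("local[*]") // assumption: run locally for the demo
      .getOrCreate()

    val statesDF = readWithHeader(spark, writeSampleCsv())
    println(statesDF.schema) // the schema as a StructType value
    statesDF.printSchema()   // the same schema printed as a tree

    spark.stop()
  }
}
```

Without `inferSchema`, the column names would still come from the header, but every column would be read as a string.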
