Turn Big Data into Big Value
ABSTRACT
Some of today’s most successful companies achieve game-changing business advantages by capturing, analyzing, and
acting upon vast amounts of diverse, fast-moving “big data.” This paper describes three usage models that can help you
implement a flexible and efficient big data infrastructure to realize competitive advantages in your own business. It also
describes Intel innovations in silicon, systems, and software that can help you to deploy these and other big data solutions
with optimal performance, cost, and energy efficiency.
[Figure 1 chart: semi/unstructured data volumes grow roughly 400 percent between 2006 and 2020.]
Figure 1. Current and forecasted growth of big data. Source: Philippe Botteri of Accel Partners, Feb. 2013.
White Paper: Turn Big Data into Big Value
While a tsunami is destructive, big data holds tremendous potential value. With the right tools and strategies, businesses can extract insights that deliver game-changing competitive advantages. A number of public and private organizations do that today.

• Retailers analyze social media trends in real time to offer the hottest products to the most likely buyers, and they do this at volumes and with levels of granularity that have never before been possible.

Because the value of big data stretches across vast amounts of complex, fast-moving content, deriving meaningful insights often requires extensive mining and deep analysis that go beyond traditional Business Intelligence (BI) queries and reports. Machine learning, statistical modeling, graph algorithms, and other emerging techniques can unveil valuable, actionable insights that deliver significant competitive advantages.
Usage Model 1—ETL using Apache Hadoop*

Like traditional data, big data must be extracted from external sources, transformed into structures that fit operational needs, and loaded into a database for storage and management. Traditional ETL solutions cannot handle the demands of poly-structured data, so Hadoop software has emerged as the de facto platform for addressing this need (Figure 2).

The distributed storage and processing environment of a Hadoop cluster works well for big data ETL. Hadoop breaks up incoming streams into pieces and applies simple operations in parallel to rapidly process large amounts of data. It supports all data types and can operate across tens, hundreds, or even thousands of servers to provide massive scalability. The Hadoop Distributed File System (HDFS) stores the results on low-cost storage devices directly attached to each server in the cluster—ready for immediate uploading to the enterprise data warehouse or unstructured data stores.

Hadoop can process poly-structured data for analysis, even when that data is not predefined. In other words, Hadoop supports a Schema on Read model as opposed to the Schema on Write model used in traditional ETL processes. This enables Hadoop to load large amounts of data in a short time, making that data quickly available for analysis, visualization, and other uses.

Infrastructure Considerations

Dual-socket servers based on the Intel® Xeon® processor E5 family provide an optimal balance of capability versus cost for most Hadoop deployments. These servers offer more cores, cache, and memory capacity than previous-generation servers. They also provide up to twice the I/O bandwidth with 30 percent lower I/O latency.1 These resources sustain high throughput for larger numbers of data-intensive tasks executing in parallel.

Lightweight, I/O-bound workloads, such as simple data sorting operations, may not require the full processing power of the Intel Xeon processor E5 family. Such workloads run economically on high-density, low-power servers based on the Intel® Xeon® processor E3 family or the Intel® Atom™ processor-based System on a Chip (Intel Atom SoC). With power envelopes as low as 6 watts, the 64-bit x86-based Intel Atom SoC provides unprecedented density and energy efficiency in a server-class processor.
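The Schema on Read approach can be illustrated with a small sketch in plain Python (not Hadoop itself): records are stored exactly as they arrive, and a schema is imposed only when a query reads them. The record formats and field names below are invented for illustration.

```python
import json

# Store raw, poly-structured records as-is. A Schema on Write system
# would reshape or reject them at load time; Schema on Read does not.
raw_store = [
    '{"user": "alice", "clicks": 3}',               # JSON event
    'bob,7',                                         # CSV event
    '{"user": "carol", "clicks": 5, "geo": "US"}',   # JSON with extra field
]

def read_clicks(record):
    """Impose a (user, clicks) schema at read time, per record format."""
    if record.lstrip().startswith("{"):
        d = json.loads(record)
        return d["user"], int(d["clicks"])
    user, clicks = record.split(",")
    return user, int(clicks)

total = sum(clicks for _, clicks in map(read_clicks, raw_store))
print(total)  # → 15
```

Because no structure is enforced at load time, new record formats can be ingested immediately; only the read-time parser needs to change.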
[Figure 2 diagram: data sources (CRM, ERP, web site traffic, social media, sensor logs) feed an ETL offload layer built on Flume, Sqoop, Pig/MapReduce, and HDFS, which extracts, transforms, and loads data into the data warehouse for OLAP analysis, data mining, reporting, and data science.]
Figure 2. Using Apache Hadoop,* organizations can ingest, process, and export massive amounts of diverse data at scale.
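The "simple operations in parallel" that Hadoop applies can be sketched as classic MapReduce word counting, shown here in plain Python running in a single process; on a real cluster, the map and reduce phases would run in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain

# Word count in the MapReduce style used by Hadoop: a mapper emits
# (word, 1) pairs, a shuffle groups pairs by key, and a reducer sums.
def mapper(line):
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    return key, sum(values)

lines = ["big data big value", "turn big data"]
grouped = shuffle(chain.from_iterable(mapper(l) for l in lines))
counts = dict(reducer(k, v) for k, v in grouped.items())
print(counts)  # → {'big': 3, 'data': 2, 'value': 1, 'turn': 1}
```

Each phase touches only its own slice of the data, which is what lets Hadoop scale the same logic across thousands of servers.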
All servers in a Hadoop cluster require substantial memory and a relatively large number of storage drives to meet the demands of data-intensive Hadoop workloads. Sufficient memory is required to support high throughput for the many operations performed in parallel. Multiple storage drives (two or more per core) deliver the aggregate I/O throughput needed to avoid storage bottlenecks. Storage performance improves considerably with at least one Intel® Solid State Drive (Intel® SSD) in each server node.

By processing data near where it is stored, Hadoop greatly reduces the need for high-volume data movement. Nevertheless, fast data import and export requires sufficient network bandwidth. In most cases, each rack of servers should use a 10 Gigabit Ethernet (10 GbE) switch, and each rack-level switch should connect to a 40 GbE cluster-level switch. As data volumes, workloads, and clusters grow, it may be necessary to interconnect multiple cluster-level switches or even to uplink to another level of switching infrastructure.

For more detailed information, see the Intel white paper, "Extract, Transform & Load (ETL) Big Data with Apache Hadoop*," posted in the Intel Developer Zone at software.intel.com.

Usage Model 2—Interactive Queries

Businesses looking to implement a powerful and cost-effective big data platform should consider combining a large-scale SQL data warehouse with a Hadoop cluster. The cluster can quickly ingest and process large, diverse, and fast-moving data streams. Appropriate data sets can then be loaded into the data warehouse for ad hoc SQL queries, analysis, and reports. Users also can query multi-structured data sets that reside in the Hadoop cluster using software such as Apache HBase,* Spark,* Shark,* SAP HANA,* Apache Cassandra,* MongoDB,* Tao,* Neo4J,* Apache Drill,* or Impala.* This hybrid strategy offers a foundation for faster, deeper insights than either solution alone can achieve.

Similar processes apply whether you use a traditional data warehouse or a more modern system designed for larger volumes and faster data streams: gather data from external sources, then cleanse and format the data to fit into the warehouse data model. This can be done prior to loading the data into the warehouse, or it can be done on the fly as streaming data sources are fed into the warehouse.

With the data loaded, analysis can begin. Modern data warehouses support ad hoc queries, enabling on-demand access to data with any meaningful combination of values. This contrasts with more traditional data warehouses that generate only pre-defined reports based on known relationships.
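The ad hoc query model can be sketched with an in-memory SQLite database standing in for the warehouse; the table and column names below are illustrative only, not any particular warehouse schema.

```python
import sqlite3

# In-memory stand-in for a warehouse fact table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EMEA", "widget", 120.0), ("EMEA", "gadget", 80.0),
     ("APAC", "widget", 200.0)],
)

# Ad hoc query: any meaningful combination of values, composed on
# demand rather than baked into a pre-defined report.
rows = con.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE product = ? GROUP BY region ORDER BY region",
    ("widget",),
).fetchall()
print(rows)  # → [('APAC', 200.0), ('EMEA', 120.0)]
```

A pre-defined report would ship only fixed queries like this one; an ad hoc system lets analysts vary the filter, grouping, and aggregation freely at query time.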
Self-healing features proactively and reactively repair known errors and also reduce the likelihood of future errors by acting automatically based on configurable error thresholds. Intel works extensively with hardware, operating system, virtual machine monitor (VMM), and application vendors to help ensure tight integration throughout the hardware and software stack.

As data volumes skyrocket, new strategies help scale data storage capacity more efficiently and cost-effectively, both within and beyond the data warehouse. The following strategies can work together to meet diverse needs at lower total cost.

• Scale-out storage architectures deliver affordable high capacity and support federation across private and hybrid clouds. These solutions scale dynamically, and you can provision them faster than traditional storage systems. They also help to improve data management efficiency.

• Low-latency, proximity storage is a good fit for data-intensive applications that perform better when co-located with the data storage devices. Examples include business processes, decision support analyses, and high-performance computing workloads, as well as collaborative processes, applications, and web infrastructure running on virtualized servers.

• Centralized storage aggregated as logical pools in storage area networks (SANs) supports high-performance business databases. When optimized for affordable capacity rather than high performance, centralized solutions provide efficient storage for backup, archive, and object store requirements.

Higher storage efficiency can help to contain costs in the face of rapid data growth. Many storage vendors integrate Intel Xeon processors into their storage solutions to support advanced data management functions that help to improve efficiency. According to IDC's June 2013 Worldwide Storage and Virtualized x86 Environments 2013–2017 Forecast, about 80 percent of worldwide, enterprise-class storage solutions for corporations, cloud, and HPC run on Intel architecture. Look for storage platforms that support data-efficiency technologies, including:

• Intelligent tiering to optimize performance versus cost by automatically moving "hot" data to faster storage devices and "cold" data to higher-capacity, lower-cost drives. With this approach, a small number of high-speed drives, such as Intel® SSD 710 Series SATA drives, can deliver substantial performance improvements at relatively low cost.

Loading data sets into data warehouses quickly and efficiently enables analytics applications to provide business insights in a timely manner. Efficient ETL processing is one component of the solution. Another is a fast and efficient network to drive the growing business value of analytics throughout the enterprise. Intel® Ethernet products integrate technologies to address these requirements.

• Near-native performance in virtualized environments. Virtualization improves infrastructure flexibility and utilization—important for containing costs as big data solutions grow. Intel® Virtualization Technology for connectivity (Intel® VT-c) helps to reduce I/O bottlenecks and improve overall server performance in virtualized environments. Its Virtual Machine Device Queues (VMDQ) technology offloads traffic sorting and routing to dedicated silicon in the network adapter. Its PCI-SIG Single Root I/O Virtualization (SR-IOV) technology allows a single Intel® Ethernet Server Adapter port to support multiple, isolated connections to virtual machines.

• Unified 10 GbE networking. Consolidating data center traffic onto a single, high-bandwidth network helps to reduce cost and complexity and provides the performance and scalability needed to address rapidly growing needs. Intel Ethernet Converged Network Adapters support Fibre Channel over Ethernet (FCoE) and iSCSI to simplify implementation and reduce costs when consolidating local area network (LAN) and storage area network (SAN) traffic.

• Simpler, faster connections to iSCSI SANs. Intel Ethernet Converged Network Adapters and Intel Ethernet Server Adapters provide hardware-based iSCSI acceleration to improve performance. They also take advantage of native iSCSI initiators integrated into leading operating systems to simplify iSCSI deployment and configuration in both native and virtualized networks.

For more detailed information, see the Intel SQL Data Warehousing Usage Model white paper.
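The intelligent tiering idea described above (hot data on fast drives, cold data on capacity drives) can be sketched as a simple placement policy. The threshold and tier names below are invented for illustration, not any vendor's actual policy.

```python
# Toy tiering policy: objects accessed at least HOT_THRESHOLD times in
# the observation window are promoted to the fast (SSD) tier; the rest
# stay on high-capacity, lower-cost drives (HDD).
HOT_THRESHOLD = 10

def assign_tier(access_counts):
    """Map object name -> 'ssd' or 'hdd' based on recent access count."""
    return {
        name: ("ssd" if count >= HOT_THRESHOLD else "hdd")
        for name, count in access_counts.items()
    }

placement = assign_tier({"orders.db": 42, "archive-2011.tar": 1})
print(placement)  # → {'orders.db': 'ssd', 'archive-2011.tar': 'hdd'}
```

Real tiering engines also weigh recency, object size, and migration cost, but the core decision is this kind of per-object classification re-run on a schedule.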
Usage Model 3—Predictive Analytics on the Hadoop Platform

Predictive analytics extracts higher value from data by capturing relationships from past events and using them to predict future outcomes (Figure 3). Retailers use predictive analytics to deliver more compelling offers to individual customers, healthcare organizations use it to select best-fit treatment protocols, and financial services organizations use it to increase investment returns and reduce risk.

Although predictive analytics can aid in strategic business planning, its greatest value may come from tactical guidance at the point of decision and operational guidance at the point of execution. Centralized teams of data scientists, database administrators, and software developers work together to provide customized solutions for the most critical business operations. As businesses integrate this capability more widely into their operations, they must provide optimized decision tools for a wider range of users and automated systems.

Predictive analysis falls into two main categories: regression and machine learning.

• Regression techniques compare current data with historical models to forecast the most probable outcome.

• Machine learning uses artificial intelligence with little or no human intervention. The system analyzes a representational data set to extract relationships, and it generalizes from that to make predictions based on new data. Optical character recognition (OCR) is a classic example, but new applications exploit big data across a wide range of scenarios.

Intel IT began its own trailblazing big data analytics effort in 2010 and recommends combining the two usage models already discussed in this paper to create a hybrid analytics infrastructure (Figure 4).

1. Deploy a data warehouse appliance based on an MPP architecture to perform complex predictive analytics quickly on large data sets. A number of vendors have incorporated the Intel Xeon processor E7 family into blade-based appliances that deliver the required performance at relatively low cost. These systems fit into existing enterprise BI solutions and provide integrated support for advanced analytics tools and applications, such as R, an open-source statistical computing language that is popular among data scientists.

2. Add a Hadoop cluster for fast, scalable, and affordable ETL for the data warehouse. Hadoop also runs other data processing and analytics functions that perform well in a distributed processing environment. The Hadoop ecosystem offers a growing variety of tools and components to address these needs.

Infrastructure Considerations

To provide maximum flexibility, the data warehouse and the Hadoop cluster should use a high-speed data loader and link together using 10 GbE or another high-bandwidth networking technology. This allows you to move data quickly between the two environments, so you can use the most effective analytics techniques based on specific data types, workloads, and business needs.
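As a concrete illustration of the regression category, here is a minimal ordinary least-squares fit and forecast in plain Python; the data points are invented, and production systems would use a statistics library rather than hand-rolled math.

```python
# Fit y = slope*x + intercept by ordinary least squares over historical
# observations, then forecast the outcome at the next point.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2x, with noise

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

forecast = slope * 5.0 + intercept  # predict the outcome at x = 5
print(round(forecast, 2))
```

The "historical model" here is just the fitted line; richer regression techniques add more predictors and non-linear terms, but the forecast step is the same.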
[Figure 3 chart: descriptive analytics asks "What happened?" (hindsight), diagnostic analytics asks "Why did it happen?" (insight), and predictive analytics provides foresight; business value and difficulty both increase along the path from information to optimization.]
Figure 3. According to Gartner, the difficulty and business value of analytics both increase as the focus moves from hindsight to foresight.
Figure 4. Intel IT’s big data platform provides a flexible foundation for analytics—including predictive analytics—by using a high-speed data loader to connect a
massively parallel processing (MPP) data warehouse appliance with clusters of industry-standard servers running Hadoop software.
Creating a Better Foundation for Big Data Analytics

As big data technologies and solutions advance, Intel products and technologies help speed up innovation throughout the ecosystem. By working with hardware, software, and service providers to ensure broad support, Intel helps businesses integrate these new capabilities more simply and affordably on a standards-based, connected, managed, and secure architecture.

Processor Advances for Performance and Security

Intel processor advances deliver increasing performance and value for next-generation big data solutions. Ongoing improvements in per-thread performance, parallel execution, I/O throughput, memory capacity, and energy efficiency help businesses address rapidly growing needs using affordable, mainstream computing systems.

Intel also integrates advanced security technologies that protect data more effectively, so you can integrate sensitive data into your big data analytics environment. Current security technologies in Intel Xeon processors provide the following advantages.

• Strong workload isolation on trusted infrastructure. Intel® Trusted Execution Technology (Intel® TXT) and Intel® Virtualization Technology (Intel® VT) help to protect systems and software more effectively in virtualized and cloud environments. Intel VT provides silicon-assisted workload isolation. Intel TXT can establish trusted infrastructure pools by ensuring that Intel® Xeon® processor-based servers boot only into "known good states."

• Fast, low-overhead data encryption. Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI) provides hardware acceleration for encryption to protect data in latency-sensitive analytics environments without sacrificing performance. Intel performance tests show that Intel AES-NI can accelerate encryption performance in a Hadoop cluster by up to 5.3x and decryption performance by up to 19.8x when used in combination with the Intel Distribution for Apache Hadoop software (Intel Distribution).2 Intel Xeon processors and the upcoming Intel Atom SoC support Intel AES-NI.

New Tools and Optimized Software

Intel works both independently and in collaboration with leading software vendors and the open-source community to provide optimized software stacks and services for big data analytics. These efforts help to deliver new and advanced functionality throughout the big data ecosystem. They also help to ensure the best possible performance for big data applications running on Intel architecture. Intel also delivers software products that help address some of the most critical needs within the big data ecosystem.

• Performance benchmarking for Hadoop clusters and applications. The Intel® HiBench suite includes 10 benchmarks that IT organizations and software vendors use to measure performance for specific, common tasks, such as sorting and word counting, and for more comprehensive real-world functions, such as web searching, machine learning, and data analytics. Intel engineers use the Intel HiBench suite to help with upstream Hadoop optimizations for Intel architecture as well as with Java* optimizations for Hadoop.
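A benchmark of the kind HiBench automates can be reduced to a tiny sketch: generate a representative workload, time the task, and report throughput. This toy harness runs an in-memory sort in plain Python; it is not part of the actual Intel HiBench suite, which runs Java workloads on a Hadoop cluster.

```python
import random
import time

# Micro-benchmark in the spirit of a sort benchmark: time how long it
# takes to sort a batch of random records. Seeded for repeatable data.
def time_sort(n_records, seed=0):
    rng = random.Random(seed)
    records = [rng.random() for _ in range(n_records)]
    start = time.perf_counter()
    records.sort()
    elapsed = time.perf_counter() - start
    return elapsed, records

elapsed, records = time_sort(100_000)
print(f"sorted {len(records)} records in {elapsed:.4f} s")
```

Real benchmark suites add warm-up runs, multiple trials, and cluster-wide coordination, but the measure-a-representative-task core is the same.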
To learn about Intel IT strategies and best practices for implementing big data analytics, read the Intel IT white paper, “Mining Big Data in
the Enterprise for Better Business Intelligence.”
1. The claim of up to 32% reduction in I/O latency is based on Intel internal measurements of the average time for an I/O device read to local system memory under idle conditions for the Intel® Xeon® processor E5-2600 product family versus the Intel® Xeon® processor 5600 series. 8 GT/s and 128b/130b encoding in the PCIe 3.0 specification enable double the interconnect bandwidth over the PCIe 2.0 specification. For more information, read the PCI-SIG* press release, "PCI-SIG releases PCI Express 3.0 Specification."
2. For details, see the Intel solution brief, "Fast, Low-Overhead Encryption for Apache Hadoop*." Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Copyright © 2013 Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Xeon, and Intel Atom are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Printed in USA 0713/DF/HBD/PDF Please Recycle 329261-001US