1 The SDIL Smart Data Testbed
Authors: Prof. Dr. Michael Beigl, Prof. Dr. Bernhard Neumair, Till Riedel, Nico
Schlitter, KIT/Smart Data Innovation Lab
The Smart Data Innovation Lab (SDIL) offers big data researchers unique access
to a large variety of big data and in-memory technologies. Industry and science
collaborate closely in order to find hidden value in big data and generate smart data.
Projects focus on the strategic research areas of Industrie 4.0, Energy, Smart
Cities and Medicine.
SDIL bridges the gap between cutting-edge research and industrial big data
applications. The main goal of the SDIL is to accelerate innovation cycles using
smart data. In order to close today's gap between academic research
and industry problems through a data-driven innovation cycle, the SDIL provides
extensive support to all collaborative research projects free of charge.
Figure 1: The SDIL Innovation Cycle
1.1 Platform
The hardware and software provided by the SDIL platform enable
researchers to perform their analytics on unique, state-of-the-art systems
without, for example, acquiring separate licenses or dealing with complicated
cost structures. Industrial data providers, in turn, get the chance to analyze
their data together with an academic partner in a fully secured on-premise
environment.
Figure 2: The SDIL Platform
1.1.1 SAP HANA
SAP HANA is a platform that allows customers to explore and
analyze large volumes of data in real time, create flexible analytic models,
and develop and deploy real-time applications. The SAP HANA in-memory
appliance is available on the SDIL Platform.
In addition, we installed the Application Function Library (AFL) on the HANA
instances. The AFL is a collection of pre-delivered, commonly used
business, predictive and other algorithms for use in projects or
solutions that run on SAP HANA. These algorithms can be leveraged directly
in development projects, speeding them up by avoiding the need to write
complex custom algorithms. AFL operations also offer very fast performance,
as AFL functions run in the core of the SAP HANA in-memory database. The AFL
package includes:
The Predictive Analysis Library (PAL) is a set of functions in the AFL.
It contains pre-built, parameter-driven, commonly used algorithms
primarily related to predictive analysis and data mining, and supports
multiple algorithms, e.g. K-Means, Association Analysis, C4.5
Decision Trees, Multiple Linear Regression, and Exponential Smoothing.
Please refer to the official SAP HANA PAL user guide for further
information (SAP HANA PAL Library Documentation).
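On the platform, PAL functions are invoked through SQL procedures and configured via parameter tables; purely to illustrate the kind of parameter-driven algorithm PAL ships, here is a minimal stand-alone K-Means sketch in plain Python (all names and values are our own illustration, not PAL's API):

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Plain K-Means sketch: assign each point to the nearest centre,
    then move every centre to the mean of its assigned points."""
    centres = [list(p) for p in points[:k]]  # first k points as start centres
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centre.
        labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        # Update step: each centre becomes the mean of its member points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centres[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centres

# Two well-separated point clouds; the algorithm should recover them as clusters.
pts = [(0.0, 0.0)] * 20 + [(5.0, 5.0)] * 20
labels, centres = kmeans(pts, k=2)
```

In PAL, the corresponding settings (number of clusters, maximum iterations, distance measure) are passed in a parameter table rather than as function arguments.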
The Business Function Library (BFL) is a set of functions in the AFL. It
contains pre-built, parameter-driven, commonly used algorithms and is
primarily related to the analysis of financial data. Please refer to the
official SAP HANA BFL user guide for further information (SAP HANA
BFL Library Documentation).
System: SAP HANA
Cores: 320 (4 servers with 80 cores each)
RAM: 4 TB (1 TB per server)
Disk space: 80 TB (20 TB per server)
Network: 10 Gbit/s Ethernet
Software:
SAP HANA Database System
Predictive Analysis Library
Business Function Library
Figure 3: Hardware and software configuration for the SAP HANA System
1.1.2 Terracotta BigMemory Max
Terracotta BigMemory Max is an in-memory data management platform for
real-time big data applications, developed by Software AG. It supports a
distributed in-memory data-storage topology that enables the sharing of
data among multiple caches and in-memory data stores across multiple JVMs.
BigMemory Max uses a Terracotta Server Array to manage data that is
shared by multiple application nodes in a cluster. Furthermore, the use of off-
heap memory enables Java applications to leverage virtually all the available
RAM for in-memory data storage without causing garbage collection pauses.
The BigMemory Max kit is installed and available on the SDIL Platform. A
single active Terracotta Server is configured and running on this
machine. The server manages Terracotta clients, coordinates shared objects
and persists data. Terracotta clients run on the application servers along with
the applications being clustered by Terracotta. The data is held on the remote
server, with a subset of recently used data held in each application node.
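This topology, with the full data set on the server and a recently used subset in each application node, can be sketched as follows (in Python for brevity; BigMemory Max itself is used from Java via the Ehcache API, and all class names below are our own illustration):

```python
from collections import OrderedDict

class RemoteStore:
    """Stands in for the Terracotta Server Array: holds the full data set."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data[key]

class ClientNode:
    """An application node: keeps an LRU subset of recently used entries."""
    def __init__(self, remote, capacity=2):
        self.remote = remote
        self.capacity = capacity
        self._local = OrderedDict()  # local "near cache" of recent entries
    def get(self, key):
        if key in self._local:
            self._local.move_to_end(key)      # local hit: refresh recency
            return self._local[key]
        value = self.remote.get(key)          # miss: fetch from the server
        self._local[key] = value
        if len(self._local) > self.capacity:
            self._local.popitem(last=False)   # evict least recently used
        return value
    def put(self, key, value):
        self.remote.put(key, value)           # writes go to the shared server
        self._local[key] = value
        self._local.move_to_end(key)
        if len(self._local) > self.capacity:
            self._local.popitem(last=False)

shared = RemoteStore()
node_a, node_b = ClientNode(shared), ClientNode(shared)
node_a.put("sensor-42", 17.5)    # write lands on the shared server
value = node_b.get("sensor-42")  # another node sees it via the server
```

A real Terracotta cluster additionally keeps the local subsets coherent across nodes; this sketch only shows the read-through and LRU-eviction behaviour.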
1.1.3 IBM Open Platform with Hadoop and Spark
The SDIL Platform is running a Hadoop cluster with Spark that can be used
to perform analytics following the map-reduce paradigm.
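As an illustration of the map-reduce paradigm, the classic word count can be sketched in plain Python (on the cluster itself one would use the Spark or Hadoop APIs instead; all function names here are our own):

```python
def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # Reduce: sum the counts for every distinct word.
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]  # map step
    return reduce_phase(mapped)                                    # shuffle + reduce

counts = word_count(["smart data needs big data", "big data"])
```

In Spark the same pipeline would be expressed with flatMap over the lines and reduceByKey over the pairs, distributed across the cluster.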
IBM SPSS Modeler
In addition, we provide specialized tools that build upon Hadoop for further
analytics. IBM SPSS Modeler is a data mining and text
analytics software application. It provides a range of advanced algorithms
and techniques, including text and entity analytics, decision
management, and optimization, in order to build predictive models and
conduct a wide range of data analysis tasks.
([Link]
0/en/modelerusersguide_book.pdf)
IBM SPSS Analytic Server
In order to start an analysis stream on the IBM SPSS Modeler Server, one
first needs to import the data. The IBM SPSS Modeler Server provides a number
of ways to transfer data into the analytic streams: via files (CSV, JSON, XML
and other common formats), via a DB2 database server, or via the SPSS
Analytic Server.
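For the file-based route, the data only has to be written in one of the supported formats; a minimal sketch producing a CSV file ready for import (the file name and columns are an invented example):

```python
import csv

# Invented example rows; real projects would export their own measurements.
rows = [
    {"sensor_id": 1, "timestamp": "2016-01-01T00:00:00", "value": 21.5},
    {"sensor_id": 2, "timestamp": "2016-01-01T00:00:00", "value": 19.8},
]

with open("measurements.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["sensor_id", "timestamp", "value"])
    writer.writeheader()     # header row with the column names
    writer.writerows(rows)   # one data row per measurement
```

The resulting file can then be selected as a source node in a Modeler stream.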
System: IBM Watson Foundation Power 8
Cores: 140 (7 servers with 20 cores each)
RAM: 4 TB
Disk space: 300 TB
Network: 40 Gbit/s Ethernet
Software:
IBM Open Platform with Hadoop/Spark
SPSS Modeler
SPSS Analytic Server
DB2 with BLU Acceleration
Figure 4: Hardware and software configuration for the IBM Watson System
1.1.4 Virtualization and Resource Allocation
HTCondor
In order to use the SDIL resources efficiently and to avoid interference
between users, we make use of the HTCondor batch system. A program runs,
consumes memory (RAM) and CPU while doing so, and returns once it is
finished. If many users run many programs at once, the total available memory
might not be sufficient, and a program or even the whole compute server might
crash. A batch system takes care of resource management, guarantees that
users get exclusive access to the requested resources, and thereby avoids
system overload and crashes. To this end, users specify which computing task
they would like to perform and what resources this task requires. This
so-called job is submitted to the batch system, which executes it as soon as
the requested resources become available. Users can get an overview of their
submitted and running jobs via an API. Additionally, users can be informed
via email when their job is finished.
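A job is described in a small submit file that names the executable and the requested resources, for example (all file names and sizes below are invented example values; the keywords are HTCondor's submit-file syntax):

```
# example.sub -- submit with: condor_submit example.sub
executable     = analyze.py
arguments      = input.csv
request_cpus   = 4
request_memory = 16GB
log    = job.log
output = job.out
error  = job.err
notification = Complete
notify_user  = user@example.org
queue
```

condor_q then lists the user's submitted and running jobs, and the notification settings trigger the email once the job completes.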
1.2 Communities
SDIL provides access to experts and domain-specific skills within Data
Innovation Communities, fostering the exchange of project results. These
communities further provide opportunities for open innovation and bilateral
matchmaking between industrial partners and academic institutions.
1.2.1 Data Innovation Community “Industrie 4.0”
Industrie 4.0 is a powerful driver of large data growth, and directly connected
with the “Internet of Things”. Through the Web, real and virtual worlds grow
together to form the Internet of Things. In production, machines as well as
production lines and warehousing systems are increasingly capable of
exchanging information on their own, triggering actions and controlling each
other. The aim is to significantly improve processes in the areas of
development and construction, manufacturing and service. This fourth
industrial revolution represents the linking of industrial manufacturing and
information technology – creating a new level of efficiency and effectiveness.
Industrie 4.0 creates new information spaces linking ERP systems,
databases, the Internet and real-time information from production facilities,
supply chains and products.
The Data Innovation Community “Industrie 4.0” wants to explore important
data-driven aspects of the fourth industrial revolution, such as proactive
service and maintenance of production resources or finding anomalies in
production processes.
The Data Innovation Community “Industrie 4.0” addresses all companies and
research institutions interested in conducting joint research with regard to
these aspects. This includes user companies as well as companies from the
automation and IT industries.
1.2.2 Data Innovation Community “Energy”
The energy industry is facing fundamental changes. The move towards
renewable energies; the EU stipulation to install smart meters; the
development of new, customer-centred business models: all these changes
combine to form entirely new challenges for the IT infrastructure of the energy
industry. By analysing comprehensive data, both structured and
unstructured, e.g. data generated by mobile device apps, web portals or
social media, utility companies will be able to optimise their business
processes and develop new business models. A case in point: Big Data
analyses enable better consumption forecasts so that energy providers will
be able to better manage and control their energy purchases on the energy
markets. Thanks to Big Data, consumption rate models can be better tailored
towards specific user groups, and unhappy customers can be identified more
quickly – allowing for measures aimed at ensuring higher customer retention.
The Data Innovation Community “Energy” wants to explore important data-
driven aspects in the area of energy, such as the demand-driven fine-tuning
of consumption rate models based on smart meter generated data.
The Data Innovation Community “Energy” addresses all companies and
research institutions interested in conducting joint research with regard to
these aspects. This includes energy industry user companies as well as
companies from the automation and IT industries.
1.2.3 Data Innovation Community “Smart Cities”
Urban development and traffic management are also areas where Big Data
analyses open up entirely new possibilities. By means of integrated transport
communication solutions and intelligent traffic management systems, traffic in
fast-growing, densely populated urban areas can be managed better. In
cities, immense masses of data are generated by subway trains, busses,
taxis and traffic cameras, to name just a few. The existing IT environment
hardly allows for forecasts, let alone for extended data analyses that play
through different traffic and transport scenarios. Yet that is the only
way to improve the respective services and further urban planning. Once
information can be analysed in real-time, correctly interpreted and put into
context with historical data, then traffic jams and dangerous situations can be
identified at an early stage, leading to a significant decrease in traffic volume,
emissions and driving time.
The Data Innovation Community “Smart Cities” wants to explore important
data-driven aspects of urban life, such as traffic control, but also waste
disposal or disaster control.
The Data Innovation Community “Smart Cities” addresses all companies and
research institutions interested in conducting joint research with regard to
these aspects, but also public bodies. This includes user companies as well
as companies from the automation and IT industries.
1.2.4 Data Innovation Community “Personalised Medicine”
Modern medicine, too, generates increasingly large data quantities.
Reasons for this are higher-resolution data from state-of-the-art diagnostic
methods like magnetic resonance imaging (MRI), IT-controlled medical
technology, comprehensive medical documentation and ever more
detailed knowledge about the human genome. A case in point: personalised
cancer therapy. There, increasing use of software aims at taking terabytes of
clinical, molecular and medication data in diverse formats and
distilling from them effective treatment options for each individual patient in
real time, in order to significantly improve treatment results.
Within the Data Innovation Community “Personalised Medicine”, important
data-driven aspects of personalised medicine are to be explored, such as the
need-driven care of patients, IT controlled medical technology or even web-
based patient care.
The Data Innovation Community “Personalised Medicine” addresses all
companies and research institutions interested in conducting joint research
with regard to these aspects. This includes industry user companies and
clinics but also companies from the automation and IT industries.
1.4 Legal, security and curation as cross-cutting activities
Template agreements and processes ensure fast project initiation with
maximum legal security, tailored to the common technological platform. A
standardized process allows anyone to set up a new collaborative project at
SDIL within two weeks.
Once a project has successfully registered for the SDIL service, its partners
are allowed to upload and work with their data on the SDIL Platform. Data
providers can upload their data using the SFTP or SCP protocols. All users
get a dedicated private home directory for their files. For projects involving
multiple users, a project directory is available that is accessible only to the
project members.
The SDIL platform is protected by several layers of firewalls. Access to the
platform is only possible via dedicated login machines and only for users
who were approved beforehand in our identity management system. The
hardware itself is operated in a segregated server room with dedicated
access control. All data processing takes place in compliance with
German data protection rules and regulations. Data sources are only
accessible if such access was expressly granted by the data provider in
advance. To protect against data loss, we perform frequent encrypted backups
to our tape library. All data is deleted from the platform after the project
has finished.
The SDIL guarantees a sustainable investment to all partners by curating
industrial data sources, best practices and code artifacts that are contributed
on a fair-share basis. Furthermore, it actively includes open data and open
source developments to augment the unique industrial-grade solutions
provided within the platform.