BIG DATA
Services and Tools
Services
Big Data Analytics
Mixpanel (US)
● https://mixpanel.com/segmentation/
● http://youtu.be/nR2MzOeMoLc
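● Allows more detailed user-behavior analysis than Google Analytics
● Supports A/B testing and funnel driver analysis.
● Data can be collected and analyzed with a small amount of code via the Mixpanel API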
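To illustrate what "a small amount of code" can look like, here is a minimal sketch that only builds an event payload and sends nothing. It assumes an event is a JSON object with `event` and `properties` fields, base64-encoded for an HTTP tracking endpoint; the function name `build_track_payload`, the token, and the user ID are hypothetical, and the exact format should be checked against the Mixpanel documentation.

```python
import base64
import json
import time


def build_track_payload(token, distinct_id, event, properties=None):
    """Return a base64-encoded JSON string describing one tracked event.

    Assumption: the tracking endpoint accepts {"event": ..., "properties": ...}.
    """
    props = dict(properties or {})
    props.update({
        "token": token,              # project token (placeholder value below)
        "distinct_id": distinct_id,  # identifies the user
        "time": int(time.time()),    # event time as a Unix timestamp
    })
    body = {"event": event, "properties": props}
    return base64.b64encode(json.dumps(body).encode("utf-8")).decode("ascii")


payload = build_track_payload(
    "YOUR_PROJECT_TOKEN", "user-123", "Signed Up", {"plan": "free"}
)

# Decode again to show what would be sent.
decoded = json.loads(base64.b64decode(payload))
print(decoded["event"], decoded["properties"]["plan"])
```

In practice an official client library would normally handle this encoding and the HTTP request.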
Five Rocks (South Korea)
● https://www.5rocks.io/en/technology
● Mixpanel-like features, strengthened for advertising and games
● Acquired by Tapjoy
Intimate Merger
● http://corp.intimatemerger.com/
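● Centrally manages and analyzes big data accumulated on various servers across the internet, together with log data from your own site
● Optimizes action plans such as ad delivery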
Flurry
● http://www.flurry.com/
● Mobile analytics: analysis of user-app interaction
● In-app advertising system
Google Analytics
● http://www.google.com/analytics/features/
● http://youtu.be/WC3ONXJn9FQ
● Analysis tools
● Content analytics
● Social analytics
● Mobile analytics
● Conversion analytics
● Advertising analytics
Services
Smartphone App Crash Log Analysis
Crittercism
● https://www.crittercism.com/
● Add the library to your app
● Free and paid versions available (freemium model)
● Paid version is $24/month
● Classifies crash logs into three states: Unresolved, Known, Resolved
● Groups similar logs together
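The grouping and status ideas above can be sketched as follows. This is not Crittercism's actual algorithm; it is a minimal illustration that assumes crashes are "similar" when they share an exception type and the same top stack frames. The names `fingerprint` and `group_crashes` and the sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(exception_type, stack_frames, depth=3):
    """Hash the exception type and the top `depth` frames into a short key."""
    top = "|".join(stack_frames[:depth])
    digest = hashlib.sha1(f"{exception_type}|{top}".encode("utf-8"))
    return digest.hexdigest()[:12]


def group_crashes(crashes):
    """Group crash reports that share the same fingerprint."""
    groups = defaultdict(list)
    for crash in crashes:
        groups[fingerprint(crash["type"], crash["frames"])].append(crash)
    return groups


crashes = [
    {"type": "NullPointerException",
     "frames": ["MainActivity.onCreate", "Activity.performCreate"], "device": "A"},
    {"type": "NullPointerException",
     "frames": ["MainActivity.onCreate", "Activity.performCreate"], "device": "B"},
    {"type": "OutOfMemoryError",
     "frames": ["ImageLoader.decode"], "device": "A"},
]

groups = group_crashes(crashes)

# Every new group starts as "Unresolved"; a developer would later mark it
# "Known" or "Resolved".
status = {key: "Unresolved" for key in groups}

for key, items in groups.items():
    print(key, status[key], len(items))
```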
Bugsense
(Splunk MINT Express)
● https://mint.splunk.com/
● Nearly the same as Crittercism.
● Paid version is $19/month
● The free plan can collect only up to 500 logs per month.
Smartbeat
● http://smrtbeat.com/
● http://youtu.be/P7y4gOASy80
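● Multi-platform support
● In addition to iOS (Objective-C) and Android (Java), crashes and exceptions in the iOS C/C++ and Android NDK (C/C++) layers can be detected and analyzed
● Error detection for the Unity game engine (C#, JS) and Cocos2d-x is also supported
● Lets you view a screen capture of the user's screen just before the crash
Tools
Collect  Store  Search  Share  Analyze  Visualize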
Cloud: Amazon Web Services (AWS)
EC2: virtual private servers using Xen.
EMR (Elastic MapReduce): allows businesses, researchers, data analysts, and developers to easily
and cheaply process vast amounts of data. It uses a hosted Hadoop framework running on the web-
scale infrastructure of EC2 and Amazon S3.
S3: Web-based storage.
Redshift: petabyte-scale data warehousing with column-based storage and multi-node compute.
SimpleDB: allows developers to run queries on structured data. It operates in concert with EC2 and S3
to provide "the core functionality of a database".
DynamoDB: scalable, low-latency NoSQL online database service backed by SSDs.
RDS: scalable database server with MySQL, Oracle, SQL Server, and PostgreSQL support.
http://en.wikipedia.org/wiki/Amazon_Web_Services
Cloud: Google Cloud Platform
https://cloud.google.com/
Fluentd
http://www.fluentd.org/architecture
http://docs.fluentd.org/articles/free-alternative-to-splunk-by-fluentd
Fluentd + Elasticsearch + Kibana
Elasticsearch: distributed, scalable search engine
Kibana: visualization UI for Elasticsearch
http://www.elasticsearch.org/overview/kibana
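As a rough picture of how data moves through this stack, the sketch below models a Fluentd event as a (tag, time, record) triple and turns a list of events into the newline-delimited JSON body used by the Elasticsearch bulk API. No network call is made. The index name `app-logs`, the helper `to_bulk_body`, and the sample records are assumptions for illustration; a real deployment would use a Fluentd output plugin rather than hand-written code.

```python
import json
from datetime import datetime, timezone


def to_bulk_body(events, index="app-logs"):
    """Convert (tag, unix_time, record) events into a bulk request body.

    Each document is preceded by an action line and the body ends
    with a newline.
    """
    lines = []
    for tag, unix_time, record in events:
        doc = dict(record)
        doc["tag"] = tag
        # An ISO 8601 timestamp lets a dashboard place the event on a time axis.
        doc["@timestamp"] = datetime.fromtimestamp(
            unix_time, tz=timezone.utc
        ).isoformat()
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


events = [
    ("app.access", 0, {"path": "/", "status": 200}),
    ("app.access", 60, {"path": "/login", "status": 302}),
]

body = to_bulk_body(events)
print(body)
```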
MongoDB
http://www.mongodb.org/
Document-oriented database: easy to use.
Easy to set up:
● Replication, high availability
● Auto-Sharding
● Map/Reduce
http://www.mongodb.com/use-cases/real-time-analytics
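The Map/Reduce idea listed above can be illustrated without a database. The sketch below is plain Python over a list of dictionaries, not MongoDB's API: a map function emits (key, value) pairs per document and a reduce function combines the values for each key. The helper `map_reduce` and the sample documents are hypothetical.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run map_fn over every document, then reduce the values per key."""
    emitted = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            emitted[key].append(value)
    return {key: reduce_fn(key, values) for key, values in emitted.items()}


def map_purchase(doc):
    # Emit one (user, amount) pair per purchase document.
    yield doc["user"], doc["amount"]


def reduce_total(key, values):
    # Combine all amounts emitted for one user.
    return sum(values)


purchases = [
    {"user": "alice", "amount": 10},
    {"user": "bob", "amount": 7},
    {"user": "alice", "amount": 5},
]

totals = map_reduce(purchases, map_purchase, reduce_total)
print(totals)
```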
Cassandra
http://cassandra.apache.org/
Distributed, eventually consistent,
column-oriented database
Said to be harder to use than MongoDB, but faster (see the benchmark below).
http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
Hazelcast
http://hazelcast.org/
In-memory, multiple-node data grid cluster
Distributed Data Structures: Map, MultiMap, Queue, Set, List, Topic, Lock,
AtomicLong, AtomicReference, Semaphore, CountDownLatch, IdGenerator
Distributed Computing: Executor Service, Entry Processor
Distributed Query: MapReduce, Aggregators
Integrated Clustering: JCache, Hibernate Second Level Cache, Servlet Session
Replication, Spring Integration, J2EE Transactions
Client interface: Java, C++, .NET, REST, Memcache
Hadoop
http://hadooparchitecturetraining.blogspot.jp/2013/05/apache-hadoop-components-ecosystem.html
Realtime: Storm
https://storm.apache.org/
● Distributed realtime computation system.
● Storm makes it easy to reliably process unbounded
streams of data, doing for realtime processing what
Hadoop did for batch processing.
● Use cases: realtime analytics, online machine
learning, continuous computation, distributed RPC,
ETL, and more
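To give a feel for processing an unbounded stream one item at a time, here is a minimal single-process sketch using Python generators. It is not the Storm API; it only borrows Storm's terms, where a "spout" produces tuples and "bolts" transform them. All function names and the sample sentences are hypothetical.

```python
from collections import Counter


def sentence_spout(sentences):
    """Source of the stream; in a real system this would never end."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Split each incoming sentence into lowercase words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream):
    """Keep a running count and emit an update for every word seen."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


sentences = ["storm processes streams", "storm is realtime"]

# Chain spout -> split -> count; each word flows through as soon as it arrives.
updates = list(count_bolt(split_bolt(sentence_spout(sentences))))
for word, count in updates:
    print(word, count)
```

Unlike a batch job, the running counts are available after every item, which is the property the use cases above rely on.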
Batch + Realtime: Spark
https://spark.apache.org/
http://www.slideshare.net/search/slideshow?q=apache+spark
Speed:
Up to 100x faster than Hadoop MapReduce in memory, or 10x
faster on disk (as claimed by the Spark project).
Runs Everywhere:
Spark runs on Hadoop, Mesos, standalone, or in the
cloud. It can access diverse data sources including
HDFS, Cassandra, HBase, S3.
Generality:
Combine SQL, streaming, and complex analytics.
Machine learning features (MLlib 1.1):
● linear SVM and logistic regression
● classification and regression tree
● k-means clustering
● recommendation via alternating least squares
● singular value decomposition
● linear regression with L1- and L2-regularization
● multinomial naive Bayes
● basic statistics
● feature transformations
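As an illustration of one algorithm from the list above, the following is a minimal single-machine sketch of k-means clustering (Lloyd's algorithm). It does not use MLlib or Spark; the function `kmeans`, the fixed starting centroids, and the sample points are assumptions chosen to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
```

A distributed implementation performs the same two steps, with the assignment step spread across the cluster.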
Graph features:
● GraphX unifies ETL, exploratory analysis, and
iterative graph computation within a single system.
● Seamlessly work with both graphs and collections:
You can view the same data as both graphs and
collections, transform and join graphs with RDDs
efficiently, and write custom iterative graph
algorithms using the Pregel API.
● Algorithms: PageRank, Connected components,
Label propagation, SVD++, Strongly connected
components, Triangle count...
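PageRank, the first algorithm named above, is a good example of iterative graph computation. The sketch below is plain Python, not the GraphX or Pregel API: it assumes a damping factor of 0.85, a fixed number of iterations, and that nodes without outgoing links spread their rank evenly. The function name and the edge list are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iteratively propagate rank along directed edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # Dangling nodes distribute their rank to all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
rank = pagerank(edges)
for node, score in sorted(rank.items()):
    print(node, round(score, 3))
```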
Streaming features:
Spark Streaming can read data from HDFS, Flume,
Kafka, Twitter and ZeroMQ. You can also define your
own custom data sources.
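Spark Streaming handles a stream as a sequence of small batches. The sketch below imitates that idea in plain Python and is not the Spark API. As a simplifying assumption it cuts batches by item count, whereas Spark Streaming cuts them by time interval. The helper `micro_batches` and the sample events are hypothetical.

```python
from collections import Counter
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield successive lists of up to batch_size items from the stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


events = ["click", "view", "click", "buy", "view", "click", "view"]

running = Counter()  # state carried across batches
history = []         # snapshot of the running counts after each batch

for batch in micro_batches(events, 3):
    running.update(batch)  # ordinary batch logic applied to a small batch
    history.append(dict(running))

print(history[-1])
```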
Cloud: Databricks
https://databricks.com/product
A company founded by the creators of Apache Spark that aims to
help clients with cloud-based big data processing
using Spark.
http://youtu.be/dJQ5lV5Tldw
Visualization Tools
This is the web era, so visualization tools are basically JavaScript-based:
https://github.com/sorrycc/awesome-javascript#data-visualization
Example: D3 https://github.com/mbostock/d3/wiki/Gallery
Example: Query Builder
http://mistic100.github.io/jQuery-QueryBuilder/
