Librería Portfolio

LEARNING SPARK. LIGHTNING-FAST BIG DATA ANALYSIS

Title: LEARNING SPARK. LIGHTNING-FAST BIG DATA ANALYSIS
Author: KARAU, H
Publisher: O'REILLY
Year of publication: 2015
Subject: DATA WAREHOUSING AND DATA MINING
ISBN: 978-1-4493-5862-4
Pages: 274
Price: €39.95

Synopsis

Data in all domains is getting bigger. How can you work with it efficiently? This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala.

Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You'll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
Leverage Spark's powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
Learn how to deploy interactive, batch, and streaming applications
Connect to data sources including HDFS, Hive, JSON, and S3
Master advanced topics like data partitioning and shared variables

Contents

A Brief History of Spark
Spark Versions and Releases
Storage Layers for Spark
Chapter 2: Downloading Spark and Getting Started
Downloading Spark
Introduction to Spark's Python and Scala Shells
Introduction to Core Spark Concepts
Standalone Applications
Conclusion
Chapter 3: Programming with RDDs
RDD Basics
Creating RDDs
RDD Operations
Passing Functions to Spark
Common Transformations and Actions
Persistence (Caching)
Conclusion
Chapter 4: Working with Key/Value Pairs
Motivation
Creating Pair RDDs
Transformations on Pair RDDs
Actions Available on Pair RDDs
Data Partitioning (Advanced)
Conclusion
Chapter 5: Loading and Saving Your Data
Motivation
File Formats
Filesystems
Structured Data with Spark SQL
Databases
Conclusion
Chapter 6: Advanced Spark Programming
Introduction
Accumulators
Broadcast Variables
Working on a Per-Partition Basis
Piping to External Programs
Numeric RDD Operations
Conclusion
Chapter 7: Running on a Cluster
Introduction
Spark Runtime Architecture
Deploying Applications with spark-submit
Packaging Your Code and Dependencies
Scheduling Within and Between Spark Applications
Cluster Managers
Which Cluster Manager to Use?
Conclusion
Chapter 8: Tuning and Debugging Spark
Configuring Spark with SparkConf
Components of Execution: Jobs, Tasks, and Stages
Finding Information
Key Performance Considerations
Conclusion
Chapter 9: Spark SQL
Linking with Spark SQL
Using Spark SQL in Applications
Loading and Saving Data
JDBC/ODBC Server
User-Defined Functions
Spark SQL Performance
Conclusion
Chapter 10: Spark Streaming
A Simple Example
Architecture and Abstraction
Transformations
Output Operations
Input Sources
24/7 Operation
Streaming UI
Performance Considerations
Conclusion
Chapter 11: Machine Learning with MLlib
Overview
System Requirements
Machine Learning Basics
Data Types
Algorithms
Tips and Performance Considerations
Pipeline API
Conclusion