Vous êtes sur la page 1sur 184

Programming with Big Data in R

Drew Schmidt and George Ostrouchov

useR! 2014

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

The pbdR Core Team


Wei-Chen Chen1
George Ostrouchov2,3
Pragneshkumar Patel3
Drew Schmidt3
Support
This work used resources of National Institute for Computational Sciences at the University of Tennessee, Knoxville, which is
supported by the Office of Cyberinfrastructure of the U.S. National Science Foundation under Award No.
ARRA-NSF-OCI-0906324 for NICS-RDAV center. This work also used resources of the Oak Ridge Leadership Computing Facility
at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under
Contract No. DE-AC05-00OR22725.
1

Department of Ecology and Evolutionary Biology


University of Tennessee, Knoxville TN, USA
2

Computer Science and Mathematics Division


Oak Ridge National Laboratory, Oak Ridge TN, USA
3

Joint Institute for Computational Sciences


University of Tennessee, Knoxville TN, USA

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

About This Presentation

Downloads
This presentation is available at: http://r-pbd.org/tutorial

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

About This Presentation

Installation Instructions
Installation instructions for setting up a pbdR environment are available:
http://r-pbd.org/install.html
This includes instructions for installing R, MPI, and pbdR .

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Contents
1

Introduction

Profiling and Benchmarking

The pbdR Project

Introduction to pbdMPI

The Generalized Block Distribution

Basic Statistics Examples

Data Input

Introduction to pbdDMAT and the ddmatrix Structure

Examples Using pbdDMAT

10

MPI Profiling

11

Wrapup

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction

Contents

Introduction
A Concise Introduction to Parallelism
A Quick Overview of Parallel Hardware
A Quick Overview of Parallel Software
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction

A Concise Introduction to Parallelism

Introduction
A Concise Introduction to Parallelism
A Quick Overview of Parallel Hardware
A Quick Overview of Parallel Software
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction

A Concise Introduction to Parallelism

Parallelism
Serial Programming

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Parallel Programming

Programming with Big Data in R

1/131

Introduction

A Concise Introduction to Parallelism

Difficulty in Parallelism
1

Implicit parallelism: Parallel details hidden from user

Explicit parallelism: Some assembly required. . .

Embarrassingly Parallel: Also called loosely coupled. Obvious how to


make parallel; lots of independence in computations.

Tightly Coupled: Opposite of embarrassingly parallel; lots of


dependence in computations.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

2/131

Introduction

A Concise Introduction to Parallelism

Speedup
Wallclock Time: Time of the clock on the wall from start to finish
Speedup: unitless measure of improvement; more is better.
Sn1 ,n2 =

Run time for n1 cores


Run time for n2 cores

n1 is often taken to be 1
In this case, comparing parallel algorithm to serial algorithm

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

3/131

Introduction

Good Speedup

A Concise Introduction to Parallelism

Bad Speedup

group
Application

Optimal
4

Speedup

Speedup

group
Application
Optimal
4

Cores

http://r-pbd.org/tutorial

Cores

ppbbddR
R Core Team

Programming with Big Data in R

4/131

Introduction

A Concise Introduction to Parallelism

Scalability and Benchmarking


1

Strong: Fixed total problem size.


Less work per core as more cores are added.

Weak: Fixed local (per core) problem size.


Same work per core as more cores are added.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

5/131

Introduction

Good Strong Scaling

A Concise Introduction to Parallelism

Good Weak Scaling

50

3100

40

3050

30

Time

Speedup

3000
20

10

2950

0
0

20

40

60

Cores

http://r-pbd.org/tutorial

4000

8000

12000

16000

Cores

ppbbddR
R Core Team

Programming with Big Data in R

6/131

Introduction

A Concise Introduction to Parallelism

Shared and Distributed Memory Machines

Shared Memory
Direct access to read/change
memory (one node)

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Distributed
No direct access to read/change
memory (many nodes); requires
communication

Programming with Big Data in R

7/131

Introduction

A Concise Introduction to Parallelism

Shared and Distributed Memory Machines

Shared Memory Machines


Thousands of cores

Nautilus, University of Tennessee


1024 cores
4 TB RAM

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Distributed Memory Machines


Hundreds of thousands of cores

Kraken, University of Tennessee


112,896 cores
147 TB RAM

Programming with Big Data in R

8/131

Introduction

A Quick Overview of Parallel Hardware

Introduction
A Concise Introduction to Parallelism
A Quick Overview of Parallel Hardware
A Quick Overview of Parallel Software
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction

A Quick Overview of Parallel Hardware

Three Basic Flavors of Hardware

Distributed Memory
Interconnection Network
PROC
+ cache

PROC
+ cache

PROC
+ cache

PROC
+ cache

Mem

Mem

Mem

Mem

Co-Processor
GPU
or
MIC

Shared Memory
CORE
+ cache

CORE
+ cache

CORE
+ cache

Local Memory
GPU: Graphical Processing Unit

CORE
+ cache

MIC: Many Integrated Core

Network
Memory

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

9/131

Introduction

A Quick Overview of Parallel Hardware

Your Laptop or Desktop

Distributed Memory
Interconnection Network
PROC
+ cache

PROC
+ cache

PROC
+ cache

PROC
+ cache

Mem

Mem

Mem

Mem

Co-Processor
GPU
or
MIC

Shared Memory
CORE
+ cache

CORE
+ cache

CORE
+ cache

Local Memory
GPU: Graphical Processing Unit

CORE
+ cache

MIC: Many Integrated Core

Network
Memory

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

10/131

Introduction

A Quick Overview of Parallel Hardware

A Server or Cluster

Distributed Memory
Interconnection Network
PROC
+ cache

PROC
+ cache

PROC
+ cache

PROC
+ cache

Mem

Mem

Mem

Mem

Co-Processor
GPU
or
MIC

Shared Memory
CORE
+ cache

CORE
+ cache

CORE
+ cache

Local Memory
GPU: Graphical Processing Unit

CORE
+ cache

MIC: Many Integrated Core

Network
Memory

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

11/131

Introduction

A Quick Overview of Parallel Hardware

Server to Supercomputer

Distributed Memory
Interconnection Network
PROC
+ cache

PROC
+ cache

PROC
+ cache

PROC
+ cache

Mem

Mem

Mem

Mem

Co-Processor
GPU
or
MIC

Shared Memory
CORE
+ cache

CORE
+ cache

CORE
+ cache

Local Memory
GPU: Graphical Processing Unit

CORE
+ cache

MIC: Many Integrated Core

Network
Memory

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

12/131

Introduction

A Quick Overview of Parallel Hardware

Knowing the Right Words

ing
t
r
u
e
ib
st
Clu Distr

Distributed Memory
Interconnection Network
PROC
+ cache

PROC
+ cache

PROC
+ cache

PROC
+ cache

Mem

Mem

Mem

Mem

Mu
CORE
+ cache

e
cor
y
n

a
ing
r M
d
o
a
U
o
ffl
GP
O

Co-Processor

ore
ltic

GPU
or
MIC

Shared Memory
CORE
+ cache

CORE
+ cache

Local Memory
GPU: Graphical Processing Unit

CORE
+ cache

MIC: Many Integrated Core

Network

ing
d
a
re
ti t h
l
u
M

Memory

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

13/131

Introduction

A Quick Overview of Parallel Software

Introduction
A Concise Introduction to Parallelism
A Quick Overview of Parallel Hardware
A Quick Overview of Parallel Software
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction

A Quick Overview of Parallel Software

Native Programming Models and Tools

Distributed Memory

Focus on who owns what data and what


communication is needed

Sockets
MPI
Hadoop

Interconnection Network
PROC
+ cache

PROC
+ cache

PROC
+ cache

PROC
+ cache

Mem

Mem

Mem

Mem

Co-Processor
GPU
or
MIC

Shared Memory
CORE
+ cache

CORE
+ cache

CORE
+ cache

Network
Memory

http://r-pbd.org/tutorial

Same Task on Blocks of data

CUDA
OpenCL
OpenACC

Local Memory
GPU: Graphical Processing Unit

CORE
+ cache

MIC: Many Integrated Core

Focus on which tasks can be


parallel

ppbbddR
R Core Team

OpenMP
OpenACC

OpenMP
Pthreads
fork

Programming with Big Data in R

14/131

Introduction

A Quick Overview of Parallel Software

30+ Years of Parallel Computing Research

Distributed Memory

Focus on who owns what data and what


communication is needed

Sockets
MPI
Hadoop

Interconnection Network
PROC
+ cache

PROC
+ cache

PROC
+ cache

PROC
+ cache

Mem

Mem

Mem

Mem

Co-Processor
GPU
or
MIC

Shared Memory
CORE
+ cache

CORE
+ cache

CORE
+ cache

Network
Memory

http://r-pbd.org/tutorial

Same Task on Blocks of data

CUDA
OpenCL
OpenACC

Local Memory
GPU: Graphical Processing Unit

CORE
+ cache

MIC: Many Integrated Core

Focus on which tasks can be


parallel

ppbbddR
R Core Team

OpenMP
OpenACC

OpenMP
Pthreads
fork

Programming with Big Data in R

15/131

Introduction

A Quick Overview of Parallel Software

Last 10 years of Advances

Distributed Memory

Focus on who owns what data and what


communication is needed

Sockets
MPI
Hadoop

Interconnection Network
PROC
+ cache

PROC
+ cache

PROC
+ cache

PROC
+ cache

Mem

Mem

Mem

Mem

Co-Processor
GPU
or
MIC

Shared Memory
CORE
+ cache

CORE
+ cache

CORE
+ cache

Network
Memory

http://r-pbd.org/tutorial

Same Task on Blocks of data

CUDA
OpenCL
OpenACC

Local Memory
GPU: Graphical Processing Unit

CORE
+ cache

MIC: Many Integrated Core

Focus on which tasks can be


parallel

ppbbddR
R Core Team

OpenMP
OpenACC

OpenMP
Pthreads
fork

Programming with Big Data in R

16/131

Introduction

A Quick Overview of Parallel Software

Putting It All Together Challenge

Distributed Memory

Focus on who owns what data and what


communication is needed

Sockets
MPI
Hadoop

Interconnection Network
PROC
+ cache

PROC
+ cache

PROC
+ cache

PROC
+ cache

Mem

Mem

Mem

Mem

Co-Processor
GPU
or
MIC

Shared Memory
CORE
+ cache

CORE
+ cache

CORE
+ cache

Network
Memory

http://r-pbd.org/tutorial

Same Task on Blocks of data

CUDA
OpenCL
OpenACC

Local Memory
GPU: Graphical Processing Unit

CORE
+ cache

MIC: Many Integrated Core

Focus on which tasks can be


parallel

ppbbddR
R Core Team

OpenMP
OpenACC

OpenMP
Pthreads
fork

Programming with Big Data in R

17/131

Introduction

A Quick Overview of Parallel Software

R Interfaces to Native Tools

Distributed Memory

Focus on who owns what data and what


communication is needed

PROC
+ cache

PROC
+ cache

PROC
+ cache

PROC
+ cache

Mem

Mem

Mem

Mem

Same Task on Blocks of data

Co-Processor
GPU
or
MIC

Shared Memory
CORE
+ cache

CORE
+ cache

Network
Memory

CUDA
OpenCL
OpenACC

Local Memory
GPU: Graphical Processing Unit

CORE
+ cache

MIC: Many Integrated Core

Focus on which tasks can be


parallel

snow + multicore = parallel

http://r-pbd.org/tutorial

snow
Rmpi
pbdMPI

RHadoop

Interconnection Network

CORE
+ cache

Sockets
MPI
Hadoop

ppbbddR
R Core Team

OpenMP
OpenACC

OpenMP
Pthreads
fork

Foreign
Langauge
Interfaces:
.C
.Call
Rcpp
OpenCL
inline
.
.
.

multicore

Programming with Big Data in R

18/131

Introduction

Summary

Introduction
A Concise Introduction to Parallelism
A Quick Overview of Parallel Hardware
A Quick Overview of Parallel Software
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction

Summary

Summary
Three flavors of hardware
Distributed is stable
Multicore and co-processor are evolving
Two memory models
Distributed works in multicore

Parallelism hierarchy
Medium to big machines have all three

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

19/131

Profiling and Benchmarking

Contents

Profiling and Benchmarking


Why Profile?
Profiling R Code
Advanced R Profiling
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Profiling and Benchmarking

Why Profile?

Profiling and Benchmarking


Why Profile?
Profiling R Code
Advanced R Profiling
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Profiling and Benchmarking

Why Profile?

Performance and Accuracy


Sometimes = 3.14 is (a) infinitely faster
than the correct answer and (b) the difference between the correct and the wrong
answer is meaningless. . . . The thing is, some
specious value of correctness is often irrelevant because it doesnt matter. While performance almost always matters. And I absolutely detest the fact that people so often
dismiss performance concerns so readily.
Linus Torvalds, August 8, 2008

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

20/131

Profiling and Benchmarking

Why Profile?

Why Profile?
Because performance matters.
Bad practices scale up!
Your bottlenecks may surprise you.
Because R is dumb.
R users claim to be data people. . . so act like it!

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

21/131

Profiling and Benchmarking

Why Profile?

Compilers often correct bad behavior. . .


clang example.c
A Really Dumb Loop

main :

int main () {
int x , i ;
for ( i =0; i <10; i ++)
x = 1;
return 0;
}

. cfi _ startproc
# BB #0:
movl
movl

$ 0 , -4(% rsp )
$ 0 , -12(% rsp )

cmpl
jge

$ 10 , -12(% rsp )
. LBB0 _ 4

movl

$ 1 , -8(% rsp )

movl
addl
movl
jmp

-12(% rsp ) , % eax


$ 1 , % eax
% eax , -12(% rsp )
. LBB0 _ 1

movl
ret

$ 0 , % eax

. LBB0 _ 1:

# BB #2:

clang -O3 example.c


# BB #3:

main :
. cfi _ startproc
# BB #0:
xorl

% eax ,
% eax

ret

http://r-pbd.org/tutorial

. LBB0 _ 4:

ppbbddR
R Core Team

Programming with Big Data in R

22/131

Profiling and Benchmarking

Why Profile?

R will not!

Dumb Loop
1
2
3
4
5
6
7

for
tA
Y
Q
Y
Q
}

(i
<<<<<-

Better Loop

in 1: n ) {
t(A)
tA % * % Q
qr . Q ( qr ( Y ) )
A %*% Q
qr . Q ( qr ( Y ) )

1
2
3
4
5
6
7

8
9

for
Y
Q
Y
Q
}

(i
<<<<-

in 1: n ) {
tA % * % Q
qr . Q ( qr ( Y ) )
A %*% Q
qr . Q ( qr ( Y ) )

9
10

http://r-pbd.org/tutorial

tA <- t(A)

ppbbddR
R Core Team

Programming with Big Data in R

23/131

Profiling and Benchmarking

Why Profile?

Example from a Real R Package

Exerpt from Original function


1
2
3
4

while (i <= N ) {
for ( j in 1: i ) {
d.k <- as.matrix(x)[l==j,l==j]
...

Exerpt from Modified function


1

x.mat <- as.matrix(x)

2
3
4
5
6

while (i <= N ) {
for ( j in 1: i ) {
d.k <- x.mat[l==j,l==j]
...

http://r-pbd.org/tutorial

ppbbddR
R Core Team

By changing just 1 line of


code, performance of the
main method improved by
over 350%!

Programming with Big Data in R

24/131

Profiling and Benchmarking

Why Profile?

Some Thoughts
R is slow.
Bad programmers are slower.
R isnt very clever (compared to a compiler).
The Bytecode compiler helps, but not nearly as much as a compiler.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

25/131

Profiling and Benchmarking

Profiling R Code

Profiling and Benchmarking


Why Profile?
Profiling R Code
Advanced R Profiling
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Profiling and Benchmarking

Profiling R Code

Timings
Getting simple timings as a basic measure of performance is easy, and
valuable.
system.time() timing blocks of code.
Rprof() timing execution of R functions.
Rprofmem() reporting memory allocation in R .
tracemem() detect when a copy of an R object is created.
The rbenchmark package Benchmark comparisons.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

26/131

Profiling and Benchmarking

Advanced R Profiling

Profiling and Benchmarking


Why Profile?
Profiling R Code
Advanced R Profiling
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Profiling and Benchmarking

Advanced R Profiling

Other Profiling Tools


perf (Linux)
PAPI
MPI profiling: fpmpi, mpiP, TAU

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

27/131

Profiling and Benchmarking

Advanced R Profiling

Profiling MPI Codes with pbdPROF


1. Rebuild pbdR packages
R CMD INSTALL pbdMPI _ 0.2 -1. tar . gz \
-- configure - args = \
" -- enable - pbdPROF "

2. Run code
mpirun - np 64 Rscript my _ script . R

3. Analyze results
1
2
3

library ( pbdPROF )
prof <- read . prof ( " output . mpiP " )
plot ( prof , plot . type = " messages2 " )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

28/131

Profiling and Benchmarking

Advanced R Profiling

Profiling with pbdPAPI


Performance Application Programming
Interface
High and low level interfaces
Linux only :(
Function
system.flips()
system.flops()
system.cache()
system.epc()
system.idle()
system.cpuormem()
system.utilization()

http://r-pbd.org/tutorial

Description of Measurement
Time, floating point instructions, and Mflips
Time, floating point operations, and Mflops
Cache misses, hits, accesses, and reads
Events per cycle
Idle cycles
CPU or RAM bound
CPU utilization

ppbbddR
R Core Team

Programming with Big Data in R

29/131

Profiling and Benchmarking

Summary

Profiling and Benchmarking


Why Profile?
Profiling R Code
Advanced R Profiling
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Profiling and Benchmarking

Summary

Summary
Profile, profile, profile.
Use system.time() to get a general sense of a method.
Use rbenchmarks benchmark() to compare 2 methods.
Use Rprof() for more detailed profiling.
Other tools exist for more hardcore applications (pbdPAPI and
pbdPROF).

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

30/131

The pbdR Project

Contents

The pbdR Project


The pbdR Project
pbdR Connects R to HPC Libraries
Using pbdR
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

The pbdR Project

The pbdR Project

The pbdR Project


The pbdR Project
pbdR Connects R to HPC Libraries
Using pbdR
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

The pbdR Project

The pbdR Project

Programming with Big Data in R (pbdR)


Striving for Productivity, Portability, Performance
Freea R packages.
Bridging high-performance compiled code
with high-productivity of R
Scalable, big data analytics.
Offers implicit and explicit parallelism.
Methods have syntax identical to R.
a

http://r-pbd.org/tutorial

MPL, BSD, and GPL licensed

ppbbddR
R Core Team

Programming with Big Data in R

31/131

The pbdR Project

The pbdR Project

pbdR Packages

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

32/131

The pbdR Project

The pbdR Project

pbdR Motivation
Why HPC libraries (MPI, ScaLAPACK, PETSc, . . . )?
The HPC community has been at this for decades.
Theyre tested. They work. Theyre fast.
Youre not going to beat Jack Dongarra at dense linear algebra.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

33/131

The pbdR Project

The pbdR Project

Simple Interface for MPI Operations with pbdMPI


Rmpi
1
2
3
4

pbdMPI

# int
mpi . allreduce (x , type =1)
# double
mpi . allreduce (x , type =2)

allreduce ( x )

Types in R
1
2
3
4
5
6

> is . integer (1)


[1] FALSE
> is . integer (2)
[1] FALSE
> is . integer (1:2)
[1] TRUE

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

34/131

The pbdR Project

The pbdR Project

Distributed Matrices and Statistics with pbdDMAT


Matrix Exponentiation
Speedup (Relative to Serial Code)
Nodes

1 2 3 4

12 4

12

16

24

32

48

64

Cores

library ( pbdDMAT )

2
3
4

dx <- ddmatrix ( " rnorm " , 5000 , 5000)


expm ( dx )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

35/131

The pbdR Project

The pbdR Project

Distributed Matrices and Statistics with pbdDMAT


Least Squares Benchmark
Fitting y~x With Fixed Local Size of ~43.4 MiB
125

Predictors

.4
41

1.4

.4
41

500

1000

2000

.09

16

10

0.7

17

Run Time (Seconds)

100

.3468 35
2142. 85.

75

.09

16

10

50

.35
85
.3468
2142.

0.7

17

34

.09

16

00

64

24

40

80

17

32

5
.34 8 5.3
2142.6 8

5
1004
08
20
16

25

10
0.7

Cores

x < d d m a t r i x ( rnorm , nrow=m, n c o l=n )


y < d d m a t r i x ( rnorm , nrow=m, n c o l =1)
mdl < lm . f i t ( x=x , y=y )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

36/131

The pbdR Project

The pbdR Project

Getting Started with HPC for R Users: pbdDEMO


140 page, textbook-style vignette.
Over 30 demos, utilizing all packages.
Not just a hello world!
Demos include:
PCA
Regression
Parallel data input
Model-based clustering
Simple Monte Carlo simulation
Bayesian MCMC

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

37/131

The pbdR Project

pbdR Connects R to HPC Libraries

The pbdR Project


The pbdR Project
pbdR Connects R to HPC Libraries
Using pbdR
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

The pbdR Project

pbdR Connects R to HPC Libraries

R and pbdR Interfaces to HPC Libraries


pbdDEMO

Tau

Distributed Memory
Interconnection Network
ScaLAPACK

PBLAS
BLACSPROC

PROC
+ cache

+ cache

PROC
+ cache

pbdPROF
pbdPROF
pbdPROF

PETSc

mpiP

Trilinos

PROC
+ cache

fpmpi
pbdMPI

pbdDMAT
Mem

Mem

MPI
Mem

Mem

MKL
LibSci

pbdPAPI

PAPI

GPU
or
MIC

DPLASMA

NetCDF4

Shared Memory
CORE
+ cache

Co-Processor

CORE
ACML
+ cache

CORE
+ cache

magma
Local Memory

PLASMA
MAGMA

Network

pbdNCDF4

GPU: Graphical Processing Unit

CORE
+ cache

HiPLARM
HiPLAR

Memory

MIC: Many Integrated Core

ADIOS
cuBLAS
pbdADIOS

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

38/131

The pbdR Project

Using pbdR

The pbdR Project


The pbdR Project
pbdR Connects R to HPC Libraries
Using pbdR
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

The pbdR Project

Using pbdR

pbdR Paradigms
pbdR programs are R programs!
Differences:
Batch execution (non-interactive).
Parallel code utilizes Single Program/Multiple Data (SPMD) style
Emphasizes data parallelism.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

39/131

The pbdR Project

Using pbdR

Batch Execution
Running a serial R program in batch:
1

Rscript my _ script . r

or
1

R CMD BATCH my _ script . r

Running a parallel (with MPI) R program in batch:


1

mpirun - np 2 Rscript my _ par _ script . r

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

40/131

The pbdR Project

Using pbdR

Single Program/Multiple Data (SPMD)


SPMD is a programming paradigm.
Not to be confused with SIMD.

Paradigms
Programming models
OOP, Functional, SPMD, . . .

http://r-pbd.org/tutorial

ppbbddR
R Core Team

SIMD
Hardware instructions
MMX, SSE, . . .

Programming with Big Data in R

41/131

The pbdR Project

Using pbdR

Single Program/Multiple Data (SPMD)


SPMD is arguably the simplest extension of serial programming.
Only one program is written, executed in batch on all processors.
Different processors are autonomous; there is no manager.
Dominant programming model for large machines for 30 years.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

42/131

The pbdR Project

Summary

Summary
pbdR connects R to scalable HPC libraries.
The pbdDEMO package offers numerous examples and explanations
for getting started with distributed R programming.
pbdR programs are R programs.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

43/131

Introduction to pbdMPI

Contents

Introduction to pbdMPI
Managing a Communicator
Reduce, Gather, Broadcast, and Barrier
Other pbdMPI Tools
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction to pbdMPI

Managing a Communicator

Introduction to pbdMPI
Managing a Communicator
Reduce, Gather, Broadcast, and Barrier
Other pbdMPI Tools
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction to pbdMPI

Managing a Communicator

Message Passing Interface (MPI)


MPI : Standard for managing communications (data and instructions)
between different nodes/computers.
Implementations: OpenMPI, MPICH2, Cray MPT, . . .
Enables parallelism (via communication) on distributed machines.
Communicator : manages communications between processors.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

44/131

Introduction to pbdMPI

Managing a Communicator

MPI Operations (1 of 2)
Managing a Communicator: Create and destroy communicators.
init() initialize communicator
finalize() shut down communicator(s)
Rank query: determine the processors position in the communicator.
comm.rank() who am I?
comm.size() how many of us are there?
Printing: Printing output from various ranks.
comm.print(x)
comm.cat(x)
WARNING: only use these functions on results, never on
yet-to-be-computed things.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

45/131

Introduction to pbdMPI

Managing a Communicator

Quick Example 1
Rank Query: 1 rank.r
1
2

library ( pbdMPI , quietly = TRUE )


init ()

3
4
5

my . rank <- comm . rank ()


comm . print ( my . rank , all . rank = TRUE )

6
7

finalize ()

mpirun - np 2 Rscript 1 _ rank . r

Execute this script via:

Sample Output:
1
2
3
4

http://r-pbd.org/tutorial

ppbbddR
R Core Team

COMM . RANK = 0
[1] 0
COMM . RANK = 1
[1] 1

Programming with Big Data in R

46/131

Introduction to pbdMPI

Managing a Communicator

Quick Example 2
Hello World: 2 hello.r
1
2

library ( pbdMPI , quietly = TRUE )


init ()

3
4

comm . print ( " Hello , world " )

5
6

comm . print ( " Hello again " , all . rank = TRUE , quietly = TRUE )

7
8

finalize ()

Execute this script via:


1

mpirun - np 2 Rscript 2 _ hello . r

Sample Output:
1
2
3
4

http://r-pbd.org/tutorial

ppbbddR
R Core Team

COMM . RANK = 0
[1] " Hello , world "
[1] " Hello again "
[1] " Hello again "

Programming with Big Data in R

47/131

Introduction to pbdMPI

Reduce, Gather, Broadcast, and Barrier

Introduction to pbdMPI
Managing a Communicator
Reduce, Gather, Broadcast, and Barrier
Other pbdMPI Tools
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction to pbdMPI

Reduce, Gather, Broadcast, and Barrier

MPI Operations
1

Reduce

Gather

Broadcast

Barrier

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

48/131

Introduction to pbdMPI

Reduce, Gather, Broadcast, and Barrier

Reductions Combine results into single result

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

49/131

Introduction to pbdMPI

Reduce, Gather, Broadcast, and Barrier

Gather Many-to-one

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

50/131

Introduction to pbdMPI

Reduce, Gather, Broadcast, and Barrier

Broadcast One-to-many

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

51/131

Introduction to pbdMPI

Reduce, Gather, Broadcast, and Barrier

Barrier Synchronization

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

52/131

Introduction to pbdMPI

Reduce, Gather, Broadcast, and Barrier

MPI Operations (2 of 2)
Reduction: each processor has a number x; add all of them up, find
the largest/smallest, . . . .
reduce(x, op=sum) reduce to one
allreduce(x, op=sum) reduce to all
Gather: each processor has a number; create a new object on some
processor containing all of those numbers.
gather(x) gather to one
allgather(x) gather to all
Broadcast: one processor has a number x that every other processor
should also have.
bcast(x)
Barrier: computation wall; no processor can proceed until all
processors can proceed.
barrier()
http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

53/131

Introduction to pbdMPI

Reduce, Gather, Broadcast, and Barrier

Quick Example 3
Reduce and Gather: 3 gt.r
1
2

library ( pbdMPI , quietly = TRUE )


init ()

3
4

comm . set . seed ( diff = TRUE )

5
6

n <- sample (1:10 , size =1)

7
8
9

gt <- gather ( n )
comm . print ( unlist ( gt ) )

10
11
12

sm <- allreduce (n , op = sum )


comm . print ( sm , all . rank = T )

13
14

finalize ()

Execute this script via:


1

mpirun - np 2 Rscript 3 _ gt . r

Sample Output:
1
2
3
4
5
6

http://r-pbd.org/tutorial

ppbbddR
R Core Team

COMM . RANK = 0
[1] 2 8
COMM . RANK = 0
[1] 10
COMM . RANK = 1
[1] 10

Programming with Big Data in R

54/131

Introduction to pbdMPI

Reduce, Gather, Broadcast, and Barrier

Quick Example 4
Broadcast: 4 bcast.r
1
2

library ( pbdMPI , quietly = T )


init ()

3
4
5
6
7
8

if ( comm . rank () ==0) {


x <- matrix (1:4 , nrow =2)
} else {
x <- NULL
}

9
10

y <- bcast (x , rank . source =0)

11
12

comm . print (y , rank =1)

13
14

finalize ()

Execute this script via:


1

mpirun - np 2 Rscript 4 _ bcast . r

Sample Output:
1
2
3
4

http://r-pbd.org/tutorial

ppbbddR
R Core Team

COMM . RANK = 1
[ ,1] [ ,2]
[1 ,]
1
3
[2 ,]
2
4

Programming with Big Data in R

55/131

Introduction to pbdMPI

Other pbdMPI Tools

Introduction to pbdMPI
Managing a Communicator
Reduce, Gather, Broadcast, and Barrier
Other pbdMPI Tools
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction to pbdMPI

Other pbdMPI Tools

Random Seeds
pbdMPI offers a simple interface for managing random seeds:
comm.set.seed(seed=1234, diff=TRUE) All processors
generate different streams.
comm.set.seed(seed=1234, diff=FALSE) All processors
generate same streams.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

56/131

Introduction to pbdMPI

Other pbdMPI Tools

Other Helper Tools


pbdMPI Also contains useful tools for Manager/Worker and task
parallelism codes:
Task Subsetting: Distributing a list of jobs/tasks
get.jid(n)
*ply: Functions in the *ply family.
pbdApply(X, MARGIN, FUN, ...) analogue of apply()
pbdLapply(X, FUN, ...) analogue of lapply()
pbdSapply(X, FUN, ...) analogue of sapply()

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

57/131

Introduction to pbdMPI

Summary

Introduction to pbdMPI
Managing a Communicator
Reduce, Gather, Broadcast, and Barrier
Other pbdMPI Tools
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction to pbdMPI

Summary

Summary
Start by loading the package:
1

library ( pbdMPI , quiet = TRUE )

Always initialize before starting and finalize when finished:


1

init ()

2
3

# ...

4
5

finalize ()

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

58/131

GBD

Contents

The Generalized Block Distribution


GBD: a Way to Distribute Your Data
Example GBD Distributions
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

GBD

GBD: a Way to Distribute Your Data

The Generalized Block Distribution


GBD: a Way to Distribute Your Data
Example GBD Distributions
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

GBD

GBD: a Way to Distribute Your Data

Distributing Data
Problem: How to distribute the data

x =

x1,1
x2,1
x3,1
x4,1
x5,1
x6,1
x7,1
x8,1
x9,1
x10,1

http://r-pbd.org/tutorial

x1,2
x2,2
x3,2
x4,2
x5,2
x6,2
x7,2
x8,2
x9,2
x10,2

x1,3
x2,3
x3,3
x4,3
x5,3
x6,3
x7,3
x8,3
x9,3
x10,3

103

ppbbddR
R Core Team

Programming with Big Data in R

59/131

GBD

Distributing a Matrix
Data

x1,1 x1,2
x2,1 x2,2

x
3,1 x3,2
x
4,1 x4,2

x5,2
x
x = 5,1
x6,1 x6,2

x7,1 x7,2

x8,1 x8,2

x9,1 x9,2
x10,1 x10,2

http://r-pbd.org/tutorial

GBD: a Way to Distribute Your Data

Across 4 Processors: Block Distribution


Processors

0
x1,3
1
x2,3

2
x3,3

3
x4,3

x5,3

x6,3

x7,3

x8,3

x9,3
x10,3 103

ppbbddR
R Core Team

Programming with Big Data in R

60/131

GBD

Distributing a Matrix
Data

x1,1 x1,2
x2,1 x2,2

x
3,1 x3,2
x
4,1 x4,2

x5,2
x
x = 5,1
x6,1 x6,2

x7,1 x7,2

x8,1 x8,2

x9,1 x9,2
x10,1 x10,2

http://r-pbd.org/tutorial

GBD: a Way to Distribute Your Data

Across 4 Processors: Local Load Balance


Processors

0
x1,3
1
x2,3

2
x3,3

3
x4,3

x5,3

x6,3

x7,3

x8,3

x9,3
x10,3 103

ppbbddR
R Core Team

Programming with Big Data in R

61/131

GBD

GBD: a Way to Distribute Your Data

The GBD Data Structure


Throughout the examples, we will make use of the Generalized Block Distribution, or GBD
distributed matrix structure.
1

GBD is distributed. No processor owns all the data.

GBD is non-overlapping. Rows uniquely assigned to


processors.

GBD is row-contiguous. If a processor owns one element


of a row, it owns the entire row.

GBD is globally row-major, locally column-major.

GBD is often locally balanced, where each processor


owns (almost) the same amount of data. But this is
not required.

The last row of the local storage of a processor is adjacent (by global row) to the first row
of the local storage of next processor (by communicator number) that owns data.

GBD is (relatively) easy to understand, but can lead to bottlenecks if you have many more
columns than rows.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

x1,1
x2,1
x3,1
x4,1
x5,1
x6,1
x7,1
x8,1
x9,1
x10,1

Programming with Big Data in R

x1,2
x2,2
x3,2
x4,2
x5,2
x6,2
x7,2
x8,2
x9,2
x10,2

x1,3
x2,3
x3,3
x4,3
x5,3
x6,3
x7,3
x8,3
x9,3
x10,3

62/131

GBD

Example GBD Distributions

The Generalized Block Distribution


GBD: a Way to Distribute Your Data
Example GBD Distributions
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

GBD

Example GBD Distributions

Understanding GBD: Global Matrix

x =

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95

x16
x26
x36
x46
x56
x66
x76
x86
x96

x17
x27
x37
x47
x57
x67
x77
x87
x97

x18
x28
x38
x48
x58
x68
x78
x88
x98

x19
x29
x39
x49
x59
x69
x79
x89
x99

99

Processors = 0 1 2 3 4 5

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

63/131

GBD

Example GBD Distributions

Understanding GBD: Load Balanced GBD

x =

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95

x16
x26
x36
x46
x56
x66
x76
x86
x96

x17
x27
x37
x47
x57
x67
x77
x87
x97

x18
x28
x38
x48
x58
x68
x78
x88
x98

x19
x29
x39
x49
x59
x69
x79
x89
x99

99

Processors = 0 1 2 3 4 5

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

64/131

GBD

Example GBD Distributions

Understanding GBD: Local View




x11 x12 x13 x14 x15 x16 x17 x18 x19


x21 x22 x23 x24 x25 x26 x27 x28 x29
x31 x32 x33 x34 x35 x36 x37 x38 x39
x41 x42 x43 x44 x45 x46 x47 x48 x49
x51 x52 x53 x54 x55 x56 x57 x58 x59
x61 x62 x63 x64 x65 x66 x67 x68 x69




x71 x72 x73 x74 x75 x76 x77 x78 x79


x81 x82 x83 x84 x85 x86 x87 x88 x89
x91 x92 x93 x94 x95 x96 x97 x98 x99

29

29

29

19

19

19

Processors = 0 1 2 3 4 5

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

65/131

GBD

Example GBD Distributions

Understanding GBD: Non-Balanced GBD

x =

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95

x16
x26
x36
x46
x56
x66
x76
x86
x96

x17
x27
x37
x47
x57
x67
x77
x87
x97

x18
x28
x38
x48
x58
x68
x78
x88
x98

x19
x29
x39
x49
x59
x69
x79
x89
x99

99

Processors = 0 1 2 3 4 5

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

66/131

GBD

Example GBD Distributions

Understanding GBD: Local View




09

x11
x21

x31
x41

x51
x61

x71



x12
x22
x32
x42

x13
x23
x33
x43

x14
x24
x34
x44

x15
x25
x35
x45

x16
x26
x36
x46

x17
x27
x37
x47

x18
x28
x38
x48

x52 x53 x54 x55 x56 x57 x58


x62 x63 x64 x65 x66 x67 x68
x72 x73 x74 x75 x76 x77 x78

x19
x29

x39
x49 49

x59
x69 29

x79 19

09

x81 x82 x83 x84 x85 x86 x87 x88 x89


x91 x92 x93 x94 x95 x96 x97 x98 x99

29

Processors = 0 1 2 3 4 5
http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

67/131

GBD

Summary

The Generalized Block Distribution


GBD: a Way to Distribute Your Data
Example GBD Distributions
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

GBD

Summary

Summary
Need to distribute your data? Try splitting by row.
May not work well if your data is square (or longer than tall).

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

68/131

Basic Statistics Examples

Contents

Basic Statistics Examples


pbdMPI Example: Monte Carlo Simulation
pbdMPI Example: Sample Covariance
pbdMPI Example: Linear Regression
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Basic Statistics Examples

pbdMPI Example: Monte Carlo Simulation

Basic Statistics Examples


pbdMPI Example: Monte Carlo Simulation
pbdMPI Example: Sample Covariance
pbdMPI Example: Linear Regression
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Basic Statistics Examples

pbdMPI Example: Monte Carlo Simulation

Example 1: Monte Carlo Simulation


Sample N uniform observations (xi , yi ) in the unit square [0, 1] [0, 1].
Then




# Blue
# Inside Circle
=4
4
# Total
# Blue + # Red
1.0

0.8

0.6

0.4

0.2

0.0

0.0

http://r-pbd.org/tutorial

0.2

0.4

ppbbddR
R Core Team

0.6

0.8

1.0

Programming with Big Data in R

69/131

Basic Statistics Examples

pbdMPI Example: Monte Carlo Simulation

Example 1: Monte Carlo Simulation GBD Algorithm


1

Let n be big-ish; well take n = 50, 000.

Generate an n 2 matrix x of standard uniform observations.

Count the number of rows satisfying x 2 + y 2 1

Ask everyone else what their answer is; sum it all up.

Take this new answer, multiply by 4 and divide by n

If my rank is 0, print the result.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

70/131

Basic Statistics Examples

pbdMPI Example: Monte Carlo Simulation

Example 1: Monte Carlo Simulation Code


Serial Code
1
2
3
4
5

N <- 50000
X <- matrix ( runif ( N * 2) , ncol =2)
r <- sum ( rowSums ( X ^2) <= 1)
PI <- 4 * r / N
print ( PI )

Parallel Code
1
2
3

library ( pbdMPI , quiet = TRUE )


init ()
comm . set . seed ( seed =1234567 , diff = TRUE )

4
5
6
7
8
9
10

N . gbd <- 50000 / comm . size ()


X . gbd <- matrix ( runif ( N . gbd * 2) , ncol = 2)
r . gbd <- sum ( rowSums ( X . gbd ^2) <= 1)
r <- allreduce ( r . gbd )
PI <- 4 * r / ( N . gbd * comm . size () )
comm . print ( PI )

11
12

finalize ()

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

71/131

Basic Statistics Examples

pbdMPI Example: Monte Carlo Simulation

Note
For the remainder, we will exclude loading, init, and finalize calls.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

72/131

Basic Statistics Examples

pbdMPI Example: Sample Covariance

Basic Statistics Examples


pbdMPI Example: Monte Carlo Simulation
pbdMPI Example: Sample Covariance
pbdMPI Example: Linear Regression
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Basic Statistics Examples

pbdMPI Example: Sample Covariance

Example 2: Sample Covariance


n

cov (xnp ) =

1 X
(xi x ) (xi x )T
n1
i=1

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

73/131

Basic Statistics Examples

pbdMPI Example: Sample Covariance

Example 2: Sample Covariance GBD Algorithm


1

Determine the total number of rows N.

Compute the vector of column means of the full matrix.

Subtract each columns mean from that columns entries in each local
matrix.

Compute the crossproduct locally and reduce.

Divide by N 1.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

74/131

Basic Statistics Examples

pbdMPI Example: Sample Covariance

Example 2: Sample Covariance Code


Serial Code
1
2

N <- nrow ( X )
mu <- colSums ( X ) / N

3
4
5

X <- sweep (X , STATS = mu , MARGIN =2)


Cov . X <- crossprod ( X ) / (N -1)

6
7

print ( Cov . X )

Parallel Code
1
2

N <- allreduce ( nrow ( X . gbd ) , op = " sum " )


mu <- allreduce ( colSums ( X . gbd ) / N , op = " sum " )

3
4
5

X . gbd <- sweep ( X . gbd , STATS = mu , MARGIN =2)


Cov . X <- allreduce ( crossprod ( X . gbd ) , op = " sum " ) / (N -1)

6
7

comm . print ( Cov . X )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

75/131

Basic Statistics Examples

pbdMPI Example: Linear Regression

Basic Statistics Examples


pbdMPI Example: Monte Carlo Simulation
pbdMPI Example: Sample Covariance
pbdMPI Example: Linear Regression
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Basic Statistics Examples

pbdMPI Example: Linear Regression

Example 3: Linear Regression


Find such that
y = X + 
When X is full rank,
= (X T X )1 X T y

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

76/131

Basic Statistics Examples

pbdMPI Example: Linear Regression

Example 3: Linear Regression GBD Algorithm


1

Locally, compute tx = x T

Locally, compute A = tx x. Query every other processor for this


result and sum up all the results.

Locally, compute B = tx y . Query every other processor for this


result and sum up all the results.

Locally, compute A1 B

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

77/131

Basic Statistics Examples

pbdMPI Example: Linear Regression

Example 3: Linear Regression Code


Serial Code
1
2
3

tX <- t ( X )
A <- tX % * % X
B <- tX % * % y

4
5

ols <- solve ( A ) % * % B

Parallel Code
1
2
3

tX . gbd <- t ( X . gbd )


A <- allreduce ( tX . gbd % * % X . gbd , op = " sum " )
B <- allreduce ( tX . gbd % * % y . gbd , op = " sum " )

4
5

ols <- solve ( A ) % * % B

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

78/131

Basic Statistics Examples

Summary

Basic Statistics Examples


pbdMPI Example: Monte Carlo Simulation
pbdMPI Example: Sample Covariance
pbdMPI Example: Linear Regression
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Basic Statistics Examples

Summary

Summary
SPMD programming is (often) a natural extension of serial
programming.
More pbdMPI examples in pbdDEMO.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

79/131

Data Input

Contents

Data Input
Cluster Computer and File System
Serial Data Input
Parallel Data Input
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Data Input

Cluster Computer and File System

Data Input
Cluster Computer and File System
Serial Data Input
Parallel Data Input
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Data Input

Cluster Computer and File System

One Node to One Storage Server


Cluster computer

File System
Storage
Server

Disk

Compute Nodes

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

80/131

Data Input

Cluster Computer and File System

Many Nodes to One Storage Server


Cluster computer

File System
Storage
Server

Disk

Compute Nodes

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

81/131

Data Input

Cluster Computer and File System

Many Nodes to Few Storage Servers


Cluster computer

Compute Nodes

http://r-pbd.org/tutorial

Parallel
File System

Storage
Servers

ppbbddR
R Core Team

Disk

Programming with Big Data in R

82/131

Data Input

Cluster Computer and File System

Few Nodes to Few Storage Servers - Default


Cluster computer

Compute Nodes

http://r-pbd.org/tutorial

I/O Nodes

ppbbddR
R Core Team

Parallel
File System

Storage
Servers

Disk

Programming with Big Data in R

83/131

Data Input

Cluster Computer and File System

Few Nodes to Few Storage Servers - Coordinated


Cluster computer

Compute Nodes
pbdADIOS

http://r-pbd.org/tutorial

I/O Nodes

Parallel
File System

Storage
Servers

Disk

ADIOS

ppbbddR
R Core Team

Programming with Big Data in R

84/131

Data Input

Serial Data Input

Data Input
Cluster Computer and File System
Serial Data Input
Parallel Data Input
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Data Input

Serial Data Input

Separate manual: http://r-project.org/


scan()
read.table()
read.csv()
socket

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

85/131

Data Input

Serial Data Input

CSV Data: Read Serial then Distribute


Listing:
1
2
3
4
5
6

library ( pbdDMAT )
if ( comm . rank () == 0) { # only read on process 0
x <- read . csv ( " myfile . csv " )
} else {
x <- NULL
}

7
8

dx <- as . ddmatrix ( x )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

86/131

Data Input

Parallel Data Input

Data Input
Cluster Computer and File System
Serial Data Input
Parallel Data Input
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Data Input

Parallel Data Input

New Issues
How to read in parallel?
CSV, SQL, NetCDF4, HDF, ADIOS, custom binary
How to partition data across nodes?
How to structure for scalable libraries?
Read directly into form needed or restructure?
...
A lot of work needed here!

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

87/131

Data Input

Parallel Data Input

CSV Data
Serial Code
1

x <- read . csv ( " x . csv " )

2
3

Parallel Code
1
2

library ( pbdDEMO , quiet = TRUE )


init . grid ()

3
4
5

dx <- read . csv . ddmatrix ( " x . csv " , header = TRUE , sep = " ," ,
nrows =10 , ncols =10 , num . rdrs =2 , ICTXT =0)

6
7

dx

8
9

finalize ()

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

88/131

Data Input

Parallel Data Input

Binary Data: Vector

1
2

# # set up start and length for reading a vector of n doubles


size <- 8 # bytes

3
4

my _ ids <- get . jid (n , method = " block " )

5
6
7

my _ start <- ( my _ ids [1] - 1) * size


my _ length <- length ( my _ ids )

8
9
10
11

con <- file ( " binary . vector . file " , " rb " )
seekval <- seek ( con , where = my _ start , rw = " read " )
x <- readBin ( con , what = " double " , n = my _ length , size = size )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

89/131

Data Input

Parallel Data Input

Binary Data: Matrix

1
2

# # read an nrow by ncol matrix of doubles split by columns


size <- 8 # bytes

3
4
5
6
7

my _ ids <- get . jid ( ncol , method = " block " )


my _ ncol <- length ( my _ ids )
my _ start <- ( my _ ids [1] - 1) *nrow* size
my _ length <- my _ ncol *nrow

8
9
10
11

con <- file ( " binary . matrix . file " , " rb " )
seekval <- seek ( con , where = my _ start , rw = " read " )
x <- readBin ( con , what = " double " , n = my _ length , size = size )

12
13
14
15
16
17
18

# # glue together as a column - block ddmatrix


gdim <- c ( nrow , ncol )
ldim <- c ( nrow , my _ ncol )
bldim <- c ( nrow , allreduce ( my _ ncol , op = " max " ) )
X <- new ( " ddmatrix " , Data = matrix (x , nrow , my _ ncol ) ,
dim = gdim , ldim = ldim , bldim = bldim , ICTXT =1)

19
20
21
22

# # redistribute for ScaLAPACK s block - cyclic


X <- redistribute (X , bldim = c (2 , 2) , ICTXT =0)
Xprc <- prcomp ( X )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

90/131

Data Input

Parallel Data Input

NetCDF4 Data
1
2

# ## parallel read after determining start and length


nc <- nc _ open _ par ( file _ name )

3
4
5
6

nc _ var _ par _ access ( nc , " variable _ name " )


new . X <- ncvar _ get ( nc , " variable _ name " , start , length )
nc _ close ( nc )

7
8

finalize ()

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

91/131

Data Input

Summary

Data Input
Cluster Computer and File System
Serial Data Input
Parallel Data Input
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Data Input

Summary

Summary
Mostly do it yourself
Parallel file system for big data
Binary files for true parallel reads
Know number of readers vs number of storage servers

Redistribution help from ddmatrix functions


More help under development

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

92/131

Introduction to pbdDMAT and the ddmatrix Structure

Contents

Introduction to pbdDMAT and the ddmatrix Structure


Introduction to Distributed Matrices
pbdDMAT
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

Introduction to pbdDMAT and the ddmatrix Structure


Introduction to Distributed Matrices
pbdDMAT
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

Distributed Matrices
You can only get so far with one node. . .

1000

Seconds Elapsed time

Script

100

Inverse of a 10000x10000 random matrix

Linear regression over a 10000x10000 matrix (c = a \\ b')

10000x10000 crossproduct matrix (b = a' * a)

Determinant of a 10000x10000 random matrix

Cholesky decomposition of a 10000x10000 matrix

Eigenvalues of a 10000x10000 random matrix

10

16

64

256

Cores

The solution is to distribute the data.


http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

93/131

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

Distributed Matrices

(a) Block

(b) Cyclic

(c) Block-Cyclic

Figure: Matrix Distribution Schemes

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

94/131

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

Distributed Matrices

(a) 2d Block

(b) 2d Cyclic

(c) 2d Block-Cyclic

Figure: Matrix Distribution Schemes Onto a 2-Dimensional Grid

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

95/131

Introduction to pbdDMAT and the ddmatrix Structure

Processor Grid Shapes


T

0
1


4
0
3
5
(a) 1 6

1
4

2
5

(b) 2 3

Introduction to Distributed Matrices

0
2
4

1
3
5

(c) 3 2

0
1
2
3
4
5

(d) 6 1

Table: Processor Grid Shapes with 6 Processors

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

96/131

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

The ddmatrix Class


For distributed dense matrix objects, we use the special S4 class
ddmatrix.

Data

dim
ddmatrix = ldim

bldim

ICTXT

http://r-pbd.org/tutorial

The local submatrix (an R matrix)


Dimension of the global matrix
Dimension of the local submatrix
Dimension of the blocks
MPI Grid Context

ppbbddR
R Core Team

Programming with Big Data in R

97/131

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

Understanding ddmatrix: Global Matrix

x =

http://r-pbd.org/tutorial

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95

ppbbddR
R Core Team

x16
x26
x36
x46
x56
x66
x76
x86
x96

x17
x27
x37
x47
x57
x67
x77
x87
x97

x18
x28
x38
x48
x58
x68
x78
x88
x98

x19
x29
x39
x49
x59
x69
x79
x89
x99

Programming with Big Data in R

99

98/131

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

ddmatrix: 1-dimensional Row Block

x =

http://r-pbd.org/tutorial

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95

0

1
Processor grid =
2
3
ppbbddR
R Core Team

x16 x17 x18


x26 x27 x28
x36 x37 x38
x46 x47 x48
x56 x57 x58
x66 x67 x68
x76 x77 x78
x86 x87 x88
x96 x97 x98


(0,0)


(1,0)
=

(2,0)


(3,0)

x19
x29
x39
x49
x59
x69
x79
x89
x99

Programming with Big Data in R

99

99/131

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

ddmatrix: 2-dimensional Row Block

x =

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95


0 1
Processor grid =
2 3
http://r-pbd.org/tutorial

ppbbddR
R Core Team

x16
x26
x36
x46
x56
x66
x76
x86
x96

x17
x27
x37
x47
x57
x67
x77
x87
x97


(0,0)
=
(1,0)

x18
x28
x38
x48
x58
x68
x78
x88
x98

x19
x29
x39
x49
x59
x69
x79
x89
x99

99


(0,1)
(1,1)

Programming with Big Data in R

100/131

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

ddmatrix: 1-dimensional Row Cyclic

x =

http://r-pbd.org/tutorial

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95

0

1
Processor grid =
2
3
ppbbddR
R Core Team

x16 x17 x18


x26 x27 x28
x36 x37 x38
x46 x47 x48
x56 x57 x58
x66 x67 x68
x76 x77 x78
x86 x87 x88
x96 x97 x98


(0,0)


(1,0)
=

(2,0)


(3,0)

x19
x29
x39
x49
x59
x69
x79
x89
x99

Programming with Big Data in R

99

101/131

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

ddmatrix: 2-dimensional Row Cyclic

x =

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95


0 1
Processor grid =
2 3
http://r-pbd.org/tutorial

ppbbddR
R Core Team

x16
x26
x36
x46
x56
x66
x76
x86
x96

x17
x27
x37
x47
x57
x67
x77
x87
x97


(0,0)
=
(1,0)

x18
x28
x38
x48
x58
x68
x78
x88
x98

x19
x29
x39
x49
x59
x69
x79
x89
x99

99


(0,1)
(1,1)

Programming with Big Data in R

102/131

Introduction to pbdDMAT and the ddmatrix Structure

Introduction to Distributed Matrices

ddmatrix: 2-dimensional Block-Cyclic

x =

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95


0 1
Processor grid =
2 3
http://r-pbd.org/tutorial

ppbbddR
R Core Team

x16
x26
x36
x46
x56
x66
x76
x86
x96

x17
x27
x37
x47
x57
x67
x77
x87
x97


(0,0)
=
(1,0)

x18
x28
x38
x48
x58
x68
x78
x88
x98

x19
x29
x39
x49
x59
x69
x79
x89
x99

99


(0,1)
(1,1)

Programming with Big Data in R

103/131

Introduction to pbdDMAT and the ddmatrix Structure

pbdDMAT

Introduction to pbdDMAT and the ddmatrix Structure


Introduction to Distributed Matrices
pbdDMAT
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction to pbdDMAT and the ddmatrix Structure

pbdDMAT

The ddmatrix Data Structure


The more complicated the processor grid, the more complicated the
distribution.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

104/131

Introduction to pbdDMAT and the ddmatrix Structure

pbdDMAT

ddmatrix: 2-dimensional Block-Cyclic with 6 Processors

x =

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95

x16
x26
x36
x46
x56
x66
x76
x86
x96

x17
x27
x37
x47
x57
x67
x77
x87
x97



0 1 2 (0,0)

=
Processor grid =
3 4 5 (1,0)
http://r-pbd.org/tutorial

ppbbddR
R Core Team

x18
x28
x38
x48
x58
x68
x78
x88
x98

x19
x29
x39
x49
x59
x69
x79
x89
x99

(0,1)
(1,1)

Programming with Big Data in R

99


(0,2)
(1,2)
105/131

Introduction to pbdDMAT and the ddmatrix Structure

pbdDMAT

Understanding ddmatrix: Local View

x11
x21
x51
x61
x91

x12
x22
x52
x62
x92

x17
x27
x57
x67
x97

x31
x41

x71
x81

x32
x42
x72
x82

x37
x47
x77
x87

x18
x28
x58
x68
x98

54

x38

x48

x78
x88 44

x13
x23
x53
x63
x93

x14
x24
x54
x64
x94

x33
x43
x73
x83

x34
x44
x74
x84

x19
x29
x59
x69
x99

ppbbddR
R Core Team

53

x39

x49

x79
x89 43



0 1 2 (0,0)

=
Processor grid =
3 4 5 (1,0)
http://r-pbd.org/tutorial

x15
x25
x55
x65
x95
x35
x45
x75
x85

(0,1)
(1,1)

Programming with Big Data in R

x16
x26
x56
x66
x96

52

x36
x46

x76
x86 42


(0,2)
(1,2)
106/131

Introduction to pbdDMAT and the ddmatrix Structure

pbdDMAT

The ddmatrix Data Structure


1

ddmatrix is distributed. No one processor owns all


of the matrix.
ddmatrix is non-overlapping. Any piece owned by
one processor is owned by no other processors.
ddmatrix can be row-contiguous or not,
depending on the processor grid and blocking
factor used.

x11
x21
x31
x41
x51
x61
x71
x81
x91

x12
x22
x32
x42
x52
x62
x72
x82
x92

x13
x23
x33
x43
x53
x63
x73
x83
x93

ddmatrix is locally column-major and globally, it


depends. . .

GBD is a generalization of the one-dimensional block ddmatrix distribution.


Otherwise there is no relation.

ddmatrix is confusing, but very robust.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

x14
x24
x34
x44
x54
x64
x74
x84
x94

x15
x25
x35
x45
x55
x65
x75
x85
x95

107/131

Introduction to pbdDMAT and the ddmatrix Structure

pbdDMAT

Pros and Cons of This Data Structure


Pros

Cons
Confusing layout.

Robust for matrix


computations.

This is why we hide most of the distributed details.


The details are there if you want them (you dont want them).

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

108/131

Introduction to pbdDMAT and the ddmatrix Structure

pbdDMAT

Methods for class ddmatrix


pbdDMAT has over 100 methods with identical syntax to R:
`[`, rbind(), cbind(), . . .
lm.fit(), prcomp(), cov(), . . .
`%*%`, solve(), svd(), norm(), . . .
median(), mean(), rowSums(), . . .
Serial Code
1

cov ( x )

Parallel Code
1

cov ( x )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

109/131

Introduction to pbdDMAT and the ddmatrix Structure

pbdDMAT

Comparing pbdMPI and pbdDMAT


pbdMPI:
MPI + sugar.
GBD not the only structure pbdMPI can handle (just a useful
convention).
pbdDMAT:
Distributed matrices + statistics.
The ddmatrix structure must be used for pbdDMAT.
If the data is not 2d block-cyclic compatible, ddmatrix will definitely
give the wrong answer.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

110/131

Introduction to pbdDMAT and the ddmatrix Structure

Summary

Introduction to pbdDMAT and the ddmatrix Structure


Introduction to Distributed Matrices
pbdDMAT
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Introduction to pbdDMAT and the ddmatrix Structure

Summary

Summary
1

Start by loading the package:


1

library ( pbdDMAT , quiet = TRUE )

Always initialize before starting and finalize when finished:


1

init . grid ()

2
3

# ...

4
5

finalize ()

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

111/131

Examples Using pbdDMAT

Contents

Examples Using pbdDMAT


RandSVD
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Examples Using pbdDMAT

RandSVD

Examples Using pbdDMAT


RandSVD
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Examples Using pbdDMAT

RandSVD

Randomized
SVD1
PROBABILISTIC ALGORITHMS FOR MATRIX APPROXIMATION

227

Prototype for Randomized SVD


Given an m n matrix A, a target number k of singular vectors, and an
exponent q (say, q = 1 or q = 2), this procedure computes an approximate
rank-2k factorization U V , where U and V are orthonormal, and is
nonnegative and diagonal.
Stage A:
244 1 Generate an n N.
HALKO, P. G. MARTINSSON, AND J. A. TROPP
2k
Gaussian test matrix .
2
Form Y = (AA )q A by multiplying alternately with A and A .
Algorithm
Randomized
Power
Iterationbasis for
3
Construct a matrix Q4.3:
whose
columns form
an orthonormal
Given
m ofn Ymatrix
A and integers  and q, this algorithm computes an
thean
range
.
m


orthonormal
matrix
Q
whose
range
approximates
the range of A.
Stage B:
Draw B
an=n Q
A.
Gaussian random matrix .
41 Form
2
Form the m  matrix Y = (AA )q A via alternating
application

5
Compute anSVD of the small matrix: B = U V .
of A and A .
6
Set U = QU
.
3
Construct an m  matrix Q whose columns form an orthonormal
Note: The computation of Y in step 2 is vulnerable to round-o errors.
basis for the range of Y , e.g., via the QR factorization Y = QR.
When high accuracy is required, we must incorporate an orthonormalization
Note: This procedure is vulnerable to round-o errors; see Remark 4.3. The
step between each application of A and A ; see Algorithm 4.4.
recommended implementation appears as Algorithm 4.4.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
The theoryAlgorithm
developed in4.4:
thisRandomized
paper provides
much
more
detailed
information
Subspace Iteration
15
about
the performance
of theAproto-algorithm.
Given
an m n matrix
and integers  and q, this algorithm computes an 16

the singular
values
of A decay
the error
A of
QQ
mWhen
 orthonormal
matrix
Q whose
range slightly,
approximates
the range
A.A does17
depend
on the
dimensions
of the
matrix
1 not
Draw
an n
standard
Gaussian
matrix
.(sections 10.210.3).
canY0reduce
of the bracket
in the error bound
by combining18
2 We
Form
= Athe
andsize
compute
its QR factorization
Y0 = Q(1.8)
0 R0 .
3 the
for proto-algorithm
j = 1, 2, . . . , q with a power iteration (section 10.4). For an example,19
20
below.


 
4 see section
Form Y1.6
j = A Qj1 and compute its QR factorization Yj = Qj Rj .
For the
structured
matrices we mentioned in section 1.4.1, related21
random
5
Form
Yj = AQ
j and compute its QR factorization Yj = Qj Rj .
error
22
6
end bounds are in force (section 11).
7

We can obtain inexpensive a posteriori error estimates to verify the quality


Q = Qq .
of the approximation (section 4.3).

Serial R
randSVD < f u n c t i o n (A , k , q=3)
{
## S t a g e A
Omega < matrix(rnorm(n*2*k),
nrow=n, ncol=2*k)
Y < A %% Omega
Q < q r .Q( q r (Y) )
At < t (A)
for ( i in 1: q)
{
Y < At %% Q
Q < q r .Q( q r (Y) )
Y < A %% Q
Q < q r .Q( q r (Y) )
}
## S t a g e B
B < t (Q) %% A
U < La . s v d (B) $u
U < Q %% U
U[ , 1: k ]
}

1 1.6. Example: Randomized SVD. We conclude this introduction with a short


Halko N, Martinsson P-G and Tropp J A 2011 Finding structure with randomness: probabilistic algorithms
discussion
of how
ideasthe
allow
us to perform
an approximate
of a large data
Algorithm
4.3these
targets
xed-rank
problem.
To address SVD
the xed-precision
approximate
decompositions
Rev. matrix
53 in21788
matrix, which
ismatrix
aincorporate
compelling
application
ofSIAM
randomized
approximation
[113].
problem,
we can
the
error estimators
described
section
4.3 to obtain
The
two-stage
randomized
method
oers
a
natural
approach
to
compuan adaptive scheme analogous with Algorithm 4.2. In situations where itSVD
is critical
to
tations. near-optimal
Unfortunately,
the simplesterrors,
version
this
schemethe
is oversampling
inadequate inbeyond
many
achieve
approximation
oneofcan
increase
applications
the singular
of the
matrix
decay slowly.
To
our
standardbecause
recommendation
 spectrum
= k + 5 all
the input
way to
 = may
2k without
changing
ppbbddR
http://r-pbd.org/tutorial
Team
with Big Data in R
address
thisofdiculty,
we incorporate
q stepscost.
of a Apower
iteration,
whereappears
q =Programming
1 or
R Core
the
scaling
the asymptotic
computational
supporting
analysis
in

for constructing

112/131

Examples Using pbdDMAT

RandSVD

Randomized SVD

Serial R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

Parallel pbdR

randSVD < f u n c t i o n (A , k , q=3)


{
## S t a g e A
Omega < matrix(rnorm(n*2*k),
nrow=n, ncol=2*k)
Y < A %% Omega
Q < q r .Q( q r (Y) )
At < t (A)
for ( i in 1: q)
{
Y < At %% Q
Q < q r .Q( q r (Y) )
Y < A %% Q
Q < q r .Q( q r (Y) )
}
## S t a g e B
B < t (Q) %% A
U < La . s v d (B) $u
U < Q %% U
U[ , 1: k ]
}

http://r-pbd.org/tutorial

ppbbddR
R Core Team

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

randSVD < f u n c t i o n (A , k , q=3)


{
## S t a g e A
Omega < ddmatrix(rnorm,
nrow=n, ncol=2*k)
Y < A %% Omega
Q < q r .Q( q r (Y) )
At < t (A)
for ( i in 1: q)
{
Y < At %% Q
Q < q r .Q( q r (Y) )
Y < A %% Q
Q < q r .Q( q r (Y) )
}
## S t a g e B
B < t (Q) %% A
U < La . s v d (B) $u
U < Q %% U
U[ , 1: k ]
}

Programming with Big Data in R

113/131

Examples Using pbdDMAT

RandSVD

Randomized SVD
30 Singular Vectors from a 100,000 by 1,000 Matrix
Algorithm

full

30 Singular Vectors from a 100,000 by 1,000 Matrix


Speedup of Randomized vs. Full SVD

randomized

128
15
64

Speedup

16

Speedup

32

10

2
1

16

32

64

128

Cores

http://r-pbd.org/tutorial

16

32

64

128

Cores

ppbbddR
R Core Team

Programming with Big Data in R

114/131

Examples Using pbdDMAT

Summary

Examples Using pbdDMAT


RandSVD
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Examples Using pbdDMAT

Summary

Summary
pbdDMAT makes distributed (dense) linear algebra easier.
Can enable rapid prototyping at large scale.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

115/131

MPI Profiling

Contents

10

MPI Profiling
Profiling with the pbdPROF Package
Installing pbdPROF
Example
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

MPI Profiling

10

Profiling with the pbdPROF Package

MPI Profiling
Profiling with the pbdPROF Package
Installing pbdPROF
Example
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

MPI Profiling

Profiling with the pbdPROF Package

Introduction to pbdPROF
Successful Google Summer of Code 2013 project.
Available on the CRAN.
Enables profiling of MPI-using R scripts.
pbdR packages officially supported; can work with others. . .
Also reads, parses, and plots profiler outputs.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

116/131

MPI Profiling

Profiling with the pbdPROF Package

How it works
MPI calls get hijacked by profiler and logged:

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

117/131

MPI Profiling

Profiling with the pbdPROF Package

Introduction to pbdPROF
Currently supports the profilers fpmpi and mpiP.
fpmpi is distributed with pbdPROF and installs easily, but offers
minimal profiling capabilities.
mpiP is fully supported also, but you have to install and link it
yourself.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

118/131

MPI Profiling

10

Installing pbdPROF

MPI Profiling
Profiling with the pbdPROF Package
Installing pbdPROF
Example
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

MPI Profiling

Installing pbdPROF

Installing pbdPROF
1

Build pbdPROF.

Rebuild pbdMPI (linking with pbdPROF).

Run your analysis as usual.

Interactively analyze profiler outputs with pbdPROF.

This is explained at length in the pbdPROF vignette.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

119/131

MPI Profiling

Installing pbdPROF

Rebuild pbdMPI
R CMD INSTALL pbdMPI _ 0.2 -2. tar . gz
-- configure - args = " -- enable - pbdPROF "

Any package which explicitly links with an MPI library must be rebuilt
in this way (pbdMPI, Rmpi, . . . ).
Other pbdR packages link with pbdMPI, and so do not need to be
rebuilt.
See pbdPROF vignette if something goes wrong.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

120/131

MPI Profiling

10

Example

MPI Profiling
Profiling with the pbdPROF Package
Installing pbdPROF
Example
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

MPI Profiling

Example

An Example from pbdDMAT


Compute SVD in pbdDMAT package.
Profile MPI calls with mpiP.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

121/131

MPI Profiling

Example

Example Script
my svd.r
1
2
3

library ( pbdMPI , quietly = TRUE )


library ( pbdDMAT , quietly = TRUE )
init . grid ()

4
5
6
7

n <- 1000
x <- ddmatrix ( " rnorm " , n , n )

8
9

my . svd <- La . svd ( x )

10
11
12

finalize ()

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

122/131

MPI Profiling

Example

Example Script
Run example with 4 ranks:
$ mpirun - np 4 Rscript my _ svd . r
mpiP :
mpiP : mpiP : mpiP V3 .3.0 ( Build Sep 23 2013 / 14:00:47)
mpiP : Direct questions and errors to
mpip - help@lists . sourceforge . net
mpiP :
Using 2 x2 for the default grid size
mpiP :
mpiP : Storing mpiP output in [. / R .4.5944.1. mpiP ].
mpiP :

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

123/131

MPI Profiling

Example

Read Profiler Data into R


Interactively (or in batch) Read in Profiler Data
1
2

library ( pbdPROF )
prof . data <- read . prof ( " R .4.28812.1. mpiP " )

Partial Output of Example Data


> prof . data
An mpip profiler object :
[[1]]
Task AppTime MPITime MPI .
1
0
5.71 0.0387 0.68
2
1
5.70 0.0297 0.52
3
2
5.71 0.0540 0.95
4
3
5.71 0.0355 0.62
5
*
22.80 0.1580 0.69
[[2]]
ID Lev File . Address Line _ Parent _ Funct MPI _ Call
1
1
0 1.397301 e +14
[ unknown ] Allreduce
2
2
0 1.397301 e +14
[ unknown ]
Bcast

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

124/131

MPI Profiling

Example

Generate plots
1

plot ( prof . data )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

125/131

MPI Profiling

Example

Generate plots
1

plot ( prof . data , plot . type = " stats1 " )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

126/131

MPI Profiling

Example

Generate plots
1

plot ( prof . data , plot . type = " stats2 " )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

127/131

MPI Profiling

Example

Generate plots
1

plot ( prof . data , plot . type = " messages1 " )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

128/131

MPI Profiling

Example

Generate plots
1

plot ( prof . data , plot . type = " messages2 " )

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

129/131

MPI Profiling

10

Summary

MPI Profiling
Profiling with the pbdPROF Package
Installing pbdPROF
Example
Summary

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

MPI Profiling

Summary

Summary
pbdPROF offers tools for profiling R-using MPI codes.
Easily builds fpmpi; also supports mpiP.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

130/131

Wrapup

Contents

11

Wrapup

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Wrapup

Summary
Profile your code to understand your bottlenecks.
pbdR makes distributed parallelism with R easier.
Distributing data to multiple nodes
For truly large data, I/O must be parallel as well.

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Wrapup

The pbdR Project


Our website: http://r-pbd.org/
Email us at: RBigData@gmail.com
Our google group: http://group.r-pbd.org/
Where to begin?
The pbdDEMO package
http://cran.r-project.org/web/packages/pbdDEMO/
The pbdDEMO Vignette: http://goo.gl/HZkRt

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R

Thanks for coming!

Questions?

http://r-pbd.org/
Come see our poster on Wednesday at 5:30!

http://r-pbd.org/tutorial

ppbbddR
R Core Team

Programming with Big Data in R