Roman Leshchinskiy
What is this about?
What I do
Data Parallel Haskell
compiles nested data-parallel programs to flat data-parallel ones
lots of arrays and collective operations involved
zipWith (-)
  (zipWith (*)
     (zipWith (-) (replicate_s segd as1) xs)
     (zipWith (-) (replicate_s segd bs1) ys))
  (zipWith (*)
     (zipWith (-) (replicate_s segd bs2) ys)
     (zipWith (-) (replicate_s segd as2) xs))
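To make the number of temporary arrays in this flat code concrete, here is a small list-based model. It assumes that a segment descriptor can be represented as a list of segment lengths and that `replicate_s segd as` repeats the i-th element of `as` once for every element of the i-th segment. `Segd`, `replicateS` and `flatExpr` are hypothetical names, and Haskell lists stand in for DPH's arrays.

```haskell
-- Hypothetical model: a segment descriptor is just the list of segment lengths.
type Segd = [Int]

-- Model of replicate_s: repeat the i-th element once per element of segment i.
replicateS :: Segd -> [a] -> [a]
replicateS segd vals = concat (zipWith replicate segd vals)

-- The flattened expression from the slide. Each zipWith and replicateS builds
-- a new list (with strict arrays: a fully materialised intermediate array).
-- 4 replicateS + 7 zipWith = 11 arrays, 10 of which are temporaries.
flatExpr :: Num a => Segd -> [a] -> [a] -> [a] -> [a] -> [a] -> [a] -> [a]
flatExpr segd as1 bs1 as2 bs2 xs ys =
  zipWith (-)
    (zipWith (*)
       (zipWith (-) (replicateS segd as1) xs)
       (zipWith (-) (replicateS segd bs1) ys))
    (zipWith (*)
       (zipWith (-) (replicateS segd bs2) ys)
       (zipWith (-) (replicateS segd as2) xs))
```

For example, with two segments of lengths 2 and 1, `flatExpr [2,1] [1,2] [3,4] [5,6] [7,8] [0,0,0] [0,0,0]` evaluates to `[-32,-32,-40]`.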
What other people do
array programs with lots of collective operations
What everybody wants
no temporary arrays
fused loops
C-like speed
Loop fusion is easy!
RULES
"map/map"          map f (map g xs) = map (f . g) xs
"filter/filter"    filter f (filter g xs)
                     = filter (λ x → g x && f x) xs
"map/filter"       map f (filter g xs) = mapFilter f g xs
"map/mapFilter"    map f (mapFilter g h xs)
                     = mapFilter (f . g) h xs
"mapFilter/filter" mapFilter f g (filter h xs)
                     = mapFilter f (λ x → h x && g x) xs
...
BAD IDEA: every pair of combinators needs its own rule
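For reference, this is roughly how such rules look as an actual GHC `RULES` pragma. It is a sketch: `mapA`, `filterA` and `mapFilter` are hypothetical list-based stand-ins for the array operations, marked `NOINLINE` so that they are not inlined before the rules get a chance to fire. The combined predicates test the inner predicate first, so the rewritten code never applies the outer predicate to an element the inner filter would already have dropped. Five rules already appear for only three combinators, and each new combinator would have to be paired with all existing ones.

```haskell
-- Hypothetical stand-ins for the array operations.
mapA :: (a -> b) -> [a] -> [b]
mapA = map
{-# NOINLINE mapA #-}

filterA :: (a -> Bool) -> [a] -> [a]
filterA = filter
{-# NOINLINE filterA #-}

-- mapFilter f p xs = map f (filter p xs), in a single pass.
mapFilter :: (a -> b) -> (a -> Bool) -> [a] -> [b]
mapFilter f p xs = [f x | x <- xs, p x]
{-# NOINLINE mapFilter #-}

-- One rule per pair of combinators. The inner predicate (q) is tested first.
{-# RULES
"mapA/mapA"         forall f g xs.   mapA f (mapA g xs)           = mapA (f . g) xs
"filterA/filterA"   forall p q xs.   filterA p (filterA q xs)     = filterA (\x -> q x && p x) xs
"mapA/filterA"      forall f p xs.   mapA f (filterA p xs)        = mapFilter f p xs
"mapA/mapFilter"    forall f g p xs. mapA f (mapFilter g p xs)    = mapFilter (f . g) p xs
"mapFilter/filterA" forall f p q xs. mapFilter f p (filterA q xs) = mapFilter f (\x -> q x && p x) xs
  #-}
```

Whether or not the rules fire (they are only applied when compiling with optimisation), results are unchanged; for instance `filterA (\x -> 10 `div` x > 1) (filterA (/= 0) [0, 2, 5, 20])` is `[2,5]` and never divides by zero.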
The challenge
map :: (a → b) → Array a → Array b
map f = unstream . mapS f . stream
Step 2: inline them
sumsq :: Num a ⇒ Array a → a
sumsq = sum . map (λx → x*x)
inline
mapS :: (a → b) → Stream a → Stream b
mapS f (Stream step s) = Stream step’ s
  where step’ s = case step s of
                    Yield x s’ → Yield (f x) s’
                    Done → Done
Let GHC do the rest
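A minimal, self-contained sketch of the stream machinery behind `map f = unstream . mapS f . stream`, with Haskell lists standing in for arrays. `Step`, `Stream`, `Yield`, `Done` and `mapS` follow the slide; the list versions of `stream` and `unstream`, and `sumS`, `mapL`, `sumL` and `sumsq`, are illustrative assumptions. The `stream/unstream` rule is not on the slide: it is the standard rule of the stream fusion approach that cancels the `unstream` produced by one operation against the `stream` consumed by the next, so that only non-recursive step functions remain to be inlined.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- One step of a stream: an element plus the next state, or the end.
data Step s a = Yield a s | Done

-- A stream is a step function together with a hidden (existential) seed state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Turn a list into a stream; the state is the remaining list.
stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs
{-# INLINE [1] stream #-}

-- Run a stream to completion, collecting its elements.
unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Yield x s' -> x : go s'
      Done       -> []
{-# INLINE [1] unstream #-}

-- Removes the intermediate list between two stream-based operations.
{-# RULES "stream/unstream" forall s. stream (unstream s) = s #-}

-- mapS as on the slide: non-recursive, it only wraps the step function.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Yield x s' -> Yield (f x) s'
      Done       -> Done
{-# INLINE mapS #-}

-- A stream consumer: the only recursive loop.
sumS :: Num a => Stream a -> a
sumS (Stream next s0) = go 0 s0
  where
    go z s = case next s of
      Yield x s' -> go (z + x) s'
      Done       -> z
{-# INLINE sumS #-}

-- List operations expressed through streams.
mapL :: (a -> b) -> [a] -> [b]
mapL f = unstream . mapS f . stream
{-# INLINE mapL #-}

sumL :: Num a => [a] -> a
sumL = sumS . stream
{-# INLINE sumL #-}

sumsq :: Num a => [a] -> a
sumsq = sumL . mapL (\x -> x * x)
```

After inlining `sumL` and `mapL`, `sumsq` becomes `sumS . stream . unstream . mapS (\x -> x * x) . stream`, and the rule removes the `stream . unstream` pair in the middle.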
Optimising stream operations
sumsq xs = go 0 0
  where
    step1 i = case i < length xs of
                True → Yield (xs ! i) (i+1)
                False → Done
    step2 i = case step1 i of
                Yield x i’ → Yield (square x) i’
                Done → Done
    go z i = case step2 i of
               Yield x i’ → go (z+x) i’
               Done → z
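The listing above runs as ordinary Haskell. In this sketch `Data.Array` (from GHC's `array` package) stands in for DPH's arrays, `square` is assumed to be `\x -> x * x`, and `sumsqSteps` is a hypothetical name. Read literally, `go` receives a `Step` value from `step2` on every iteration; removing those values is the job of the transformations that follow.

```haskell
import Data.Array (Array, listArray, (!))

-- Step type of the stream framework.
data Step s a = Yield a s | Done

square :: Num a => a -> a
square x = x * x

-- Before optimisation: step1 produces elements, step2 squares them
-- (the step function of mapS), and go is the summing loop.
sumsqSteps :: Num a => Array Int a -> a
sumsqSteps xs = go 0 0
  where
    -- Array is Foldable, so length gives the number of elements.
    step1 i = case i < length xs of
      True  -> Yield (xs ! i) (i + 1)
      False -> Done
    step2 i = case step1 i of
      Yield x i' -> Yield (square x) i'
      Done       -> Done
    go z i = case step2 i of
      Yield x i' -> go (z + x) i'
      Done       -> z
```

For a zero-based array such as `listArray (0, 2) [1, 2, 3]`, `sumsqSteps` returns `14`.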
inline
sumsq xs = go 0 0
  where
    step1 i = case i < length xs of
                True → Yield (xs ! i) (i+1)
                False → Done
    go z i = case step1 i of
               Yield x i’ → go (z + square x) i’
               Done → z
inline
sumsq xs = go 0 0
  where
    go z i = case (case i < length xs of
                     True → Yield (xs ! i) (i+1)
                     False → Done) of
               Yield x i’ → go (z + square x) i’
               Done → z
case of case
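The case-of-case step can be checked on a smaller example. In this sketch (hypothetical names `goBefore` and `goAfter`; the array is replaced by the indices `0 .. n-1` themselves) the first function has the shape produced by inlining `step1`. The second is what case-of-case, followed by case-of-known-constructor, leaves behind: each inner branch now flows directly into the matching outer alternative, so `Yield` and `Done` are never built.

```haskell
data Step s a = Yield a s | Done

square :: Int -> Int
square x = x * x

-- Shape after inlining step1: the outer case scrutinises an inner case.
goBefore :: Int -> Int -> Int -> Int
goBefore n z i =
  case (case i < n of { True -> Yield i (i + 1); False -> Done }) of
    Yield x i' -> goBefore n (z + square x) i'
    Done       -> z

-- After case-of-case and case-of-known-constructor:
-- True leads straight to the Yield alternative, False to the Done one.
goAfter :: Int -> Int -> Int -> Int
goAfter n z i =
  case i < n of
    True  -> goAfter n (z + square i) (i + 1)
    False -> z
```

Both compute the sum of squares of `0 .. n-1`; for example, `goBefore 4 0 0` and `goAfter 4 0 0` both give `14`.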
sumsq xs = go 0 0
  where
    go z i = case i < length xs of
               True → go (z + square (xs ! i)) (i+1)
               False → z
optimal loop
no Stream or Step values ever created
only general-purpose optimisations (inlining, case-of-case)
will be optimised further (unboxing etc.)
requires a great compiler (thanks GHC team!)
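The final loop as runnable Haskell, again assuming `Data.Array` for arrays and `square x = x * x`; `sumsqLoop` is a hypothetical name. Each iteration only tests a `Bool`, indexes the array and updates the accumulator; no `Stream` or `Step` constructor appears.

```haskell
import Data.Array (Array, listArray, (!))

square :: Num a => a -> a
square x = x * x

-- The loop left after inlining step1/step2 and applying case-of-case.
sumsqLoop :: Num a => Array Int a -> a
sumsqLoop xs = go 0 0
  where
    go z i = case i < length xs of
      True  -> go (z + square (xs ! i)) (i + 1)
      False -> z
```

On `listArray (0, 3) [1, 2, 3, 4]` this returns `30`, the same as summing the squared elements directly.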
Why does it work?
sumsq xs = go 0 0
  where
    step1 i = case i < length xs of
                True → Yield (xs ! i) (i+1)
                False → Done
    step2 i = case step1 i of
                Yield x i’ → Yield (square x) i’
                Done → Done
    go z i = case step2 i of
               Yield x i’ → go (z+x) i’
               Done → z
step1, step2: non-recursive, so GHC can inline them into go
go: recursive, the only loop that remains
A slight problem
RULES
splitD (joinD xs) = xs
Fusing distributed types
RULES
splitD (joinD xs) = xs
mapD f (mapD g xs) = mapD (f . g) xs
Distributed types on multicores
splitD: scatter
joinD: gather
mapD: execute operation on each node
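A sequential, list-based model of these three operations, useful for experimenting with the rules above. Assumptions: `Dist` is a hypothetical list with one component per node, the node count is passed to `splitD` explicitly (on the slides it is implicit), and nodes are simulated one after another rather than in parallel. In this model `splitD n (joinD d) = d` holds when `d` has the chunking that `splitD n` itself produces, which is the case when a `joinD` is directly followed by a `splitD` in pipelines such as `mapPar n f . mapPar n g`.

```haskell
-- Hypothetical model of a distributed value: one component per node.
newtype Dist a = Dist [a] deriving (Eq, Show)

-- splitD (scatter): cut a list into n nearly equal chunks, one per node.
-- Assumes n > 0; the first (length xs `mod` n) chunks get one extra element.
splitD :: Int -> [a] -> Dist [a]
splitD n xs = Dist (chunks sizes xs)
  where
    (q, r) = length xs `divMod` n
    sizes  = replicate r (q + 1) ++ replicate (n - r) q
    chunks []       _  = []
    chunks (s : ss) ys = let (c, rest) = splitAt s ys in c : chunks ss rest

-- joinD (gather): concatenate the per-node chunks.
joinD :: Dist [a] -> [a]
joinD (Dist cs) = concat cs

-- mapD: apply an operation on each node (sequentially in this model).
mapD :: (a -> b) -> Dist a -> Dist b
mapD f (Dist cs) = Dist (map f cs)

-- A "parallel" map assembled from the three operations.
mapPar :: Int -> (a -> b) -> [a] -> [b]
mapPar n f = joinD . mapD (map f) . splitD n
```

For example, `splitD 3 [1..7]` gives `Dist [[1,2,3],[4,5],[6,7]]`.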
[Plot: Runtime @ greyarea; log-scale y-axis from 1 to 10000, x-axis values 1, 2, 4, 8, 16, 32, 64]