Estimating Pi example

scala> val count = sc.parallelize(1 to 10).map { i =>
     | val x = Math.random()
     | val y = Math.random()
     | if (x*x + y*y < 1) 1 else 0
     | }.reduce(_ + _)
14/07/06 14:02:48 INFO spark.SparkContext: Starting job: reduce at <console>:16
14/07/06 14:02:48 INFO scheduler.DAGScheduler: Got job 0 (reduce at <console>:16) with 1
output partitions (allowLocal=false)
14/07/06 14:02:48 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce at <console>:16)
14/07/06 14:02:48 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/07/06 14:02:48 INFO scheduler.DAGScheduler: Missing parents: List()
14/07/06 14:02:48 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at
<console>:12), which has no missing parents
14/07/06 14:02:49 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0
(MappedRDD[1] at map at <console>:12)
14/07/06 14:02:49 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/07/06 14:02:49 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor
localhost: localhost (PROCESS_LOCAL)
14/07/06 14:02:49 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 1335 bytes in 13 ms
14/07/06 14:02:49 INFO executor.Executor: Running task ID 0
14/07/06 14:02:49 INFO executor.Executor: Serialized size of result for 0 is 675
14/07/06 14:02:49 INFO executor.Executor: Sending result for 0 directly to driver
14/07/06 14:02:49 INFO executor.Executor: Finished task ID 0
14/07/06 14:02:49 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
14/07/06 14:02:49 INFO scheduler.DAGScheduler: Stage 0 (reduce at <console>:16) finished in
0.325 s
14/07/06 14:02:50 INFO scheduler.TaskSetManager: Finished TID 0 in 251 ms on localhost (progress: 1/1)
14/07/06 14:02:50 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all
completed, from pool
14/07/06 14:02:50 INFO spark.SparkContext: Job finished: reduce at <console>:16, took
1.352754969 s
count: Int = 9

scala> println("Pi is roughly" + 4.0 * count/10)


Pi is roughly3.6
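
With only 10 samples the Monte Carlo estimate is coarse: 9 of the 10 random points landed inside the quarter circle, so 4.0 * 9 / 10 = 3.6. The same computation with a larger sample count lands much closer to Pi. A minimal sketch to paste into the same spark-shell session (NUM_SAMPLES is a name introduced here for illustration):

// Monte Carlo estimate of Pi: sample points uniformly in the unit square
// and count the fraction that falls inside the quarter circle of radius 1.
val NUM_SAMPLES = 1000000
val inside = sc.parallelize(1 to NUM_SAMPLES).map { _ =>
  val x = Math.random()
  val y = Math.random()
  if (x * x + y * y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * inside / NUM_SAMPLES)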

Example BroadcastTest
vipul@vipul:~/spark$ ./bin/run-example BroadcastTest
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/07/06 13:48:51 WARN util.Utils: Your hostname, vipul resolves to a loopback address:
127.0.1.1; using 10.0.2.15 instead (on interface eth1)
14/07/06 13:48:51 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
14/07/06 13:48:51 INFO spark.SecurityManager: Changing view acls to: vipul
14/07/06 13:48:51 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls
disabled; users with view permissions: Set(vipul)
14/07/06 13:48:52 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/07/06 13:48:52 INFO Remoting: Starting remoting
14/07/06 13:48:53 INFO Remoting: Remoting started; listening on addresses :
[akka.tcp://spark@vipul.local:42457]
14/07/06 13:48:53 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@vipul.local:42457]
14/07/06 13:48:53 INFO spark.SparkEnv: Registering MapOutputTracker
14/07/06 13:48:53 INFO spark.SparkEnv: Registering BlockManagerMaster
14/07/06 13:48:53 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140706134853-1bb0
14/07/06 13:48:53 INFO storage.MemoryStore: MemoryStore started with capacity 297.0 MB.
14/07/06 13:48:53 INFO network.ConnectionManager: Bound socket to port 56860 with id =
ConnectionManagerId(vipul.local,56860)
14/07/06 13:48:53 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/07/06 13:48:53 INFO storage.BlockManagerInfo: Registering block manager vipul.local:56860
with 297.0 MB RAM
14/07/06 13:48:53 INFO storage.BlockManagerMaster: Registered BlockManager
14/07/06 13:48:53 INFO spark.HttpServer: Starting HTTP Server
14/07/06 13:48:53 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/06 13:48:53 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60352
14/07/06 13:48:53 INFO broadcast.HttpBroadcast: Broadcast server started at
http://10.0.2.15:60352
14/07/06 13:48:53 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-2b3e3e82-7b11-4d14-a64f-7062cac3bcb0
14/07/06 13:48:53 INFO spark.HttpServer: Starting HTTP Server
14/07/06 13:48:53 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/06 13:48:53 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:59283
14/07/06 13:48:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/06 13:48:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/07/06 13:48:59 INFO ui.SparkUI: Started SparkUI at http://vipul.local:4040
14/07/06 13:49:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
14/07/06 13:49:03 INFO spark.SparkContext: Added JAR file:/home/vipul/spark/lib/spark-examples-1.0.0-hadoop2.2.0.jar at http://10.0.2.15:59283/jars/spark-examples-1.0.0-hadoop2.2.0.jar with timestamp 1404634741559
Iteration 0
===========
14/07/06 13:49:04 INFO storage.MemoryStore: ensureFreeSpace(4000128) called with curMem=0, maxMem=311387750
14/07/06 13:49:04 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 3.8 MB, free 293.1 MB)
14/07/06 13:49:04 INFO spark.SparkContext: Starting job: collect at BroadcastTest.scala:53
14/07/06 13:49:04 INFO scheduler.DAGScheduler: Got job 0 (collect at BroadcastTest.scala:53)
with 2 output partitions (allowLocal=false)
14/07/06 13:49:04 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at
BroadcastTest.scala:53)
14/07/06 13:49:04 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/07/06 13:49:04 INFO scheduler.DAGScheduler: Missing parents: List()
14/07/06 13:49:04 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at
BroadcastTest.scala:51), which has no missing parents
14/07/06 13:49:04 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0
(MappedRDD[1] at map at BroadcastTest.scala:51)
14/07/06 13:49:04 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/07/06 13:49:04 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor
localhost: localhost (PROCESS_LOCAL)
14/07/06 13:49:04 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 1617 bytes in 3 ms
14/07/06 13:49:04 INFO executor.Executor: Running task ID 0
14/07/06 13:49:05 INFO executor.Executor: Fetching http://10.0.2.15:59283/jars/spark-examples-1.0.0-hadoop2.2.0.jar with timestamp 1404634741559
14/07/06 13:49:05 INFO util.Utils: Fetching http://10.0.2.15:59283/jars/spark-examples-1.0.0-hadoop2.2.0.jar to /tmp/fetchFileTemp513938117893672559.tmp
14/07/06 13:49:09 INFO executor.Executor: Adding file:/tmp/spark-dad582e4-2b8d-4ecb-bb11-26b6fda3c9e6/spark-examples-1.0.0-hadoop2.2.0.jar to class loader
14/07/06 13:49:09 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/06 13:49:10 INFO executor.Executor: Serialized size of result for 0 is 562
14/07/06 13:49:10 INFO executor.Executor: Sending result for 0 directly to driver
14/07/06 13:49:10 INFO executor.Executor: Finished task ID 0
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 1 on executor
localhost: localhost (PROCESS_LOCAL)
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 1617 bytes in 1 ms
14/07/06 13:49:10 INFO executor.Executor: Running task ID 1
14/07/06 13:49:10 INFO storage.BlockManager: Found block broadcast_0 locally
14/07/06 13:49:10 INFO executor.Executor: Serialized size of result for 1 is 562
14/07/06 13:49:10 INFO executor.Executor: Sending result for 1 directly to driver
14/07/06 13:49:10 INFO executor.Executor: Finished task ID 1
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Finished TID 0 in 5289 ms on localhost
(progress: 1/2)
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Finished TID 1 in 140 ms on localhost
(progress: 2/2)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
14/07/06 13:49:10 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all
completed, from pool
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Stage 0 (collect at BroadcastTest.scala:53)
finished in 5.412 s
14/07/06 13:49:10 INFO spark.SparkContext: Job finished: collect at BroadcastTest.scala:53, took
5.953444462 s
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
Iteration 0 took 6809 milliseconds
Iteration 1
===========
14/07/06 13:49:10 INFO storage.MemoryStore: ensureFreeSpace(4000128) called with
curMem=4000128, maxMem=311387750
14/07/06 13:49:10 INFO storage.MemoryStore: Block broadcast_1 stored as values to memory
(estimated size 3.8 MB, free 289.3 MB)
14/07/06 13:49:10 INFO spark.SparkContext: Starting job: collect at BroadcastTest.scala:53
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Got job 1 (collect at BroadcastTest.scala:53)
with 2 output partitions (allowLocal=false)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Final stage: Stage 1(collect at
BroadcastTest.scala:53)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Missing parents: List()
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at
BroadcastTest.scala:51), which has no missing parents
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1
(MappedRDD[3] at map at BroadcastTest.scala:51)
14/07/06 13:49:10 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 2 on executor
localhost: localhost (PROCESS_LOCAL)
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 1618 bytes in 1 ms
14/07/06 13:49:10 INFO executor.Executor: Running task ID 2
14/07/06 13:49:10 INFO storage.BlockManager: Found block broadcast_1 locally
14/07/06 13:49:10 INFO executor.Executor: Serialized size of result for 2 is 562
14/07/06 13:49:10 INFO executor.Executor: Sending result for 2 directly to driver
14/07/06 13:49:10 INFO executor.Executor: Finished task ID 2
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 3 on executor
localhost: localhost (PROCESS_LOCAL)
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 1618 bytes in 0 ms
14/07/06 13:49:10 INFO executor.Executor: Running task ID 3
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Finished TID 2 in 43 ms on localhost (progress: 1/2)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Completed ResultTask(1, 0)
14/07/06 13:49:10 INFO storage.BlockManager: Found block broadcast_1 locally
14/07/06 13:49:10 INFO executor.Executor: Serialized size of result for 3 is 562
14/07/06 13:49:10 INFO executor.Executor: Sending result for 3 directly to driver
14/07/06 13:49:10 INFO executor.Executor: Finished task ID 3
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Stage 1 (collect at BroadcastTest.scala:53)
finished in 0.054 s
14/07/06 13:49:10 INFO spark.SparkContext: Job finished: collect at BroadcastTest.scala:53, took
0.102999062 s
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
Iteration 1 took 130 milliseconds
Iteration 2
===========
14/07/06 13:49:10 INFO storage.MemoryStore: ensureFreeSpace(4000128) called with
curMem=8000256, maxMem=311387750
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Finished TID 3 in 42 ms on localhost (progress: 2/2)
14/07/06 13:49:10 INFO storage.MemoryStore: Block broadcast_2 stored as values to memory (estimated size 3.8 MB, free 285.5 MB)
14/07/06 13:49:10 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all
completed, from pool
14/07/06 13:49:10 INFO spark.SparkContext: Starting job: collect at BroadcastTest.scala:53
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Got job 2 (collect at BroadcastTest.scala:53)
with 2 output partitions (allowLocal=false)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Final stage: Stage 2(collect at
BroadcastTest.scala:53)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Missing parents: List()
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Submitting Stage 2 (MappedRDD[5] at map at
BroadcastTest.scala:51), which has no missing parents
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 2
(MappedRDD[5] at map at BroadcastTest.scala:51)
14/07/06 13:49:10 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 4 on executor
localhost: localhost (PROCESS_LOCAL)
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 1616 bytes in 0 ms
14/07/06 13:49:10 INFO executor.Executor: Running task ID 4
14/07/06 13:49:10 INFO storage.BlockManager: Found block broadcast_2 locally
14/07/06 13:49:10 INFO executor.Executor: Serialized size of result for 4 is 562
14/07/06 13:49:10 INFO executor.Executor: Sending result for 4 directly to driver
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Starting task 2.0:1 as TID 5 on executor
localhost: localhost (PROCESS_LOCAL)
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Serialized task 2.0:1 as 1616 bytes in 0 ms
14/07/06 13:49:10 INFO executor.Executor: Finished task ID 4
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Completed ResultTask(2, 0)
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Finished TID 4 in 68 ms on localhost
(progress: 1/2)
14/07/06 13:49:10 INFO executor.Executor: Running task ID 5
14/07/06 13:49:10 INFO storage.BlockManager: Found block broadcast_2 locally
14/07/06 13:49:10 INFO executor.Executor: Serialized size of result for 5 is 562
14/07/06 13:49:10 INFO executor.Executor: Sending result for 5 directly to driver
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Finished TID 5 in 62 ms on localhost
(progress: 2/2)
14/07/06 13:49:10 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all
completed, from pool
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Completed ResultTask(2, 1)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Stage 2 (collect at BroadcastTest.scala:53)
finished in 0.113 s
14/07/06 13:49:10 INFO executor.Executor: Finished task ID 5
14/07/06 13:49:10 INFO spark.SparkContext: Job finished: collect at BroadcastTest.scala:53, took
0.17788202 s
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
Iteration 2 took 226 milliseconds
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/metrics/json,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/static,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/executors/json,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/executors,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/environment/json,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/environment,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage/rdd,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage/json,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/pool,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/stage,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/json,null}
14/07/06 13:49:10 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages,null}
14/07/06 13:49:11 INFO ui.SparkUI: Stopped Spark web UI at http://vipul.local:4040
14/07/06 13:49:11 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/07/06 13:49:12 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/07/06 13:49:12 INFO network.ConnectionManager: Selector thread was interrupted!
14/07/06 13:49:12 INFO network.ConnectionManager: ConnectionManager stopped
14/07/06 13:49:12 INFO storage.MemoryStore: MemoryStore cleared
14/07/06 13:49:12 INFO storage.BlockManager: BlockManager stopped
14/07/06 13:49:12 INFO storage.BlockManagerMasterActor: Stopping BlockManagerMaster
14/07/06 13:49:12 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/07/06 13:49:12 INFO spark.SparkContext: Successfully stopped SparkContext
14/07/06 13:49:12 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down
remote daemon.
14/07/06 13:49:12 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon
shut down; proceeding with flushing remote transports.
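
The timings above are the point of BroadcastTest: iteration 0 takes 6809 ms because it includes fetching the examples JAR and storing the ~3.8 MB broadcast array, while iterations 1 and 2 take only 130 ms and 226 ms, since every task reports "Found block broadcast_N locally" instead of receiving its own copy of the data. A minimal sketch of the same pattern in spark-shell, assuming sc is already bound (the array size mirrors the example's 1000000-element payload):

// Broadcast a ~4 MB array once; each task reads it from the local
// BlockManager instead of getting its own serialized copy.
val arr = (0 until 1000000).toArray
val barr = sc.broadcast(arr)
sc.parallelize(1 to 10, 2).map(_ => barr.value.length).collect().foreach(println)
// prints 1000000 ten times, like the output above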

SparkPi

vipul@vipul:~/spark$ ./bin/run-example SparkPi
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/07/06 13:53:00 WARN util.Utils: Your hostname, vipul resolves to a loopback address:
127.0.1.1; using 10.0.2.15 instead (on interface eth1)
14/07/06 13:53:00 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
14/07/06 13:53:01 INFO spark.SecurityManager: Changing view acls to: vipul
14/07/06 13:53:01 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls
disabled; users with view permissions: Set(vipul)
14/07/06 13:53:02 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/07/06 13:53:02 INFO Remoting: Starting remoting
14/07/06 13:53:02 INFO Remoting: Remoting started; listening on addresses :
[akka.tcp://spark@vipul.local:57380]
14/07/06 13:53:02 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@vipul.local:57380]
14/07/06 13:53:02 INFO spark.SparkEnv: Registering MapOutputTracker
14/07/06 13:53:02 INFO spark.SparkEnv: Registering BlockManagerMaster
14/07/06 13:53:02 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140706135302-c75d
14/07/06 13:53:02 INFO storage.MemoryStore: MemoryStore started with capacity 297.0 MB.
14/07/06 13:53:02 INFO network.ConnectionManager: Bound socket to port 60798 with id =
ConnectionManagerId(vipul.local,60798)
14/07/06 13:53:02 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/07/06 13:53:02 INFO storage.BlockManagerInfo: Registering block manager vipul.local:60798
with 297.0 MB RAM
14/07/06 13:53:02 INFO storage.BlockManagerMaster: Registered BlockManager
14/07/06 13:53:02 INFO spark.HttpServer: Starting HTTP Server
14/07/06 13:53:02 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/06 13:53:02 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60487
14/07/06 13:53:02 INFO broadcast.HttpBroadcast: Broadcast server started at
http://10.0.2.15:60487
14/07/06 13:53:02 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-a7bc76fb-38af-448c-968e-de9281a4a1af
14/07/06 13:53:02 INFO spark.HttpServer: Starting HTTP Server
14/07/06 13:53:02 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/06 13:53:02 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:36866
14/07/06 13:53:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/06 13:53:09 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/07/06 13:53:09 INFO ui.SparkUI: Started SparkUI at http://vipul.local:4040
14/07/06 13:53:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
14/07/06 13:53:12 INFO spark.SparkContext: Added JAR file:/home/vipul/spark/lib/spark-examples-1.0.0-hadoop2.2.0.jar at http://10.0.2.15:36866/jars/spark-examples-1.0.0-hadoop2.2.0.jar with timestamp 1404634992354
14/07/06 13:53:14 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:35
14/07/06 13:53:14 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2
output partitions (allowLocal=false)
14/07/06 13:53:14 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
14/07/06 13:53:14 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/07/06 13:53:14 INFO scheduler.DAGScheduler: Missing parents: List()
14/07/06 13:53:14 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at
SparkPi.scala:31), which has no missing parents
14/07/06 13:53:14 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0
(MappedRDD[1] at map at SparkPi.scala:31)
14/07/06 13:53:14 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/07/06 13:53:14 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor
localhost: localhost (PROCESS_LOCAL)
14/07/06 13:53:14 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 1408 bytes in 4 ms
14/07/06 13:53:14 INFO executor.Executor: Running task ID 0
14/07/06 13:53:14 INFO executor.Executor: Fetching http://10.0.2.15:36866/jars/spark-examples-1.0.0-hadoop2.2.0.jar with timestamp 1404634992354
14/07/06 13:53:14 INFO util.Utils: Fetching http://10.0.2.15:36866/jars/spark-examples-1.0.0-hadoop2.2.0.jar to /tmp/fetchFileTemp9109572652174788248.tmp
14/07/06 13:53:19 INFO executor.Executor: Adding file:/tmp/spark-13e489a6-c50d-48a7-8f23-2bbb091283e6/spark-examples-1.0.0-hadoop2.2.0.jar to class loader
14/07/06 13:53:20 INFO executor.Executor: Serialized size of result for 0 is 675
14/07/06 13:53:20 INFO executor.Executor: Sending result for 0 directly to driver
14/07/06 13:53:20 INFO executor.Executor: Finished task ID 0
14/07/06 13:53:20 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 1 on executor
localhost: localhost (PROCESS_LOCAL)
14/07/06 13:53:20 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 1408 bytes in 1 ms
14/07/06 13:53:20 INFO executor.Executor: Running task ID 1
14/07/06 13:53:20 INFO executor.Executor: Serialized size of result for 1 is 675
14/07/06 13:53:20 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
14/07/06 13:53:20 INFO scheduler.TaskSetManager: Finished TID 0 in 5680 ms on localhost
(progress: 1/2)
14/07/06 13:53:20 INFO executor.Executor: Sending result for 1 directly to driver
14/07/06 13:53:20 INFO executor.Executor: Finished task ID 1
14/07/06 13:53:20 INFO scheduler.TaskSetManager: Finished TID 1 in 116 ms on localhost
(progress: 2/2)
14/07/06 13:53:20 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
14/07/06 13:53:20 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in
5.808 s
14/07/06 13:53:20 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all
completed, from pool
14/07/06 13:53:20 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:35, took
6.268545396 s
Pi is roughly 3.14692
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/metrics/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/static,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/executors/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/executors,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/environment/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/environment,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage/rdd,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/pool,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/stage,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages,null}
14/07/06 13:53:20 INFO ui.SparkUI: Stopped Spark web UI at http://vipul.local:4040
14/07/06 13:53:20 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/07/06 13:53:21 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/07/06 13:53:21 INFO network.ConnectionManager: Selector thread was interrupted!
14/07/06 13:53:21 INFO network.ConnectionManager: ConnectionManager stopped
14/07/06 13:53:21 INFO storage.MemoryStore: MemoryStore cleared
14/07/06 13:53:21 INFO storage.BlockManager: BlockManager stopped
14/07/06 13:53:21 INFO storage.BlockManagerMasterActor: Stopping BlockManagerMaster
14/07/06 13:53:21 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/07/06 13:53:21 INFO spark.SparkContext: Successfully stopped SparkContext
14/07/06 13:53:21 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down
remote daemon.
14/07/06 13:53:21 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon
shut down; proceeding with flushing remote transports.
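
SparkPi takes an optional number-of-slices argument (the 2 output partitions in the log above correspond to its default); passing a larger value spreads more random samples over more tasks and typically tightens the estimate, e.g.:

vipul@vipul:~/spark$ ./bin/run-example SparkPi 10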

LocalPi
vipul@vipul:~/spark$ ./bin/run-example LocalPi
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Pi is roughly 3.14028
vipul@vipul:~/spark$ ./bin/run-example LocalPi
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Pi is roughly 3.15104
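
LocalPi runs the same Monte Carlo estimate in a single JVM without creating a SparkContext, which is why two consecutive runs print slightly different values (3.14028 vs. 3.15104): only the random samples change. A plain-Scala sketch of the idea, under the assumption that it mirrors the distributed version above (sample count and range are illustrative):

// No Spark involved: sample points in [-1, 1] x [-1, 1] and count
// how many land inside the unit circle (area pi out of square area 4).
var count = 0
for (_ <- 1 to 100000) {
  val x = Math.random() * 2 - 1
  val y = Math.random() * 2 - 1
  if (x * x + y * y < 1) count += 1
}
println("Pi is roughly " + 4.0 * count / 100000)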

LocalLR
vipul@vipul:~/spark$ ./bin/run-example LocalLR
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Initial w: DenseVector(-0.8066603352924779, -0.5488747509304204, -0.7351625370864459,
0.8228539509375878, -0.6662446067860872, -0.33245457898921527, 0.9664202269036932,
-0.20407887461434115, 0.4120993933386614, -0.8125908063470539)
On iteration 1
Jul 06, 2014 1:56:25 PM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
Jul 06, 2014 1:56:25 PM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
On iteration 2
On iteration 3
On iteration 4
On iteration 5
Final w: DenseVector(5816.075967498844, 5222.008066011373, 5754.751978607454,
3853.1772062206874, 5593.565827145935, 5282.38787420105, 3662.9216051953567,
4890.782103406075, 4223.371512250295, 5767.368579668877)
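
LocalLR fits a logistic regression by batch gradient descent: starting from a random weight vector w, each of the five iterations computes the gradient of the logistic loss over the whole dataset and subtracts it from w (the netlib-java warnings only mean no native BLAS was found, so a pure-Java fallback is used). A hedged sketch of one update step on plain arrays; DataPoint, dot, and step are names introduced here for illustration, and the real example uses breeze DenseVectors:

// One batch-gradient step of logistic regression.
case class DataPoint(x: Array[Double], y: Double)

def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (u, v) => u * v }.sum

def step(points: Seq[DataPoint], w: Array[Double]): Array[Double] = {
  val grad = new Array[Double](w.length)
  for (p <- points) {
    // derivative of log(1 + exp(-y * w.x)) w.r.t. w is
    // (sigmoid(y * w.x) - 1) * y * x
    val scale = (1.0 / (1.0 + math.exp(-p.y * dot(w, p.x))) - 1.0) * p.y
    for (i <- grad.indices) grad(i) += scale * p.x(i)
  }
  w.indices.map(i => w(i) - grad(i)).toArray
}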

Word Count in Scala (Connection Refused Error)
scala> val file = sc.textFile("hdfs://127.0.1.1//vipul/README.txt")
14/07/06 14:25:39 INFO storage.MemoryStore: ensureFreeSpace(69816) called with
curMem=141094, maxMem=311387750
14/07/06 14:25:39 INFO storage.MemoryStore: Block broadcast_1 stored as values to memory
(estimated size 68.2 KB, free 296.8 MB)
file: org.apache.spark.rdd.RDD[String] = MappedRDD[5] at textFile at <console>:12

scala> val counts = file.flatMap(line=>line.split(" "))
counts: org.apache.spark.rdd.RDD[String] = FlatMappedRDD[6] at flatMap at <console>:14

scala> .map(word=>(word,1))
res2: org.apache.spark.rdd.RDD[(String, Int)] = MappedRDD[7] at map at <console>:17

scala> .reduceByKey(_+_)
java.net.ConnectException: Call From vipul/127.0.1.1 to vipul:8020 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1701)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1647)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:222)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:172)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.FlatMappedRDD.getPartitions(FlatMappedRDD.scala:30)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:59)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:370)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
at $iwC$$iwC$$iwC.<init>(<console>:24)
at $iwC$$iwC.<init>(<console>:26)
at $iwC.<init>(<console>:28)
at <init>(<console>:30)
at .<init>(<console>:34)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:834)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
at org.apache.hadoop.ipc.Client.call(Client.java:1318)
... 83 more
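
The trace reduces to its last Caused by line: nothing is listening on vipul:8020, i.e. no HDFS NameNode is running at the address the hdfs:// URL points to (see the linked Hadoop wiki page for the general diagnosis). Starting HDFS and pointing textFile at the NameNode's real host:port fixes the session above; alternatively, the identical word count runs against a local file with no Hadoop daemons at all. A sketch, with an illustrative file path:

scala> val file = sc.textFile("file:///home/vipul/spark/README.md")
scala> val counts = file.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.take(5).foreach(println)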
