(progress: 1/1)
14/07/06 14:02:50 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/07/06 14:02:50 INFO spark.SparkContext: Job finished: reduce at <console>:16, took 1.352754969 s
count: Int = 9
Example BroadcastTest
vipul@vipul:~/spark$ ./bin/run-example BroadcastTest
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/07/06 13:48:51 WARN util.Utils: Your hostname, vipul resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth1)
14/07/06 13:48:51 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
14/07/06 13:48:51 INFO spark.SecurityManager: Changing view acls to: vipul
14/07/06 13:48:51 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vipul)
14/07/06 13:48:52 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/07/06 13:48:52 INFO Remoting: Starting remoting
14/07/06 13:48:53 INFO Remoting: Remoting started; listening on addresses : [akka.tcp://spark@vipul.local:42457]
14/07/06 13:48:53 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@vipul.local:42457]
14/07/06 13:48:53 INFO spark.SparkEnv: Registering MapOutputTracker
14/07/06 13:48:53 INFO spark.SparkEnv: Registering BlockManagerMaster
14/07/06 13:48:53 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140706134853-1bb0
14/07/06 13:48:53 INFO storage.MemoryStore: MemoryStore started with capacity 297.0 MB.
14/07/06 13:48:53 INFO network.ConnectionManager: Bound socket to port 56860 with id = ConnectionManagerId(vipul.local,56860)
14/07/06 13:48:53 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/07/06 13:48:53 INFO storage.BlockManagerInfo: Registering block manager vipul.local:56860 with 297.0 MB RAM
14/07/06 13:48:53 INFO storage.BlockManagerMaster: Registered BlockManager
14/07/06 13:48:53 INFO spark.HttpServer: Starting HTTP Server
14/07/06 13:48:53 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/06 13:48:53 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60352
14/07/06 13:48:53 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.0.2.15:60352
14/07/06 13:48:53 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-2b3e3e827b11-4d14-a64f-7062cac3bcb0
14/07/06 13:48:53 INFO spark.HttpServer: Starting HTTP Server
14/07/06 13:48:53 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/06 13:48:53 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:59283
14/07/06 13:48:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/06 13:48:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/07/06 13:48:59 INFO ui.SparkUI: Started SparkUI at http://vipul.local:4040
14/07/06 13:49:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/06 13:49:03 INFO spark.SparkContext: Added JAR file:/home/vipul/spark/lib/spark-examples-1.0.0-hadoop2.2.0.jar at http://10.0.2.15:59283/jars/spark-examples-1.0.0-hadoop2.2.0.jar with timestamp 1404634741559
Iteration 0
===========
14/07/06 13:49:04 INFO storage.MemoryStore: ensureFreeSpace(4000128) called with curMem=0, maxMem=311387750
===========
14/07/06 13:49:10 INFO storage.MemoryStore: ensureFreeSpace(4000128) called with curMem=4000128, maxMem=311387750
14/07/06 13:49:10 INFO storage.MemoryStore: Block broadcast_1 stored as values to memory (estimated size 3.8 MB, free 289.3 MB)
14/07/06 13:49:10 INFO spark.SparkContext: Starting job: collect at BroadcastTest.scala:53
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Got job 1 (collect at BroadcastTest.scala:53) with 2 output partitions (allowLocal=false)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Final stage: Stage 1(collect at BroadcastTest.scala:53)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Missing parents: List()
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at BroadcastTest.scala:51), which has no missing parents
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at BroadcastTest.scala:51)
14/07/06 13:49:10 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 2 on executor localhost: localhost (PROCESS_LOCAL)
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 1618 bytes in 1 ms
14/07/06 13:49:10 INFO executor.Executor: Running task ID 2
14/07/06 13:49:10 INFO storage.BlockManager: Found block broadcast_1 locally
14/07/06 13:49:10 INFO executor.Executor: Serialized size of result for 2 is 562
14/07/06 13:49:10 INFO executor.Executor: Sending result for 2 directly to driver
14/07/06 13:49:10 INFO executor.Executor: Finished task ID 2
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 3 on executor localhost: localhost (PROCESS_LOCAL)
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 1618 bytes in 0 ms
14/07/06 13:49:10 INFO executor.Executor: Running task ID 3
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Finished TID 2 in 43 ms on localhost (progress: 1/2)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Completed ResultTask(1, 0)
14/07/06 13:49:10 INFO storage.BlockManager: Found block broadcast_1 locally
14/07/06 13:49:10 INFO executor.Executor: Serialized size of result for 3 is 562
14/07/06 13:49:10 INFO executor.Executor: Sending result for 3 directly to driver
14/07/06 13:49:10 INFO executor.Executor: Finished task ID 3
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
14/07/06 13:49:10 INFO scheduler.DAGScheduler: Stage 1 (collect at BroadcastTest.scala:53) finished in 0.054 s
14/07/06 13:49:10 INFO spark.SparkContext: Job finished: collect at BroadcastTest.scala:53, took 0.102999062 s
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
Iteration 1 took 130 milliseconds
Iteration 2
===========
14/07/06 13:49:10 INFO storage.MemoryStore: ensureFreeSpace(4000128) called with curMem=8000256, maxMem=311387750
14/07/06 13:49:10 INFO scheduler.TaskSetManager: Finished TID 3 in 42 ms on localhost (progress: 2/2)
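The ten repeated 1000000 lines are each task printing the length of the broadcast array. A minimal sketch of the idea behind BroadcastTest (names and structure here are illustrative, not the exact bundled source): the driver broadcasts one large array, and every task then reads the executor-local copy instead of shipping the array inside each closure.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a broadcast test, assuming a local-mode SparkContext.
object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("BroadcastSketch"))
    val bigArray = (0 until 1000000).toArray
    val bc = sc.broadcast(bigArray)      // shipped to each executor once
    sc.parallelize(0 until 10, 2)
      .map(_ => bc.value.length)         // each task reads the local copy
      .collect()
      .foreach(println)                  // prints 1000000 ten times
    sc.stop()
  }
}
```

Because the data travels once per executor rather than once per task, later iterations in the log reuse the cached block (`Found block broadcast_1 locally`) and run much faster than the first.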
SparkPi
o.e.j.s.ServletContextHandler{/storage,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
14/07/06 13:53:20 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
14/07/06 13:53:20 INFO ui.SparkUI: Stopped Spark web UI at http://vipul.local:4040
14/07/06 13:53:20 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/07/06 13:53:21 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/07/06 13:53:21 INFO network.ConnectionManager: Selector thread was interrupted!
14/07/06 13:53:21 INFO network.ConnectionManager: ConnectionManager stopped
14/07/06 13:53:21 INFO storage.MemoryStore: MemoryStore cleared
14/07/06 13:53:21 INFO storage.BlockManager: BlockManager stopped
14/07/06 13:53:21 INFO storage.BlockManagerMasterActor: Stopping BlockManagerMaster
14/07/06 13:53:21 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/07/06 13:53:21 INFO spark.SparkContext: Successfully stopped SparkContext
14/07/06 13:53:21 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/07/06 13:53:21 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
LocalPi
vipul@vipul:~/spark$ ./bin/run-example LocalPi
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Pi is roughly 3.14028
vipul@vipul:~/spark$ ./bin/run-example LocalPi
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Pi is roughly 3.15104
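Note that the two runs print slightly different values (3.14028 vs. 3.15104). That is expected: the example estimates Pi by Monte Carlo sampling, so each run with a fresh random seed lands on a slightly different estimate. A sketch of the underlying idea (illustrative code, not the exact LocalPi source): throw random points into the unit square and count how many fall inside the quarter circle of radius 1.

```scala
import scala.util.Random

// Monte Carlo estimate of Pi: the fraction of points inside the
// quarter circle approximates Pi/4, so multiply by 4.
object PiSketch {
  def estimatePi(samples: Int, rng: Random): Double = {
    val inside = (1 to samples).count { _ =>
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      x * x + y * y < 1
    }
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit =
    println(s"Pi is roughly ${estimatePi(100000, new Random())}")
}
```

With 100,000 samples the standard error is around 0.005, which matches the spread between the two printed results above.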
LocalLR
vipul@vipul:~/spark$ ./bin/run-example LocalLR
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Initial w: DenseVector(-0.8066603352924779, -0.5488747509304204, -0.7351625370864459, 0.8228539509375878, -0.6662446067860872, -0.33245457898921527, 0.9664202269036932, -0.20407887461434115, 0.4120993933386614, -0.8125908063470539)
On iteration 1
Jul 06, 2014 1:56:25 PM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
Jul 06, 2014 1:56:25 PM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
On iteration 2
On iteration 3
On iteration 4
On iteration 5
Final w: DenseVector(5816.075967498844, 5222.008066011373, 5754.751978607454, 3853.1772062206874, 5593.565827145935, 5282.38787420105, 3662.9216051953567, 4890.782103406075, 4223.371512250295, 5767.368579668877)
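The output above is classic batch gradient descent: a random initial weight vector, five gradient updates (the `On iteration N` lines), and a final weight vector. A sketch of the loop behind LocalLR, using the logistic-loss gradient that the Spark examples use; the data shape and helper names here are illustrative:

```scala
// Batch logistic regression: labels are -1/+1, and each iteration
// adds up the logistic-loss gradient over all points, then updates w.
object LrSketch {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // points: (features, label in {-1, +1})
  def train(points: Seq[(Array[Double], Double)], dims: Int, iters: Int): Array[Double] = {
    val rng = new scala.util.Random()
    val w = Array.fill(dims)(2 * rng.nextDouble() - 1)   // random initial w
    for (i <- 1 to iters) {
      println(s"On iteration $i")
      val grad = new Array[Double](dims)
      for ((x, y) <- points) {
        val scale = (1.0 / (1.0 + math.exp(-y * dot(w, x))) - 1.0) * y
        for (j <- 0 until dims) grad(j) += scale * x(j)
      }
      for (j <- 0 until dims) w(j) -= grad(j)
    }
    w
  }
}
```

The large magnitudes in the printed `Final w` are normal for this toy example: there is no learning rate or regularization, so the weights keep growing as the data is separated.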
scala> .map(word=>(word,1))
res2: org.apache.spark.rdd.RDD[(String, Int)] = MappedRDD[7] at map at <console>:17
scala> .reduceByKey(_+_)
java.net.ConnectException: Call From vipul/127.0.1.1 to vipul:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1701)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1647)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:222)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:172)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.FlatMappedRDD.getPartitions(FlatMappedRDD.scala:30)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:59)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:370)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
at $iwC$$iwC$$iwC.<init>(<console>:24)
at $iwC$$iwC.<init>(<console>:26)
at $iwC.<init>(<console>:28)
at <init>(<console>:30)
at .<init>(<console>:34)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:834)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
at org.apache.hadoop.ipc.Client.call(Client.java:1318)
... 83 more
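The root cause here is not the `reduceByKey` itself: the shell's Hadoop configuration resolved the input path against HDFS (`vipul:8020`), and no NameNode was listening on that port, so the first action to touch the input failed with `Connection refused`. Either start HDFS, or point the RDD at the local filesystem with an explicit `file://` URI. A sketch for the Spark shell (the path below is illustrative):

```scala
// Read from the local filesystem instead of HDFS by giving an
// explicit file:// URI; then the word count runs without a NameNode.
val lines = sc.textFile("file:///home/vipul/spark/README.md")
lines.flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .take(5)
```

Note that the error surfaces only at `reduceByKey`, not at the earlier `map`: RDD transformations are lazy, so the connection is attempted only when Spark first needs the input's partition list.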