spark-submit Task Submission Flow (on YARN)
2020-07-03 14:20:49
Outline / Contents

Example submission command:

    spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client ./examples/jars/spark-examples_2.11-2.1.1.jar 100

1. spark-submit -> SparkSubmit (org.apache.spark.deploy.SparkSubmit)
   - main() -> org.apache.spark.deploy.SparkSubmit#doSubmit
   - prepareSubmitEnvironment: determines from the launch arguments whether this is client mode or cluster mode
   - new Client() (org.apache.spark.deploy.yarn.Client)

2. Submitting the Application: org.apache.spark.deploy.yarn.Client
   - YarnClient.createYarnClient: connects to the YARN cluster through a YarnClient
   - org.apache.spark.deploy.yarn.Client#run: this.appId = submitApplication()
   - org.apache.spark.deploy.yarn.Client#submitApplication:
       launcherBackend.connect()
       yarnClient.init(hadoopConf)
       yarnClient.start()
       createContainerLaunchContext
       yarnClient.submitApplication(appContext)
   - org.apache.spark.deploy.yarn.Client#createContainerLaunchContext: parses and assembles a series of parameters and packages them into a bin/java command. In short, what ultimately gets submitted to YARN is that command:
       bin/java org.apache.spark.deploy.yarn.ApplicationMaster (cluster mode)
       bin/java org.apache.spark.deploy.yarn.ExecutorLauncher (client mode)
     The end result is the JVM arguments plus ApplicationMaster or ExecutorLauncher, submitted to run on YARN.
   - As the code shows, both ApplicationMaster and ExecutorLauncher go through ApplicationMaster.main(args).

3. Creating and starting the AM: org.apache.spark.deploy.yarn.ApplicationMaster
   - main: new ApplicationMasterArguments(args) -> new ApplicationMaster(amArgs) -> master.run()
   - run() -> org.apache.spark.deploy.yarn.ApplicationMaster#runImpl -> runDriver() (org.apache.spark.deploy.yarn.ApplicationMaster#runDriver)
   - org.apache.spark.deploy.yarn.ApplicationMaster#startUserApplication: runs the user program's main method via reflection on a thread named "Driver" (userThread.setName("Driver")); in the end this directly executes SparkPi's main method
   - registerAM(): registers the AM
   - createAllocator() -> allocateResources() -> handleAllocatedContainers
   - org.apache.spark.deploy.yarn.YarnAllocator#handleAllocatedContainers: uses a thread pool to launch multiple Executors, runAllocatedContainers(containersToUse)
   - org.apache.spark.deploy.yarn.ExecutorRunnable#run: nmClient.start() -> startContainer() -> prepareCommand() -> bin/java org.apache.spark.executor.CoarseGrainedExecutorBackend

4. The Executor side: org.apache.spark.executor.CoarseGrainedExecutorBackend
   - a org.apache.spark.rpc.ThreadSafeRpcEndpoint; the life-cycle of an endpoint is: constructor -> onStart -> receive -> onStop
   - at this point the two methods to read in CoarseGrainedExecutorBackend are onStart() and receive()
   - override def onStart() { logInfo("Connecting to driver: ...") ... }: connects back to the driver
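The mode-dependent choice of AM entry class described above can be sketched as follows. This is an illustrative Python sketch, not Spark source: build_am_command and its jvm_args handling are invented for illustration, only the two class names come from the flow above.

```python
# Hypothetical sketch of how Client#createContainerLaunchContext picks the AM
# entry class and assembles the bin/java command (not the real Spark code).
def build_am_command(deploy_mode, am_memory_mb=1024):
    # Cluster mode: the full ApplicationMaster (it will also run the driver).
    # Client mode: the lighter ExecutorLauncher (the driver runs in spark-submit).
    if deploy_mode == "cluster":
        am_class = "org.apache.spark.deploy.yarn.ApplicationMaster"
    else:
        am_class = "org.apache.spark.deploy.yarn.ExecutorLauncher"
    jvm_args = ["-Xmx%dm" % am_memory_mb]  # stand-in for the real JVM options
    return " ".join(["bin/java"] + jvm_args + [am_class])
```

Either command line ends up in ApplicationMaster.main(args), which is why the two entry points share the rest of the flow.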
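startUserApplication's trick of running the user's main method on a dedicated thread named "Driver" can be mimicked like this (Python sketch; start_user_application is a hypothetical stand-in, and a plain function call replaces the reflection Spark actually uses):

```python
import threading

# Sketch of ApplicationMaster#startUserApplication: the user class's main runs
# on a separate thread that is explicitly named "Driver".
def start_user_application(user_main, args):
    result = {}
    def run():
        result["value"] = user_main(args)  # Spark invokes main via reflection here
    user_thread = threading.Thread(target=run, name="Driver")
    user_thread.start()
    return user_thread, result
```

Joining the thread then yields whatever the user program produced, while thread dumps show the familiar "Driver" thread name.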
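The thread-pool launch in handleAllocatedContainers can be sketched in the same spirit (Python sketch; run_allocated_containers is hypothetical, and the NodeManager call is simulated by returning a string):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of YarnAllocator#runAllocatedContainers: each allocated container is
# handed to a thread pool that launches one executor per container.
def run_allocated_containers(containers_to_use, pool_size=4):
    def launch(container_id):
        # Real code: ExecutorRunnable#run -> nmClient.startContainer(...)
        return "executor-on-" + container_id
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # pool.map preserves input order, so results line up with containers
        return list(pool.map(launch, containers_to_use))
```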
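Finally, the endpoint life-cycle noted above (constructor -> onStart -> receive -> onStop) can be made concrete with a toy endpoint (Python sketch; SketchEndpoint is invented, the real trait is org.apache.spark.rpc.ThreadSafeRpcEndpoint):

```python
# Toy model of the RpcEndpoint life-cycle that CoarseGrainedExecutorBackend
# follows: constructor -> onStart -> receive -> onStop.
class SketchEndpoint:
    def __init__(self):
        self.events = ["constructor"]
    def on_start(self):
        # In CoarseGrainedExecutorBackend.onStart, the executor connects to
        # the driver and registers itself.
        self.events.append("onStart")
    def receive(self, message):
        # receive then handles driver messages, e.g. task-launch requests.
        self.events.append("receive:" + message)
    def on_stop(self):
        self.events.append("onStop")
```

Reading CoarseGrainedExecutorBackend with this life-cycle in mind makes onStart() and receive() the natural places to look.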