spark-0.8-src
2017-01-02 16:22:49
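spark-0.8-src is the source code of Apache Spark 0.8, an open-source big data processing framework used mainly for large-scale datasets. It offers an efficient, easy-to-use, and flexible way to process and analyze data. Spark's main features include in-memory computation, fault tolerance, and compatibility with many data sources. It also supports SQL queries, stream processing, machine learning, and other workloads. spark-0.8-src is an early release of Apache Spark; although newer versions exist, this one is still worth studying because it contains Spark's basic architecture and core algorithms. For researchers and developers working on big data processing and analysis, spark-0.8-src is a valuable resource.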
Outline/Content
Task
Running stages
rs offer
MapOutputTracker
Add CTSM (ClusterTaskSetManager)
API to make
MapPartitionsRDD
SchedulingMode
GlommedRDD
SlaveExecutor
CartesianRDD
Listen
trackerActor
extends
MapPartitionsWithIndexRDD
coalesce()
Run
sample()
Partition
ResultTask
(combination not yet supported)
UnionRDD
2 RDD Partitions
CacheManager
flatMap()
result stage
pipe()
Add pool
YarnClusterScheduler
RDD.first(), RDD.take(), RDD.filter()
compute()
no
runJob()
RDD
getMissingParentStages
ClusterTaskSetManager
process
ShuffledRDD
cmd
CheckpointRDD
Create
TaskSetManager
Manage, monitor
spark.default.parallelism
RDDs
None & checkpointed
firstParent.iterator
Iterator()
get(key)
submitTasks(TaskSet)
SchedulableBuilder
launchTask
CoarseMesosSchedulerBackend
filter()
locality
SparkContext
mem/disk
path/dir
TaskScheduler
Dependency
MesosSchedulerBackend
FlatMappedRDD
prepareJob()
Parent ready
map()
putIfAbsent
Task execution result
Actor
Run Tasks
Wide
PipedRDD
CacheManager.getOrCompute()
LocalTaskSetManager
SerializerManager
zipPartitions()
TaskSet
add(key)
Ref
Implement by
Waiting stages
computeOrReadCheckpoint
SparkDeploySchedulerBackend
SchedulerBackend
makeOffers
DAGScheduler
MasterClient Driver
submitWaitingStages
TaskContext
StandaloneExecutorBackend
Poisson
loading HashSet
CoalescedRDD
glom()
runLocallyWithinThread
add
StorageLevel
Has dependencies (has parent)
ZippedRDD
buildPool
Bound according to deployment
FilteredRDD
DAG
RDDCheckpointData
ClusterScheduler
wrap
BlockManagerMasterActor
JobWaiter
union()
None & not checkpointed
cartesian()
SparkContext.initialize()
zip()
subclasses
Produces finalStage
StandaloneSchedulerBackend
processEvent() consumes the queue
Executor
cpFile
SampledRDD
BlockManager
Tasks
Job
No dependencies
ShuffleMapStage
BlockManagerSlaveActor
Scheduler
SparkEnv
rdd.iterator()
Storage
Register shuffle operation
BlockManagerMaster
ZippedPartitionsRDD2/3/4
action
ShuffleMapTask
Put in Queue
LocalScheduler
MappedRDD
Function()
Narrow
Produces Job
Jobs
transformation
submitMissingTasks()
Add LTSM (LocalTaskSetManager)
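
The labels rdd.iterator(), CacheManager.getOrCompute(), computeOrReadCheckpoint, firstParent.iterator and compute() in the outline appear to trace the path a task follows when it reads one partition. The following is a minimal Python sketch of that path, assuming it mirrors the RDD.iterator logic in Spark 0.8: a cached RDD goes through the cache manager, a checkpointed RDD reads from its replaced parent, and everything else falls through to compute(). Every name here (the Toy* classes, the CACHE global, the compute_calls counter, the cached and checkpointed flags) is hypothetical and exists only for illustration; none of it is Spark API.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple


class ToyCacheManager:
    """Hypothetical stand-in for CacheManager: memoizes computed partitions."""

    def __init__(self) -> None:
        self._blocks: Dict[Tuple[int, int], list] = {}

    def get_or_compute(self, rdd: "ToyRDD", split: int) -> Iterator:
        key = (rdd.id, split)
        if key not in self._blocks:
            # Cache miss: compute the partition once and keep its values.
            self._blocks[key] = list(rdd.compute_or_read_checkpoint(split))
        return iter(self._blocks[key])


CACHE = ToyCacheManager()  # assumed global, loosely playing the role of SparkEnv


class ToyRDD:
    _next_id = 0

    def __init__(self, parent: Optional["ToyRDD"], num_splits: int) -> None:
        self.id = ToyRDD._next_id
        ToyRDD._next_id += 1
        self.parent = parent
        self.num_splits = num_splits
        self.cached = False        # stands in for StorageLevel != NONE
        self.checkpointed = False
        self.compute_calls = 0     # illustration only: counts compute() calls

    def compute(self, split: int) -> Iterator:
        raise NotImplementedError

    def compute_or_read_checkpoint(self, split: int) -> Iterator:
        if self.checkpointed:
            # After checkpointing, the parent holds the saved data.
            return self.parent.iterator(split)
        self.compute_calls += 1
        return self.compute(split)

    def iterator(self, split: int) -> Iterator:
        if self.cached:
            return CACHE.get_or_compute(self, split)
        return self.compute_or_read_checkpoint(split)

    def checkpoint(self) -> None:
        # Materialize every partition, then cut the lineage by swapping parents.
        saved = [list(self.iterator(s)) for s in range(self.num_splits)]
        self.parent = ToyParallelRDD(saved)  # plays the role of CheckpointRDD
        self.checkpointed = True

    def map(self, f: Callable) -> "ToyRDD":
        return ToyMappedRDD(self, f)

    def filter(self, f: Callable) -> "ToyRDD":
        return ToyFilteredRDD(self, f)


class ToyParallelRDD(ToyRDD):
    """Source RDD with no parent: partitions are given up front."""

    def __init__(self, partitions: List[list]) -> None:
        super().__init__(None, len(partitions))
        self._partitions = partitions

    def compute(self, split: int) -> Iterator:
        return iter(self._partitions[split])


class ToyMappedRDD(ToyRDD):
    def __init__(self, parent: ToyRDD, f: Callable) -> None:
        super().__init__(parent, parent.num_splits)
        self._f = f

    def compute(self, split: int) -> Iterator:
        # Narrow dependency: pull lazily from the same partition of the parent.
        return (self._f(x) for x in self.parent.iterator(split))


class ToyFilteredRDD(ToyRDD):
    def __init__(self, parent: ToyRDD, f: Callable) -> None:
        super().__init__(parent, parent.num_splits)
        self._f = f

    def compute(self, split: int) -> Iterator:
        return (x for x in self.parent.iterator(split) if self._f(x))


source = ToyParallelRDD([[1, 2, 3], [4, 5, 6]])
mapped = source.map(lambda x: x * 10)
mapped.cached = True
evens = mapped.filter(lambda x: x % 20 == 0)
first = list(evens.iterator(0))   # computes mapped partition 0 and caches it
second = list(evens.iterator(0))  # served from the toy cache
```

In this sketch the second read of partition 0 does not call compute() on mapped again, because the cached flag routes the read through the toy cache manager. Calling evens.checkpoint() would likewise stop further compute() calls on evens by redirecting reads to the saved parent.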