Page Repo
App Crawler
Crawl Logic
DPark
HBase
URL Repo
Simulated User Action Crawler
Logger
HDFS
URL Dispatcher
Session Crawler
URL Extraction Rules
. . . . . .
Robots File Handler
JS Engine
Captcha Handler
Mobile Page Crawler
Content Parser
IP-Proxy Manager
Administrator
Distributed Storage
Field Extraction Rules
User Interface
Field Repo
Captcha Crawler
MESOS
Monitor
Content Acceptor
Normal Crawler