A Big Spatial Data Processing Framework Applying to National Geographic Conditions Monitoring
This paper proposes a novel framework for spatial data processing, which is applied to the National Geographic Conditions Monitoring project of China. The framework comprises four layers: spatial data storage, spatial RDDs, spatial operations, and a spatial query language. The spatial data storage layer uses HDFS to store large volumes of spatial vector and raster data across a distributed cluster. The spatial RDDs are abstract logical datasets of spatial data types, which can be distributed to a Spark cluster and processed with Spark transformations and actions. The spatial operations layer provides a series of operations on spatial RDDs, such as range query, k-nearest-neighbor query, and spatial join. The spatial query language is a user-friendly interface that gives users who are not familiar with Spark a convenient way to invoke the spatial operations. Compared with other spatial frameworks, the proposed framework integrates a comprehensive set of technologies for big spatial data processing. Extensive experiments on real datasets show that the framework achieves better performance than traditional processing methods.
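To make the three operations named above concrete, the following is a minimal single-machine sketch of their semantics in plain Python. It is not the framework's implementation and does not use Spark or HDFS; the names `Point`, `Box`, `range_query`, `knn`, and `spatial_join` are hypothetical and chosen only for illustration, and every operation is a brute-force scan rather than an indexed or distributed one.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    # Hypothetical point feature: an identifier plus planar coordinates.
    name: str
    x: float
    y: float


@dataclass(frozen=True)
class Box:
    # Hypothetical axis-aligned rectangle used as a query window or region.
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, p: Point) -> bool:
        # Boundaries are treated as inclusive (an assumption of this sketch).
        return self.xmin <= p.x <= self.xmax and self.ymin <= p.y <= self.ymax


def range_query(points, box):
    """Return all points that fall inside the query window."""
    return [p for p in points if box.contains(p)]


def knn(points, qx, qy, k):
    """Return the k points closest to (qx, qy) by Euclidean distance."""
    return heapq.nsmallest(k, points, key=lambda p: math.hypot(p.x - qx, p.y - qy))


def spatial_join(points, boxes):
    """Pair each region with every point it contains (nested-loop join)."""
    return [(b.name, p.name) for b in boxes for p in points if b.contains(p)]


points = [Point("a", 1, 1), Point("b", 4, 4), Point("c", 9, 2), Point("d", 5, 5)]
boxes = [Box("west", 0, 0, 5, 5), Box("east", 5, 0, 10, 5)]

in_west = range_query(points, boxes[0])
nearest_two = knn(points, 0, 0, 2)
joined = spatial_join(points, boxes)
```

In a distributed setting, each of these scans would presumably run per partition of a spatial RDD, with spatial partitioning or indexing used to avoid comparing every pair; that part is outside the scope of this sketch.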