Archiving and Managing Remote Sensing Data using State of the Art Storage Technologies
The Integrated Multi-mission Ground Segment for Earth Observation Satellites (IMGEOS) was established with the objective of minimizing human intervention in ground segment operations. All emergency data products will be delivered via FTP within an hour of acquisition, and all other standard data products will be delivered via FTP within a day. The IMGEOS activity was envisaged to reengineer the entire chain of operations at the ground segment facilities of NRSC at the Shadnagar and Balanagar campuses to adopt an integrated multi-mission approach. To achieve this, the information technology infrastructure was consolidated by implementing virtualized tiered storage and network computing infrastructure in a newly built Data Centre at the Shadnagar campus. One important activity that influences all others in the integrated multi-mission approach is the design of an appropriate storage and network architecture for realizing all the envisaged operations in a highly streamlined, reliable and secure environment. Storage was consolidated on the basis of five major factors: accessibility, long-term data protection, availability, manageability and scalability. The broad operational activities are reception of satellite data, quick look, generation of browse images, production of standard and value-added data products, production chain management, data quality evaluation, quality control and product dissemination. Each of these activities involves numerous detailed sub-activities and prerequisite tasks that must be implemented to support the above operations.
The IMGEOS architecture selects storage technologies suited to the given data sizes, their movement and the long-term lossless retention policies. Operational costs of the solution are kept to a minimum, and the solution is designed to scale. The main function of the storage is to receive and store the acquired satellite data, to make that data available at high speed for further processing on the Data Processing servers, and to support the generation of data products at a rate of about 1000 products per day. It also archives all the acquired data on tape storage for long-term retention and utilization. Data sizes per satellite pass range from hundreds of megabytes to tens of gigabytes.
The images acquired from remote sensing satellites are valuable assets of NRSC and are used as input for generating different types of user data products through multiple Data Processing systems. Hence, the data must be collected and stored in a shared, high-speed repository that is concurrently accessible by multiple systems. Once the raw imagery is stored in this repository, it must be processed before it is useful for value-added processing or for imagery analysts. Conventionally, the raw image file would have to be copied onto the data processing servers for further processing. Given the large file sizes, it is impractical to transfer these files to processing servers via a local area network. Even at gigabit Ethernet rates (up to 60 MB/s), a 5 GB file will take at least 83 seconds. For this reason, it is useful to employ a shared file system that allows every processing system to directly access the same pool in which the raw images were stored. This ensures concurrent access by multiple systems for processing and generation of data products. For these reasons, high-speed disk arrays were chosen for acquisition and processing, and tape-based storage systems for long-term archival of very large data volumes (petabytes), within a virtualized multi-tier storage architecture.
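The transfer-time figure above can be checked with a short calculation. The following is a minimal sketch, assuming decimal units (1 GB = 1000 MB) and a sustained rate with no protocol overhead; the function name transfer_seconds and the 30 GB example size are illustrative and not part of IMGEOS.

```python
def transfer_seconds(size_gb: float, rate_mb_per_s: float) -> float:
    """Lower bound on the time needed to copy a file at a sustained rate.

    Assumes decimal units (1 GB = 1000 MB) and ignores protocol overhead,
    so real transfers would take longer than the value returned here.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000.0 / rate_mb_per_s


if __name__ == "__main__":
    # Figures from the text: a 5 GB raw image over a 60 MB/s link.
    print(f"5 GB at 60 MB/s: {transfer_seconds(5, 60):.1f} s")
    # Hypothetical pass in the "tens of gigabytes" range, for comparison.
    print(f"30 GB at 60 MB/s: {transfer_seconds(30, 60):.1f} s")
```

Because the copy time grows linearly with file size and is paid again by every processing server that needs the file, a shared file system that avoids the copy altogether becomes more attractive as pass sizes approach tens of gigabytes.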
This paper explains the architecture of the virtualized tiered storage environment used for acquiring, processing and archiving remote sensing data. It also explains the data management aspects involved in ensuring data availability and in archiving the petabytes of remote sensing data acquired over the past 40 years.