From Fragmented Data to Unified Insights: Leveraging Data Standardization Tools for Better Collaboration and Agronomic Big Data Analysis
The quantity and scope of agronomic data available to researchers in both industry and academia are increasing rapidly. The data come from many different streams, such as field experiments, sensors, climatic records, socioeconomic surveys and remote sensing. Without shared standards and workflows, agronomic data frequently end up fragmented and siloed, which hampers collaboration within research labs, university departments and research institutes. Researchers and businesses therefore spend significant time unifying these fragmented data layers into a coherent structure. Implementing data standardization schemes can enable efficient collaboration and leverage the collective power of the research community to address critical agronomic knowledge gaps.
This presentation will provide an overview of available research data standardization tools and explain the underlying data management principles, including FAIR. Using Agmatix’s Axiom platform as an example, we will demonstrate how data from multiple sources can be standardized and used for insightful modeling. We used 3,774 experimental data points from 20 different sources in the United States (universities, commercial associations and farm-management systems) to construct a corn yield prediction model. The standardization platform unifies all datasets to common headers and units, which allows users to explore data distributions and verify that an adequate crop yield parameter space is covered. Production environment descriptors such as nutrient inputs, soil texture, soil organic matter and planting dates were augmented with relevant climatic data retrieved for specific growth-stage periods in each experiment. The model, built as an ensemble of decision trees, achieved good accuracy in yield prediction across the different production environments (R² = 0.9, RMSE = 1.0 Mg/ha, Mean Absolute Prediction Error of 7%). Standard feature permutation procedures identified the key factors affecting the model: nutrient inputs, soil organic matter, and climatic conditions during two critical crop growth periods. Analyzing the data for the effect of these factors revealed informative yield trends, which can now be explored further.
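As a rough illustration of what unifying datasets to common headers and units involves, and of how the reported accuracy metrics are commonly computed, consider the minimal Python sketch below. It is not the Axiom platform's implementation: the header aliases, function names and sample records are hypothetical, the bushel conversion assumes the standard 56 lb corn bushel, and the Mean Absolute Prediction Error is assumed here to be the mean absolute percentage error.

```python
import math

# Hypothetical alias table: source-specific headers -> common header.
HEADER_ALIASES = {
    "yield_bu_ac": "yield",
    "grain yield (kg/ha)": "yield",
    "som_pct": "soil_organic_matter",
    "organic matter (%)": "soil_organic_matter",
}

# Factors converting a source yield unit to the common unit, Mg/ha.
# Assumes 1 corn bushel = 56 lb = 56 * 0.45359237 kg and 1 acre = 0.404686 ha.
TO_MG_PER_HA = {
    "bu/ac": 56 * 0.45359237 / 1000 / 0.404686,
    "kg/ha": 0.001,
    "Mg/ha": 1.0,
}


def standardize_record(record, yield_unit):
    """Rename headers to the common vocabulary and convert yield to Mg/ha."""
    out = {}
    for header, value in record.items():
        key = header.strip().lower()
        out[HEADER_ALIASES.get(key, key)] = value
    if "yield" in out:
        out["yield"] = out["yield"] * TO_MG_PER_HA[yield_unit]
    return out


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def mean_abs_pct_error(observed, predicted):
    """Mean absolute error relative to each observation, in percent."""
    ratios = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Two made-up records from sources with different headers and units.
sources = [
    ({"Yield_bu_ac": 200.0, "SOM_pct": 3.1}, "bu/ac"),
    ({"Grain yield (kg/ha)": 11500.0, "Organic matter (%)": 2.4}, "kg/ha"),
]
unified = [standardize_record(rec, unit) for rec, unit in sources]
```

After this step every record exposes the same `yield` and `soil_organic_matter` keys with yield in Mg/ha, so distributions from different sources can be compared directly and model predictions can be scored with the three metric functions.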
Although demonstrated here on corn yield, many other agronomic research domains can benefit from data standardization. We call on the agronomic research community to adopt standardization tools and to share data through public repositories, community-managed platforms or public-private data platforms. This will allow: i) better transparency; ii) availability of the data for re-use by numerous public and private sector stakeholders; iii) better return on investment; and iv) increased potential for tackling current global agronomic challenges.