Integrating heterogeneous across-country data for proxy-based random forest prediction of enteric methane in dairy cattle

dc.contributor.author: Negussie, Enyew
dc.contributor.author: Gonzalez-Recio, Oscar
dc.contributor.author: Battagin, Mara
dc.contributor.author: Bayat, Ali-Reza
dc.contributor.author: Boland, Tommy
dc.contributor.author: de Haas, Yvette
dc.contributor.author: Garcia-Rodriguez, Aser
dc.contributor.author: Garnsworthy, Philip C.
dc.contributor.author: Gengler, Nicolas
dc.contributor.author: Kreuzer, Michael
dc.contributor.author: Kuhla, Bjorn
dc.contributor.author: Lassen, Jan
dc.contributor.author: Peiren, Nico
dc.contributor.author: Pszczola, Marcin
dc.contributor.author: Schwarm, Angela
dc.contributor.author: Soyeurt, Helene
dc.contributor.author: Vanlierde, Amelie
dc.contributor.author: Yan, Tianhai
dc.contributor.author: Biscarini, Filippo
dc.date.accessioned: 2022-06-14T14:24:28Z
dc.date.available: 2022-06-14T14:24:28Z
dc.date.issued: 2022-03-22
dc.description: Publication history: Accepted - 9 February 2022; Published online - 26 March 2022 [en_US]
dc.description.abstract: Direct measurements of methane (CH4) from individual animals are difficult and expensive. Predictions based on proxies for CH4 are a viable alternative. Most prediction models are based on multiple linear regressions (MLR) and on predictor variables that are not routinely available on commercial farms, such as dry matter intake (DMI) and diet composition. The use of machine learning (ML) algorithms to predict CH4 emissions from across-country heterogeneous data sets has not been reported. The objectives were to compare the performance of the ML ensemble algorithm random forest (RF) with that of MLR models in predicting CH4 emissions from proxies in dairy cows, and to assess the effects of imputing missing data points on prediction accuracy. Data on CH4 emissions and proxies for CH4 from 20 herds were provided by 10 countries. The integrated data set contained 43,519 records from 3,483 cows, with 18.7% missing data points imputed using k-nearest neighbor imputation. Three data sets were created: 3k (no missing records), 21k (missing DMI imputed from milk, fat, protein, and body weight), and 41k (missing DMI, milk fat, and protein records imputed). These data sets were used to test scenarios (with or without DMI; imputed vs. nonimputed DMI, milk fat, and protein) and prediction models (RF vs. MLR). Model predictive ability was evaluated within and between herds through 10-fold cross-validation. Prediction accuracy was measured as the correlation between observed and predicted CH4, root mean squared error (RMSE), and mean normalized discounted cumulative gain (NDCG). Compared with models that did not include DMI, inclusion of DMI improved within- and between-herd prediction accuracy to 0.77 (RMSE = 23.3%) and 0.58 (RMSE = 31.9%), respectively, in RF and to 0.50 (RMSE = 0.327) and 0.13 (RMSE = 42.71), respectively, in MLR. When missing DMI records were imputed, within- and between-herd accuracy increased to 0.84 (RMSE = 18.5%) and 0.63 (RMSE = 29.9%), respectively. In all scenarios, RF models outperformed MLR models. Results suggest that routinely measured variables from dairy farms can be used in developing globally robust prediction models for CH4 if coupled with state-of-the-art techniques for imputation and advanced ML algorithms for predictive modeling. [en_US]
dc.description.sponsorship: This paper is the result of the concerted effort of all participants and support from the networks of COST Action FA1302 “METHAGENE: Large-scale methane measurements on individual ruminants for genetic evaluations.” The authors thank all individuals and groups who have directly or indirectly contributed to this work; special thanks are due for the technical and financial support from COST Action FA1302 of the European Union. In addition, all financial and technical support from all participating countries and research centers involved in this work is gratefully acknowledged. [en_US]
dc.identifier: http://hdl.handle.net/20.500.12518/456
dc.identifier.citation: Negussie, E., González-Recio, O., Battagin, M., Bayat, A.-R., Boland, T., de Haas, Y., Garcia-Rodriguez, A., Garnsworthy, P.C., Gengler, N., Kreuzer, M., Kuhla, B., Lassen, J., Peiren, N., Pszczola, M., Schwarm, A., Soyeurt, H., Vanlierde, A., Yan, T. and Biscarini, F. (2022) ‘Integrating heterogeneous across-country data for proxy-based random forest prediction of enteric methane in dairy cattle’, Journal of Dairy Science. American Dairy Science Association. doi:10.3168/jds.2021-20158. [en_US]
dc.identifier.issn: 0022-0302
dc.identifier.issn: 1525-3198
dc.identifier.uri: https://doi.org/10.3168/jds.2021-20158
dc.language.iso: en [en_US]
dc.publisher: Elsevier [en_US]
dc.rights: © 2022, The Authors. Published by Elsevier Inc. and Fass Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/) [en_US]
dc.subject: enteric methane [en_US]
dc.subject: machine learning [en_US]
dc.subject: prediction models [en_US]
dc.subject: proxies for methane [en_US]
dc.title: Integrating heterogeneous across-country data for proxy-based random forest prediction of enteric methane in dairy cattle [en_US]
dc.type: Article [en_US]
dcterms.dateAccepted: 2022-02-09
dcterms.dateSubmitted: 2021-01-15
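
The abstract names three accuracy measures (correlation between observed and predicted CH4, RMSE, and NDCG) but this record does not give their formulas. As an illustration only, the Python sketch below computes them under stated assumptions: RMSE (%) is taken as RMSE divided by the mean observed value, and NDCG uses observed CH4 as the gain with a log2 rank discount, ranking animals by predicted CH4. These definitions, the function names, and the example numbers are hypothetical choices and are not taken from the article.

```python
import math


def pearson(obs, pred):
    """Pearson correlation between observed and predicted values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (sd_o * sd_p)


def rmse(obs, pred):
    """Root mean squared error, in the units of the trait."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def rmse_percent(obs, pred):
    """ASSUMPTION: RMSE expressed as a percentage of the observed mean."""
    return 100.0 * rmse(obs, pred) / (sum(obs) / len(obs))


def ndcg(obs, pred, k=None):
    """ASSUMPTION: linear gain (observed value) with a log2 rank discount.

    Animals are ranked by predicted value; the result is the discounted
    gain of that ranking divided by the gain of the ideal ranking.
    """
    k = len(obs) if k is None else k
    order = sorted(range(len(obs)), key=lambda i: pred[i], reverse=True)
    dcg = sum(obs[i] / math.log2(rank + 2) for rank, i in enumerate(order[:k]))
    ideal = sorted(obs, reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg


# Made-up CH4 values (g/day) for four cows, for demonstration only.
observed = [300.0, 400.0, 500.0, 350.0]
predicted = [310.0, 390.0, 480.0, 360.0]

print("correlation:", pearson(observed, predicted))
print("RMSE (%):", rmse_percent(observed, predicted))
print("NDCG:", ndcg(observed, predicted))
```

Correlation and RMSE describe how close the predicted values are to the observed ones, whereas NDCG only rewards ranking high emitters near the top, so a model can score well on NDCG while still having a large RMSE.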

Files

Original bundle
Name: Integrating heterogeneous across-country data for proxy-based random forest prediction of enteric methane in dairy cattle.pdf
Size: 1.59 MB
Format: Adobe Portable Document Format
Description: Final published version
