Browsing by Author "Rosa, Guilherme J. M."

Effect of Genotype Imputation on Genome-Enabled Prediction of Complex Traits: an Empirical Study With Mice Data
(BMC, 2014) Felipe, Vivian P. S.; Okut, Hayrettin; Gianola, Daniel; Silva, Martinho A.; Rosa, Guilherme J. M.
Background: Genotype imputation is an important tool for whole-genome prediction because it reduces the cost of individual genotyping. However, the benefits of genotype imputation have been evaluated mostly for linear additive genetic models. In this study we investigated the impact of employing imputed genotypes when using more elaborate models of phenotype prediction. Our hypothesis was that such models would be able to track genetic signals using the observed genotypes only, with no additional information to be gained from imputed genotypes. Results: For the present study, an outbred mouse population containing 1,904 individuals and genotypes for 1,809 pre-selected markers was used. The effect of imputation was evaluated for a linear model (the Bayesian LASSO, BL) and for semi-parametric and non-parametric models (reproducing kernel Hilbert spaces regression, RKHS, and Bayesian regularized artificial neural networks, BRANN, respectively). The RKHS method had the best predictive accuracy. Genotype imputation had a similar impact on the effectiveness of BL and RKHS. BRANN predictions were, apparently, more sensitive to imputation errors. In scenarios where the masking rates were 75% and 50%, genotype imputation was not beneficial. However, genotype imputation incorporated information about important markers and improved predictive ability, especially for body mass index (BMI) when genotype information was sparse (90% masking), and for body weight (BW) when the reference sample for imputation was weakly related to the target population. Conclusions: Genotype imputation is not always helpful for phenotype prediction, so it should be considered on a case-by-case basis. Factors that can affect the usefulness of genotype imputation for prediction of yet-to-be-observed traits are the imputation accuracy itself, the structure of the population, the genetic architecture of the target trait, and the model used for phenotype prediction.
Predicting Complex Quantitative Traits With Bayesian Neural Networks: a Case Study With Jersey Cows and Wheat
(BMC, 2011) Gianola, Daniel; Okut, Hayrettin; Weigel, Kent A.; Rosa, Guilherme J. M.
Background: In the study of associations between genomic data and complex phenotypes there may be relationships that are not amenable to parametric statistical modeling. Such associations have been investigated mainly using single-marker and Bayesian linear regression models that differ in their distributions, but that assume additive inheritance while ignoring interactions and non-linearity. When interactions have been included in the model, their effects have entered linearly. There is a growing interest in non-parametric methods for predicting quantitative traits based on reproducing kernel Hilbert spaces regressions on markers and radial basis functions. Artificial neural networks (ANN) provide an alternative, because these act as universal approximators of complex functions and can capture non-linear relationships between predictors and responses, with the interplay among variables learned adaptively. ANNs are interesting candidates for analysis of traits affected by cryptic forms of gene action. Results: We investigated various Bayesian ANN architectures for predicting phenotypes in two data sets consisting of milk production in Jersey cows and yield of inbred lines of wheat. For the Jerseys, predictor variables were derived from pedigree and molecular marker (35,798 single nucleotide polymorphisms, SNPs) information on 297 individual cows. The wheat data represented 599 lines, each genotyped with 1,279 markers. The ability to predict fat, milk and protein yield was low when using pedigrees, but it was better when SNPs were employed, irrespective of the ANN trained. Predictive ability was even better in wheat because the trait was a mean, as opposed to an individual phenotype in cows. Non-linear neural networks outperformed a linear model in predictive ability in both data sets, but more clearly in wheat. Conclusion: Results suggest that neural networks may be useful for predicting complex traits using high-dimensional genomic information, a situation where the number of unknowns exceeds sample size. ANNs can capture nonlinearities adaptively. This may be useful when prediction of phenotypes is crucial.
Predicting Expected Progeny Difference for Marbling Score in Angus Cattle Using Artificial Neural Networks and Bayesian Regression Models
(BMC, 2013) Okut, Hayrettin; Wu, Xiao-Liao; Rosa, Guilherme J. M.; Bauck, Stewart; Woodward, Brent W.; Schnabel, Robert D.; Gianola, Daniel
Background: Artificial neural networks (ANN) mimic the function of the human brain and are capable of performing massively parallel computations for data processing and knowledge representation. ANN can capture nonlinear relationships between predictors and responses and can adaptively learn complex functional forms, in particular in situations where conventional regression models are ineffective. In a previous study, ANN with Bayesian regularization outperformed a benchmark linear model when predicting milk yield in dairy cattle or grain yield of wheat. Although breeding values rely on the assumption of additive inheritance, the predictive capabilities of ANN are of interest from the perspective of their potential to increase the accuracy of prediction of molecular breeding values used for genomic selection. This motivated the present study, in which the aim was to investigate the accuracy of ANN when predicting the expected progeny difference (EPD) of marbling score in Angus cattle. Various ANN architectures were explored, which involved two training algorithms, two types of activation functions, and from 1 to 4 neurons in hidden layers. For comparison, BayesCπ models were used to select a subset of optimal markers (referred to as feature selection), under the assumption of additive inheritance, and then the marker effects were estimated using BayesCπ with π set equal to zero. This procedure is referred to as BayesCpC and was implemented on a high-throughput computing cluster. Results: The ANN with Bayesian regularization (BRANN) performed as well as BayesCpC for prediction of EPD, based on prediction accuracy and sum of squared errors. With the 3K-SNP panel, for example, prediction accuracy was 0.776 using BayesCpC, and ranged from 0.776 to 0.807 using BRANN. With the selected 700-SNP panel, prediction accuracy was 0.863 for BayesCpC and ranged from 0.842 to 0.858 for BRANN. However, prediction accuracy for the ANN with scaled conjugate gradient back-propagation was lower, ranging from 0.653 to 0.689 with the 3K-SNP panel, and from 0.743 to 0.793 with the selected 700-SNP panel. Conclusions: ANN with Bayesian regularization performed as well as linear Bayesian regression models in predicting additive genetic values, supporting the idea that ANN are useful as universal approximators of functions of interest in breeding contexts.