AHassanpour88/Beyond_Scenario_MoBPSopti

Resource optimization in breeding programs using Kernel regression

Table of contents

0 Introduction
1 Description of the underlying simulation case and objective goals
2 Nadaraya-Watson kernel function estimator
3 Estimation of the number of needed simulations
4 Variance reduction using Kernel regression

0 Introduction

This repository contains scripts for resource optimization in breeding programs using Kernel regression, based on our first paper on the general optimization of breeding program design, which is available at https://academic.oup.com/g3journal/article/13/12/jkad217/7281644. Breeding programs have become increasingly large and structurally complex, with many interdependent parameters and contrasting objectives. It is therefore practically impossible to intuitively derive a strategy that optimizes a breeding program for the available resources. As a result, it is common practice to narrow the optimization problem down to a set of scenarios and to focus on a smaller subset of possibilities that are then analyzed in depth. We aim to provide guidance on constructing a multi-objective optimization problem using stochastic simulations and on integrating it into breeding research programs that seek to design, implement, and evaluate the best, or a near-optimal, resource combination beyond just analyzing scenario differences.

1 Description of the underlying simulation case and objective goals

We propose a general pipeline for optimizing breeding programs and showcase our approach on a simplified classical dairy cattle breeding scheme. In a dairy cattle breeding program, the performance traits of a bull cannot be determined phenotypically and are expressed only by cows, so pre-selected bulls must be mated to cows to produce test daughters. The performance of a test bull's offspring is used as the criterion for selection decisions, and selected sires and cows with the desired trait are then used as parents of the next cycle. In our example, genomic breeding value estimation was used to select bulls, whereas cows were selected based on pedigree breeding value estimation. The last three generations were considered for selection, with phenotypic information available for the two older generations of cows. We simulated a single trait with a heritability of 0.3 using 1,000 purely additive QTLs, a mean genomic value of 100, and a genomic standard deviation of 10. The optimization problem was formulated to achieve three goals: 1) maximizing genetic gain for the trait of economic importance, 2) maintaining genetic diversity, and 3) staying within budget while still achieving the first two objectives. For this, we create a design space of potential breeding programs by generating random inputs to a composite objective function under a given set of conditions using stochastic simulation. The simulation script is available in Simulation_Script.R.
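The trait setup above implies a residual variance through the standard heritability relation h² = σ²_g / (σ²_g + σ²_e). The repository's scripts are in R, but the relation can be illustrated with a minimal Python sketch; the sample size and seed here are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

h2 = 0.3         # heritability from the text
sigma_g = 10.0   # genomic standard deviation
mu_g = 100.0     # mean genomic value
n = 1000         # hypothetical number of animals

# From h2 = var_g / (var_g + var_e), the residual variance is
# var_e = var_g * (1 - h2) / h2
var_e = sigma_g**2 * (1 - h2) / h2

g = rng.normal(mu_g, sigma_g, n)            # true genomic values
y = g + rng.normal(0.0, np.sqrt(var_e), n)  # simulated phenotypes

print(round(var_e, 1))  # 233.3
```

With σ_g = 10 and h² = 0.3 this gives σ²_e ≈ 233.3, i.e. phenotypes roughly three times noisier than the genetic signal.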

2 Nadaraya-Watson kernel function estimator

Once the simulation is complete and the statistic used as the objective function measure has been calculated, we can use it to locate potential maxima nonparametrically, as the objective function cannot be derived analytically. We propose using Kernel regression and the local linear estimator (Nadaraya 1964; Watson 1964). The Nadaraya-Watson estimator predicts an unknown value from observations using weighted averages of the data points. The idea behind this technique is that each observation has some influence on the prediction, but not all observations are equally important: the method assigns weights to the observations so that the more relevant ones have more influence on the result than those that are less relevant or irrelevant. The optimization method and the plotting script are available in Visualization_Script.R. Because of the well-known bias-variance trade-off when selecting the bandwidth in high- or low-density areas of the search space, we used a Kernel estimator to calculate the local variance and to determine how much weight is given to nearby observations in the estimate for each objective function. The script for calculating the local variance is available in Kernel_for_local_variance.R.

3 Estimation of the number of needed simulations

Due to the inherent randomness of the simulation process, the estimates obtained from a single stochastic simulation may vary significantly from one simulation to another, even when the parameter values differ only slightly between simulations. Reducing variance is an important aspect of simulation-based optimization, as it improves the accuracy and reliability of the results. In our example, a large number of simulations (60,000, based on the available computing power) were first performed at arbitrary positions in the search space, subject to the budget constraints, to get an accurate picture of how the composite function behaves. To study the effect of the number of simulations on the optimization, we assembled datasets of different sample sizes from the original dataset after smoothing and compared the optimum proposed by each subset to the final optimum. The script for calculating the number of needed simulations is available in Sensivitiy_Script.R. A kernel density estimator (KDE) was used to estimate the probability density function (PDF) of the proposed optima; it uses a Kernel function to smooth out the data and estimate their underlying distribution. The result of this calculation estimates how likely it is to miss the maximum by picking a new search space that does not include the optimum. The script for visualizing the number of needed simulations is available in KDE.R.
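The resampling-plus-KDE idea can be sketched as follows (in Python rather than the repository's R; the peaked test function, subset sizes, and the candidate interval [5, 7] are all hypothetical): repeatedly subsample the design space, record where each subsample's noisy maximum lands, fit a Gaussian KDE over those proposed optima, and read off the probability mass falling outside a candidate narrowed search space.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

# Hypothetical stand-in for the smoothed objective: a single peak at x = 6
def objective(x):
    return -(x - 6.0) ** 2

# Proposed optima from repeated subsamples of a hypothetical full dataset
full_x = rng.uniform(0.0, 10.0, 5000)
proposed = []
for _ in range(200):
    sub = rng.choice(full_x, size=300, replace=False)
    noisy = objective(sub) + rng.normal(0.0, 2.0, sub.size)  # simulation noise
    proposed.append(sub[np.argmax(noisy)])
proposed = np.asarray(proposed)

# Gaussian KDE over the proposed optima (Silverman's rule-of-thumb bandwidth).
# The KDE is a mixture of Gaussians, so its mass in an interval is the mean of
# the component-wise normal CDF differences.
bw = 1.06 * proposed.std() * proposed.size ** (-1 / 5)

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mass_in = np.mean([norm_cdf((7.0 - m) / bw) - norm_cdf((5.0 - m) / bw)
                   for m in proposed])
p_miss = 1.0 - mass_in  # chance a narrowed search space [5, 7] misses the optimum
print(round(p_miss, 3))
```

A small `p_miss` suggests the narrowed search space is safe to adopt; a large one indicates more simulations, or a wider interval, are needed.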

4 Variance reduction using Kernel regression

A sliding window approach was used to estimate the variance of the target function across the search space before and after applying the Kernel smoother. For this, the search space was divided into 10 windows and the variance of the target function was calculated within each window. This lets us examine the variability of the target function across the search space and quantify exactly how much variance reduction is achieved by the Kernel smoothing. The script for calculating the variance is available in Sliding_window.R.
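A minimal Python sketch of this check (the repository's version is in Sliding_window.R; here non-overlapping windows, a sine test function, and a fixed bandwidth of 0.5 are simplifying assumptions): compute within-window variances of the raw simulation output and of the Kernel-smoothed values, then compare them.

```python
import numpy as np

def windowed_variance(x, y, n_windows=10):
    """Variance of y within equal-width windows along x."""
    edges = np.linspace(x.min(), x.max(), n_windows + 1)
    return np.array([y[(x >= lo) & (x <= hi)].var()
                     for lo, hi in zip(edges[:-1], edges[1:])])

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 2000)
y_raw = np.sin(x) + rng.normal(0, 0.5, 2000)   # raw noisy simulation output

# Nadaraya-Watson smoothing at the same positions (Gaussian kernel, bandwidth 0.5)
d = (x[:, None] - x[None, :]) / 0.5
w = np.exp(-0.5 * d**2)
y_smooth = (w @ y_raw) / w.sum(axis=1)

var_raw = windowed_variance(x, y_raw)
var_smooth = windowed_variance(x, y_smooth)

# Per-window variance ratio quantifies the reduction achieved by smoothing
print(np.round(var_smooth / var_raw, 2))
```

The remaining within-window variance of the smoothed values mostly reflects the genuine trend of the target function inside each window, while the noise component is largely removed.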
