Analogously, for markers with three different variants, we have to count the number of zeros in the difference of the marker vectors, M_{i,•} − M_{l,•} (for the relation of Eqs. (11) and (8), see the derivation of Eq. (8) in Additional file 2).
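The counting step described above can be illustrated with a small sketch. The function name `count_identical_loci`, the 0/1/2 coding and the toy genotypes are assumptions made for illustration only; they are not taken from the source.

```python
import numpy as np

def count_identical_loci(M):
    """Return C with C[i, l] = number of loci at which genotypes i and l agree."""
    M = np.asarray(M)
    # Pairwise differences of marker vectors, shape (n, n, p).
    diff = M[:, None, :] - M[None, :, :]
    # A zero in the difference marks a locus with identical genotypes.
    return (diff == 0).sum(axis=2)

# Hypothetical toy data: 3 individuals, 4 markers with three variants (0, 1, 2).
M = np.array([[0, 1, 2, 0],
              [0, 1, 0, 0],
              [2, 1, 2, 1]])
C = count_identical_loci(M)
print(C)
```

Because only equality of genotypes matters, relabeling the three variants (for example as -1/0/1) leaves `C` unchanged.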

The categorical epistasis (CE) model. The (i,l)-th entry of the corresponding relationship matrix C_{E} is given by the inner product of the genotypes i and l in the coding of the categorical epistasis model. Thus, the matrix counts the number of pairs of loci that are in identical configuration, and we can express the entry (C_{E})_{i,l} in terms of C_{i,l}, since the number of identical pairs can be calculated from the number of identical loci:

Here, we also count the “pair” of a locus with itself by letting k run up to C_{i,l}. Excluding these effects from the matrix would mean that the maximum of k equals C_{i,l} − 1. In matrix notation, Eq. (12) can be written as
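As a minimal sketch of this relation, assume that Eq. (12) has the form (C_{E})_{i,l} = Σ_{k=1}^{C_{i,l}} k = C_{i,l}(C_{i,l} + 1)/2, so that a matrix form would be 0.5 · C ∘ (C + J), with ∘ the element-wise product and J an all-ones matrix. This form is inferred from the verbal description and is an assumption. The function names and toy data below are likewise illustrative. The closed form is checked against explicit counting of locus pairs.

```python
import numpy as np

def ce_from_c(C, include_self_pairs=True):
    """Assumed closed form: 0.5 * C o (C + J), or 0.5 * C o (C - J) without self-pairs."""
    C = np.asarray(C, dtype=float)
    J = np.ones_like(C)
    return 0.5 * C * (C + J) if include_self_pairs else 0.5 * C * (C - J)

def ce_by_pair_counting(M, include_self_pairs=True):
    """Count pairs of loci (j, k) at which individuals i and l agree on both loci."""
    M = np.asarray(M)
    n = M.shape[0]
    offset = 0 if include_self_pairs else 1  # offset=1 drops the pairs with j == k
    out = np.zeros((n, n))
    for i in range(n):
        for l in range(n):
            same = M[i] == M[l]
            out[i, l] = np.triu(np.outer(same, same), k=offset).sum()
    return out

# Hypothetical toy data: 3 individuals, 4 markers with three variants.
M = np.array([[0, 1, 2, 0],
              [0, 1, 0, 0],
              [2, 1, 2, 1]])
C = (M[:, None, :] == M[None, :, :]).sum(axis=2)  # number of identical loci

C_E = ce_from_c(C)
C_E_no_self = ce_from_c(C, include_self_pairs=False)
print(C_E)
```

Under this assumption, dropping the self-pairs replaces C_{i,l}(C_{i,l} + 1)/2 by C_{i,l}(C_{i,l} − 1)/2, which corresponds to the maximum of k being C_{i,l} − 1.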

In addition to the previously discussed EGBLUP model, a common approach to incorporating “non-linearities” is based on Reproducing Kernel Hilbert Space (RKHS) regression [21, 31], which models the covariance matrix as a function of a certain distance between the genotypes. The most prominent variant for genomic prediction is the Gaussian kernel. Here, the covariance Cov_{i,l} of two individuals is described by

with d_{i,l} being the squared Euclidean distance of the genotype vectors of individuals i and l, and b a bandwidth parameter that has to be chosen. This approach is independent of translations of the coding, since the Euclidean distance remains unchanged if both genotypes are translated. Moreover, this approach is also invariant with respect to a scaling factor if the bandwidth parameter is adapted accordingly (in this context, see also [32]). Thus, EGBLUP and the Gaussian kernel RKHS approach both capture “non-linearities”, but they behave differently if the coding is translated.
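The two invariance properties can be checked numerically with a short sketch. It assumes the common form Cov_{i,l} = exp(−b · d_{i,l}) for the Gaussian kernel and uses an unnormalized Hadamard square of M M' as a simple stand-in for an EGBLUP-type epistatic relationship matrix. Both choices, the function names and the toy data are assumptions made for illustration.

```python
import numpy as np

def squared_distances(M):
    """Squared Euclidean distances d[i, l] between all pairs of genotype vectors."""
    M = np.asarray(M, dtype=float)
    diff = M[:, None, :] - M[None, :, :]
    return (diff ** 2).sum(axis=2)

def gaussian_kernel(M, b):
    """Assumed Gaussian kernel form: exp(-b * d[i, l])."""
    return np.exp(-b * squared_distances(M))

def hadamard_square(M):
    """Stand-in for an EGBLUP-type matrix: (M M') o (M M'), unnormalized."""
    G = M @ M.T
    return G * G

# Hypothetical toy data: 3 individuals, 4 markers coded 0/1/2.
M = np.array([[0, 1, 2, 0],
              [0, 1, 0, 0],
              [2, 1, 2, 1]], dtype=float)
b = 0.25

K = gaussian_kernel(M, b)
K_shifted = gaussian_kernel(M - 1.0, b)       # translate coding to -1/0/1
K_scaled = gaussian_kernel(2.0 * M, b / 4.0)  # scale coding by c = 2, use b / c**2

H = hadamard_square(M)
H_shifted = hadamard_square(M - 1.0)

print(np.allclose(K, K_shifted), np.allclose(K, K_scaled), np.allclose(H, H_shifted))
```

Scaling the coding by a factor c multiplies every squared distance by c², which is why dividing the bandwidth by c² restores the original kernel. The product-based matrix has no such compensation for a translation, so its entries change when the coding is shifted.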

