Citation: Gehlen, J.; Li, J.; Hourican,
C.; Tassi, S.; Mishra, P.P.; Lehtimäki, T.;
Kähönen, M.; Raitakari, O.; Bosch,
J.A.; Quax, R. Bias in O-Information
Estimation. Entropy 2024, 26, 837.
https://doi.org/10.3390/e26100837
Academic Editor: Chaoming
Song
Received: 5 August 2024
Revised: 18 September 2024
Accepted: 24 September 2024
Published: 30 September 2024
Copyright: © 2024 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
entropy
Article
Bias in O-Information Estimation
Johanna Gehlen 1, Jie Li 1, Cillian Hourican 1 , Stavroula Tassi 2,3, Pashupati P. Mishra 4,5,6, Terho Lehtimäki 4,5,6,
Mika Kähönen 5,7, Olli Raitakari 8,9,10,11, Jos A. Bosch 12 and Rick Quax 1,13,*
1 Computational Science Lab, Informatics Institute, University of Amsterdam,
1098 Amsterdam, The Netherlands; johanna.gehlen@gmail.com (J.G.); j.li4@uva.nl (J.L.);
c.j.hourican@uva.nl (C.H.)
2 Unit of Medical Technology and Intelligent Information Systems (MEDLAB), Department of Material Science
and Engineering, University of Ioannina, 45110 Ioannina, Greece; tassistav@uoi.gr
3 Department of Mechanical and Aeronautics Engineering, University of Patras, 26504 Rio, Greece
4 Department of Clinical Chemistry, Faculty of Medicine and Health Technology, Tampere University,
33720 Tampere, Finland; pashupati.mishra@tuni.fi (P.P.M.); terho.lehtimaki@tuni.fi (T.L.)
5 Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology,
Tampere University, 33720 Tampere, Finland; mika.kahonen@tuni.fi
6 Department of Clinical Chemistry, Fimlab Laboratories, 33520 Tampere, Finland
7 Department of Clinical Physiology, Tampere University Hospital, 33520 Tampere, Finland
8 Centre for Population Health Research, University of Turku and Turku University Hospital,
20520 Turku, Finland; olli.raitakari@utu.fi
9 Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku,
20520 Turku, Finland
10 Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, 20520 Turku, Finland
11 InFLAMES Research Flagship, University of Turku, 20520 Turku, Finland
12 Clinical Psychology, Faculty of Social and Behavioural Sciences, University of Amsterdam,
1018 Amsterdam, The Netherlands; j.a.bosch@uva.nl
13 Institute for Advanced Study, 1012 Amsterdam, The Netherlands
* Correspondence: r.quax@uva.nl
Abstract: Higher-order relationships are a central concept in the science of complex systems. A
popular method of attempting to estimate the higher-order relationships of synergy and redundancy
from data is through the O-information. It is an information–theoretic measure composed of Shannon
entropy terms that quantifies the balance between redundancy and synergy in a system. However,
bias is not yet taken into account in the estimation of the O-information of discrete variables. In this
paper, we explain where this bias comes from and explore it for fully synergistic, fully redundant,
and fully independent simulated systems of n = 3 variables. Specifically, we explore how the sample
size and number of bins affect the bias in the O-information estimation. The main finding is that
the O-information of independent systems is severely biased towards synergy if the sample size is
smaller than the number of jointly possible observations. This could mean that triplets identified as
highly synergistic may in fact be close to independent. A bias approximation based on the Miller–
Maddow method is derived for the O-information. We find that for systems of n = 3 variables the
bias approximation can partially correct for the bias. However, simulations of fully independent
systems are still required as null models to provide a benchmark of the bias of the O-information.
Keywords: higher-order relationships; O-information; information synergy; bias; complex systems
1. Introduction
Current research in the fields of neuroscience [1–4], biology [5], social science [6], and
physics [7] confirms that considering higher-order relationships provides highly valuable
insights into the structure and dynamics of complex systems. At the same time, the
predominant practice of constructing association networks in many fields is still based on
pairwise correlations, which are known to miss most or all higher-order relationships [8].
Entropy 2024, 26, 837. https://doi.org/10.3390/e26100837 https://www.mdpi.com/journal/entropy
In particular, synergy is a measure used to quantify the higher-order relationships
within a system. In a synergistic system, variables tend to be pairwise independent, while
the system follows global constraints [4]. An example of a maximally synergistic system
is the XOR gate, where X1 and X2 are i.i.d. binary variables, and Y = X1 ⊕ X2. This is
maximally synergistic as X1 and X2 together fully determine Y, while still being pairwise
independent of each other and of Y [8]. Therefore, information about any single variable
in the system can only be accessed by looking at the system as a whole. As variables in
synergistic systems tend to be more independent on a pairwise level, synergistic higher-
order relationships are exactly the type that are elusive in a pairwise analysis. A targeted
search for synergy is therefore necessary to uncover these higher-order relationships.
On the opposite side of synergy we have redundancy. In a redundant system, the
variables are highly correlated with each other, which can be pairwise, but also at a higher
level between sets of variables. An example of a system containing redundancy would be
a system where variables X1 and X2 are both fully determined by a third variable, X3. In
this case, both X1 and X2 would inherently carry information about each other through X3.
This introduces redundancy, as each variable contains information about the remaining
variables of the system.
There is currently no agreed-upon method of calculating synergy [9]. One popular
framework is the Partial Information Decomposition (PID) method, introduced in 2010 by
Williams and Beer [10], which postulates that information can be decomposed into unique,
redundant, and synergistic information atoms [11]. However, no measure that satisfies the
PID axioms, or similar sets of axioms, has been found [12] and the search continues [13]. Even
if one is found, the calculations needed for the full decomposition may in general increase
super-exponentially with the size of the system [12]. Other approaches exist but have their own
drawbacks. For instance, Quax et al. [8] propose a method to directly quantify synergy by
using intermediate so-called synergistic random variables. However, the method has an
extremely high computational complexity. Furthermore, it remains to be seen whether its
concept of orthogonal decomposition actually works well for variables with a small number
of states, such as binary variables. Other approaches include localized formulations [14,15]
and formulations based on information geometry [16,17].
In the absence of an agreed-upon synergy measure, an easy-to-compute heuristic that
is gaining popularity is the O-information measure, introduced in 2019 by Rosas et al. [9]. It
quantifies the balance between synergy and redundancy in a system of variables. While
the O-information loses the ability to quantify synergy directly, and may miss synergistic
associations altogether if outweighed by redundant effects, it can nevertheless determine
whether a system is “synergy-dominated” or “redundancy-dominated”. Moreover, it
is calculated through a linear combination of Shannon entropy measures, making its
interpretation intuitive and its calculation computationally efficient. Its computation also
scales very well with the system size, easily allowing higher-order interactions of more than
three variables to be analyzed [9]. These attributes motivate the use of the O-information to
quantify the “synergy-dominance” within systems of variables.
Current research on the O-information highlights its potential to identify synergy-
and redundancy-dominated variable groups, demonstrating its broad applicability. For
example, the authors of [9,12] present proof of concept through musical score analysis. It
has also been used in the neuroscience domain, as seen in [3,4]. Dynamic O-information,
explored by [3], measures system evolution via synergistic interactions. Meanwhile, the
authors of [18] analyze higher-order fMRI data, while [19,20] extend the O-information to
the O-information rate (OIR) for studying dynamical interactions in the frequency domain,
which is further applied to post-stroke brain networks [21]. The authors of [22] investigate
O-information gradients in an Ising spin model and macro-economic data, while [12]
introduce the local O-information for granular synergy-redundancy analysis.
However, a caveat that has gone unnoticed is the statistical bias that occurs when
estimating the O-information, which is introduced through the estimation of its Shannon
entropy components. As shown in this paper, these biases are exacerbated by the
estimation of the joint entropy, and are especially strong when the sample size is small
relative to the number of bins that the variables are discretized into. This can make the
interpretation of the O-information misleading, as the biases cause higher-order relation-
ships to appear more redundancy- or synergy-dominated than they are in reality. Although
one obvious way to avoid this particular type of bias is to directly compute O-information
from continuous data, for which methods indeed exist [23], this is not always possible, for
example when some or all of the variables in the data have already been discretized
or are discrete by nature. No mixed method for estimating O-information exists, so if
some of the data are discretized, it seems unavoidable to discretize all variables, since the
continuous method will have entirely different biases as well as a potentially different order
of magnitude.
If O-information is to become a widely used measure to identify synergistic higher-
order relationships, it is crucial to be aware of the bias in its naive estimation. With this
paper, we hope to provide an intuition for the direction and magnitude of the bias in the
naive O-information estimation, and how this bias behaves across different sample sizes
N, number of discretization bins K, and various joint distributions for systems of n = 3
variables. Moreover, an O-information bias approximation based on the Miller–Maddow
entropy estimation is derived. We explore the cases in which this bias approximation does
(and does not) improve the O-information estimation for n = 3, and hope to inspire future work
into the bias correction of the O-information. For simplicity, our analysis focuses on these
variable “triplets”, but all analyses can be extended to systems of more than n = 3 variables.
2. Theoretical Background
2.1. The O-Information
The O-information, denoted by Ω(Xn), quantifies the balance between redundancy
and synergy within a system Xn = (X1, ...,Xn) , and is calculated through a linear combina-
tion of Shannon’s entropy measures [9]:
$$\Omega(X^n) = (n-2)\,H(X^n) + \sum_{j=1}^{n} \left[ H(X_j) - H(X^n_{-j}) \right], \tag{1}$$
where $H(X_j)$ is the entropy of variable $X_j$, $H(X^n)$ is the joint entropy of the system $X^n$, and
$H(X^n_{-j})$ is the joint entropy of the system $X^n_{-j} = (X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_n)$ without variable
Xj. If the O-information Ω(Xn) > 0, this implies the system is redundancy dominated;
meanwhile, if Ω(Xn) < 0, this implies synergy domination. If the O-information is zero,
the variables may be fully independent, or the synergy and redundancy in the system
cancel each other out perfectly [9].
2.2. Properties of the O-Information
The O-information is a symmetric measure [9], and does not require the specification
of a target variable. It is used on cross-sectional data, but extensions of the O-information
to time-series data have been explored in the previous literature [22].
Tight bounds on the O-information are provided in [9]. Let Kj be the cardinality of
the discrete random variable Xj. Then, let Kmax = max{K1, ..., Kn} be the cardinality of the
variable with the largest cardinality in system Xn. The upper and lower bounds on the
O-information are then:
$$(2-n)\log K_{\max} \le \Omega(X^n) \le (n-2)\log K_{\max}, \tag{2}$$
where the logarithm is always taken to base 2 unless specified otherwise. The
authors of [9] further prove that when all variables are discretized into the same
number of bins, Kj = K for all j, the maximum O-information Ω(Xn) = (n − 2) log K can be
achieved by a system if and only if the outcome of each random variable can be inferred
from any other variable in the system. Then, the system is maximally redundant. The
minimum O-information Ω(Xn) = (2 − n) log K can be achieved if and only if the first n − 1
variables $X_1, \ldots, X_{n-1}$ are independent and uniformly distributed, and
$X_n = \left(\sum_{j=1}^{n-1} X_j\right) \bmod K$ [9]. Then, the system is maximally synergistic.
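As a worked check of these bounds (an illustration added here, using the XOR gate from the Introduction with n = 3 and K = 2, and all entropies in bits): each variable is uniform, so $H(X_j) = 1$; every pair is independent and uniform, so $H(X^3_{-j}) = 2$; and the third variable is determined by the other two, so $H(X^3) = 2$. Substituting into (1) gives
$$\Omega_{\mathrm{XOR}} = (3-2)\cdot 2 + \sum_{j=1}^{3}\left[1 - 2\right] = 2 - 3 = -1 = (2-n)\log K,$$
which is the lower bound in (2). For three binary copies $X_1 = X_2 = X_3$, every entropy term equals 1 bit, so
$$\Omega_{\mathrm{copy}} = (3-2)\cdot 1 + \sum_{j=1}^{3}\left[1 - 1\right] = 1 = (n-2)\log K,$$
which is the upper bound.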
2.3. O-Information Estimation
When the underlying marginal and joint probability distributions are not known, like
in most empirical datasets, the O-information must be estimated from the available data.
Let the estimation of O-information and its entropy terms be denoted as:
$$\hat{\Omega}(X^n) := (n-2)\,\hat{H}(X^n) + \sum_{j=1}^{n} \left[ \hat{H}(X_j) - \hat{H}(X^n_{-j}) \right]. \tag{3}$$
It is well known that a limited sample size results in a downward bias in the estimation
of the entropy terms $H(X_j)$, $H(X^n)$, and $H(X^n_{-j})$. This can be proven using Jensen's
inequality, but can also be intuitively explained: with a finite sample size we observe rare
events less often, or maybe not at all. With the naive entropy estimation, we then assign these
events a smaller probability than they actually have. Therefore, the probability distribution
seems less spread out, and by definition this results in a smaller entropy estimation. It is
then clear to see that these biases may subsequently introduce a bias into the estimation of
O-information (3), such that
$$\mathbb{E}[\hat{\Omega}(X^n)] = \Omega(X^n) - \delta, \tag{4}$$
where δ is the bias in O-information estimation. The balance of the biases introduced by each
of the entropy terms in (3) determines whether O-information is under- or overestimated.
The marginal and joint distributions of the variables in the system will ultimately determine
the sign of the O-information bias. If the bias is not accounted for, a sample size that is too
small may cause a system of variables Xn to appear more redundancy- or
synergy-dominated than it is in reality.
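The naive (plug-in) estimation in (3) can be sketched as follows. This is a minimal illustration, assuming the data are already discretized and stored as a list of equal-length tuples of bin labels; the function names `plugin_entropy` and `naive_o_information` are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import log2


def plugin_entropy(outcomes):
    """Naive (plug-in) Shannon entropy in bits of a sequence of hashable outcomes."""
    n = len(outcomes)
    counts = Counter(outcomes)
    # Empirical frequencies c / n replace the unknown true probabilities.
    return -sum((c / n) * log2(c / n) for c in counts.values())


def naive_o_information(samples):
    """Plug-in estimate of Eq. (3); `samples` is a list of tuples of bin labels."""
    n_vars = len(samples[0])
    total = (n_vars - 2) * plugin_entropy(samples)
    for j in range(n_vars):
        marginal = [row[j] for row in samples]           # X_j
        rest = [row[:j] + row[j + 1:] for row in samples]  # system without X_j
        total += plugin_entropy(marginal) - plugin_entropy(rest)
    return total
```

Applied to the four equiprobable rows of a binary XOR gate, this returns −1 bit, and applied to two equiprobable rows of three binary copies it returns +1 bit, matching the bounds in (2) for n = 3 and K = 2. With finite random samples, the same function exhibits the bias discussed above.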
3. Previous Literature
3.1. Entropy Bias Approximation Techniques
The bias in the naive entropy estimation has been explored extensively in the previous
literature. The most prominent methods include the Miller–Maddow bias approxima-
tion [24], the Jackknife entropy estimation [25], the Grassberger estimation [26], and several
Bayesian methods initiated by Wolpert and Wolf [27]. In this paper, we extend the Miller–
Maddow method from the entropy to O-information due to its low computational cost
in comparison to the Jackknife method, its simplicity in comparison to the Grassberger
method, and its accuracy in comparison to the Bayesian methods.
The Miller–Maddow Entropy Bias Approximation
The Miller–Maddow method [24] estimates the expected value of the naive entropy
estimations $\mathbb{E}[\hat{H}(X)]$ by making a Taylor approximation of the function $-\hat{p}_i \log(\hat{p}_i)$ around
each bin's respective probability $p_i$. This is repeated for all possible outcomes of $\hat{p}_i$, and the
expected value over all Taylor approximations per bin i is taken. See Appendix A for the
full derivation.
A requirement of the Miller–Maddow method is that N is in the asymptotic sampling
regime, which means that the sample size is large enough such that each outcome of the
random variable is observed many times [28].
When taking the Taylor approximation to the second order, the resulting Miller–
Maddow entropy estimator is
$$\mathbb{E}[\hat{H}(X)]_{\mathrm{MM}} = H(X) - \frac{K-1}{2N\ln(2)}. \tag{5}$$
Let the Miller–Maddow bias approximation term of entropy be denoted by δH ,
such that
$$\delta_H := \frac{K-1}{2N\ln(2)}. \tag{6}$$
This entropy bias approximation (6) only depends on the number of bins, K, and the
number of observations, N. We see that as N → ∞, the bias approximation δH → 0. This
makes sense intuitively, as the naive plug-in estimator $\hat{p}_i \to p_i$ as N → ∞. Moreover, as
K → ∞, the bias approximation δH → ∞, as increasing the number of bins decreases the
mean number of possible observations per bin.
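The bias term (6) is simple enough to evaluate directly. The following is a minimal sketch; the function names are illustrative, and the correction step assumes, as in (5), that the naive estimate is on average too low by $\delta_H$.

```python
from math import log


def miller_maddow_entropy_bias(n_bins, n_samples):
    """Bias term delta_H of Eq. (6), in bits."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return (n_bins - 1) / (2 * n_samples * log(2))


def corrected_entropy(naive_entropy_bits, n_bins, n_samples):
    """Add delta_H back to a naive estimate, since E[H_hat] is approximately H - delta_H."""
    return naive_entropy_bits + miller_maddow_entropy_bias(n_bins, n_samples)
```

The term shrinks as the sample size grows and grows linearly with the number of bins, in line with the limits described above.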
3.2. Mutual Information Bias Approximation Techniques
Beyond the bias approximations for entropy estimations, several bias approximation
methods have been explored for the estimation of other information theoretic measures as
well, such as the mutual information.
In this paper, however, we derive the O-information bias approximation from the
Miller–Maddow entropy bias approximation. As the mutual information is a composite
measure of various entropy terms, each with their own biases, it would be more difficult
to attribute the behavior of the O-information estimation bias to its individual mutual
information components. The entropy formulation is preferred over a mutual informa-
tion formulation because it is clearer to separate the bias of each entropy term in the
O-information estimation (3). Especially as we investigate the O-information estimation
bias for various joint distributions which each have distinct behaviors of $H(X^n)$ and $H(X^n_{-j})$,
it is important to keep the sources of the bias in each of these terms as clear as possible.
Moreover, it is more convenient to generalize the bias of the joint entropy of a system with
n = 3 variables to systems of any other number of variables. This makes the entropy for-
mulation of the O-information estimation more compatible when considering the extension
of this analysis to various system sizes in future work.
3.3. Permutation Tests
Further, we do not use permutation tests to quantify the bias of O-information. For
a single triplet, the permutation test involves the permutation of data for each variable,
forcing independence between the variables, and then calculating the new O-information
from the resulting joint probability distributions. This process is repeated many times per
triplet, and finally the original O-information estimation is compared to the distribution
of the O-information estimations of the permuted data. As the permutation destroys
any relationships between the three variables, the mean O-information after permutation
should, in theory, be zero. If the mean is not zero, the deviation from zero represents the
bias in the O-information estimation that persists. However, since this process is repeated
for each triplet individually, estimating the bias using permutation tests is computationally
very expensive. This is particularly true when trying to identify synergistic triplets in
a larger dataset. Therefore, we develop a bias approximation in this paper that can be
reasonably applied to the entire dataset, making the process much more convenient for
exploratory analyses.
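For reference, the permutation procedure described above (which is not the approach adopted in this paper) can be sketched as follows. This is an illustration under stated assumptions: the data are a list of tuples of bin labels, the plug-in estimator is used, and the names `permutation_null`, `n_perm`, and `seed` are hypothetical.

```python
import random
from collections import Counter
from math import log2


def entropy(outcomes):
    """Naive plug-in entropy in bits."""
    n = len(outcomes)
    return -sum(c / n * log2(c / n) for c in Counter(outcomes).values())


def o_information(samples):
    """Naive plug-in O-information of a list of tuples of bin labels."""
    n_vars = len(samples[0])
    total = (n_vars - 2) * entropy(samples)
    for j in range(n_vars):
        total += entropy([r[j] for r in samples])
        total -= entropy([r[:j] + r[j + 1:] for r in samples])
    return total


def permutation_null(samples, n_perm=100, seed=0):
    """O-information estimates after independently shuffling every variable."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*samples)]
    null = []
    for _ in range(n_perm):
        # Shuffling each column separately keeps the marginals but breaks dependence.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        null.append(o_information(list(zip(*shuffled))))
    return null
```

The mean of the returned list estimates the residual bias for that triplet, and the cost of `n_perm` full re-estimations per triplet is what makes this approach expensive when screening many triplets.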
4. Methods
4.1. Derivation of the Miller–Maddow O-Information Bias Approximation
The Miller–Maddow entropy estimation (5) applies to the entropy of a single random
variable. In order to approximate the bias in the O-information estimation, we must extend
the entropy estimations to the joint entropy terms $\hat{H}(X^n)$ and $\hat{H}(X^n_{-j})$.
The Taylor expansion process is the same for the joint entropy estimation $\hat{H}(X^n)$ as for
the entropy estimation $\hat{H}(X)$, except that we now consider the probabilities of possible
bin combinations, rather than the probabilities of individual variables' bins. The expected value
of the estimated joint entropy therefore becomes
$$\mathbb{E}[\hat{H}(X^n)] \approx H(X^n) - \left( \frac{K^{(n)} - 1}{2N\ln(2)} \right), \tag{7}$$
where K(n) is the number of possible bin combinations the n variables in the system can
have. We see that this yields the following joint entropy bias approximation term:
$$\delta_{\mathrm{Joint}} := \frac{K^{(n)} - 1}{2N\ln(2)}. \tag{8}$$
Similarly, we can derive that the estimation bias of the joint entropy $\hat{H}(X^n_{-j})$ of the system
without variable $X_j$ can be approximated by:
$$\delta^{-j}_{\mathrm{Joint}} := \frac{K^{(-j)} - 1}{2N\ln(2)}, \tag{9}$$
where K(−j) is the number of possible bin combinations of all variables in the system except
for variable Xj.
As the estimation Ωˆ(Xn) is a random variable, we can take the expected value of (3):
$$\mathbb{E}[\hat{\Omega}(X^n)] = (n-2)\,\mathbb{E}[\hat{H}(X^n)] + \sum_{j=1}^{n} \left[ \mathbb{E}[\hat{H}(X_j)] - \mathbb{E}[\hat{H}(X^n_{-j})] \right]. \tag{10}$$
Substituting the bias approximations (6), (8), and (9) and simplifying we obtain the
final O-information estimation derived from the Miller–Maddow method:
$$\mathbb{E}[\hat{\Omega}(X^n)] \approx \Omega(X^n) - \frac{1}{2N\ln(2)} \left[ \sum_{j=1}^{n} \left( K_j - K^{(-j)} \right) + (n-2)\left(K^{(n)} - 1\right) \right]. \tag{11}$$
Thus, we find that by extending the Miller–Maddow entropy estimation to the joint
and conditional entropy, the O-information bias can be approximated by the bias term:
$$\delta_{\mathrm{MM}} := \frac{1}{2N\ln(2)} \left[ \sum_{j=1}^{n} \left( K_j - K^{(-j)} \right) + (n-2)K^{(n)} - n + 2 \right]. \tag{12}$$
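The bias term (12) can be evaluated with a few lines of code. The sketch below is illustrative only: the function name and argument names are hypothetical, and the caller is assumed to supply the bin counts $K_j$, $K^{(-j)}$, and $K^{(n)}$.

```python
from math import log


def o_information_mm_bias(k_marginals, k_leave_one_out, k_joint, n_samples):
    """Bias term delta_MM of Eq. (12), in bits.

    k_marginals[j]     -- number of bins K_j of variable j
    k_leave_one_out[j] -- number of joint bin combinations without variable j
    k_joint            -- number of joint bin combinations of all variables
    """
    n = len(k_marginals)
    bracket = sum(kj - k_minus_j
                  for kj, k_minus_j in zip(k_marginals, k_leave_one_out))
    bracket += (n - 2) * (k_joint - 1)  # equals (n - 2) K^(n) - n + 2
    return bracket / (2 * n_samples * log(2))
```

Because (11) states that the naive estimate is on average lower than the true value by $\delta_{\mathrm{MM}}$, a bias-corrected estimate under this approximation is obtained by adding the returned value to the naive estimate. A positive value therefore indicates a naive estimate biased towards synergy, and a negative value one biased towards redundancy.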
4.2. Simulating the Bias in the Naive O-Information Estimation
The effect of sample size, N, and number of bins, K, on the bias of the naive O-
information estimation can be investigated in a controlled environment through simulations.
In these simulations, we construct variable triplets with joint distributions for which we
know the true Ω(Xn), and compare this to our estimation Ωˆ(Xn) that is based on solely the
available, simulated samples.
4.2.1. The Discretization Strategy
As we explore the discrete case in this paper, the sampled variables must be discretized
before estimating their O-information. In empirical data, the true underlying bin widths
are not known, and the best we can do is to construct the bins based on the available
sample we observe. In our case, we use the quantile binning strategy, where the bin widths
are determined such that the number of observations in each bin is the same. For instance,
with K = 10, the lowest 10% of observations are allocated to bin 1 and the highest 10% of
observations to bin 10. As a result, the marginal distribution of each variable is
approximately uniform.
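A rank-based version of quantile binning can be sketched as follows. This is one possible implementation, not necessarily the one used for the results in this paper: ties are broken by original order (an assumption), and bins are labeled 0 to K − 1 rather than 1 to K.

```python
def quantile_bins(values, k):
    """Assign each value to one of k equal-count bins, labeled 0..k-1."""
    n = len(values)
    # Indices of the observations sorted from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        # The lowest n/k ranks go to bin 0, the next n/k to bin 1, and so on.
        labels[idx] = rank * k // n
    return labels
```

When the number of observations is a multiple of k, every bin receives exactly the same number of observations, so the discretized marginal is exactly uniform.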
Of course, in our simulations, we do know the true underlying marginal and joint
probability distributions of the variables. However, as we want our simulations to reflect
the scenario of working with real empirical data, we construct the bins for each simulated
variable according to the observed outcomes we have sampled, rather than the true
underlying distribution. While this likely reduces the accuracy of the bias correction in these
simulations, it allows us to obtain an idea of the performance of the Miller–Maddow bias
correction on real, empirical data.
4.2.2. The Simulated Distributions
We construct three different systems: The first system achieves the theoretical max-
imum O-information (a fully redundant system), the second achieves a theoretical O-
information of zero (a system of fully independent variables), and the third achieves the
theoretical minimum O-information (a fully synergistic system).
These three distributions are constructed as follows:
• Redundant triplet: the variables are copies of one another, X1 = X2 = X3. Each variable
is uniformly distributed and discretized into K bins. This achieves the maximum
possible O-information of a triplet with K bins: Ω = log(K) [9].
• Independent triplet: each variable is independent and uniformly distributed, dis-
cretized into K bins. Due to the variables’ independence, the true O-information
Ω = 0 regardless of N and K.
• Synergistic triplet: X1 and X2 are independent and uniformly distributed, discretized
into K bins, and X3 = (X1 + X2) mod K. This achieves the minimum possible O-information,
Ω = − log(K) [9].
We set the number of bins to be equal across the three variables in the system, such
that K1 = K2 = K3 = K. This allows us to apply the same bias approximation method
across all triplets in the dataset.
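The three systems above can be generated with a short sampler. The sketch below is a simplification: it draws bin labels 0 to K − 1 directly from a discrete uniform distribution instead of sampling continuous values and discretizing them, and the function name and the `kind` strings are hypothetical.

```python
import random


def simulate_triplet(kind, n_samples, k, seed=None):
    """Draw n_samples rows (x1, x2, x3) of bin labels in 0..k-1."""
    if kind not in ("redundant", "independent", "synergistic"):
        raise ValueError(f"unknown kind: {kind!r}")
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        x1 = rng.randrange(k)
        if kind == "redundant":
            rows.append((x1, x1, x1))            # X1 = X2 = X3
        elif kind == "independent":
            rows.append((x1, rng.randrange(k), rng.randrange(k)))
        else:
            x2 = rng.randrange(k)
            rows.append((x1, x2, (x1 + x2) % k))  # X3 = (X1 + X2) mod K
    return rows
```

The returned rows are tuples, so they can be passed directly to a plug-in entropy estimator that counts joint outcomes.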
It is important to realize that these three systems require a different minimum sample
size in order to theoretically be able to estimate their joint probability distributions. This
is simple to see for a system of independent variables: the number of samples needed
increases exponentially with the number of variables in the system n. In contrast, for a
system of fully redundant variables, we only need a minimum sample size of K to be able
to estimate the joint probability distribution, regardless of the number of variables in the
system. In a fully synergistic system, all variables are independent from each other, except
for the last, which is fully determined by all others. Thus, the theoretical minimum sample
size needed to estimate the joint distribution of all variables except for the last increases
exponentially (just like in the fully independent system). To estimate the joint distribution
of the entire system, however, only $K^{n-1}$ samples are needed.
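The same counts of possible joint bin combinations can be substituted into the bias term (12). As a worked specialization for n = 3 and $K_1 = K_2 = K_3 = K$, assuming that $K^{(-j)}$ and $K^{(n)}$ equal the number of possible (rather than observed) bin combinations of each system:
$$\text{Redundant: } K^{(-j)} = K^{(3)} = K \;\Rightarrow\; \delta_{\mathrm{MM}} = \frac{3(K - K) + (K - 1)}{2N\ln(2)} = \frac{K-1}{2N\ln(2)},$$
$$\text{Independent: } K^{(-j)} = K^2,\; K^{(3)} = K^3 \;\Rightarrow\; \delta_{\mathrm{MM}} = \frac{3(K - K^2) + K^3 - 1}{2N\ln(2)} = \frac{(K-1)^3}{2N\ln(2)},$$
$$\text{Synergistic: } K^{(-j)} = K^{(3)} = K^2 \;\Rightarrow\; \delta_{\mathrm{MM}} = \frac{3(K - K^2) + K^2 - 1}{2N\ln(2)} = -\frac{(K-1)(2K-1)}{2N\ln(2)}.$$
Under this approximation, the naive estimate is therefore expected to be too low (biased towards synergy) for the redundant and independent triplets, with a bias that grows cubically in K for the independent triplet, and too high (biased towards redundancy) for the synergistic triplet.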
4.2.3. Simulation Setup
The tested number of bins ranges from K = 2 to K = 50, increasing in steps of 1.
The tested sample sizes range from N = 500 to N = 20,000, increasing in steps of 500.
Because the data generation is stochastic, 30 trials are carried out, leading to 30 naive
O-information estimations per (N, K) combination per system:
$\hat{\Omega}_1, \ldots, \hat{\Omega}_{30}$.
Let the mean naive O-information estimation over the 30 trials be denoted as Ωˆ, and
the resulting mean bias be denoted as
δ := Ω− Ωˆ. (13)
As the data are generated artificially, we know the underlying number of possible bin
combinations K(−j) and K(n). Note that K is known by definition of the quantile binning
strategy. These values can then be substituted into the equation for the O-information bias
correction (12). Again, we obtain 30 bias-corrected O-information estimations
$\hat{\Omega}^{\mathrm{BC}}_1, \ldots, \hat{\Omega}^{\mathrm{BC}}_{30}$,
and we let the mean observed bias be denoted as
δBC := Ω− ΩˆBC. (14)
However, in real empirical data, the true K(−j) and K(n) values are usually not known,
and have to be estimated from the available data. We estimate them here by simply counting
the number of joint bin combinations that occur in the data. Let the mean observed bias
using the estimated K(−j) and K(n) values be denoted as
δBC′ := Ω− ΩˆBC′ (15)
The mean error of the naive O-information estimation can then be compared to the
mean error of the bias-corrected O-information,
ε′ := |δBC′ | − |δ|. (16)
We focus on the bias approximation using δBC′ rather than δBC, as we usually do not
know the true K(−j) and K(n) values and must estimate them from the available data.
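Counting the joint bin combinations that occur in the data can be sketched as follows. The function name is hypothetical, and the data are assumed to be a list of tuples of bin labels.

```python
def count_joint_bins(samples):
    """Count observed joint bin combinations in a list of tuples of bin labels.

    Returns (k_leave_one_out, k_joint), where k_leave_one_out[j] is the number
    of distinct combinations observed when variable j is left out, and k_joint
    is the number of distinct combinations of all variables.
    """
    n_vars = len(samples[0])
    k_leave_one_out = [
        len({row[:j] + row[j + 1:] for row in samples})  # distinct rows without X_j
        for j in range(n_vars)
    ]
    k_joint = len(set(samples))
    return k_leave_one_out, k_joint
```

These observed counts can never exceed the true number of possible combinations, and they fall short of it whenever some combinations with non-zero probability have not been sampled.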
4.3. Application to the Young Finns Study
Next, we aim to investigate how our knowledge of the bias in the O-information
estimation affects the conclusions we draw about the higher-order interactions of variable
triplets in an empirical dataset. This is explored on the empirical dataset of the Young Finns
Study (YFS), one of the largest and longest-running observational cohort studies globally,
which has followed a cohort of individuals from childhood to adulthood [29]. Specifically,
we are interested in whether we can discover synergistic higher-order relationships between
cardiovascular disease (CVD) and depression variables. The subset of data we use is the
2007 wave of the YFS, and contains lipidomic data, metabolomic data, depression scores,
and CVD-related phenotypes. For more information on the YFS dataset, see Appendix B.
Only the participants who appear in all four datasets are kept in our final dataset, ensuring
that lipidome-, NMR-, depression-, and CVD-related phenotype data are available for all
participants. This leaves a total of N = 1684 participants. Any missing values are imputed
with the median of the feature.
The dataset contains a mix of continuous and discrete variables. Variables are dis-
cretized using the quantile K-bins method, where all variables are discretized into
K = 10 bins with an equal number of samples in each bin. Just like in our simulations,
this results in the marginal probability distributions of the variables being approximately
uniform after discretization, and they are perfectly uniform if N is a multiple of K. As the
number of bins, K, is one of the main components in the O-information estimation and
the bias approximation, δMM, it is crucial that all variables are discretized into the same
number of bins, K. The quantile binning strategy ensures that this is the case, given that
N ≥ K. An equal-width binning does not guarantee this, as empty bins can remain, even if
N ≥ K.
After a feature selection described in Appendix C, we compute the naive and bias-
corrected O-information estimations for all possible variable triplets. The resulting
distribution of O-information estimations from the empirical data is then compared to the
O-information estimations of our simulations with the same N and K values.
5. Experiments and Results
5.1. Bias of the Naive O-Information Estimation in Simulations
First, we explore the effect of varying the sample size, N, and the number of bins, K,
on the mean naive O-information estimation Ωˆ and its bias δ (13). Figure 1 shows Ωˆ for all
simulated distributions, and Figure 2 shows δ for all simulated distributions. Note that the
bias of the redundant triplet is shown for a range of much smaller sample sizes, with N
only going up to N = 50. As the joint entropy of the fully redundant triplet is equal to the
marginal entropies, the O-information bias is much smaller; thus, this is the scale needed to
observe the bias for up to K = 50 bins.
Figure 2 can be read as follows: if δ > 0 (red), then Ωˆ is more synergistic than the
true Ω. If δ < 0 (blue), then Ωˆ is more redundant than Ω. In this figure, we also plot the
N = K line for the redundant triplet, the N = K² and N = K³ lines for the independent
triplet, and the N = K² line for the synergistic triplet. As explained in Section 4.2.2, these
boundary lines are the theoretical minimum sample sizes that are needed to observe all
possible joint bin combinations.
[Figure 1: three heatmap panels (Fully Redundant, Independent, Fully Synergistic); horizontal axis: number of bins K; vertical axis: sample size N.]
Figure 1. The mean naive O-information estimation Ωˆ per (N,K) combination over the 30 trials. This
analysis is performed for the fully redundant triplet (left panel), the triplet of independent variables
(middle panel), and the fully synergistic triplet (right panel).
[Figure 2: three heatmap panels (Fully Redundant, Independent, Fully Synergistic); horizontal axis: number of bins K; vertical axis: sample size N; boundary lines N = K, N = K², and N = K³.]
Figure 2. The bias δ of the naive O-information estimation per (N, K) combination. Boundary lines
indicate the theoretical minimum sample size, N, needed to observe the number of bins, K, as well
as the joint bin combinations K(−j) and K(n).
The bias of the naive O-information estimation, δ, is particularly strong for the triplet of
independent random variables, being strongly biased towards synergy as seen in Figure 2.
The most severe bias occurs approximately in the region where K² ≲ N ≲ K³. This is
demonstrated by the red shaded area between the dashed lines N = K² and
N = K³. Interestingly, despite the lower N/K ratio, a decreased (but still positive) bias
is observed in the region where N ≲ K², demonstrated by the lighter shades below the
N = K² line.
For the redundant triplet, we observe that Ωˆ is biased strongly towards synergy, with the bias
most severe when N/K is smallest. Around N = K, however, the bias fades to zero. For
the synergistic triplet, we observe that the N = K² line approximately traces the boundary
above which Ωˆ is biased. The bias δ has around the same magnitude as in the redundant
and independent triplets, but the opposite direction: while the redundant and independent
triplets are biased towards synergy, the synergistic triplet is biased towards redundancy.
Looking at the standard deviations of the naive O-information estimations in Figure 3,
we observe that they are quite low for all three simulated systems, ranging up to a standard
deviation of 0.08. The lowest standard deviations are seen for the redundant triplet, where they are
approximately zero for all (N,K) combinations. The standard deviations of the synergistic
and independent triplets are comparatively higher, especially for small N.
[Figure 3: three heatmap panels titled Fully Redundant, Independent, and Fully Synergistic; x-axis: number of bins K; y-axis: sample size N; color scale: standard deviation, from 0.00 to 0.08.]
Figure 3. The standard deviation of the naive O-information estimations Ωˆ1, ..., Ωˆ30 per (N,K)
combination over the 30 trials, for each of the three simulated systems.
5.2. The Miller–Madow O-Information Bias Approximation
Next, we explore the accuracy of the O-information bias approximation δMM derived
in (12) on our simulations. Recall that K(−j) and K(n) are estimated from the data by simply
counting the number of unique joint bin occurrences. We are mainly interested in the
decrease in the estimation error after the bias correction, which is given by ε′ in (16).
Figure 4 shows ε′ for all simulated (N,K) combinations. A negative value (blue)
implies that the bias-corrected O-information estimation ΩˆBC′ is more accurate than the
naive O-information estimation Ωˆ. A positive value (red) implies that the naive estimation
is more accurate. This analysis highlights the situations in which the bias approximation
δMM (12) works well, and in which situations it fails.
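The estimation and correction steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: Equations (3) and (12) are not reproduced in this section, so the sketch assumes the standard O-information definition and a Miller–Madow-style term of (K − 1)/(2N ln 2) per entropy, with K counted as the number of distinct observed (joint) bins. This assumed form is consistent with the reduced expression for the synergistic triplet given in Section 6.3.3. The function names `quantile_bin`, `entropy_and_support`, and `o_information` are hypothetical.

```python
import math
import random
from collections import Counter


def quantile_bin(values, k):
    """Rank-based quantile binning: bins are built from the sample itself."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n  # bin index in 0..k-1
    return bins


def entropy_and_support(symbols):
    """Plug-in entropy in bits and the number of distinct observed symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    h = -sum(c / n * math.log2(c / n) for c in counts.values())
    return h, len(counts)


def o_information(columns):
    """Return (naive estimate, assumed Miller-Madow-style correction term)."""
    n_vars, n_samples = len(columns), len(columns[0])
    h_joint, k_joint = entropy_and_support(list(zip(*columns)))
    omega = (n_vars - 2) * h_joint
    support_sum = (n_vars - 2) * (k_joint - 1)
    for j in range(n_vars):
        h_j, k_j = entropy_and_support(columns[j])
        rest = [columns[i] for i in range(n_vars) if i != j]
        h_rest, k_rest = entropy_and_support(list(zip(*rest)))
        omega += h_j - h_rest
        support_sum += (k_j - 1) - (k_rest - 1)
    # Assumption: one (K - 1) / (2 N ln 2) term per entropy, signed as in omega.
    delta_mm = support_sum / (2 * n_samples * math.log(2))
    return omega, delta_mm


rng = random.Random(0)
N, K = 1684, 10  # the (N, K) pair used for the YFS comparison in Section 5.3
cols = [quantile_bin([rng.random() for _ in range(N)], K) for _ in range(3)]
omega_naive, delta_mm = o_information(cols)
omega_bc = omega_naive + delta_mm
print(f"naive: {omega_naive:.3f}, bias-corrected: {omega_bc:.3f}")
```

For three independent uniform variables, the naive value should come out negative and the corrected value closer to zero, which is the qualitative behavior reported for the independent triplet.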
[Figure 4: three heatmap panels titled Fully Redundant, Independent, and Fully Synergistic; x-axis: number of bins K; y-axis: sample size N; color scale: ε′.]
Figure 4. The difference in estimation error between the naive and bias-corrected O-information
estimation (ε′). A negative value (blue) implies that ΩˆBC′ is more accurate than the naive estimation
Ωˆ. A positive value (red) implies that the naive estimation is more accurate.
To better visualize the behavior of Ωˆ, ΩˆBC′ , and the true O-information Ω, Figure 5
shows these values for the K = 10 slice and Figure 6 for the K = 30 slice of the heatmap
in Figure 4. These two K values are chosen because they exhibit quite different behaviors,
which provides an interesting contrast.
[Figure 5: three line-plot panels titled Fully Redundant, Independent, and Fully Synergistic; x-axis: sample size N; y-axis: O-information.]
Figure 5. Behavior of the true O-information Ω, the mean naive O-information estimation Ωˆ, and the
bias-corrected O-information estimation ΩˆBC′ for K = 10 bins.
[Figure 6: three line-plot panels titled Fully Redundant, Independent, and Fully Synergistic; x-axis: sample size N; y-axis: O-information.]
Figure 6. Behavior of the true O-information Ω, the mean naive O-information estimation Ωˆ, and the
mean bias-corrected O-information estimation ΩˆBC′ for K = 30 bins.
For the independent triplet, four separate sections can be identified in Figure 4. There
is close to no bias in the naive estimation when N ≳ K³, and the bias correction is also
approximately zero in this case. In Figure 5, this is equivalent to the section of large N
where both Ωˆ and ΩˆBC′ have almost converged to the true O-information Ω = 0. When
K² ≲ N ≲ K³, the bias correction indeed reduces the bias, as indicated by the negative ε′.
In Figure 6, this is the section in which both Ωˆ and ΩˆBC′ approach Ω, but ΩˆBC′ approaches
it faster. Then, as N decreases further, there is a small section where the naive estimation
and the bias-corrected estimation have approximately equal errors. In Figure 6, this is
equivalent to the part where the two O-information values intersect. This only happens
when the number of bins K ≳ 15, so we do not observe this intersection in Figure 5. For
small N and large K, ε′ > 0, implying that the naive estimation is more accurate than the
bias-corrected estimation. This is also reflected in Figure 6 for small sample sizes, where
the orange line, ΩˆBC′, briefly dips below the blue line, Ωˆ.
For the redundant triplet, we also observe an interesting pattern. The bias correction
seems to improve the accuracy of the O-information estimation for most N ≲ K. In
Figures 5 and 6, this is reflected in the part where both Ωˆ and ΩˆBC′ approach Ω when
N < K, but ΩˆBC′ approaches it faster. On and around the N = K line, the naive estimation
becomes more accurate than the bias-corrected estimation. In Figures 5 and 6, this is
reflected by Ωˆ intersecting the true Ω, while ΩˆBC′ overshoots the true Ω value at
N = K. Gradually, however, ΩˆBC′ approaches the true value again as N increases further.
There is also an intriguing pattern of white dots along the N = K line in the heatmap in
Figure 4. For larger sample sizes, the error fades to approximately zero.
For the synergistic triplet, the pattern is much more straightforward. The bias correction
improves the accuracy of the O-information estimation wherever there is a bias in the
naive estimation to begin with, which is where N ≲ K². If N ≳ K², the bias approximation
is also around zero.
5.3. Application to the Young Finns Study
To apply this exploration to an empirical dataset, we estimate the O-information of all possible
triplets in the feature selected YFS dataset. Figure 7 compares results from
the empirical YFS data with the results from our simulations. For the empirical data, it
shows the distribution of the naive O-information estimations and the estimations with
the bias correction (12) applied. For the simulations, dashed vertical lines at Ωˆ = −0.377
and ΩˆBC′ = −0.142 show the mean naive O-information estimation and the mean
bias-corrected O-information estimation of the simulated independent triplets with N = 1684
and K = 10. The dark shaded distributions around these dashed lines additionally show
the distribution of these O-information estimations over all 30 trials.
Figure 7. Distribution of the naive estimated O-information Ωˆ and the bias-corrected estimated
O-information ΩˆBC′ for all triplets in the feature selected dataset. The dashed lines of Ωˆ = −0.377
and ΩˆBC′ = −0.142, respectively, indicate the mean naive O-information estimation and the mean
bias-corrected O-information estimation of the simulated independent triplets. The darker shaded
distributions around the dashed lines are the distributions of the simulated O-information estimations.
Taking a closer look at the 20 most synergistic triplets in the YFS dataset, we note two
important results. Firstly, 90% of these highly synergistic triplets contain ratio variables.
For example, a ratio triplet might look like “MHDLC, MHDLTG, MHDLTGPCT”, where
MHDLC is the total cholesterol in medium HDL, MHDLTG is the triglycerides in medium
HDL, and MHDLTGPCT is the ratio of triglycerides to total lipids in medium
HDL. Note that lipids include both cholesterol and triglycerides (as well as other types
of lipids) [30]; thus, the MHDLTGPCT variable contains information about the ratio of
cholesterol to triglycerides. This provides a proof of concept of the O-information in a
technical sense, as these ratios embody the core concept of synergy: after all, a ratio can only
be computed if the two input variables are known. A table of the 20 most synergistic triplets,
with an indication of which ones contain such a ratio variable, can be found in Appendix D.
However, in a medical context, the synergy of ratio triplets is not relevant. Therefore,
we exclude such triplets from our further analysis. We additionally remove variables that
are sums or means of other variables. Triplets containing these composite variables are
likely to appear highly synergistic due to the inherent mathematical relationships among
their variables. Consequently, these triplets are not medically relevant as they do not represent
actual synergistic biological pathways.
On this remaining dataset, we now explore whether the bias correction gives any
further insight into the synergistic triplets, specifically, whether correcting for the bias
changes the order of synergistic triplets in the dataset or reveals any new synergistic
triplets. The result is that the top five most synergistic triplets remain unchanged before and
after the bias correction. When looking at the overall composition of the top 50 remaining
synergistic triplets, we find that 36 of the triplets are present both before and after the
bias correction (see Appendix E). Therefore, the bias correction introduces 14 new
synergistic triplets which can be analyzed. None of the most synergistic triplets contain the
depression score.
The most synergistic remaining triplet both before and after bias correction is [“UnSatDeg”:
Estimated degree of unsaturation, “SLDLTG”: Triglycerides in small LDL; mmol/L,
“FAw3”: Omega-3 fatty acids; mmol/L], with Ωˆ = −0.691 and ΩˆBC′ = −0.592.
6. Discussion
6.1. The Naive O-Information Estimation in Simulations
The mean naive O-information estimation Ωˆ over 30 trials and their standard deviations
are shown in Figures 1 and 3, respectively. The Ωˆ values of the fully redundant
triplet and the fully synergistic triplet behave as expected, as they seem to follow their
theoretical equations Ωmax = log K and Ωmin = −log K. The slight deviation from the
theoretical Ωmin in the synergistic triplet is due to the bias, discussed in depth in the following
sections. The Ωˆ of the independent triplet very clearly shows the presence of a bias, as we
see Ωˆ < 0 for a large section of the heatmap, specifically where N/K is small.
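As a check on these theoretical values, the following worked computation assumes the standard definition of the O-information for n = 3 (Equation (3) is not reproduced in this section), uniform marginals over K bins so that H(X_j) = log K, and, for the synergistic case, that any two variables are independent and jointly determine the third:

```latex
\begin{align*}
\Omega(X^3) &= H(X^3) + \sum_{j=1}^{3}\left[H(X_j) - H(X^3_{-j})\right], \\
\text{fully redundant } (X_1 = X_2 = X_3):\quad
  \Omega &= \log K + 3\,(\log K - \log K) = \log K, \\
\text{independent}:\quad
  \Omega &= 3\log K + 3\,(\log K - 2\log K) = 0, \\
\text{fully synergistic}:\quad
  \Omega &= 2\log K + 3\,(\log K - 2\log K) = -\log K .
\end{align*}
```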
The standard deviations of Ωˆ in Figure 3 also align with our expectations. As
X1 = X2 = X3 in the redundant triplet, the joint distributions are identical to the marginal
distributions. Moreover, the quantile binning strategy ensures that we have a near perfect
uniform marginal distribution in every trial. Together, this causes the standard deviation of
Ωˆ to be approximately zero. The standard deviations for the synergistic and independent
triplets are larger, as there are two independent variables in the synergistic triplet, and three
independent variables in the independent triplet. However, overall the standard deviations
are still quite low.
6.2. Bias of the Naive O-Information Estimation in Simulations
Before discussing the behavior of the bias δ, we point out that the quantile binning
strategy results in the naive entropy estimation Hˆ(X) having zero bias if N mod K = 0
and N ≥ K. Even if N mod K ≠ 0, the bias of Hˆ(X) is still very close to zero. This
happens because the bins for each variable are constructed based on the observed samples
(even though we know the ground truth quantiles of the simulated variables), resulting in
quasi-uniform marginal distributions. We do this to make the simulation comparable to the
procedure of binning empirical data, for which the ground truth quantiles are not known.
If a different binning strategy is used, we can also expect the entropy estimation to become
biased, thus increasing the bias of the naive O-information estimation. However, it is
not trivial to see how the bias in the entropy estimation would affect the bias of the joint
entropy estimation.
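The effect of sample-based quantile binning on the marginal entropy can be illustrated with a short sketch. It assumes rank-based bin assignment (the exact binning routine used in the paper is not shown here), and the helper names `quantile_bin` and `plugin_entropy` are hypothetical. Note that the input need not be uniform: the binning itself produces the (quasi-)uniform bin counts.

```python
import math
import random
from collections import Counter


def quantile_bin(values, k):
    """Assign each sample to one of k bins by its rank within the sample."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def plugin_entropy(symbols):
    """Naive (plug-in) entropy estimate in bits."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


rng = random.Random(1)
K = 8
for N in (64, 67):  # N mod K == 0 versus N mod K != 0
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]
    h_hat = plugin_entropy(quantile_bin(sample, K))
    print(f"N = {N}, N mod K = {N % K}, log2(K) - H_hat = {math.log2(K) - h_hat:.6f}")
```

When N is a multiple of K, every bin receives exactly N/K samples and the estimate equals log2 K; otherwise bin counts differ by at most one and the deficit is small.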
6.2.1. Bias in the Independent Triplet
A strong bias towards synergy is observed when K² ≲ N ≲ K³. A sample size smaller
than K³ implies that not all of the jointly possible bins can be observed. Thus, by definition,
an accurate estimation of the joint probability distribution is impossible. This leads to a
severe underestimation of Hˆ(X^n). From the equation of the O-information estimation (3), it
is clear that an underestimation of Hˆ(X^n) biases the O-information downwards, making it
appear more synergistic than it is in reality. Moreover, even if the sample size is equal to or
just slightly larger than K³, it is unlikely that every unique bin combination is observed at
least once. Thus, even above the N = K³ line, some bias δ > 0 remains, which slowly fades
to zero as the ratio N/K increases.
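To make the coverage argument concrete, the sketch below computes the expected fraction of the K³ joint bins that are observed at least once. It assumes, as a simplification, that each sample falls into one of the K³ cells uniformly and independently, which ignores the exact marginal constraints imposed by quantile binning. The function name `expected_observed_cells` is hypothetical.

```python
def expected_observed_cells(n_samples, n_cells):
    """Expected number of distinct cells hit by n_samples uniform, independent draws."""
    return n_cells * (1.0 - (1.0 - 1.0 / n_cells) ** n_samples)


K = 10
cells = K ** 3
for N in (K ** 2, K ** 3, 3 * K ** 3, 10 * K ** 3):
    frac = expected_observed_cells(N, cells) / cells
    print(f"N = {N:>6}: expected fraction of joint bins observed = {frac:.3f}")
```

Under this assumption, only about 1 − 1/e of the joint bins are expected to be observed at N = K³, so several multiples of K³ are needed before nearly all bin combinations appear.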
We also see a decreased (but still positive) bias δ when N ≲ K². Here, the sample size is
too small to observe all possible bin combinations between two of the three variables. Thus,
it is impossible to even estimate Hˆ(X^n_{−j}) accurately, resulting in a severe underestimation
of both Hˆ(X^n) and Hˆ(X^n_{−j}). As seen by the opposite sign of these two terms in (3), their
underestimations cancel each other out to some extent. This results in a smaller bias δ,
despite the very small sample size, N, and large number of bins, K.
Clearly, this smaller bias when N ≲ K² should not be taken as an improvement. In
Figure 3, we see a much higher standard deviation of O-information estimations
when N ≲ K². Hˆ(X^n) and Hˆ(X^n_{−j}) are both still severely underestimated, and achieving
a careful balance between their two biases is not a reliable method of reducing the bias in
the naive estimation.
It is important to emphasize that a significant bias can still be observed with relatively
large sample sizes. For example, when N = 10,000 and K = 50, the triplet of independent
variables has Ωˆ ≈ −3.11, while its theoretical O-information should be zero. To put this into
perspective, the minimum bound of the O-information for a system of n = 3 variables and
K = 50 bins is equal to Ωmin = −log(50) ≈ −5.64. The naive O-information estimation is
thus closer to the minimum O-information bound than to its true O-information of zero.
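The comparison can be written out explicitly. The value −5.64 corresponds to a base-2 logarithm, which is assumed here:

```latex
\begin{align*}
\Omega_{\min} &= -\log_2 50 \approx -5.64, \\
|\hat{\Omega} - \Omega_{\min}| &\approx |-3.11 + 5.64| = 2.53, \\
|\hat{\Omega} - \Omega| &\approx |-3.11 - 0| = 3.11 > 2.53 .
\end{align*}
```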
6.2.2. Bias in the Redundant Triplet
Figure 2 shows that the redundant triplet is biased towards synergy when N ≲ K. Due
to the fact that X1 = X2 = X3, we have that H(X^n) = H(X^n_{−j}) = H(X_j). Thus, the strength
of the underestimation of all three of these entropy terms is equal in the estimation of
the O-information (3). The underestimations of Hˆ(X) and Hˆ(X^n_{−j}) cancel each other out in
the sum in (3). Therefore, the only underestimation that remains is the underestimation of
Hˆ(X^n). If N < K, then Hˆ(X^n) by definition must be underestimated, and we see the bias
towards synergy in the naive O-information estimation. We also see that this bias decreases
to zero very quickly as soon as N increases above K. As the quantile binning eliminates
the bias in the entropy estimation Hˆ(X) as long as N ≥ K, this also eliminates the bias of
the joint entropy terms. Thus, the naive O-information estimation becomes much more
accurate as soon as N increases above K.
6.2.3. Bias in the Synergistic Triplet
For the fully synergistic triplet, the minimum sample size which is theoretically needed
to accurately estimate the O-information is N = K². However, we see that, even when N is
larger than K², the bias δ is not zero. Again, this occurs because it is unlikely that each of the
N = K² samples observes a unique bin combination of the K² total possible combinations.
The bias is towards redundancy in the synergistic triplet due to the balance of Hˆ(X^n)
and Hˆ(X^n_{−j}) in the naive O-information estimation (3). The joint entropy Hˆ(X^n) appears
once, adding to the O-information estimation, while Hˆ(X^n_{−j}) appears three times,
subtracting from the O-information estimation. In the synergistic triplet, the magnitude of the
underestimation of Hˆ(X^n) and of Hˆ(X^n_{−j}) is the same for a given sample size N. Thus, the net
effect on the bias term is negative, and the effect on the resulting O-information estimation
is positive. Therefore, the balance of their underestimations makes the naive O-information
estimation biased towards redundancy.
Moreover, we do not see the O-information bias fade to zero as fast as in the redundant
triplet. This occurs because the quantile binning only eliminates the bias in the entropy
estimation Hˆ(X), and unlike the redundant triplet, the joint entropy terms are not equal to
the marginal entropy terms any more.
6.3. The O-Information Bias Approximation in Simulations
In this section, the accuracy of the mean naive estimation Ωˆ is compared to the accuracy
of the bias-corrected estimation ΩˆBC′ . This is performed through the ε′ term introduced in
Equation (16). Recall that, if ε′ < 0, ΩˆBC′ is more accurate than Ωˆ, and vice versa if ε′ > 0.
The ε′ term for each (N,K) combination is shown in Figure 4. Figures 5 and 6 show the
behavior of Ωˆ and ΩˆBC′ for the slices shown in Figure 4, where K = 10 and K = 30. These
two figures are helpful to visualize many of the behaviors discussed in this section.
6.3.1. Bias Approximation of the Independent Triplet
The sections in Figure 4 of negative ε′ values for large enough N are expected. Here,
the bias correction partially corrects for the underestimation of the entropy terms in the
O-information estimation, which provides a more accurate estimation of the O-information.
We also see that ε′ is positive for small N/K ratios, implying that the naive estimation
is more accurate than the bias-corrected one. As discussed in Section 6.2, the mean bias of
the naive estimation actually decreases when N ≲ K² due to the canceling out of
underestimations of various entropy terms. This effect is not captured by the bias approximation (12),
where a smaller N always results in a larger bias approximation. The bias approximation
is not able to capture this effect because we use the second-order expansion of the Taylor
approximation, instead of the third-order expansion, as given by [31]. The second-order
expansion works well for uniform distributions; however, in the independent triplet, it
is not guaranteed that the joint probability distributions of X^n and X^n_{−j} are uniform.
Therefore, the balance of biases in the Hˆ(X^n) and Hˆ(X^n_{−j}) terms results in the overestimation
of the bias that we observe.
6.3.2. Bias Approximation of the Redundant Triplet
Recall that, in the fully redundant system, we only need to concern ourselves with
the bias in the entropy estimations Hˆ(X). In the section with a slightly larger N but where
N < K, the entropy underestimations are partially corrected for. From the O-information
estimation (3), we see that increasing Hˆ(X) has the effect of increasing the O-information
estimation as a whole to reflect the underlying redundancy.
Around the N = K line, however, the naive estimation is more accurate than the
bias-corrected estimation. This is due to the quantile binning strategy used. The quantile
binning results in almost perfectly uniform marginal distributions as long as N ≥ K.
Moreover, due to the system constraint that X1 = X2 = X3, the marginal distributions
are equal to the joint distributions. Thus, the naive estimation Ωˆ has approximately zero
bias in the fully redundant case. The bias correction, however, is derived under slightly
different assumptions than the quantile binning (discussed in more depth in Section 6.4),
and therefore tries to correct for a nonexistent bias in the entropy estimation Hˆ(X). In the
redundant triplet, this results in an overshoot of ΩˆBC′ over the true Ω.
The white dots along the N = K line in Figure 4 are the points where ΩˆBC′ has
overshot the true O-information by the same distance as Ωˆ has yet to approach the true
O-information. Therefore, their absolute errors cancel each other out, resulting in ε′ = 0.
6.3.3. Bias Approximation of the Synergistic Triplet
The bias correction works much better for the fully synergistic triplet than for the
independent triplet when N ≲ K². For the synergistic triplet, the bias approximation formula (12)
reduces to the concave parabola $\delta_{MM} = \frac{1}{2N\ln(2)}\left[-2K^2 + 3K - 1\right]$. This parabola is centered
around approximately zero and is very wide, as the $\frac{1}{2N\ln(2)}$ coefficient is so close to zero.
The larger the sample size N, the wider the parabola becomes, and thus a higher K is
needed in order for the bias approximation to noticeably deviate from zero. Only then
can the bias correction take effect for the synergistic triplet. This explains why the bias
approximation works particularly well for N ≲ K², and does not have a noticeable effect
when N is larger.
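The reduction to this parabola can be checked under an assumption about the general form of (12), which is not reproduced in this section: one Miller–Madow support-size term per entropy in (3), with the same signs. For the fully synergistic triplet with n = 3, and assuming all bin combinations are observed so that $K_j = K$ and $K^{(-j)} = K^{(n)} = K^2$:

```latex
\begin{align*}
\delta_{MM}
  &= \frac{1}{2N\ln 2}\Big[(n-2)\big(K^{(n)}-1\big)
     + \sum_{j=1}^{n}\big(K_j-1\big)
     - \sum_{j=1}^{n}\big(K^{(-j)}-1\big)\Big] \\
  &= \frac{1}{2N\ln 2}\Big[(K^2-1) + 3(K-1) - 3(K^2-1)\Big]
   = \frac{1}{2N\ln 2}\Big[-2K^2 + 3K - 1\Big].
\end{align*}
```

The bracket is negative for every K ≥ 2, so adding this term lowers the estimate, in line with the naive estimation of the synergistic triplet being biased towards redundancy.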
6.4. A Note on the Quantile Binning Strategy and the Bias Approximation
While the experiments on the simulated data show that the bias-correction process
is far from perfect, we point out that our procedure of bin allocation is not completely in
line with the assumptions under which the O-information bias correction (12) was derived. As
stated above, the bins are constructed based on the observed samples, even though we
know the ground truth quantiles of the generated variables. However, the bias-correction
process (12) assumes that the bins are created from the true underlying distribution, and
that the observed samples are assigned to the bins they happen to fall into.
This is illustrated by the fact that in the fully redundant triplet when N mod K = 0,
the naive O-information estimation has zero error while the bias-correction process does
not have zero error, and tries to correct for the underestimation in the entropy we would see
if the samples were assigned to the bins constructed from the true underlying distribution.
Therefore, the bias correction might have performed better in this simulated experi-
ment if we had constructed the bins from the known underlying distribution, and assigned
the observed samples to these existing bins. As this is impossible with empirical data, we
decided to keep the binning procedure of the simulation experiment comparable to the
empirical data.
6.5. When Can the Bias-Correction Process be Applied to Empirical Data?
The redundant triplet shows next to no bias for sample sizes N that are larger than
the number of bins, K, to begin with. While the bias correction is less accurate than the
naive estimation around N = K, this inaccuracy fades to zero relatively quickly as N/K
increases further. As it is generally recommended to have a sample size much larger than
the number of bins, it is unlikely that the errors of the bias correction in redundant triplets
pose any serious problems when applying this bias correction. The situation is trickier
for triplets of independent variables. If the sample size is N ≲ K², then the bias correction
does more harm than good. However, despite the naive estimation performing better in
this case, it is still quite biased itself. The bias correction only improves the O-information
estimation for triplets of independent variables given that N ≫ K², and even then it only
partially corrects for the bias. If the triplet is synergistic, the bias correction works very well
for N ≲ K², substantially improving the O-information estimation. If N ≳ K², then there
is very little bias to begin with, and the bias correction can only very slightly improve the
estimation further.
Assuming we have no prior knowledge about the joint distribution of variables, the
prudent approach is to assume independence between variables in a triplet, as these triplets
show the largest error of both the naive estimation δ and the bias-corrected estimation δBC′.
With this assumption, the bias-correction process can be applied only if N ≫ K², but it
is important to keep in mind that it does not correct the bias fully. If N ≳ K³, the naive
O-information estimation should be quite good to begin with, and the bias correction would
simply improve it slightly more.
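The guidance above can be summarized as a small helper. This is only an illustrative sketch for triplets (n = 3) under the independence assumption: the text uses approximate relations (≲, ≫), whereas the hard thresholds at K² and K³ below are a simplifying assumption, and the function name `correction_regime` is hypothetical.

```python
def correction_regime(n_samples, k_bins):
    """Coarse regime label for a triplet, assuming independent variables.

    The sharp cut-offs at K^2 and K^3 are a simplification of the
    approximate boundaries discussed in the text.
    """
    if n_samples < k_bins ** 2:
        return "N < K^2: correction likely harmful; naive estimate also biased"
    if n_samples < k_bins ** 3:
        return "K^2 <= N < K^3: correction helps, but only partially"
    return "N >= K^3: naive estimate already fairly good; correction adds little"


for n, k in [(500, 30), (10_000, 50), (1684, 10)]:
    print(f"N = {n}, K = {k}: {correction_regime(n, k)}")
```

Because the boundaries are gradual in the simulations, a result near a threshold should be read as indicative only.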
6.6. Application to the Young Finns Study
The simulated independent triplets have a mean O-information estimation (both
naive and bias-corrected) that aligns almost perfectly with the peak of the O-information
distribution of the empirical data. This reveals that a large majority of triplets in the YFS
dataset are independent of each other (or their synergy and redundancy cancel each
other out perfectly), rather than synergistic as their negative O-information might imply.
The result that almost all of the top synergistic triplets are ratios makes sense as well.
For triplets which are exact ratios, the pairwise constraints are much weaker than the
constraints on the system, which is completely governed by the equation X3 = X1/X2. This
perfectly aligns with the idea that a synergistic system has local independence but global
cohesion. Two of the three variables in the triplet must be known in order to derive the
third, but just knowing one does not give much information about either of the other two
variables. Even though the ratio triplets in the YFS dataset are not exact ratios (due to the
discretization, for example), they were still identified as highly synergistic. This provides a
proof of concept of the O-information, as it indeed assigns the most negative O-information
to the systems in empirical data which embody this concept of synergy.
After removing the triplets containing inherent mathematical relationships (ratios,
sums, and means), the fact that the bias correction does not change the order of the top
5 synergistic triplets implies that there is a level of robustness in the O-information. It
suggests that the strength of the bias does not drastically change across similarly synergistic
triplets. This allows us to make interpretations (with caution) about the relative strength of
higher-order synergistic relationships. It also allows us to single out specific variables that
frequently occur across highly synergistic triplets, with the reasonable assumption that these
variables remain relevant after correcting for the bias. However, the bias correction does
change the ordering beyond the top 5 synergistic triplets, and introduces 14 new triplets
to the ranking of the 50 most synergistic triplets. Thus, in combination with the simulations, the bias
correction could provide valuable insights into synergistic triplets that may previously have
been overlooked, for instance when performing centrality analyses on this hyper-network.
The most synergistic remaining triplet both before and after bias correction is [“UnSatDeg”:
Estimated degree of unsaturation, “SLDLTG”: Triglycerides in small LDL; mmol/L, “FAw3”:
Omega-3 fatty acids; mmol/L], with Ωˆ = −0.691 and ΩˆBC′ = −0.592. The degree of
unsaturation is determined by the total number of double bonds and rings present in molecules such as
triglycerides. The number of double bonds present in a triglyceride is in turn influenced by its
fatty acid composition [32]. Omega-3 fatty acids are a type of polyunsaturated fatty acid which
can be found in triglycerides [33]. The “poly” signifies the existence of multiple double bonds,
and each of these double bonds increases the overall degree of unsaturation of the triglyceride.
The third variable SLDLTG represents the triglyceride content in small LDL particles. A potential
explanation for the synergy between these three variables is that the degree of unsaturation
cannot be determined by the amount of Omega-3 fatty acids or the amount of triglycerides in
LDLs alone. However, together they provide more information on the overall composition of
saturated and unsaturated fatty acids in LDLs, and thus the degree of unsaturation can be more
accurately estimated.
7. Conclusions
An important finding is that, in triplets of independent variables, the bias in the naive
O-information estimation δ can persist even with relatively large sample sizes. In contrast,
the O-information estimation of fully redundant and synergistic triplets is less biased, as
their joint probability distributions can be estimated with fewer samples.
For systems of n = 3 independent variables, the naive O-information estimation is
severely biased towards synergy if K² ≲ N ≲ K³. In this range, the joint
entropy H(X^n) in particular is severely underestimated, resulting in a strong downward bias of the naive
O-information estimation. Without taking this bias into account, higher-order relationships
may be classified as highly synergistic while in reality being close to independent. In fact,
our simulations show that, even with a larger sample size of N = 10,000 and K = 50, the
naive O-information estimation is so biased that it is closer to the minimum bound of the
O-information than its true value of zero.
This implies that highly synergistic groups of variables identified in empirical data
using the O-information may in reality be closer to independent, if the N/K ratio is small
enough and no bias correction attempts have been made. Indeed, findings illustrated by
the Young Finns Study and our simulations suggest that a large majority of triplets in the
dataset which would naively be labeled as synergistic due to their negative O-information
are actually independent. When looking for synergistic higher-order relationships in
empirical data, it is therefore crucial to be aware of the existence of this bias. It may also
be helpful to use simulations to establish the O-information estimation value which fully
independent triplets with the same N and K values would have, in order to benchmark the
boundary between synergistic and redundant in the empirical dataset.
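Such a benchmark can be sketched as follows. The sketch assumes independent uniform variables, rank-based quantile binning, and the standard O-information definition (Equation (3) is not reproduced in this section); the function names, including `independent_benchmark`, are hypothetical, and the default of 30 trials simply mirrors the number of trials used in the simulations.

```python
import math
import random
import statistics
from collections import Counter


def quantile_bin(values, k):
    """Rank-based quantile binning built from the sample itself."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def plugin_entropy(symbols):
    """Naive (plug-in) entropy estimate in bits."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def naive_o_information(columns):
    """Naive O-information estimate from already-binned columns."""
    n_vars = len(columns)
    omega = (n_vars - 2) * plugin_entropy(list(zip(*columns)))
    for j in range(n_vars):
        rest = [columns[i] for i in range(n_vars) if i != j]
        omega += plugin_entropy(columns[j]) - plugin_entropy(list(zip(*rest)))
    return omega


def independent_benchmark(n_samples, k_bins, trials=30, seed=0):
    """Mean and standard deviation of the naive estimate for independent triplets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        cols = [quantile_bin([rng.random() for _ in range(n_samples)], k_bins)
                for _ in range(3)]
        estimates.append(naive_o_information(cols))
    return statistics.mean(estimates), statistics.stdev(estimates)


mean_null, sd_null = independent_benchmark(n_samples=1684, k_bins=10)
print(f"independent benchmark: mean = {mean_null:.3f}, sd = {sd_null:.3f}")
```

Empirical triplets whose naive estimate lies well below this benchmark (relative to its spread) would then be candidates for genuine synergy, while those near it are better read as close to independent.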
Future work could improve the Miller–Madow bias-correction process derived in this
paper by exploring better ways to estimate K(−j) and K(n) from the data, rather than simply
counting the number of joint bin combinations we observe. Moreover, other O-information
bias correction methods should be explored. The Jackknife method may be a better
alternative than the Miller–Madow method for small sample sizes, if one is willing to take on the
additional computational cost. Other alternatives with lower computational cost should
also be explored, such as Harris’ third-order Taylor approximation or Bayesian approaches.
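For orientation, a generic leave-one-out jackknife correction of the plug-in entropy can be sketched as below. This is the textbook jackknife, not a method evaluated in this paper, and the function names are hypothetical. Since the jackknife is linear in the statistic, applying it to each entropy term of the O-information with the same leave-one-out samples is equivalent to applying it to the naive O-information estimate as a whole.

```python
import math
from collections import Counter


def plugin_entropy_from_counts(counts, n):
    """Plug-in entropy in bits from bin counts that sum to n."""
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def jackknife_entropy(symbols):
    """Jackknife estimate: N * H_full - (N - 1) * mean of leave-one-out entropies."""
    n = len(symbols)
    counts = Counter(symbols)
    h_full = plugin_entropy_from_counts(counts.values(), n)
    loo_sum = 0.0
    for sym, c in counts.items():
        # Removing any one of the c samples with symbol `sym` yields the same
        # leave-one-out entropy, so it is computed once and weighted by c.
        reduced = [v - 1 if s == sym else v for s, v in counts.items()]
        loo_sum += c * plugin_entropy_from_counts(reduced, n - 1)
    return n * h_full - (n - 1) * loo_sum / n


sample = list("aabbbcdddd")
n = len(sample)
h_naive = plugin_entropy_from_counts(Counter(sample).values(), n)
print(f"naive: {h_naive:.4f} bits, jackknife: {jackknife_entropy(sample):.4f} bits")
```

The loop over distinct symbols makes the cost grow with the square of the number of occupied bins, which reflects the additional computational cost mentioned above.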
Moreover, future work should explore the effect of different discretization methods
on Ωˆ, δ, and ΩˆBC′ , in order to apply these results to cases where quantile binning may
not be appropriate. While it is out of scope for this paper to compare the simulation
results for various discretization methods, it is certainly important to keep in mind that
the discretization method will affect the behavior of information theoretic measures [34],
including Ωˆ, δ, and ΩˆBC′ .
Different discretization methods also open the door to exploring the estimation bias δ
when variables have marginal distributions other than uniform. For instance, in the case
of Gaussian marginal distributions, the inherent relationship between the entropy and the
variance may be exploited to estimate the O-information, following the method of [18].
While this would still require the estimation of the variables’ variances and the system’s
covariance matrices, it would provide an interesting comparison to the bias correction
presented in this paper.
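As a sketch of why variances and covariance matrices suffice in this case, assume the variables are jointly Gaussian (a stronger assumption than Gaussian marginals) and use the standard differential entropy of a d-dimensional Gaussian vector Y with covariance matrix $\Sigma_Y$. This is not necessarily the exact procedure of [18]. Here $\Sigma$ is the covariance matrix of all n variables, $\sigma_j^2$ the variance of $X_j$, and $\Sigma_{-j}$ the covariance matrix with variable j removed:

```latex
\begin{align*}
h(Y) &= \tfrac{1}{2}\log\!\big((2\pi e)^{d}\,\det\Sigma_Y\big), \\
\Omega(X^n) &= (n-2)\,h(X^n) + \sum_{j=1}^{n}\big[h(X_j) - h(X^n_{-j})\big] \\
            &= \tfrac{1}{2}\Big[(n-2)\log\det\Sigma
               + \sum_{j=1}^{n}\log\sigma_j^{2}
               - \sum_{j=1}^{n}\log\det\Sigma_{-j}\Big].
\end{align*}
```

The $(2\pi e)$ factors cancel because their exponents sum to $(n-2)n + n - n(n-1) = 0$.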
Lastly, the analyses carried out in this paper for systems of n = 3 could be extended
to consider systems of more than three variables. The O-information’s power lies in its
scalability to systems of higher dimensions, which should be taken advantage of. While the
derived O-information bias approximation term is flexible enough to include a higher number of
variables, it would be important to explore the effect of increasing the number of variables
on the approximation’s accuracy.
Author Contributions: Conceptualization, J.G., J.L. and R.Q.; methodology, J.G., J.L. and R.Q.;
software, J.G.; validation, J.G. and R.Q.; formal analysis, J.G., J.L. and R.Q.; investigation, J.G., J.L.,
C.H., S.T. and R.Q.; data curation, P.P.M., T.L., M.K. and O.R.; writing—original draft preparation, J.G.;
writing—review and editing, J.G., J.L., C.H., P.P.M., O.R. and R.Q.; visualization, J.G.; supervision, J.L.
and R.Q.; project administration, J.A.B. All authors have read and agreed to the published version of
the manuscript.
Funding: Johanna Gehlen, Jie Li, Stavroula Tassi, Jos A. Bosch, and Rick Quax were supported by the
EU project To_Aition, which has received funding from the European Union’s Horizon 2020 research
and innovation programme under grant agreement No 848146. Cillian Hourican and Rick Quax were
supported by the Netherlands Organisation for Health Research and Development (ZonMw), Open
Competition Grant 09120012010063. The Young Finns Study has been financially supported by the
Academy of Finland: grants 356405, 322098, 286284, 134309 (Eye), 126925, 121584, 124282, 129378
(Salve), 117797 (Gendi), and 141071 (Skidi); the Social Insurance Institution of Finland; Competitive
State Research Financing of the Expert Responsibility area of Kuopio, Tampere and Turku University
Hospitals (grant X51001); Juho Vainio Foundation; Paavo Nurmi Foundation; Finnish Foundation
for Cardiovascular Research; Finnish Cultural Foundation; The Sigrid Juselius Foundation; Tampere
Tuberculosis Foundation; Emil Aaltonen Foundation; Yrjö Jahnsson Foundation; Signe and Ane
Gyllenberg Foundation; Diabetes Research Foundation of Finnish Diabetes Association; EU Horizon
2020 (grant 755320 for TAXINOMISIS and grant 848146 for To Aition); European Research Council
(grant 742927 for MULTIEPIGEN project); Tampere University Hospital Supporting Foundation;
Finnish Society of Clinical Chemistry; the Cancer Foundation Finland; pBETTER4U_EU (Preventing
obesity through Biologically and bEhaviorally Tailored inTERventions for you; project number:
101080117); and the Jane and Aatos Erkko Foundation. PPM was supported by the Academy of
Finland (Grant number: 349708).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The GitHub repository to run the simulations can be found at
https://github.com/johannagehlen/oinfo-bias (accessed on 5 July 2024) and at Zenodo with DOI
10.5281/zenodo.13760381. Restrictions apply to the availability of the Young Finns Study data.
Data were obtained from the Research Centre of Applied and Preventive Cardiovascular Medicine and
Centre for Population Health Research, University of Turku.
Conflicts of Interest: The authors declare no conflicts of interest.
Appendix A. Miller–Maddow Derivation
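The idea of the Miller–Maddow entropy estimation is to make a second-order Taylor
approximation of the function $f(\hat{p}_i) = -\hat{p}_i \log \hat{p}_i$ per bin $i$ around each bin’s probability $p_i$:

$$
\begin{aligned}
E[\hat{H}(X)] &\equiv E\left[-\sum_{i=1}^{K} \hat{p}_i \log \hat{p}_i\right] = \sum_{i=1}^{K} E\left[-\hat{p}_i \log \hat{p}_i\right] \\
&\approx \sum_{i=1}^{K} \left[ f(p_i) + f'(p_i)\left(E[\hat{p}_i] - p_i\right) + \frac{f''(p_i)}{2}\, E\left[(\hat{p}_i - p_i)^2\right] \right] \\
&= \sum_{i=1}^{K} \left[ f(p_i) + f'(p_i)\left(E[\hat{p}_i] - p_i\right) + \frac{f''(p_i)}{2} \left(E[\hat{p}_i^2] - 2E[\hat{p}_i]\,p_i + p_i^2\right) \right]
\end{aligned}
\tag{A1}
$$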
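It can be shown that $E[\hat{p}_i] = p_i$ and $E[\hat{p}_i^2] = p_i\left(p_i \frac{N-1}{N} + \frac{1}{N}\right)$. Thus, expression (A1)
simplifies to:

$$
\begin{aligned}
E[\hat{H}(X)] &\approx \sum_{i=1}^{K} \left[ f(p_i) + \frac{f''(p_i)}{2} \left( p_i \left( p_i \frac{N-1}{N} + \frac{1}{N} \right) - p_i^2 \right) \right] \\
&= -\sum_{i=1}^{K} p_i \log p_i - \sum_{i=1}^{K} \left[ \frac{1}{2 p_i \ln(2)} \left( p_i \left( p_i \frac{N-1}{N} + \frac{1}{N} \right) - p_i^2 \right) \right]
\end{aligned}
$$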
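The first term in this expression is the theoretical entropy $H(X)$. In the second term,
each summand reduces to $\frac{1 - p_i}{2N\ln(2)}$, and since $\sum_{i=1}^{K} p_i = 1$, the sum over
the $K$ bins equals $\frac{K-1}{2N\ln(2)}$. We thus obtain the final Miller–Maddow entropy bias estimation:

$$
E[\hat{H}(X)] \approx H(X) - \frac{K-1}{2N\ln(2)}
\tag{A2}
$$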
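The approximation (A2) can be checked numerically. The following sketch, which is not part of the paper's simulation code, draws repeated samples of size N from a uniform distribution over K bins, averages the plug-in entropy, and compares it with H(X) − (K − 1)/(2N ln 2). The function names, the seed, and the chosen K, N, and number of trials are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (maximum-likelihood) entropy in bits of discrete samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mean_plugin_entropy(k, n, trials, seed=0):
    """Average plug-in entropy over repeated uniform samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += plugin_entropy([rng.randrange(k) for _ in range(n)])
    return total / trials


def miller_madow_expectation(k, n):
    """Right-hand side of (A2) for a uniform distribution over k bins."""
    return math.log2(k) - (k - 1) / (2 * n * math.log(2))


# Hypothetical setting: K = 4 bins, N = 200 samples, 2000 repetitions.
simulated = mean_plugin_entropy(4, 200, 2000)
predicted = miller_madow_expectation(4, 200)
```

With these settings the simulated mean lies below the true entropy of 2 bits and close to the first-order prediction, which is the behavior the derivation above describes.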
Appendix B. Young Finns Study Data
High-throughput NMR spectroscopy was used for quantification of metabolites from
serum [35]. The platform enables the simultaneous quantification of standard clinical
lipids, lipoprotein subclasses and individual lipids (triglycerides, phospholipids, free and
esterified cholesterol) transported by these particles, multiple fatty acids, glucose, various
glycolysis precursors, ketone bodies, and amino acids in absolute concentration units in a
single experimental setup.
Lipidome quantification for the stored plasma samples was performed at Zora Biosciences
Oy (Espoo, Finland) using liquid chromatography–tandem mass spectrometry
(LC-MS/MS). Lipid extraction was based on a previously described method [36]. In brief,
10 µL of 10 mM 2,6-di-tert-butyl-4-methylphenol (BHT) in methanol was added to 10 µL of
the sample, followed by 20 µL of internal standards (Avanti Polar Lipids Inc., Alabaster,
AL, USA) and 300 µL of chloroform–methanol (2:1, v:v) (Sigma-Aldrich GmbH, Steinheim,
Germany). The samples were mixed and sonicated in a water bath for 10 min, followed
by a 40 min incubation and centrifugation (15 min at 5700× g). The upper phase was
transferred and evaporated under nitrogen. Extracted lipids were resuspended in 100 µL of
water-saturated butanol and sonicated in a water bath for 5 min. Then, 100 µL of methanol
was added to the samples before the extracts were centrifuged for 5 min at 3500× g, and
finally the supernatants were transferred to the analysis plate for mass spectrometric (MS)
analysis. The MS analyses have also been described in detail previously [37]. The analyses
were performed on a hybrid triple quadrupole/linear ion trap mass spectrometer
(QTRAP 5500, AB Sciex, Concord, ON, Canada) equipped with ultra-high-performance
liquid chromatography (UHPLC) (Nexera-X2, Shimadzu, Kyoto, Japan). Chromatographic
separation of the lipidomic screening platform was performed on an Acquity BEH C18,
2.1 × 50 mm id. 1.7 µm column (Waters Corporation, Milford, MA, USA). The data were
collected using a scheduled multiple reaction monitoring algorithm and processed using
Analyst and MultiQuant 3.0 software (AB Sciex). The heights of the peaks obtained from
the MS analysis were normalized with the internal standard of the lipid classes.
Appendix C. Feature Selection on the Young Finns Study Dataset
Due to the high computational cost of calculating the O-information for every possible
triplet in the YFS dataset, the following feature selection steps were implemented:
• First, constant and quasi-constant features were dropped. Quasi-constant features were
defined as features which have the same value in 95% of the available observations;
681 of the 690 total features remain.
• Then, blocks of highly correlated features (absolute Pearson’s R value above 0.85)
were identified, and only the feature per block with the highest absolute
correlation with “imtka”, the carotid IMT average, and “beckpisteet”, a depression
score ranging from 0 for no depression to 63 for the most severe depression, was
kept. The carotid IMT average is chosen as a representative
value of cardiovascular health, as has been proposed by experts of the field [38];
374 of the 690 total features remain.
This feature selection filters the YFS dataset based on relevance to the morbidities
associated with cardiovascular disease and depression, as these are the two health disorders
represented in the YFS dataset. The 374 features that remain after feature selection
yield slightly above 8.6 million possible triplets. The
brute force approach to calculate the O-information for each possible remaining triplet is
implemented in Julia, due to its speed advantages in comparison with Python.
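The two filtering steps can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used for the YFS analysis: features are held in a plain dictionary of value lists, a single target variable stands in for the two outcomes “imtka” and “beckpisteet”, and a block of highly correlated features is taken to be a connected group under the |R| > 0.85 relation. All function names are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_quasi_constant(features, threshold=0.95):
    """Drop features whose most frequent value covers >= threshold of the rows."""
    kept = {}
    for name, values in features.items():
        top_share = Counter(values).most_common(1)[0][1] / len(values)
        if top_share < threshold:
            kept[name] = values
    return kept


def correlated_blocks(features, r_max=0.85):
    """Group features into blocks linked by |Pearson R| > r_max."""
    names = list(features)
    block_of = {name: i for i, name in enumerate(names)}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(features[a], features[b])) > r_max:
                old, new = block_of[b], block_of[a]
                for name in names:  # merge the two blocks
                    if block_of[name] == old:
                        block_of[name] = new
    blocks = {}
    for name, label in block_of.items():
        blocks.setdefault(label, []).append(name)
    return list(blocks.values())


def select_features(features, target, threshold=0.95, r_max=0.85):
    """Keep, per block, the feature most correlated (in absolute value) with target."""
    kept = drop_quasi_constant(features, threshold)
    return [
        max(block, key=lambda name: abs(pearson(kept[name], target)))
        for block in correlated_blocks(kept, r_max)
    ]


# Number of distinct triplets among the 374 remaining features.
n_triplets = math.comb(374, 3)  # 8,649,124
```

The triplet count at the end shows why a brute-force search over the filtered features is still expensive: every one of these triplets requires its own set of entropy estimates.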
Appendix D. Top 20 Synergistic Triplets before Bias Correction; Which Ones Are Ratios?
Table A1. Top 20 most synergistic triplets before bias correction. The ratio triplets are highlighted
in gray.
Ωˆ Variable 1 Variable 2 Variable 3
−1.462 MHDLC MHDLTGPCT MHDLTG
−1.331 LHDLTG LHDLTGPCT LHDLCE
−1.276 LHDLTG LHDLTGPCT LHDLPL
−1.061 SLDLTG MLDLPL SLDLTGPCT
−1.022 LVLDLCEPCT LVLDLFCPCT LVLDLCPCT
−0.978 XLHDLTG XLHDLTGPCT HDLD
−0.887 SHDLP HDLD LHDLPL
−0.881 SLDLTG LLDLCEPCT SLDLTGPCT
−0.867 SHDLP SHDLPLPCT SHDLCE
−0.864 TotFA FAw3 DHAFA
−0.830 LHDLTG LHDLTGPCT HDLD
−0.829 MLDLPL MHDLFCPCT ApoBApoA1
−0.814 RemnantC SLDLTG IDLCEPCT
−0.814 LLDLCEPCT SLDLTGPCT MLDLFCPCT
−0.800 MHDLC SHDLPLPCT SHDLCE
−0.794 IDLFCPCT MLDLFCPCT VLDLTG
−0.789 MHDLC SHDLP HDLD
−0.784 LLDLCEPCT MLDLFCPCT IDLCEPCT
−0.773 SHDLFCPCT TotFA MLDLFCPCT
−0.770 IDLPLPCT MLDLPL SHDLCE
Table A2. Top 10 synergistic triplets after removing ratio, mean, and sum triplets and before bias
correction.
Ωˆ Variable 1 Variable 2 Variable 3
−0.691 UnSatDeg SLDLTG FAw3
−0.638 UnSatDeg TAG.16.0.18.0.18.1...1 FAw3
−0.616 PC.34.5 LPC.20.5_sn1 PC.32.2
−0.595 PC.34.5 LPC.20.5_sn2 PC.32.2
−0.588 UnSatDeg VLDLTG FAw3
−0.546 LPC.18.2_sn2 LPC.20.3_sn1 PC.38.3..1
−0.536 UnSatDeg FAw3 PE.40.4
−0.534 UnSatDeg FAw3 XLVLDLTG
−0.531 LPC.20.5_sn1 LPC.18.2_sn2 PC.36.5b..1
−0.529 PG.36.1 UnSatDeg FAw3
Table A3. Top 10 synergistic triplets after removing ratio, mean, and sum triplets and after bias
correction.
ΩˆBC Variable 1 Variable 2 Variable 3
−0.592 UnSatDeg SLDLTG FAw3
−0.527 UnSatDeg TAG.16.0.18.0.18.1...1 FAw3
−0.524 PC.34.5 LPC.20.5_sn1 PC.32.2
−0.503 PC.34.5 LPC.20.5_sn2 PC.32.2
−0.463 UnSatDeg VLDLTG FAw3
−0.443 LPC.18.2_sn2 LPC.20.5_sn2 PC.36.5b..1
−0.441 LPC.20.5_sn1 LPC.18.2_sn2 PC.36.5b..1
−0.422 UnSatDeg FAw3 XLVLDLTG
−0.420 LPC.18.2_sn2 LPC.20.3_sn1 PC.38.3..1
−0.403 LPC.18.2_sn2 LPC.22.4_sn1 PC.40.4
Appendix E. Top 20 Synergistic Triplets after Bias Correction, with Indication Which
Triplets Are Not Present before Bias Correction
Table A4. Top 20 synergistic triplets after removing ratio, mean, and sum triplets and after bias
correction. The newly introduced triplets in the top 20 rank after the bias correction are highlighted
in green.
ΩˆBC Variable 1 Variable 2 Variable 3
−0.592 UnSatDeg SLDLTG FAw3
−0.527 UnSatDeg TAG.16.0.18.0.18.1...1 FAw3
−0.524 PC.34.5 LPC.20.5_sn1 PC.32.2
−0.503 PC.34.5 LPC.20.5_sn2 PC.32.2
−0.463 UnSatDeg VLDLTG FAw3
−0.443 LPC.18.2_sn2 LPC.20.5_sn2 PC.36.5b..1
−0.441 LPC.20.5_sn1 LPC.18.2_sn2 PC.36.5b..1
−0.422 UnSatDeg FAw3 XLVLDLTG
−0.420 LPC.18.2_sn2 LPC.20.3_sn1 PC.38.3..1
−0.403 LPC.18.2_sn2 LPC.22.4_sn1 PC.40.4
−0.403 UnSatDeg FAw3 PE.40.4
−0.399 XXLVLDLTG MLDLPL SHDLCE
−0.391 PG.36.1 UnSatDeg FAw3
−0.389 UnSatDeg FAw3 DAG.14.0.18.1.
−0.386 UnSatDeg XXLVLDLTG FAw3
−0.385 UnSatDeg PE.36.1 FAw3
−0.382 PC.34.5 PC.32.2 PC.36.5b..1
−0.375 LacCer.d18.2.16.0. PC.36.6 PC.38.7
−0.373 LPE.18.1_sn2 PE.40.6 PE.36.1
−0.365 UnSatDeg TAG.17.0.18.1.18.1. FAw3
Appendix F. Variable Descriptions
The variable descriptions provided by the Young Finns Study are the following:
• XLVLDLTG: Triglycerides in very large VLDL; mmol/L.
• MLDLPL: Phospholipids in medium LDL; mmol/L.
• SLDLTG: Triglycerides in small LDL; mmol/L.
• XLHDLTG: Triglycerides in very large HDL; mmol/L.
• LHDLPL: Phospholipids in large HDL; mmol/L.
• LHDLCE: Cholesterol esters in large HDL; mmol/L.
• LHDLTG: Triglycerides in large HDL; mmol/L.
• MHDLC: Total cholesterol in medium HDL; mmol/L.
• MHDLTG: Triglycerides in medium HDL; mmol/L.
• SHDLP: Concentration of small HDL particles; mol/L.
• SHDLCE: Cholesterol esters in small HDL; mmol/L.
• LVLDLCPCT: Total cholesterol to total lipids ratio in large VLDL.
• LVLDLCEPCT: Cholesterol esters to total lipids ratio in large VLDL.
• LVLDLFCPCT: Free cholesterol to total lipids ratio in large VLDL.
• IDLPLPCT: Phospholipids to total lipids ratio in IDL.
• IDLCEPCT: Cholesterol esters to total lipids ratio in IDL.
• IDLFCPCT: Free cholesterol to total lipids ratio in IDL.
• LLDLCEPCT: Cholesterol esters to total lipids ratio in large LDL.
• MLDLFCPCT: Free cholesterol to total lipids ratio in medium LDL.
• SLDLTGPCT: Triglycerides to total lipids ratio in small LDL.
• XLHDLTGPCT: Triglycerides to total lipids ratio in very large HDL.
• LHDLTGPCT: Triglycerides to total lipids ratio in large HDL.
• MHDLFCPCT: Free cholesterol to total lipids ratio in medium HDL.
• MHDLTGPCT: Triglycerides to total lipids ratio in medium HDL.
• SHDLPLPCT: Phospholipids to total lipids ratio in small HDL.
• SHDLFCPCT: Free cholesterol to total lipids ratio in small HDL.
• HDLD: Mean diameter for HDL particles; nm.
• RemnantC: Remnant cholesterol (non-HDL, non-LDL-cholesterol); mmol/L.
• VLDLTG: Triglycerides in VLDL; mmol/L.
• ApoBApoA1: Ratio of apolipoprotein B to apolipoprotein A-I.
• TotFA: Total fatty acids; mmol/L.
• UnSatDeg: Estimated degree of unsaturation.
• FAw3: Omega-3 fatty acids; mmol/L.
• DHAFA: Ratio of 22:6 docosahexaenoic acid to total fatty acids.
• LPC.18.2_sn2: Lysophosphatidylcholine.
• LPC.20.3_sn1: Lysophosphatidylcholine.
• LPC.20.5_sn1: Lysophosphatidylcholine.
• LPC.20.5_sn2: Lysophosphatidylcholine.
• PC.38.3..1: Phosphatidylcholine.
• PC.36.5b..1: Phosphatidylcholine.
• PC.32.2: Phosphatidylcholine.
• PC.34.5: Phosphatidylcholine.
• TAG.16.0.18.0.18.1...1: Triacylglycerol.
References
1. Yu, S. A Simple Extended-Cavity Diode Laser. J. Neurosci. 2011, 69, 1236–1239.
2. Giusti, C.; Ghrist, R.; Bassett, D.S. Two’s company, three (or more) is a simplex: Algebraic-topological tools for understanding
higher-order structure in neural data. J. Comput. Neurosci. 2016, 41, 1–14. [CrossRef] [PubMed]
3. Stramaglia, S.; Scagliarini, T.; Daniels, B.C.; Marinazzo, D. Quantifying Dynamical High-Order Interdependencies From the
O-Information: An Application to Neural Spiking Dynamics. Front. Physiol. 2021, 11, 595736. [CrossRef] [PubMed]
4. Gatica, M.; Cofré, R.; Mediano, P.A.; Rosas, F.E.; Orio, P.; Diez, I.; Swinnen, S.P.; Cortes, J.M. High-Order Interdependencies in the
Aging Brain. Brain Connect. 2021, 11, 734–744. [CrossRef] [PubMed]
5. Sanchez-Gorostiaga A, B.D. High-order interactions distort the functional landscape of microbial consortia. PLoS Biol. 2019,
17, e3000550. [CrossRef]
6. Wasserman, S.; Faust, K. Social Network Analysis: Methods and Applications; Cambridge University Press: Cambridge, UK, 1994.
7. Lucas, M.; Cencetti, G.; Battiston, F. Multiorder Laplacian for synchronization in higher-order networks. Phys. Rev. Res. 2020,
2, 033410. [CrossRef]
8. Quax, R.; Har-Shemesh, O.; Sloot, P.M.A. Quantifying Synergistic Information Using Intermediate Stochastic Variables. Entropy
2017, 19, 85. [CrossRef]
9. Rosas, F.E.; Mediano, P.A.M.; Gastpar, M.; Jensen, H.J. Quantifying high-order interdependencies via multivariate extensions of
the mutual information. Phys. Rev. E 2019, 100, 032305. [CrossRef]
10. Williams, P.L.; Beer, R.D. Nonnegative Decomposition of Multivariate Information. arXiv 2010, arXiv:1004.2515.
11. Kunert-Graf, J.; Sakhanenko, N.; Galas, D. Partial Information Decomposition and the Information Delta: A Geometric Unification
Disentangling Non-Pairwise Information. Entropy 2020, 22, 1333. [CrossRef]
12. Scagliarini, T.; Marinazzo, D.; Guo, Y.; Stramaglia, S.; Rosas, F.E. Quantifying high-order interdependencies on individual patterns
via the local O-information: Theory and applications to music analysis. Phys. Rev. Res. 2022, 4, 013184. [CrossRef]
13. Kolchinsky, A. A Novel Approach to the Partial Information Decomposition. Entropy 2022, 24, 403. [CrossRef] [PubMed]
14. Finn, C.; Lizier, J.T. Generalised Measures of Multivariate Information Content. Entropy 2020, 22, 216. [CrossRef] [PubMed]
15. Finn, C.; Lizier, J.T. Pointwise Partial Information Decomposition Using the Specificity and Ambiguity Lattices. Entropy 2018,
20, 297. [CrossRef] [PubMed]
16. Ay, N. Information Geometry on Complexity and Stochastic Interaction. Entropy 2015, 17, 2432–2458. [CrossRef]
17. Niu, X.; Quinn, C.J. A Measure of Synergy, Redundancy, and Unique Information using Information Geometry. In Proceedings of
the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 3127–3131. [CrossRef]
18. Sparacino, L.; Faes, L.; Mijatović, G.; Parla, G.; Lo Re, V.; Miraglia, R.; de Ville de Goyet, J.; Sparacia, G. Statistical Approaches
to Identify Pairwise and High-Order Brain Functional Connectivity Signatures on a Single-Subject Basis. Life 2023, 13, 2075.
[CrossRef]
19. Antonacci, Y.; Minati, L.; Nuzzi, D.; Mijatovic, G.; Pernice, R.; Marinazzo, D.; Stramaglia, S.; Faes, L. Measuring high-order
interactions in rhythmic processes through multivariate spectral information decomposition. IEEE Access 2021, 9, 149486–149505.
[CrossRef]
20. Faes, L.; Mijatovic, G.; Antonacci, Y.; Pernice, R.; Barà, C.; Sparacino, L.; Sammartino, M.; Porta, A.; Marinazzo, D.; Stramaglia, S.
A new framework for the time- and frequency-domain assessment of high-order interactions in networks of random processes.
IEEE Trans. Signal Process. 2022, 70, 5766–5777. [CrossRef]
21. Pirovano, I.; Antonacci, Y.; Mastropietro, A.; Bara, C.; Sparacino, L.; Guanziroli, E.; Molteni, F.; Tettamanti, M.; Faes, L.; Rizzo, G.
Rehabilitation Modulates High-Order Interactions Among Large-Scale Brain Networks in Subacute Stroke. IEEE Trans. Neural
Syst. Rehabil. Eng. 2023, 31, 4549–4560. [CrossRef]
22. Scagliarini, T.; Nuzzi, D.; Antonacci, Y.; Faes, L.; Rosas, F.E.; Marinazzo, D.; Stramaglia, S. Gradients of O-information: Low-order
descriptors of high-order dependencies. arXiv 2022, arXiv:2207.03581. [CrossRef]
23. Beirlant, J.; Dudewicz, E.; Gyor, L.; Meulen, E. Nonparametric Entropy Estimation: An Overview. Int. J. Math. Stat. Sci. 1997, 6,
17–39.
24. Miller, G. Note on the Bias of Information Estimates. In Information Theory in Psychology. Problems and Methods; Quastler, H., Ed.;
Free Press: Glencoe, IN, USA, 1955; 95p. [CrossRef]
25. Zahl, S. Jackknifing An Index of Diversity. Ecology 1977, 58, 907–913. [CrossRef]
26. Grassberger, P. Entropy Estimates from Insufficient Samplings. arXiv 2008, arXiv:physics/0307138.
27. Wolpert, D.H.; Wolf, D.R. Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators
and the Shannon Entropy. arXiv 1994, arXiv:comp-gas/9403001.
28. Panzeri, S.; Senatore, R.; Montemurro, M.A.; Petersen, R.S. Correcting for the Sampling Bias Problem in Spike Train Information
Measures. J. Neurophysiol. 2007, 98, 1064–1072. [CrossRef] [PubMed]
29. Raitakari, O.T.; Juonala, M.; Rönnemaa, T.; Keltikangas-Järvinen, L.; Räsänen, L.; Pietikäinen, M.; Hutri-Kähönen, N.; Taittonen,
L.; Jokinen, E.; Marniemi, J.; et al. Cohort profile: The Cardiovascular Risk in Young Finns Study. Int. J. Epidemiol. 2008, 37,
1220–1226. [CrossRef]
30. Patient Education: High Cholesterol and Lipids (Beyond the Basics). Available online: https://www.uptodate.com/contents/
high-cholesterol-and-lipids-beyond-the-basics/ (accessed on 3 April 2024).
31. Harris, B. Colloquia Mathematica Societatis János Bolyai; North-Holland, János Bolyai Mathematical Society, Elsevier Science
Publishing Company Inc.: Amsterdam, The Netherlands, 1975.
32. Morsch, L.; Farmer, S.; Cunningham, K.; Sharrett, Z.; Shea, K.M. 7.3: Calculating Degree of Unsaturation. In Organic Chemistry;
John Wiley & Sons: Hoboken, NJ, USA, 2015; Chapter 7.
33. Oluk, C.A.; Karaca, O.B. 18-Functional food ingredients and nutraceuticals, milk proteins as nutraceuticals nanoScience and food
industry. In Nutraceuticals; Grumezescu, A.M., Ed.; Nanotechnology in the Agri-Food Industry, Academic Press: Cambridge, MA,
USA, 2016; pp. 715–759.
34. Barà, C.; Pernice, R.; Catania, C.A.; Hilal, M.; Porta, A.; Humeau-Heurtier, A.; Faes, L. Comparison of entropy rate measures for
the evaluation of time series complexity: Simulations and application to heart rate and respiratory variability. Biocybern. Biomed.
Eng. 2024, 44, 380–392. [CrossRef]
35. Soininen, P.; Kangas, A.J.; Würtz, P.; Tukiainen, T.; Tynkkynen, T.; Laatikainen, R.; Järvelin, M.R.; Kähönen, M.; Lehtimäki, T.;
Viikari, J.; et al. High-throughput serum NMR metabonomics for cost-effective holistic studies on systemic metabolism. Analyst
2009, 134, 1781–1785. [CrossRef]
36. Wong, G.; Barlow, C.K.; Weir, J.M.; Jowett, J.B.; Magliano, D.J.; Zimmet, P.; Shaw, J.; Meikle, P.J. Inclusion of plasma lipid species
improves classification of individuals at risk of type 2 diabetes. PLoS ONE 2013, 8, e76577. [CrossRef]
37. Braicu, E.I.; Darb-Esfahani, S.; Schmitt, W.D.; Koistinen, K.M.; Heiskanen, L.; Pöhö, P.; Budczies, J.; Kuhberg, M.; Dietel, M.;
Frezza, C.; et al. High-grade ovarian serous carcinoma patients exhibit profound alterations in lipid metabolism. Oncotarget 2017,
8, 102912. [CrossRef]
38. Ebrahim, S.; Papacosta, O.; Whincup, P.; Wannamethee, G.; Walker, M.; Nicolaides, A.N.; Dhanjil, S.; Griffin, M.; Belcaro, G.;
Rumley, A.; et al. Carotid Plaque, Intima Media Thickness, Cardiovascular Risk Factors, and Prevalent Cardiovascular Disease in
Men and Women. Stroke 1999, 30, 841–850. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.