Transfer entropy is a non-parametric statistic measuring the amount of directed (time-asymmetric) transfer of information between two random processes.[1][2][3] Transfer entropy from a process X to another process Y is the amount of uncertainty reduced in future values of Y by knowing the past values of X given past values of Y. More specifically, if and for denote two random processes and the amount of information is measured using Shannon's entropy, the transfer entropy can be written as:
where H(X) is Shannon's entropy of X. The above definition of transfer entropy has been extended by other types of entropy measures such as Rényi entropy.[3][4]
Transfer entropy reduces to Granger causality for vector auto-regressive processes.[7] Hence, it is advantageous when the model assumption of Granger causality doesn't hold, for example, analysis of non-linear signals.[8][9] However, it usually requires more samples for accurate estimation.[10]
The probabilities in the entropy formula can be estimated using different approaches (binning, nearest neighbors) or, in order to reduce complexity, using a non-uniform embedding.[11]
While it was originally defined for bivariate analysis, transfer entropy has been extended to multivariate forms, either conditioning on other potential source variables[12] or considering transfer from a collection of sources,[13] although these forms require more samples again.
^ abVer Steeg, Greg; Galstyan, Aram (2012). "Information transfer in social media". Proceedings of the 21st international conference on World Wide Web (WWW '12). ACM. pp. 509–518. arXiv:1110.2724. Bibcode:2011arXiv1110.2724V.
^Massey, James (1990). "Causality, Feedback And Directed Information" (ISITA). CiteSeerX10.1.1.36.5688. {{cite journal}}: Cite journal requires |journal= (help)
^Permuter, Haim Henry; Weissman, Tsachy; Goldsmith, Andrea J. (February 2009). "Finite State Channels With Time-Invariant Deterministic Feedback". IEEE Transactions on Information Theory. 55 (2): 644–662. arXiv:cs/0608070. doi:10.1109/TIT.2008.2009849. S2CID13178.
^Kramer, G. (January 2003). "Capacity results for the discrete memoryless network". IEEE Transactions on Information Theory. 49 (1): 4–21. doi:10.1109/TIT.2002.806135.
^Permuter, Haim H.; Kim, Young-Han; Weissman, Tsachy (June 2011). "Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing". IEEE Transactions on Information Theory. 57 (6): 3248–3259. arXiv:0912.4872. doi:10.1109/TIT.2011.2136270. S2CID11722596.