This post contains my reading notes on the paper [1]. The paper proposes a new LSTM structure that can be interpreted with the help of mixture attention, providing both variable importance and temporal importance.
Background
RNNs trained over multi-variable data capture the nonlinear correlation between the historical values of the target and exogenous variables and the future target values. However, current RNNs fall short of interpretability for multi-variable data due to their opaque hidden states. Existing works aiming to enhance the interpretability of recurrent neural networks rarely touch the internal structure of RNNs to overcome the opacity of hidden states on multi-variable data. This paper aims at a unified framework of accurate forecasting and importance interpretation.

Proposed Model
This model basically does two things:
- first, it explores the internal structure of the LSTM to enable hidden states to encode individual variables;
- then, mixture attention is designed to summarize these variable-wise hidden states for prediction.
The IMV-LSTM Structure
The idea of IMV-LSTM is to make use of a hidden state matrix and to develop an associated update scheme such that each element (e.g., each row) of the hidden matrix encapsulates information exclusively from one particular input variable. The hidden state update function is constructed in a similar manner to that of the regular LSTM.
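As a rough sketch of what this variable-wise update looks like (notation here is mine, not copied from the paper): with $N$ input variables and a per-variable hidden size $d$, the hidden state is a matrix $\tilde{h}_t \in \mathbb{R}^{N \times d}$ whose $n$-th row $h_t^n$ only ever sees variable $n$,

$$
j_t^n = \tanh\big( W_j^n h_{t-1}^n + U_j^n x_t^n + b_j^n \big), \qquad n = 1, \dots, N,
$$

followed by the usual LSTM-style gating that mixes $j_t^n$ into the cell and hidden states. Because each row has its own weights $W_j^n, U_j^n$ and reads only $x_t^n$, information from different variables is never blended within a row.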
The authors proposed two sets of update equations. The difference between the two approaches is that equation set 1 first flattens the matrices into vectors and then restores them back into matrices, whereas in equation set 2 the authors extend the regular LSTM with tensor operations and operate on the matrices directly. The goal of both approaches is the same: keep the variables independent during propagation (see the toy sketch below).
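To make that distinction concrete, here is a toy numpy sketch. It is entirely my own illustration, not code or equations from the paper: the "vectorize, multiply, restore" route uses a block-diagonal weight matrix, the "tensor operation" route applies each variable's own weight block directly, and both keep the rows (variables) independent.

```python
import numpy as np

# Toy dimensions (my choice): N input variables, per-variable hidden size d
N, d = 3, 4
rng = np.random.default_rng(0)

h_prev = rng.standard_normal((N, d))   # hidden-state matrix: one row per variable
x_t = rng.standard_normal(N)           # current input: one scalar per variable

# Per-variable parameters: variable n has its own d x d recurrent block W[n]
W = rng.standard_normal((N, d, d))
U = rng.standard_normal((N, d))
b = np.zeros((N, d))

# In the spirit of "equation set 2": a tensor operation on the matrix directly.
# einsum applies each variable's own block to its own row, so rows never mix.
j_tensor = np.tanh(np.einsum('nij,nj->ni', W, h_prev) + U * x_t[:, None] + b)

# In the spirit of "equation set 1": flatten the hidden matrix to a vector,
# multiply by a block-diagonal matrix (zeros between variables), reshape back.
W_block = np.zeros((N * d, N * d))
for n in range(N):
    W_block[n * d:(n + 1) * d, n * d:(n + 1) * d] = W[n]
j_vector = np.tanh((W_block @ h_prev.reshape(-1)).reshape(N, d) + U * x_t[:, None] + b)

# Both routes produce the same variable-wise update.
assert np.allclose(j_tensor, j_vector)
```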
Mixture Attention
Mixture attention is used to enable the interpretability of the IMV-LSTM model. The mixture attention is formulated as a probabilistic mixture over the variable-wise hidden states; the full formulation and notation are defined in the paper.
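In rough form (my own paraphrase, with notation that may differ from the paper's), the predictive density of the next target value is a mixture over which variable $z_{T+1}$ is responsible for the prediction,

$$
p(y_{T+1} \mid X_T) = \sum_{n=1}^{N} p(y_{T+1} \mid z_{T+1} = n, X_T)\, \Pr(z_{T+1} = n \mid X_T),
$$

where each component is built from variable $n$'s hidden states summarized by temporal attention, and the mixture weights $\Pr(z_{T+1} = n \mid X_T)$ act as variable-level attention.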
The loss function is defined as Eq. (9) in the paper. Lemma 3.3 of the paper ensures that, during the EM procedure, this loss function upper-bounds the negative log-likelihood. Therefore, minimizing Eq. (9) makes it possible to learn the network parameters and the importance vectors simultaneously, without any post-processing of the trained network.
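For intuition (this is the standard EM/Jensen bound, not a line-by-line copy of Eq. (9)): for any distribution $q(z_{T+1})$ over the mixture components,

$$
-\log p(y_{T+1} \mid X_T) \le -\sum_{n=1}^{N} q(z_{T+1}=n)\, \log \frac{p(y_{T+1} \mid z_{T+1}=n, X_T)\, \Pr(z_{T+1}=n \mid X_T)}{q(z_{T+1}=n)},
$$

with equality when $q$ equals the posterior $\Pr(z_{T+1}=n \mid y_{T+1}, X_T)$. Training sums a bound of this form over the data: the E-step sets $q$ to the posterior, and the M-step updates the network parameters.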
Interpretation
After training, a simple closed-form solution for the variable importance vector $I$ (with elements $\mu_n$) can be derived, and the temporal importance vector can be derived in the same way; the exact closed-form expressions are given in the paper.
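My shorthand for the gist (notation is mine; the exact closed forms are in the paper): both importance scores aggregate the learned mixture and attention weights over the training set, roughly

$$
I_n \propto \sum_{\text{samples}} \Pr(z_{T+1} = n \mid y_{T+1}, X_T), \qquad T_t^{\,n} \propto \sum_{\text{samples}} \alpha_t^{\,n},
$$

where $\alpha_t^{\,n}$ is the temporal attention weight of variable $n$ at step $t$. In words, a variable is important if it is frequently the responsible mixture component, and a time step is important for a variable if its temporal attention is consistently large there.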
Prediction
In the predicting phase, the prediction of $y_{T+1}$ is obtained as the weighted sum of the component means.
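In the hedged notation used above, that weighted sum is simply the mean of the mixture,

$$
\hat{y}_{T+1} = \sum_{n=1}^{N} \Pr(z_{T+1} = n \mid X_T)\; \mathbb{E}\big[ y_{T+1} \mid z_{T+1} = n, X_T \big],
$$

i.e., each variable-specific component contributes its predicted mean, weighted by the mixture attention.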
Reference

[1] Guo, Tian, Tao Lin, and Nino Antulov-Fantulin. "Exploring Interpretable LSTM Neural Networks over Multi-Variable Data." International Conference on Machine Learning (ICML), 2019.