

Showing posts from 2019

Exploring Interpretable LSTM Neural Networks over Multi-Variable Data

This post contains my reading notes on this paper. The paper proposes a new LSTM structure that can be interpreted with the help of mixture attention, which provides both variable importance and temporal importance.

Background

RNNs trained over multi-variable data capture nonlinear correlations from the historical values of the target and exogenous variables to future target values. However, current RNNs fall short on interpretability for multi-variable data because of their opaque hidden states. Existing work aiming to enhance the interpretability of recurrent neural networks rarely touches the internal structure of RNNs to overcome the opacity of hidden states on multi-variable data. This paper aims at a unified framework for accurate forecasting and importance interpretation.

Proposed Model

The model does two things: it first explores the internal structure of the LSTM so that hidden states encode individual variables, and then mixture attention is designed to summarize the variable-wise hidden states, yielding both variable importance and temporal importance.
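To make the idea concrete, here is a minimal NumPy sketch of mixture-style attention over per-variable hidden states: temporal attention summarizes each variable's history, then variable-level attention mixes the per-variable predictions. This is a simplification under my own assumptions, not the paper's exact model; the tensorized LSTM update is omitted, and the scoring parameters (`W_temp`, `W_var`, `W_out`) are illustrative names.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mixture_attention(H, W_temp, W_var, W_out):
    """Toy mixture attention over per-variable hidden states.

    H      : (T, N, d) array -- one d-dim hidden state per time step
             and per input variable (the variable-wise states).
    W_temp : (d,) scoring vector for temporal attention (assumed form).
    W_var  : (d,) scoring vector for variable-level attention.
    W_out  : (d,) per-variable prediction head.
    """
    # Temporal attention: for each variable, weight its own history.
    alpha = softmax(H @ W_temp, axis=0)      # (T, N) temporal importance
    G = (alpha[..., None] * H).sum(axis=0)   # (N, d) per-variable summaries
    # Variable attention: mixture weights over the N variables.
    beta = softmax(G @ W_var)                # (N,) variable importance
    # Forecast as a mixture of per-variable predictions.
    y_hat = beta @ (G @ W_out)               # scalar
    return y_hat, alpha, beta
```

The returned `alpha` and `beta` correspond to the two interpretability signals the post mentions: temporal importance within each variable's history and overall variable importance, respectively.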

Reading notes: Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs

This paper on interpretable AI introduces Contextual Decomposition (CD), which can extract complex interactions from LSTM networks. By decomposing the output of an LSTM, CD captures the contributions of combinations of words or variables to the final prediction. Recall that the LSTM has the form

$$i_t = \sigma(W_i x_t + V_i h_{t-1} + b_i)$$
$$f_t = \sigma(W_f x_t + V_f h_{t-1} + b_f)$$
$$o_t = \sigma(W_o x_t + V_o h_{t-1} + b_o)$$
$$g_t = \tanh(W_g x_t + V_g h_{t-1} + b_g)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot \tanh(c_t)$$

Contextual Decomposition

Given an arbitrary phrase $x_q, \ldots, x_r$, where $1 \le q \le r \le T$, assume the cell and output state vectors $c_t$ and $h_t$ can each be written as a sum of two contributions,

$$h_t = \beta_t + \gamma_t, \qquad c_t = \beta_t^c + \gamma_t^c,$$

where $\beta_t$ corresponds to contributions made solely by the given phrase to $h_t$, and $\gamma_t$ corresponds to contributions involving, at least in part, elements outside of the phrase. Assuming that the linearized version of a non-linear function $\sigma$ can be written as $L_\sigma$, then $\sigma$ can be linearly decomposed in the following way:

$$\sigma\Big(\sum_{i=1}^{N} y_i\Big) = \sum_{i=1}^{N} L_\sigma(y_i).$$

For example, for the input gate $i_t$, recall that

$$i_t = \sigma(W_i x_t + V_i h_{t-1} + b_i).$$

By the linearization and decomposition relations (8) and (9), we have

$$i_t = L_\sigma(W_i x_t) + L_\sigma(V_i \beta_{t-1}) + L_\sigma(V_i \gamma_{t-1}) + L_\sigma(b_i).$$

Note that the first …
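The linearization $L_\sigma$ is the only non-obvious ingredient here. Below is a minimal sketch, under my own assumptions, of the Shapley-style averaging behind it: each term's contribution is its average marginal effect on $\sigma$ over all orderings of the summands. The function names and the toy scalar values are mine, purely for illustration.

```python
import itertools
from math import factorial
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def L_sigma(sigma, terms, k):
    """Shapley-style linearized contribution L_sigma(terms[k]).

    Averages, over all orderings of `terms`, the marginal change in
    sigma caused by adding terms[k] to the running partial sum. The
    per-term contributions telescope to sigma(sum(terms)) - sigma(0).
    """
    n = len(terms)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        pos = perm.index(k)
        before = sum(terms[j] for j in perm[:pos])  # partial sum without terms[k]
        total += sigma(before + terms[k]) - sigma(before)
    return total / factorial(n)

# Decomposing the input gate i_t = sigma(W_i x_t + V_i h_{t-1} + b_i),
# with h_{t-1} = beta_{t-1} + gamma_{t-1} (toy scalars, illustrative only):
terms = [0.4,   # W_i x_t          -- phrase input contribution
         0.3,   # V_i beta_{t-1}   -- phrase recurrent contribution
        -0.2,   # V_i gamma_{t-1}  -- non-phrase recurrent contribution
         0.1]   # b_i              -- bias
contribs = [L_sigma(sigmoid, terms, k) for k in range(len(terms))]
# The contributions recover the gate value up to the sigma(0) offset:
print(sum(contribs) + sigmoid(0), sigmoid(sum(terms)))  # equal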