NC unit secretary
2020-10-09 23:15:44

This is a discussion channel for “How Do Neural Systems Learn to Infer?” by Prof. Maneesh Sahani (Gatsby Computational Neuroscience Unit).  The link to the talk is below. Please do not share it with anyone outside of this Slack workspace. The access passcode can be found in the announcement channel.   URL:  https://vimeo.com/471277483 (37 minutes)

👀 Ryota Kanai
David Rotermund
2020-10-10 03:22:01

Wow, your talk mirrors the experience we had with our "Spike-by-Spike" network architecture. But whereas you followed a theoretical deduction, our network forced us to do it that way. :-)

We came from a totally different direction (a spiking inference population based on a non-negative generative model and optimality criteria, https://doi.org/10.1162/neco.2007.19.5.1313), which gave us something we call "inference populations" (IPs). We then tried to build larger networks out of these IPs (with communication allowed only via spikes), but we failed for a long time.

Only after we allowed it to be a pool of independent IPs, as you described on the slide "Changing cartoons", did the whole deep network finally start to work. Now we have a deep neuronal network that works feed-forward (https://doi.org/10.3389/fncom.2019.00055) as well as bi-directionally (https://doi.org/10.1101/613471). Since we wrote these papers we have improved our learning rules in many ways (both local learning and backprop learning work), but every time we tried to touch the pool of independent IPs, the network pushed back hard.

A neat byproduct of this pool of independent inference populations is that it allows you to build strongly parallelized hardware solutions (https://doi.org/10.1101/500280). This parallelization property should apply to your approach too.

Nice 🙂 Thanks for your talk!

Maneesh Sahani
2020-10-11 02:37:43

*Thread Reply:* Thanks for these pointers, David. This sounds very interesting and we'll definitely take a look.

Emtiyaz Khan
2020-10-10 03:57:16

@Maneesh Sahani Great talk! I particularly enjoyed the second part. Could you provide a reference for that part?

The “implicit rep and recurrent inference” scheme, I highly suspect, is in fact a mirror descent algorithm. It appears to be closely related to what I have been calling the “Bayesian learning rule” (or conjugate-computation VI); see Sec. 2.2 here. So it might be very similar to variational-message-passing-type schemes.

Happy to hear of your thoughts. Thanks!
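
A rough sketch of the mirror-descent reading above (an illustration only, not taken from the talk or from any of the papers mentioned; it assumes a Bernoulli family, so the mirror map \Psi is the negative entropy and natural parameters map back to means through a sigmoid):

```python
# Hypothetical illustration: mirror descent with an expfam mirror map Psi.
# The update is additive in the natural (dual) parameter eta, and the mean
# parameter mu is recovered through mu = grad Phi(eta) = sigmoid(eta).
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

target = 0.9
loss_grad = lambda mu: mu - target   # gradient of L(mu) = 0.5 * (mu - target)**2

eta = 0.0                            # natural (dual) parameter
rho = 1.0                            # step size
for _ in range(200):
    mu = sigmoid(eta)                # current mean parameter
    eta -= rho * loss_grad(mu)       # additive update in dual coordinates

print(sigmoid(eta))                  # approaches 0.9 and stays inside (0, 1)
```

The toy example only shows the structure in question: the update is additive in natural-parameter space, while the loss gradient is taken with respect to the mean parameter.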

Maneesh Sahani
2020-10-11 02:56:22

*Thread Reply:* Hi Emti! No references yet I'm afraid (as I said in the talk, I wanted to throw out some current ideas for discussion).

On the recurrence: I think it's actually much simpler than that. The expression I wrote for expfam mean inference is really just a dynamical restatement of \nabla\Psi(\mu) = \eta, with the dynamics guaranteed to converge by convexity of \Psi. Alternatively one can see it as a direct gradient implementation of the convex duality definition of \Psi: e.g. page 4 here. The key point for the talk is that this view allows the network recurrence to define the exponential family form.

On message passing: the scheme can correspond to variational MP, BP/EP or intermediates based on other \alpha-divergences depending on the form of F on the penultimate slide (that links means in one network to natural params in another). Variational MP corresponds to a linear F.
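
A minimal numerical sketch of this fixed-point view (an illustration only, not code from the talk; it assumes a Bernoulli family, where \Psi is the negative entropy and the dynamics should settle at \mu = \sigma(\eta)):

```python
# Hypothetical illustration: gradient dynamics whose fixed point satisfies
# grad_Psi(mu) = eta. Psi is the Bernoulli negative entropy, the convex
# conjugate of the log-partition Phi(eta) = log(1 + exp(eta)), so the
# dynamics should converge to mu = sigmoid(eta).
import numpy as np

def grad_Psi(mu):
    # derivative of Psi(mu) = mu*log(mu) + (1 - mu)*log(1 - mu): the logit
    return np.log(mu / (1.0 - mu))

eta = 1.5                # natural parameter (the "input drive")
mu = 0.5                 # initial mean-parameter state
dt = 0.05                # Euler step size
for _ in range(2000):
    # gradient ascent on <eta, mu> - Psi(mu); converges by convexity of Psi
    mu += dt * (eta - grad_Psi(mu))
    mu = float(np.clip(mu, 1e-6, 1.0 - 1e-6))

print(mu, 1.0 / (1.0 + np.exp(-eta)))   # the two numbers should agree
```

With \eta = 1.5 both printed values come out at about 0.818, i.e. \sigma(1.5).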

Emtiyaz Khan
2020-10-11 12:25:54

*Thread Reply:* Thanks for the answer! I understand it and agree with your view.

> The key point for the talk is that this view allows the network recurrence to define the exponential family form.
This is exciting! I have realized that there exists a duality between such “local” updates and the “global” implications (“defining the expfamily” in your case). Looking forward to your paper.

Pau
2020-10-10 05:50:10

Great talk, and very clear. However, I have a couple of technical questions:
• Belief propagation typically works on trees or graphs that are locally tree-like. However, cortical connections are often dominated by local interactions, and the connectivity typically decays (fast) with distance. So the question is: how can we reconcile a locally dominated connectome with belief propagation?
• How intricate are the computations in each cortical column? I mean when an inference might depend on multiple non-linear variables. I am thinking of cases such as an XOR type of computation, where information from other sources cannot simply be added. Do we need to put some complicated function under the hood of \Psi?

Maneesh Sahani
2020-10-11 03:05:35

*Thread Reply:* Thanks for the comment and questions.

On belief propagation: the idea is that the dense local recurrence implements the exponential family inference for a single variable (a node of the graph), while sparser long-range connections reflect interactions between variables (edges). This assumes a strong compartmentalisation of recurrence in a "column", which is probably not true, so there's work to be done to extend this to a "neural field" sort of structure -- but I think the same basic ideas will carry through.

On your second point: indeed, the idea is that \Psi (or, more generally, the recurrence function) can be quite complex, which would presumably be why neural circuits have such rich architectures. However, the additive form on the inputs emerges from the factorisation of the distribution associated with the graph, combined with the exponential family form. Factorisation means that messages from neighbours multiply (as in BP), and the exponential form translates that product into a sum.
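
A small sketch of the last point (an illustration only, assuming 1-D Gaussian messages): multiplying exponential-family messages about the same variable is the same as summing their natural parameters, which is what turns BP's product into an additive input.

```python
# Hypothetical illustration: the product of Gaussian messages corresponds to
# a sum of natural parameters (precision-weighted mean and precision).
import numpy as np

def to_natural(mean, var):
    return mean / var, 1.0 / var        # (precision-weighted mean, precision)

messages = [(1.0, 2.0), (-0.5, 0.5)]    # incoming messages as (mean, variance)

# combine the messages additively in natural-parameter space
h_tot = sum(to_natural(m, v)[0] for m, v in messages)
J_tot = sum(to_natural(m, v)[1] for m, v in messages)
print(h_tot / J_tot, 1.0 / J_tot)       # combined mean and variance

# sanity check: literally multiply the densities on a grid
x = np.linspace(-10.0, 10.0, 20001)
w = np.ones_like(x)
for m, v in messages:
    w *= np.exp(-0.5 * (x - m) ** 2 / v)
w /= w.sum()                            # normalise the grid weights
mean_grid = (x * w).sum()
var_grid = ((x - mean_grid) ** 2 * w).sum()
print(mean_grid, var_grid)              # should match the values above
```

Both print statements should give a mean of -0.2 and a variance of 0.4.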

Hiroaki Gomi
2020-10-11 10:26:45

Hi Maneesh. Thank you for your stimulating talk. It’s unfortunate we cannot invite you to Tokyo.  In your last slide you suggested multiple learning timescales in the recurrent connections. Could you explain a bit more why the short-term priors realized by the local recurrent connections need to vary independently? And are the timescales of adaptation in the local and global connections inherently predefined, or determined during learning?

Maneesh Sahani
2020-10-11 21:52:34

*Thread Reply:* Hi Hiroaki. Indeed, I do hope we are able to get together somewhere in the world before too long. To expand on the issue you raise: the individual node distributions in the representation we're suggesting are multivariate, and can be quite complex. So, in principle, the entire latent structure could collapse into a single node. Two structural components work against this collapse: the standard one is conditional independence in the graph; an additional one is provided by variable timescales of adaptation. The decomposition into separate variables can be driven if the local (in time) marginal distributions of two latent variables change over time while the joint factor remains constant; but for this signal to work, the marginals need to change in different ways. I said "independent", but we may not need full statistical independence -- just enough difference to be identifiable. I hope that helps ...

👍 Hiroaki Gomi
Jian Liu
2020-10-14 21:52:30

Nice talk. Maneesh can always make things easy to understand. The questions listed at the end are very insightful.