NC unit secretary
2020-10-06 15:15:12

This is a discussion channel for "Object-oriented deep learning" by Dr. Matthew Botvinick (DeepMind). A link to the talk is below. Please do not share the link with anyone outside of this Slack workspace. The access passcode can be found in the announcement channel.   URL: https://vimeo.com/471278717 (22 minutes)

✅ Ryota Kanai
Gergo Gomori
2020-10-11 07:16:47

Dear Dr. Matthew Botvinick, Thank you very much for your presentation. Your projects are really inspiring, and I believe they give us a glimpse into the interesting underlying mechanisms of the brain.

Matt Botvinick
2020-10-11 20:38:41

*Thread Reply:* Thanks, Gergo. Glad you found it to be of interest!

Nicos Isaak
2020-10-11 20:44:58

Thank you very much for this great talk!

Matt Botvinick
2020-10-11 21:22:39

*Thread Reply:* Glad you found it interesting, Nicos!

Shin'ya Nishida
2020-10-11 22:06:28

Thank you for your interesting presentation. I wonder how complex your object representation is. Does it contain detailed information about texture and geometry as well? If not, are you thinking of separate representations of object and surface, like D. Marr's 3D model and 2.5-D sketch?

Matt Botvinick
2020-10-11 22:10:34

*Thread Reply:* Thanks for the question. The object encodings we use are those produced by MONet, which generates them with a variational autoencoder. These are good enough to support decent reconstruction, and so do capture shape, color, and texture to some degree. However, some caveats: MONet doesn't work well on natural images (yet), so further technological development is needed -- many are trying (for example with adversarial methods). Second, the vector embedding of objects here is tied to their 2D appearance, rather than explicitly encoding 3D geometry. However, effects such as occlusion can nonetheless be dealt with in our setup through relational coding over time. This is evident in our 'intuitive physics' results, where the system predicts the re-emergence of transiently occluded moving objects, even though the per-frame object encodings are purely 2D.
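The MONet-style pipeline described above can be sketched roughly as follows. This is a toy illustration only, not DeepMind's implementation: the attention network and the VAE encoder are replaced by hand-written stand-in functions, and the "latent vector" is just crude pooled statistics rather than a learned code. What it does preserve is the structure: each pixel is softly assigned to one of K object slots (masks sum to one per pixel), and each masked region is mapped by a shared encoder to a per-object embedding.

```python
import math
from typing import List

Image = List[float]  # a flattened grayscale image; a stand-in for real input

def attend(image: Image, num_slots: int) -> List[List[float]]:
    """Stand-in for MONet's recurrent attention network: produce one soft
    mask per slot. The masks are a per-pixel softmax over slots, so they
    sum to 1 at every pixel (each pixel is fully 'explained' by the slots)."""
    # Toy scoring: slot k prefers pixels whose intensity is near k/(K-1).
    scores = []
    for k in range(num_slots):
        centre = k / max(num_slots - 1, 1)
        scores.append([-(p - centre) ** 2 for p in image])
    # Softmax across slots, independently at each pixel.
    masks = [[0.0] * len(image) for _ in range(num_slots)]
    for i in range(len(image)):
        col = [scores[k][i] for k in range(num_slots)]
        m = max(col)
        exps = [math.exp(c - m) for c in col]
        s = sum(exps)
        for k in range(num_slots):
            masks[k][i] = exps[k] / s
    return masks

def encode_slot(image: Image, mask: List[float]) -> List[float]:
    """Stand-in for the shared VAE encoder: map one masked region to a small
    per-object vector (here just pooled statistics, not a learned latent)."""
    masked = [p * m for p, m in zip(image, mask)]
    total = sum(mask) or 1.0
    return [sum(masked) / total,      # mean intensity of the region
            max(masked),              # brightest masked pixel
            min(masked),              # dimmest masked pixel
            total / len(image)]       # fraction of the image this slot owns

def decompose(image: Image, num_slots: int = 3) -> List[List[float]]:
    """Scene -> per-object embeddings: one latent vector per slot."""
    return [encode_slot(image, m) for m in attend(image, num_slots)]

scene = [0.1, 0.1, 0.5, 0.5, 0.9, 0.9]   # toy "scene" with three intensity bands
latents = decompose(scene, num_slots=3)
```

Downstream components (e.g. a relational model over time, as in the intuitive-physics results mentioned above) would then operate on these per-slot vectors rather than on raw pixels.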

Matt Botvinick
2020-10-11 22:12:17

*Thread Reply:* I wouldn't want the specific approach we take to object embedding to be taken too literally -- better methods will certainly emerge. For example, our current approach determines object segmentations in a purely task-independent way. It seems clear that the nature of the task should influence the way that images are segmented (for example, should an apple on a tree be included in the entire tree-object or singled out as an independent object? It depends on whether you're climbing the tree or trying to pick the apple).

Shin'ya Nishida
2020-10-11 22:41:56

*Thread Reply:* Thanks. Whether the human brain has task-dependent object/surface representations or a more general-purpose one is a very interesting question.

Sugandha Sharma
2020-10-11 23:35:51

Matt -- great points in the discussion period. What is the paper you referred to, in which you showed that using a network with structural hierarchy already in it to learn a task leads to a functionally hierarchical solution to the task?

Matt Botvinick
2020-10-11 23:40:11

Thanks, Sugandha. This is the paper: Botvinick, M. M. (2007). Multilevel structure in behaviour and in the brain: a model of Fuster's hierarchy. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1485), 1615-1626.

Emtiyaz Khan
2020-10-12 00:04:52

👏 for your comments on the meta-learning approaches during the panel discussion. It resonates with me! (also of course, thanks for your interesting talk).

Matt Botvinick
2020-10-12 00:07:20

*Thread Reply:* Thanks for joining for the discussion, Emtiyaz. What a great group to chat with!

Emtiyaz Khan
2020-10-12 00:08:31

*Thread Reply:* Completely agree. It’s midnight for me and I am wide awake 😅

Raunak Basu
2020-10-12 01:02:11

Hi Matt, your analogy about drunk driving as an example of the kind of declarative representation that AI models lack was amazing. However, I would think that, given the current state of AI, such a system may indeed be possible. For example, let us break down that exact decision-making process: 1) getting drunk causes motor impairments, 2) driving requires motor skills, 3) accidents are bad for society. Now, a deep RL system trained on a completely different motor task (like playing tennis) could have an extra "alcohol" input that 'blurs' action selection. Coupling a model like that with a model trained to self-drive a car can in principle yield the inference that being drunk will impair driving. Then comes the question of how harmful such a decision can be. For that, you can have simple models trained on outcome valence in different situations, or more realistic models along the lines of what Josh talked about -- models with some understanding of physics. I guess the fun and challenging part would be integrating these distributed modules, which in the end would give the illusion of a conscious and unified declarative representation.

Matt Botvinick
2020-10-12 18:31:27

*Thread Reply:* Hey Raunak -- Yes, I agree, the foundations are there, if we can figure out how to combine them properly. I guess what I had in mind with the drunk driving example was a scenario where a system can not only make the right decisions, but do so based on 'explicit' knowledge (i.e., "One's reaction times go up when one is drunk, and therefore driving can be dangerous.") Perhaps recent progress with language modeling will be useful, since a lot of "explicit" knowledge seems to be verbalizable. But overall I agree that what we have now, in terms of AI building blocks, shouldn't be too far from sufficient!