The model proposed by Heinrich et al. builds upon the one by Hinoshita et al. It adds visual input and thus shows that language learning may be grounded not only in the perception of verbal utterances but also in visual perception.⇒
Hinoshita et al. propose a model of natural language acquisition based on a multiple-timescale recurrent artificial neural network (MTRNN).⇒
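The core MTRNN idea can be illustrated with a minimal sketch of leaky-integrator (CTRNN-style) units whose time constants differ by group. Several details are illustrative assumptions and are not taken from Hinoshita et al.: the group names (io, fast, slow), the time constants 2, 5 and 70, the layer sizes, the tanh activation, the random weights, the missing biases and the absence of any training. The connectivity, in which io and slow-context units are not directly connected, is also assumed. The sketch only shows the principle that units with a larger time constant change more slowly.

```python
import math
import random


def build_mtrnn(n_io=4, n_fast=6, n_slow=3, taus=(2.0, 5.0, 70.0), seed=0):
    """Random weights for three assumed unit groups: io, fast and slow context.

    Assumption: io and slow units are not connected directly, so slow
    dynamics reach the io layer only through the fast-context units.
    """
    rng = random.Random(seed)
    groups = ["io"] * n_io + ["fast"] * n_fast + ["slow"] * n_slow
    tau_of = {"io": taus[0], "fast": taus[1], "slow": taus[2]}
    n = len(groups)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if {groups[i], groups[j]} != {"io", "slow"}:
                weights[i][j] = rng.uniform(-0.5, 0.5)
    tau = [tau_of[g] for g in groups]
    return groups, weights, tau


def step(u, weights, tau, external):
    """Leaky-integrator update (no biases):
    u_i <- (1 - 1/tau_i) * u_i + (1/tau_i) * (sum_j w_ij * tanh(u_j) + external_i)."""
    y = [math.tanh(v) for v in u]
    n = len(u)
    return [
        (1.0 - 1.0 / tau[i]) * u[i]
        + (1.0 / tau[i]) * (sum(weights[i][j] * y[j] for j in range(n)) + external[i])
        for i in range(n)
    ]


def mean_change(history, groups, name):
    """Mean absolute per-step change of the internal states of one group."""
    idx = [i for i, g in enumerate(groups) if g == name]
    diffs = [abs(b[i] - a[i]) for a, b in zip(history, history[1:]) for i in idx]
    return sum(diffs) / len(diffs)


groups, weights, tau = build_mtrnn()
u = [0.0] * len(groups)
history = [u]
for t in range(50):
    # Hypothetical input: a pulse on the first io unit during the first 5 steps.
    external = [1.0 if (i == 0 and t < 5) else 0.0 for i in range(len(groups))]
    u = step(u, weights, tau, external)
    history.append(u)

for name in ("io", "fast", "slow"):
    print(name, round(mean_change(history, groups, name), 4))
```

Because each slow unit moves only 1/70 of the way towards its net input per step, its state can change by at most a small fraction per step. In a trained network, the notes' framing would have such units carry slowly varying, sentence-level context while the fast units track shorter-range structure.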
Lawrence et al. train different kinds of recurrent neural networks to classify sentences as grammatical or ungrammatical.⇒
Lawrence et al. manage to train ANNs to learn grammar-like structure without the networks having any built-in representation of grammar. They argue that this shows that Chomsky's assumption that humans must have innate linguistic capabilities is unnecessary.⇒
Hinoshita et al. argue that by watching language learning in RNNs, we can learn about how the human brain might self-organize to learn language.⇒
Wilson and Golonka argue that representations and computational processes can be replaced by world-body-brain dynamics even in neurolinguistics.⇒
Since speech happens in a brain which is part of a body in a physical world, it is without doubt possible to describe it in terms of world-body-brain dynamics.
The question is whether that is a good way of describing it. Such a description may be very complex and difficult to handle, which might run counter to what scientific explanation is supposed to achieve.⇒
Words from different categories activate different networks of brain regions.⇒
The fact that the brain regions activated by certain words (when heard, read, etc.) correspond to the categories the words belong to (e.g., action words activate motor areas) suggests that word semantics is grounded in perception and action.⇒
Words from some categories do not activate brain regions which are related to their meaning.
The semantics of those words do not seem to be grounded in perception or action.
Pulvermüller calls such categories and their neural representations disembodied.
Disembodied words seem to activate brain areas related to emotional processing.
These words may be grounded in emotion.⇒
It seems that word representations can be more or less modal, i.e., words may be more or less abstract and thus more or less grounded in sensory, motor, or emotional areas.⇒
Fixating a point in space enhances spoken language understanding if the words come from that point. The effect is strongest when the fixated visual stream shows lip movements consistent with the utterances, but it also occurs if the visual display is random. The effect is further enhanced if fixation is combined with a sufficiently complex visual task.
Fixating a point in space can impede language understanding if the utterances do not emanate from the focus of visual attention and auditory distractors do.⇒
De Kamps and van der Velde introduce a neural blackboard architecture for representing sentence structure.⇒
De Kamps and van der Velde use their blackboard architecture for two very different tasks: representing sentence structure and object attention.⇒
Using multiple layers, each of which learns with a trace rule at a successively larger time scale, is similar to the CTRNNs Stefan Heinrich uses to learn the structure of language. Could there be a combined model of sentence-structure learning and language processing on the one hand, and object-based visual or multimodal attention on the other?⇒
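The layered trace-rule idea can be sketched with a Hebbian trace rule in one common form, Δw_j = α·ȳ·x_j, where ȳ is an exponentially decaying average of the unit's activity. The notes do not say which variant of the trace rule the referenced model uses, so this form, the per-layer η values, the weight normalisation and the toy inputs are all assumptions. The sketch only shows how choosing a larger η for higher layers gives each layer a successively longer time scale, which is the parallel to the differing time constants of CTRNN units.

```python
import math


def update_trace(trace, y, eta):
    """Exponential activity trace: trace(t) = (1 - eta) * y(t) + eta * trace(t - 1).

    A larger eta gives a longer memory, roughly 1 / (1 - eta) time steps.
    """
    return (1.0 - eta) * y + eta * trace


def trace_rule_step(w, x, y, trace, eta, alpha):
    """One trace-rule update for a single unit: dw_j = alpha * trace * x_j,
    followed by normalising w to unit length to keep the weights bounded."""
    trace = update_trace(trace, y, eta)
    w = [wj + alpha * trace * xj for wj, xj in zip(w, x)]
    norm = math.sqrt(sum(wj * wj for wj in w)) or 1.0
    return [wj / norm for wj in w], trace


def impulse_response(eta, steps):
    """Trace values after a single unit of activity at t = 0."""
    trace, out = 0.0, []
    for t in range(steps):
        trace = update_trace(trace, 1.0 if t == 0 else 0.0, eta)
        out.append(trace)
    return out


# Hypothetical layer stack: higher layers use larger eta, i.e. longer time scales.
layer_etas = [0.5, 0.8, 0.95]
for eta in layer_etas:
    print(eta, [round(v, 3) for v in impulse_response(eta, 5)])

# One unit seeing two hypothetical successive inputs (e.g. two views of an object).
w, trace = [0.5, 0.5, 0.5, 0.5], 0.0
for x in ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]):
    y = sum(wj * xj for wj, xj in zip(w, x))
    w, trace = trace_rule_step(w, x, y, trace, eta=0.8, alpha=0.1)
print([round(wj, 3) for wj in w])
```

Because the trace carries activity over from the first input, the second input is reinforced partly by what the unit did one step earlier. This is how a trace rule can associate inputs that follow each other in time, and a larger η stretches that association window.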
Feldman dismisses de Kamps' and van der Velde's approaches to neural variable binding, arguing that they do not work for the general case "where new entities and relations can be dynamically added".⇒