
Visual short-term memory

From Wikipedia, the free encyclopedia

In the study of vision, visual short-term memory (VSTM) is one of three broad memory systems, the others being iconic memory and long-term memory. VSTM is a type of short-term memory, but one limited to information within the visual domain.

The term VSTM refers in a theory-neutral manner to the non-permanent storage of visual information over an extended period of time.[1] The visuospatial sketchpad is a VSTM subcomponent within the theoretical model of working memory proposed by Alan Baddeley, in which working memory is argued to aid mental tasks such as planning and comparison.[2][3] Whereas iconic memories are fragile, decay rapidly, and are unable to be actively maintained, visual short-term memories are robust to subsequent stimuli and last over many seconds. VSTM is distinguished from long-term memory, on the other hand, primarily by its very limited capacity.[4][5]

YouTube Encyclopedic

  • Information processing model: Sensory, working, and long term memory | MCAT | Khan Academy
  • How to Memorize Fast and Easily
  • Memory Part 2 Short Term/Working Memory

Transcription

Take a second to think about everything you've done today. You've taken in way more information than you could possibly remember in detail-- things you've seen, heard, smelled, touched, and tasted. But somehow, some information gets stored in a way that lets you access it later. So what makes this process work? Our brains are really complicated, so scientists have come up with models to represent how our brain takes in and makes sense of information in our environment. One of the most influential models is the information-processing model, which proposes that our brains are similar to computers-- we get input from the environment, process it, and output decisions. It's important to note that this model doesn't really describe where things happen in the brain. It's more conceptual. The first stage, then, is getting the input, which occurs in sensory memory. This is sometimes also called the sensory register, so if you hear that term, just know it's the same thing as sensory memory. And this is where you first interact with the information in your environment. It's a temporary register of all the information your senses are taking in. Even though you have five senses, the two most studied in terms of memory are sight and sound. So within sensory memory, you have iconic memory, which is memory for what you see, and echoic memory, which is memory for what you hear. One of the really interesting things about sensory memory is that it lasts a different amount of time depending on the modality of the information coming in. So visual information is incredibly vivid, but it only lasts for less than half a second. Auditory information, on the other hand, lasts a little bit longer. It lasts for about three or four seconds. So if you've ever tuned out of a conversation and your friend gets mad that you're not listening to them, you can thank echoic memory for helping you remember the last thing they actually said.
So we have a ton of information coming into our sensory memory, but we can't possibly process all of it. We decide what to pay attention to, and that gets passed along into working memory to be processed. Working memory is just whatever you're thinking about right at this moment. And it's also called short-term memory, but we're going to stick with working memory because that's what psychologists call it. Working memory capacity works a little bit differently. It's not defined by time so much as quantity. Just remember the magic number seven. Your working memory can hold about seven plus or minus two pieces of information at a time, so about five to nine. This does vary a little bit based on how complicated those pieces of information are, how old you are, that kind of thing. But generally, it's right around seven. And an interesting fact is that this is actually why phone numbers started out as seven digits long. It was determined that that's as many pieces of information as a person could hold in mind without getting numbers confused or mixing them up. And just like sensory memory has different components for different types of input, working memory has different components to process those distinct types of input. Visual and spatial information, like pictures and maps, is processed in the aptly-named visuo-spatial sketchpad, while verbal information, meaning words and numbers, is processed in the phonological loop. Again, think of repeating a phone number to yourself just long enough to type it in. That's using your phonological loop. Be careful here, though. "Verbal information" means any words and numbers, so words and numbers you heard that came from the echoic memory, and words and numbers you saw that came from iconic memory. So we've got a little bit of mix-and-match here. Now, you might be thinking that sometimes you need to process input that has verbal and visual information together, such as a map with street names and landmarks.
In that case, you need someone to coordinate the efforts of the visuo-spatial sketchpad and the phonological loop. So something called the central executive fills that role. You can think of him kind of like a traffic cop who directs the other components of working memory. Once the central executive tells the visuo-spatial sketchpad and the phonological loop to coordinate, then they create an integrated representation that gets stored in the episodic buffer, which acts as a connector to long-term memory. Long-term memory is the final stage in the information processing model. When stuff gets in here, it's like hitting the Save button on your computer. Unfortunately, our memories aren't quite as foolproof as that. It doesn't work perfectly. But we can store a lot of information in long-term memory. Once again, there are different components that specialize in different types of memories. We have two main categories-- explicit, also called declarative, and implicit, also called non-declarative. As you can see, psychologists like to give these things multiple names, but fortunately, they can generally be broken down into something that makes sense, so don't get intimidated. Explicit memories, for example, are facts or events that you can clearly or explicitly describe. So any time you take a vocabulary test or remember the state capitals, you're using a specific type of explicit memory called semantic memory. And "semantic" just means "having to do with words," so you can think about it as being able to remember simple facts like the meaning of words. A second type of explicit memory is called episodic memory, which is memory for events, like your last birthday party. Just like a TV episode is a sequence of events, your episodic memory stores event-related memories. While explicit memories are easy to define, implicit memories are a little bit fuzzier. They involve things you may not be able to articulate, such as how to ride a bicycle. 
You probably can't say clearly how much pressure to put on the pedals or exactly how to turn the handlebars. But provided that you learned in the first place, if you get on a bike and just do it, you probably won't fall over. Memories for procedures like riding a bike are conveniently called "procedural memories." The last type of implicit memory is called priming, which means that previous experience influences your current interpretation of an event. For example, if I say the word "hair," what do you think of? If you paid attention at the beginning of this video, then you might have thought of "hair" as "H-A-R-E," meaning "rabbit," because you were primed with the bunny picture at the beginning. Your recent experience of seeing a bunny stayed in your memory and influenced your interpretation of the word that I said. If you weren't paying attention, or if you've maybe had to push your hair out of your face in the last few minutes, then you might have thought of "hair" as "H-A-I-R," because it's generally a more common word. With all these components of memory, you might be wondering how much it can actually hold. I think we've all had the feeling that we can't possibly take in any more information, and while it might be true that you can't process any more information at the moment, unlike the computer in front of you, as far as we know, long-term memory capacity is unlimited. So your brain never actually gets too full for more information.

Overview

The introduction of stimuli which were hard to verbalize, and unlikely to be held in long-term memory, revolutionized the study of VSTM in the early 1970s.[6][7][8] The basic experimental technique used required observers to indicate whether two matrices,[7][8] or figures,[6] separated by a short temporal interval, were the same. The finding that observers were able to report that a change had occurred, at levels significantly above chance, indicated that they were able to encode aspects of the first stimulus in a purely visual store, at least for the period until the presentation of the second stimulus. However, as the stimuli used were complex, and the nature of the change relatively uncontrolled, these experiments left open various questions, such as:

  1. whether only a subset of the perceptual dimensions comprising a visual stimulus are stored (e.g., spatial frequency, luminance, or contrast)
  2. whether some perceptual dimensions are maintained in VSTM with greater fidelity than others
  3. the nature by which these dimensions are encoded (i.e., are perceptual dimensions encoded within separate, parallel channels, or are all perceptual dimensions stored as a single bound entity within VSTM?).

Set-size effects

Much effort has been dedicated to investigating the capacity limits of VSTM. In a typical change-detection task, observers are presented with two arrays, composed of a number of stimuli. The two arrays are separated by a short temporal interval, and the task of observers is to decide if the first and second arrays are identical, or whether one item differs across the two displays.[a] Performance is critically dependent on the number of items in the array. While performance is generally almost perfect for arrays of one or two items, correct responses invariably decline in a monotonic fashion as more items are added. Different theoretical models have been put forward to explain limits on VSTM storage, and distinguishing between them remains an active area of research.
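
A standard way to turn change-detection performance into a capacity estimate is Cowan's K, a widely used estimator in this literature (the formula itself is not spelled out in the text above):

```python
def cowan_k(n_items: int, hit_rate: float, false_alarm_rate: float) -> float:
    """Estimate the number of items held in VSTM from a single-probe
    change-detection task: K = N * (hits - false alarms)."""
    return n_items * (hit_rate - false_alarm_rate)

# Example: 8-item arrays, 70% hit rate, 20% false-alarm rate:
print(round(cowan_k(8, 0.70, 0.20), 2))  # → 4.0, within the typical 3-5 item range
```

Correcting hits by the false-alarm rate matters because observers can report "change" correctly by guessing; the subtraction isolates trials on which the probed item was genuinely in memory.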

Models of capacity limits

Slot models

A prominent class of model proposes that observers are limited by the total number of items which can be encoded, because the capacity of VSTM itself is limited.[b] This type of model has obvious similarities to urn models used in probability theory.[c] In essence, an urn model assumes that VSTM is restricted in storage capacity to only a few items, k (often estimated to lie in the range of three to five in adults, though fewer in children[9]). The probability that a suprathreshold change will be detected is simply the probability that the change element is encoded in VSTM (i.e., k/N). This capacity limit has been linked to the posterior parietal cortex, the activity of which initially increases with the number of stimuli in the arrays, but saturates at higher set-sizes.[10] Although urn models are commonly used to describe performance limitations in VSTM,[d] it is only recently that the actual structure of items stored has been considered. Luck and colleagues have reported a series of experiments designed specifically to elucidate the structure of information held in VSTM.[11] This work provides evidence that items stored in VSTM are coherent objects, and not the more elementary features of which those objects are composed.
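
The k/N prediction of an urn model can be illustrated with a short simulation (a sketch only; the choice k = 4 and the trial scheme are illustrative, not taken from a specific study):

```python
import random

def detection_rate(n_items: int, k: int, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of trials on which a change is detected under an urn model:
    the observer stores min(k, N) randomly chosen items and detects the change
    only if the changed item is among them (predicted rate: min(1, k/N))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        stored = rng.sample(range(n_items), min(k, n_items))
        if rng.randrange(n_items) in stored:
            hits += 1
    return hits / trials

for n in (2, 4, 8, 12):
    print(n, round(detection_rate(n, k=4), 2))  # ≈ 1.0, 1.0, 0.5, 0.33
```

Performance is perfect while N ≤ k and then falls as k/N, reproducing the monotonic decline with set size described above.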

Noise models

An alternative framework has more recently been put forward by Wilken and Ma, who suggest that apparent capacity limitations in VSTM are caused by a monotonic decline in the quality of the internal representations stored (i.e., a monotonic increase in noise) as a function of set-size. In this conception, capacity limitations in memory are not caused by a limit on the number of things that can be encoded, but by a decline in the quality of the representation of each thing as more things are added to memory. In their 2004 experiments, they varied color, spatial frequency, and orientation of objects stored in VSTM using a signal detection theory approach.[e] The participants were asked to report differences between the visual stimuli presented to them in consecutive order. The investigators found that different stimuli were encoded independently and in parallel, and that the major factor limiting report performance was neuronal noise (which is a function of visual set-size).[12]

Under this framework, the key limiting factor on working memory performance is the precision with which visual information can be stored, not the number of items that can be remembered.[12] Further evidence for this theory was obtained by Bays and Husain using a discrimination task. They showed that, unlike a "slot" model of VSTM, a signal-detection model could account both for discrimination performance in their study and previous results from change detection tasks.[f] These authors proposed that VSTM is a flexible resource, shared out between elements of a visual scene—items that receive more resource are stored with greater precision. In support of this, they showed that increasing the salience of one item in a memory array led to that item being recalled with increased resolution, but at the cost of reducing resolution of storage for the other items in the display.[13]
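
A minimal version of such a noise-limited account can be sketched as follows (the linear growth of noise with set size and the symmetric decision rule are simplifying assumptions for illustration, not the fitted models of Wilken and Ma or Bays and Husain):

```python
import math

def p_correct(delta: float, noise_per_item: float, n_items: int) -> float:
    """Probability of correctly judging whether a probed item changed by
    `delta`, when encoding precision is a shared resource: each item's
    memory is Gaussian with standard deviation growing with set size."""
    sigma = noise_per_item * n_items
    # P(the noisy memory falls on the correct side of the midpoint criterion)
    return 0.5 * (1.0 + math.erf(delta / (2.0 * sigma * math.sqrt(2.0))))

for n in (1, 2, 4, 8):
    print(n, round(p_correct(delta=2.0, noise_per_item=0.5, n_items=n), 2))
    # accuracy declines smoothly: 0.98, 0.84, 0.69, 0.6 -- no hard item limit
```

Unlike the urn model, every item is always encoded here; accuracy degrades gracefully because each item's representation gets noisier as resources are spread more thinly.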

Psychophysical models

Psychophysical experiments suggest that information is encoded in VSTM across multiple parallel channels, each channel associated with a particular perceptual attribute.[14] Within this framework, a decrease in an observer's ability to detect a change with increasing set-size can be attributed to two different processes:

  1. if decisions are made across different channels, decreases in performance are typically small, and consistent with decreases expected when making multiple independent decisions[15][16]
  2. if multiple decisions are made within the same channel, the decrease in performance is much greater than expected on the basis of increased decision-noise alone, and is attributed to interference caused by multiple decisions within the same perceptual channel.[17]
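
The first case has a simple quantitative form: if each of m decisions is made in a separate channel with per-decision accuracy p, and the decisions are independent, overall accuracy is p^m, which falls only gently with m (an illustrative calculation, not a model from the cited studies):

```python
def p_all_correct(p_single: float, n_decisions: int) -> float:
    """Joint accuracy of independent decisions made in separate channels."""
    return p_single ** n_decisions

# With 95% per-decision accuracy, even four independent decisions
# stay above 80% overall -- the "small decrease" expected across channels.
for m in (1, 2, 3, 4):
    print(m, p_all_correct(0.95, m))
```

Within-channel interference, by contrast, produces drops larger than this independence baseline, which is the signature the psychophysical framework uses to diagnose shared perceptual channels.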

However, the Greenlee-Thomas model[15] suffers from two failings as a model for the effects of set-size in VSTM. First, it has only been empirically tested with displays composed of one or two elements. It has been shown repeatedly in various experimental paradigms that set-size effects differ for displays composed of a relatively small number of elements (i.e., 4 items or less), and those associated with larger displays (i.e., more than 4 items). The Greenlee-Thomas model offers no explanation for why this might be so. Second, while Magnussen, Greenlee, and Thomas[18][full citation needed] are able to use this model to predict that greater interference will be found when dual decisions are made within the same perceptual dimension, rather than across different perceptual dimensions, this prediction lacks quantitative rigor, and is unable to accurately anticipate the size of the threshold increase, or give a detailed explanation of its underlying causes.

In addition to the Greenlee-Thomas model, there are two other prominent approaches for describing set-size effects in VSTM. These two approaches can be referred to as sample size models,[19] and urn models.[g] They differ from the Greenlee-Thomas model by:

  1. ascribing the root cause of set-size effects to a stage prior to decision making
  2. making no theoretical distinction between decisions made in the same, or across different, perceptual dimensions.

Intermediate visual store

There is some evidence of an intermediate visual store with characteristics of both iconic memory and VSTM.[20] This intermediate store is proposed to have high capacity (up to 15 items) and prolonged memory trace duration (up to 4 seconds). It coexists with VSTM, but unlike VSTM its contents can be overwritten by subsequent visual stimuli.[21] Further studies suggest an involvement of visual area V4 in the retention of information about the color of the stimulus in visual working memory,[22][23] and a role of the VO1 area in retaining information about its shape.[23] It has been shown that in the VO2 region all characteristics of the stimulus retained in memory are combined into a holistic image.[23]

The function of visual short-term memory representations

VSTM is thought[by whom?] to be the visual component of the working memory system, and as such it is used as a buffer for temporary information storage during the process of naturally occurring tasks. But what naturally occurring tasks actually require VSTM? Most work on this issue has focused on the role of VSTM in bridging the sensory gaps caused by saccadic eye movements. These sudden shifts of gaze typically occur 2–4 times per second, and vision is briefly suppressed while the eyes are moving. Thus, the visual input consists of a series of spatially shifted snapshots of the overall scene, separated by brief gaps. Over time, a rich and detailed long-term memory representation is constructed from these brief glimpses of the input, and VSTM is thought[by whom?] to bridge the gaps between these glimpses and to allow the relevant portions of one glimpse to be aligned with the relevant portions of the next glimpse. Both spatial and object VSTM systems may play important roles in the integration of information across eye movements. Eye movements are also affected by VSTM representations. The representations held in VSTM can affect eye movements even when the task does not explicitly require eye movements: the direction of small microsaccades points towards the location of objects in VSTM.[24]

See also

Notes

  1. ^ e.g., Luck & Vogel 1997.
  2. ^ e.g., Cowan 2001; Luck & Vogel 1997; Pashler 1988.
  3. ^ See, for example, Mendenhall 1967.[full citation needed]
  4. ^ e.g., Luck & Vogel 1997; Pashler 1988; Sperling 1960.
  5. ^ See also the closely related work by Palmer 1990.
  6. ^ e.g., Luck & Vogel 1997.
  7. ^ e.g., Pashler 1988.

References

  1. ^ Buss, Aaron T.; Ross-Sheehy, Shannon; Reynolds, Greg D. (2018-10-01). "Visual working memory in early development: a developmental cognitive neuroscience perspective". Journal of Neurophysiology. 120 (4): 1472–1483. doi:10.1152/jn.00087.2018. ISSN 0022-3077. PMID 29897858. S2CID 49189631.
  2. ^ Buss, Aaron T.; Ross-Sheehy, Shannon; Reynolds, Greg D. (2018-10-01). "Visual working memory in early development: a developmental cognitive neuroscience perspective". Journal of Neurophysiology. 120 (4): 1472–1483. doi:10.1152/jn.00087.2018. ISSN 0022-3077. PMID 29897858. S2CID 49189631.
  3. ^ Logie, Robert (1988-04-01). "Working memory, Alan Baddeley, Oxford University Press, Oxford 1986. No. of pages: 289. Price £30.00 (Hardback), ISBN 0 19 852116 2". Applied Cognitive Psychology. 2 (2): 166–168. doi:10.1002/acp.2350020209.
  4. ^ Buss, Aaron T.; Ross-Sheehy, Shannon; Reynolds, Greg D. (2018-10-01). "Visual working memory in early development: a developmental cognitive neuroscience perspective". Journal of Neurophysiology. 120 (4): 1472–1483. doi:10.1152/jn.00087.2018. ISSN 0022-3077. PMID 29897858. S2CID 49189631.
  5. ^ Baddeley, Alan D.; Hitch, Graham (1974-01-01), Bower, Gordon H. (ed.), Working Memory, Psychology of Learning and Motivation, vol. 8, Academic Press, pp. 47–89, doi:10.1016/s0079-7421(08)60452-1, ISBN 9780125433082, retrieved 2022-05-16
  6. ^ a b Cermak, Gregory W. (1971). "Short-term recognition memory for complex free-form figures". Psychonomic Science. 25 (4): 209–211. doi:10.3758/BF03329095.
  7. ^ a b Phillips, W.A. (1974). "On the distinction between sensory storage and short-term visual memory". Perception & Psychophysics. 16 (2): 283–290. doi:10.3758/bf03203943.
  8. ^ a b Phillips, W.A.; Baddeley, A.D. (1971). "Reaction time and short-term visual memory". Psychonomic Science. 22 (2): 73–74. doi:10.3758/bf03332500.
  9. ^ Riggs, K.J.; McTaggart, J.; Simpson, A. (2006). "Changes in the capacity of visual working memory in 5- to 10-year-olds". Journal of Experimental Child Psychology. 95 (1): 18–26. doi:10.1016/j.jecp.2006.03.009. PMID 16678845.
  10. ^ Todd, J. Jay; Marois, René (2004). "Capacity limit of visual short-term memory in human posterior parietal cortex". Nature. 428 (6984): 751–754. Bibcode:2004Natur.428..751T. doi:10.1038/nature02466. PMID 15085133. S2CID 4415712.
  11. ^ Luck, S.J.; Vogel, E.K. (1997). "The capacity of visual working memory for features and conjunctions". Nature. 390 (6657): 279–281. Bibcode:1997Natur.390..279L. doi:10.1038/36846. PMID 9384378. S2CID 205025290.
  12. ^ a b Wilken, P.; Ma, W.J. (2004). "A detection theory account of change detection". J Vis. 4 (12): 1120–1135. doi:10.1167/4.12.11. PMID 15669916.
  13. ^ Bays, P.M.; Husain, M. (2008). "Dynamic shifts of limited working memory resources in human vision". Science. 321 (5890): 851–854. Bibcode:2008Sci...321..851B. doi:10.1126/science.1158023. PMC 2532743. PMID 18687968.
  14. ^ Magnussen, S (2000). "Low-level memory processes in vision". Trends in Neurosciences. 23 (6): 247–251. doi:10.1016/s0166-2236(00)01569-1. PMID 10838593. S2CID 16231057.
  15. ^ a b Greenlee, M.W.; Thomas, J.P. (1993). "Simultaneous discrimination of the spatial frequency and contrast of periodic stimuli". Journal of the Optical Society of America A. 10 (3): 395–404. Bibcode:1993JOSAA..10..395G. doi:10.1364/josaa.10.000395. PMID 8473947.
  16. ^ Vincent, A.; Regan, D. (1995). "Parallel independent encoding of orientation, spatial frequency, and contrast". Perception. 24 (5): 491–499. doi:10.1068/p240491. PMID 7567425. S2CID 25950156.
  17. ^ Magnussen, S.; Greenlee, M.W. (1997). "Competition and sharing of processing resources in visual discrimination". Journal of Experimental Psychology: Human Perception and Performance. 23 (6): 1603–1616. doi:10.1037/0096-1523.23.6.1603. PMID 9425670.
  18. ^ Magnussen, Greenlee & Thomas 1997
  19. ^ Palmer, J (1990). "Attentional limits on the perception and memory of visual information". Journal of Experimental Psychology: Human Perception and Performance. 16 (2): 332–350. doi:10.1037/0096-1523.16.2.332. PMID 2142203.
  20. ^ Sligte, Ilja G.; Scholte, H. Steven; Lamme, Victor A. F. (2008). "Are There Multiple Visual Short-Term Memory Stores?". PLOS ONE. 3 (2): e1699. Bibcode:2008PLoSO...3.1699S. doi:10.1371/journal.pone.0001699. PMC 2246033. PMID 18301775.
  21. ^ Pinto, Y.; Sligte, I.S.; Shapiro, K.L.; Lamme, V.A.F. (2013). "Fragile Visual Short-Term Memory is an object-based and location-specific storage". Psychonomic Bulletin & Review. 20 (4): 732–739. doi:10.3758/s13423-013-0393-4. PMID 23456410.
  22. ^ Sligte, I. G.; Scholte, H. S.; Lamme, V. A. F. (2009). "V4 Activity Predicts the Strength of Visual Short-Term Memory Representations". Journal of Neuroscience. 29 (23): 7432–7438. doi:10.1523/JNEUROSCI.0784-09.2009. PMC 6665414. PMID 19515911.
  23. ^ a b c Kozlovskiy, Stanislav; Rogachev, Anton (2021). "How Areas of Ventral Visual Stream Interact When We Memorize Color and Shape Information". Advances in Intelligent Systems and Computing. Springer-Nature. 1358 (95–100): 95–100. doi:10.1007/978-3-030-71637-0_10. ISBN 978-3-030-71636-3. ISSN 2194-5357. S2CID 234902744.
  24. ^ Martinez-Conde, S; Alexander, R (2019). "A gaze bias in the mind's eye". Nature Human Behaviour. 3 (5): 424–425. doi:10.1038/s41562-019-0546-1. PMID 31089295. S2CID 71148025.

Sources

This page was last edited on 13 August 2023, at 18:56
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.