To install click the Add extension button. That's it.

The source code for the WIKI 2 extension is being checked by specialists of the Mozilla Foundation, Google, and Apple. You could also do it yourself at any point in time.

4,5
Kelly Slayton
Congratulations on this excellent venture… what a great idea!
Alexander Grigorievskiy
I use WIKI 2 every day and almost forgot how the original Wikipedia looks like.
Live Statistics
English Articles
Improved in 24 Hours
Added in 24 Hours
What we do. Every page goes through several hundred of perfecting techniques; in live mode. Quite the same Wikipedia. Just better.
.
Leo
Newton
Brights
Milds

Curse of dimensionality

From Wikipedia, the free encyclopedia

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming.[1][2]

Cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance. In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient.

YouTube Encyclopedic

  • 1/5
    Views:
    1 748 725
    275 156
    22 459
    2 839
    11 098
  • ✪ Thinking visually about higher dimensions
  • ✪ Principal component analysis
  • ✪ From Deep Learning of Disentangled Representations to Higher-level Cognition
  • ✪ State Space Tracking: Estimation Theory Part 1
  • ✪ Large-scale Image Classification: ImageNet and ObjectBank

Transcription

Math is sometimes a real tease it seduces us with the beauty of reasoning geometrically in two and three dimensions Where there's this really nice back-And-forth between pairs or triplets of numbers and spatial stuff that our visual Cortex is good at processing for example if you think about a circle with radius one centered at the origin You are in effect conceptualizing every possible pair of numbers x and y that satisfy a certain numerical property that x squared plus y squared is 1 and The usefulness here is that a lot of facts that look opaque in a purely analytics become quite clear geometrically and Vice-versa Honestly this channel has been the direct beneficiary of this back and forth Since it offers such a rich library of that special category of cleverness that involves connecting two Seemingly disparate ideas, and I don't just mean the general back and forth between pairs or triplets of numbers and spatial reasoning I mean this specific one between sums of squares and circles and spheres It's at the heart of the video. I made showing How pi is connected to number theory in primes and the one showing how to visualize all possible pythagorean triples? It also underlies the video on the borsig Coulomb theorem being used to solve What was basically a counting puzzle by using topological facts about spheres? There is no doubt that the ability to frame analytic facts geometrically is very useful for math But it's all a tease because when you start asking questions about quadruplets or quintuplets or 100 tuples of numbers It's frustrating the constraints on our physical space Seem to have constrained our intuitions about geometry and we lose this back and forth I mean it is completely reasonable to imagine that there are problems out there that would have clever and Illuminating solutions if only we knew how to conceptualize say lists of 10 numbers as individual points in some space for Mathematicians or computer scientists or physicists problems that are framed in terms of lists of numbers lists of more than three numbers are a regular part of the job and The standard approach to actually doing math in higher dimensions is to use two and three dimensions for analogy but to fundamentally reason about things just Analytically somewhat analogous to a pilot relying primarily on instruments and not sight while flying through the clouds now what I want to offer here is a hybrid between the purely geometric and the purely analytic view Z-- a method for making the analytic reasoning a little more visual in a way that generalizes to arbitrarily high dimensions and to drive home the value of a tactic like this I want to share with you a very famous example Where analogies with two and three dimensions? Cannot help because of something extremely counterintuitive that only happens in higher dimensions the hope though Is that what I show you here helps to make that phenomenon more intuitive The focus throughout will be on higher dimensional spheres for example when we talk about a four dimensional sphere Say with radius one centered at the origin What that actually is is the set of all quadruplets of numbers x y z w? Where the sum of the squares of these numbers is one? What I have pictured here now is multiple three-dimensional slices of a 4D sphere projected back into three dimensions but it's confusing and even if you do wrap your head around it it just pushes the question back to how you would think about a five or a Six or a seven dimensional sphere and more importantly squinting your eyes to understand a projection like this It's not very reflective of what doing math with a 4D sphere actually entails Instead the basic idea here will be to get very literal about it and to think about four separate numbers I like to picture for vertical number lines with sliders to represent each number each Configuration of these sliders is a point in 40 space a quadruplet of numbers and what it means to be on a 4D unisphere Centered at the origin is that the sum of the squares of these four values is 1? our goal is to understand which movements of these sliders Correspond to movements on the sphere to Do that it helps if we knock things down to two dimensions where we can actually see the circle so ask yourself What's a nice way to think about this relation that x squared plus y squared is 1? Well, I like to think of the value of x squared as the real estate belonging to x and likewise the value of y squared Is the real estate belonging to y, and that they have a total of one unit of real estate to Share between them? So moving around on the circle Corresponds to a constant exchange of real estate between the two variables Part of the reason I choose this term is that it lets us make a very useful Analogy that real estate is cheap near zero and more expensive away from zero To see this consider starting off in a position where x equals one and y is zero Meaning x has all of the real estate to itself which in our usual geometric picture means we're on the right most point of the circle if You move x down just a bit to 0.9 the value of x squared Changes to zero point eight one so it has in effect given up zero point one nine units of real estate But for y squared to increase by that same amount Y has to move an entire zero point four four units away from zero more than four times the amount that x moved in other words Exchanged a little to give up expensive real estate So that y could move a lot and gain the same value of cheap real estate in terms of the usual circle drawing this corresponds to the steep slope near the right side a Small nudge in x allows for a very big change to y Moving forward let's add some tick marks to these lines to indicate what? 0.05 units of real estate looks like at each point that is how much would x have to change so that the value of x squared changes by 0.05 As you walk around the circle the trade-off in value between x squared + y squared gives this piston looking dance motion where the sliders are moving more slowly away from 0 Because real estate is more expensive in those regions there were just more tick marks to cover per unit distance Also a nice side effect of the term real estate is that it aligns naturally with the fact that it comes in units of distance Squared so the square root of the total real estate among all coordinates gives us the distance from the origin for a unit sphere in three dimensions the set of all triplets xy z where the sum of their squares is 1 All we have to do is add a third slider for z But these three sliders still only have the one unit of real estate to share between them To get a feel for this imagine holding x in place at 0.5. Where it occupies? 0.25 units of real Estate What this means is that y and z can move around in the same piston dense motion? We saw before as they trade off the remaining 0.75 units of real Estate in Terms of our typical way of visualizing a sphere this corresponds to slicing the sphere along the plane where x is 0.5 and looking at the circle formed by all of the choices for Y&Z on that sphere as you increase the value of x the amount of real estate left over for Y&Z is smaller and This more constrained piston Dance is what it feels like for the circular slice to be smaller eventually once x reaches the value 1 there's no real estate left over, so you reach this singularity point where y and z are both forced to be 0 The feeling here is a bit like being a bug on the surface of the sphere You are unable to see the whole sphere all at once instead You're just sitting on a single point, and you have some sense for what local movements are allowed in Four dimensions and higher we lose the crutch of the global view that a spatial visual offers But the fundamental rules of this real estate exchange remain the same If you fix one slider in place and watch the other three trade off This is basically what it means to take a slice of the 4D sphere to get a small 3D sphere in Much the same way that fixing one of the sliders for the three dimensional case Give us a circular slice when the remaining two were free to vary Now watching these sliders move about and thinking about the real estate exchange is pretty fun but it runs the risk of being aimless unless we have an actual high dimensional puzzle to sink our teeth into so let's set aside the sliders for just a moment and bring in a very classic example of something that seems reasonable and even dull in two and three dimensions But which is totally out of whack in higher dimensions? To start take a 2x2 box centered at the origin Its Corners around the vertices 1 1 1 negative 1 negative 1 1 and negative 1 Negative 1 draw four circles each with radius one centered At these four vertices so each one is tangent to two of its neighbors Now I want you to think of the circle centered at the origin which is just large enough to be touching those corner circles Tangent to each one of them What we want to do for this setup and for its analogies in higher dimensions is find the radius of that inner circle here in two dimensions We can use the pythagorean theorem to see that the distance from the origin To the corner of the box is the square root of two which is around 1.4 1/4 then you can subtract off this portion here the radius of the corner circle which by definition is 1 And that means the radius of the inner circle is square root of 2 minus 1 or about 0.4 1/4 No surprises here that seems pretty reasonable Now do something analogous in three dimensions draw a 2 by 2 by 2 Cube? Whose Corners have vertices 1 1 1 1 1 negative 1 on? and then we're gonna take eight different spheres each of which has a radius one and Center them on these vertices so that each one is tangent to three of its neighbors Now again think about the sphere centered at the origin which is just large enough to be barely touching those eight corner spheres As before we can start by thinking about the distance from the origin to the corner of the box say the corner at 1 1 1 by the way I guess I still haven't yet explicitly said that the way distances work in higher dimensions is Always to add up the squares of the components in each direction and take the square root If you've never seen why this follows from the pythagorean theorem just in the 2-dimensional case it's actually a really fun puzzle to think about and I've left the relevant image up on the screen for any of you who want To pause and ponder on it anyway in our case the distance between the origin and the corner 1 1 1 is the Square root of 1 squared plus 1 Squared plus 1 Squared or square root of 3 which is about 1.7 3 So the radius of that inner sphere is going to be this quantity Minus the radius of a corner sphere which by definition is 1 and again? 0.73 seems like a reasonable radius for that inner sphere, but what happens to that inner radius as you increase dimensions Obviously the reason I bring this up. Is that something surprising will happen and some of you might see where this is going But actually don't want it to feel like a surprise as fun as it is to wow people with counterintuitive facts in Math The goal here is genuine understanding not shock For higher dimensions will be using sliders to get a gut feel for what's going on But since it's kind of a different way of viewing things it helps to get a running start by looking back at how to analyze the two and three-dimensional cases in the context of sliders First things first how do you think about a circle centered at a corner like one negative one? Well previously for a circle centered at the origin the amount of real estate belonging to both x and y was dependent on their distance from the number 0 And it's the same basic idea here as you move around the center It's just that the real estate might be dependent on the distance between each coordinate and some other number so for this circle centered at 1 negative 1 the amount of real estate belonging to x is the square of its distance from 1 Likewise the real estate belonging to y is the square of its distance from Negative 1 Other than that the looking feel with this piston dance trade-off is completely the same for simplicity we'll only focus on one of these circles the one centered at 1 1 Now ask yourself What does it mean to find a circle centered at the origin? Large enough to be tangent to this guy when we're thinking just in terms of sliders Well notice how this point of tangency happens when the x and y coordinates are both the same Or frays differently at the point of this corner circle closest to the origin The real estate is shared evenly between x and y This will be important for later. So let's really dig in and think about why it's true Imagine perturbing that point slightly maybe moving x a little closer to 0 which means y would have to move a little away from 0 The change in x would have to be a little smaller than the change in y Since the real estate it gains by moving farther away from 1 is more expensive than the real estate that y loses by getting closer to 1 But from the perspective of the origin point 0 0 that trade-off is reversed the resulting change to x squared is smaller than the resulting change to y squared since when real estate is measured with respect to 0 0 That move of y towards 1 is the more expensive one What this means is that any slight perturbation? Away from this point where real estate is shared evenly results in an increasing distance from the origin The reason we care Is that this point is tangent to the inner circle? So we can also think about it as being a point of the inner circle And this will be very useful for higher dimensions. It gives us a reference point to understanding the radius of that inner circle Specifically you can ask how much real estate is shared between x and y at this point when real estate? measurements are done with respect to the origin 0 0 For example down here in two dimensions both x and y dip below 0.5 in this configuration So the total value x squared plus y squared is going to be less than 0.5 squared plus 0.5 squared Comparing to this halfway point is really going to come in handy for wrapping our mind around what happens in Higher dimensions? Taking things one step at a time. Let's bump it up to three dimensions Consider the Corner sphere with radius 1 centered at 1 1 1 the point on that sphere that's closest to the origin Corresponds to the configuration of sliders where x y and z are all reaching down toward 0 and equal to each other Again, they all have to go a little beyond that halfway point because the position zero point 5 only accounts for 0.5 squared or 0.25 units of real Estate so with all three coordinates getting 1/3 of a unit of real estate they need to be farther out and Again since this is a point where the corner sphere is tangent to the inner sphere. It's also a point of the inner sphere so with reference to the origin 0 0 0 Think about the amount of real estate shared between xy and z in this position corresponding to the tangent point It's definitely less than 0.75 Since all three of these are smaller than 0.5 so each one has less than 0.25 units of real estate and Again, we sit back and feel comfortable with this result right the inner sphere is smaller than the corner spheres But things get interesting when we move up into four dimensions Our 2 by 2 by 2 by 2 box is Gonna have 16 vertices at 1 1 1 1 1 1 1 negative 1 and so on with all possible binary combinations of 1 and negative 1 What this means is that there are 16 units fears centered at these corners each one tangent to four of its neighbors? As before, we'll just be focusing on one of them the one centered at one one one one the point of the sphere closest to the origin Corresponds to the configuration of sliders where all four coordinates reach exactly halfway between one and zero and That's because when one of the coordinates is zero point five units away from one it has zero point two five units of real estate with respect to the point one We do the same trick as before Thinking of this now as a point of the inner sphere and measuring things with respect to the origin But you can already see what's cool about four dimensions as you switch to thinking of real estate with respect to zero zero zero zero It's still the case that each of these four coordinates has zero point two five units of real estate Making for a total of one shared between the four coordinates In other words that inner sphere is precisely the same size as the corner spheres This Matches with what you see numerically by the way where you can compute the distance between the origin and the corner one one one one is the square root of four and Then when you subtract off the radius of one of the Corners Fears what you get is one But there's something much more satisfying about seeing in rather than just computing it in particular Here's a cool aspect of the fact that that inner sphere has radius one move things around so that all of the real estate goes to the coordinate x and You'll end up at the point one zero zero zero this point is actually touching the two by two by two by two box and When you're stuck thinking in the two or three dimensional cases this fact that the inner sphere has radius one The same size as the corner spheres and that it touches the box Well it just seems too big But it's important to realize this is fundamentally a four dimensional phenomenon, and you just can't cram it down into smaller dimensions But things get weirder. Let's knock it up to five dimensions in this case. We have quite a Few corners Here's 32 in total, but again for simplicity. We'll only be thinking about the one syndra dead one one one one one Think about the point of the sphere closest to the origin where all five coordinates are equally splitting the one unit of shared real-estate This time each coordinate is a little higher than 0.5 if they reach down to 0.5 Each one would have point to 5 units of real estate giving a total of 1.25 which is too much? But the tables are turned when you view this as a point on the inner sphere Because with respect to the origin this configuration has much more than one unit of real estate Not only is every coordinate more than point five units away from zero But the larger number of dimensions means that there's more total real estate when you add it all up specifically you can compute that the radius of that inner sphere is about 1.2 for The intuitive feel for what that means is that the sliders can roam over more territory? Than what just a single unit of real estate would allow One fun way to see what this means is to adjust everything so that all of the real estate goes to just one coordinate Because this coordinate can reach Beyond one what you are seeing is that this five dimensional inner sphere? Actually pokes outside the box But to really get a feel for how strange things become as a last example. I want to jump up into ten dimensions remember all this means is that points have ten coordinates for a sphere with radius one a Single unit of real estate must be shared among all ten of those coordinates as always the point of this corner sphere closest to the origin is the one where all ten coordinates split the real estate evenly and here you can really see just how far away this feels from the origin or It frays differently that inner sphere is allowed to have a very large amount of real estate in fact you can compute that the radius of the inner sphere is about two point one six and viewed from this perspective Where you have tenfold the mentions to share that real estate? Doesn't it actually feel somewhat reasonable that the inner sphere should have a radius more than twice as big as all those corner spheres? To get a sense for just how big this inner sphere is look back in two dimensions and imagine a four by four box bounding all four circles from the outside Or go to three dimensions and imagine a four by four by four box bounding all of those corner spheres from the outside Way up here in ten dimensions that quote-unquote inner sphere is actually large enough to poke Outside of that outer bounding box since it has a diameter bigger than four I know that seems crazy But you have to realize that the face of the box is always two units away from the origin no matter how high the dimension is and Fundamentally, it's because it only involves moving along a single axis But the point 1 1 1 1 1 1 1 1 1 1 which Determines the inner spheres radius is Actually really far away from the center all the way up here in 10 dimensions and it's because all 10 of those dimensions add a full unit of real estate for that point and Of course as you keep upping the dimensions that inner sphere just keeps growing without bound Not only is it poking outside of these boxes, but the proportion of the inner sphere lying inside the box Decreases exponentially towards zero as the dimension keeps increasing So taking a step back one of the things I like about Using this slider method for teaching is that when I shared it with a few friends The way they started to talk about higher dimensions became a little less Metaphysical and started to sound more like how you would hear a mathematician talk about the topic Rather than skeptically asking whether or not 10 dimensional space is a real thing Recognizing that it's exactly as real as numbers are People would actually probe at what other properties high dimensional spheres have and what other shapes feel like in terms of sliders This box situation is just one in a number of things that feel very crazy about higher dimensional spheres And it's really fun to think about these others in the context of sliders and real estate It's obviously limited I mean You're a bug on the surface of these objects only getting a feel for one point at a time and for the rules of movement Also, Geometry can be quite nice when its coordinate free and this is the opposite of that But it does give a foothold into thinking about high dimensional shapes a little more concretely Now you could say that viewing things with sliders is no different from thinking about things purely analytically I mean, it's honestly a little more than representing each coordinate literally. It's kind of the most obvious thing you might do but this small move makes it much more possible to play with the thought of a high dimensional point and Even little things like thinking about the squares of coordinates as real estate Can shed light on some seemingly strange aspects of high dimensions like just how far away the corner of a box is from its center If anything the fact that it's such a direct representation of a purely analytic is Exactly, what makes it such a faithful reflection of what genuinely doing math and higher dimensions entails? we're still flying in the clouds trusting the instruments of analytic reasoning, but this is a Redesign of those instruments one which better takes advantage of the fact that such a large portion of our brains goes towards image processing I Mean just because you can't visualize something doesn't mean you can't still think about it visually Before you go, I've got a couple announcements for you guys including a qa follow-up that I owe you first off I want to tell you about Brilliant org who helped to support this video this is just such a good example of strong alignment between Sponsor and content creator a Lot of people ask me for advice on where and how to learn more math and more than anything I cannot emphasize enough How important it is to? Prioritize active problem-solving with things like lectures and readings serving as a supplement to that main goal And there's an irony in me saying that since the videos I create are a form of passive education But my real hope with these is that they inspire people to go and actively play with math Brilliant org is one of the best places on the internet to do that I'm a member of their subscription service, and I think anyone watching this video would enjoy their course on the joy of problem solving or aptly enough outside of the box Geometry Also their complex Algebra course is really good But there's just many many good things to choose from both in and out of math the user experience is one of going through a very thoughtfully curated list of puzzles and questions and Honestly, it's just really fun If you follow the link brilliant org Slash three B-1b that lets them know that you came from here Also, I'm starting a podcast. It's with two very smart guys each named ben where we talk about education and various other things Some of you may remember that last video I elicited Q&A questions and the first episode includes my responses to many of them For the version of it up on YouTube I actually had some fun and made the background video just a point wandering aimlessly Around on a 10 dimensional sphere for an hour, and it's weirdly captivating to watch so if you want to learn stuff go check out brilliant if you want more passive consumption check out the podcast and As always if you want to help out the channel consider sharing the video or supporting on Patreon you

Contents

Domains

Combinatorics

In some problems, each variable can take one of several discrete values, or the range of possible values is divided to give a finite number of possibilities. Taking the variables together, a huge number of combinations of values must be considered. This effect is also known as the combinatorial explosion. Even in the simplest case of binary variables, the number of possible combinations already is , exponential in the dimensionality. Naively, each additional dimension doubles the effort needed to try all combinations.

Sampling

There is an exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, 102=100 evenly spaced sample points suffice to sample a unit interval (a "1-dimensional cube") with no more than 10−2=0.01 distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice that has a spacing of 10−2=0.01 between adjacent points would require 1020[=(102)10] sample points. In general, with a spacing distance of 10−n the 10-dimensional hypercube appears to be a factor of 10n(10-1)[=(10n)10/(10n)] "larger" than the 1-dimensional hypercube, which is the unit interval. In the above example n=2: when using a sampling distance of 0.01 the 10-dimensional hypercube appears to be 1018 "larger" than the unit interval. This effect is a combination of the combinatorics problems above and the distance function problems explained below.

Optimization

When solving dynamic optimization problems by numerical backward induction, the objective function must be computed for each combination of values. This is a significant obstacle when the dimension of the "state variable" is large.

Machine learning

In machine learning problems that involve learning a "state-of-nature" from a finite number of data samples in a high-dimensional feature space with each feature having a range of possible values, typically an enormous amount of training data is required to ensure that there are several samples with each combination of values. A typical rule of thumb is that there should be at least 5 training examples for each dimension in the representation.[3] With a fixed number of training samples, the predictive power of a classifier or regressor first increases as number of dimensions/features used is increased but then decreases,[4] which is known as Hughes phenomenon[5] or peaking phenomena.[3]

Distance functions

When a measure such as a Euclidean distance is defined using many coordinates, there is little difference in the distances between different pairs of samples.

One way to illustrate the "vastness" [citation needed for this quoted term] of high-dimensional Euclidean space is to compare the proportion of an inscribed hypersphere with radius and dimension , to that of a hypercube with edges of length . The volume of such a sphere is , where is the gamma function, while the volume of the cube is . As the dimension of the space increases, the hypersphere becomes an insignificant volume relative to that of the hypercube. This can clearly be seen by comparing the proportions as the dimension goes to infinity:

as .

Furthermore, the distance between the center and the corners is , which increases without bound for fixed r. In this sense, nearly all of the high-dimensional space is "far away" from the centre. To put it another way, the high-dimensional unit hypercube can be said to consist almost entirely of the "corners" of the hypercube, with almost no "middle".

This also helps to understand the chi-squared distribution. Indeed, the (non-central) chi-squared distribution associated to a random point in the interval [-1, 1] is the same as the distribution of the length-squared of a random point in the d-cube. By the law of large numbers, this distribution concentrates itself in a narrow band around d times the standard deviation squared (σ2) of the original derivation. This illuminates the chi-squared distribution and also illustrates that most of the volume of the d-cube concentrates near the surface of a sphere of radius dσ.

A further development of this phenomenon is as follows. Any fixed distribution on induces a product distribution on points in d. For any fixed n, it turns out that the minimum and the maximum distance between a random reference point Q and a list of n random data points P1,...,Pn become indiscernible compared to the minimum distance:[6]

.

This is often cited as distance functions losing their usefulness (for the nearest-neighbor criterion in feature-comparison algorithms, for example) in high dimensions. However, recent research has shown this to only hold in the artificial scenario when the one-dimensional distributions are independent and identically distributed.[7] When attributes are correlated, data can become easier and provide higher distance contrast and the signal-to-noise ratio was found to play an important role, thus feature selection should be used.[7]

Nearest neighbor search

The effect complicates nearest neighbor search in high dimensional space. It is not possible to quickly reject candidates by using the difference in one coordinate as a lower bound for a distance based on all the dimensions.[8][9]

However, it has recently been observed that the mere number of dimensions does not necessarily result in difficulties,[10] since relevant additional dimensions can also increase the contrast. In addition, for the resulting ranking it remains useful to discern close and far neighbors. Irrelevant ("noise") dimensions, however, reduce the contrast in the manner described above. In time series analysis, where the data are inherently high-dimensional, distance functions also work reliably as long as the signal-to-noise ratio is high enough.[11]

k-nearest neighbor classification

Another effect of high dimensionality on distance functions concerns k-nearest neighbor (k-NN) graphs constructed from a data set using a distance function. As the dimension increases, the indegree distribution of the k-NN digraph becomes skewed with a peak on the right because of the emergence of a disproportionate number of hubs, that is, data-points that appear in many more k-NN lists of other data-points than the average. This phenomenon can have a considerable impact on various techniques for classification (including the k-NN classifier), semi-supervised learning, and clustering,[12] and it also affects information retrieval.[13]

Anomaly detection

In a recent survey, Zimek et al. identified the following problems when searching for anomalies in high-dimensional data:[7]

  1. Concentration of scores and distances: derived values such as distances become numerically similar
  2. Irrelevant attributes: in high dimensional data, a significant number of attributes may be irrelevant
  3. Definition of reference sets: for local methods, reference sets are often nearest-neighbor based
  4. Incomparable scores for different dimensionalities: different subspaces produce incomparable scores
  5. Interpretability of scores: the scores often no longer convey a semantic meaning
  6. Exponential search space: the search space can no longer be systematically scanned
  7. Data snooping bias: given the large search space, for every desired significance a hypothesis can be found
  8. Hubness: certain objects occur more frequently in neighbor lists than others.

Many of the analyzed specialized methods tackle one or another of these problems, but there remain many open research questions.

See also

References

  1. ^ Richard Ernest Bellman; Rand Corporation (1957). Dynamic programming. Princeton University Press. ISBN 978-0-691-07951-6.,
    Republished: Richard Ernest Bellman (2003). Dynamic Programming. Courier Dover Publications. ISBN 978-0-486-42809-3.
  2. ^ Richard Ernest Bellman (1961). Adaptive control processes: a guided tour. Princeton University Press.
  3. ^ a b Koutroumbas, Sergios Theodoridis, Konstantinos (2008). Pattern Recognition - 4th Edition. Burlington. Retrieved 8 January 2018.
  4. ^ Trunk, G. V. (July 1979). "A Problem of Dimensionality: A Simple Example". IEEE Transactions on Pattern Analysis and Machine Intelligence. PAMI-1 (3): 306–307. doi:10.1109/TPAMI.1979.4766926.
  5. ^ Hughes, G.F. (January 1968). "On the mean accuracy of statistical pattern recognizers". IEEE Transactions on Information Theory. 14 (1): 55–63. doi:10.1109/TIT.1968.1054102.
  6. ^ Beyer, K.; Goldstein, J.; Ramakrishnan, R.; Shaft, U. (1999). When is "Nearest Neighbor" Meaningful?. Proc. 7th International Conference on Database Theory - ICDT'99. LNCS. 1540. pp. 217–235. doi:10.1007/3-540-49257-7_15. ISBN 978-3-540-65452-0.
  7. ^ a b c Zimek, A.; Schubert, E.; Kriegel, H.-P. (2012). "A survey on unsupervised outlier detection in high-dimensional numerical data". Statistical Analysis and Data Mining. 5 (5): 363–387. doi:10.1002/sam.11161.
  8. ^ Marimont, R.B.; Shapiro, M.B. (1979). "Nearest Neighbour Searches and the Curse of Dimensionality". IMA J Appl Math. 24 (1): 59–70. doi:10.1093/imamat/24.1.59.
  9. ^ Chávez, Edgar; Navarro, Gonzalo; Baeza-Yates, Ricardo; Marroquín, José Luis (2001). "Searching in Metric Spaces". ACM Computing Surveys. 33 (3): 273–321. CiteSeerX 10.1.1.100.7845. doi:10.1145/502807.502808.
  10. ^ Houle, M. E.; Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2010). Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? (PDF). Scientific and Statistical Database Management. Lecture Notes in Computer Science. 6187. p. 482. doi:10.1007/978-3-642-13818-8_34. ISBN 978-3-642-13817-1.
  11. ^ Bernecker, T.; Houle, M. E.; Kriegel, H. P.; Kröger, P.; Renz, M.; Schubert, E.; Zimek, A. (2011). Quality of Similarity Rankings in Time Series. Symposium on Spatial and Temporal Databases. Lecture Notes in Computer Science. 6849. p. 422. doi:10.1007/978-3-642-22922-0_25. ISBN 978-3-642-22921-3.
  12. ^ Radovanović, Miloš; Nanopoulos, Alexandros; Ivanović, Mirjana (2010). "Hubs in space: Popular nearest neighbors in high-dimensional data" (PDF). Journal of Machine Learning Research. 11: 2487–2531.
  13. ^ Radovanović, M.; Nanopoulos, A.; Ivanović, M. (2010). On the existence of obstinate results in vector space models. 33rd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '10. p. 186. doi:10.1145/1835449.1835482. ISBN 9781450301534.
This page was last edited on 4 June 2019, at 08:19
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.