Curse of dimensionality
Last updated
Was this helpful?
Last updated
Was this helpful?
There are three problems in high dimension
First, More length is needed to capture same rate of data.
Second, All sample points are close to an edge of the sample.
Third, We need much more sample in high dimension to capture same percentage of data in low dimension.
βοΈ First Problem
Let's think about a unit hypercube in p dimension, . We assume the density of x is uniformly distributed.
The average length of one side in p dimension is that:
There is a subcube in hypercube occupying r% of all data.
In, one side length of a subcube is for data.
In, one side length of a subcube is for data.
In, one side length of a subcube is for data.
When dimension becomes higher, an amount of information becomes increased. More information means longer length of one side for same rate of data.
In , we need the length 0.63 to capture 1% of data.
βοΈ Second Problem
All sample becomes close into an edge.
βοΈ Third Problem
In this context, sampling density means the number of sample in unit length.
Find the y value satisfying . The distance the closest point from origin.
The distance becomes very short when the number of sample is large and the dimensionality is high. For our fixed sample, the sampling density is proportional to
To maintain the same density, we need one hundred samples in one dimension. However, we need samples in 100 dimension to maintain the same density.