Show Tag: clustering

Noise can improve convergence in clustering algorithms.

"Stochastic competitive learning behaves as a form of adaptive quantization", because the centroids being adapted distribute themselves in the data space so as to minimize the quantization error (with respect to the distance metric used).
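The two notes above can be illustrated with online (stochastic) competitive learning. The minimal Python sketch below works as follows: each sample pulls only its nearest centroid a small step towards itself. Over many samples, this moves the centroids towards a low quantization error. Several parts are assumptions made for illustration and are not taken from a particular paper:

- the optional noise term, whose standard deviation decays linearly to zero;
- the names `competitive_learning`, `lr` and `noise0`.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantization_error(data, centroids):
    """Mean squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(x, c) for c in centroids) for x in data) / len(data)

def competitive_learning(data, k, epochs=20, lr=0.05, noise0=0.0, seed=0):
    """Online winner-take-all learning: each sample pulls only its nearest
    centroid a step of size lr towards itself. If noise0 > 0, zero-mean
    Gaussian noise is added to every update; its standard deviation decays
    linearly from noise0 to zero (an assumed annealing schedule)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            sigma = noise0 * (1 - step / total_steps)
            winner = min(centroids, key=lambda c: sq_dist(x, c))
            for d in range(len(x)):
                winner[d] += lr * (x[d] - winner[d]) + rng.gauss(0.0, sigma)
            step += 1
    return centroids

# Usage: three Gaussian blobs in the plane.
rng = random.Random(1)
blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in blobs for _ in range(100)]
centroids = competitive_learning(data, k=3, noise0=0.1)
print(quantization_error(data, centroids))
```

One way to explore the claim about noise is to run the sketch once with `noise0=0.0` and once with a small positive `noise0`, then compare `quantization_error`. The sketch by itself does not show that noise helps.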

My algorithms minimize the expected error since they take into account the probability of data points (via noise properties).

The account of abstraction due to Hoare is that we first cluster objects according to arbitrary similarities. We then find clusters which are predictive of the future and name them. Subsequently, the similarities within such a named cluster are thought of as essential whereas the differences are perceived as unimportant.

Hoare's account of the process of abstraction consists of four steps:

  • Abstraction (selecting those properties of the real things we want to represent in our abstraction)
  • Representation (choosing symbols, such as words and pictograms, for the abstraction)
  • Manipulation (declaring rules for how to use the symbolic representation to predict what will happen to the real things under certain circumstances)
  • Axiomatisation (declaring rigorously the relationship between the symbols used in the representation and the properties of the real things being abstracted from)

Keogh and Lin claim that what they call time series subsequence clustering is meaningless. Specifically, clustering all subsequences of a certain length extracted from a time series yields the same (kind of) clusters as clustering random sequences.
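The minimal Python sketch below makes the setting concrete:

- cut the series into all n - w + 1 sliding windows of length w;
- run plain (Lloyd's) k-means on those windows.

The window length, k and the random-walk input are arbitrary choices made for illustration, and the helper names are hypothetical. Keogh and Lin's actual experimental protocol is not reproduced here. That includes any normalization, the distance measure, and how two clusterings are compared.

```python
import random

def subsequences(series, w):
    """Every sliding-window subsequence of length w (there are n - w + 1)."""
    return [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means. Returns (centers, labels), where each center
    is the mean of the points carrying its label."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: label each point with its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# Usage: a random walk, cut into all windows of length 16 and clustered.
rng = random.Random(42)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + rng.gauss(0, 1))
windows = subsequences(walk, 16)
centers, labels = kmeans(windows, k=3)
```

The claim above is about a particular kind of experiment. You would repeat the same run on a different series, or on independently generated random sequences, and then compare the resulting centers.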

Intuitively, clustering time series subsequences is meaningless for two reasons:

  • The sum (or average) of all subsequences of a time series is approximately a straight line (the approximation improves as the series gets longer relative to the subsequence length). Also, the cluster centers (in $k$-means and related algorithms), weighted by the sizes of their clusters, average to the global mean. Thus, an algorithm can find the interesting features of a time series only if those features together average to a straight line. This, however, is rarely the case.
  • Any meaningful clustering should be expected to put subsequences that start at nearby time points into the same cluster (Keogh and Lin call such pairs "trivial matches"). However, how similar nearby subsequences are depends strongly on the local rate of change. Where the series changes slowly, consecutive subsequences are nearly identical; where it changes quickly, they differ considerably. A clustering algorithm will therefore find cluster centers close to subsequences with a low rate of change rather than a high one. These are typically the less interesting subsequences.
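Both intuitions can be checked numerically with the small Python sketch below. The sine waves, the window lengths and the helper names are hypothetical choices made for illustration. The sketch checks two things:

1. Individual windows can be strongly curved, while the average of all sliding windows is nearly straight.
2. The distance between a window and its one-step-shifted neighbour (a trivial match) grows with the local rate of change.

```python
import math

def mean_subsequence(series, w):
    """Pointwise average of all n - w + 1 sliding windows of length w."""
    windows = [series[i:i + w] for i in range(len(series) - w + 1)]
    return [sum(col) / len(windows) for col in zip(*windows)]

def max_deviation_from_line(v):
    """Largest gap between v and the straight line through its two endpoints."""
    n = len(v)
    return max(abs(v[i] - (v[0] + (v[-1] - v[0]) * i / (n - 1))) for i in range(n))

def trivial_match_distance(series, i, w):
    """Euclidean distance between the window starting at i and the one starting at i + 1."""
    a, b = series[i:i + w], series[i + 1:i + 1 + w]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Point 1: a single window of a sine wave is curved, the average window is not.
sine = [math.sin(2 * math.pi * t / 100) for t in range(10000)]
print(max_deviation_from_line(sine[:50]))                   # one window
print(max_deviation_from_line(mean_subsequence(sine, 50)))  # average of all windows

# Point 2: a slowly changing half followed by a quickly changing half.
mixed = ([math.sin(2 * math.pi * t / 200) for t in range(400)]
         + [math.sin(2 * math.pi * t / 10) for t in range(400)])
print(trivial_match_distance(mixed, 0, 20), trivial_match_distance(mixed, 500, 20))
```

The results match both intuitions:

- The single window deviates from a straight line by almost the full amplitude of the sine, while the average window stays close to straight.
- In the fast half, the distance between neighbouring windows is more than ten times larger than in the slow half. As a result, windows from slow regions form much denser groups, which is where the cluster centers tend to end up.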

Keogh and Lin propose focusing on finding motifs in time series instead of clusters.
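This note does not spell out what a motif is. The Python sketch below assumes one simple formalization:

- The top motif is the closest pair of length-w subsequences that are not trivial matches.
- "Not trivial" is taken to mean that their start points are at least `exclusion` apart.

The function `top_motif` and its parameters are hypothetical. The sketch uses brute force with raw Euclidean distance for clarity; it is not an algorithm from the literature.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_motif(series, w, exclusion=None):
    """Brute-force search for the closest pair of length-w subsequences whose
    start points are at least `exclusion` apart, so that trivial matches
    (heavily overlapping neighbours) cannot win. Returns (distance, i, j).
    Runs in O(n^2 * w) time; only suitable for short series."""
    if exclusion is None:
        exclusion = w  # assumed default: the two occurrences may not overlap
    n = len(series) - w + 1
    best = (math.inf, None, None)
    for i in range(n):
        for j in range(i + exclusion, n):
            d = euclidean(series[i:i + w], series[j:j + w])
            if d < best[0]:
                best = (d, i, j)
    return best

# Usage: plant the same pattern twice in Gaussian noise and recover it.
rng = random.Random(7)
series = [rng.gauss(0, 1) for _ in range(300)]
pattern = [5 * math.sin(2 * math.pi * t / 20) for t in range(20)]
series[40:60] = pattern
series[200:220] = pattern
print(top_motif(series, 20))  # (0.0, 40, 200)
```

The exclusion zone is what separates this from naive nearest-neighbour search. Without it, a window and its one-step-shifted copy would almost always be returned as the "motif".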