Compositional data

From Wikipedia, the free encyclopedia

In statistics, compositional data are quantitative descriptions of the parts of some whole, conveying relative information. Mathematically, compositional data are represented by points on a simplex. Measurements involving probabilities, proportions, percentages, and parts per million (ppm) can all be thought of as compositional data.

Ternary plot

Compositional data in three variables can be plotted via ternary plots. A barycentric plot on three variables graphically depicts the ratios of the three variables as positions in an equilateral triangle.
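
As a rough illustration of how a three-part composition becomes a position in the triangle, the sketch below converts barycentric weights to Cartesian coordinates. The function name `ternary_to_cartesian` and the vertex layout (first part at the lower left, second at the lower right, third at the top) are assumptions made for this example, not a convention taken from the article.

```python
import math


def ternary_to_cartesian(a, b, c):
    """Map a 3-part composition to (x, y) inside an equilateral triangle.

    Assumed vertex layout (illustrative only):
      part a -> (0, 0), part b -> (1, 0), part c -> (0.5, sqrt(3)/2).
    """
    total = a + b + c
    if total <= 0 or min(a, b, c) < 0:
        raise ValueError("parts must be non-negative with a positive sum")
    # Only the ratios matter, so rescale the parts to sum to 1 first.
    a, b, c = a / total, b / total, c / total
    # Barycentric combination of the three vertices.
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y


# Percentages and proportions of the same rock land on the same point.
point_from_percent = ternary_to_cartesian(10, 30, 60)
point_from_fraction = ternary_to_cartesian(0.1, 0.3, 0.6)
```

Because the parts are rescaled before plotting, any positive multiple of a composition maps to the same point, which is the property the ternary plot relies on.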

Simplicial sample space

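In 1982, John Aitchison defined compositional data to be proportions of some whole.[1] In particular, a compositional data point (or composition for short) can be represented by a positive real vector whose components sum to a constant $\kappa$. The sample space of compositional data is a simplex:

$$\mathcal{S}^D = \left\{ \mathbf{x} = [x_1, x_2, \dots, x_D] \in \mathbb{R}^D \;\middle|\; x_i > 0,\ i = 1, 2, \dots, D;\ \sum_{i=1}^{D} x_i = \kappa \right\}$$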

An illustration of the Aitchison simplex. Here, there are 3 parts; $x_1, x_2, x_3$ represent values of different proportions. A, B, C, D and E are 5 different compositions within the simplex. A, B and C are all equivalent and D and E are equivalent.

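The only information is given by the ratios between components, so the information of a composition is preserved under multiplication by any positive constant. Therefore, the sample space of compositional data can always be assumed to be a standard simplex, i.e. one whose parts sum to $\kappa = 1$. In this context, normalization to the standard simplex is called closure and is denoted by $\mathcal{C}[\,\cdot\,]$:

$$\mathcal{C}[x_1, x_2, \dots, x_D] = \left[ \frac{x_1}{\sum_{i=1}^{D} x_i}, \frac{x_2}{\sum_{i=1}^{D} x_i}, \dots, \frac{x_D}{\sum_{i=1}^{D} x_i} \right]$$

where D is the number of parts (components) and $[\,\cdot\,]$ denotes a row vector.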
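
A minimal sketch of closure follows, assuming a composition is stored as a plain Python list of positive numbers. The function name `closure` and the optional `kappa` argument are illustrative choices rather than part of any particular library.

```python
def closure(x, kappa=1.0):
    """Rescale positive parts so that they sum to kappa (1 by default)."""
    if len(x) == 0 or any(v <= 0 for v in x):
        raise ValueError("a composition needs at least one part, all positive")
    total = sum(x)
    return [kappa * v / total for v in x]


# Raw amounts and percentages of the same rock close to the same point.
from_amounts = closure([1, 3, 6])
from_percent = closure([10, 30, 60])
```

Both calls describe the same composition, since the inputs differ only by a positive constant factor.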

Aitchison geometry

The simplex can be given the structure of a real vector space in several different ways. The following vector space structure is called Aitchison geometry or the Aitchison simplex and has the following operations:

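Perturbation: $\mathbf{x} \oplus \mathbf{y} = \mathcal{C}[x_1 y_1, \dots, x_D y_D]$ for all $\mathbf{x}, \mathbf{y} \in \mathcal{S}^D$
Powering: $\alpha \odot \mathbf{x} = \mathcal{C}[x_1^{\alpha}, \dots, x_D^{\alpha}]$ for all $\mathbf{x} \in \mathcal{S}^D$ and $\alpha \in \mathbb{R}$
Inner product: $\langle \mathbf{x}, \mathbf{y} \rangle = \frac{1}{2D} \sum_{i=1}^{D} \sum_{j=1}^{D} \log\frac{x_i}{x_j} \log\frac{y_i}{y_j}$ for all $\mathbf{x}, \mathbf{y} \in \mathcal{S}^D$

Here $\mathcal{C}[\,\cdot\,]$ is the closure operation, which divides each part by the sum of all parts.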

These operations alone are sufficient to show that the Aitchison simplex forms a Euclidean vector space of dimension D − 1.
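
The sketch below spells out the three operations, assuming their usual definitions: perturbation as the closed component-wise product, powering as the closed component-wise power, and the inner product as a double sum over pairwise log-ratios divided by 2D. The function names are hypothetical and compositions are plain lists.

```python
import math


def closure(x):
    """Divide each part by the total so the parts sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def perturb(x, y):
    """Perturbation: closed component-wise product (plays the role of addition)."""
    return closure([a * b for a, b in zip(x, y)])


def power(alpha, x):
    """Powering: closed component-wise power (plays the role of scalar multiplication)."""
    return closure([a ** alpha for a in x])


def aitchison_inner(x, y):
    """Inner product: sum over all pairs of log-ratios, divided by 2D."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(d):
            total += math.log(x[i] / x[j]) * math.log(y[i] / y[j])
    return total / (2 * d)


x = [0.1, 0.3, 0.6]
y = [0.2, 0.5, 0.3]
neutral = [1 / 3, 1 / 3, 1 / 3]      # acts as the zero vector
opposite = power(-1.0, x)            # acts as the additive inverse of x
back_to_neutral = perturb(x, opposite)
```

In this structure the uniform composition is the zero vector, and perturbing a composition by its power of −1 returns to it, which mirrors x + (−x) = 0 in ordinary vector spaces.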

Orthonormal bases

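Since the Aitchison simplex forms a finite-dimensional Hilbert space, it is possible to construct orthonormal bases in the simplex. Every composition $\mathbf{x}$ can be decomposed as follows, with $\oplus$ denoting perturbation and $\odot$ denoting powering:

$$\mathbf{x} = \bigoplus_{i=1}^{D-1} x_i^{*} \odot \mathbf{e}_i$$

where $\mathbf{e}_1, \dots, \mathbf{e}_{D-1}$ forms an orthonormal basis in the simplex[2] and $x_i^{*} = \langle \mathbf{x}, \mathbf{e}_i \rangle$ are the coordinates of $\mathbf{x}$ with respect to that basis.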

Linear transformations

There are three well-characterized isomorphisms that transform from the Aitchison simplex to real space. All of these transforms are linear with respect to perturbation and powering, and are given below.

Additive logratio transform

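The additive log ratio (alr) transform is an isomorphism where $\operatorname{alr}: \mathcal{S}^D \rightarrow \mathbb{R}^{D-1}$. This is given by

$$\operatorname{alr}(\mathbf{x}) = \left[ \log\frac{x_1}{x_D}, \dots, \log\frac{x_{D-1}}{x_D} \right]$$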

The choice of denominator component is arbitrary, and could be any specified component. This transform is commonly used in chemistry with measurements such as pH. In addition, this is the transform most commonly used for multinomial logistic regression. The alr transform is not an isometry, meaning that distances on transformed values will not be equivalent to distances on the original compositions in the simplex.
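
A minimal sketch of the alr transform and its inverse, assuming the last part is chosen as the denominator. The names `alr` and `alr_inverse` are illustrative, and any other part could serve as the reference equally well.

```python
import math


def alr(x):
    """Log-ratio of every part except the last against the last part."""
    reference = x[-1]
    return [math.log(v / reference) for v in x[:-1]]


def alr_inverse(z):
    """Map D-1 real coordinates back to a D-part composition summing to 1."""
    # The reference part has log-ratio 0 against itself, hence exp(0) = 1.
    parts = [math.exp(v) for v in z] + [1.0]
    total = sum(parts)
    return [v / total for v in parts]


composition = [0.1, 0.3, 0.6]
coordinates = alr(composition)        # two real numbers for three parts
recovered = alr_inverse(coordinates)  # matches the original up to rounding
```

The coordinates depend only on ratios, so `alr([1, 3, 6])` gives the same result as `alr([0.1, 0.3, 0.6])` up to rounding.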

Center logratio transform

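The center log ratio (clr) transform is both an isomorphism and an isometry where $\operatorname{clr}: \mathcal{S}^D \rightarrow U$, with $U \subset \mathbb{R}^D$ the hyperplane of vectors whose components sum to zero:

$$\operatorname{clr}(\mathbf{x}) = \left[ \log\frac{x_1}{g(\mathbf{x})}, \dots, \log\frac{x_D}{g(\mathbf{x})} \right]$$

where $g(\mathbf{x})$ is the geometric mean of $\mathbf{x}$.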

The inverse of this function is also known as the softmax function commonly used in neural networks.
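
The relationship with softmax can be checked numerically. The sketch below assumes the usual definition of clr, the logarithm of each part divided by the geometric mean of all parts, and uses hypothetical function names.

```python
import math


def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]


def softmax(z):
    """Exponentiate and normalize; subtracting max(z) avoids overflow."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


composition = [0.1, 0.3, 0.6]
centered = clr(composition)     # components sum to zero
recovered = softmax(centered)   # softmax undoes clr on the simplex
```

Softmax maps any real vector onto the simplex, but it inverts clr only on vectors whose components sum to zero, which is exactly the image of clr.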

Isometric logratio transform

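The isometric log ratio (ilr) transform is both an isomorphism and an isometry where $\operatorname{ilr}: \mathcal{S}^D \rightarrow \mathbb{R}^{D-1}$:

$$\operatorname{ilr}(\mathbf{x}) = \left[ \langle \mathbf{x}, \mathbf{e}_1 \rangle, \dots, \langle \mathbf{x}, \mathbf{e}_{D-1} \rangle \right]$$

where $\langle \cdot, \cdot \rangle$ is the Aitchison inner product and $\mathbf{e}_1, \dots, \mathbf{e}_{D-1}$ is an orthonormal basis of the simplex.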

There are multiple ways to construct orthonormal bases, including Gram–Schmidt orthogonalization or singular-value decomposition of clr-transformed data. Another alternative is to construct log contrasts from a bifurcating tree. If we are given a bifurcating tree, we can construct a basis from the internal nodes in the tree.

A representation of a tree in terms of its orthogonal components. l represents an internal node, an element of the orthonormal basis. This is a precursor to using the tree as a scaffold for the ilr transform.

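Each vector in the basis would be determined as follows:

$$\mathbf{e}_{\ell} = \mathcal{C}\left[ \exp\left( \underbrace{0, \dots, 0}_{k}, \underbrace{a, \dots, a}_{r}, \underbrace{b, \dots, b}_{s}, \underbrace{0, \dots, 0}_{t} \right) \right]$$

The elements within each vector are given as follows:

$$a = \frac{\sqrt{s}}{\sqrt{r(r+s)}} \quad \text{and} \quad b = \frac{-\sqrt{r}}{\sqrt{s(r+s)}}$$

where $k, r, s, t$ are the respective numbers of tips in the corresponding subtrees shown in the figure. It can be shown that the resulting basis is orthonormal.[3]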
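
The sketch below builds the basis for a three-part example and checks orthonormality numerically. It assumes the standard balance construction, in which the r tips of one subtree receive the value a = sqrt(s) / sqrt(r(r+s)), the s tips of the sibling subtree receive b = −sqrt(r) / sqrt(s(r+s)), and the k tips before and t tips after receive 0. The tree ((x1, x2), x3) and all function names are assumptions made for illustration.

```python
import math


def balance_vector(k, r, s, t):
    """Log-scale (clr) coordinates of one basis element: k zeros, r a's, s b's, t zeros."""
    a = math.sqrt(s) / math.sqrt(r * (r + s))
    b = -math.sqrt(r) / math.sqrt(s * (r + s))
    return [0.0] * k + [a] * r + [b] * s + [0.0] * t


def basis_element(k, r, s, t):
    """The basis element itself: exponentiate the vector, then apply closure."""
    exps = [math.exp(v) for v in balance_vector(k, r, s, t)]
    total = sum(exps)
    return [v / total for v in exps]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


# Hypothetical tree ((x1, x2), x3) with two internal nodes.
root = balance_vector(k=0, r=2, s=1, t=0)    # {x1, x2} against {x3}
inner = balance_vector(k=0, r=1, s=1, t=1)   # {x1} against {x2}; x3 is outside

root_norm_sq = dot(root, root)
inner_norm_sq = dot(inner, inner)
cross = dot(root, inner)
```

Each vector sums to zero, so the Aitchison inner product of two basis elements equals the ordinary dot product of these log-scale vectors, which is why a plain dot product suffices for the check.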

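Once the basis is built, the ilr transform can be calculated as follows:

$$\operatorname{ilr}(\mathbf{x}) = \left[ \operatorname{ilr}_1(\mathbf{x}), \dots, \operatorname{ilr}_{D-1}(\mathbf{x}) \right]$$

where each element in the ilr-transformed data is of the following form:

$$\operatorname{ilr}_i(\mathbf{x}) = \sqrt{\frac{rs}{r+s}} \, \log\frac{g(\mathbf{x}_R)}{g(\mathbf{x}_S)}$$

where $\mathbf{x}_R$ and $\mathbf{x}_S$ are the sets of values corresponding to the tips in the subtrees $R$ and $S$, with $r$ and $s$ tips respectively, and $g$ denotes the geometric mean.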
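
A minimal sketch of the tree-based ilr transform follows, assuming each coordinate is the balance sqrt(rs / (r + s)) · log(g(x_R) / g(x_S)), where g is the geometric mean and R and S are the tips of the two subtrees below an internal node. The tree ((x1, x2), x3), encoded as a list of index pairs, and the function names are hypothetical.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(x, group_r, group_s):
    """One ilr coordinate: scaled log-ratio of the two groups' geometric means."""
    r, s = len(group_r), len(group_s)
    mean_r = geometric_mean([x[i] for i in group_r])
    mean_s = geometric_mean([x[j] for j in group_s])
    return math.sqrt(r * s / (r + s)) * math.log(mean_r / mean_s)


def ilr(x, splits):
    """ilr coordinates, one per internal node of the tree."""
    return [balance(x, group_r, group_s) for group_r, group_s in splits]


# Hypothetical tree ((x1, x2), x3), written as zero-based index groups.
splits = [
    ([0, 1], [2]),  # root: {x1, x2} against {x3}
    ([0], [1]),     # inner node: {x1} against {x2}
]

coordinates = ilr([0.1, 0.3, 0.6], splits)
```

A D-part composition yields D − 1 coordinates, one per internal node, and the result is unchanged if the composition is multiplied by a positive constant.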

Examples

  • In chemistry, compositions can be expressed as molar concentrations of each component. As the sum of all concentrations is not determined, the whole composition of D parts is needed and thus expressed as a vector of D molar concentrations. These compositions can be translated into weight per cent by multiplying each component by the appropriate constant.
  • In demography, a town may be a compositional data point in a sample of towns; a town in which 35% of the people are Christians, 55% are Muslims, 6% are Jews, and the remaining 4% are others would correspond to the quadruple [0.35, 0.55, 0.06, 0.04]. A data set would correspond to a list of towns.
  • In geology, a rock composed of different minerals may be a compositional data point in a sample of rocks; a rock of which 10% is the first mineral, 30% is the second, and the remaining 60% is the third would correspond to the triple [0.1, 0.3, 0.6]. A data set would contain one such triple for each rock in a sample of rocks.
  • In high-throughput sequencing, the data obtained are count compositions, since the capacity of the machine determines the number of reads observed. These reduce to probabilities of observing a feature given the sequencing depth.
  • In probability and statistics, a partition of the sampling space into disjoint events is described by the probabilities assigned to such events. The vector of D probabilities can be considered as a composition of D parts. As they add to one, one probability can be suppressed and the composition is completely determined.
  • In a survey, the proportions of people positively answering some different items can be expressed as percentages. As the total amount is identified as 100, the compositional vector of D components can be defined using only D − 1 components, assuming that the remaining component is the percentage needed for the whole vector to add to 100.
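
The last two examples note that one part is redundant once the total is fixed. A small sketch of recovering the suppressed part is given below; the function name `complete_composition` is hypothetical, and the figures are borrowed from the demography example above.

```python
def complete_composition(known_parts, total=100.0):
    """Append the one missing part so that all parts add up to `total`."""
    remainder = total - sum(known_parts)
    if remainder < 0:
        raise ValueError("known parts already exceed the total")
    return list(known_parts) + [remainder]


# Three reported percentages determine the fourth.
town = complete_composition([35, 55, 6])  # [35, 55, 6, 4.0]
```

With probabilities the same function applies with `total=1.0`.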

Notes

  1. ^ Aitchison, John (1982). "The Statistical Analysis of Compositional Data". Journal of the Royal Statistical Society. Series B (Methodological). 44 (2): 139–177. doi:10.1111/j.2517-6161.1982.tb01195.x.
  2. ^ Egozcue 2003.
  3. ^ Egozcue 2005.

References

Software

  • compositions – R package for compositional data analysis
  • coda.base – Compositional data analysis in R
  • CoDa.jl – Compositional data analysis in Julia

This page was last edited on 23 September 2019, at 21:27
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.