# Machine learning

## From Wikipedia, the free encyclopedia

Machine learning is a field of computer science that uses statistical techniques to give computer systems the ability to "learn" (e.g., progressively improve performance on a specific task) with data, without being explicitly programmed.[2]

The name machine learning was coined in 1959 by Arthur Samuel.[1] Evolved from the study of pattern recognition and computational learning theory in artificial intelligence,[3] machine learning explores the study and construction of algorithms that can learn from and make predictions on data.[4] Rather than following strictly static program instructions, such algorithms make data-driven predictions or decisions[5]:2 by building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; example applications include email filtering, detection of network intruders, and computer vision.

Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is sometimes conflated with data mining,[6] although the latter subfield focuses more on exploratory data analysis and is known as unsupervised learning.[7][8]

Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction; in commercial use, this is known as predictive analytics. These analytical models allow researchers, data scientists, engineers, and analysts to "produce reliable, repeatable decisions and results" and uncover "hidden insights" through learning from historical relationships and trends in the data.[9]

### YouTube Encyclopedic

• Machine Learning & Artificial Intelligence: Crash Course Computer Science #34
• A Day in the Life of a Machine Learning/Computer Science Student
• What is machine learning and how to learn it ?
• Artificial Intelligence Vs Machine Learning Vs Data science Vs Deep learning
• 11. Introduction to Machine Learning

#### Transcription

Hi, I’m Carrie Anne, and welcome to Crash Course Computer Science! As we’ve touched on many times in this series, computers are incredible at storing, organizing, fetching and processing huge volumes of data. That’s perfect for things like e-commerce websites with millions of items for sale, and for storing billions of health records for quick access by doctors. But what if we want to use computers not just to fetch and display data, but to actually make decisions about data? This is the essence of machine learning – algorithms that give computers the ability to learn from data, and then make predictions and decisions. Computer programs with this ability are extremely useful in answering questions like Is an email spam? Does a person’s heart have arrhythmia? What video should youtube recommend after this one? While useful, we probably wouldn’t describe these programs as “intelligent” in the same way we think of human intelligence. So, even though the terms are often interchanged, most computer scientists would say that machine learning is a set of techniques that sits inside the even more ambitious goal of Artificial Intelligence, or AI for short. INTRO Machine Learning and AI algorithms tend to be pretty sophisticated. So rather than wading into the mechanics of how they work, we're going to focus on what the algorithms do conceptually. Let’s start with a simple example: deciding if a moth is a Luna Moth or an Emperor Moth. This decision process is called classification, and an algorithm that does it is called a classifier. Although there are techniques that can use raw data for training – like photos and sounds – many algorithms reduce the complexity of real world objects and phenomena into what are called features. Features are values that usefully characterize the things we wish to classify. For our moth example, we’re going to use two features: “wingspan” and “mass”. 
In order to train our machine learning classifier to make good predictions, we’re going to need training data. To get that, we’d send an entomologist out into a forest to collect data for both luna and emperor moths. These experts can recognize different moths, so they not only record the feature values, but also label that data with the actual moth species. This is called labeled data. Because we only have two features, it’s easy to visualize this data in a scatterplot. Here, I’ve plotted data for 100 Emperor Moths in red and 100 Luna Moths in blue. We can see that the species make two groupings, but…. there’s some overlap in the middle… so it’s not entirely obvious how to best separate the two. That’s what machine learning algorithms do – find optimal separations! I’m just going to eyeball it and say anything less than 45 millimeters in wingspan is likely to be an Emperor Moth. We can add another division that says additionally mass must be less than .75 in order for our guess to be Emperor Moth. These lines that chop up the decision space are called decision boundaries. If we look closely at our data, we can see that 86 emperor moths would correctly end up inside the emperor decision region, but 14 would end up incorrectly in luna moth territory. On the other hand, 82 luna moths would be correct, with 18 falling onto the wrong side. A table, like this, showing where a classifier gets things right and wrong is called a confusion matrix... which probably should have also been the title of the last two movies in the Matrix Trilogy! Notice that there’s no way for us to draw lines that give us 100% accuracy. If we lower our wingspan decision boundary, we misclassify more Emperor moths as Lunas. If we raise it, we misclassify more Luna moths. 
The job of machine learning algorithms, at a high level, is to maximize correct classifications while minimizing errors. On our training data, we get 168 moths correct, and 32 moths wrong, for an average classification accuracy of 84%. Now, using these decision boundaries, if we go out into the forest and encounter an unknown moth, we can measure its features and plot it onto our decision space. This is unlabeled data. Our decision boundaries offer a guess as to what species the moth is. In this case, we’d predict it’s a Luna Moth. This simple approach, of dividing the decision space up into boxes, can be represented by what’s called a decision tree, which would look like this pictorially or could be written in code using If-Statements, like this. A machine learning algorithm that produces decision trees needs to choose what features to divide on…and then for each of those features, what values to use for the division. Decision Trees are just one basic example of a machine learning technique. There are hundreds of algorithms in computer science literature today. And more are being published all the time. A few algorithms even use many decision trees working together to make a prediction. Computer scientists smugly call those Forests… because they contain lots of trees. There are also non-tree-based approaches, like Support Vector Machines, which essentially slice up the decision space using arbitrary lines. And these don’t have to be straight lines; they can be polynomials or some other fancy mathematical function. Like before, it’s the machine learning algorithm's job to figure out the best lines to provide the most accurate decision boundaries. So far, my examples have only had two features, which is easy enough for a human to figure out. If we add a third feature, let’s say, length of antennae, then our 2D lines become 3D planes, creating decision boundaries in three dimensions. These planes don’t have to be straight either.
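The moth example can be written out as a small sketch. The thresholds and confusion-matrix counts are the ones quoted in the transcript; the function name, the variable names and the two sample moths are hypothetical, and the transcript gives no unit for mass.

```python
# Thresholds taken from the transcript's eyeballed decision boundaries.
WINGSPAN_LIMIT_MM = 45.0
MASS_LIMIT = 0.75  # the transcript gives no unit for mass


def classify_moth(wingspan_mm: float, mass: float) -> str:
    """Two-level decision tree written as nested if-statements."""
    if wingspan_mm < WINGSPAN_LIMIT_MM:
        if mass < MASS_LIMIT:
            return "Emperor"
    return "Luna"


# Confusion matrix counts quoted in the transcript:
# outer keys are the true species, inner keys the predicted species.
confusion = {
    "Emperor": {"Emperor": 86, "Luna": 14},
    "Luna": {"Emperor": 18, "Luna": 82},
}

correct = sum(confusion[species][species] for species in confusion)
total = sum(sum(row.values()) for row in confusion.values())
accuracy = correct / total

print(classify_moth(40.0, 0.5))   # hypothetical moth inside both boundaries
print(classify_moth(60.0, 0.5))   # hypothetical moth with a larger wingspan
print(f"{correct} of {total} correct, accuracy = {accuracy:.0%}")
```

Here the boundaries are fixed by hand. A learning algorithm would instead choose the features and threshold values from the labeled data.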
Plus, a truly useful classifier would contend with many different moth species. Now I think you’d agree this is getting too complicated to figure out by hand… But even this is a very basic example – just three features and five moth species. We can still show it in this 3D scatter plot. Unfortunately, there’s no good way to visualize four features at once, or twenty features, let alone hundreds or even thousands of features. But that’s what many real-world machine learning problems face. Can YOU imagine trying to figure out the equation for a hyperplane rippling through a thousand-dimensional decision space? Probably not, but computers, with clever machine learning algorithms can… and they do, all day long, on computers at places like Google, Facebook, Microsoft and Amazon. Techniques like Decision Trees and Support Vector Machines are strongly rooted in the field of statistics, which has dealt with making confident decisions, using data, long before computers ever existed. There’s a very large class of widely used statistical machine learning techniques, but there are also some approaches with no origins in statistics. Most notable are artificial neural networks, which were inspired by neurons in our brains! For a primer of biological neurons, check out our three-part overview here, but basically neurons are cells that process and transmit messages using electrical and chemical signals. They take one or more inputs from other cells, process those signals, and then emit their own signal. These form into huge interconnected networks that are able to process complex information. Just like your brain watching this youtube video. Artificial Neurons are very similar. Each takes a series of inputs, combines them, and emits a signal. Rather than being electrical or chemical signals, artificial neurons take numbers in, and spit numbers out. They are organized into layers that are connected by links, forming a network of neurons, hence the name. 
Let’s return to our moth example to see how neural nets can be used for classification. Our first layer – the input layer – provides data from a single moth needing classification. Again, we’ll use mass and wingspan. At the other end, we have an output layer, with two neurons: one for Emperor Moth and another for Luna Moth. The most excited neuron will be our classification decision. In between, we have a hidden layer, that transforms our inputs into outputs, and does the hard work of classification. To see how this is done, let’s zoom into one neuron in the hidden layer. The first thing a neuron does is multiply each of its inputs by a specific weight, let’s say 2.8 for its first input, and .1 for its second input. Then, it sums these weighted inputs together, which, in this case, is a grand total of 9.74. The neuron then applies a bias to this result - in other words, it adds or subtracts a fixed value, for example, minus six, for a new value of 3.74. These bias and input weights are initially set to random values when a neural network is created. Then, an algorithm goes in, and starts tweaking all those values to train the neural network, using labeled data for training and testing. This happens over many interactions, gradually improving accuracy – a process very much like human learning. Finally, neurons have an activation function, also called a transfer function, that gets applied to the output, performing a final mathematical modification to the result. For example, limiting the value to a range from negative one to positive one, or setting any negative values to 0. We’ll use a linear transfer function that passes the value through unchanged, so 3.74 stays as 3.74. So for our example neuron, given the inputs .55 and 82, the output would be 3.74.
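The arithmetic for this single neuron can be checked with a short sketch. The weights, bias and inputs are the transcript's; the function names are hypothetical, and `relu` is the common name for the "set negative values to 0" activation, which the transcript describes without naming.

```python
def neuron_output(inputs, weights, bias, activation=lambda v: v):
    """Weighted sum of the inputs, plus a bias, passed through an activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum + bias)


def relu(v):
    """Sets any negative value to 0 and passes other values through."""
    return max(0.0, v)


inputs = [0.55, 82.0]   # the transcript's two inputs (presumably mass and wingspan)
weights = [2.8, 0.1]
bias = -6.0

linear = neuron_output(inputs, weights, bias)           # linear transfer function
rectified = neuron_output(inputs, weights, bias, relu)  # same value, since it is positive
print(round(linear, 2), round(rectified, 2))
```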
This is just one neuron, but this process of weighting, summing, biasing and applying an activation function is computed for all neurons in a layer, and the values propagate forward in the network, one layer at a time. In this example, the output neuron with the highest value is our decision: Luna Moth. Importantly, the hidden layer doesn’t have to be just one layer… it can be many layers deep. This is where the term deep learning comes from. Training these more complicated networks takes a lot more computation and data. Despite the fact that neural networks were invented over fifty years ago, deep neural nets have only been practical very recently, thanks to powerful processors, but even more so, wicked fast GPUs. So, thank you gamers for being so demanding about silky smooth framerates! A couple of years ago, Google and Facebook demonstrated deep neural nets that could find faces in photos as well as humans – and humans are really good at this! It was a huge milestone. Now deep neural nets are driving cars, translating human speech, diagnosing medical conditions and much more. These algorithms are very sophisticated, but it’s less clear if they should be described as “intelligent”. They can really only do one thing like classify moths, find faces, or translate languages. This type of AI is called Weak AI or Narrow AI. It’s only intelligent at specific tasks. But that doesn’t mean it’s not useful; I mean medical devices that can make diagnoses, and cars that can drive themselves are amazing! But do we need those computers to compose music and look up delicious recipes in their free time? Probably not. Although that would be kinda cool. Truly general-purpose AI, one as smart and well-rounded as a human, is called Strong AI. No one has demonstrated anything close to human-level artificial intelligence yet. 
Some argue it’s impossible, but many people point to the explosion of digitized knowledge – like Wikipedia articles, web pages, and Youtube videos – as the perfect kindling for Strong AI. Although you can only watch a maximum of 24 hours of youtube a day, a computer can watch millions of hours. For example, IBM’s Watson consults and synthesizes information from 200 million pages of content, including the full text of Wikipedia. While not a Strong AI, Watson is pretty smart, and it crushed its human competition in Jeopardy way back in 2011. Not only can AIs gobble up huge volumes of information, but they can also learn over time, often much faster than humans. In 2016, Google debuted AlphaGo, a Narrow AI that plays the fiendishly complicated board game Go. One of the ways it got so good and able to beat the very best human players, was by playing clones of itself millions and millions of times. It learned what worked and what didn’t, and along the way, discovered successful strategies all by itself. This is called Reinforcement Learning, and it’s a super powerful approach. In fact, it’s very similar to how humans learn. People don’t just magically acquire the ability to walk... it takes thousands of hours of trial and error to figure it out. Computers are now on the cusp of learning by trial and error, and for many narrow problems, reinforcement learning is already widely used. What will be interesting to see, is if these types of learning techniques can be applied more broadly, to create human-like, Strong AIs that learn much like how kids learn, but at super accelerated rates. If that happens, there are some pretty big changes in store for humanity – a topic we’ll revisit later. Thanks for watching. See you next week.

## Overview

Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."[10] This definition of the tasks with which machine learning is concerned is fundamentally operational rather than cognitive. It follows Alan Turing's proposal in his paper "Computing Machinery and Intelligence", in which the question "Can machines think?" is replaced with the question "Can machines do what we (as thinking entities) can do?".[11] Turing's proposal sets out the various characteristics that a thinking machine could possess and the various implications of constructing one.

### Machine learning tasks

Machine learning tasks are typically classified into two broad categories, depending on whether there is a learning "signal" or "feedback" available to a learning system:

• Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. As special cases, the input signal can be only partially available, or restricted to special feedback:
  • Semi-supervised learning: the computer is given only an incomplete training signal: a training set with some (often many) of the target outputs missing.
  • Active learning: the computer can only obtain training labels for a limited set of instances (based on a budget), and also has to optimize its choice of objects to acquire labels for. When used interactively, these can be presented to the user for labeling.
  • Reinforcement learning: training data (in form of rewards and punishments) is given only as feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing a game against an opponent.[5]:3
• Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).

### Machine learning applications

[Figure: A support vector machine is a classifier that divides its input space into two regions, separated by a linear boundary. Here, it has learned to distinguish black and white circles.]

Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system:[5]:3

• In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more (multi-label classification) of these classes. This is typically tackled in a supervised way. Spam filtering is an example of classification, where the inputs are email (or other) messages and the classes are "spam" and "not spam".
• In regression, also a supervised problem, the outputs are continuous rather than discrete.
• In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task.
• Density estimation finds the distribution of inputs in some space.
• Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional space. Topic modeling is a related problem, where a program is given a list of human language documents and is tasked to find out which documents cover similar topics.

Among other categories of machine learning problems, learning to learn acquires its own inductive bias from previous experience. Developmental learning, elaborated for robot learning, generates its own sequences (also called a curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration and social interaction with human teachers, using guidance mechanisms such as active learning, maturation, motor synergies, and imitation.

## History and relationships to other fields

Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term "Machine Learning" in 1959 while at IBM.[12] As a scientific endeavour, machine learning grew out of the quest for artificial intelligence. Already in the early days of AI as an academic discipline, some researchers were interested in having machines learn from data. They attempted to approach the problem with various symbolic methods, as well as what were then termed "neural networks"; these were mostly perceptrons and other models that were later found to be reinventions of the generalized linear models of statistics.[13] Probabilistic reasoning was also employed, especially in automated medical diagnosis.[14]:488

However, an increasing emphasis on the logical, knowledge-based approach caused a rift between AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation.[14]:488 By 1980, expert systems had come to dominate AI, and statistics was out of favor.[15] Work on symbolic/knowledge-based learning did continue within AI, leading to inductive logic programming, but the more statistical line of research was now outside the field of AI proper, in pattern recognition and information retrieval.[14]:708–710; 755 Neural networks research had been abandoned by AI and computer science around the same time. This line, too, was continued outside the AI/CS field, as "connectionism", by researchers from other disciplines including Hopfield, Rumelhart and Hinton. Their main success came in the mid-1980s with the reinvention of backpropagation.[14]:25

Machine learning, reorganized as a separate field, started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and models borrowed from statistics and probability theory.[15] It also benefited from the increasing availability of digitized information, and the ability to distribute it via the Internet.

Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.

Machine learning also has intimate ties to optimization: many learning problems are formulated as minimization of some loss function on a training set of examples. Loss functions express the discrepancy between the predictions of the model being trained and the actual problem instances (for example, in classification, one wants to assign a label to instances, and models are trained to correctly predict the pre-assigned labels of a set of examples). The difference between the two fields arises from the goal of generalization: while optimization algorithms can minimize the loss on a training set, machine learning is concerned with minimizing the loss on unseen samples.[16]
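The distinction between training loss and loss on unseen samples can be sketched as follows. The one-parameter model, the squared-error loss, the toy data and the learning rate are all illustrative assumptions, not something the text above prescribes.

```python
def mse(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def mse_gradient(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


# Hypothetical toy data, roughly y = 2x with small deviations.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
held_out = [(4.0, 8.1), (5.0, 9.8)]

w = 0.0
learning_rate = 0.05
for _ in range(200):
    # Gradient descent: the optimizer only ever sees the training set.
    w -= learning_rate * mse_gradient(w, train)

print(f"w = {w:.3f}")
print(f"training loss = {mse(w, train):.4f}")
print(f"held-out loss = {mse(w, held_out):.4f}")  # loss on samples not used for fitting
```

The optimizer drives down only the first loss. The second is the one that reflects generalization.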

### Relation to statistics

Machine learning and statistics are closely related fields. According to Michael I. Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.[17] He also suggested the term data science as a placeholder to call the overall field.[17]

Leo Breiman distinguished two statistical modelling paradigms: data model and algorithmic model,[18] where "algorithmic model" refers, more or less, to machine learning algorithms such as random forests.

Some statisticians have adopted methods from machine learning, leading to a combined field that they call statistical learning.[19]

## Theory

A core objective of a learner is to generalize from its experience.[5][20] Generalization in this context is the ability of a learning machine to perform accurately on new, unseen examples/tasks after having experienced a learning data set. The training examples come from some generally unknown probability distribution (considered representative of the space of occurrences) and the learner has to build a general model about this space that enables it to produce sufficiently accurate predictions in new cases.

The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory. Because training sets are finite and the future is uncertain, learning theory usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance are quite common. The bias–variance decomposition is one way to quantify generalization error.

For the best performance in the context of generalization, the complexity of the hypothesis should match the complexity of the function underlying the data. If the hypothesis is less complex than the function, then the model has underfit the data. If the complexity of the model is increased in response, then the training error decreases. But if the hypothesis is too complex, then the model is subject to overfitting and generalization will be poorer.[21]
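A rough way to see this trade-off is to vary the flexibility of a simple model and compare training error with error on fresh samples. The sketch below swaps in k-nearest-neighbour regression as a convenient stand-in (the text names no particular model); the sine target, noise level, sample sizes and seed are arbitrary assumptions.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(train, data, k):
    """Error of the k-NN model built from `train`, measured on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


random.seed(0)  # fixed seed so the run is repeatable


def sample(n):
    """Noisy observations of sin(x) on [0, 2*pi]; an arbitrary illustrative target."""
    xs = [random.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + random.gauss(0.0, 0.3)) for x in xs]


train, test = sample(40), sample(40)

# Small k gives a very flexible hypothesis, large k a very rigid one.
for k in (1, 5, len(train)):
    train_error = mean_squared_error(train, train, k)
    test_error = mean_squared_error(train, test, k)
    print(k, round(train_error, 3), round(test_error, 3))
```

With k = 1 the model memorizes the training set, so its training error is zero by construction. With k equal to the training set size it predicts the overall mean everywhere. Comparing the two printed columns across k shows how training error alone can mislead.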

In addition to performance bounds, computational learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results. Positive results show that a certain class of functions can be learned in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.

## Approaches

### Decision tree learning

Decision tree learning uses a decision tree as a predictive model, which maps observations about an item to conclusions about the item's target value.
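As a minimal sketch of the idea, the code below learns a one-level tree (a "decision stump") on a single numeric feature by trying each midpoint between observed values and keeping the split with the fewest misclassifications. The data, labels and function names are hypothetical, and practical tree learners typically use impurity measures and recurse on each side.

```python
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def best_split(samples):
    """Return (errors, threshold, label_below, label_above) for the best single split.

    samples: list of (feature_value, label) pairs.
    """
    values = sorted({value for value, _ in samples})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2  # candidate: midpoint between neighbouring values
        left = [label for value, label in samples if value < threshold]
        right = [label for value, label in samples if value >= threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best


# Hypothetical wingspan measurements in millimetres.
samples = [(36.0, "Emperor"), (47.0, "Luna"), (40.0, "Emperor"),
           (54.0, "Luna"), (43.0, "Emperor"), (50.0, "Luna")]
print(best_split(samples))
```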

### Association rule learning

Association rule learning is a method for discovering interesting relations between variables in large databases.
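Two measures commonly used to judge how "interesting" a rule is are support and confidence. The text above does not name them, so treat this as one conventional formulation; the shopping-basket transactions are invented for illustration.

```python
# Hypothetical shopping baskets; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)


# Rule under test: {bread} -> {milk}
print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
```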

### Artificial neural networks

An artificial neural network (ANN) learning algorithm, usually called "neural network" (NN), is a learning algorithm that is vaguely inspired by biological neural networks. Computations are structured in terms of an interconnected group of artificial neurons, processing information using a connectionist approach to computation. Modern neural networks are non-linear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables.

#### Deep learning

Falling hardware prices and the development of GPUs for personal use in the last few years have contributed to the development of deep learning, which uses multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.[22]

### Inductive logic programming

Inductive logic programming (ILP) is an approach to rule learning using logic programming as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples. Inductive programming is a related field that considers any kind of programming language for representing hypotheses (and not only logic programming), such as functional programs.

### Support vector machines

Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.

### Clustering

Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different clusters are dissimilar. Different clustering techniques make different assumptions on the structure of the data, often defined by some similarity metric and evaluated for example by internal compactness (similarity between members of the same cluster) and separation between different clusters. Other methods are based on estimated density and graph connectivity. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.
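One widely used clustering technique is k-means, shown here as a one-dimensional sketch of Lloyd's algorithm. The choice of k-means, the distance as similarity criterion, the data and the starting centers are all assumptions made for illustration; the text above covers clustering in general.

```python
def k_means_1d(points, centers, iterations=10):
    """Lloyd's algorithm on one-dimensional data with given starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, clusters


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]   # hypothetical observations
centers, clusters = k_means_1d(points, centers=[0.0, 6.0])
print(centers, clusters)
```

No labels are supplied at any point. The two groups emerge only from the distances between observations.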

### Bayesian networks

A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning.
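For the smallest possible network, a single disease node pointing to a single symptom node, the inference step reduces to Bayes' rule. The probabilities below are invented for illustration, and the function name is hypothetical.

```python
def posterior_disease(prior, p_symptom_given_disease, p_symptom_given_healthy):
    """P(disease | symptom) for a two-node network Disease -> Symptom, via Bayes' rule."""
    joint_disease = prior * p_symptom_given_disease
    joint_healthy = (1.0 - prior) * p_symptom_given_healthy
    return joint_disease / (joint_disease + joint_healthy)


# Hypothetical numbers: a rare disease and a symptom that also occurs without it.
p = posterior_disease(prior=0.01,
                      p_symptom_given_disease=0.9,
                      p_symptom_given_healthy=0.05)
print(round(p, 4))
```

Larger networks need dedicated inference algorithms, but each of them computes conditional probabilities of this kind.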

### Reinforcement learning

Reinforcement learning is concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.
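A minimal sketch of one such algorithm, tabular Q-learning, on a hypothetical four-state corridor. The environment, reward, learning rate and discount factor are all assumptions, and for repeatability the loop sweeps over every state-action pair instead of exploring randomly as an agent normally would.

```python
GOAL = 3                      # terminal state of a 4-state corridor: 0, 1, 2, 3
ACTIONS = {"left": -1, "right": +1}
ALPHA, GAMMA = 0.5, 0.9       # learning rate and discount factor (arbitrary choices)


def step(state, action):
    """Deterministic environment: move along the corridor, reward 1 on reaching the goal."""
    next_state = min(max(state + ACTIONS[action], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


# Q-table: estimated long-term reward for each (state, action) pair.
q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}

for _ in range(200):
    # Simplification: try every state-action pair in turn instead of exploring.
    for s in range(GOAL):
        for a in ACTIONS:
            s2, r = step(s, a)
            future = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * future - q[(s, a)])

# Greedy policy: in each state, pick the action with the highest estimate.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

The agent is never told that "right" is the correct action. It only observes rewards, and the policy follows from the learned estimates.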

### Representation learning

Several learning algorithms, mostly unsupervised learning algorithms, aim at discovering better representations of the inputs provided during training. Classical examples include principal components analysis and cluster analysis. Representation learning algorithms often attempt to preserve the information in their input while transforming it into a more useful form, often as a pre-processing step before classification or prediction. The learned representation allows reconstruction of inputs coming from the unknown data-generating distribution, but need not be faithful for configurations that are implausible under that distribution.

Manifold learning algorithms attempt to do so under the constraint that the learned representation is low-dimensional. Sparse coding algorithms attempt to do so under the constraint that the learned representation is sparse (has many zeros). Multilinear subspace learning algorithms aim to learn low-dimensional representations directly from tensor representations for multidimensional data, without reshaping them into (high-dimensional) vectors.[23] Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.[24]

### Similarity and metric learning

In this problem, the learning machine is given pairs of examples that are considered similar and pairs of less similar objects. It then needs to learn a similarity function (or a distance metric function) that can predict whether new objects are similar. It is sometimes used in recommendation systems.

### Sparse dictionary learning

In this method, a datum is represented as a linear combination of basis functions, and the coefficients are assumed to be sparse. Let x be a d-dimensional datum, D be a d by n matrix, where each column of D represents a basis function. r is the coefficient to represent x using D. Mathematically, sparse dictionary learning means solving ${\displaystyle x\approx Dr}$ where r is sparse. Generally speaking, n is assumed to be larger than d to allow the freedom for a sparse representation.
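The sketch below illustrates only the sparse coding half of the problem: given a fixed dictionary D, it finds a sparse r with x ≈ Dr using matching pursuit, a simple greedy method. It does not learn D, and the example dictionary, the function names and the choice of matching pursuit are assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(x, atoms, n_nonzero):
    """Greedy sparse coding of x over unit-norm atoms (the columns of D).

    Returns the coefficient vector r and the residual x - D r.
    """
    r = [0.0] * len(atoms)
    residual = list(x)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with what is still unexplained.
        scores = [dot(residual, atom) for atom in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        r[j] += scores[j]
        residual = [res - scores[j] * a for res, a in zip(residual, atoms[j])]
    return r, residual


# Hypothetical overcomplete dictionary: d = 2 dimensions, n = 3 atoms.
S = 1 / math.sqrt(2)
EXAMPLE_ATOMS = [(1.0, 0.0), (0.0, 1.0), (S, S)]
```

Calling `matching_pursuit((3.0, 3.0), EXAMPLE_ATOMS, 1)` explains the datum with the single diagonal atom, whereas the two axis-aligned atoms alone would need two non-zero coefficients. This is the kind of freedom that choosing n larger than d is meant to provide.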

Learning a dictionary along with sparse representations is strongly NP-hard and also difficult to solve approximately.[25] A popular heuristic method for sparse dictionary learning is K-SVD.

Sparse dictionary learning has been applied in several contexts. In classification, the problem is to determine which classes a previously unseen datum belongs to. Suppose a dictionary for each class has already been built. Then a new datum is associated with the class whose dictionary gives it the best sparse representation. Sparse dictionary learning has also been applied in image de-noising. The key idea is that a clean image patch can be sparsely represented by an image dictionary, but the noise cannot.[26]

### Genetic algorithms

A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection, and uses methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms found some uses in the 1980s and 1990s.[27][28] Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[29]
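A minimal sketch of these ingredients follows, using bit strings as genotypes and the toy "OneMax" objective (count the 1-bits). The objective, tournament selection, elitism and all parameter values are assumptions picked to keep the example short; they are not prescribed by the text above.

```python
import random


def fitness(genotype):
    """OneMax: count the 1-bits (a toy objective chosen for illustration)."""
    return sum(genotype)


def tournament(population, rng):
    """Selection: pick two genotypes at random and keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(n_bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = [max(map(fitness, population))]   # best fitness per generation
    for _ in range(generations):
        # Elitism: carry the current best genotype over unchanged.
        next_population = [max(population, key=fitness)]
        while len(next_population) < pop_size:
            mother = tournament(population, rng)
            father = tournament(population, rng)
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                # bit-flip mutation
            next_population.append(child)
        population = next_population
        history.append(max(map(fitness, population)))
    return max(population, key=fitness), history
```

`evolve()` returns the best genotype found together with the best fitness seen in each generation. Because of the elitism step, that history never decreases, although nothing guarantees that the optimum is reached.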

### Rule-based machine learning

Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves "rules" to store, manipulate, or apply knowledge. The defining characteristic of a rule-based machine learner is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system. This is in contrast to other machine learners that commonly identify a singular model that can be universally applied to any instance in order to make a prediction.[30] Rule-based machine learning approaches include learning classifier systems, association rule learning, and artificial immune systems.

#### Learning classifier systems

Learning classifier systems (LCS) are a family of rule-based machine learning algorithms that combine a discovery component (typically a genetic algorithm) with a learning component (performing either supervised learning, reinforcement learning, or unsupervised learning). They seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions.[31]

## Applications

Applications for machine learning include:

In 2006, the online movie company Netflix held the first "Netflix Prize" competition to find a program to better predict user preferences and improve the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%. A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for \$1 million.[37] Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ("everything is a recommendation") and changed its recommendation engine accordingly.[38]

In 2010, The Wall Street Journal wrote about the firm Rebellion Research and its use of machine learning to predict the financial crisis.[39]

In 2012, Vinod Khosla, co-founder of Sun Microsystems, predicted that 80% of medical doctors' jobs would be lost in the next two decades to automated machine learning medical diagnostic software.[40]

In 2014, it was reported that a machine learning algorithm had been applied in art history to study fine art paintings, and that it may have revealed previously unrecognized influences between artists.[41]

## Limitations

Although machine learning has been transformative in some fields, effective machine learning is difficult because finding patterns is hard and often not enough training data are available; as a result, many machine-learning programs fail to deliver the expected value.[42][43][44] Reasons for this are numerous: lack of (suitable) data, lack of access to the data, data bias, privacy problems, badly chosen tasks and algorithms, wrong tools and people, lack of resources, and evaluation problems.[45]

Machine learning approaches in particular can suffer from different data biases. A machine learning system trained only on current customers may not be able to predict the needs of new customer groups that are not represented in the training data. When trained on man-made data, machine learning is likely to pick up the same constitutional and unconscious biases already present in society.[46] Language models learned from data have been shown to contain human-like biases.[47][48] Machine learning systems used for criminal risk assessment have been found to be biased against black people.[49][50] In 2015, Google Photos would often tag black people as gorillas,[51] and in 2018 this was still not well resolved: Google reportedly was still using the workaround of removing all gorillas from the training data, and thus was not able to recognize real gorillas at all.[52] Similar issues with recognizing non-white people have been found in many other systems.[53] In 2016, Microsoft tested a chatbot that learned from Twitter, and it quickly picked up racist and sexist language.[54] Because of such challenges, the effective use of machine learning may take longer to be adopted in other domains.[55] In 2018, a self-driving car from Uber failed to detect a pedestrian, who was killed in the accident.[56] Attempts to use machine learning in healthcare with the IBM Watson system failed to deliver even after years of time and billions in investment.[57][58]

## Model assessments

Classification machine learning models can be validated by accuracy estimation techniques like the holdout method, which splits the data into a training set and a test set (conventionally 2/3 training and 1/3 test) and evaluates the performance of the trained model on the test set. In comparison, the k-fold cross-validation method randomly splits the data into k subsets, where k-1 subsets are used to train the model while the remaining subset is used to test the predictive ability of the trained model, with each subset serving once as the test set. In addition to the holdout and cross-validation methods, bootstrap, which samples n instances with replacement from the dataset, can be used to assess model accuracy.[59]
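The three resampling schemes differ only in how instance indices are assigned to training and test sets, which the following sketch makes explicit. It works on indices rather than on a real dataset or model; the function names, the default seed and the use of out-of-bag instances as a bootstrap test set are assumptions of this example.

```python
import random


def holdout_split(n, test_fraction=1 / 3, seed=0):
    """Shuffle indices 0..n-1 once and cut them into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_splits(n, k, seed=0):
    """Yield k (train, test) pairs; each fold is the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def bootstrap_sample(n, seed=0):
    """Draw n indices with replacement; unused indices form a test set."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(sample))
    return sample, out_of_bag
```

A typical use would train a model on the `train` indices of each pair, score it on the matching `test` indices, and average the scores over the k folds or over many bootstrap samples.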

In addition to overall accuracy, investigators frequently report sensitivity and specificity, meaning True Positive Rate (TPR) and True Negative Rate (TNR), respectively. Similarly, investigators sometimes report the False Positive Rate (FPR) as well as the False Negative Rate (FNR). However, these rates are ratios that fail to reveal their numerators and denominators. The Total Operating Characteristic (TOC) is an effective method to express a model's diagnostic ability. TOC shows the numerators and denominators of the previously mentioned rates; thus TOC provides more information than the commonly used Receiver Operating Characteristic (ROC) and ROC's associated Area Under the Curve (AUC).[60]
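The four rates follow directly from the counts in a binary confusion matrix, as the sketch below shows. The function names and the example labels are hypothetical, and the code assumes that both classes occur in the data so that no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn


def rates(tp, tn, fp, fn):
    """Sensitivity, specificity and their complements."""
    return {
        "TPR": tp / (tp + fn),   # sensitivity
        "TNR": tn / (tn + fp),   # specificity
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }
```

For instance, the counts `(tp=3, tn=5, fp=1, fn=1)` and `(tp=30, tn=50, fp=10, fn=10)` give identical rates even though the second set describes ten times as many instances. This is the loss of numerator and denominator information that the paragraph above refers to.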

## Ethics

Machine learning poses a host of ethical questions. Systems which are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices.[61] For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.[62][63] Responsible collection of data and documentation of the algorithmic rules used by a system is thus a critical part of machine learning.

Because language contains biases, machines trained on language corpora will necessarily also learn bias.[64]

Other forms of ethical challenges, not related to personal biases, are seen more often in health care. There are concerns among health care professionals that these systems might not be designed in the public's interest, but as income-generating machines. This is especially true in the United States, where there is a perpetual ethical dilemma of improving health care while also increasing profits. For example, the algorithms could be designed to provide patients with unnecessary tests or medication in which the algorithm's proprietary owners hold stakes. There is huge potential for machine learning in health care to provide professionals with a great tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously and these "greed" biases are addressed.[65]

## Software

Software suites containing a variety of machine learning algorithms include the following:

## References

1. ^ a b Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development. 3 (3): 210–229. CiteSeerX . doi:10.1147/rd.33.0210.
2. ^ The "without being explicitly programmed" definition is often attributed to Arthur Samuel, who coined the term "machine learning" in 1959.[1] But the phrase is not found literally in this publication, and may be a paraphrase that appeared later. Confer "Paraphrasing Arthur Samuel (1959), the question is: How can computers learn to solve problems without being explicitly programmed?" in Koza, John R.; Bennett, Forrest H.; Andre, David; Keane, Martin A. (1996). Automated Design of Both the Topology and Sizing of Analog Electrical Circuits Using Genetic Programming. Artificial Intelligence in Design '96. Springer, Dordrecht. pp. 151–170. doi:10.1007/978-94-009-0279-4_9.
3. ^ http://www.britannica.com/EBchecked/topic/1116194/machine-learning  This tertiary source reuses information from other sources but does not name them.
4. ^ Ron Kohavi; Foster Provost (1998). "Glossary of terms". Machine Learning. 30: 271–274.
5. Bishop, C. M. (2006), Pattern Recognition and Machine Learning, Springer, ISBN 978-0-387-31073-2
6. ^ Mannila, Heikki (1996). Data mining: machine learning, statistics, and databases. Int'l Conf. Scientific and Statistical Database Management. IEEE Computer Society.
7. ^ Machine learning and pattern recognition "can be viewed as two facets of the same field."[5]:vii
8. ^ Friedman, Jerome H. (1998). "Data Mining and Statistics: What's the connection?". Computing Science and Statistics. 29 (1): 3–9.
9. ^ "Machine Learning: What it is and why it matters". www.sas.com. Retrieved 2016-03-29.
10. ^ Mitchell, T. (1997). Machine Learning. McGraw Hill. p. 2. ISBN 978-0-07-042807-2.
11. ^ Harnad, Stevan (2008), "The Annotation Game: On Turing (1950) on Computing, Machinery, and Intelligence", in Epstein, Robert; Peters, Grace, The Turing Test Sourcebook: Philosophical and Methodological Issues in the Quest for the Thinking Computer, Kluwer
12. ^ R. Kohavi and F. Provost, "Glossary of terms," Machine Learning, vol. 30, no. 2–3, pp. 271–274, 1998.
13. ^ Sarle, Warren. "Neural Networks and statistical models". CiteSeerX .
14. ^ a b c d Russell, Stuart; Norvig, Peter (2003) [1995]. Artificial Intelligence: A Modern Approach (2nd ed.). Prentice Hall. ISBN 978-0137903955.
15. ^ a b Langley, Pat (2011). "The changing science of machine learning". Machine Learning. 82 (3): 275–279. doi:10.1007/s10994-011-5242-y.
16. ^ Le Roux, Nicolas; Bengio, Yoshua; Fitzgibbon, Andrew (2012). "Improving First and Second-Order Methods by Modeling Uncertainty". In Sra, Suvrit; Nowozin, Sebastian; Wright, Stephen J. Optimization for Machine Learning. MIT Press. p. 404.
17. ^ a b Michael I. Jordan (2014-09-10). "statistics and machine learning". reddit. Retrieved 2014-10-01.
18. ^ Cornell University Library. "Breiman: Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)". Retrieved 8 August 2015.
19. ^ Gareth James; Daniela Witten; Trevor Hastie; Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer. p. vii.
20. ^ Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012). Foundations of Machine Learning. USA, Massachusetts: MIT Press. ISBN 9780262018258.
21. ^ a b Alpaydin, Ethem (2010). Introduction to Machine Learning. London: The MIT Press. ISBN 978-0-262-01243-0. Retrieved 4 February 2017.
22. ^ Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations" Proceedings of the 26th Annual International Conference on Machine Learning, 2009.
23. ^ Lu, Haiping; Plataniotis, K.N.; Venetsanopoulos, A.N. (2011). "A Survey of Multilinear Subspace Learning for Tensor Data" (PDF). Pattern Recognition. 44 (7): 1540–1551. doi:10.1016/j.patcog.2011.01.004.
24. ^ Yoshua Bengio (2009). Learning Deep Architectures for AI. Now Publishers Inc. pp. 1–3. ISBN 978-1-60198-294-0.
25. ^ Tillmann, A. M. (2015). "On the Computational Intractability of Exact and Approximate Dictionary Learning". IEEE Signal Processing Letters. 22 (1): 45–49. arXiv:. Bibcode:2015ISPL...22...45T. doi:10.1109/LSP.2014.2345761.
26. ^ Aharon, M, M Elad, and A Bruckstein. 2006. "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation." Signal Processing, IEEE Transactions on 54 (11): 4311–4322
27. ^ Goldberg, David E.; Holland, John H. (1988). "Genetic algorithms and machine learning". Machine Learning. 3 (2): 95–99. doi:10.1007/bf00113892.
28. ^ Michie, D.; Spiegelhalter, D. J.; Taylor, C. C. (1994). Machine Learning, Neural and Statistical Classification. Ellis Horwood.
29. ^ Zhang, Jun; Zhan, Zhi-hui; Lin, Ying; Chen, Ni; Gong, Yue-jiao; Zhong, Jing-hui; Chung, Henry S.H.; Li, Yun; Shi, Yu-hui (2011). "Evolutionary Computation Meets Machine Learning: A Survey" (PDF). Computational Intelligence Magazine. 6 (4): 68–75. doi:10.1109/mci.2011.942584.
30. ^ Bassel, George W.; Glaab, Enrico; Marquez, Julietta; Holdsworth, Michael J.; Bacardit, Jaume (2011-09-01). "Functional Network Construction in Arabidopsis Using Rule-Based Machine Learning on Large-Scale Data Sets". The Plant Cell. 23 (9): 3101–3116. doi:10.1105/tpc.111.088153. ISSN 1532-298X. PMC . PMID 21896882.
31. ^ Urbanowicz, Ryan J.; Moore, Jason H. (2009-09-22). "Learning Classifier Systems: A Complete Introduction, Review, and Roadmap". Journal of Artificial Evolution and Applications. 2009: 1–25. doi:10.1155/2009/736398. ISSN 1687-6229.
32. ^ Bridge, James P., Sean B. Holden, and Lawrence C. Paulson. "Machine learning for first-order theorem proving." Journal of automated reasoning 53.2 (2014): 141–172.
33. ^ Loos, Sarah, et al. "Deep Network Guided Proof Search." arXiv preprint arXiv:1701.06972 (2017).
34. ^ Finnsson, Hilmar, and Yngvi Björnsson. "Simulation-Based Approach to General Game Playing." AAAI. Vol. 8. 2008.
35. ^ Sarikaya, Ruhi, Geoffrey E. Hinton, and Anoop Deoras. "Application of deep belief networks for natural language understanding." IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 22.4 (2014): 778–784.
36. ^ "AI-based translation to soon reach human levels: industry officials". Yonhap news agency. Retrieved 4 Mar 2017.
37. ^ "BelKor Home Page" research.att.com
38. ^ "The Netflix Tech Blog: Netflix Recommendations: Beyond the 5 stars (Part 1)". Retrieved 8 August 2015.
39. ^ Scott Patterson (13 July 2010). "Letting the Machines Decide". The Wall Street Journal. Retrieved 24 June 2018.
40. ^ Vinod Khosla (January 10, 2012). "Do We Need Doctors or Algorithms?". Tech Crunch.
41. ^
42. ^ "Why Machine Learning Models Often Fail to Learn: QuickTake Q&A". Bloomberg.com. 2016-11-10. Retrieved 2017-04-10.
43. ^ "The First Wave of Corporate AI Is Doomed to Fail". Harvard Business Review. 2017-04-18. Retrieved 2018-08-20.
44. ^ "Why the A.I. euphoria is doomed to fail". VentureBeat. 2016-09-18. Retrieved 2018-08-20.
45. ^ "9 Reasons why your machine learning project will fail". www.kdnuggets.com. Retrieved 2018-08-20.
46. ^ Garcia, Megan (2016). "Racist in the Machine". World Policy Journal. 33 (4): 111–117. doi:10.1215/07402775-3813015. ISSN 0740-2775.
47. ^ Caliskan, Aylin; Bryson, Joanna J.; Narayanan, Arvind (2017-04-14). "Semantics derived automatically from language corpora contain human-like biases". Science. 356 (6334): 183–186. doi:10.1126/science.aal4230. ISSN 0036-8075. PMID 28408601.
48. ^ Wang, Xinan; Dasgupta, Sanjoy (2016), Lee, D. D.; Sugiyama, M.; Luxburg, U. V.; Guyon, I., eds., "An algorithm for L1 nearest neighbor search via monotonic embedding" (PDF), Advances in Neural Information Processing Systems 29, Curran Associates, Inc., pp. 983–991, retrieved 2018-08-20
49. ^ "Machine Bias". ProPublica. Julia Angwin, Jeff Larson, Lauren Kirchner, Surya Mattu. 2016-05-23. Retrieved 2018-08-20.
50. ^ "Opinion | When an Algorithm Helps Send You to Prison". New York Times. Retrieved 2018-08-20.
51. ^ "Google apologises for racist blunder". BBC News. 2015-07-01. Retrieved 2018-08-20.
52. ^ "Google 'fixed' its racist algorithm by removing gorillas from its image-labeling tech". The Verge. Retrieved 2018-08-20.
53. ^ "Opinion | Artificial Intelligence's White Guy Problem". New York Times. Retrieved 2018-08-20.
54. ^ Metz, Rachel. "Why Microsoft's teen chatbot, Tay, said lots of awful things online". MIT Technology Review. Retrieved 2018-08-20.
55. ^ Simonite, Tom. "Microsoft says its racist chatbot illustrates how AI isn't adaptable enough to help most businesses". MIT Technology Review. Retrieved 2018-08-20.
56. ^ "Why Uber's self-driving car killed a pedestrian". The Economist. Retrieved 2018-08-20.
57. ^ "IBM's Watson recommended 'unsafe and incorrect' cancer treatments - STAT". STAT. 2018-07-25. Retrieved 2018-08-21.
58. ^ Hernandez, Daniela; Greenwald, Ted (2018-08-11). "IBM Has a Watson Dilemma". Wall Street Journal. ISSN 0099-9660. Retrieved 2018-08-21.
59. ^ Kohavi, Ron (1995). "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection" (PDF). International Joint Conference on Artificial Intelligence.
60. ^ Pontius, Robert Gilmore; Si, Kangping (2014). "The total operating characteristic to measure diagnostic ability for multiple thresholds". International Journal of Geographical Information Science. 28 (3): 570–583. doi:10.1080/13658816.2013.862623.
61. ^ Bostrom, Nick (2011). "The Ethics of Artificial Intelligence" (PDF). Retrieved 11 April 2016.
62. ^ Edionwe, Tolulope. "The fight against racist algorithms". The Outline. Retrieved 17 November 2017.
63. ^ Jeffries, Adrianne. "Machine learning is racist because the internet is racist". The Outline. Retrieved 17 November 2017.
64. ^ Narayanan, Arvind (August 24, 2016). "Language necessarily contains human biases, and so will machines trained on language corpora". Freedom to Tinker.
65. ^ Char, D. S.; Shah, N. H.; Magnus, D. (2018). "Implementing Machine Learning in Health Care—Addressing Ethical Challenges". New England Journal of Medicine. 378 (11): 981–983. doi:10.1056/nejmp1714229.

## Further reading
