When you are new to machine learning, whether you start from a book or an online learning platform, the first topic you meet is supervised learning. So what is supervised learning? As the name suggests, the learning is "supervised": you give the computer examples with known answers so that it can pick up a rule, let it learn from many such examples, and finally use that rule to make predictions or classifications.
Daily life is full of classification: we sort garbage, and we sort goods into good and bad. We assign attributes to everything in the world, and people are no exception: we speak of good people and bad people, of the rich and the poor. A single thing can carry several attributes.
Supervised learning is divided into classification and regression. This article briefly introduces what classification is, what kind of data suits it, and what types of classification there are.
Classification: it applies when the target column is discrete. Note that this refers to the target column, that is, the column the model is meant to predict. If that column holds discrete values, the problem is a classification problem.
Classification is divided into binary classification and multi-class classification.
Binary classification: the target column takes only two values. Classification problems are most often binary, for example predicting whether a tumor is benign or malignant, whether a product will sell successfully, or whether a sample passes inspection.
Multi-class classification: the target column takes more than two values. For example, a credit rating with the four levels A, B, C, and D is a multi-class problem. Multi-class algorithms are similar to binary ones and differ in the details.
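As a minimal sketch of the distinction above, the number of distinct values in the target column tells you which case you are in. The function name classification_type and the sample labels are made up for illustration.

```python
def classification_type(target_column):
    """Return 'binary' or 'multi-class' from the number of distinct labels."""
    n_classes = len(set(target_column))
    if n_classes < 2:
        raise ValueError("a classification target needs at least two distinct labels")
    return "binary" if n_classes == 2 else "multi-class"


# Hypothetical target columns, invented for this example
tumor_labels = ["benign", "malignant", "benign", "benign"]
credit_ratings = ["A", "C", "B", "D", "A", "B"]

print(classification_type(tumor_labels))    # binary
print(classification_type(credit_ratings))  # multi-class
```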
Definition of classification: classification analysis builds a classification model from training data whose categories are already known, and then uses that model to predict the category of unknown data objects.
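The definition above can be sketched with one of the simplest possible classifiers, a 1-nearest-neighbor rule: "training" stores the labeled examples, and prediction returns the label of the closest stored example. This is only an illustration, not a recommended method; the feature values and the labels "qualified" and "unqualified" are invented toy data.

```python
import math


def train(features, labels):
    """'Training' a 1-nearest-neighbor model just stores the labeled examples."""
    return list(zip(features, labels))


def predict(model, x):
    """Return the label of the stored example closest to x (Euclidean distance)."""
    _, nearest_label = min(model, key=lambda pair: math.dist(pair[0], x))
    return nearest_label


# Training set with known categories (toy values, two measurements per sample)
train_features = [(1.0, 1.2), (0.8, 1.0), (3.9, 4.1), (4.2, 3.8)]
train_labels = ["qualified", "qualified", "unqualified", "unqualified"]

model = train(train_features, train_labels)

# Predict the category of data objects the model has not seen
print(predict(model, (1.1, 0.9)))  # qualified
print(predict(model, (4.0, 4.0)))  # unqualified
```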
1. Pattern recognition: the study of the automatic processing and interpretation of patterns. The goal of pattern recognition is usually identification, that is, determining the pattern category of the sample under test.
2. Prediction: a generalized description of the given data is derived automatically from historical data records, so that the class of future data can be predicted.
Practical application cases
1. Behavior analysis
2. Item identification and image detection
3. E-mail classification (spam versus non-spam, and so on)
4. Classification of news articles, handwritten digit recognition, customer segmentation in personalized marketing, image and video scene classification, and so on
In the last article we covered the framework of machine learning and the related theory, that is, the steps that no complete model training process can do without.
Use the following criteria to compare classification and prediction methods:
Prediction accuracy: the ability of the model to correctly predict the class label of new data
Speed: the computational cost of building and using the model
Robustness: the ability of the model to predict correctly given noisy data or data with missing values
Scalability: the ability to build the model efficiently on large amounts of data
Interpretability: the level of understanding and insight the learned model provides
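Of these criteria, prediction accuracy is the easiest to compute: it is the fraction of samples whose predicted class matches the true class. The sketch below assumes plain Python lists of labels; the e-mail labels are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical true labels and model predictions for five e-mails
y_true = ["spam", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "ham"]

print(accuracy(y_true, y_pred))  # 0.8 (4 of 5 correct)
```

Accuracy alone can mislead when one class is much rarer than the other, which is one reason the other criteria in the list also matter.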
Common classification algorithms include:
Logistic regression (a regression algorithm by name, but it actually solves classification problems)
Decision trees (including the ID3, C4.5, and CART algorithms)
Neural networks
Bayesian classifiers
K-nearest neighbors (KNN)
Support vector machines (SVM)
These classification algorithms suit different use scenarios. The appropriate algorithm and model can only be chosen by evaluating them against the actual application.
Common applications of classification algorithms include: decision trees in medical diagnosis, loan risk assessment, and similar fields; neural networks in handwritten character recognition, speech recognition, and face recognition; and Bayesian methods in spam filtering and spelling correction.
Classification is also a common prediction problem, and the problems it solves closely resemble decisions we make in daily life. For example, we decide whether to go on a trip according to the weather: the weather conditions are the feature values (the independent variables), and whether we travel is the label value (the dependent variable). A classification algorithm automates, or semi-automates, this thinking process.
A typical application of classification in data mining is to categorize things systematically according to their characteristics as recorded in the data. The difference between classification and regression is that regression predicts continuous target variables, while classification predicts discrete target variables.
In programming terms, which language construct does classification most readily bring to mind? That's right, the answer is the conditional (if) statement.
This is also the underlying idea of classification. It is like a decision tree, where one condition leads to several branches.
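The weather-and-travel decision mentioned earlier can be written as a hand-coded set of conditional branches. This is only a sketch of the idea: the feature names outlook and windy and the rules themselves are invented here, whereas a decision tree algorithm would learn comparable branches from labeled data.

```python
def should_travel(outlook, windy):
    """Hand-written decision rules that mimic the branches of a small decision tree."""
    if outlook == "sunny":
        return "travel"
    elif outlook == "cloudy":
        # A second condition splits this branch again
        return "stay home" if windy else "travel"
    else:
        # Any other outlook, for example "rainy"
        return "stay home"


print(should_travel("sunny", windy=True))    # travel
print(should_travel("cloudy", windy=True))   # stay home
print(should_travel("rainy", windy=False))   # stay home
```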
In the next article we will introduce feature engineering in model training.
Do not envy others, and you will find peace; do not compare yourself with others, and you will eventually reach Qingyun.