The purpose of this project is to compare classification algorithms on a lung cancer dataset.
The lung cancer dataset used in the project was collected from data.world and is available at:
https://data.world/sta427ceyin/survey-lung-cancer
We selected the following 10 classification algorithms for this project:
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Decision Tree
- Support Vector Machines (SVM)
- Naive Bayes
- Random Forest
- Gradient Boosting
- Neural Networks
- AdaBoost
- XGBoost
We then built a model for each of the algorithms listed above and compared them using the following evaluation metrics:
- Accuracy
- Precision
- F1 Score
- Recall Score
- Confusion Matrix
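
The first four metrics can all be derived from the four cells of a binary confusion matrix. The sketch below is a minimal illustration of those standard definitions, not the project's own code: the function name `classification_metrics` and the example counts are hypothetical and chosen only for demonstration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative and
    true-negative counts of a binary classifier.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, for illustration only (not taken from this project).
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

In practice a library such as scikit-learn would likely compute these values directly from the true and predicted labels; the hand-written version is shown only to make the definitions explicit.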
These are the accuracies of the algorithms:
- Logistic Regression: 90.29%
- K-Nearest Neighbors (KNN): 87.37%
- Decision Tree: 87.37%
- Support Vector Machines (SVM): 84.46%
- Naive Bayes: 86.4%
- Random Forest: 89.32%
- Gradient Boosting: 89.32%
- Neural Networks: 84.46%
- AdaBoost: 84.46%
- XGBoost: 84.46%
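
The accuracies above can be ranked with a few lines of Python. This is a minimal sketch that uses only the figures listed above; the helper name `rank_by_accuracy` is an assumption made for illustration and is not part of the project.

```python
# Accuracies (in percent) copied from the list above.
accuracies = {
    "Logistic Regression": 90.29,
    "K-Nearest Neighbors (KNN)": 87.37,
    "Decision Tree": 87.37,
    "Support Vector Machines (SVM)": 84.46,
    "Naive Bayes": 86.4,
    "Random Forest": 89.32,
    "Gradient Boosting": 89.32,
    "Neural Networks": 84.46,
    "AdaBoost": 84.46,
    "XGBoost": 84.46,
}


def rank_by_accuracy(scores):
    """Return (name, score) pairs sorted from highest to lowest score.

    Ties keep the order in which they appear in the input mapping.
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_accuracy(accuracies)
best_name, best_score = ranking[0]

for name, score in ranking:
    print(f"{name:<30} {score:.2f}%")
```

Ranking by accuracy alone is a simplification: several models are tied, so precision, recall and F1 may be needed to separate them.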
Of all the algorithms implemented, Logistic Regression achieved the highest accuracy. Its evaluation metrics are as follows:
- Accuracy: 0.9029126213592233
- Precision: 0.9052631578947369
- Recall: 0.9885057471264368
- F1 score: 0.945054945054945
- Confusion Matrix: