Open Data Science World
Monday, 16 October 2023
Accidental Deaths in India
Thursday, 2 March 2023
Tuesday, 17 March 2020
Saturday, 2 March 2019
Car Safety Analysis
Car Safety Analysis¶
1) importing the dataset¶
In [68]:
import pandas
In [69]:
d = pandas.read_csv('C://Users//Admin//Documents//careval.csv',header='infer' )
In [70]:
d
Out[70]:
| Buying Price | Maintenance Cost | Number of Doors | Number of Persons | Lug Boot | Safety | |
|---|---|---|---|---|---|---|
| 0 | vhigh | vhigh | 2 | 2 | small | low |
| 1 | vhigh | vhigh | 2 | 2 | small | med |
| 2 | vhigh | vhigh | 2 | 2 | small | high |
| 3 | vhigh | vhigh | 2 | 2 | med | low |
| 4 | vhigh | vhigh | 2 | 2 | med | med |
| 5 | vhigh | vhigh | 2 | 2 | med | high |
| 6 | vhigh | vhigh | 2 | 2 | big | low |
| 7 | vhigh | vhigh | 2 | 2 | big | med |
| 8 | vhigh | vhigh | 2 | 2 | big | high |
| 9 | vhigh | vhigh | 2 | 4 | small | low |
| 10 | vhigh | vhigh | 2 | 4 | small | med |
| 11 | vhigh | vhigh | 2 | 4 | small | high |
| 12 | vhigh | vhigh | 2 | 4 | med | low |
| 13 | vhigh | vhigh | 2 | 4 | med | med |
| 14 | vhigh | vhigh | 2 | 4 | med | high |
| 15 | vhigh | vhigh | 2 | 4 | big | low |
| 16 | vhigh | vhigh | 2 | 4 | big | med |
| 17 | vhigh | vhigh | 2 | 4 | big | high |
| 18 | vhigh | vhigh | 2 | more | small | low |
| 19 | vhigh | vhigh | 2 | more | small | med |
| 20 | vhigh | vhigh | 2 | more | small | high |
| 21 | vhigh | vhigh | 2 | more | med | low |
| 22 | vhigh | vhigh | 2 | more | med | med |
| 23 | vhigh | vhigh | 2 | more | med | high |
| 24 | vhigh | vhigh | 2 | more | big | low |
| 25 | vhigh | vhigh | 2 | more | big | med |
| 26 | vhigh | vhigh | 2 | more | big | high |
| 27 | vhigh | vhigh | 3 | 2 | small | low |
| 28 | vhigh | vhigh | 3 | 2 | small | med |
| 29 | vhigh | vhigh | 3 | 2 | small | high |
| ... | ... | ... | ... | ... | ... | ... |
| 1698 | low | low | 4 | more | big | low |
| 1699 | low | low | 4 | more | big | med |
| 1700 | low | low | 4 | more | big | high |
| 1701 | low | low | 5more | 2 | small | low |
| 1702 | low | low | 5more | 2 | small | med |
| 1703 | low | low | 5more | 2 | small | high |
| 1704 | low | low | 5more | 2 | med | low |
| 1705 | low | low | 5more | 2 | med | med |
| 1706 | low | low | 5more | 2 | med | high |
| 1707 | low | low | 5more | 2 | big | low |
| 1708 | low | low | 5more | 2 | big | med |
| 1709 | low | low | 5more | 2 | big | high |
| 1710 | low | low | 5more | 4 | small | low |
| 1711 | low | low | 5more | 4 | small | med |
| 1712 | low | low | 5more | 4 | small | high |
| 1713 | low | low | 5more | 4 | med | low |
| 1714 | low | low | 5more | 4 | med | med |
| 1715 | low | low | 5more | 4 | med | high |
| 1716 | low | low | 5more | 4 | big | low |
| 1717 | low | low | 5more | 4 | big | med |
| 1718 | low | low | 5more | 4 | big | high |
| 1719 | low | low | 5more | more | small | low |
| 1720 | low | low | 5more | more | small | med |
| 1721 | low | low | 5more | more | small | high |
| 1722 | low | low | 5more | more | med | low |
| 1723 | low | low | 5more | more | med | med |
| 1724 | low | low | 5more | more | med | high |
| 1725 | low | low | 5more | more | big | low |
| 1726 | low | low | 5more | more | big | med |
| 1727 | low | low | 5more | more | big | high |
1728 rows × 6 columns
2) Preprocessing Step¶
In [71]:
from sklearn import preprocessing
In [8]:
le = preprocessing.LabelEncoder()
In [72]:
data=d.apply(le.fit_transform)
In [73]:
data
Out[73]:
| Buying Price | Maintenance Cost | Number of Doors | Number of Persons | Lug Boot | Safety | |
|---|---|---|---|---|---|---|
| 0 | 3 | 3 | 0 | 0 | 2 | 1 |
| 1 | 3 | 3 | 0 | 0 | 2 | 2 |
| 2 | 3 | 3 | 0 | 0 | 2 | 0 |
| 3 | 3 | 3 | 0 | 0 | 1 | 1 |
| 4 | 3 | 3 | 0 | 0 | 1 | 2 |
| 5 | 3 | 3 | 0 | 0 | 1 | 0 |
| 6 | 3 | 3 | 0 | 0 | 0 | 1 |
| 7 | 3 | 3 | 0 | 0 | 0 | 2 |
| 8 | 3 | 3 | 0 | 0 | 0 | 0 |
| 9 | 3 | 3 | 0 | 1 | 2 | 1 |
| 10 | 3 | 3 | 0 | 1 | 2 | 2 |
| 11 | 3 | 3 | 0 | 1 | 2 | 0 |
| 12 | 3 | 3 | 0 | 1 | 1 | 1 |
| 13 | 3 | 3 | 0 | 1 | 1 | 2 |
| 14 | 3 | 3 | 0 | 1 | 1 | 0 |
| 15 | 3 | 3 | 0 | 1 | 0 | 1 |
| 16 | 3 | 3 | 0 | 1 | 0 | 2 |
| 17 | 3 | 3 | 0 | 1 | 0 | 0 |
| 18 | 3 | 3 | 0 | 2 | 2 | 1 |
| 19 | 3 | 3 | 0 | 2 | 2 | 2 |
| 20 | 3 | 3 | 0 | 2 | 2 | 0 |
| 21 | 3 | 3 | 0 | 2 | 1 | 1 |
| 22 | 3 | 3 | 0 | 2 | 1 | 2 |
| 23 | 3 | 3 | 0 | 2 | 1 | 0 |
| 24 | 3 | 3 | 0 | 2 | 0 | 1 |
| 25 | 3 | 3 | 0 | 2 | 0 | 2 |
| 26 | 3 | 3 | 0 | 2 | 0 | 0 |
| 27 | 3 | 3 | 1 | 0 | 2 | 1 |
| 28 | 3 | 3 | 1 | 0 | 2 | 2 |
| 29 | 3 | 3 | 1 | 0 | 2 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 1698 | 1 | 1 | 2 | 2 | 0 | 1 |
| 1699 | 1 | 1 | 2 | 2 | 0 | 2 |
| 1700 | 1 | 1 | 2 | 2 | 0 | 0 |
| 1701 | 1 | 1 | 3 | 0 | 2 | 1 |
| 1702 | 1 | 1 | 3 | 0 | 2 | 2 |
| 1703 | 1 | 1 | 3 | 0 | 2 | 0 |
| 1704 | 1 | 1 | 3 | 0 | 1 | 1 |
| 1705 | 1 | 1 | 3 | 0 | 1 | 2 |
| 1706 | 1 | 1 | 3 | 0 | 1 | 0 |
| 1707 | 1 | 1 | 3 | 0 | 0 | 1 |
| 1708 | 1 | 1 | 3 | 0 | 0 | 2 |
| 1709 | 1 | 1 | 3 | 0 | 0 | 0 |
| 1710 | 1 | 1 | 3 | 1 | 2 | 1 |
| 1711 | 1 | 1 | 3 | 1 | 2 | 2 |
| 1712 | 1 | 1 | 3 | 1 | 2 | 0 |
| 1713 | 1 | 1 | 3 | 1 | 1 | 1 |
| 1714 | 1 | 1 | 3 | 1 | 1 | 2 |
| 1715 | 1 | 1 | 3 | 1 | 1 | 0 |
| 1716 | 1 | 1 | 3 | 1 | 0 | 1 |
| 1717 | 1 | 1 | 3 | 1 | 0 | 2 |
| 1718 | 1 | 1 | 3 | 1 | 0 | 0 |
| 1719 | 1 | 1 | 3 | 2 | 2 | 1 |
| 1720 | 1 | 1 | 3 | 2 | 2 | 2 |
| 1721 | 1 | 1 | 3 | 2 | 2 | 0 |
| 1722 | 1 | 1 | 3 | 2 | 1 | 1 |
| 1723 | 1 | 1 | 3 | 2 | 1 | 2 |
| 1724 | 1 | 1 | 3 | 2 | 1 | 0 |
| 1725 | 1 | 1 | 3 | 2 | 0 | 1 |
| 1726 | 1 | 1 | 3 | 2 | 0 | 2 |
| 1727 | 1 | 1 | 3 | 2 | 0 | 0 |
1728 rows × 6 columns
In [13]:
from sklearn.cross_validation import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
from sklearn import preprocessing
3) Splitting The Training and Testing data¶
In [16]:
X= data.values[:,0:4]
Y= data.values[:,5]
In [18]:
X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.3, random_state = 100)
4) Fitting the Decision Tree¶
In [77]:
clf_gini = DecisionTreeClassifier(criterion = "entropy", random_state = 100,
max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)
Out[77]:
DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=3,
max_features=None, max_leaf_nodes=None,
min_impurity_split=1e-07, min_samples_leaf=5,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=100, splitter='best')
5) Predicting on the Test data¶
In [78]:
y_pred = clf_gini.predict(X_test)
6) Validating¶
In [79]:
accuracy_score(y_test,y_pred)
Out[79]:
0.26589595375722541
7) Same Procedure on KNN¶
In [80]:
from sklearn.neighbors import KNeighborsClassifier
In [81]:
knn=KNeighborsClassifier(n_neighbors=10)
In [82]:
knn.fit(X_train,y_train)
Out[82]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=10, p=2,
weights='uniform')
In [83]:
y_predknn=knn.predict(X_test)
In [84]:
accuracy_score(y_test,y_predknn)
Out[84]:
0.18497109826589594
8) Cross Validation¶
In [90]:
from sklearn.model_selection import cross_val_score
In [87]:
scores = cross_val_score(knn,X,Y,cv=5)
In [88]:
scores.mean()
Out[88]:
0.33333333333333331
9) Ensembling¶
In [58]:
from sklearn.ensemble import AdaBoostClassifier
In [60]:
abc=AdaBoostClassifier(n_estimators=10,learning_rate=1)
In [62]:
abc.fit(X_train,y_train)
Out[62]:
AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=1,
n_estimators=10, random_state=None)
In [64]:
y_predada=abc.predict(X_test)
In [65]:
accuracy_score(y_test,y_predada)
Out[65]:
0.26974951830443161
Subscribe to:
Posts (Atom)