Saturday, 2 March 2019

Car Safety Analysis


Career Mantra 2

Car Safety Analysis

1) importing the dataset

In [68]:
import pandas
In [69]:
d = pandas.read_csv('C://Users//Admin//Documents//careval.csv',header='infer' )
In [70]:
d
Out[70]:
Buying Price Maintenance Cost Number of Doors Number of Persons Lug Boot Safety
0 vhigh vhigh 2 2 small low
1 vhigh vhigh 2 2 small med
2 vhigh vhigh 2 2 small high
3 vhigh vhigh 2 2 med low
4 vhigh vhigh 2 2 med med
5 vhigh vhigh 2 2 med high
6 vhigh vhigh 2 2 big low
7 vhigh vhigh 2 2 big med
8 vhigh vhigh 2 2 big high
9 vhigh vhigh 2 4 small low
10 vhigh vhigh 2 4 small med
11 vhigh vhigh 2 4 small high
12 vhigh vhigh 2 4 med low
13 vhigh vhigh 2 4 med med
14 vhigh vhigh 2 4 med high
15 vhigh vhigh 2 4 big low
16 vhigh vhigh 2 4 big med
17 vhigh vhigh 2 4 big high
18 vhigh vhigh 2 more small low
19 vhigh vhigh 2 more small med
20 vhigh vhigh 2 more small high
21 vhigh vhigh 2 more med low
22 vhigh vhigh 2 more med med
23 vhigh vhigh 2 more med high
24 vhigh vhigh 2 more big low
25 vhigh vhigh 2 more big med
26 vhigh vhigh 2 more big high
27 vhigh vhigh 3 2 small low
28 vhigh vhigh 3 2 small med
29 vhigh vhigh 3 2 small high
... ... ... ... ... ... ...
1698 low low 4 more big low
1699 low low 4 more big med
1700 low low 4 more big high
1701 low low 5more 2 small low
1702 low low 5more 2 small med
1703 low low 5more 2 small high
1704 low low 5more 2 med low
1705 low low 5more 2 med med
1706 low low 5more 2 med high
1707 low low 5more 2 big low
1708 low low 5more 2 big med
1709 low low 5more 2 big high
1710 low low 5more 4 small low
1711 low low 5more 4 small med
1712 low low 5more 4 small high
1713 low low 5more 4 med low
1714 low low 5more 4 med med
1715 low low 5more 4 med high
1716 low low 5more 4 big low
1717 low low 5more 4 big med
1718 low low 5more 4 big high
1719 low low 5more more small low
1720 low low 5more more small med
1721 low low 5more more small high
1722 low low 5more more med low
1723 low low 5more more med med
1724 low low 5more more med high
1725 low low 5more more big low
1726 low low 5more more big med
1727 low low 5more more big high

1728 rows × 6 columns

2) Preprocessing Step

In [71]:
from sklearn import preprocessing
In [8]:
le = preprocessing.LabelEncoder()
In [72]:
data=d.apply(le.fit_transform)
In [73]:
data
Out[73]:
Buying Price Maintenance Cost Number of Doors Number of Persons Lug Boot Safety
0 3 3 0 0 2 1
1 3 3 0 0 2 2
2 3 3 0 0 2 0
3 3 3 0 0 1 1
4 3 3 0 0 1 2
5 3 3 0 0 1 0
6 3 3 0 0 0 1
7 3 3 0 0 0 2
8 3 3 0 0 0 0
9 3 3 0 1 2 1
10 3 3 0 1 2 2
11 3 3 0 1 2 0
12 3 3 0 1 1 1
13 3 3 0 1 1 2
14 3 3 0 1 1 0
15 3 3 0 1 0 1
16 3 3 0 1 0 2
17 3 3 0 1 0 0
18 3 3 0 2 2 1
19 3 3 0 2 2 2
20 3 3 0 2 2 0
21 3 3 0 2 1 1
22 3 3 0 2 1 2
23 3 3 0 2 1 0
24 3 3 0 2 0 1
25 3 3 0 2 0 2
26 3 3 0 2 0 0
27 3 3 1 0 2 1
28 3 3 1 0 2 2
29 3 3 1 0 2 0
... ... ... ... ... ... ...
1698 1 1 2 2 0 1
1699 1 1 2 2 0 2
1700 1 1 2 2 0 0
1701 1 1 3 0 2 1
1702 1 1 3 0 2 2
1703 1 1 3 0 2 0
1704 1 1 3 0 1 1
1705 1 1 3 0 1 2
1706 1 1 3 0 1 0
1707 1 1 3 0 0 1
1708 1 1 3 0 0 2
1709 1 1 3 0 0 0
1710 1 1 3 1 2 1
1711 1 1 3 1 2 2
1712 1 1 3 1 2 0
1713 1 1 3 1 1 1
1714 1 1 3 1 1 2
1715 1 1 3 1 1 0
1716 1 1 3 1 0 1
1717 1 1 3 1 0 2
1718 1 1 3 1 0 0
1719 1 1 3 2 2 1
1720 1 1 3 2 2 2
1721 1 1 3 2 2 0
1722 1 1 3 2 1 1
1723 1 1 3 2 1 2
1724 1 1 3 2 1 0
1725 1 1 3 2 0 1
1726 1 1 3 2 0 2
1727 1 1 3 2 0 0

1728 rows × 6 columns

In [13]:
from sklearn.cross_validation import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
from sklearn import preprocessing

3) Splitting The Training and Testing data

In [16]:
X= data.values[:,0:4]
Y= data.values[:,5]
In [18]:
X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.3, random_state = 100)

4) Fitting the Decision Tree

In [77]:
clf_gini = DecisionTreeClassifier(criterion = "entropy", random_state = 100,
                               max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)
Out[77]:
DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=3,
            max_features=None, max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=5,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=100, splitter='best')

5) Predicting on the Test data

In [78]:
y_pred = clf_gini.predict(X_test)

6) Validating

In [79]:
accuracy_score(y_test,y_pred)
Out[79]:
0.26589595375722541

7) Same Procedure on KNN

In [80]:
from sklearn.neighbors import KNeighborsClassifier
In [81]:
knn=KNeighborsClassifier(n_neighbors=10)
In [82]:
knn.fit(X_train,y_train)
Out[82]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=10, p=2,
           weights='uniform')
In [83]:
y_predknn=knn.predict(X_test)
In [84]:
accuracy_score(y_test,y_predknn)
Out[84]:
0.18497109826589594

8) Cross Validation

In [90]:
from sklearn.model_selection import cross_val_score
In [87]:
scores = cross_val_score(knn,X,Y,cv=5)
In [88]:
scores.mean()
Out[88]:
0.33333333333333331

9) Ensembling

In [58]:
from sklearn.ensemble import AdaBoostClassifier
In [60]:
abc=AdaBoostClassifier(n_estimators=10,learning_rate=1)
In [62]:
abc.fit(X_train,y_train)
Out[62]:
AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=1,
          n_estimators=10, random_state=None)
In [64]:
y_predada=abc.predict(X_test)
In [65]:
accuracy_score(y_test,y_predada)
Out[65]:
0.26974951830443161