Car Safety Analysis¶

1) importing the dataset¶

import pandas

d = pandas.read_csv('C://Users//Admin//Documents//careval.csv',header='infer' )

d

2) Preprocessing Step¶

from sklearn import preprocessing

le = preprocessing.LabelEncoder()

data=d.apply(le.fit_transform)

data

from sklearn.cross_validation import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
from sklearn import preprocessing

3) Splitting The Training and Testing data¶

X= data.values[:,0:4]
Y= data.values[:,5]

X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.3, random_state = 100)

4) Fitting the Decision Tree¶

clf_gini = DecisionTreeClassifier(criterion = "entropy", random_state = 100,
                               max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=3,
            max_features=None, max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=5,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=100, splitter='best')

5) Predicting on the Test data¶

y_pred = clf_gini.predict(X_test)

6) Validating¶

accuracy_score(y_test,y_pred)

0.26589595375722541

7) Same Procedure on KNN¶

from sklearn.neighbors import KNeighborsClassifier

knn=KNeighborsClassifier(n_neighbors=10)

knn.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=10, p=2,
           weights='uniform')

y_predknn=knn.predict(X_test)

accuracy_score(y_test,y_predknn)

0.18497109826589594

8) Cross Validation¶

from sklearn.model_selection import cross_val_score

scores = cross_val_score(knn,X,Y,cv=5)

scores.mean()

0.33333333333333331

9) Ensembling¶

from sklearn.ensemble import AdaBoostClassifier

abc=AdaBoostClassifier(n_estimators=10,learning_rate=1)

abc.fit(X_train,y_train)

AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=1,
          n_estimators=10, random_state=None)

y_predada=abc.predict(X_test)

accuracy_score(y_test,y_predada)

0.26974951830443161

	Buying Price	Maintenance Cost	Number of Doors	Number of Persons	Lug Boot	Safety
0	vhigh	vhigh	2	2	small	low
1	vhigh	vhigh	2	2	small	med
2	vhigh	vhigh	2	2	small	high
3	vhigh	vhigh	2	2	med	low
4	vhigh	vhigh	2	2	med	med
5	vhigh	vhigh	2	2	med	high
6	vhigh	vhigh	2	2	big	low
7	vhigh	vhigh	2	2	big	med
8	vhigh	vhigh	2	2	big	high
9	vhigh	vhigh	2	4	small	low
10	vhigh	vhigh	2	4	small	med
11	vhigh	vhigh	2	4	small	high
12	vhigh	vhigh	2	4	med	low
13	vhigh	vhigh	2	4	med	med
14	vhigh	vhigh	2	4	med	high
15	vhigh	vhigh	2	4	big	low
16	vhigh	vhigh	2	4	big	med
17	vhigh	vhigh	2	4	big	high
18	vhigh	vhigh	2	more	small	low
19	vhigh	vhigh	2	more	small	med
20	vhigh	vhigh	2	more	small	high
21	vhigh	vhigh	2	more	med	low
22	vhigh	vhigh	2	more	med	med
23	vhigh	vhigh	2	more	med	high
24	vhigh	vhigh	2	more	big	low
25	vhigh	vhigh	2	more	big	med
26	vhigh	vhigh	2	more	big	high
27	vhigh	vhigh	3	2	small	low
28	vhigh	vhigh	3	2	small	med
29	vhigh	vhigh	3	2	small	high
...	...	...	...	...	...	...
1698	low	low	4	more	big	low
1699	low	low	4	more	big	med
1700	low	low	4	more	big	high
1701	low	low	5more	2	small	low
1702	low	low	5more	2	small	med
1703	low	low	5more	2	small	high
1704	low	low	5more	2	med	low
1705	low	low	5more	2	med	med
1706	low	low	5more	2	med	high
1707	low	low	5more	2	big	low
1708	low	low	5more	2	big	med
1709	low	low	5more	2	big	high
1710	low	low	5more	4	small	low
1711	low	low	5more	4	small	med
1712	low	low	5more	4	small	high
1713	low	low	5more	4	med	low
1714	low	low	5more	4	med	med
1715	low	low	5more	4	med	high
1716	low	low	5more	4	big	low
1717	low	low	5more	4	big	med
1718	low	low	5more	4	big	high
1719	low	low	5more	more	small	low
1720	low	low	5more	more	small	med
1721	low	low	5more	more	small	high
1722	low	low	5more	more	med	low
1723	low	low	5more	more	med	med
1724	low	low	5more	more	med	high
1725	low	low	5more	more	big	low
1726	low	low	5more	more	big	med
1727	low	low	5more	more	big	high

	Buying Price	Maintenance Cost	Number of Doors	Number of Persons	Lug Boot	Safety
0	3	3	0	0	2	1
1	3	3	0	0	2	2
2	3	3	0	0	2	0
3	3	3	0	0	1	1
4	3	3	0	0	1	2
5	3	3	0	0	1	0
6	3	3	0	0	0	1
7	3	3	0	0	0	2
8	3	3	0	0	0	0
9	3	3	0	1	2	1
10	3	3	0	1	2	2
11	3	3	0	1	2	0
12	3	3	0	1	1	1
13	3	3	0	1	1	2
14	3	3	0	1	1	0
15	3	3	0	1	0	1
16	3	3	0	1	0	2
17	3	3	0	1	0	0
18	3	3	0	2	2	1
19	3	3	0	2	2	2
20	3	3	0	2	2	0
21	3	3	0	2	1	1
22	3	3	0	2	1	2
23	3	3	0	2	1	0
24	3	3	0	2	0	1
25	3	3	0	2	0	2
26	3	3	0	2	0	0
27	3	3	1	0	2	1
28	3	3	1	0	2	2
29	3	3	1	0	2	0
...	...	...	...	...	...	...
1698	1	1	2	2	0	1
1699	1	1	2	2	0	2
1700	1	1	2	2	0	0
1701	1	1	3	0	2	1
1702	1	1	3	0	2	2
1703	1	1	3	0	2	0
1704	1	1	3	0	1	1
1705	1	1	3	0	1	2
1706	1	1	3	0	1	0
1707	1	1	3	0	0	1
1708	1	1	3	0	0	2
1709	1	1	3	0	0	0
1710	1	1	3	1	2	1
1711	1	1	3	1	2	2
1712	1	1	3	1	2	0
1713	1	1	3	1	1	1
1714	1	1	3	1	1	2
1715	1	1	3	1	1	0
1716	1	1	3	1	0	1
1717	1	1	3	1	0	2
1718	1	1	3	1	0	0
1719	1	1	3	2	2	1
1720	1	1	3	2	2	2
1721	1	1	3	2	2	0
1722	1	1	3	2	1	1
1723	1	1	3	2	1	2
1724	1	1	3	2	1	0
1725	1	1	3	2	0	1
1726	1	1	3	2	0	2
1727	1	1	3	2	0	0

Open Data Science World

Saturday, 2 March 2019

Car Safety Analysis

Car Safety Analysis¶

1) importing the dataset¶

2) Preprocessing Step¶

3) Splitting The Training and Testing data¶

4) Fitting the Decision Tree¶

5) Predicting on the Test data¶

6) Validating¶

7) Same Procedure on KNN¶

8) Cross Validation¶

9) Ensembling¶

Blog Archive