Random Forest in Python

고수트 2020. 9. 20. 17:05

<If an engineer cannot implement it, they do not really understand it>

Random Forest

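Random Forest is a type of bagging method.

It builds multiple Decision Trees from multiple bootstrap data sets and produces the final prediction by voting across them.

First, draw multiple bootstrap data sets (samples taken with replacement from the training data).

For each data set, build a Decision Tree model, but compute the impurity using only a randomly chosen subset of the features. The random subset of features is drawn again at every node.

Because there are multiple data sets, multiple Decision Trees are produced.

Feed the test data into each of these Decision Tree models and select the prediction that appears most often as the final result.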
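The steps above can be sketched by hand with plain Decision Trees. This is only a minimal illustration, not how scikit-learn implements RandomForestClassifier internally. The number of trees (`n_trees = 25`), the seeds, and the variable names are my own assumptions, and the majority vote below assumes the labels are 0 and 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=123)

rng = np.random.default_rng(0)
n_trees = 25  # assumed value; odd so a binary vote cannot tie
trees = []
for i in range(n_trees):
    # Step 1: bootstrap data set = rows sampled with replacement
    idx = rng.integers(0, len(train_x), size=len(train_x))
    # Step 2: max_features="sqrt" makes the tree look at a random
    # subset of the features each time it searches for a split
    tree = DecisionTreeClassifier(max_depth=2, max_features="sqrt",
                                  random_state=i)
    tree.fit(train_x[idx], train_y[idx])
    trees.append(tree)

# Step 3: every tree predicts, and the most frequent label wins
votes = np.stack([t.predict(test_x) for t in trees])  # (n_trees, n_test)
majority = (votes.sum(axis=0) > n_trees / 2).astype(int)  # labels are 0/1
accuracy = (majority == test_y).mean()
print(accuracy)
```

Each tree sees a different resampled data set and a different random choice of features, so the trees disagree with each other, and the vote smooths out their individual mistakes.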

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=123)

# max_depth: maximum depth of each tree
clf = RandomForestClassifier(max_depth=2, random_state=0)
# Train
clf.fit(train_x, train_y)
# Predict
print(clf.predict(test_x))
# Score (mean accuracy on the test set)
print(clf.score(test_x, test_y))
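One detail worth checking against the "voting" description: scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than counting one hard vote per tree. The sketch below refits the same model and reproduces that average from the individual trees in `clf.estimators_`. The names `tree_probas`, `avg_proba`, and `manual_pred` are my own, used only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=123)

clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(train_x, train_y)

# The fitted Decision Trees are stored in clf.estimators_
tree_probas = np.stack([tree.predict_proba(test_x)
                        for tree in clf.estimators_])
# Average the class probabilities over all trees: (n_test, n_classes)
avg_proba = tree_probas.mean(axis=0)
# Pick the class with the highest averaged probability
manual_pred = clf.classes_[avg_proba.argmax(axis=1)]

print(len(clf.estimators_))                                # number of trees
print(np.allclose(avg_proba, clf.predict_proba(test_x)))   # same probabilities?
print((manual_pred == clf.predict(test_x)).mean())         # agreement rate
```

In most cases the averaged probabilities and a plain majority vote pick the same class, but they can differ when a few trees are very confident and the others are close to undecided.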
