sklearn train_test_split: splitting data into train and test sets


Key point

Watch the order of the return values! Training data, test data, training labels, test labels.
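To make the return order concrete, here is a small sketch. The toy arrays `features` and `labels` are made up for illustration (they are not from the examples in this post), and each label is simply the index of its row so the pairing can be checked after the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 12 samples with 2 features each.
# Row i is [2*i, 2*i + 1] and its label is i, so pairs are easy to verify.
features = np.arange(24).reshape((12, 2))
labels = np.arange(12)

# Outputs are grouped per input array: both splits of the first array,
# then both splits of the second array.
f_train, f_test, l_train, l_test = train_test_split(features, labels, test_size=0.25)

print(f_train.shape, f_test.shape)  # feature splits keep their 2 columns
print(l_train.shape, l_test.shape)  # label splits are 1-D

# Each feature row still sits next to its own label after the split.
print(np.array_equal(f_train[:, 0] // 2, l_train))
```

Unpacking in a different order (for example `x_train, y_train, x_test, y_test`) does not raise an error, which is why this mistake is easy to miss.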

In [1]:

import numpy as np

# Create example data: 150 samples with 3 features each, and random binary labels
n = 150
x = np.arange(n*3).reshape((n,3))
y = np.random.randint(2, size=n)

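Options
train_size : a number between 0 and 1 sets the fraction of the data that goes to the training set.
test_size : a number between 0 and 1 sets the fraction of the data that goes to the test set.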

In [2]:

# If only one of train_size and test_size is given, the other is automatically set to the remaining fraction
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7)
len(x_train), len(x_test)

Out[2]:

(105, 45)

In [3]:

# Even if train_size and test_size sum to less than 1.0, each set gets exactly the fraction specified.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, test_size=0.1)
len(x_train), len(x_test)

Out[3]:

(105, 15)
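When the two fractions add up to less than 1.0, the leftover samples are simply not returned in either set. Here is a rough check that reuses the same example data as above; `n_unused` is just a name picked for this sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same example data as in the cells above
n = 150
x = np.arange(n * 3).reshape((n, 3))
y = np.random.randint(2, size=n)

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, test_size=0.1)

# Samples that ended up in neither the train set nor the test set
n_unused = n - len(x_train) - len(x_test)
print(len(x_train), len(x_test), n_unused)
```

With 105 training rows and 15 test rows out of 150, that leaves 30 rows unused, so this setting quietly shrinks the dataset.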

In [4]:

# If train_size and test_size sum to more than 1.0, an error is raised.
from sklearn.model_selection import train_test_split
try:
    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, test_size=0.4)
except ValueError as err:
    print(type(err).__name__)

ValueError

Options
shuffle : if True, the data is shuffled before it is split (True is the default)

In [5]:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, shuffle=True)
len(x_train), len(x_test)

Out[5]:

(105, 45)
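The lengths alone do not show what shuffle does, so here is a small sketch comparing the two settings. It also uses `random_state`, another `train_test_split` parameter that this post does not cover, to make the shuffled result repeatable; the variable names `a_train` and `b_train` are only for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same example data as in the cells above
n = 150
x = np.arange(n * 3).reshape((n, 3))
y = np.random.randint(2, size=n)

# shuffle=False: the data is cut in place, so the original row order is kept
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, shuffle=False)
print(np.array_equal(x_train, x[:len(x_train)]))  # train set is the leading rows
print(np.array_equal(x_test, x[len(x_train):]))   # test set is the trailing rows

# shuffle=True with a fixed random_state: shuffled, but identical on every call
a_train, a_test, _, _ = train_test_split(x, y, train_size=0.7, shuffle=True, random_state=0)
b_train, b_test, _, _ = train_test_split(x, y, train_size=0.7, shuffle=True, random_state=0)
print(np.array_equal(a_train, b_train))
```

Turning shuffle off is mostly useful when the row order itself matters, for example with time-ordered data.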

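Even when working with deep learning frameworks such as TensorFlow or PyTorch, people often use sklearn's train_test_split to split their data.
One way or another, it seems to be a function you end up using quite a lot.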
