Keras - ImageDataGenerator
Keras provides the ImageDataGenerator class, which supports real-time data augmentation while training on image data.
from keras_preprocessing.image import ImageDataGenerator
Let's look at the example given in the API documentation.
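from keras.datasets import cifar10
from keras.utils import np_utils

# num_classes, epochs, and model are assumed to be defined elsewhere
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)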
datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)
# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(x_train)
# fits the model on batches with real-time data augmentation:
model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) / 32, epochs=epochs)
# here's a more "manual" example
for e in range(epochs):
    print('Epoch', e)
    batches = 0
    for x_batch, y_batch in datagen.flow(x_train, y_train, batch_size=32):
        model.fit(x_batch, y_batch)
        batches += 1
        if batches >= len(x_train) / 32:
            # we need to break the loop by hand because
            # the generator loops indefinitely
            break
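One detail in the example above is that `len(x_train) / 32` is a float in Python 3. The small sketch below only works through the arithmetic for the CIFAR-10 training set (50,000 images); rounding up with `math.ceil` is my own suggestion for getting an integer step count, not something the API example does.

```python
import math

num_samples = 50000  # number of images in the CIFAR-10 training set
batch_size = 32

# Plain division gives a float because the last batch is only half full
exact = num_samples / batch_size

# Rounding up counts the final partial batch as one more step
steps_per_epoch = math.ceil(exact)

print(exact, steps_per_epoch)
```

The manual loop stops at the same point: `batches >= len(x_train) / 32` first becomes true once the rounded-up number of batches has been processed.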
datagen.fit(x_train)
: Fits the data generator to some sample data (x_train).
This computes the internal data stats related to the data-dependent transformations, based on an array of sample data.
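To make "internal data stats" more concrete, here is a minimal NumPy sketch of the kind of statistics that featurewise_center and featurewise_std_normalization rely on. It is an illustration only, not the library's code: the function names `featurewise_stats` and `standardize`, the random sample data, and the channels-last layout are all assumptions of mine.

```python
import numpy as np

def featurewise_stats(x):
    """Per-channel mean and std over an array shaped (N, H, W, C)."""
    mean = x.mean(axis=(0, 1, 2))
    std = x.std(axis=(0, 1, 2))
    return mean, std

def standardize(batch, mean, std, eps=1e-6):
    """Center and scale a batch using statistics computed beforehand."""
    return (batch - mean) / (std + eps)

# Hypothetical sample data: 100 random 8x8 RGB images
rng = np.random.default_rng(0)
x_sample = rng.uniform(0, 255, size=(100, 8, 8, 3)).astype("float32")

mean, std = featurewise_stats(x_sample)  # plays the role of datagen.fit
x_norm = standardize(x_sample, mean, std)  # applied to every batch later

print(mean.shape, std.shape)
```

The point is that these statistics depend on the data, so they have to be computed once from sample data before any batch can be normalized. Transformations such as rotation or flipping do not need this step.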
datagen.flow(x_train, y_train, batch_size=32)
: Takes the data and label arrays and generates batches of augmented data, batch_size samples at a time.
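The behavior of flow can be pictured with a small stand-in generator. The sketch below is a simplified illustration in plain NumPy, not the Keras implementation: `simple_flow` is a hypothetical name, and the only augmentation it applies is a random horizontal flip.

```python
import itertools
import numpy as np

def simple_flow(x, y, batch_size=32, seed=0):
    """Yield shuffled (x_batch, y_batch) pairs forever, flipping some images."""
    rng = np.random.default_rng(seed)
    n = len(x)
    while True:  # never stops on its own, so the caller must break
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x_batch = x[idx].copy()
            # Flip roughly half of the images along the width axis
            flip = rng.random(len(idx)) < 0.5
            x_batch[flip] = x_batch[flip, :, ::-1, :]
            yield x_batch, y[idx]

# Hypothetical data: 70 blank 8x8 RGB images with integer labels
x = np.zeros((70, 8, 8, 3), dtype="float32")
y = np.arange(70)

# Take only the first four batches from the endless generator
batches = list(itertools.islice(simple_flow(x, y, batch_size=32), 4))
sizes = [len(x_batch) for x_batch, _ in batches]
print(sizes)
```

With 70 samples and a batch size of 32, the third batch holds the 6 leftover samples and the fourth batch already belongs to the next pass over the data. This is why the manual training loop in the API example has to count batches and break by hand.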
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800)
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
flow_from_directory takes the path to a directory, reads the images under it, and generates batches of augmented or normalized data.
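flow_from_directory expects one subdirectory per class under the given path and derives the labels from the subdirectory names. The sketch below imitates only that label-inference step using the standard library. `infer_class_indices` and the class names "cats" and "dogs" are hypothetical and chosen for illustration; no images are actually read.

```python
import tempfile
from pathlib import Path

def infer_class_indices(directory):
    """Map each subdirectory name to an integer label, in sorted order."""
    classes = sorted(p.name for p in Path(directory).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as root:
    # Build an empty layout shaped like data/train/<class_name>/
    for name in ("dogs", "cats"):
        (Path(root) / "train" / name).mkdir(parents=True)
    class_indices = infer_class_indices(Path(root) / "train")

print(class_indices)  # {'cats': 0, 'dogs': 1}
```

With class_mode='binary', as in the example above, a layout like this would produce 0/1 labels for the two classes.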
For the other methods, see the API documentation.