앎을 경계하기

Machine Learning

Keras - ImageDataGenerator

양갱맨 2019. 10. 16. 01:11

Keras provides the ImageDataGenerator class, which performs real-time data augmentation while a model trains on image data.

from keras_preprocessing.image import ImageDataGenerator

ImageDataGenerator Keras API

Let's look at the example from the API documentation.

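# Keras 2.x import paths; model and epochs are assumed to be defined elsewhere
from keras.datasets import cifar10
from keras.utils import np_utils

num_classes = 10  # CIFAR-10 has 10 classes
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# one-hot encode the integer class labels
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)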
datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)
# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(x_train)
# fits the model on batches with real-time data augmentation:
model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32, epochs=epochs)
# here's a more "manual" example
for e in range(epochs):
    print('Epoch', e)
    batches = 0
    for x_batch, y_batch in datagen.flow(x_train, y_train, batch_size=32):
        model.fit(x_batch, y_batch)
        batches += 1
        if batches >= len(x_train) / 32:
            # we need to break the loop by hand because
            # the generator loops indefinitely
            break

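datagen.fit(x_train): fits the data generator to sample data (here, x_train). The API documentation describes it as follows:

This computes the internal data stats related to the data-dependent transformations, based on an array of sample data.

In other words, fit computes the statistics (mean, standard deviation, and principal components when ZCA whitening is enabled) that the data-dependent transformations need. It is only required when featurewise_center, featurewise_std_normalization, or zca_whitening is set.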
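As a rough illustration of what fit computes for feature-wise normalization, the sketch below reproduces the statistics with plain NumPy: a per-channel mean and standard deviation over the whole sample array, which are then applied to the data. It assumes channels-last image arrays. The names featurewise_stats and standardize, the random stand-in data, and the small epsilon are illustrative choices, not part of the Keras API.

```python
import numpy as np

def featurewise_stats(x):
    """Per-channel mean and std over all samples, rows, and columns.

    Assumes x has shape (samples, height, width, channels).
    """
    mean = x.mean(axis=(0, 1, 2))
    std = x.std(axis=(0, 1, 2))
    return mean, std

def standardize(batch, mean, std, eps=1e-6):
    """Center and scale a batch with statistics computed beforehand."""
    return (batch - mean) / (std + eps)

# Random stand-in for a small image dataset (hypothetical data).
rng = np.random.default_rng(0)
x_sample = rng.uniform(0, 255, size=(100, 8, 8, 3))

mean, std = featurewise_stats(x_sample)
x_norm = standardize(x_sample, mean, std)
```

After standardization, each channel of x_norm has a mean close to 0 and a standard deviation close to 1. This is why the statistics have to be computed from the data before any batch is generated.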

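datagen.flow(x_train, y_train, batch_size=32): takes the data and label arrays and generates batches of augmented data, batch_size samples at a time. It does not enlarge the dataset in advance; each batch is transformed on the fly, and the generator loops indefinitely.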
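To make the looping behavior of flow concrete, here is a minimal pure-Python sketch of a generator that yields (data, label) batches forever, reshuffling on every pass. It is a stand-in under simplifying assumptions (no augmentation, plain lists instead of image arrays), and endless_batches and the stand-in data are hypothetical names. It also shows why the training loop has to stop by itself after a fixed number of steps.

```python
import math
import random

def endless_batches(x, y, batch_size, seed=0):
    """Yield (x_batch, y_batch) pairs forever, reshuffling each pass."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    while True:  # never stops on its own
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            chosen = indices[start:start + batch_size]
            yield [x[i] for i in chosen], [y[i] for i in chosen]

x_data = list(range(100))          # stand-in for 100 images
y_data = [i % 2 for i in x_data]   # stand-in labels
batch_size = 32

# One epoch = enough batches to see every sample once.
steps_per_epoch = math.ceil(len(x_data) / batch_size)

seen = []
batches = endless_batches(x_data, y_data, batch_size)
for step, (x_batch, y_batch) in enumerate(batches):
    if step >= steps_per_epoch:
        break  # stop by hand; the generator itself is infinite
    seen.extend(x_batch)
```

Rounding the step count up means the last, smaller batch is still included, so every sample is visited once per epoch. Rounding down would skip that remainder instead.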

train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
model.fit_generator(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)

train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

flow_from_directory takes the path to a directory (not a dataframe), reads the images from its class subdirectories, and generates batches of augmented or normalized data.
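flow_from_directory expects one subdirectory per class under the given path and assigns class indices to the subdirectory names in alphanumeric order. The sketch below builds such a layout in a temporary directory and derives the same kind of mapping with the standard library only. The class names cats and dogs and the helper infer_class_indices are hypothetical, and no Keras call is made.

```python
import os
import tempfile

def infer_class_indices(directory):
    """Map each class subdirectory name to an index, in sorted order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}

# Build a throwaway layout: <root>/train/<class_name>/<image files>
root = tempfile.mkdtemp()
train_dir = os.path.join(root, "train")
for class_name in ("dogs", "cats"):
    class_dir = os.path.join(train_dir, class_name)
    os.makedirs(class_dir)
    # Empty placeholder files standing in for real images.
    for i in range(2):
        open(os.path.join(class_dir, f"{class_name}_{i}.jpg"), "w").close()

class_indices = infer_class_indices(train_dir)
print(class_indices)
```

In Keras itself, the generator returned by flow_from_directory exposes this mapping as its class_indices attribute. With class_mode='binary', the labels in each batch are 0 or 1 according to that mapping.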

For the other methods, see the API documentation.
