Clothing Detection and Outfit Matching with DeepFashion

Author | Li Qiujian (李秋键)

Published by | AI科技大本营 (AI Technology Base Camp)

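Computer vision plays an important role in everyday life, especially in domains such as clothing, jewelry, and decoration, where appearance strongly influences people's choices. Studying the visual side of user preferences and product characteristics has therefore become an important task.

In recent years, matching and recommending products such as clothing has attracted wide attention, and visually-aware recommendation has made some progress. However, existing work usually represents products in a generic visual feature space, for example the output-layer features of a CNN (Convolutional Neural Network). Such representations are sensitive to product category but struggle to capture differences in style.

This makes them hard to use effectively in a recommender system: items of similar style are often bought together by the same person, yet they are not close to each other in the visual feature space, which makes it harder to improve recommendation quality. The FashionNet-based clothing landmark detection proposed in the paper DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations (CVPR 2016) addresses exactly this problem.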

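Preparation

We use Python 3.6.5. The modules used are as follows:

opencv is used for image processing and for reading and saving images.
numpy handles matrix computations.
tensorflow-gpu is a widely used deep learning framework for building and training models; it uses the GPU to speed up computation.
scikit-learn is a commonly used machine learning library for Python.
PIL supports batch processing of images, generating previews, converting between image formats, and image manipulation, including basic image operations, pixel operations, and color handling.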

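Defining and training the network model

The forward pass of FashionNet consists of three stages. In the first stage, a clothing image is fed into the blue branch of the network, which predicts whether each landmark is visible and where it is located. In the second stage, the landmark pooling layer uses the landmark locations predicted in the previous step to extract local features of the garment. In the third stage, the global features of the "fc6 global" layer and the local features of the "fc6 local" layer are concatenated into "fc7_fusion", which serves as the final image feature.

FashionNet uses four kinds of loss and is optimized iteratively: a regression loss for landmark localization, softmax losses for landmark visibility and clothing category, a cross-entropy loss for attribute prediction, and a triplet loss for learning the similarity between garments. The authors compare FashionNet with other methods on clothing classification, attribute prediction, and clothing retrieval, and report clearly better results on all three.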
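To make the triplet loss mentioned above more concrete, here is a minimal sketch of a generic hinge-style triplet loss on plain Python lists. The function names, the Euclidean distance, and the margin value of 1.0 are assumptions chosen for illustration; they are not taken from the paper's implementation.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (illustrative sketch, margin is an assumption).

    The loss is zero once the negative example is at least `margin`
    farther from the anchor than the positive example is.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


# A far-away negative gives no loss; a close negative is penalized.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])
print(easy, hard)
```

During training, such a loss pulls features of matching garments together and pushes features of non-matching garments apart, which is what makes the learned feature usable for retrieval.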

(1) Defining the network layers, including the optimizer, the classifier, and the network units. The code is as follows:

# Imports assumed for this listing (the original omits them)
import logging
from tensorflow.keras import applications
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout
from tensorflow.keras.models import Model

# get_optimizer, top_model_weights_path_load, predictions_class_weight and
# predictions_iou_weight are defined elsewhere in the project.

def create_model(is_input_bottleneck, is_load_weights, input_shape, output_classes,
                 optimizer='Adagrad', learn_rate=None, decay=0.0, momentum=0.0,
                 activation='relu', dropout_rate=0.5):
    logging.debug('input_shape {}'.format(input_shape))
    logging.debug('input_shape {}'.format(type(input_shape)))

    # Optimizer
    optimizer, learn_rate = get_optimizer(optimizer, learn_rate, decay, momentum)

    # Train: the input is a pre-computed bottleneck feature
    if is_input_bottleneck is True:
        model_inputs = Input(shape=(input_shape))
        common_inputs = model_inputs
    # Predict: the input is an image, passed through the VGG16 base first
    else:
        # input_shape = (img_width, img_height, 3)
        base_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=input_shape)
        #base_model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', input_shape=input_shape)
        logging.debug('base_model inputs {}'.format(base_model.input))    # shape=(?, 224, 224, 3)
        logging.debug('base_model outputs {}'.format(base_model.output))  # shape=(?, 7, 7, 512)
        model_inputs = base_model.input
        common_inputs = base_model.output

    ## Model Classification
    x = Flatten()(common_inputs)
    x = Dense(256, activation='tanh')(x)
    x = Dropout(dropout_rate)(x)
    predictions_class = Dense(output_classes, activation='softmax', name='predictions_class')(x)

    ## Model (Regression) IOU score
    x = Flatten()(common_inputs)
    x = Dense(256, activation='tanh')(x)
    x = Dropout(dropout_rate)(x)
    x = Dense(256, activation='tanh')(x)
    x = Dropout(dropout_rate)(x)
    predictions_iou = Dense(1, activation='sigmoid', name='predictions_iou')(x)

    ## Create Model
    model = Model(inputs=model_inputs, outputs=[predictions_class, predictions_iou])
    # logging.debug('model summary {}'.format(model.summary()))

    ## Load weights
    if is_load_weights is True:
        model.load_weights(top_model_weights_path_load, by_name=True)

    ## Compile
    model.compile(optimizer=optimizer,
                  loss={'predictions_class': 'sparse_categorical_crossentropy',
                        'predictions_iou': 'mean_squared_error'},
                  metrics=['accuracy'],
                  loss_weights={'predictions_class': predictions_class_weight,
                                'predictions_iou': predictions_iou_weight})

    logging.info('optimizer:{}  learn_rate:{}  decay:{}  momentum:{}  activation:{}  dropout_rate:{}'.format(
        optimizer, learn_rate, decay, momentum, activation, dropout_rate))

    return model
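The model above has two outputs, a class prediction and an IOU score, each with its own loss. When loss_weights is passed to compile, Keras minimizes the weighted sum of the per-output losses. The sketch below reproduces that arithmetic for a single sample in plain Python; the helper names and the example weights (1.0 and 0.5) are assumptions for illustration, since the listing does not show the values of predictions_class_weight and predictions_iou_weight.

```python
import math


def sparse_categorical_crossentropy(probs, true_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_index])


def squared_error(y_true, y_pred):
    """Mean squared error for a single scalar output."""
    return (y_true - y_pred) ** 2


def combined_loss(class_probs, true_class, iou_pred, iou_true,
                  class_weight=1.0, iou_weight=1.0):
    """Weighted sum of the two per-output losses (weights are hypothetical)."""
    loss_class = sparse_categorical_crossentropy(class_probs, true_class)
    loss_iou = squared_error(iou_true, iou_pred)
    return class_weight * loss_class + iou_weight * loss_iou


# One sample: softmax output over 3 classes and a predicted IOU of 0.8.
total = combined_loss([0.25, 0.5, 0.25], true_class=1,
                      iou_pred=0.8, iou_true=0.6,
                      class_weight=1.0, iou_weight=0.5)
print(total)
```

Raising one weight relative to the other shifts the optimization toward that output, so the two weights act as a knob for trading classification accuracy against IOU regression quality.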

(2) Initialization:

def init():
    global batch_size
    batch_size = batch_size_train
    logging.debug('batch_size {}'.format(batch_size))

    global class_names
    class_names = sorted(get_subdir_list(dataset_train_path))
    logging.debug('class_names {}'.format(class_names))

    global input_shape
    input_shape = (img_width, img_height, img_channel)
    logging.debug('input_shape {}'.format(input_shape))

    # Create the output, log and bottleneck directories if they do not exist
    if not os.path.exists(output_path_name):
        os.makedirs(output_path_name)

    if not os.path.exists(logs_path_name):
        os.makedirs(logs_path_name)

    if not os.path.exists(btl_path):
        os.makedirs(btl_path)

    if not os.path.exists(btl_train_path):
        os.makedirs(btl_train_path)

    if not os.path.exists(btl_val_path):
        os.makedirs(btl_val_path)

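(3) Saving the bottleneck files. In this code, "bottleneck" refers to the output features of the pre-trained VGG16 convolutional base. They are computed once per batch of images and saved to disk, so that the small classification and IOU heads can be trained without running VGG16 again in every epoch. This is a different use of the term from the bottleneck block found in some network architectures, which reduces the parameter count in three steps: a pointwise (PW) convolution reduces the dimensionality, a regular convolution follows, and a second pointwise convolution restores the dimensionality, giving an hourglass shape.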

def save_bottleneck():
    logging.debug('class_names {}'.format(class_names))
    logging.debug('batch_size {}'.format(batch_size))
    logging.debug('epochs {}'.format(epochs))
    logging.debug('input_shape {}'.format(input_shape))

    ## Build the VGG16 network
    model = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
    #model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', input_shape=input_shape)

    for train_val in ['train', 'validation']:
        with open('bottleneck/btl_' + train_val + '.txt', 'w') as f_image:
            for class_name in class_names:
                dataset_train_class_path = os.path.join(dataset_path, train_val, class_name)
                logging.debug('dataset_train_class_path {}'.format(dataset_train_class_path))

                images_list = []
                images_name_list = []

                images_path_name = sorted(glob.glob(dataset_train_class_path + '/*.jpg'))
                logging.debug('images_path_name {}'.format(len(images_path_name)))

                for index, image in enumerate(images_path_name):
                    #logging.debug('image {}'.format(image))
                    img = Image.open(image)
                    img = preprocess_image(img)
                    current_batch_size = len(images_list)
                    #logging.debug('current_batch_size {}'.format(current_batch_size))
                    images_list.append(img)
                    image_name = image.split('/')[-1].split('.jpg')[0]
                    images_name_list.append(image)
                    images_list_arr = np.array(images_list)

                    # TODO: Skipping n last images of a class which do not sum up to batch_size
                    if (current_batch_size < batch_size - 1):
                        continue

                    X = images_list_arr
                    bottleneck_features_train_class = model.predict(X, batch_size)
                    #bottleneck_features_train_class = model.predict(X, nb_train_class_samples // batch_size)

                    ## Save bottleneck file
                    btl_save_file_name = btl_path + train_val + '/btl_' + train_val + '_' + class_name + '.' + str(index).zfill(7) + '.npy'
                    logging.info('btl_save_file_name {}'.format(btl_save_file_name))
                    # np.save writes binary data, so the file must be opened in 'wb' mode under Python 3
                    with open(btl_save_file_name, 'wb') as f_btl:
                        np.save(f_btl, bottleneck_features_train_class)

                    # Record which images went into this bottleneck file
                    for name in images_name_list:
                        f_image.write(str(name) + '\n')

                    images_list = []
                    images_name_list = []
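The loop in save_bottleneck collects images until a full batch is available, processes that batch, and then starts over. As its TODO comment notes, the trailing images of a class that do not fill a batch are never saved. The sketch below isolates that batching behavior with a hypothetical helper, full_batches, which is not part of the original code.

```python
def full_batches(items, batch_size):
    """Yield consecutive batches of exactly `batch_size` items.

    Any remainder at the end that does not fill a batch is dropped,
    mirroring the behavior described in the TODO of save_bottleneck.
    """
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []


# Seven images with a batch size of 3: the seventh image is skipped.
batches = list(full_batches(range(7), 3))
print(batches)
```

If the skipped images matter, one option is to yield the final partial batch as well, at the cost of bottleneck files with varying numbers of rows.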

(4) Training the model: build the network defined above and use the bottleneck files to create the training and validation sets.

def train_model():
    ## Build network
    model = applications.VGG16(include_top=False, weights='imagenet', input_shape=input_shape)
    #model = applications.inception_v3.InceptionV3(include_top=False, weights='imagenet', input_shape=input_shape)

    # Get sorted bottleneck file names in a list
    btl_train_names = sorted(glob.glob(btl_train_path + '/*.npy'))
    btl_val_names = sorted(glob.glob(btl_val_path + '/*.npy'))

    ## Train Labels
    btl_train_list = []
    train_labels_class = []
    train_labels_iou = []

    # Load bottleneck files to create validation set
    val_data = []

    # NOTE: this listing omits the code that reads the .npy files and fills
    # train_data, train_labels_class, train_labels_iou, val_data,
    # val_labels_class and val_labels_iou. They are expected to be NumPy
    # arrays by the time they are logged and passed to model.fit below.

    model = create_model(True, False, input_shape_btl_layer, len(class_names), optimizer, learn_rate, decay, momentum, activation, dropout_rate)

    logging.info('train_labels_iou {}'.format(train_labels_iou.shape))
    logging.info('train_labels_class {}'.format(train_labels_class.shape))
    logging.info('train_data {}'.format(train_data.shape))

    logging.info('val_labels_iou {}'.format(val_labels_iou.shape))
    logging.info('val_labels_class {}'.format(val_labels_class.shape))
    logging.info('val_data {}'.format(val_data.shape))

    # TODO: class_weight_val wrong
    model.fit(train_data, [train_labels_class, train_labels_iou],
              # dictionary mapping classes to a weight value, used for scaling
              # the loss function (during training only)
              class_weight=[class_weight_val, class_weight_val],
              epochs=epochs,
              batch_size=batch_size,
              validation_data=(val_data, [val_labels_class, val_labels_iou]),
              callbacks=callbacks_list)

    # TODO: These are not the best weights
    model.save_weights(top_model_weights_path_save)
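The training listing does not show how class labels are attached to the bottleneck files. Since save_bottleneck names each file btl_<split>_<class>.<index>.npy and the model is compiled with sparse_categorical_crossentropy (which expects integer labels), one plausible approach is to recover the class from the file name. The helpers below are a sketch under that assumption; their names and the example paths are hypothetical and do not come from the original project.

```python
import os


def class_name_from_btl_file(path, train_val):
    """Extract the class name from a file named btl_<split>_<class>.<index>.npy."""
    base = os.path.basename(path)
    prefix = 'btl_' + train_val + '_'
    if not base.startswith(prefix) or not base.endswith('.npy'):
        raise ValueError('unexpected bottleneck file name: {}'.format(base))
    # Split from the right so that only the index and extension are removed.
    class_name, _index, _ext = base[len(prefix):].rsplit('.', 2)
    return class_name


def class_index(class_name, class_names):
    """Integer label of a class, using the sorted class list as in init()."""
    return sorted(class_names).index(class_name)


names = ['Hoodie', 'Anorak', 'Button-Down']
label = class_index(
    class_name_from_btl_file('bottleneck/train/btl_train_Hoodie.0000031.npy', 'train'),
    names)
print(label)
```

Every row of a loaded bottleneck array would then receive the same integer label, because each file holds features for one class only.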

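Using the model

(1) Splitting the image into candidate regions: selective search proposes regions that cover different parts of the image, and each region that passes the size and aspect-ratio filters becomes a separate image patch.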

# Imports assumed for this listing (the original omits them)
import logging
import skimage.io
import selectivesearch

def selective_search_bbox(image):
    logging.debug('image {}'.format(image))

    # load image
    img = skimage.io.imread(image)
    #img = Image.open(image)
    # skimage returns arrays shaped (height, width, channels)
    height, width, channels = img.shape
    logging.debug('img {}'.format(img.shape))
    logging.debug('img {}'.format(type(img)))

    # regions smaller than 1% of the image area are discarded
    region_pixels_threshold = (width * height) / 100
    logging.debug('region_pixels_threshold {}'.format(region_pixels_threshold))

    # perform selective search
    img_lbl, regions = selectivesearch.selective_search(img, scale=500, sigma=0.9, min_size=10)
    #img_lbl, regions = selectivesearch.selective_search(img)
    # logging.debug('regions {}'.format(regions))
    logging.debug('regions {}'.format(len(regions)))

    candidates = set()
    for r in regions:
        # distorted rects
        x, y, w, h = r['rect']

        # excluding same rectangle (with different segments)
        if r['rect'] in candidates:
            continue

        # excluding regions smaller than the pixel threshold
        if r['size'] < region_pixels_threshold:
            logging.debug('Discarding - region_pixels_threshold - {} < {} - x:{} y:{} w:{} h:{}'.format(r['size'], region_pixels_threshold, x, y, w, h))
            continue

        # # Orig
        # if w / h > 1.2 or h / w > 1.2:
        #     continue

        # excluding very elongated regions
        if h != 0 and w / h > 6:
            logging.debug('Discarding w/h {} - x:{} y:{} w:{} h:{}'.format(w / h, x, y, w, h))
            continue

        if w != 0 and h / w > 6:
            logging.debug('Discarding h/w {} - x:{} y:{} w:{} h:{}'.format(h / w, x, y, w, h))
            continue

        candidates.add(r['rect'])

    # The listing ends at this point; get_bbox() uses the return value,
    # so the candidate set is returned here.
    return candidates
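The filtering rules in selective_search_bbox can be tried out without an image or the selectivesearch package. The sketch below applies the same three rules (drop duplicates, drop regions under 1% of the image area, drop regions with an aspect ratio above 6) to a hand-written list of regions. The function name filter_regions, its parameters, and the sample regions are hypothetical and only serve the illustration.

```python
def filter_regions(regions, image_width, image_height,
                   min_area_fraction=0.01, max_aspect_ratio=6):
    """Keep unique rects that are large enough and not too elongated."""
    threshold = image_width * image_height * min_area_fraction
    candidates = set()
    for region in regions:
        x, y, w, h = region['rect']
        # Too few pixels in the segmented region.
        if region['size'] < threshold:
            continue
        # Too wide or too tall (zero sizes are guarded as in the listing).
        if h != 0 and w / h > max_aspect_ratio:
            continue
        if w != 0 and h / w > max_aspect_ratio:
            continue
        # A set removes duplicate rectangles automatically.
        candidates.add(region['rect'])
    return candidates


sample_regions = [
    {'rect': (0, 0, 50, 50), 'size': 2500},    # kept
    {'rect': (0, 0, 50, 50), 'size': 2400},    # duplicate rectangle
    {'rect': (0, 0, 5, 5), 'size': 20},        # below 1% of 100 x 100
    {'rect': (0, 0, 90, 10), 'size': 900},     # aspect ratio 9 > 6
    {'rect': (10, 10, 20, 60), 'size': 1200},  # aspect ratio 3, kept
]
kept = filter_regions(sample_regions, image_width=100, image_height=100)
print(sorted(kept))
```

Tightening max_aspect_ratio or raising min_area_fraction reduces the number of patches sent to the classifier, which speeds up prediction but may discard narrow garments such as belts or scarves.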

(2) Prediction: this includes initializing the model, reading the images, loading the model, and visualizing the results.

def init():
    global batch_size
    batch_size = batch_size_predict
    logging.debug('batch_size {}'.format(batch_size))

    global input_shape
    input_shape = (img_width, img_height, img_channel)
    logging.debug('input_shape {}'.format(input_shape))

    global class_names
    # TODO: Remove hardcoding if dataset available
    class_names = ['Anorak', 'Bomber', 'Button-Down', 'Capris', 'Chinos', 'Coat', 'Flannel', 'Hoodie',
                   'Jeans', 'Jeggings', 'Jersey', 'Kaftan', 'Parka', 'Peacoat', 'Poncho', 'Robe',
                   'Sweatshorts', 'Trunks', 'Turtleneck']
    #class_names = get_subdir_list(dataset_train_path)
    logging.debug('class_names {}'.format(class_names))


def get_images():
    images_path_name = sorted(glob.glob(prediction_dataset_path + '/*.jpg'))
    #logging.debug('images_path_name {}'.format(images_path_name))
    return images_path_name


def get_bbox(images_path_name):
    # TODO: Currently for 1 image only
    for index, image in enumerate(images_path_name):
        bboxes = selective_search_bbox(image)
        logging.debug('bboxes {}'.format(bboxes))
    return bboxes


# NOTE: the enclosing function definition is missing from the listing at this
# point. images_names is expected to hold the file names of the cropped region
# images, each ending in '-x_y_w_h.jpg'.

#model = create_model_predict((input_shape), optimizer, learn_rate, decay, momentum, activation, dropout_rate)
model = create_model(False, True, input_shape, len(class_names), optimizer, learn_rate, decay, momentum, activation, dropout_rate)

images_list = []
images_name_list = []
prediction_class = []
prediction_iou = []
prediction_class_prob = []
prediction_class_name = []

## Folder
prediction_dataset_path = 'dataset_prediction/images/'
#images_path_name = sorted(glob.glob(prediction_dataset_path + '/*.jpg'))

#for image in images_path_name:
for index, image in enumerate(images_names):
    logging.debug('\n\n++++++++++++++++++++++++++++++++++++++++')

    image_path_name = prediction_dataset_path + image
    logging.debug('image_path_name {}'.format(image_path_name))

    img = Image.open(image_path_name)
    logging.debug('img {}'.format(img))
    logging.debug('img len {}'.format((img.size)))
    #img.save('output/a' + str(index) + '.jpg')

    img = preprocess_image(img)
    img = np.expand_dims(img, 0)      # add the batch dimension

    prediction = model.predict(img, batch_size, verbose=1)
    # logging.debug('prediction {}'.format(prediction))

    # prediction[0] holds the class probabilities, prediction[1] the IOU score
    prediction_class_ = prediction[0][0]
    #logging.debug('prediction_class_ {}'.format(prediction_class_))
    prediction_class.append(prediction_class_)

    prediction_iou_ = prediction[1][0][0]
    logging.debug('prediction_iou_ {}'.format(prediction_iou_))
    prediction_iou.append(prediction_iou_)

    prediction_class_index = np.argmax(prediction[0])
    logging.debug('prediction_class_index {}'.format(prediction_class_index))

    prediction_class_prob_ = prediction[0][0][prediction_class_index]
    logging.debug('prediction_class_prob_ {}'.format(prediction_class_prob_))
    prediction_class_prob.append(prediction_class_prob_)

    prediction_class_name_ = class_names[prediction_class_index]
    logging.debug('prediction_class_name_ {}'.format(prediction_class_name_))
    prediction_class_name.append(prediction_class_name_)

    images_list.append(img)
    images_name_list.append(image_path_name)

#logging.debug('prediction_class {}'.format(prediction_class))
logging.debug('prediction_iou {}'.format(prediction_iou))
logging.debug('prediction_class_prob {}'.format(prediction_class_prob))
logging.debug('prediction_class_name {}'.format(prediction_class_name))
#logging.debug('images_name_list {}'.format(images_name_list))

# Recover each bounding box from the crop's file name
bboxes = []
for image_path_name in images_name_list:
    bbox_ = image_path_name.split('/')[-1].split('.jpg')[0].split('-')[1]
    x = int(bbox_.split('_')[0])
    y = int(bbox_.split('_')[1])
    w = int(bbox_.split('_')[2])
    h = int(bbox_.split('_')[3])
    bbox = (x, y, w, h)
    bboxes.append(bbox)

bboxes = set(bboxes)
logging.debug('bboxes {}'.format(bboxes))

#orig_image_path_name = ['dataset_prediction/images/img_00000061.jpg']
#orig_image_path_name = ['dataset_prediction/images2/shahida-parides-floral-v-neckline-long-kaftan-dress.jpg']
orig_image_path_name = sorted(glob.glob('dataset_prediction/images' + '/*.jpg'))
logging.debug('orig_image_path_name {}'.format(orig_image_path_name))
display_bbox(orig_image_path_name, bboxes, prediction_class_name, prediction_class_prob, prediction_iou, images_name_list)

logging.debug('images_list {}'.format(len(images_list)))
# Each entry already has a batch dimension of 1, so concatenate along axis 0
# (np.array(images_list) would produce a 5-D array that the model rejects)
images_list_arr = np.concatenate(images_list, axis=0)
logging.debug('images_list_arr type {}'.format(type(images_list_arr)))

prediction = model.predict(images_list_arr, batch_size, verbose=1)
#prediction = model.predict(predict_data, batch_size, verbose=1)
# logging.debug('\n\nprediction \n{}'.format(prediction))
logging.debug('prediction shape {} {}'.format(len(prediction), len(prediction[0])))

# Print every output (class probabilities, then IOU) for every image
print('')
for index, preds in enumerate(prediction):
    for index2, pred in enumerate(preds):
        print('images_name_list index2 : {:110}    '.format(images_name_list[index2]), end='')
        for p in pred:
            print('{:8f}'.format(float(p)), end='')
        print('')
    print('')
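The prediction code recovers each bounding box from the name of the cropped image, which implies a naming convention in which the file name ends in -x_y_w_h.jpg. Splitting on the first '-' breaks as soon as the original image name itself contains a hyphen, as in the kaftan-dress example in the comments. The sketch below splits from the right instead; the helper name and the example file names are hypothetical, and the naming convention is inferred from the parsing code rather than documented by the author.

```python
import os


def bbox_from_crop_name(path):
    """Parse (x, y, w, h) from a file name ending in '-x_y_w_h.<ext>'."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    # Split on the last hyphen so hyphens in the image name are harmless.
    _name, bbox_part = stem.rsplit('-', 1)
    x, y, w, h = (int(value) for value in bbox_part.split('_'))
    return (x, y, w, h)


print(bbox_from_crop_name('dataset_prediction/crops/img_00000061-12_34_56_78.jpg'))
print(bbox_from_crop_name('crops/long-kaftan-dress-0_5_100_200.jpg'))
```

Encoding the box in the file name keeps the crops self-describing, but storing the boxes in a separate list or file next to the crops would avoid string parsing altogether.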

The result is shown in the figure below:

Source code link:

https://pan.baidu.com/s/1_XCUbhup--4b11dBOtYZvg

Extraction code: 5wna

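About the author

Li Qiujian is a CSDN blog expert and the author of a CSDN 达人课 course, currently pursuing a master's degree at China University of Mining and Technology. Li's projects include a wuxia game for Android released on TapTap, a VIP video parser, a text-meaning conversion tool, and a writing robot. Li has also published several papers and won multiple awards in advanced mathematics competitions.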
