There is a data set goods of retail product images, without any division.In the goods directory, the name of the subdirectory is the category of the picture, and there are multiple pictures under each category. The file structure is shown in Figure 1.Now, we need to split it into training set and test set in a certain proportion.
The idea of realizing this function is to create a new directory tree with the same structure, and randomly select some pictures in each category in the original directory tree and move them to the corresponding category in the new directory tree.The complete code is as follows:
import osimport randomimport shutil#source dataset path and target dataset pathpath_source = './goods'path_target = './goods_test'#Parameters: the proportion of source path, target path and test setdef seperate(path_source, path_target, percent):#Generate a list of all directory names under path_sourcecategories = os.listdir(path_source)for name in categories:#Create a subdirectory with the same name under path_targetos.makedirs(os.path.join(path_target, name))#Generate a list of all images in subdirectoriesnums = os.listdir(os.path.join(path_source, name))#Randomly select a part of the picture according to the proportionnums_target = random.sample(nums, int(len(nums)*percent))#Cut the image to the target pathfor pic in nums_target:shutil.move(os.path.join(path_source, name, pic), os.path.join(path_target, name, pic))#After execution, path_source is the training set, and path_target is the test set.seperate(path_source, path_target, 0.3)
If the data set is of other structure, you can make slight changes on this basis.
Key functions among them:
os.listdir(path): List the names of all projects under path (including directories and files)
os.makedirs(path): recursively create the directory on the path path, if the directory already exists, an error will be reported
os.makedir(path) can be used to create only one directory
random.sample(list, num): randomly select num elements from the list to form a new list
shutil.move(source, target): move the source file to the target file
Note that shutil.move is a very dangerous operation. It is recommended to comment it out when debugging the code to ensure that the code is correct before executing it.