Introduction
Create files and datasets
Write datasets
Read datasets
Introduction
How to work with HDF5 in Matlab has already been described in detail in the article "Matlab operation HDF5 File". This article summarizes how to use HDF5 files in Python. We follow the same order as the Matlab article: creating an HDF5 file, writing data, and reading data.
Working with HDF5 files in Python relies on the h5py toolkit.
Create files and datasets
Use the `h5py.File()` method to create an HDF5 file:
h5file = h5py.File(filename, 'w')
Then create a dataset in this file:
X = h5file.create_dataset(shape=(0, args.patch_size, args.patch_size),        # dimensions of the dataset
                          maxshape=(None, args.patch_size, args.patch_size),  # maximum allowed dimensions
                          dtype=float, compression='gzip', name='train',      # data type, compression, and dataset name
                          chunks=(args.chunk_size, args.patch_size, args.patch_size))  # chunked storage, size of each chunk
The two most important parameters here are shape and maxshape. Obviously, we want one dimension of the dataset to be extensible, so in maxshape that dimension is marked as None, while the other dimensions are the same as in shape. Another thing worth noting is compression='gzip': with it, the entire dataset can be compressed significantly, which is very useful for large datasets, and when reading or writing the data no explicit decoding by the user is needed.
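The steps above can be sketched end to end. In this sketch, patch_size and chunk_size are hypothetical stand-ins for args.patch_size and args.chunk_size, and the file path is a throwaway temp file:

```python
import os
import tempfile

import h5py

# Hypothetical sizes standing in for args.patch_size and args.chunk_size
patch_size = 4
chunk_size = 8

path = os.path.join(tempfile.mkdtemp(), "train.h5")
h5file = h5py.File(path, "w")

# An extensible, gzip-compressed dataset: axis 0 starts empty and is unlimited
X = h5file.create_dataset(
    name="train",
    shape=(0, patch_size, patch_size),            # current size: no samples yet
    maxshape=(None, patch_size, patch_size),      # axis 0 may grow without bound
    dtype=float,
    compression="gzip",                           # transparent (de)compression
    chunks=(chunk_size, patch_size, patch_size),  # on-disk block size
)
```

Because axis 0 starts at size 0, the dataset takes almost no space until data is actually appended.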
Write datasets
Once a dataset has been created with create_dataset as above, reading and writing it is as convenient as reading and writing a numpy array. For example, the call above defined the dataset 'train', which is the variable X, and it can be read and written as follows:
data = np.zeros((100, args.patch_size, args.patch_size))
X[0:100, :, :] = data
When the dataset was created earlier, the size of its first dimension was fixed by shape. What should we do when there is more data than that?
We can use the resize method to extend the dimension whose maxshape entry was defined as None:
X.resize(X.shape[0] + args.chunk_size, axis=0)
Because maxshape=(None, args.patch_size, args.patch_size) defines the zeroth dimension as extensible, we first use X.shape[0] to get the current length of that dimension and then extend it. After the dimension has been extended, we can continue writing data into the dataset.
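A minimal sketch of this write-then-grow pattern, again using hypothetical patch_size and chunk_size values in place of the args fields:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical sizes standing in for args.patch_size and args.chunk_size
patch_size = 4
chunk_size = 8

path = os.path.join(tempfile.mkdtemp(), "grow.h5")
f = h5py.File(path, "w")
X = f.create_dataset(
    "train",
    shape=(chunk_size, patch_size, patch_size),
    maxshape=(None, patch_size, patch_size),
    dtype=float,
    compression="gzip",
    chunks=(chunk_size, patch_size, patch_size),
)

# Fill the initial chunk, numpy-style
X[0:chunk_size, :, :] = np.ones((chunk_size, patch_size, patch_size))

# More data arrived: grow axis 0 by one chunk, then write into the new region
X.resize(X.shape[0] + chunk_size, axis=0)
X[chunk_size:2 * chunk_size, :, :] = 2.0 * np.ones((chunk_size, patch_size, patch_size))
```

Note that resize only changes the logical shape; the data already written stays in place, and the newly added region is writable immediately.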
Read datasets
Reading an h5 file is also very simple: first open the file with the h5py.File method, then assign one of its datasets to a variable; reading that variable is just like reading a numpy array.
h = h5py.File(hd5file, 'r')
train = h['train']
train[1]
train[2]
...
However, this reading pattern has a problem: every access (train[1], train[2]) reads data from the hard disk, which makes reads slow. A better approach is to read chunk_size elements from the hard disk at a time, store them in memory, and then read from memory when needed, for example:
h = h5py.File(hd5file, 'r')
train = h['train']
X = train[0:100]  # read a larger slice from the hard disk at once; X is held in memory
X[1]  # read from memory
X[2]  # read from memory
This method will be much faster .
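The difference between the two access patterns can be sketched as follows. The file, dataset name, and sizes here are made up for the demonstration; the key point is that slicing an h5py dataset returns a plain numpy array held in memory:

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small file to read back (sizes are illustrative)
path = os.path.join(tempfile.mkdtemp(), "read.h5")
with h5py.File(path, "w") as f:
    f.create_dataset(
        "train",
        data=np.arange(100 * 4 * 4, dtype=float).reshape(100, 4, 4),
        compression="gzip",
        chunks=(10, 4, 4),
    )

h = h5py.File(path, "r")
train = h["train"]   # a lazy handle: nothing is read yet

slow = train[1]      # each such access goes to the file (and decompresses a chunk)

X = train[0:100]     # one bulk read; X is a plain in-memory numpy array
fast = X[1]          # indexing X is pure numpy, no disk I/O
```

Both approaches return identical values; they differ only in how often the hard disk is touched.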
That concludes these examples of working with HDF5 files in Python.