Best practice to save and load data to/from files #660
Replies: 5 comments 1 reply
-
For 1 and 2, my suggestion is to save your configuration files and output files in a format that is convenient to save and load in most languages. INI, JSON, YAML, and TOML are possible candidates. I am not familiar with Matlab, so I am not sure which formats it supports. For 3, is there anything that saving the data in the Cytnx format does not cover?
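As a rough illustration of the JSON option, here is a minimal Python sketch using only the standard library. The parameter names, file names, and values are made up for the example. It writes the run parameters as one JSON object and the per-iteration values as JSON Lines (one object per line), so that everything belonging to one iteration stays together and a run can append as it goes.

```python
import json
import os
import tempfile

# Hypothetical run parameters; the names are illustrative only.
params = {"bond_dim": 64, "n_sweeps": 10, "coupling": 0.5}

workdir = tempfile.mkdtemp()
config_path = os.path.join(workdir, "params.json")
log_path = os.path.join(workdir, "observables.jsonl")

# Configuration: a single JSON object of name-value pairs.
with open(config_path, "w") as f:
    json.dump(params, f, indent=2)

# Per-iteration output: one JSON object per line (JSON Lines).
# Opening in append mode lets a running job add records over time.
with open(log_path, "a") as f:
    for it in range(3):
        record = {"iteration": it, "energy": -1.0 - 0.1 * it}
        f.write(json.dumps(record) + "\n")

# Reading everything back.
with open(config_path) as f:
    loaded_params = json.load(f)
with open(log_path) as f:
    records = [json.loads(line) for line in f if line.strip()]
```

Adding a parameter later only means adding a key, and readers that do not know the new key can ignore it. JSON parsers exist for most languages, so the same files should be readable from C++ or Matlab as well, although that side is not shown here.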
-
Thanks a lot, I will have a look into these formats. For 3: can I combine parameters (name-value pairs) and several tensors in one file (and if so, how)? Also, to read a file one typically needs to know the exact file format of the project. So I was wondering if there is a standard format to pack the tensors into a container together with metadata such as parameters, which can be read even without knowing my project details. This would make it easier to share my results with others who do not know my specific binary format.
-
My workaround so far has been to write all relevant parameters into the file name. But this does not seem like a very clean solution to me, and what if I want to add a parameter later? Writing the tensors in text format seems inefficient, and Cytnx provides no way of doing so either (which makes my encoding format project-specific again). How about HDF5? I think one can efficiently combine human-readable key-value pairs with large data structures. For now: can I at least store several tensors in one file? For example, by combining them in a vector and saving that vector to binary?
-
HDF5 might be a good option here - I've never used it myself but I know other people use it (including iTensor I think). There are some efforts to find common formats, e.g. https://tensor.sciencesconf.org/ and https://github.com/TAPPorg/tensor-interfaces. Although I can't see anything about file formats there, I believe that has been discussed (but no visible progress...). An interchange format is of course a different problem to saving data during a calculation (or for followup calculations). https://zarr.dev/ might be a good option, but I only just came across it now and know nothing about it. The scheme the Matrix Product Toolkit uses works very well, but is very specialized. It stores history along with the data, so e.g.
-
I vote for HDF5. Actually, there is some discussion among library builders about a potential common HDF5 schema for sharing tensors in the future. It depends on what kind of metadata is required, as you need to store enough information so that people can easily tell how the tensors were produced, under what kind of symmetries, etc.
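To make the HDF5 idea concrete, here is a minimal sketch using the `h5py` and NumPy Python packages. It is not a Cytnx feature and not the common schema mentioned above: the tensors are plain dense NumPy arrays standing in for real data, and the file layout (a `tensors` group with `site_<i>` datasets, parameters as root attributes) is an assumption made for this example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "run.h5")

# Stand-in data: dense NumPy arrays, not Cytnx objects.
tensors = [np.random.rand(2, 3, 2) for _ in range(4)]

with h5py.File(path, "w") as f:
    # Name-value pairs are stored as attributes of the root group.
    f.attrs["bond_dim"] = 64
    f.attrs["coupling"] = 0.5
    f.attrs["model"] = "heisenberg"
    # Each tensor becomes one dataset inside a group.
    grp = f.create_group("tensors")
    for i, t in enumerate(tensors):
        grp.create_dataset(f"site_{i}", data=t)

with h5py.File(path, "r") as f:
    # Attributes and dataset names can be listed without knowing the project.
    loaded_params = dict(f.attrs)
    n_tensors = len(f["tensors"])
    loaded = [f["tensors"][f"site_{i}"][...] for i in range(n_tensors)]
```

Generic tools such as `h5dump` can list the attributes and datasets of such a file, which covers the "readable without knowing my project" part. What this sketch does not cover is exactly the open point raised above: a symmetric tensor would need extra metadata (bonds, quantum numbers, block structure) stored alongside the raw numbers.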
-
I would like to discuss file IO, and see how this can be integrated in Cytnx projects.
In a typical project, I need some file IO - I usually run the same algorithm many times (usually in parallel), with different options and parameters. Three file types are common:
Then, the algorithm runs and creates
Additionally, it would make sense to save some values at the start, like the parameters the algorithm ran with, initial values for several variables, etc.
Often, I need to read and write files in several languages (for example, bash for manipulation of input files, C++/Python to run the algorithm, Matlab to do the data analysis). I could write project-specific file IO commands in each language, but it would be nicer to have a standard format.
My wish would be:
- text files (?) in a standard format for 1) and 3), that can be written and read in many languages. Maybe as tuples (variable name -> value), but it also makes sense to group all values with the same iteration number. What is good practice for this?
- binary files for 2). Here too, it would be great if the data could be read and written in a standard format, and if things that belong together could be written to one file. This could, for example, be all parameters, the iteration number, and a TN that consists of many tensors. It would be good if such a file could be opened in this standard format, and it would tell me the parameter-value pairs and say that there are 100 objects of type 'UniTensor' (which can then be saved in a Cytnx-specific format). That way, a user who does not know my project-specific binary IO format can still understand what is saved in the file.
It would be great if Cytnx supported such a format, since currently each tensor needs to be written to a new file, which seems quite impractical for big networks (or am I missing something here?)
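The binary wish above can be sketched with nothing but the Python standard library: a zip archive holding a human-readable `manifest.json` next to opaque binary entries. This is only an illustration of the container idea, not an existing Cytnx format. The function names, the archive layout, and the placeholder bytes are all hypothetical, and in practice each blob would be whatever the tensor library writes for one tensor.

```python
import json
import os
import tempfile
import zipfile


def write_bundle(path, params, blobs, blob_type):
    """Write parameters plus opaque binary blobs into one zip archive."""
    manifest = {
        "parameters": params,
        "objects": [
            {"name": name, "type": blob_type, "nbytes": len(data)}
            for name, data in blobs.items()
        ],
    }
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in blobs.items():
            zf.writestr(f"objects/{name}.bin", data)


def read_manifest(path):
    """Return the manifest without touching the binary payloads."""
    with zipfile.ZipFile(path) as zf:
        return json.loads(zf.read("manifest.json"))


def read_blob(path, name):
    """Return the raw bytes of one stored object."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(f"objects/{name}.bin")


# Placeholder bytes stand in for tensors serialized by the library.
blobs = {f"A{i}": bytes([i]) * 16 for i in range(100)}
path = os.path.join(tempfile.mkdtemp(), "network.zip")
write_bundle(path, {"bond_dim": 64, "iteration": 12}, blobs, "UniTensor")

manifest = read_manifest(path)
print(manifest["parameters"])
print(len(manifest["objects"]), "objects of type", manifest["objects"][0]["type"])
```

Anyone can unzip such a file and read the manifest, while only the payloads remain library-specific. The trade-off compared to HDF5 is that the tensor contents themselves stay opaque to generic tools.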