Output format I/O
The main topic of the meeting was the output format. Some notes:
- Fortran analysis is not a first-class requirement, so it imposes no constraints. Wrappers in C or even Python are always possible. Eventually there will be requests for converters, à la corsika2root, but we may not have to take responsibility for these.
- the primary output will be a "library" of showers, which raises the question of whether the structure should be "library/shower/output-component" or "library/output-component/shower". The former is friendlier for HPC, since smaller libraries can be merged much more easily. The latter is a bit friendlier for analysis, since individual components can be picked out easily. In both cases, a small set of extra utility functions makes the data easy to deal with. We may also have a "master switch" that changes the format from one to the other, for reading as well as for writing. This extra flexibility may be very handy.
- any file format must avoid an extremely large number of files. It must be possible to concatenate data.
- for parallel writes, fully asynchronous operations would be a huge advantage. E.g. Cherenkov photons may be written at a different time than other properties of a shower.
- on some HPC systems, writing to a scratch disk first may be an advantage. The data must then be integrated into the dataset afterwards. This may be studied.
- in any case, we need records: it is impossible to keep the entire shower in memory. This can be a problem for NumPy etc.
- concerning inexlib-ROOT:
  - very small package
  - would take over responsibility
  - no advanced features (see above)
  - not HPC friendly
- concerning Parquet:
  - small package
  - large community
  - very active
  - very HPC friendly, maybe the best performance (similar to ROOT)
  - no internal file structure, just plain columnar data
- concerning HDF5:
  - large package
  - big community
  - HDF5 _is_ a filesystem
  - HDF5 is most likely slower than Parquet/ROOT; this needs to be quantified
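
As an illustration of the two candidate layouts and the "master switch" noted above, here is a minimal sketch using plain nested dictionaries in place of a real file format. All names (`swap_levels`, `merge_libraries`, `open_library`, the shower and component keys) are hypothetical and only serve the example. It assumes shower ids are unique across the libraries being merged.

```python
def swap_levels(nested):
    """Swap the two outer levels of a nested mapping.

    Turns {shower: {component: data}} into {component: {shower: data}}
    and vice versa.
    """
    swapped = {}
    for outer_key, inner in nested.items():
        for inner_key, value in inner.items():
            swapped.setdefault(inner_key, {})[outer_key] = value
    return swapped


def merge_libraries(*libraries):
    """Merge shower-first libraries (assumption: shower ids are unique)."""
    merged = {}
    for library in libraries:
        for shower_id, components in library.items():
            if shower_id in merged:
                raise KeyError(f"duplicate shower id: {shower_id!r}")
            merged[shower_id] = components
    return merged


def open_library(showers, layout="shower"):
    """Hypothetical 'master switch': expose the same data in either layout."""
    if layout == "shower":
        return showers
    if layout == "component":
        return swap_levels(showers)
    raise ValueError(f"unknown layout: {layout!r}")


# Two small libraries, e.g. produced by independent HPC jobs.
job_a = {"shower_0": {"particles": [1, 2], "cherenkov": [10]}}
job_b = {"shower_1": {"particles": [3], "cherenkov": [20, 30]}}

library = merge_libraries(job_a, job_b)                    # shower-first: trivial merge
by_component = open_library(library, layout="component")   # analysis-friendly view
```

In the shower-first layout the merge is a simple union of top-level entries, which is the HPC argument made above. The component-first view is derived on demand, so picking out e.g. all `particles` entries does not require a second copy on disk.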
unrelated:
- look at and use the nonius project for optimization
- zstd currently offers the best compression performance
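
Performance claims like the ones above are best checked by measurement. The following is a rough sketch of how compression ratio and speed could be compared on representative data. It uses only codecs from the Python standard library. zstd is left out because it needs an additional binding, but it could be added as one more entry in `CODECS`. The record layout in `make_sample` is invented for the example and is not the actual shower format.

```python
import bz2
import lzma
import random
import struct
import time
import zlib

# Codec name -> (compress, decompress). Add further codecs here as needed.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def make_sample(n_records=20000, seed=1):
    """Build hypothetical particle-like records: id, energy, x, y."""
    rng = random.Random(seed)
    record = struct.Struct("<Ifff")
    return b"".join(
        record.pack(rng.randrange(1, 200), rng.expovariate(1.0),
                    rng.gauss(0.0, 100.0), rng.gauss(0.0, 100.0))
        for _ in range(n_records)
    )


def benchmark(data, codecs=CODECS, repeats=3):
    """Return compression ratio and best compression time per codec."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = {}
    for name, (compress, decompress) in codecs.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            packed = compress(data)
            best = min(best, time.perf_counter() - start)
        # Verify the round trip before trusting any numbers.
        if decompress(packed) != data:
            raise ValueError(f"{name}: round trip failed")
        results[name] = {"ratio": len(data) / len(packed), "seconds": best}
    return results


if __name__ == "__main__":
    for name, r in benchmark(make_sample()).items():
        print(f"{name:5s} ratio={r['ratio']:.2f} time={r['seconds'] * 1e3:.1f} ms")
```

The numbers depend strongly on the data, so a real comparison should use actual shower output rather than the random sample generated here.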