CORSIKA 8 meeting

Name: CORSIKA 8 meeting
Start: 2020-09-24T14:30:00+02:00
End: 2020-09-24T16:20:00+02:00
Location: No location set

Thursday Sep 24, 2020, 2:30 PM → 4:20 PM Europe/Berlin

Description

Connect to ZOOM meeting room:

https://zoom.us/j/786000579

Output format I/O

The main topic of the meeting was the output format. Here are some notes:

Fortran analysis is not a first-class requirement, so it imposes no constraint. Wrappers in C or even Python are always possible. Eventually there will be requests for converters, à la corsika2root, but we may not have to take responsibility for these.
The primary output will be a "library" of showers, so the question is whether the structure should be "library/shower/output-component" or "library/output-component/shower". The former is friendlier to HPC computing, since smaller libraries can be merged much more easily. The latter is somewhat more analysis-friendly, since individual components can be picked out easily. In both cases, a small set of extra utility functions makes the layout easy to deal with. We may also have a "master switch" that changes the format from one to the other, for reading as well as writing. This extra flexibility may be very handy.
Any file format must avoid an extremely large number of files. It must be possible to concatenate data.
For parallel writes, fully asynchronous operations would be a huge advantage. E.g. Cherenkov photons may be written at a different time than other properties of a shower.
On some HPC systems, writing to a scratch disk first may be an advantage. The data must then be integrated into the dataset afterwards. This may be studied.
In any case, we need records, since it is impossible to keep the entire shower in memory. This can be a problem for numpy etc.
concerning inexlib-ROOT:
- very small package
- would take over responsibility
- no advanced features (see above)
- not HPC friendly
concerning parquet:
- small package
- large community
- very active
- very HPC friendly, maybe the best performance (similar to ROOT)
- no internal file structure, just plain columnar data
concerning HDF5:
- large package
- big community
- HDF _is_ a filesystem
- HDF is most certainly slower than parquet/ROOT, but this needs to be quantified (?)
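
To make the two candidate layouts and the "master switch" idea concrete, here is a minimal sketch in Python. It is an illustration only, not a proposed implementation. Nested dictionaries stand in for whatever container the chosen file format provides, and all names (`swap_layout`, `merge_shower_major`, the shower IDs and the component names) are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for an output library: two levels of nesting,
# with a list of record values at the leaves.
Library = Dict[str, Dict[str, List[float]]]


def swap_layout(library: Library) -> Library:
    """Exchange the two nesting levels.

    Turns a "shower/output-component" layout into an
    "output-component/shower" layout, and vice versa.
    """
    swapped: Library = {}
    for outer, inner_map in library.items():
        for inner, records in inner_map.items():
            swapped.setdefault(inner, {})[outer] = records
    return swapped


def merge_shower_major(*libraries: Library) -> Library:
    """Merge shower-major libraries.

    Each shower is self-contained, so merging only copies top-level
    entries. Duplicate shower IDs are treated as an error here.
    """
    merged: Library = {}
    for lib in libraries:
        for shower, components in lib.items():
            if shower in merged:
                raise ValueError(f"duplicate shower id: {shower}")
            merged[shower] = components
    return merged


# Two small shower-major libraries, e.g. produced by separate jobs.
lib_a: Library = {"shower_0001": {"particles": [1.0, 2.0], "cherenkov": [0.5]}}
lib_b: Library = {"shower_0002": {"particles": [3.0], "cherenkov": [0.7, 0.9]}}

merged = merge_shower_major(lib_a, lib_b)

# The "master switch": present the same data component-major for analysis.
by_component = swap_layout(merged)
print(by_component["particles"])
```

Applying `swap_layout` twice returns the original nesting. This is why a small utility of this kind could serve both reading and writing, whichever layout is stored on disk.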

unrelated:

look at the nonius project and use it for optimization
zstd currently offers the best compression performance

There are minutes attached to this event.

- 2:30 PM → 2:35 PM
  
  Welcome and Status 5m
  
  Call24Sept20_status.pdf
- 2:35 PM → 3:00 PM
  
  Cont'd: Discussion of Output Formats 25m
  
  Please contribute to this discussion. It is very important, since we will fix the output format and then start developing in this direction.
  - Possible news on the "Detector" design
  - Open discussion about output formats, flexibility, requirements and options
  
  Speaker: Remy Prechelt (University of Hawai'i at Manoa)
  
  output.pdf
- 3:00 PM → 3:15 PM
  
  HDF5 15m
  
  Speaker: Anatoli Fedynitch (DESY Zeuthen)
  
  example file
  
  HDF5_anatoli.pdf
- 3:15 PM → 3:35 PM
  
  Status and Planning 20m
  
  Speaker: Ralf Ulrich (KIT)
  
  Call24Sept20_output.pdf
