
CORSIKA 8 meeting

Europe/Berlin
Description

Connect to ZOOM meeting room:

https://zoom.us/j/786000579

 

Output format I/O

The important topic of the meeting was the discussion of the output format. Here are some notes:

  • Fortran analysis is not a first-class requirement, so it imposes no constraint on the format. Wrappers in C or even Python are always possible. Eventually there will be requests for converters, à la corsika2root, but we may not have to take responsibility for these.
  • The primary output will be a "library" of showers, so the question is whether the structure should be "library/shower/output-component" or "library/output-component/shower". The former is friendlier for HPC computing, since smaller libraries can be merged much more easily. The latter is somewhat friendlier for analysis, since individual components can be picked out easily. In both cases a small set of extra utility functions makes the layout easy to handle. We may also have a "master switch" that changes the format from one layout to the other, both for reading and for writing. This extra flexibility may be very handy.
  • Any file format must avoid an extremely large number of files. It must be possible to concatenate data.
  • For parallel writes, fully asynchronous operations would be a huge advantage. E.g. Cherenkov photons may be written at a different time than other properties of a shower.
  • On some HPC systems, writing to a scratch disk first may be an advantage. The data must then be integrated into the dataset afterwards. This may be studied.
  • In any case, we need records: it is impossible to keep the entire shower in memory. This can be a problem for numpy etc.
  • concerning inexlib-ROOT:
    • very small package
    • would take over responsibility
    • no advanced features (see above)
    • not HPC friendly
  • concerning parquet:
    • small package
    • large community
    • very active
    • very HPC friendly, maybe the best performance (similar to ROOT)
    • no internal file structure, just plain columnar data
  • concerning HDF5:
    • large package
    • big community
    • HDF _is_ a filesystem
    • HDF5 is most likely slower than parquet/ROOT, but this still needs to be quantified (?)
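
To make the two candidate layouts and the "master switch" from the notes above more concrete, here is a minimal Python sketch. It is not an agreed design: the names Layout, record_path and convert_path, as well as the "shower_000042" naming scheme, are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Layout(Enum):
    """The two layouts discussed in the meeting (enum names are hypothetical)."""
    SHOWER_FIRST = "library/shower/output-component"
    COMPONENT_FIRST = "library/output-component/shower"


def record_path(library, shower_id, component, layout):
    """Build the hierarchical key for one output component of one shower."""
    shower = f"shower_{shower_id:06d}"  # assumed naming scheme
    if layout is Layout.SHOWER_FIRST:
        return f"{library}/{shower}/{component}"
    return f"{library}/{component}/{shower}"


def convert_path(path, source, target):
    """Translate a key between layouts; this is the 'master switch' idea."""
    if source is target:
        return path
    library, second, third = path.split("/")
    # Both layouts have exactly three levels, so switching swaps the last two.
    return f"{library}/{third}/{second}"


key = record_path("lib", 42, "particles", Layout.SHOWER_FIRST)
print(key)
print(convert_path(key, Layout.SHOWER_FIRST, Layout.COMPONENT_FIRST))
```

Because both layouts carry the same three pieces of information, a reader or writer only needs such a mapping to support either one.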

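The requirement for records noted above (the shower never fits in memory as a whole) can be illustrated with a small buffered writer. This is only a sketch using the Python standard library: the record layout (particle id, x, y, t, energy) and the class name RecordWriter are assumptions made for the example, not a proposed CORSIKA 8 format.

```python
import io
import struct

# Hypothetical fixed-size record: int32 particle id + four float64 values.
RECORD = struct.Struct("<i4d")


class RecordWriter:
    """Buffer a bounded number of records, then flush them to the stream."""

    def __init__(self, stream, max_buffered=1000):
        self._stream = stream
        self._max = max_buffered
        self._buffer = []
        self.flushes = 0

    def add(self, pid, x, y, t, energy):
        self._buffer.append(RECORD.pack(pid, x, y, t, energy))
        if len(self._buffer) >= self._max:
            self.flush()

    def flush(self):
        if self._buffer:
            self._stream.write(b"".join(self._buffer))
            self._buffer.clear()
            self.flushes += 1


def read_records(stream):
    """Yield records one at a time, so the reader also needs little memory."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        yield RECORD.unpack(chunk)


stream = io.BytesIO()
writer = RecordWriter(stream, max_buffered=1000)
for i in range(2500):
    writer.add(i, 0.0, 0.0, 0.0, 1.0)
writer.flush()  # write the remaining partial buffer
stream.seek(0)
print(sum(1 for _ in read_records(stream)))
```

At most max_buffered records are held in memory at any time. Since the records have a fixed size, two such streams can be concatenated byte for byte, which matches the concatenation requirement listed above.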
 

unrelated:

  • look at the nonius project and use it for optimization
  • zstd offers the best compression performance currently
    • 2:30 PM 2:35 PM
      Welcome and Status 5m
    • 2:35 PM 3:00 PM
      Cont': Discussion of Output Formats 25m

      Please contribute to this discussion. It is very important, since we will fix the output format and start developing in this direction afterwards.
      - Eventual news on the "Detector" design
      - Open discussion about output formats, flexibility, requirements and options

      Speaker: Remy Prechelt (University of Hawai'i at Manoa)
    • 3:00 PM 3:15 PM
      HDF5 15m
      Speaker: Anatoli Fedynitch (DESY Zeuthen)
    • 3:15 PM 3:35 PM
      Status and Planning 20m
      Speaker: Ralf Ulrich (KIT)