CPV run3 geometry and readout #5499

Merged
shahor02 merged 9 commits into AliceO2Group/AliceO2:dev from peressounko/AliceO2:dev
Feb 24, 2021

Conversation

@peressounko
Collaborator

Switched to Run3 CPV geometry and readout format.

Fixed a segmentation violation in the Clusterizer related to the way MC events are tracked: a large event is split into several events with a smaller number of tracks, but Detector::endOfEvent() is called independently for each pack of primaries and the lists of hits are then merged in the Merger. As a result, the hit lists are only partially sorted (only within individual bunches), which breaks the Digitization and Clusterization algorithms. The digitization algorithm was rewritten to consume an unsorted list of hits.

Collaborator

@shahor02 left a comment

Hi @peressounko
I've tried to test this PR on 20 pbpb events, but not sure if the results I got are OK.
With

o2-raw-file-reader-workflow --input-conf raw/CPV/CPVraw.cfg  $ARGS_ALL  --configKeyValues "HBFUtils.nHBFPerTF=128"  -b | o2-cpv-reco-workflow $ARGS_ALL --input-type raw   --output-type digits --disable-mc -b

I see a log Writing 134973 digits; if I change to --output-type clusters, I get Finished, wrote 1236 clusters, 11TR and 0 Labels. Is it normal to have such a difference between the clusters and digits?
Then, in cpvdigits.root I see 136040 digits, so why is the number of decoded digits less than the number from the hits digitization?

Also, I see that your o2-cpv-reco-workflow does not write any output (all writers are commented out, and the readers too); at least I was not able to run the reco workflow starting from the MC digits: o2-cpv-reco-workflow --input-type digits --output-type clusters produces Invalid workflow: No matching output found for CPV/DIGITS/0. ...

Could you please bring the o2-cpv-reco-workflow into accordance with the other detectors' workflows, reading inputs and writing outputs by default unless explicitly disabled, e.g.
(1) --disable-root-input: expect inputs provided by upstream workflows, don't enable readers
(2) --disable-root-output: don't write an output file, just send the data.

BTW, the phos workflow seems to have the same problems.

Cheers,
Ruben

@shahor02
Collaborator

Hi @peressounko
Thanks for the changes, but what was the reason for replacing the gsl::spans by const vectors? You create a copy in this case, which is anyway used as const.
What about the mismatch between the clusters and the decoded digits, as well as the mismatch between the decoded and original digits?

@peressounko
Collaborator Author

Dear @shahor02, concerning the number of clusters vs digits: the thresholds are not optimized yet, so many clusters get merged into one big one, especially in the high-multiplicity case. So this is possible; CPV experts will adjust the parameters soon.
Concerning the span->vector change: span conflicts with the implementation of tree reading. While the clusterizer got digits from the Digitizer it was OK, but using the Reader as a source it produces a "root serialization error" at runtime. Do you have an example of reading a root file and sending a vector so that it can be consumed as a span?
Best regards, Dmitri

@shahor02
Collaborator

Hi @peressounko

OK for clusters, but what about the difference between the digits: original vs decoded from raw?

For the spans: what we receive as a span is always sent as a vector; the span is just a way to use (as const) the data owned by the framework, without creating an extra copy. See, e.g.

writing:

pc.outputs().snapshot(Output{mOrigin, "TRACKS", 0, Lifetime::Timeframe}, mTracks);

reading:

inputs.emplace_back("trackITS", "ITS", "TRACKS", 0, Lifetime::Timeframe);

with

const auto tracksITS = pc.inputs().get<gsl::span<o2::its::TrackITS>>("trackITS");

@peressounko
Collaborator Author

Dear @shahor02,
I returned to span in this version.
I am confused about the digits: where did you get the number "Writing nnn digits"? In principle, if we convert digits->raw->digits, it may reduce the number of digits, since some zero suppression is implemented.
I renamed Publisher->Reader, and now the QC part complains and tests fail. Should I prepare a simultaneous commit for QC?
Best regards, Dmitri

@shahor02
Collaborator

Hi @peressounko

> I am confused about Digits: where did you get number "writing nnn digits". In principle, if we convert digits->raw->digits, it may reduce number of digits as some zero suppression is implemented.

I am referring to messages like 280847:CPVRawToDigitConverterSpec]: [22:06:35][INFO] [CPVRawToDigitConverter - run] Writing 599243 digits ...; since they were not actually written, I could not check what is actually written.
Is this zero suppression reflected in the reconstruction from digits? I mean, the results of reco from raw and from digits should be the same (as I wrote above, I could not run the clustering starting from digits).

> I renamed Publisher->Reader, and now QC part complains and tests fail. Should I prepare simultaneous commit for QC?

If you keep the new name then yes, the QC will also need to be patched, but was there a reason to rename?
BTW, unless I've missed it: the CPV Reader provides a reader for digits, but don't we also need one for clusters?

@peressounko
Collaborator Author

Hi @shahor02
I do not see any difference between the CPVRawToDigitConverterSpec output and the digits size in the root file. Could you please try once more?
Added a Cluster reader. It is not used so far, but will probably be necessary in AO2D creation.
There might be some difference between digits->clusters and digits->raw->digits->clusters, as in the latter case we un-calibrate, round to integers, apply the ZS thresholds, and calibrate back.
For me, Reader reflects the process better, so I would prefer to keep the new name. How can I patch the QC part? If I open a PR in QC, it will be an independent commit and will conflict with the current O2; the O2 PR cannot pass tests as it conflicts with QC. How should I proceed?
Best regards, Dmitri

@shahor02
Collaborator

Hi @peressounko
OK, will test again, perhaps it is now corrected. My point was that the routine MC production will be done via digits, avoiding mc->digits->raw->digits, so one has to make sure that the results of the two chains are the same. Perhaps the 0-suppression should be applied already at the MC digitization level?

For the QC patch: there is no other way but to merge this PR (with failing QC), apply a QC patch independently, and create new tags for both O2 and QC (pinging @Barthelemy).

@shahor02
Collaborator

Hi @peressounko

I've tried your branch on 50 pbpb events with PHS and CPV, but it crashes in the PHS digitization:

o2-sim-digitizer-workflow  --shm-segment-size 20000000000

493067:PHOSDigitizer]:  *** Break *** segmentation violation
...
[493067:PHOSDigitizer]: #7  0x00007fd281eabf5b in __gnu_cxx::new_allocator<o2::phos::Digit>::construct<o2::phos::Digit, o2::phos::Digit const&> (this=0x55bda618c828, __p=0x41db9ad941b80000) at /usr/include/c++/9/new:174
[493067:PHOSDigitizer]: #8  std::allocator_traits<std::allocator<o2::phos::Digit> >::construct<o2::phos::Digit, o2::phos::Digit const&> (__a=..., __p=0x41db9ad941b80000) at /usr/include/c++/9/bits/alloc_traits.h:484
[493067:PHOSDigitizer]: #9  std::vector<o2::phos::Digit, std::allocator<o2::phos::Digit> >::push_back (__x=..., this=0x55bda618c828) at /usr/include/c++/9/bits/stl_vector.h:1189
[493067:PHOSDigitizer]: #10 o2::phos::Digitizer::processHits (this=0x55bda6143008, hits=<optimized out>, digitsBg=std::vector of length 0, capacity 0, digitsOut=std::vector of length 197731537055274325, capacity -39146837 = {...}, labels=..., collId=collId

i.e. the digitsOut vector is corrupted.

If I digitize CPV only, it works, but then I see a difference (261410 vs 263507) in the number of decoded digits:

o2-sim-digitizer-workflow  --shm-segment-size 20000000000 --onlyDet CPV
o2-cpv-digi2raw -o raw/CPV
o2-raw-file-reader-workflow --input-conf raw/CPV/CPVraw.cfg |  o2-cpv-reco-workflow --input-type raw --output-type clusters --run | tee xx.log
...
[497871:CPVRawToDigitConverterSpec]: [11:18:42][INFO] No reading calibration from ccdb requested, set default
[497871:CPVRawToDigitConverterSpec]: [11:18:42][INFO] No reading bad map from ccdb requested, set default
[497871:CPVRawToDigitConverterSpec]: [11:18:42][INFO] [CPVRawToDigitConverter - run] Writing 261410 digits ...
[497872:CPVClusterizerSpec]: [11:18:42][INFO] Start run
[497872:CPVClusterizerSpec]: [11:18:47][INFO] Finished, wrote  2661 clusters, 22TR and 0 Labels
...

and original digits:

root [1] TFile *_file0 = TFile::Open("cpvdigits.root");
root [2] o2sim->Scan("@CPVDigit.size()");
************************
*    Row   * @CPVDigit *
************************
*        0 *    263507 *
************************

@shahor02
Collaborator

PS: forgot the fast generation line for the test which crashes:

o2-sim -j 10 -n 50 -m PHS CPV -g pythia8hi --configKeyValues "Diamond.width[2]=6.; SimCutParams.maxRTracking=550; SimCutParams.maxAbsZTracking=400"

@peressounko
Collaborator Author

Hi @shahor02,
I tried several different combinations of your generation line + digitization with PHOS and PHOS+CPV and cannot reproduce the bug: it worked for me. Did you run the latest version?
Concerning the number of digits: the reduction of the number in the chain digit->raw->digit agrees with the idea that the conversion to raw and ZS kill part of the digits. I will check carefully, but I do not think it is worth implementing a strict simulation of ZS in the Digitizer: its influence on physics is minor, but it would require some more CPU.
Best regards, Dmitri

@shahor02
Collaborator

Hi @peressounko
I put the hits which I fail to digitize here: https://cernbox.cern.ch/index.php/s/YUqNTi11v7NaS6E
I believe I've built your branch as is...
For the difference in digits, I trust your judgement if it is important or not :)

@peressounko
Collaborator Author

Hi @shahor02, in your pack the o2sim_Kine.root is missing. Without it, digitization does not start at all.

@shahor02
Collaborator

@peressounko: didn't we need it in digitization just to query the number of available events (though this could be done with the hits...)? I put the kine file here: https://cernbox.cern.ch/index.php/s/s67lITmdGQUy68z

…S Digitizer to consume un-sorted hits; Fixes in clusterizer. Make one-to-one correspondence between Cells and MCLabels; fix CPV geometry
@peressounko
Collaborator Author

@shahor02 thank you, I reproduced the problem and fixed the typo.

Collaborator

@shahor02 left a comment

@peressounko thanks! It works now, but please merge peressounko#1 to make the default options consistent and -h working.

Also (probably for future studies): while the reco workflow for PHOS is very fast, CPV has a noticeable delay (~5 s) to process 50 pbpb events. Can this be optimized?

@peressounko
Collaborator Author

Dear @shahor02, thank you for the fixes, I approved them and hope they will merge. Or should I merge manually and make another push?
Concerning the optimization, I think the reason for the delay is this line: https://github.com/peressounko/AliceO2/blob/e52c10d5508044c0aef2070f8c21a1802b97ddfe/Detectors/CPV/simulation/src/Digitizer.cxx#L106
In a high-multiplicity environment, and especially with pileup, we have several contributors to the same pad, so we have to add Labels not at the end but into already existing entries of the MCTruthContainer. This triggers movement of the tail of the Label list, which becomes expensive for a large number of pads. It could be optimized either by introducing a temporary list of primaries with a special structure that easily accepts new labels in the middle (probably better to have this functionality in MCTruthContainer?), or improved slightly by introducing a threshold for keeping primary info in digits. Since we "flatten" the MCTruthContainer before writing, one could use a more complicated structure than a vector while it is in memory.

@shahor02
Collaborator

@peressounko you have to press the "rebase and merge" button in your github repo to merge my PR; then your updated branch will also propagate to this PR. If you want to make additional modifications (e.g. remove the "hits" option), you should pull your branch to your local repo, add a patch, and force-push.

For the speed of CPV clustering: I see the delay (compared to PHOS) also with raw input, where no labels are involved; there must be something else...

Make options consistent
@peressounko
Collaborator Author

Let us postpone the Hits option to the next PR.
Commenting out the line with the Label adding reduces the CPU time by a factor of ~30 in the case of strong pileup, and only slightly in the single-event case. Probably there is also another reason; I will have a look.
Best regards, Dmitri

@shahor02
Collaborator

OK for "hits", will merge once the checks have passed. I was testing on 50 min. bias pbpb events; there the CPV reco from raw was only slightly faster than that from digits, and both were considerably slower than PHS. Thanks for checking.

@peressounko
Collaborator Author

The errors in build/O2/fullCI, build/O2/o2 and build/O2/o2-dataflow are due to #include <PHOSWorkflow/PublisherSpec.h>, to be patched in QC.

@davidrohr
Collaborator

@peressounko: How do you want to get this merged? Do we need the QC patch merged first, or would that break QC without #5499 being merged? I.e., do they need to be merged at the same time? We try to avoid this, but if it is absolutely necessary it can be done; it just has to be synchronized a bit.
If possible, we try to introduce the first change in a compatible way, then merge the change to the second repository, and then remove the compatibility part in the first repository.

@shahor02
Collaborator

Hi @davidrohr, if I remember correctly, in such cases we simply had the QC patch ready, merged the O2 PR, then the QC one, then retagged both. Alternatively, @peressounko, if the conflict is only due to the changed Publisher, you could indeed add the old file back temporarily (without using it in O2) and then, once QC no longer uses it, remove it again.

@peressounko
Collaborator Author

Thanks, @davidrohr @shahor02, I prepared a patch for QC, or I can revert Publisher.h; which do you prefer?

@shahor02
Collaborator

@peressounko you don't need to revert; just add the old Publisher on top of the new Reader, so that O2 has no dependency on the former except in the CMakeLists. This would indeed be the cleanest solution.

@davidrohr
Collaborator

davidrohr commented Feb 24, 2021 via email

@shahor02
Collaborator

@peressounko you would also need to add the old PHOS Publisher.cxx and restore the line

src/PublisherSpec.cxx

@shahor02
Collaborator

@davidrohr, the FST failed, I guess due to this:
/mnt/mesos/sandbox/sandbox/o2-fullci/sw/slc8_x86-64/O2/5499-1/prodtests/full_system_test.sh: line 51: ulimit: open files: cannot modify limit: Operation not permitted; a permission problem?
Anyway, merging since it is not related to this PR. @peressounko could you open the PR for QC?

@shahor02 shahor02 merged commit b99cb72 into AliceO2Group:dev Feb 24, 2021
@davidrohr
Collaborator

@shahor02: Thanks for pinging me. It should actually have stopped at that place; that seems to be a bug, I will fix it.
I am also checking how we can increase the ulimit for the full CI.

@peressounko
Collaborator Author

Dear @shahor02, David, thank you for help and suggestions! I created QC ticket AliceO2Group/QualityControl#620
