Build a real ETL pipeline to support various sorts of stat collection and aggregation. See D3022, D3062.
DESCRIPTION
DETAILS
Blocks
- T4171: Building reporting and data systems
- T6041: Metrics for Maniphest
- Restricted Maniphest Task
- T1135: Build ETL-based statistics for Differential
Differential Revisions
- Restricted Differential Revision (listed 9 times)
What is missing in order to consider this task done?
Wikimedia is just a few weeks away from its Phabricator prime time. We know we will not have proper metrics on Day 1, but we will need them at some point. We might be able to help build the missing pieces if the pending tasks are clearly identified.
Only a small amount of work has been completed here. We don't have implementable plans for the remaining work yet.
This is mostly infrastructure work for T4171, and both are very low priorities for the upstream. Realistically, you shouldn't expect much support from us on this front for a long time.
Just to confirm (and I'm not trying to be cynical!), you are saying that you are not planning to work on this any time soon, and you don't encourage others to try either, right? If this is the case, perhaps the best option for us will be to find a shorter-term solution: querying the Conduit API and building the metrics outside Phabricator, as we are doing at http://korma.wmflabs.org
I'd rather invest in the Metrics-in-Phabricator path, but I understand that this might be just too complex for an external contributor today.
That's right. Even if someone somehow built exactly what we plan to build (which would be effectively impossible since we aren't totally sure yet ourselves and it definitely doesn't exist as a written implementation plan anywhere), we wouldn't have the time to review, integrate, and support it right now, so it would just sit in limbo until we got around to prioritizing these things.
Roughly, the only viable way forward in the upstream is:
- Some time far in the future, this will become a priority for us.
- We'll implement a basic system (ETL, charting) and get it to a place where we're satisfied with it from an architecture and scalability point of view. Contributors probably can't help very much with this because we won't have a detailed plan yet and will be exploring approaches.
- Once the basics are solid and we're convinced we've built a working solution to the hard parts of the problem, we will be able to accept contributions for new fact extractors, chart types, charting features, etc. These would basically be feature expansions of a product which will be fairly well-defined by that point.
As a downstream, some options available to you are:
- Wait a long time for upstream prioritization.
- Build outside of Phabricator, using Conduit (possibly not a great fit) or raw MySQL.
- Build in Phabricator, but don't expect any upstream support (review, integration, guidance) for a long time.
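For the "build outside of Phabricator" option, the transform step of a small external pipeline might look roughly like the sketch below. It assumes task records have already been extracted (from Conduit or MySQL) into plain dictionaries; the field names `dateCreated` and `dateClosed`, the helper names, and the sample data are all hypothetical and chosen for illustration, not taken from any confirmed Phabricator schema.

```python
from collections import Counter
from datetime import datetime, timezone


def day(epoch_seconds):
    """Convert a Unix timestamp to a UTC date string such as '2014-09-01'."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def daily_task_counts(tasks):
    """Aggregate task records into (day, opened, closed) rows.

    Each task is assumed to be a dict with a 'dateCreated' epoch timestamp
    and an optional 'dateClosed' epoch timestamp (hypothetical field names).
    """
    opened = Counter()
    closed = Counter()
    for task in tasks:
        opened[day(task["dateCreated"])] += 1
        if task.get("dateClosed") is not None:
            closed[day(task["dateClosed"])] += 1
    # Emit one row per day on which anything was opened or closed.
    days = sorted(set(opened) | set(closed))
    return [(d, opened[d], closed[d]) for d in days]


# Hypothetical sample records standing in for an extract step.
sample_tasks = [
    {"id": 1, "dateCreated": 1409529600, "dateClosed": 1409616000},
    {"id": 2, "dateCreated": 1409529600, "dateClosed": None},
    {"id": 3, "dateCreated": 1409616000, "dateClosed": 1409616000},
]

rows = daily_task_counts(sample_tasks)
for row in rows:
    print(row)
```

Keeping the aggregation separate from the extraction means the same function could be fed from either Conduit or a raw MySQL query, whichever turns out to be the better fit.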

