Description
I noticed inconsistencies between the pvlib.clearsky.detect_clearsky implementation and the reference it's based on (M. J. Reno and C. W. Hansen, “Identification of periods of clear sky irradiance in time series of GHI measurements,” Renew. Energy, vol. 90, pp. 520–531, 2016).
The first issue is in how slopes are calculated. The function takes a plain first difference of the samples (e.g., meas_slope = np.diff(measured[H], n=1, axis=0)), which implicitly assumes a data frequency of 1 minute. Dividing by the time differences would give the expected behavior and would match the equations in the original reference. It may also improve how well the default parameters perform on lower-frequency data.
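To illustrate the point, here is a minimal sketch of a slope calculation that divides by the time step. The helper name slope_per_minute and the sample data are hypothetical, made up for this example; this is not the pvlib code, and it assumes timestamps are given in minutes.

```python
import numpy as np


def slope_per_minute(values, times_minutes):
    """Finite-difference slope in (units of values) per minute.

    Hypothetical helper for illustration; not part of pvlib.
    """
    values = np.asarray(values, dtype=float)
    times_minutes = np.asarray(times_minutes, dtype=float)
    return np.diff(values) / np.diff(times_minutes)


# The same linear ramp (2 W/m^2 per minute) sampled every 1 and every 5 minutes.
t1 = np.arange(0, 11, 1)
t5 = np.arange(0, 11, 5)
ghi1 = 500.0 + 2.0 * t1
ghi5 = 500.0 + 2.0 * t5

raw_1min = np.diff(ghi1)                   # plain difference on 1-minute data
raw_5min = np.diff(ghi5)                   # plain difference on 5-minute data
scaled_5min = slope_per_minute(ghi5, t5)   # difference divided by the time step

print(raw_1min[0], raw_5min[0], scaled_5min[0])
```

With the plain difference, the 5-minute series reports a slope five times larger than the 1-minute series for the same physical ramp, so thresholds tuned for 1-minute data no longer mean the same thing. Dividing by the time step gives the same value at both frequencies.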
I also believe that the final feature ("maximum difference between changes in GHI and clear sky time series" in the paper) is calculated incorrectly. The paper defines this feature as the maximum of the absolute element-wise difference between the GHI and GHI(cs) slopes in each window (eqn 14). The implementation instead appears to calculate the difference between the absolute maximum GHI slope and the absolute maximum GHI(cs) slope for a given window.
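A small sketch of the two readings, using made-up slope values for a single window. Both function names are hypothetical and written only for this comparison; the second one mirrors my understanding of the current behavior and is not copied from pvlib.

```python
import numpy as np


def max_abs_slope_difference(meas_slope, clear_slope):
    """Maximum over the window of |meas_slope - clear_slope| (element-wise).

    This is the feature as I read it from the paper.
    """
    meas_slope = np.asarray(meas_slope, dtype=float)
    clear_slope = np.asarray(clear_slope, dtype=float)
    return np.max(np.abs(meas_slope - clear_slope))


def difference_of_max_abs_slopes(meas_slope, clear_slope):
    """max|meas_slope| - max|clear_slope| over the window.

    This is what the implementation appears to compute.
    """
    meas_slope = np.asarray(meas_slope, dtype=float)
    clear_slope = np.asarray(clear_slope, dtype=float)
    return np.max(np.abs(meas_slope)) - np.max(np.abs(clear_slope))


# Made-up slopes for one window: the measured series dips while
# the clear sky series rises steadily.
meas = np.array([1.0, -3.0, 1.0])
clear = np.array([1.0, 1.0, 1.0])

element_wise = max_abs_slope_difference(meas, clear)
of_maxima = difference_of_max_abs_slopes(meas, clear)

print(element_wise, of_maxima)
```

For this window the element-wise version gives 4.0 and the difference of maxima gives 2.0, because taking the absolute value of each series first discards the sign of the measured dip. The two definitions therefore can disagree on whether a window passes a given threshold.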
This isn't really a "bug", but tagging it as such seemed like the best option. I'd be happy to resubmit the issue report under a different tag if preferred.