-
Notifications
You must be signed in to change notification settings - Fork 179
Pdm weighting #944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Pdm weighting #944
Changes from all commits
f62f49a
ae7f029
370bc5d
6dfb20d
fbd90fd
a2b219a
4332dcc
c6adab6
89924ac
1b5a77d
019aad6
1df730c
1c79fcc
f9b4cbf
ad8346c
8e312a2
873e828
4191a74
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Enhanced phase-dispersion-minimisation search by enabling weighted events. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -91,9 +91,58 @@ def test_stat(self): | |
| def test_pdm_stat(self): | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please test that the new keyword to
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the review on this @matteobachetti! I've now added a new test in 8e312a2. Let me know if it is suitable! It tests two scenarios (using the new variances argument and not) when there is a point that has an extremely high variance, while the other points are very uniform. |
||
| """Test pulse phase calculation, frequency only.""" | ||
| prof = np.array([1, 1, 1, 1, 1]) | ||
| sample_var = 2.0 | ||
| sample_var = 2 | ||
| nsample = 10 | ||
| np.testing.assert_array_almost_equal(pdm_profile_stat(prof, sample_var, nsample), 0.5) | ||
| sum_dev2 = sample_var * (nsample - 1) | ||
| np.testing.assert_array_almost_equal(pdm_profile_stat(prof, sum_dev2, nsample), 0.5) | ||
|
|
||
| def test_weighted_pdm_stat(self): | ||
| """Test the PDM calculation with weighted events (variances).""" | ||
| period = 1.0 | ||
| nbin = 5 | ||
|
|
||
| # Constant signal: flux=10 everywhere. The pdm should be 0. | ||
| times = np.array([0.1, 0.3, 0.5, 0.7, 0.9]) | ||
| fluxes = np.array([10.0, 10.0, 10.0, 10.0, 10.0]) | ||
| std_var = ( | ||
| 0.1 # instead of 1.0, to check that avoiding division by zero isn't changing results | ||
| ) | ||
| variances_base = np.ones_like(fluxes) * std_var | ||
|
|
||
| _, profile_base, _ = fold_events( | ||
| times, 1 / period, nbin=nbin, weights=fluxes, mode="pdm", ref_time=0 | ||
| ) | ||
| assert np.allclose(profile_base, 0.0) # equal with the mean at every bin | ||
|
|
||
| # An outlier point at 0.15 (Bin 0) with flux 1000 | ||
| times_out = np.append(times, 0.15) | ||
| fluxes_out = np.append(fluxes, 1000.0) | ||
|
|
||
| # High variance for the outlier, standard for the rest | ||
| outlier_var = 1.0e6 | ||
| variances = np.append(variances_base, outlier_var) | ||
|
|
||
| # Unweighted PDM on dirty data (should be skewed) | ||
| _, profile_unweighted, _ = fold_events( | ||
| times_out, 1 / period, nbin=nbin, weights=fluxes_out, mode="pdm", ref_time=0 | ||
| ) | ||
| assert profile_unweighted[0] > 1000.0 # the profile bin should deviate from 0 | ||
|
|
||
| # Weighted PDM on dirty data (should effectively ignore outlier) | ||
| _, profile_weighted, _ = fold_events( | ||
| times_out, | ||
| 1 / period, | ||
| nbin=nbin, | ||
| weights=fluxes_out, | ||
| variances=variances, | ||
| mode="pdm", | ||
| ref_time=0, | ||
| ) | ||
| # With high variance, the outlier's weight is tiny. | ||
| # The bin statistics should be dominated by the point with flux 10. | ||
| # The result should be very close to the baseline (0) and smaller than the unweighted result. | ||
| assert np.all(profile_weighted <= profile_unweighted) | ||
| assert np.allclose(profile_weighted, 0.0, atol=2.0) | ||
|
|
||
| def test_zn(self): | ||
| """Test pulse phase calculation, frequency only.""" | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is changing the name of the variable needed? Also, is the variable something different from before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @matteobachetti, thanks for the review. Sorry for the long wait in coming back to this!
The variable name has been changed to reflect that it is no longer the variance but rather the numerator in typical variance calculations, which is the sum of squared deviances, i.e.
(x - xbar)**2. This is calculated in thefold_eventsfunction in the more performant (but made obscure) method. The variance is obtained from the sum of squared deviances by dividing by the number of degrees of freedom. I thought this makes the statistics functionpdm_profile_stata bit more cohesive now as it computes statistical values, using the degrees of freedom information available to it, whereas thefold_eventsfunction is responsible for computing the profile and its higher order moments.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nabilbrice I see. This might be problematic, because as a policy we do not make breaking changes to the code, if not at a major release and after a period where we mark something as deprecated.
Would it be possible to make this new argument optional, and use it if not None instead of sample_var, rather than changing the signature of the function?