Are 50 individual magnitudes weightier than a single average of 50 magnitudes?

Hello everybody!
I am confused by the following question.
I feel there are two ways to estimate the magnitude of a variable star when you have taken 50 images of it, using VPHOT or another application:

  • Either you download the 50 images and then submit a report with 50 lines of estimates,
  • Or you download the 50 images, stack them, and only then submit a report that gives a single estimate.

Thus, even if the averages are identical, the two methods are not represented identically on the final graph: the first produces 50 different estimates and the second a single estimate.
Statistically, on the graph, the first will be more significant (carry a higher weight) than the second. This can disturb the calculation of the periodic curve. And yet they are the same thing!

Could someone explain how I can report an average obtained from stacking so that it carries the same significance as the 50 individual results someone else would report?

Please tell me where I am wrong, thank you very much!

Hi,

People often ask whether it’s better to stack a bunch of images first and then do photometry, or to measure each frame individually and just average the results. The annoying but honest answer is: it depends – mainly on what kind of variable you’re observing and what timescales you care about.

If all your frames have the same exposure time and similar quality, then in purely mathematical terms, stacking them with an average and then doing photometry is almost equivalent to doing photometry on each frame and averaging the fluxes afterwards. The SNR improves roughly as √N in both cases. But in real data things are rarely that clean: seeing changes, focus drifts, transparency varies, guiding isn’t perfect, the background changes, some frames get hit by cosmic rays or satellites, etc. When you stack everything into a single image, you lose information about how the star (and the atmosphere) behaved over time between individual exposures. With median stacking, you also gain robustness against outliers, but you pay with some SNR and still end up with a “representative” image, not a time series.
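A minimal pure-Python sketch of that equivalence, using made-up flux and noise numbers (each simulated "frame" is reduced to a single noisy flux measurement, so the photometry step is trivially linear):

```python
import math
import random

random.seed(42)

TRUE_FLUX = 1000.0   # made-up star flux per frame
SIGMA = 50.0         # made-up per-frame noise level
N_FRAMES = 50

# Simulate 50 frames as 50 noisy flux measurements.
frames = [TRUE_FLUX + random.gauss(0.0, SIGMA) for _ in range(N_FRAMES)]

# Method 1: measure each frame, then average the 50 measurements.
avg_of_measurements = sum(frames) / N_FRAMES

# Method 2: stack (average) the frames first, then "measure" the stack.
# For a linear measurement the two are mathematically identical.
stacked_frame = sum(frames) / N_FRAMES
measurement_of_stack = stacked_frame

# Either way, the noise of the result shrinks roughly as sqrt(N):
sigma_of_mean = SIGMA / math.sqrt(N_FRAMES)   # 50 / sqrt(50) ≈ 7.1
```

In real data the per-frame measurement is not perfectly linear (centroiding, background fitting, outliers), which is exactly where the two methods start to diverge.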

That’s why for serious time-series work the usual approach is: calibrate every frame, do differential photometry on each one, and then decide how to bin in time. For slow, long-period, bright variables (Mira, SR, etc.), having 50×20 s exposures doesn’t mean much relative to a period of hundreds of days. In that regime it’s perfectly reasonable to combine 3–5 frames and treat them as one point in the light curve for that night – you don’t really lose any interesting variability on those short timescales.
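The "bin in time after per-frame photometry" step can be sketched like this (illustrative numbers only; the function name and bin size are my own choices, not a VPHOT feature):

```python
import statistics

def bin_series(times, mags, bin_size=5):
    """Average consecutive (time, magnitude) points in groups of bin_size."""
    binned = []
    for i in range(0, len(mags), bin_size):
        binned.append((statistics.mean(times[i:i + bin_size]),
                       statistics.mean(mags[i:i + bin_size])))
    return binned

# 10 fake exposures taken 20 s apart (made-up magnitudes)
times = [i * 20.0 for i in range(10)]
mags = [12.30, 12.32, 12.28, 12.31, 12.29, 12.33, 12.30, 12.27, 12.31, 12.32]

binned = bin_series(times, mags, bin_size=5)   # 2 points instead of 10
```

The point is that the per-frame measurements still exist on disk; you can always re-bin with a different bin size later, which you cannot do once the frames are stacked into one image.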

For exoplanet transits or eclipsing binaries, things are very different. For cataclysmic variables and other objects with fast, stochastic variability, aggressive averaging is even more problematic. There the goal is to preserve as much time resolution as possible, not to smooth everything until it looks pretty. It’s usually better to choose an exposure time that already gives a usable SNR per frame and, at most, bin 2–3 points later if things are really noisy.

So, very roughly: stacking is great for pretty pictures, and for very slow variables where you only care about a long-term trend. For scientific time-series photometry of most variable stars, the safer workflow is: process each frame individually, do your differential photometry per frame, and then average/bin in time in a way that doesn’t destroy the time structure you actually want to study.

Nikola


Thank you very much for your response!

In the case I presented, I wasn’t talking about short-term variables because, as you rightly say, averaging can disrupt monitoring over time.

What bothers me is that we can report either individual values or their average, if I understood correctly.
The report does not specify that the average is derived from a large number of individual values.
Perhaps we should assign a statistical weight to the average to indicate, for the final curve, that it is derived from 50 values.

I agree with you that averaging causes a loss of information about the dispersion of the individual measurements.
In my opinion, in this case, the error calculation should be adapted: something like Error = sqrt( SD(values)²/n + (1/SNR)² ) (?)

The theoretical standard deviation for stacking is as follows:

σ_stack = sqrt(σ1² + σ2² + …)/N

If the images are taken with the same exposure time and have similar noise levels, this simplifies to:

σ_stack = σ/sqrt(N)
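The propagation above can be checked numerically; a small sketch with illustrative values (0.05 mag per-frame noise, 50 frames):

```python
import math

def sigma_stack(sigmas):
    """Noise of the average of N frames with individual noise values:
    sigma_stack = sqrt(sigma1^2 + sigma2^2 + ...) / N"""
    return math.sqrt(sum(s * s for s in sigmas)) / len(sigmas)

# Equal-noise case: 50 frames with sigma = 0.05 mag each (made-up value)
general = sigma_stack([0.05] * 50)

# The simplified formula gives the same result:
simplified = 0.05 / math.sqrt(50)   # ≈ 0.05 / 7.07 ≈ 0.0071
```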

======

You are asking whether the following is true:

measure(average(images))=average(measure(images))

Ideally these should be equal, but in practice the measurement on the stacked image will be better than the average of the individual measurements. When measuring faint stars you will likely miss some flux that falls below the background noise level; this is less of a problem for the stacked image.

Han

The stacking should use a simple average method: no sigma clipping or anything else that throws out outliers.


Hi,

For long-period variables I think a single estimate would be ideal. However, in my case, where I use ASTAP for calibrating and estimating, it seems to require several images in order to compute deviations, so for 50 original frames I generate 5 stacked images (10 images each). On the other hand, a researcher who uses these observations is not going to complain about finding 5 estimates instead of 1; he will probably average them.

Sometimes there are a few bad-quality frames, so a stacked image may be composed of fewer than 10 images, but I think that does not matter; it just worsens its SNR.


Thank you for your reply!

I completely agree with your statistical approach, and thank you for clarifying the precautions regarding stacking!

Indeed, in the case of faint stars, stacking followed by aperture photometry is undoubtedly more “accurate,” as the centers of the apertures are probably located more precisely in that case!

But no worries for fairly bright stars: the average of 50 values is no different from a measurement of the stacked image.

In the end, what does the astronomer do for calculating the periodic curve?

He takes the averages of one observer and the individual values of another without differentiating between averages and individual values: that’s what bothers me. Because in this case, the graph is unbalanced in the calculation of the periodic curve: the point corresponding to the average is less “weighty” statistically than the individual points.

For example, in classical statistics: this is equivalent to mixing average values with individual values in the calculation of a least squares regression line: this approach would be considered absurd!

In the AAVSO report, there is no way of indicating the number of values used to calculate an average: values and averages can be mixed. Am I wrong?

Thank you very much!

I also imagine that the researcher takes the results without differentiating between the average and individual values (how could he differentiate them???)!

He averages everything: that’s what bothers me:

5 individual values carry a greater weight than a single result computed from those 5 values! In statistics, to balance things out, we can theoretically assign a weight to each result. The ideal would be to assign a weight of “1” to each individual value and a weight of “5” to the average.

Even with a weighting, the method is not sound because the dispersion aspect is very different: the average masks the dispersion of the five values…

In fact, the methods should not be mixed.

Seems to me there is a third option: to report the average of the 50 measurements.
In any case, shouldn’t the error estimates on the reported values reflect the amount of averaging done? You would report 50 relatively poor measurements, or one measurement (either from averaging the individual measurements or from the stacked image) with a much lower error estimate. When researchers do their averaging, they will presumably weight each data point according to the inverse of its variance, thereby properly giving higher weight to the averaged values.
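A small sketch of that inverse-variance weighting, with made-up magnitudes and errors (the function and numbers are my own illustration, not any particular tool's implementation):

```python
def weighted_mean(values, errors):
    """Inverse-variance weighted mean: weight w_i = 1 / sigma_i^2.
    A point that is itself an average of many frames has a small error,
    so it automatically receives a high weight."""
    weights = [1.0 / (e * e) for e in errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    err = (1.0 / total) ** 0.5   # error of the weighted mean
    return mean, err

# Illustrative: one averaged point (error ≈ 0.05/sqrt(50) ≈ 0.007)
# against one raw single-frame point (error 0.05).
mean, err = weighted_mean([12.30, 12.35], [0.007, 0.05])
# The averaged point dominates, so the result stays close to 12.30.
```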


To Richard Wagner
A priori, weighting by the inverse of the error could be an excellent approach!
Typically, 50 values are more dispersed (standard deviation) than the average of the 50 values (standard deviation / 7.1), so it fits!

The error calculation is not exactly the same in VPHOT, but it could work: VPHOT uses 1/SNR as the error. Thus,
SNR(1 value) < SNR(1 average) ⇒ Error(1 value) > Error(1 average), with Error = 1/SNR

But this does not work if we mix reports based on a single comparison star and those based on several comparison stars (ENSEMBLE). From memory (tell me if I’m wrong):
Error(1 value + 1 comp) = 1/SNR
Error(1 mean + x comps) = sqrt( SD(x estimates)² + (1/SNR)² ) (*)

In this specific case, Error(value) < Error(mean): weighting no longer works.

(*) In a strict statistical sense, the second formula is not valid; it should be:
Error = sqrt( SD(x estimates)²/n + (1/SNR)² )
but apparently it is usual to use the standard deviation rather than the standard deviation / sqrt(n): this maximizes the error, that’s for sure. It would be a kind of precautionary principle (?), but it’s frustrating because it understates the ‘real’ accuracy of the estimate.
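To see the size of the difference between the two conventions, here is a sketch with 4 made-up comparison-star estimates and an assumed SNR of 100 (the function and its `strict` flag are my own naming, not VPHOT's):

```python
import math

def ensemble_error(estimates, snr, strict=False):
    """Combine the scatter across comparison-star estimates with the SNR term.
    Default convention (as described above): sqrt( SD^2 + (1/SNR)^2 ).
    With strict=True, uses SD/sqrt(n) (the standard error of the mean)."""
    n = len(estimates)
    mean = sum(estimates) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in estimates) / (n - 1))
    scatter = sd / math.sqrt(n) if strict else sd
    return math.sqrt(scatter ** 2 + (1.0 / snr) ** 2)

# Illustrative: 4 comparison-star estimates, SNR = 100
est = [12.31, 12.29, 12.33, 12.30]
conservative = ensemble_error(est, snr=100.0)              # full SD
optimistic = ensemble_error(est, snr=100.0, strict=True)   # SD / sqrt(n)
```

The conservative convention always reports the larger error, which is exactly the "precautionary principle" effect described above.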

I find it very instructive to look at ‘real’ accuracy by paying attention to check star measured magnitudes and by using photometric standards as targets in experiments.
