Minimum number of points for a good exponential curve fit

We are trying to fit data from a process whose output decays exponentially. We use several techniques for fitting this decay, including FFT analysis and a weighted least-squares algorithm. We need to fit many of these decays every second (in some cases 8000 decays/s) on a variety of computing platforms.
We want to minimize computation time, so the question is how to determine the minimum number of points required for a "good" fit (I know the term is ambiguous). To be clear, we aren't asking how long the sampled window needs to be; we want to determine how many points within a given portion of a curve are required for a good fit. Is there an explicit approach to determining this, or will it require modeling?
statistics
It probably depends on how complicated the function you are trying to fit is and how noisy the data is (e.g. how safe it would be to take a random subsample). What is the model you are fitting? And how many points do you usually get per decay?
– user3658307
Jul 28 '17 at 18:43
asked Jul 28 '17 at 17:20
cirrusio
4 Answers
If the data is clean, you only need two points because there are only two degrees of freedom: the initial rate and the decay time. With noisy data you use more points to get a better estimate, so the number of points needed depends on how noisy the data is. You can linearize your fit by taking the logarithm, getting $\log(\text{counts}(t))=\log(\text{counts at } t=0)-(\text{decay rate})\cdot\text{time}$. A linear least-squares fit will then give you estimates of the errors in the parameters. I would take a few curves and fit each of them with lots of points. You can then take a random sample of the points and see how much the fit changes.
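The subsampling check described in this answer can be sketched in a few lines. This is only an illustration in Python (standard library only), assuming multiplicative noise and made-up values for the amplitude, decay rate and noise level; the helper names `fit_log_linear` and `rate_spread` are hypothetical and not from the thread.

```python
import math
import random

def fit_log_linear(times, counts):
    """Least-squares line through (t, log y); returns (log amplitude, decay rate)."""
    n = len(times)
    logs = [math.log(y) for y in counts]
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_tl = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    slope = s_tl / s_tt
    return l_mean - slope * t_mean, -slope

def rate_spread(times, counts, k, trials, rng):
    """Standard deviation of the fitted decay rate over random k-point subsamples."""
    rates = []
    for _ in range(trials):
        idx = rng.sample(range(len(times)), k)
        rates.append(fit_log_linear([times[i] for i in idx],
                                    [counts[i] for i in idx])[1])
    mean = sum(rates) / trials
    return math.sqrt(sum((r - mean) ** 2 for r in rates) / (trials - 1))

# Synthetic decay with multiplicative noise; every number here is an assumption.
rng = random.Random(0)
true_rate, amplitude, noise_sd = 1.5, 100.0, 0.05
times = [0.002 * i for i in range(1000)]
counts = [amplitude * math.exp(-true_rate * t + rng.gauss(0.0, noise_sd))
          for t in times]

full_rate = fit_log_linear(times, counts)[1]
spreads = {k: rate_spread(times, counts, k, 200, rng) for k in (10, 50, 200)}
print("rate from all 1000 points:", round(full_rate, 4))
for k, spread in spreads.items():
    print(k, "points: spread of fitted rate =", round(spread, 4))
```

Run on a few real curves instead of synthetic ones, the subsample size at which the spread stops shrinking meaningfully, relative to the accuracy you need, is one practical answer to "how many points".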
Thanks @RossMillikan. This is data from a real-world process, so it is definitely noisy. The issue is that we have computational limits, so we are trying to balance goodness of fit against our available computational power. Your answer restates the problem; what we are looking for is an answer to this question: is there a mathematical way of determining how our goodness of fit changes with the resolution of the points being fitted? When do my returns start to diminish with increasing effort? Hopefully that is clearer.
– cirrusio
Aug 1 '17 at 15:48
This is awfully close to what I'm working on right now! :)
The most important aspect is the type of noise. Is it mostly additive (i.e. your signal is $e^{a-bx_i} + N_i$) or is it multiplicative (i.e. $e^{a-bx_i+N_i}$)? I suppose that in practice you probably have a bit of both.
If it's multiplicative, then all your data is positive, and you can easily get a solid fit by taking the logarithm of your data and doing a linear fit. That's very robust and of course very fast to fit!
Additive noise means that later samples will have a much larger relative error than the earlier samples, and so they'll be less useful to the fit.
In the likely scenario where additive noise is very small for the first many samples (that is: would it be totally ridiculous for one of the first few samples to drop all the way to zero?), just take the first batch of samples and fit it by the logarithm-and-linear-regression method. That method is very fast, so I doubt that computation time would be an issue for anything but the most extreme sample rates (> 1 million readings per second). How many samples should you take? That depends on the magnitude of the multiplicative noise and on how quickly the signal decays. As a rough estimate, if the signal has a half-life of around $N$ samples, then fitting $N$ samples should give a decent estimate and $5N$ should give quite a confident one.
Finally, one last note: If the additive noise is quite an issue, and you don't care about the magnitude $a$ of the signal (only the decay rate $b$), then you can do averaging on your signal to get a cleaner readout. That is, average each sample with a few before it. It will affect the magnitude $a$, but it will still be an exponential decay with the same decay rate $b$, and this can remove a lot of additive noise.
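The claim that averaging changes the magnitude but not the decay rate can be checked directly. The sketch below (Python, standard library only) uses a noise-free signal so that the effect of the averaging is isolated; the values of `a`, `b`, the sample spacing `dt` and the window length are illustrative assumptions, and `trailing_mean` is a hypothetical helper name.

```python
import math

def trailing_mean(samples, window):
    """Average each sample with the (window - 1) samples before it."""
    return [sum(samples[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(samples))]

a, b, dt, window = 2.0, 0.3, 0.1, 5   # illustrative values only
signal = [math.exp(a - b * i * dt) for i in range(50)]
smoothed = trailing_mean(signal, window)

# Decay rate recovered from the ratio of two successive samples.
raw_rate = -math.log(signal[1] / signal[0]) / dt
smooth_rate = -math.log(smoothed[1] / smoothed[0]) / dt

# smoothed[0] is aligned with signal[window - 1]; their ratio is the
# constant factor by which the averaging rescales the magnitude.
scale = smoothed[0] / signal[window - 1]

print("decay rate before and after averaging:", raw_rate, smooth_rate)
print("magnitude scale factor:", scale)
```

One caveat that goes beyond the answer: on noisy data, neighbouring averaged samples share inputs and are therefore correlated, so error estimates from a later least-squares fit will tend to look better than they really are.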
Thanks @AlexMeiburg. This one actually gets pretty close. But, I think it emphasizes how the digital and analog world diverge. You are correct that if the half-life is N, then 5N should be sufficient; the problem we are facing is that the N parameter (the resolution) is flexible. This is a measurement and we can actually change N for a given decay constant since we are making measurements as a function of time and we simply have to change the resolution of the measurement. Unfortunately, this resolution approaches 0 but we reach a computational limit.
– cirrusio
Aug 1 '17 at 15:40
Ah, gotcha. Well then, ask yourself about the noise level locally, and remember that taking N points will make noise drop by sqrt(N). For instance, if a typical measurement is ±10%, then you can think of taking 100 measurements in one half-life period as giving you a 1% accuracy reading of that. Subsequently, if you measured ~300 points over 300 half-lives, that seems like it should give you ~1% accuracy on the overall rate.
– Alex Meiburg
Aug 1 '17 at 18:34
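The square-root rule in the comment above can be made explicit for the log-linear fit. As a sketch under assumptions not stated in the thread (namely $n$ equally spaced points spanning a window of length $T$, with independent noise of constant standard deviation $\sigma$ on $\log y$), the standard error of the fitted decay rate $\hat b$ is

$$\operatorname{SE}(\hat b)=\frac{\sigma}{\sqrt{\sum_i (t_i-\bar t)^2}}=\frac{\sigma}{T}\sqrt{\frac{12\,(n-1)}{n\,(n+1)}}\approx\frac{\sigma\sqrt{12}}{T\sqrt{n}}\quad(n\gg 1).$$

So for a fixed window, quadrupling the number of points only halves the error, whereas doubling the window length $T$ halves it with the same number of points. With purely additive noise the constant-$\sigma$ assumption fails for late samples, so this should be read as a rough guide only.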
I'm not sure I understand what you're doing, but perhaps some elementary
relationships will be useful. If none of the answers here are sufficient,
then you might consider posting this question on our sister site.
Let $X_1, X_2, \dots, X_n$ be a random sample from
$\mathsf{Exp}(\text{mean}=\mu) \equiv \mathsf{Exp}(\text{rate}=1/\mu).$
Then an unbiased estimator of $\mu$ is the sample mean $A = \bar X.$
An empirical cumulative distribution function (ECDF) estimates the
CDF of the population distribution.
Because $A/\mu \sim \mathsf{Gamma}(\text{shape} = n, \text{rate} = n),$ we can find
constants $L$ and $U$ that cut off 2.5% of the probability from the
lower and upper tails, respectively, of this distribution so that
$$P(L < A/\mu < U) = P(A/U < \mu < A/L) = .95$$
and a 95% confidence interval for $\mu$ is $(A/U, A/L).$
Below, I simulate a sample of size $n = 10$ from $\mathsf{Exp}(\mu = 3).$
The figure shows the ECDF of the sample, the CDF for
$\mathsf{Exp}(A)$ (dashed red curve) and the CDF for the population
distribution $\mathsf{Exp}(3).$ Some simulations showed better fit
and others showed worse fit. For the simulation shown, $A = 3.48$ and the CI for $\mu$
is $(2.04, 7.25).$

Here is elementary R code (in R, the second argument of rexp is the
rate parameter):
n = 10; mu = 3; x = rexp(n, 1/mu); a = mean(x)
plot(ecdf(x), ylab="CDF", main="Empirical CDF with Estimated (red) and Exact CDFs")
curve(pexp(x, 1/a), lwd=3, lty="dashed", col="red", n=1001, add=T)
curve(pexp(x, 1/mu), col="blue", n=1001, add=T)
a; a/qgamma(c(.975,.025), n, n)
## 3.479 # est of pop mean
## 2.036312 7.254886 # CI for pop mean
For a sample of $n = 100,$ one simulation gave $A = 3.31$ and CI $(2.74, 4.06).$
[I had to do several simulations to get one that resulted in visible separation among
the ECDF and the two CDFs.]

I'm wondering whether such elementary methods of approximating the actual
CDF by the ECDF or the estimated CDF would be of any use in your project.
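How quickly the interval narrows with $n$ can also be checked by simulation, without a gamma quantile function. The following is a rough Monte Carlo sketch in Python (standard library only), not part of the original answer; the helper name `interval_factors`, the seed and the number of repetitions are arbitrary assumptions.

```python
import random

def interval_factors(n, reps, rng):
    """Simulated 2.5% and 97.5% quantiles (L, U) of A/mu for samples of size n."""
    # The distribution of A/mu does not depend on mu, so mu = 1 is used here.
    ratios = sorted(
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(reps)
    )
    return ratios[int(0.025 * reps)], ratios[int(0.975 * reps)]

rng = random.Random(1)
factors = {n: interval_factors(n, 5000, rng) for n in (10, 100)}

# For an observed mean A the interval is (A / U, A / L),
# so its width relative to A is 1/L - 1/U.
widths = {n: 1.0 / low - 1.0 / high for n, (low, high) in factors.items()}
for n in factors:
    print(n, [round(f, 2) for f in factors[n]], round(widths[n], 2))
```

Being simulated, the factors only approximate the exact gamma quantiles used in the R code above, but they show the same narrowing of the interval as $n$ grows.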
As stated in the original post, the function is a single exponential.
Also, as stated in the original post, we are trying to DETERMINE the number of points needed. As a rough guideline, we sample 100s to 1000s of points for each curve currently. But maybe we only need 10 or 50 or 100.
This is not an answer. Are you the original poster? If so, the FAQ will show you how to merge the accounts, or you can edit the original post using the edit button below.
– Ross Millikan
Jul 28 '17 at 19:02
@RossMillikan - that was not the original poster but a colleague. This should have been a comment for clarification not an answer.
– cirrusio
Aug 1 '17 at 15:41
answered Jul 28 '17 at 19:12
Ross Millikan
answered Jul 28 '17 at 19:19
Alex Meiburg
1,795516
1,795516
Thanks @AlexMeiburg. This one actually gets pretty close. But, I think it emphasizes how the digital and analog world diverge. You are correct that if the half-life is N, then 5N should be sufficient; the problem we are facing is that the N parameter (the resolution) is flexible. This is a measurement and we can actually change N for a given decay constant since we are making measurements as a function of time and we simply have to change the resolution of the measurement. Unfortunately, this resolution approaches 0 but we reach a computational limit.
â cirrusio
Aug 1 '17 at 15:40
Ah, gotcha. Well then, ask yourself about the noise level locally, and remember that taking N points will make noise drop by sqrt(N). For instance, if a typical measurement is ±10%, then you can think of taking 100 measurements in one half-life period as giving you a 1% accuracy reading of that. Subsequently, if you measured ~300 points over 300 half-lives, that seems like it should give you ~1% accuracy on the overall rate.
â Alex Meiburg
Aug 1 '17 at 18:34
add a comment |Â
Thanks @AlexMeiburg. This one actually gets pretty close. But, I think it emphasizes how the digital and analog world diverge. You are correct that if the half-life is N, then 5N should be sufficient; the problem we are facing is that the N parameter (the resolution) is flexible. This is a measurement and we can actually change N for a given decay constant since we are making measurements as a function of time and we simply have to change the resolution of the measurement. Unfortunately, this resolution approaches 0 but we reach a computational limit.
â cirrusio
Aug 1 '17 at 15:40
Ah, gotcha. Well then, ask yourself about the noise level locally, and remember that taking N points will make noise drop by sqrt(N). For instance, if a typical measurement is ±10%, then you can think of taking 100 measurements in one half-life period as giving you a 1% accuracy reading of that. Subsequently, if you measured ~300 points over 300 half-lives, that seems like it should give you ~1% accuracy on the overall rate.
â Alex Meiburg
Aug 1 '17 at 18:34
Thanks @AlexMeiburg. This one actually gets pretty close. But, I think it emphasizes how the digital and analog world diverge. You are correct that if the half-life is N, then 5N should be sufficient; the problem we are facing is that the N parameter (the resolution) is flexible. This is a measurement and we can actually change N for a given decay constant since we are making measurements as a function of time and we simply have to change the resolution of the measurement. Unfortunately, this resolution approaches 0 but we reach a computational limit.
â cirrusio
Aug 1 '17 at 15:40
Thanks @AlexMeiburg. This one actually gets pretty close. But, I think it emphasizes how the digital and analog world diverge. You are correct that if the half-life is N, then 5N should be sufficient; the problem we are facing is that the N parameter (the resolution) is flexible. This is a measurement and we can actually change N for a given decay constant since we are making measurements as a function of time and we simply have to change the resolution of the measurement. Unfortunately, this resolution approaches 0 but we reach a computational limit.
â cirrusio
Aug 1 '17 at 15:40
Ah, gotcha. Well then, ask yourself about the noise level locally, and remember that taking N points will make noise drop by sqrt(N). For instance, if a typical measurement is ±10%, then you can think of taking 100 measurements in one half-life period as giving you a 1% accuracy reading of that. Subsequently, if you measured ~300 points over 300 half-lives, that seems like it should give you ~1% accuracy on the overall rate.
â Alex Meiburg
Aug 1 '17 at 18:34
Ah, gotcha. Well then, ask yourself about the noise level locally, and remember that taking N points will make noise drop by sqrt(N). For instance, if a typical measurement is ±10%, then you can think of taking 100 measurements in one half-life period as giving you a 1% accuracy reading of that. Subsequently, if you measured ~300 points over 300 half-lives, that seems like it should give you ~1% accuracy on the overall rate.
â Alex Meiburg
Aug 1 '17 at 18:34
add a comment |Â
up vote
0
down vote
I'm not sure I understand what you're doing, but perhaps some elementary
relationships will be useful. If none of the answers here are sufficient,
then you might consider posting this question on our sister site.
Let $X_1, X_2, \dots, X_n$ be a random sample from
$\mathsf{Exp}(\mathrm{mean}=\mu) \equiv \mathsf{Exp}(\mathrm{rate}=1/\mu).$
Then an unbiased estimator of $\mu$ is the sample mean $A = \bar X.$
An empirical cumulative distribution function (ECDF) estimates the
CDF of the population distribution.
Because $A/\mu \sim \mathsf{Gamma}(\mathrm{shape} = n, \mathrm{rate} = n),$ we can find
constants $L$ and $U$ that cut off 2.5% of the probability from the
lower and upper tails, respectively, of this distribution so that
$$P(L < A/\mu < U) = P(A/U < \mu < A/L) = .95$$
and a 95% confidence interval for $\mu$ is $(A/U, A/L).$
Below, I simulate a sample of size $n = 10$ from $\mathsf{Exp}(\mu = 3).$
The figure shows the ECDF of the sample, the CDF for
$\mathsf{Exp}(\mathrm{mean} = A)$ (dashed red curve) and the CDF for the population
distribution $\mathsf{Exp}(\mathrm{mean} = 3)$ (blue curve). Some simulations showed better fit
and others showed worse fit. For the simulation shown, $A = 3.48$ and the CI for $\mu$
is $(2.04, 7.25).$

Here is elementary R code (in R, the second argument of rexp is the
rate parameter):
    n <- 10; mu <- 3
    x <- rexp(n, rate = 1/mu)   # simulated sample; rexp takes the rate, not the mean
    a <- mean(x)                # estimate of the population mean
    plot(ecdf(x), ylab="CDF", main="Empirical CDF with Estimated (red) and Exact CDFs")
    curve(pexp(x, 1/a), lwd=3, lty="dashed", col="red", n=1001, add=TRUE)  # estimated CDF
    curve(pexp(x, 1/mu), col="blue", n=1001, add=TRUE)                     # exact CDF
    a; a/qgamma(c(.975,.025), n, n)   # point estimate and 95% CI (A/U, A/L)
    ## 3.479 # est of pop mean
    ## 2.036312 7.254886 # CI for pop mean
For a sample of $n = 100,$ one simulation gave $A = 3.31$ and CI $(2.74, 4.06).$
[I had to do several simulations to get one that resulted in visible separation among
the ECDF and the two CDFs.]

I'm wondering whether such elementary methods of approximating the actual
CDF by the ECDF or the estimated CDF would be of any use in your project.
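Under the same i.i.d. exponential assumption, the gamma interval above can be turned into a rough sample-size rule: find the smallest $n$ for which the 95% interval stays within a chosen tolerance of the sample mean. This is only a sketch; the helper names `ci_multipliers` and `min_n` and the 10% tolerance are hypothetical choices for illustration.

```r
# Multipliers (lower, upper) such that the CI for mu is a * c(lower, upper),
# using A/mu ~ Gamma(shape = n, rate = n) as in the derivation above.
ci_multipliers <- function(n, level = 0.95) {
  alpha <- 1 - level
  1 / qgamma(c(1 - alpha / 2, alpha / 2), shape = n, rate = n)
}

# Smallest n whose interval stays within +/- tol of the sample mean.
# tol = 0.10 is an arbitrary illustrative target, not a recommendation.
min_n <- function(tol = 0.10, level = 0.95, n_max = 1e5) {
  for (n in 1:n_max) {
    m <- ci_multipliers(n, level)
    if (m[1] >= 1 - tol && m[2] <= 1 + tol) return(n)
  }
  NA_integer_   # no n up to n_max met the tolerance
}

ci_multipliers(10)   # factors behind the n = 10 interval above
min_n(0.10)          # points needed for a +/- 10% interval at 95% confidence
```

Because the gamma distribution is right-skewed, the upper limit is the binding constraint, so the required $n$ is somewhat larger than the symmetric normal approximation would suggest.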
edited Jul 29 '17 at 5:03
answered Jul 29 '17 at 4:56
BruceET
up vote
-1
down vote
As stated in the original post, the function is a single exponential.
Also, as stated in the original post, we are trying to DETERMINE the number of points needed. As a rough guideline, we sample 100s to 1000s of points for each curve currently. But maybe we only need 10 or 50 or 100.
answered Jul 28 '17 at 18:59
TDG
This is not an answer. Are you the original poster? If so, the FAQ will show you how to merge the accounts, or you can edit the original post using the edit button below.
– Ross Millikan
Jul 28 '17 at 19:02
@RossMillikan - that was not the original poster but a colleague. This should have been a comment for clarification, not an answer.
– cirrusio
Aug 1 '17 at 15:41