Is there a statistically significant increase in phrase occurrences?

kmrstats · Jan 29, 2008

Hi -

First timer here. Excuse me if this question is not up to the level I see posted on this forum, but here goes.

I have been asked to provide a daily signal based on the number of occurrences of a set of specified phrases in a news data feed. The first thing I did was compute a moving average of the daily count of each phrase in the feed and generate a signal whenever the current count was above the moving average by a specified percentage. With this approach I didn't think the signal provided much value because the phrase counts are very bursty: the count can be in the low teens for a number of days in a row, jump to 100 for a couple of days, and then settle back into the low teens.
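For concreteness, the moving-average rule described above might look roughly like the sketch below. The function name `moving_average_signal`, the 7-day window, the 50% threshold, and the choice to exclude the current day from the average are all assumptions made for illustration; the actual values used are not stated here.

```python
def moving_average_signal(counts, window=7, pct=0.5):
    """Flag day t when its count exceeds the trailing moving average by pct.

    counts : list of daily phrase counts, oldest first
    window : number of previous days in the moving average (assumed value)
    pct    : fractional excess over the average that triggers a signal (assumed value)
    """
    flags = []
    for t, n in enumerate(counts):
        previous = counts[max(0, t - window):t]  # the days before day t
        if len(previous) < window:
            # Not enough history yet to form a full window.
            flags.append(False)
            continue
        average = sum(previous) / window
        flags.append(n > average * (1 + pct))
    return flags


# Made-up counts: low teens, then a two-day burst, then back to normal.
example = [12, 13, 11, 14, 12, 13, 12, 100, 95, 12]
print(moving_average_signal(example))
```

On made-up data like this, both burst days are flagged, but any percentage threshold low enough to catch a real burst can also fire on ordinary day-to-day noise when the baseline is small.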

What type of statistical test should I use to identify a statistically significant event in the scenario described above?

Thanks in advance

EnumaElish · Jan 30, 2008

One way is to:
1. calculate the historical average up to day t: HA(t) = [itex]\frac{1}{t}\sum_{s=1}^{t} n_s[/itex], where n_s is the number of occurrences on day s
2. calculate the historical standard deviation HSD(t) similarly, over the same days 1 through t
3. test whether n_t > HA(t) + 2 HSD(t).
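The three steps above can be sketched in Python as follows. This is only a minimal illustration: the function name `burst_flags` is made up, and using the population standard deviation (`statistics.pstdev`) for HSD(t) is an assumption, since the post only says to compute it "similarly". The multiplier k = 2 is the value from step 3.

```python
from statistics import mean, pstdev


def burst_flags(counts, k=2.0):
    """Return one boolean per day: True where n_t > HA(t) + k * HSD(t).

    counts : list of daily phrase counts n_1, ..., n_T, oldest first
    k      : number of standard deviations above the historical average
    """
    flags = []
    for t in range(1, len(counts) + 1):
        history = counts[:t]   # days 1..t, including day t itself
        ha = mean(history)     # HA(t)
        hsd = pstdev(history)  # HSD(t), population form (assumption)
        flags.append(counts[t - 1] > ha + k * hsd)
    return flags


# Made-up counts: low teens, then a two-day burst, then back to normal.
example = [12, 13, 11, 14, 12, 13, 100, 95, 12]
print(burst_flags(example))
```

With these made-up counts, only the first burst day (100) is flagged. The second burst day (95) is not, because the first spike has already inflated both HA(t) and HSD(t). If consecutive burst days should all be flagged, one option is to compute the average and standard deviation from days 1 to t-1 only, or from a rolling window that excludes recent flagged days.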
