# Discovering Patterns in Real-Valued Time Series

Joe Catalano, Tom Armstrong, and Tim Oates
University of Maryland Baltimore County, Baltimore, MD 21250 USA
{jcat1, arm1, oates}@umbc.edu

**Abstract.** This paper describes an algorithm for discovering variable-length patterns in real-valued time series. In contrast to most existing pattern discovery algorithms, ours does not first discretize the data, runs in linear time, and requires constant memory. These properties are obtained by sampling the data stream rather than processing all of the data. Empirical results show that the algorithm performs well on both synthetic and real data when compared to an exhaustive algorithm.

## 1 Introduction

Many of the data generated and stored by science, government, and industry are multi-variate, real-valued, and streaming. These time series data come from such diverse sources as financial markets, climatological satellites, and medical devices. The potential uses of time series data are as varied as their sources. In some cases, the goal is to make accurate predictions, e.g., finding patterns in the fluctuations of the price of one stock that predict the price of another stock 3 days hence. In other cases, the goal is to gain a deeper understanding of the underlying system generating the data. This paper is concerned with the latter task, and presents a novel algorithm for finding recurring patterns (sometimes called motifs [1]) in time series.

Most algorithms for discovering patterns in time series have one or more of the following characteristics. The most common is an inability to work with real-valued data except through prior discretization [2,3]. Even in those cases where real-valued data are acceptable, multi-variate data typically are not [4]. The algorithms also tend to be batch [5], rather than incremental, which poses problems when the datasets are large or come from a high-volume stream.
Finally, there are often assumptions about the number or temporal extent of the patterns that exist in the data [6]. In contrast, we present an incremental algorithm with linear time and constant memory requirements for finding recurring patterns in real-valued, multi-variate streaming data, wherein the number and temporal extent of the patterns are not known in advance.

The algorithm obtains these desirable properties by sampling. Given a data stream, a user-specified number of large candidate windows of data are sampled from the stream, sub-windowed, and stored. As time progresses, comparison windows of the same size are periodically sampled, sub-windowed, compared to all of the sub-windows from each of the candidate windows, and finally discarded.

J. Fürnkranz, T. Scheffer, and M. Spiliopoulou (Eds.): PKDD 2006, LNAI 4213, pp. 462–469, 2006. © Springer-Verlag Berlin Heidelberg 2006

The similarity scores according to DTW of the k most similar comparison sub-windows are kept for each candidate sub-window. If a candidate window contains an instance of a pattern, then some of its sub-windows will contain parts of that pattern, and the sampling process will yield comparison windows and sub-windows with the same properties. To distinguish patterns from noise, the mean similarity of the best matches for each candidate sub-window is compared to a distribution of mean similarities constructed so as to enforce the null hypothesis that the matching sub-windows do not contain instances of the same pattern. When the null hypothesis is rejected, the candidate sub-window probably contains part of a pattern instance, and adjacent sub-windows with this property are "stitched" together to obtain the entire pattern. Empirical results with a variety of datasets are reported.
First, synthetic datasets in which known patterns are added to a background of noise are used to explore the ability of the algorithm to discover patterns of varying lengths and frequencies of occurrence in the face of changes to user-defined parameters. Second, the algorithm is run on real datasets for which the "true" patterns are not known. The discovered patterns are compared to those found by an exceptionally expensive algorithm that performs an exhaustive search for patterns of all lengths in the data. Results show that our algorithm finds many of the same patterns found by the exhaustive algorithm, but with limited computation and memory.

The remainder of this paper is organized as follows. Section 2 describes approaches to discovering patterns in data through discretization. Section 3 presents our sampling algorithm and complexity analysis. Section 4 contains empirical results of our algorithm on real-valued data. Finally, Section 5 summarizes our contribution and points to future research directions.

## 2 Related Work

Extensive work has been done in the area of time series data mining, but little of it has focused on mining recurring patterns in real-valued time series. Some have applied clustering techniques to time series to mine interesting features of the data. Dynamic Time Warping (DTW) is used in Oates et al. [7] to cluster the experiences of a mobile robot, using robotic sensor data as the source of the time series, and in [8] to cluster multi-variate real-valued time series produced by selecting one of k HMMs. While not focused on pattern discovery, these works establish that DTW can reliably be used as a similarity measure for real-valued multi-variate data. Colomer et al. [2] use DTW to classify patterns belonging to the same class of operating situations in a level control system. Unlike our approach, they apply DTW to episodes rather than to the original time series.
Keogh and Pazzani [3] propose a modification to DTW that operates on a piecewise linear representation of the data. Again, this differs from our approach as it does not operate on the raw observations.

Lin et al. [4] define a motif as a previously unknown, frequently occurring pattern, and introduce a discovery algorithm that uses a modified Euclidean distance function to improve the complexity of the search. To achieve this performance they must reduce the dimensionality of the time series by means of a piecewise aggregate approximation and then further transform the time series into a discrete representation. Chiu et al. [1] extend Lin's work, addressing the scalability of the motif discovery algorithm and its ability to discover motifs when noise is present in the time series. Lin et al. [9] extend the symbolic representation introduced in [4] to better work with algorithms for streaming time series. Finally, Oates [5] investigates a variation of the pattern discovery problem, wherein the goal is to determine what makes a set of sequences different from other sequences obtained from the same source. The approach operates on multi-variate real-valued time series and uses DTW as the similarity measure, but does not use a sampling approach.

## 3 Algorithm

Our sampling algorithm (Figure 1) accomplishes two main tasks: the first is to discover windows from a time series that contain pattern instances, and the second is to remove noise prefixes and suffixes in windows containing patterns. It accomplishes these goals by repeatedly sampling fixed windows of the time series, looking for pairs of windows that have a small distance under DTW. We use the distance between the windows as a measure of similarity, with large distances indicating dissimilarity. We further constrain the algorithm by requiring it to discover patterns without a priori knowledge of their existence.
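Since DTW drives all window comparisons in what follows, a minimal sketch may help fix ideas. This is the plain quadratic dynamic-programming formulation; the paper does not specify step constraints or warping bands, so none are imposed here:

```python
import math

def dtw_distance(a, b):
    """Cumulative DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats
                                 cost[i][j - 1],      # b[j-1] repeats
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]
```

Note that DTW tolerates local stretching: a sequence warped against a copy of itself with a repeated observation still has distance zero, which is what makes it robust to temporal variation between pattern instances.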
We must then define some threshold for rejecting a match when the DTW distance between the windows is too large. Since the goal is to distinguish patterns from noise, a distribution of DTW distances between noisy windows is computed and used as a reference for thresholding the quality of a match. The algorithm performs these tasks using bounded time and memory by fixing the number and size of the windows that are compared. For an incremental version, we sample windows on demand.

### 3.1 Noise Distribution

The sampling algorithm has no a priori knowledge of the existence of patterns in the time series. We define the null hypothesis to be that two randomly sampled windows do not contain instances of the same pattern. A distribution of window similarities must be computed as a basis for rejecting the null hypothesis when two windows contain a pattern instance. This is problematic because the core assumption is that, given a large enough window of the time series, it will contain some or all of a pattern instance with high probability if the patterns occur frequently. We therefore create windows containing non-contiguous observations from the time series, which ensures that these windows contain a pattern instance with low probability. That is, we create a noise window of length w by randomly sampling and concatenating w individual observations. Warping these noise windows against normal windows gives us a null hypothesis distribution.

### 3.2 Pattern Discovery

In the pattern discovery phase of the algorithm, we repeatedly sample candidate windows from the time series and store the k best-matching comparison windows for each. Our goal is to identify frequently occurring patterns, so we rely on sampling a sufficient number of windows to increase the probability of capturing multiple instances of a pattern.
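The noise-window construction of Section 3.1 can be sketched as follows. The helper names (`noise_window`, `null_distribution`) and the injected `dtw` callable are illustrative assumptions, not the paper's implementation:

```python
import random

def noise_window(series, w, rng):
    # Concatenate w independently sampled observations; such a window
    # contains a contiguous pattern instance with very low probability.
    return [rng.choice(series) for _ in range(w)]

def contiguous_window(series, w, rng):
    # An ordinary window: w consecutive observations from the series.
    start = rng.randrange(len(series) - w + 1)
    return series[start:start + w]

def null_distribution(series, w, n_pairs, dtw, seed=0):
    # Warp noise windows against normal windows to build the distribution
    # of distances expected under the null hypothesis.
    rng = random.Random(seed)
    return [dtw(noise_window(series, w, rng),
                contiguous_window(series, w, rng))
            for _ in range(n_pairs)]
```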
For the algorithm to be successful, the window length we choose must be large enough to fully contain any pattern we expect to find, but this also means the windows will contain noise as well. Noise in the windows increases the distance between them, making it difficult to identify legitimate patterns. This is in addition to the problem of extracting only the pattern from the window. To address both of these problems we consider sub-windows. The relatively small size of a sub-window makes it useful for addressing this issue of granularity. Large windows contain noise, and a large ratio of noise to pattern will yield poor results under any distance measure, because a large fraction of the window will have no strong matching region in the other window.

The patterns are discovered as follows. Create two sets of sub-windows: the candidate set, denoted candSW, and the comparison set, denoted compSW. It is the candidate set from which we reject sub-windows that come from the noise distribution and identify patterns among the remaining sub-windows. To populate these sets, sample a window W of length w and generate all sub-windows W_i of length w̄. This yields w − w̄ + 1 sub-windows, which are added to either candSW or compSW; this process is repeated up to a specified bound. Normalize each sub-window to have mean 0 and standard deviation 1. When both sets are populated, apply DTW to all pairs of sub-windows between the two sets. Group the sub-windows in compSW by the window from which they were created, and add only the best sub-window in compSW from each group to the list of matches for the W_i to which it is being compared. After warping all pairs, reduce the list of matches for each W_i in candSW to the k with the smallest distances.

In the final step of the algorithm, bad matches are removed from the candidate set. Recall the noise distribution that was created by warping normal sub-windows against sub-windows containing pure noise.
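The sub-window generation and z-normalization steps above can be sketched as follows (the function names are illustrative; `w_bar` stands for the sub-window length w̄):

```python
import statistics

def znormalize(xs):
    # Normalize a sub-window to mean 0 and standard deviation 1.
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:  # constant sub-window: return an all-zero vector
        return [0.0] * len(xs)
    return [(x - mu) / sd for x in xs]

def subwindows(window, w_bar):
    # A window of length w yields w - w_bar + 1 sub-windows of length w_bar.
    return [znormalize(window[i:i + w_bar])
            for i in range(len(window) - w_bar + 1)]
```

Normalizing each sub-window separately makes the DTW comparison sensitive to shape rather than to offset or scale.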
As with candidate sub-windows, we keep only the k-best matches for each noise sub-window. The noise set is sorted by increasing average distance and a rejection threshold, γ, is established: given some α, γ is the (α · n)-th average warp distance, where n is the number of noise sub-windows. We reject those candidate sub-windows that have an average warp distance greater than γ, on the basis that the observed value is common in the noise distribution and therefore is not likely to come from a pattern instance. After removing the sub-windows containing bad matches, we repeat the entire process using the now smaller candidate set, the same noise set, and a new comparison set. The process of eliminating candidate sub-windows is inherently error-prone because the comparison windows are randomly chosen, and therefore we may not get enough windows containing patterns to accurately accept or reject the candidate sub-windows. Running multiple iterations of the algorithm reduces the amount of error introduced into the process.

```
sampling(timeSeries, w, w̄, iterations, alpha)
    amtToSample ← amountToSample(timeSeries, w, 50)
    compSW  ← createSWSet(timeSeries, w, w̄, amtToSample)
    candSW  ← createSWSet(timeSeries, w, w̄, amtToSample)
    noiseSW ← createNonContiguousSWSet(timeSeries, w)
    combSW  ← {∅}
    addAll(noiseSW, combSW)
    addAll(candSW, combSW)
    for i ← 1 to iterations do
        if i > 1 then
            compSW ← createSWSet(timeSeries, w, w̄, amtToSample)
            compareAllSubWindowPairs(candSW, compSW)
        else
            compareAllSubWindowPairs(combSW, compSW)
        removeRejects(alpha, candSW, noiseSW)
```

**Fig. 1.** Batch-mode pattern discovery

When all iterations have completed, we stitch together the remaining candidate sub-windows to form patterns. A pattern is formed by combining all overlapping and adjacent sub-windows; the resulting window is then considered to be a complete pattern instance.

The number of sub-windows and the number of DTW comparisons dominate the space and time complexity, respectively.
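The rejection step (removeRejects in Figure 1) can be sketched as follows. Representing candidates and noise sub-windows by their average k-best warp distances is an illustrative assumption made to keep the sketch self-contained:

```python
def remove_rejects(alpha, cand_avg_dists, noise_avg_dists):
    # Sort the noise set by increasing average warp distance; the threshold
    # gamma is the (alpha * n)-th value, n being the number of noise
    # sub-windows. Candidates whose average k-best distance exceeds gamma
    # are rejected as indistinguishable from noise.
    noise_sorted = sorted(noise_avg_dists)
    n = len(noise_sorted)
    gamma = noise_sorted[min(int(alpha * n), n - 1)]
    # Return the indices of the surviving candidate sub-windows.
    return [i for i, d in enumerate(cand_avg_dists) if d <= gamma]
```

Smaller values of α make the threshold stricter, trading recall for precision in the surviving candidate set.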
DTW has quadratic time and space complexity, but the algorithm only applies it to pairs of sub-windows of fixed length w̄, resulting in constant time and space cost per comparison. The algorithm samples n candidate windows of length w and m comparison windows of length w, requiring O(n) and O(m) time and space. Each window of length w has w − w̄ + 1 sub-windows of length w̄. Therefore, there are a total of n · m · (w − w̄ + 1)² sub-window comparisons to perform and store, or O(nm) time and space. In the incremental version of the algorithm, m = 1, so the complexity is linear in the number of candidate windows.

## 4 Experiments

We have evaluated our algorithm by studying performance on a synthesized time series with a recurring, embedded noisy pattern. We then explore the uni-variate time series of the Standard & Poor's 500.

We generated a uni-variate time series of 10,000 observations sampled uniformly in the range [0, 10]. The first 18 observations were used as a template for the pattern to be embedded in the data. The template is duplicated 49 times, and noise chosen uniformly in the range [−1.0, 1.0] is added to each observation in every copy of the template. We then insert the copies into the time series starting at position 100 and incrementing by 100 until all 49 have been embedded. We ran the sampling algorithm on the synthetic data, varying the values for w̄ and α, and we fixed w as 10 · w̄. To w̄ we assigned values from the set {5, 10, 15, 20} and to α values from the set {0.05, 0.1