Why Clean Data is a
Mandatory Pre-Analysis Step
Futures Magazine's September 1999 issue featured a cover
story about the accuracy of the top end-of-day data providers for investors. We are
pleased to report that CSI was the undeniable champion in terms of data accuracy and in
other ways that might surprise you. Please see the comparative rankings
of US data firms shown
below, which were compiled from information supplied in the Sheldon Knight study.
Although we are admittedly biased, we found the results of
Mr. Knight's analysis very interesting, even compelling: The largest market data
firms in the US just didn't stack up next to CSI's stellar performance. Not only did
CSI dramatically outdistance all of the competition with the fewest errors
overall, but we did so with zero omissions. According to Sheldon Knight, "The data
management functions of [CSI's] Unfair Advantage are by far the most flexible tested,
and the database is one of the most comprehensive." Great data and great software;
what more can we say? We would like to disclose a little more about the differences that
make CSI the best data source in the industry. It's all in the details.
On Data Accuracy
In the study, a total of 1,203 errors and
omissions were noted among the ten firms tested. CSI accounted for
27 of them over the 1,506-day test. Dividing the remaining 1,176 errors among the
nine competing firms gives an average of about 131 errors each over the same time
period, which means the average CSI competitor made roughly 385% more errors
than CSI.
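The arithmetic behind these figures can be checked with a short sketch. The constants are the numbers quoted from the study; the function names are our own, chosen only for illustration.

```python
# Figures quoted from the study as reported above.
TOTAL_ERRORS = 1203   # errors and omissions across all ten firms
CSI_ERRORS = 27       # CSI's errors and omissions
COMPETITORS = 9       # remaining firms


def average_competitor_errors(total, csi, competitors):
    """Average errors per competing firm after removing CSI's share."""
    return (total - csi) / competitors


def percent_higher(other, baseline):
    """How much larger `other` is than `baseline`, as a percentage."""
    return (other - baseline) / baseline * 100


avg = average_competitor_errors(TOTAL_ERRORS, CSI_ERRORS, COMPETITORS)
print(f"Average competitor errors: {avg:.1f}")
print(f"Percent higher, using the rounded average: "
      f"{percent_higher(round(avg), CSI_ERRORS):.0f}%")
print(f"Percent higher, using the unrounded average: "
      f"{percent_higher(avg, CSI_ERRORS):.0f}%")
```

The 385% figure follows from the rounded average of 131 errors per competitor; carrying the unrounded average (about 130.7) through the calculation gives about 384%.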
CSI's 18 errors in the soybean futures test were the
fewest of all vendors. The average error size was less than half that of the second-place
firm, and an insignificant fraction of that of most of the other firms. In the S&P 500
analysis, CSI's error count of 9 tied for the lowest with one other vendor. Data
sources were varied and sometimes overlapping, but CSI's record of minimal errors
probably has much more to do with procedure, pride, commitment, diligence, and customer
participation than source. It is very rare for an error to get past the many data
scrubbers on the CSI staff.
On Data Presentation
This topic was briefly noted in the Sheldon Knight study, but it
deserves additional comment. Data presentation refers to the handling of after-the-close
settlements that can result in exchanges quoting settlement prices that are outside the
day's trading range (above the high or below the low). It is common (but not necessarily
correct) for summary day-end data vending firms to expand the high-low range to
accommodate the assigned settlement price, even though settlement prices do not
necessarily represent prices where actual trading took place. CSI delivers actual trading
statistics to customers and gives the option of presenting data 1) in actual form, based
on exchange statistics, 2) with highs and lows expanded to include the settlement, or 3)
with the settlement price modified so that it lies within the actual highs and lows.
According to the article, only CSI has recorded the historical statistics on all markets
so that they can be presented in any one of these ways. It is clear that CSI's
competitors have forever lost the ability to present an unaltered historical record.
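The three presentation options can be illustrated with a minimal sketch. The `Bar` type, the `present` function, and the mode names are hypothetical and are not taken from Unfair Advantage; they only show why the actual statistics must be stored in order to offer all three views.

```python
from typing import NamedTuple


class Bar(NamedTuple):
    """One day's summary as reported by the exchange (hypothetical layout)."""
    high: float
    low: float
    settle: float


def present(bar: Bar, mode: str) -> Bar:
    """Return the bar in one of the three presentation styles described above."""
    if mode == "actual":
        # 1) Exchange statistics, untouched.
        return bar
    if mode == "expand_range":
        # 2) Widen the high-low range so that it contains the settlement.
        return Bar(max(bar.high, bar.settle), min(bar.low, bar.settle), bar.settle)
    if mode == "clamp_settle":
        # 3) Pull the settlement back inside the actual high-low range.
        return Bar(bar.high, bar.low, min(max(bar.settle, bar.low), bar.high))
    raise ValueError(f"unknown mode: {mode}")


# A day that settled above its traded high.
day = Bar(high=100.0, low=95.0, settle=101.0)
for mode in ("actual", "expand_range", "clamp_settle"):
    print(mode, present(day, mode))
```

Options 2 and 3 can both be derived from option 1, but not the other way around: once only an expanded range or a clamped settlement has been stored, the actual figures cannot be recovered.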
On Analytical Validity
The Futures study clearly demonstrates that technical
analysis requires accurate data. In the study, S&P 500 data from CSI, TradeStation
Technologies, Inc. (formerly Omega Research, Inc.),
and Bridge were run through the same simple breakout system with strikingly different results.
The profit scenario varied from 20% to much more than 100% over the full period of study.
This should offer substantial proof that a flawed database can lead
to a useless result and a wasted effort: parameter settings determined from flawed
data cannot be expected to work with the same efficiency in the market to which they will
be applied. Unfair Advantage's software and database are designed so that every user is
equipped with exactly the same data set at all times, so any common analytical tool
that is derived from past information produces equivalent results on different machines.
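To see how a single bad value can change what a breakout system does, consider the following sketch. It is not the system used in the Futures study, whose rules are not given here; the rule, the lookback length, and the prices are all invented for illustration.

```python
def breakout_days(highs, closes, lookback):
    """Indices of days whose close exceeds the highest high of the prior `lookback` days."""
    signals = []
    for i in range(lookback, len(closes)):
        if closes[i] > max(highs[i - lookback:i]):
            signals.append(i)
    return signals


# Made-up prices, identical except for one erroneous high on day 3.
closes       = [ 99, 100, 101, 100, 101, 103, 103]
clean_highs  = [100, 101, 102, 101, 102, 103, 104]
flawed_highs = [100, 101, 102, 110, 102, 103, 104]  # 110 is the bad value

print("clean data :", breakout_days(clean_highs, closes, lookback=3))
print("flawed data:", breakout_days(flawed_highs, closes, lookback=3))
```

With the clean series the rule signals a breakout on day 5. With the flawed series the erroneous high stays in the lookback window long enough to suppress that signal entirely, so the same rule applied to the same market produces a different trade history.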
Building a trading model based upon flawed past data is
certain to degrade system effectiveness into the future. This truth, learned decades ago
by CSI's founder Bob Pelletier, is the driving force behind CSI's policies.
Before CSI was incorporated in 1970, studies done by Pelletier, a General Electric
mathematician at that time, were inevitably tripped up by some obscure error that
dominated parameter settings and falsely influenced the outcome of simulation exercises by
forcing undeserved profits from the flawed data. It may seem that a small error here or
there would not be important, but that was not what the work showed. Experiences like
this made it abundantly clear that errors must be eliminated if any benefit was to
be derived from hindsight testing.
Several of the data vendors included in the study are either
allied with or directly tied to very expensive analysis programs, but those programs do not
necessarily require them as data sources. Although CSI is explicitly excluded from the data
download screens and menus of most of those programs, discriminating users of the
industry's most powerful software tools still come to CSI for data. They know that it is
pure folly to accept the suggestion that an average data firm can deliver the accuracy
needed to create an exceptional trading system. Now that the importance of data accuracy
has been revealed, perhaps even more traders will come directly to CSI, whether or not
their software producer steers them in that direction. Software products with which CSI
data products are compatible include SuperCharts,
TradeStation, and many others.
Putting it All Together
It should be mentioned that the errors measured in the
Sheldon Knight study were discovered in hindsight, based upon each company's one-time
historical submission of their global data reserves. An even more telling result might
have emerged had the study been conducted on an ongoing basis by observing each
contributor's performance one day at a time over an extended period. With an ongoing study,
the reader could have a better understanding of each company's performance when it means
the most: immediately after each day's database update is posted. This way, a firm's
timing of delivery on all stock and world futures markets, diligence in avoiding
omissions, and ability to stay on top of information gathering in spite of unpredictable
obstacles could be studied.
Many factors contributed to CSI's impressive performance
reported in the Futures article, and most of them might be dismissed as insignificant
details. Back-up electrical power, multiple information sources, a large experienced staff
competent in applying checks and balances, and rewards for diligent customers reporting
questionable data items are a few of the details CSI attends to each day. They seem to
make the difference.