Textual information has remained largely outside the domain of quantitative modeling, having long been considered the domain of judgment. This is now changing as financial firms begin to tackle the problem of what is commonly called information overload; advances in computer technology are again behind the change.
Reuters publishes the equivalent of three bibles of (mostly financial) news daily; it is estimated that five new research documents come out of Wall Street every minute; asset managers at medium-sized firms report receiving up to 1,000 e-mails daily and work with as many as five screens on their desk. Conversely, there is also a lack of “digested” information. It has been estimated that only one third of the roughly 10,000 U.S. public companies are covered by meaningful Wall Street research; there are thousands of companies quoted on the U.S. exchanges with no Wall Street research at all. It is unlikely the situation is better relative to the tens of thousands of firms quoted on other exchanges throughout the world. Yet increasingly companies are providing information, including press releases and financial results, on their Web sites, adding to the more than 3.3 billion pages on the World Wide Web as of mid-2003.
Such unstructured (textual) information is progressively being transformed into self-describing, semistructured information that can be automatically categorized and searched by computers. A number of developments are making this possible. These include:
The development of XML (eXtensible Markup Language) standards for tagging textual data. This is taking us from free text search to queries on semi-structured data.
The development of RDF (Resource Description Framework) standards for appending metadata. This provides a description of the content of documents.
The development of algorithms and software that generate taxonomies and perform automatic categorization and indexation.
The development of database query functions with a high level of expressive power.
The development of high-level text mining functionality that allows “discovery.”
The emergence of standards for the handling of “meaning” is a major development. It implies that unstructured textual information, which some estimates put at 80% of all content stored in computers, will be largely replaced by semistructured information ready for machine handling at a semantic level. Today’s standard structured data- bases store data in a prespecified format so that the position of all elementary information is known. For example, in a trading transaction, the date, the amount exchanged, the names of the stocks traded and so on are all stored in predefined fields. However, textual data such as news or research reports, do not allow such a strict structuring. To enable the computer to handle such information, a descriptive metafile is appended to each unstructured file. The descriptive metafile is a structured file that contains the description of the key information stored in the unstructured data. The result is a semistructured database made up of unstructured data plus descriptive metafiles. Industry-specific and application-specific standards are being developed around the general-purpose XML. At the time of this writing, there are numerous initiatives established with the objective of defining XML standards for applications in finance, from time series to analyst and corporate reports and news. While it is not yet clear which of the competing efforts will emerge as the de facto standards, attempts are now being made to coordinate standardization efforts, eventually adopting the ISO 15022 central data repository as an integration point.
Technology for handling unstructured data has already made its way into the industry. Factiva, a Dow Jones-Reuters company, uses commercially available text mining software to automatically code and categorize more than 400,000 news items daily, in real time (prior to adopting the software, they manually coded and categorized some 50,000 news articles daily). Users can search the Factiva database which covers 118 countries and includes some 8,000 publications, and more than 30,000 company reports with simple intuitive queries expressed in a language close to the natural language. Suppliers such as Multex use text mining technology in their Web-based research portals for clients on the buy and sell sides. Such services typically offer classification, indexation, tagging, filtering, navigation, and search.
These technologies are helping to organize research flows. They allow to automatically aggregate, sort, and simplify information and provide the tools to compare and analyze the information. In serving to pull together material from myriad sources, these technologies will not only form the basis of an internal knowledge management system but allow to better structure the whole investment management process. Ultimately, the goal is to integrate data and text mining in applications such as fundamental research and event analysis, linking news, and financial time series.
-
Financial trading has evolved very much with the development of broadband internet and has today developed to offer most financial products under one roof with the use of one trading terminal also called the Metatrader forex platform. Experience it today for free at a leading forex broker.