Data Preparation for Data Mining,
Edition 1Editors: By Dorian Pyle
Ways Of Reading
-
This e-publication is accessible to the full extent that the file format and types of content allow, on a specific reading device, by default, without necessarily including any additions such as textual descriptions of images or enhanced navigation.
Navigation
-
The contents of the PDF have been tagged to permit access by assistive technologies as per PDF-UA-1 standard.
-
Page breaks included from the original print source
Additional Accessibility Information
-
All (or substantially all) textual matter is arranged in a single logical reading order (including text that is visually presented as separate from the main text flow, e.g., in boxouts, captions, tables, footnotes, endnotes, citations, etc.). Non-textual content is also linked from within this logical reading order. (Purely decorative non-text content can be ignored).
-
The language of the text has been specified (e.g., via the HTML or XML lang attribute) to optimise text-to-speech (and other alternative renderings), both at the whole document level and, where appropriate, for individual words, phrases or passages in a different language.
Conformance
-
The publication was certified on 20250728
-
Accessibility addendum
-
For detailed accessibility information, see Elsevier’s website at https://www.elsevier.com/about/accessibility
-
For queries regarding accessibility information, contact [email protected]
Note
-
This product relies on 3rd party tooling which may impact the accessibility features visible in inspection copies. All accessibility features mentioned would be present in the purchased version of the title.
Description
Data Preparation for Data Mining addresses an issue unfortunately ignored by most authorities on data mining: data preparation. Thanks largely to its perceived difficulty, data preparation has traditionally taken a backseat to the more alluring question of how best to extract meaningful knowledge. But without adequate preparation of your data, the return on the resources invested in mining is certain to be disappointing.
Dorian Pyle corrects this imbalance. A twenty-five-year veteran of what has become the data mining industry, Pyle shares his own successful data preparation methodology, offering both a conceptual overview for managers and complete technical details for IT professionals. Apply his techniques and watch your mining efforts pay off-in the form of improved performance, reduced distortion, and more valuable results.
On the enclosed CD-ROM, you'll find a suite of programs as C source code and compiled into a command-line-driven toolkit. This code illustrates how the author's techniques can be applied to arrive at an automated preparation solution that works for you. Also included are demonstration versions of three commercial products that help with data preparation, along with sample data with which you can practice and experiment.
Key Features
* Offers in-depth coverage of an essential but largely ignored subject.* Goes far beyond theory, leading you-step by step-through the author's own data preparation techniques.* Provides practical illustrations of the author's methodology using realistic sample data sets.* Includes algorithms you can apply directly to your own project, along with instructions for understanding when automation is possible and when greater intervention is required.* Explains how to identify and correct data problems that may be present in your application.* Prepares miners, helping them head into preparation with a better understanding of data sets and their limitations.About the author
By Dorian Pyle, Chief Scientist and Founder of PTI, Leominster, MA, USA