SQL Server Data Mining contains seven world-class data mining algorithms, plus text mining support, covering the following areas:
- Microsoft Naïve Bayes: Naïve Bayes is a simple and efficient classification algorithm. It analyzes the pairwise relationship between each input attribute and the predictable attribute.
- Microsoft Decision Trees: This is a state-of-the-art decision tree algorithm that supports both classification and regression. It can also build multiple trees in a single model to perform association analysis. This algorithm is also packaged as Microsoft Linear Regression for those who want to do simple linear regression analysis.
- Microsoft Time Series: Time Series is a unique forecasting algorithm based on the autoregressive tree technique.
- Microsoft Clustering: This algorithm includes two clustering techniques: EM (Expectation Maximization) and K-means. It can automatically detect the number of natural clusters in the dataset.
- Microsoft Sequence Clustering: This algorithm is a hybrid of clustering and sequence analysis techniques. It groups similar cases together based on both ordinary attributes and sequence attributes.
- Microsoft Association Rules: This has been the most frequently requested algorithm since the SQL Server 2000 data mining release. It provides a powerful correlation counting engine and performs scalable, efficient market basket analysis.
- Microsoft Neural Network: This algorithm can perform deeper analysis and find complex patterns. It can be used for both classification and regression tasks. This algorithm is also packaged as Microsoft Logistic Regression, which is the neural network with the hidden layer removed.
- Text Mining: With the SSIS Term Extraction and Term Lookup transformations, unstructured text can be converted to a structured format. This enables data mining algorithms to classify and cluster text documents.
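To make the Naïve Bayes description concrete, the sketch below shows a generic, textbook count-based Naïve Bayes classifier. It is not Microsoft's implementation; it only illustrates what "pairwise relationships between each input attribute and the predictable attribute" means: the model stores counts of each (input value, predicted class) pair and combines them under an independence assumption. Laplace smoothing, the function names, and the attribute names (`age`, `income`, `buyer`) are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(cases, target):
    """Count class frequencies and (input value, class) pairs for each input attribute."""
    class_counts = Counter()
    pair_counts = defaultdict(Counter)  # pair_counts[attr][(value, label)] -> case count
    values = defaultdict(set)           # distinct observed values per input attribute
    for case in cases:
        label = case[target]
        class_counts[label] += 1
        for attr, value in case.items():
            if attr == target:
                continue
            pair_counts[attr][(value, label)] += 1
            values[attr].add(value)
    return class_counts, pair_counts, values


def predict(model, case):
    """Return the class with the highest log-posterior score (Laplace-smoothed)."""
    class_counts, pair_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior P(class)
        for attr, value in case.items():
            if attr not in pair_counts:
                continue  # ignore attributes never seen during training
            count = pair_counts[attr][(value, label)]
            # Smoothed estimate of P(attr = value | class)
            score += math.log((count + 1) / (n_label + len(values[attr])))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training cases; "buyer" is the predictable attribute.
cases = [
    {"age": "young", "income": "high", "buyer": "no"},
    {"age": "young", "income": "low", "buyer": "no"},
    {"age": "old", "income": "high", "buyer": "yes"},
    {"age": "old", "income": "low", "buyer": "yes"},
    {"age": "old", "income": "high", "buyer": "yes"},
]
model = train_naive_bayes(cases, "buyer")
print(predict(model, {"age": "old", "income": "high"}))
```

Because training is a single counting pass over the cases, this style of model is cheap to build, which is why Naïve Bayes is often used as a fast baseline before trying heavier algorithms.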
If you need to go beyond these standard capabilities, SQL Server Data Mining is fully extensible through .NET stored procedures and plug-in algorithms, which embed seamlessly into the platform and take advantage of all its capabilities and integration points.
The lack of model building and tuning tools was another limitation of SQL Server 2000. In SQL Server 2005, the data mining tools are greatly enhanced. The available tools cover model composition, training, browsing, comparison, and prediction query generation.
The Data Mining Wizard is a handy yet powerful tool to help you build data mining models from any data source. It helps you pick the most relevant input columns related to the predictable one. It can also be used to mine OLAP cubes. With a few mouse clicks, you can build a very sophisticated mining model. The Data Mining Editor allows you to tune your models by specifying parameter settings. Additionally, lift and profit charts are provided, so you can compare and contrast the quality of your models before you commit to deployment.
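The idea behind a lift chart can be sketched in a few lines. This is a generic lift calculation, not a description of how SQL Server renders its charts: cases are ranked by the model's predicted probability for the target state, and lift at a given fraction of the population is the share of actual positives captured in that top slice divided by the share you would expect from random selection. The function names and sample scores are hypothetical.

```python
def cumulative_gains(scores, actuals):
    """Return (fraction of population, fraction of positives captured) points.

    scores  -- predicted probability of the target state for each case
    actuals -- 1 if the case actually has the target state, else 0
    """
    ranked = sorted(zip(scores, actuals), key=lambda p: p[0], reverse=True)
    total_pos = sum(actuals)
    n = len(ranked)
    points, captured = [], 0
    for i, (_, actual) in enumerate(ranked, start=1):
        captured += actual
        points.append((i / n, captured / total_pos))
    return points


def lift_at(scores, actuals, fraction):
    """Lift = share of positives captured in the top `fraction` of cases / fraction."""
    ranked = sorted(zip(scores, actuals), key=lambda p: p[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))  # number of cases in the top slice
    captured = sum(actual for _, actual in ranked[:k])
    return (captured / sum(actuals)) / (k / len(ranked))


# Hypothetical scores from one model and the true outcomes for ten cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift_at(scores, actuals, 0.2))
```

Comparing two models then amounts to computing these curves for each on the same held-out cases; the model whose curve rises faster captures more positives for the same contact effort.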
Simple Yet Powerful API
When it comes to applying models, SQL Server opens a new chapter in data mining. DMX (Data Mining Extensions to SQL) provides a rich, SQL-based query language that is already familiar to the many developers and DBAs who work closely with their data. Performing complex predictions against data mining models is now reduced to a join in a familiar SQL query. For the first time, those responsible for creating applications and handling data can leverage data mining technology using tools they already understand.
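As an illustration of "a prediction reduced to a join", the sketch below composes a DMX `PREDICTION JOIN` statement as a string. `Predict`, `PredictProbability`, `OPENQUERY`, and `PREDICTION JOIN` are DMX constructs; the helper function, the model name, the data source, and the column names are hypothetical. The helper does no escaping, so it assumes trusted identifiers and a source query without single quotes.

```python
def prediction_join_query(model, datasource, source_query, column_map, target, key):
    """Compose a DMX PREDICTION JOIN statement (hypothetical helper).

    column_map maps model input columns to columns of the source query.
    Identifiers are assumed trusted; source_query must not contain single quotes.
    """
    on_clause = " AND ".join(
        f"[{model}].[{model_col}] = t.[{src_col}]"
        for model_col, src_col in column_map.items()
    )
    return (
        f"SELECT t.[{key}], Predict([{target}]) AS Prediction, "
        f"PredictProbability([{target}]) AS Probability\n"
        f"FROM [{model}]\n"
        f"PREDICTION JOIN OPENQUERY([{datasource}], '{source_query}') AS t\n"
        f"ON {on_clause}"
    )


query = prediction_join_query(
    model="Buyer Model",                 # hypothetical mining model
    datasource="Sales DW",               # hypothetical data source
    source_query="SELECT CustomerKey, Age, Income FROM dbo.Prospects",
    column_map={"Age": "Age", "Income": "Income"},
    target="Bike Buyer",                 # hypothetical predictable column
    key="CustomerKey",
)
print(query)
```

The resulting statement reads like an ordinary SQL join: each new case from the source query is matched to the model's input columns, and the `SELECT` list asks the model for its prediction and confidence.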
Integration with Sibling BI Technologies
Rarely can a data mining problem be solved with a data mining tool alone. SQL Server Data Mining sits among a family of BI technologies that can be used together to develop a new breed of intelligent applications.
- SSIS Integration: Integration with SQL Server Integration Services injects the power of data mining into your operational data flows.
- OLAP Integration: Integration with OLAP allows you to mine against complex multidimensional calculations and use the results to create self-organizing cubes.
- Reporting Services Integration: Integration with Reporting Services provides a user-friendly presentation layer to display and distribute interactive reports driven by your data mining models.