Department of Computer Science
Courant Institute of Mathematical Sciences



Data Mining


G22.3033-002 - Spring 2010




References
(partially adapted from a listing by J. Han)


  • Tutorial on R

  • Chapter 1. Introduction

    • A. Silberschatz, M. Stonebraker, and J. D. Ullman. Database research: achievements and opportunities into the 21st century. SIGMOD Record, 25(1):52-63, March 1996.

    • U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. Knowledge discovery and data mining: Towards a unifying framework. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD'96), Portland, Oregon, pp. 82-88, Aug. 1996.

    • V. Ganti, J. Gehrke, and R. Ramakrishnan. Mining very large databases. COMPUTER, 32(8):38-45, 1999.

    • M. S. Chen, J. Han, and P. S. Yu. Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6):866-883, 1996.

    • J. Han, Data Mining Techniques. 1996 ACM/SIGMOD Int'l Conf. on Management of Data (SIGMOD'96) (Conference tutorial notes), Montreal, Canada, June 1996. http://db.cs.sfu.ca/sections/publication.html.

    • U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, Advances in Knowledge Discovery and Data Mining. The MIT Press, 1996.

    • J. Han (ed.). KDD-99 Tutorial Notes. ACM Press, August 1999.

  • Chapter 2. Data Warehouse and OLAP Technology for Data Mining

    • S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26(1):65-74, 1997.

    • J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1(1):29-54, 1997.

    • E. Thomsen. OLAP Solutions: Building Multidimensional Information Systems, John Wiley & Sons, 1997.

    • R. Kimball. The Data Warehouse Toolkit, John Wiley & Sons, New York, 1996.

    • V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In SIGMOD'96, pp. 205-216, Montreal, Canada, June 1996.

    • S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 1996 Int. Conf. Very Large Data Bases (VLDB'96), pp. 506-521, Bombay, India, Sept. 1996.

    • Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97, pp. 159-170, Tucson, Arizona, May 1997.

    • R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. In Proc. 1997 Int. Conf. Data Engineering (ICDE'97), Birmingham, England, April 1997.

    • S. Sarawagi, R. Agrawal, and N. Megiddo. Discovery-driven exploration of OLAP data cubes. In Proc. Int. Conf. of Extending Database Technology (EDBT'98), Valencia, Spain, pp. 168-182, March 1998.

    • K. A. Ross, D. Srivastava, and D. Chatziantoniou. Complex aggregation at multiple granularities. In EDBT'98, pp. 263-277, Valencia, Spain, March 1998.

  • Chapter 3. Data Preprocessing

    • D. Barbará et al. The New Jersey Data Reduction Report. Bulletin of the Technical Committee on Data Engineering, 20, Dec. 1997, pp. 3-45.

    • F. Hussain, H. Liu, C. L. Tan, and M. Dash. Discretization: An enabling technique. Technical Report, National Univ. of Singapore, 1999.

    • D. Pyle. Data Preparation for Data Mining. Morgan Kaufmann, 1999.

  • Chapter 4. Primitives for Data Mining

    • R. Meo, G. Psaila, and S. Ceri. A new SQL-like operator for mining association rules. In VLDB'96, pp. 122-133, Bombay, India, Sept. 1996.

    • R. Agrawal, M. Mehta, J. Shafer, R. Srikant, A. Arning, and T. Bollinger. The Quest data mining system. In KDD'96, pp. 244-249, Portland, Oregon, August 1996.

    • J. Han. Towards on-line analytical mining in large databases. ACM SIGMOD Record, 27:97-107, 1998.

  • Chapter 5. Concept Description: Characterization and Comparison

    • R. S. Michalski. A theory and methodology of inductive learning. Artificial Intelligence, 20:111-118, 1983.

    • J. Han, Y. Cai, and N. Cercone. Data-driven discovery of quantitative rules in relational databases. IEEE Trans. Knowledge and Data Engineering, 5:29-40, 1993.

  • Chapter 6. Mining Association Rules in Large Databases

    • R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB'94, pp. 487-499, Santiago, Chile, Sept. 1994.

    • J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. In VLDB'95, pp. 420-431, Zürich, Switzerland, Sept. 1995.

    • R. Srikant and R. Agrawal. Mining generalized association rules. In VLDB'95, pp. 407-419, Zürich, Switzerland, Sept. 1995.

    • R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In SIGMOD'96, pp. 1-12, Montreal, Canada, June 1996.

    • B. Lent, A. Swami, and J. Widom. Clustering association rules. In ICDE'97, pp. 220-231, Birmingham, England, April 1997.

    • S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. In SIGMOD'97, pp. 265-276, Tucson, Arizona, May 1997.

    • J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-based, multidimensional data mining. COMPUTER, 32(8): 46-50, 1999.

    • R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. In SIGMOD'98, pp. 13-24, Seattle, Washington, June 1998.

  • Chapter 7. Classification and Prediction

    • J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.

    • S. M. Weiss and N. Indurkhya. Predictive Data Mining. Morgan Kaufmann, 1997.

    • J. Shafer, R. Agrawal, and M. Mehta. SPRINT: A scalable parallel classifier for data mining. In VLDB'96, pp. 544-555, Bombay, India, Sept. 1996.

    • T. M. Mitchell. Machine Learning. McGraw Hill, 1997.

    • J. Gehrke, R. Ramakrishnan, and V. Ganti. RainForest: A framework for fast decision tree construction of large datasets. In VLDB'98, pp. 416-427, New York, NY, August 1998.

    • S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4): 345-389, 1998.

  • Chapter 8. Cluster Analysis

    • R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. In VLDB'94, pp. 144-155, Santiago, Chile, Sept. 1994.

    • T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. In SIGMOD'96, pp. 103-114, Montreal, Canada, June 1996.

    • M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In KDD'96, pp. 226-231, Portland, Oregon, August 1996.

    • S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. In SIGMOD'98, pp. 73-84, Seattle, Washington, June 1998.

    • S. Guha, R. Rastogi, and K. Shim. ROCK: A robust clustering algorithm for categorical attributes. In ICDE'99, pp. 512-521, Sydney, Australia, March 1999.

    • R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD'98, pp. 94-105, Seattle, Washington, June 1998.

    • M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering points to identify the clustering structure. In SIGMOD'99, pp. 49-60, Philadelphia, PA, June 1999.

    • L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, 1990.

    • G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In VLDB'98, pp. 428-439, New York, NY, August 1998.

    • G. Karypis, E.-H. Han, and V. Kumar. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. COMPUTER, 32(8): 68-75, 1999.

  • Chapter 9. Mining Complex Types of Data

    • K. Koperski and J. Han. Discovery of spatial association rules in geographic information databases. In Proc. 4th Int'l Symp. on Large Spatial Databases (SSD'95), pp. 47-66, Portland, Maine, Aug. 1995.

    • M. Ester, H.-P. Kriegel, and J. Sander. Spatial data mining: A database approach. In SSD'97, pp. 47-66, Berlin, Germany, July 1997.

    • X. Zhou, D. Truffet, and J. Han. Efficient polygon amalgamation methods for spatial OLAP and spatial data mining. In SSD'99, pp. 167-187, Hong Kong, Aug. 1999.

    • R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE'95, pp. 3-14, Taipei, Taiwan, March 1995.

    • R. Agrawal, K.-I. Lin, H. S. Sawhney, and K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In VLDB'95, pp. 490-501, Zürich, Switzerland, Sept. 1995.

    • R. Agrawal, G. Psaila, E. L. Wimmers, and M. Zait. Querying shapes of histories. In VLDB'95, pp. 502-514, Zürich, Switzerland, Sept. 1995.

    • J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. In ICDE'99, pp. 106-115, Sydney, Australia, March 1999.

    • S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg. Mining the Web's link structure. COMPUTER, 32(8):60-67, 1999.

    • J. Kleinberg and A. Tomkins. Application of linear algebra in information retrieval and hypertext analysis. In PODS'99, pp. 185-193, Philadelphia, PA, May 1999.

    • K. Wang, S. Zhou, and S. C. Liew. Building hierarchical classifiers using class proximity. In VLDB'99, Edinburgh, UK, Sept. 1999.

  • Chapter 10. Data Mining Applications and Trends in Data Mining

    • J. Han, Y. Huang, N. Cercone, and Y. Fu. Intelligent query answering by knowledge discovery techniques. IEEE Trans. Knowledge and Data Engineering, 8:373-390, 1996.

    • C. Clifton and D. Marks. Security and Privacy Implications of Data Mining. In Proc. 1996 SIGMOD'96 Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), Montreal, Canada, pp. 15-20, June 1996.


Jean-Claude Franchitti, <jcf (followed by @, then cs, then a dot, then nyu, then a dot, and then edu)>
Last modified: Sun. Nov. 8 04:31:18 EDT 2009