General Info

Teachers

Teaching Assistants

Lectures

  • Tuesdays 18:00-19:45.

Exercise sessions

  • Tuesdays 20:00-22:00.

Office hour (Karl Heuer)

  • Tuesdays 13:00-14:00 (see DTU Learn announcements for exceptions)
  • Building 322, room 008
  • Location and TA coverage

    Materials

    Most topics are based on the book "Mining of Massive Datasets" (third edition); see the book homepage. Additional materials will also be used. A large community uses and further develops the algorithms and tools covered in this course, so searching online can also provide valuable information and guidance.

    Videocast

  • Videocasts

    Project work

    See information (including important dates) about the project in this PDF file.

    Weekplan

    THIS SCHEDULE IS TENTATIVE AND SUBJECT TO CHANGE

    Week Topics Slides Exercises Materials
    W1 / Aug 29
  • Introductory lecture
  • What is Data Mining?
  • Bonferroni's Principle
  • Tf.idf measure
  • Hash functions
  • Slides

  • Exercises: 1.2.1, 1.2.2, 1.3.1, 1.3.2 and 1.3.3
  • Ch. 1 of MMDS
    W2 / Sep 5
  • No lecture
  • Python recap
  • Setting up Python and Jupyter Notebook on your local environment
  • Working on tutorial tasks
  • Tutorial tasks for NumPy, SciPy and Numba packages
  • Tutorial tasks for Pandas package
  • Exercise sheet
    W3 / Sep 12
  • MapReduce
  • Distributed File Systems
  • Cluster Computing
  • Slides
  • Exercise sheet
  • Ch. 2 of MMDS
  • Test Files
    W4 / Sep 19
  • Similar Items
  • Minhashing
  • Locality Sensitive Hashing
  • Slides
  • Exercise sheet
  • Solutions
  • Ch. 3 of MMDS
  • Data and Template
    W5 / Sep 26
  • Frequent itemsets
  • Market-Basket Model
  • Association Rules
  • A-Priori Algorithm
  • PCY Algorithm (+ refinements)
  • Slides
  • Exercise sheet
  • Solutions
  • Ch. 6 of MMDS
    W6 / Oct 03
  • Clustering
  • Hierarchical algorithms
  • Point assignment algorithms (k-means algorithm)
  • DBSCAN algorithm
  • CURE algorithm
  • Evaluating (e.g. Davies-Bouldin index)
  • Slides
  • Exercise sheet
  • Solutions
  • Ch. 7 of MMDS
    W7 / Oct 10
  • Mining Social-Network Graphs
  • Betweenness centrality
  • Girvan-Newman algorithm
  • Modularity
  • Spectral clustering
  • Slides
  • Exercise sheet
  • Solutions
  • Ch. 10 of MMDS
  • Survey on Spectral Clustering
    Holidays
  • Holiday week
    W8 / Oct 24
  • Guest Lecture
  • Slides
    W9 / Oct 31
  • Project Work
    W10 / Nov 07
  • Project Work
    W11 / Nov 14
  • Project Work
    W12 / Nov 21
  • Project Work
    W13 / Nov 28
  • Project Work
    Exam
  • Exam period