General Info

Teachers

Teaching Assistants

Lectures

  • Tuesdays 18:00-19:45.

Exercise sessions

  • Tuesdays 20:00-22:00.
  • Location and TA coverage

    Materials

    Most topics are based on the book "Mining of Massive Datasets" (third edition); see the book homepage. Additional materials will also be used. A large community uses and further develops the algorithms and tools covered in this course, so an online search can also yield valuable information and guidance.

    Videocast

  • Videocasts

    Project work

    See information (including important dates) about the project in this PDF file.

    Weekplan

    Week Topics Slides Exercises Materials
    W1 / Aug 30
  • Introductory lecture
  • What is Data Mining?
  • Bonferroni's Principle
  • Tf.idf measure
  • Hash functions
  • Blackboard talk (see video)

    Exercise sheet

    Ch. 1 of MMDS
    W2 / Sep 6
  • No lecture
  • Python recap #1
  • Setting up Python and Jupyter Notebook on your local environment
  • Working on tutorial tasks
  • Exercise sheet
    W3 / Sep 13
  • No lecture
  • Python recap #2
  • Tutorial tasks for NumPy, SciPy and Numba packages
  • Tutorial tasks for Pandas package
  • Exercise sheet
    W4 / Sep 20
  • MapReduce
  • Distributed File Systems
  • Cluster Computing
    Slides
    Exercise sheet
    Ch. 2 of MMDS
    Test Files
    W5 / Sep 27
  • Similar Items
  • Minhashing
  • Locality Sensitive Hashing
    Slides
    Exercise sheet
    Ch. 3 of MMDS
    Data and Template
    W6 / Oct 04
  • Frequent itemsets
  • Market-Basket Model
  • Association Rules
  • A-Priori Algorithm
  • PCY Algorithm (+ refinements)
    Slides
    Exercise sheet
    Ch. 6 of MMDS
    Exercise data
    W7 / Oct 11
  • Clustering
  • Hierarchical algorithms
  • Point assignment algorithms (k-means algorithm)
  • DBSCAN algorithm
  • CURE algorithm
  • Evaluating (Davies-Bouldin index)
    Slides
    Exercise sheet
    Ch. 7 of MMDS
    Exercise data
    Holidays / Holiday week
    W8 / Oct 25
  • Mining Social-Network Graphs
  • Betweenness centrality
  • Girvan-Newman algorithm
  • Modularity
  • Spectral clustering
    Slides
    Exercise sheet
    Ch. 10 of MMDS
    W9 / Nov 01
  • Project phase
  • No lecture
    W10 / Nov 08
  • Project phase
  • No lecture
    W11 / Nov 15
  • Project phase
  • No lecture
    W12 / Nov 22
  • Project phase
  • No lecture
    W13 / Nov 29
  • Project phase
  • No lecture
    Exam / Exam period