This course covers the basics of the knowledge discovery process and data mining, and provides a basic introduction to data science. The course focuses on two main aspects of knowledge discovery: mathematical tools and a well-defined, structured knowledge discovery process consisting of a number of interactive and iterative steps. The accompanying project (KU) concentrates on programming infrastructure for manipulating large-scale data. The examples used in the course come mainly from text mining and recommender systems.
In recent years the amount of data that we produce has increased dramatically. We already produce more data than we are able to store with current technological solutions. Therefore, making sense of this huge amount of data, or extracting useful, valid, understandable, and novel patterns from it, is of crucial importance. Knowledge discovery, data mining, and data science are among the approaches to tackling this problem. Similar, but somewhat different, approaches include database technology, machine learning, and statistics.
In this course we will investigate, analyze, and discuss a well-defined process for knowledge discovery in such large data sets. Apart from the process, we will also discuss the mathematics needed for data mining.
Course topics include:
In this course the students will:
At the end of this course the students will know how to:
To refresh your knowledge of linear algebra, probability theory, and statistics you should work on these problems. This sheet will not be graded!
There will be two partial examinations, written during class. Each one takes place at the beginning of a lecture and lasts 45 minutes. Each partial examination has 2 questions, with the difficulty adjusted so that both can be solved in approx. 30 minutes. You can get max 20 points for each question, i.e. up to 40 points per partial examination and 80 points in total. Please note that the partial examinations count as one examination attempt!
Apart from the partial examinations there will be a standard written examination at the end of the course. It consists of 4 questions with max 20 points each, so the total number of points that can be reached is 80.
The grading scheme is as follows: