Virtual Observations and Data Mining in Astronomy

2010

Overview

The volume of data produced by observational and theoretical work in astronomy and astrophysics is now very large, so it is important to be able to organise and access these data efficiently.

This course aims to give you a solid basic understanding of databases in astronomy and experience in accessing them and applying them to astronomical research, and to introduce the statistical tools of astronomical computing.

Throughout the course a number of practical exercises will be given, and a key aim is to enable students to reach a level where they can apply the tools to their own research problems. At the end of the course, students should be able to plan optimal data processing for their research, select the necessary tools, and use the Virtual Observatory for data mining and for publishing their own results.

After taking this course you should be able to search and access astronomical databases efficiently, both through online interfaces and programmatically. You will also be able to apply up-to-date statistical techniques to your data and use them to mine large datasets for scientifically interesting information.

Practical details

The course will be given by Andrey Belikov, Reynier Peletier, Gijs Verdoes Kleijn and Edwin Valentijn.

The lectures and practical classes will take place in the PC room.

Monday    13:30-15:30   Lecture               PC room (from 4th October)
Tuesday    9:00-11:00   Lecture/Werkcollege   PC room
Monday    15:00-17:00   Werkcollege           PC room

Software

From September 30th, the software you need for this course (MySQL and R) can be found in /Users/users/belikov/local/bin. Please add this directory to your PATH (setenv PATH /Users/users/belikov/local/bin:$PATH). Note that you have to remove your old MySQL Python libraries, take the new archive add_python_libs.tgz, and put it in the same directory. See the instructions in Werkcollege 1.
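
After changing PATH, it can be useful to confirm which executables the shell will actually pick up. The following is a minimal sketch in Python, assuming the command-line programs are named mysql and R; the helper find_tools is a hypothetical name introduced here for illustration and is not part of the course software.

```python
import os
import shutil


def find_tools(names, search_path=None):
    """Map each tool name to the full path of its executable, or None if absent."""
    # shutil.which searches the given path string, or the PATH environment
    # variable when search_path is None.
    return {name: shutil.which(name, path=search_path) for name in names}


if __name__ == "__main__":
    # Directory quoted in the text; prepending it mirrors the setenv command.
    course_bin = "/Users/users/belikov/local/bin"
    search_path = course_bin + os.pathsep + os.environ.get("PATH", "")
    for tool, location in find_tools(["mysql", "R"], search_path).items():
        print(tool, "->", location or "not found")
```

If a tool is reported as not found, the PATH change has probably not taken effect in the current shell.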

You can also download and install the necessary software yourself:

MySQL   http://www.mysql.com/downloads/mysql/   select your platform or compile from source
R       http://cran-mirror.cs.uu.nl             from the Dutch mirror in Utrecht
dia     http://live.gnome.org/Dia/Download

Structure, evaluation and time-budget

The evaluation (grading) of the course will be done on the basis of a written project to be handed in towards the end of the semester. More details regarding this will be given in the lectures and on this web-site when available.

Each week there will be 6 hours of teaching (2 hours of lectures, 2 hours of lectures combined with practical work, and 2 hours of practical work). Some lectures and practical sessions will include tasks to be completed as homework.

Some suggested literature for the course

There is no single book for this course; rather, a number of books are useful as references for various aspects of it. Individual lectures may also have suggested literature, so follow the lecture links below to see this.

The following are books that are useful for the topics of the course in general:

Other material for the lectures

This course is based on the Interacademia lecture course held in spring 2010, Modern Data Mining in Astronomy,

and on the Virtual Observations lecture courses held in Groningen:

2007: Virtual Observations

2008: Virtual Observatories

Examination tasks

Please select examination tasks (3 tasks, in order of preference) and send the task numbers to Andrey Belikov.

Exam tasks

Tasks already taken:

Name                 Task number   Object
Daniel Siepman       2             Pleiades
Boudewijn Hut        2             Hyades
Sander Bus           4
Nancy Irisarri       1
Caroline Van Borm    3             NGC 3324
Mark Erends          3             NGC 6231

Detailed lecture overview

The person responsible for each lecture is listed under its title. The detailed plan of the individual lectures is still somewhat preliminary, but the overall content of the course is set. Click a lecture title to access the page for that lecture, which also contains links, literature suggestions and, once the lecture has been given, the lecture slides.

September 6 - Lecture 1

Introduction to Virtual Observations

Responsible: E. Valentijn

Data processing in astronomy. Main steps in the development of data processing. Scope of the field (optical/infrared astronomy, radio astronomy, added-value study of catalogs, etc.). Ongoing and upcoming astronomical missions. Data volumes. Complexity of data processing in astronomy.

September 7 - Lecture 2

Introduction to SQL

Responsible: A.Belikov

Databases: relational and object-oriented. Object-oriented approach to programming. Relational algebra. SQL: basic datatypes, operations. Joins in SQL. Embedded SQL.
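
As an illustration of the join topic listed above, here is a minimal sketch in Python. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for a MySQL server, which is an assumption made so the example runs anywhere; the table names, column names and all values are invented toy data, not real measurements.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cluster (cluster_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE star (
        star_id INTEGER PRIMARY KEY,
        cluster_id INTEGER,   -- NULL for stars that belong to no cluster
        vmag REAL
    );
    INSERT INTO cluster VALUES (1, 'cluster A'), (2, 'cluster B');
    INSERT INTO star VALUES (10, 1, 2.9), (11, 1, 3.6), (12, 2, 3.5), (13, NULL, 7.1);
""")

# Inner join: only stars with a matching cluster row are returned.
inner_rows = conn.execute("""
    SELECT c.name, s.star_id, s.vmag
    FROM star AS s
    JOIN cluster AS c ON s.cluster_id = c.cluster_id
    ORDER BY s.star_id
""").fetchall()

# Left join: every star is returned; the cluster name is NULL when there is no match.
left_rows = conn.execute("""
    SELECT s.star_id, c.name
    FROM star AS s
    LEFT JOIN cluster AS c ON s.cluster_id = c.cluster_id
    ORDER BY s.star_id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The inner join drops the star without a cluster, while the left join keeps it with None in place of the cluster name. The SQL statements themselves are standard and should carry over to MySQL with little or no change.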

September 9 - Werkcollege 1

Introduction to SQL and relational DBMS

Responsible: A.Belikov

Starting and stopping the MySQL server, creating a database, ingesting data, simple operations, data processing with SQL.

September 13 - Lecture 3

Introduction to Data Mining in Astronomy

Responsible: A.Belikov

Subject of data mining. Components of data mining. Specific problems of data mining in Astronomy. Use cases for data mining in Astronomy.

September 14 - Lecture 4

Basic Data Modeling

Responsible: A.Belikov

ER-diagrams. SADT. UML. Object-oriented programming. Python.

September 16 - Werkcollege 2

Data modeling

Responsible: A.Belikov

Practical use of ER diagrams and UML data modeling. From ER and UML diagrams to a database schema and tables.
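
The step from a diagram to tables can be sketched as follows, assuming a hypothetical ER diagram with two entities, observation and source, linked by a one-to-many relationship (one observation yields many sources). The entity and attribute names are invented for illustration, and SQLite is used in place of MySQL so the sketch is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Each entity becomes a table; the one-to-many relationship becomes a
# foreign key column on the "many" side.
conn.executescript("""
    CREATE TABLE observation (
        obs_id      INTEGER PRIMARY KEY,
        filter_name TEXT NOT NULL,
        exposure_s  REAL NOT NULL
    );
    CREATE TABLE source (
        source_id INTEGER PRIMARY KEY,
        obs_id    INTEGER NOT NULL REFERENCES observation(obs_id),
        ra_deg    REAL NOT NULL,
        dec_deg   REAL NOT NULL
    );
""")

conn.execute("INSERT INTO observation VALUES (1, 'r', 300.0)")
conn.execute("INSERT INTO source VALUES (100, 1, 56.75, 24.12)")

# A source pointing at a non-existent observation violates the relationship.
try:
    conn.execute("INSERT INTO source VALUES (101, 99, 10.0, -5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("orphan row rejected:", orphan_rejected)
```

Declaring the relationship in the schema lets the database itself refuse rows that the diagram does not allow, instead of relying on every client program to check.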

September 20 - Lecture 5

Data Sources

Responsible: R. Peletier

Archives and catalogs in Astronomy. Raw data archives, processed data archives. Classification of archives and catalogs.

September 21 - Lecture 6

Data Sources (continued)

Responsible: R.Peletier

Data archives and catalogs. CDS, ADS and other data centers.

September 23 - Werkcollege 3

Data Sources

Responsible: A.Belikov/J.Bout

Classification of data archives/catalogs. SDSS CasJobs.

September 27 - Lecture 7

Introduction to Statistics

Responsible: A.Belikov

Histograms. Kernel smoothing. Bayesian statistics.
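
To make the first two topics concrete, here is a minimal sketch in pure Python of an equal-width histogram and a Gaussian kernel density estimate. The function names, the sample values and the bandwidth are all illustrative assumptions; in practice the bandwidth has to be chosen for the data at hand.

```python
import math


def histogram(values, lo, hi, n_bins):
    """Count values falling into n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # A value exactly on the right edge is folded into the last bin.
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts


def gaussian_kde(values, x, bandwidth):
    """Gaussian kernel density estimate of the sample, evaluated at x."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)


# Invented toy sample, for illustration only.
sample = [0.1, 0.4, 0.5, 0.9, 1.3, 1.4, 1.6, 2.0]

print(histogram(sample, 0.0, 2.0, 4))
print(gaussian_kde(sample, 1.0, bandwidth=0.3))
```

The histogram depends on where the bin edges are placed, whereas the kernel estimate depends only on the bandwidth, which is one reason kernel smoothing is often preferred for small samples.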

September 28 - Lecture 8

Introduction to Statistics (continued)

Responsible: A.Belikov

Correlation. PCA. Methods of classification. Neural networks.
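
As a small worked example of the first two topics, the sketch below computes the Pearson correlation coefficient and, for the two-variable case only, the orientation of the first principal axis from the 2x2 covariance terms. The function names are hypothetical and the input values are invented; a general PCA on many variables would use an eigen-decomposition from a numerical library instead.

```python
import math


def _centered_sums(x, y):
    """Return the sums of squares and cross-products about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    sxx, syy, sxy = _centered_sums(x, y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def principal_axis_angle(x, y):
    """Angle (radians, from the x axis) of the first principal axis in 2D."""
    sxx, syy, sxy = _centered_sums(x, y)
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


# Invented toy data: y is exactly twice x, so the correlation is 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(x, y))
print(math.degrees(principal_axis_angle(x, y)))
```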

September 30 - Werkcollege 4

Introduction to Statistics

Responsible: A.Belikov

Practical use of statistics with SQL, R and Python.

October 4 - Lecture 9

Astronomical Data Processing

Responsible: A.Belikov/G.Verdoes Kleijn

Data processing pipelines. Main components of astronomical data processing. Data reduction.

October 5 - Lecture 10

Astronomical Data Processing (continued)

Responsible: A.Belikov/G.Verdoes Kleijn

Source extraction. Data formats. FITS.
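
A FITS header is a sequence of 80-character cards, each holding a keyword, an optional value and an optional comment. The sketch below parses such cards in pure Python to show the layout; it is a simplified illustration, not a full FITS reader (it ignores, for example, continued strings and the D exponent form of floating-point values), and the header content is a made-up toy example. The function names are hypothetical.

```python
def _convert(text):
    """Turn the text of a non-string FITS value into a Python object."""
    if text == "T":
        return True
    if text == "F":
        return False
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_card(card):
    """Split one 80-character header card into (keyword, value, comment)."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards and END carry no value.
        return keyword, None, card[8:].strip()
    body = card[10:]
    if body.lstrip().startswith("'"):
        # String value: runs to the closing quote; '' is an escaped quote.
        start = body.index("'") + 1
        end = start
        while body[end:end + 2] == "''" or body[end] != "'":
            end += 2 if body[end:end + 2] == "''" else 1
        value = body[start:end].replace("''", "'").rstrip()
        comment = body[end + 1:].partition("/")[2].strip()
    else:
        value_text, _, comment = body.partition("/")
        value = _convert(value_text.strip())
        comment = comment.strip()
    return keyword, value, comment


def parse_header(text):
    """Parse consecutive 80-character cards up to the END card."""
    header = {}
    for offset in range(0, len(text), 80):
        keyword, value, _comment = parse_card(text[offset:offset + 80])
        if keyword == "END":
            break
        if value is not None:
            header[keyword] = value
    return header


# Toy header built by padding each card to 80 characters.
cards = [
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                   16 / bits per pixel",
    "NAXIS1  =                 2048 / length of axis 1",
    "OBJECT  = 'NGC 3324'           / target name",
    "END",
]
header = parse_header("".join(card.ljust(80) for card in cards))
print(header)
```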

October 7 - Werkcollege 5

Astronomical Data Processing

Responsible: A.Belikov/G.Verdoes Kleijn

SExtractor. FITS. Astrometry and photometry for catalogs produced with SExtractor.

October 11 - Lecture 11

Information Systems in Astronomy

Responsible: E.Valentijn/A.Belikov/G.Verdoes Kleijn

The concept of an information system. Main components. Special features of information systems in science, and in astronomy in particular. Introduction to Astro-WISE.

October 12 - Lecture 12

Information Systems in Astronomy (continued)

Responsible: E.Valentijn/A.Belikov/G.Verdoes Kleijn

Astro-WISE: data storage, data processing, Grid structure. Implementation of pipelines. Data processing chain. Integration with external data sources. Data publishing.

October 14 - Werkcollege 6

Astro-WISE

Responsible: A.Belikov/G.Verdoes Kleijn

Practical use of Astro-WISE.

October 18 - Lecture 13

Virtual Observatory

Responsible: A.Belikov

VO concept and history. XML and VOTable. VO interfaces. UCD and utype.
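
A VOTable is an XML document in which FIELD elements describe the columns (including their UCDs) and TR/TD elements hold the rows. The sketch below reads a tiny hand-written table with the standard-library xml.etree.ElementTree module. The table content is invented, read_votable is a hypothetical helper name, and the XML namespace that real VOTable files declare is omitted here for brevity, so real files would need namespace-aware tag names or a dedicated VOTable library.

```python
import xml.etree.ElementTree as ET

# Hand-written toy VOTable; real files also declare an XML namespace.
VOTABLE_XML = """<VOTABLE version="1.2">
  <RESOURCE>
    <TABLE name="toy_catalogue">
      <FIELD name="ra" datatype="double" unit="deg" ucd="pos.eq.ra;meta.main"/>
      <FIELD name="dec" datatype="double" unit="deg" ucd="pos.eq.dec;meta.main"/>
      <FIELD name="vmag" datatype="float" unit="mag" ucd="phot.mag;em.opt.V"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.5</TD><TD>-20.25</TD><TD>12.3</TD></TR>
          <TR><TD>11.0</TD><TD>-19.75</TD><TD>14.1</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_votable(xml_text):
    """Return (field descriptions, rows) for the first table in the document."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    # Each FIELD's attributes (name, datatype, unit, ucd) describe one column.
    fields = [dict(field.attrib) for field in table.findall("FIELD")]
    names = [field["name"] for field in fields]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        # All columns in this toy table are numeric, so float() is sufficient.
        cells = [float(td.text) for td in tr.findall("TD")]
        rows.append(dict(zip(names, cells)))
    return fields, rows


fields, rows = read_votable(VOTABLE_XML)
for field in fields:
    print(field["name"], field["unit"], field["ucd"])
print(rows)
```

The UCD attached to each column is what lets a program recognise, for example, which column holds right ascension without relying on the column name.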

October 19 - Lecture 14

Virtual Observatory (continued)

Responsible: A.Belikov

VO tools.

October 21 - Werkcollege 7

Virtual Observatory

Responsible: A.Belikov/G.Verdoes Kleijn

VO use cases: searching for supernovae and brown dwarfs.

October 25 - Lecture 15

Modern Data Processing Infrastructure

Responsible: A.Belikov/F.Dijkstra

Grid and cloud computing. EGEE, EGI.

October 26 - Lecture 16

Modern Data Processing Infrastructure (continued)

Responsible: A.Belikov/F.Dijkstra

Use of Grid infrastructure in data processing and data storage.

October 28 - Werkcollege 8

Modern Data Processing Infrastructure

Responsible: A.Belikov/F.Dijkstra

Practical use of Grid computing. Submitting jobs to Grid.
