The amount of data produced by observational and theoretical effort in astronomy and astrophysics today is very large. Thus it is important to be able to organise and access these data very efficiently.
This course aims to give you a solid basic understanding of databases in astronomy, experience in accessing them and applying them to astronomical research, and an introduction to the statistical tools of astronomical computing.
Throughout the course a number of practical exercises will be given. A key aim is to bring students to a level where they can apply the tools to their own research problems. At the end of the course students should be able to plan the data processing for their research, select the necessary tools, and use the Virtual Observatory for data mining and for publishing their own results.
After taking this course you should be able to search and access astronomical databases efficiently, both through online interfaces and programmatically. You will also be able to apply up-to-date statistical techniques to your data and use them to mine large datasets for scientifically interesting information.
The course will be given by Andrey Belikov, Reynier Peletier, Gijs Verdoes Kleijn and Edwin Valentijn.
The lectures and practical classes will take place in the PC room.
Day | Time | Activity | Room |
---|---|---|---|
Monday | 13:30-15:30 | Lecture | PC room (from 4th October) |
Tuesday | 9:00-11:00 | Lecture/Werkcollege | PC room |
Monday | 15:00-17:00 | Werkcollege | PC room |
From September 30th, the software you need for this course (MySQL and R) can be found in /Users/users/belikov/local/bin. Please add this directory to your PATH (setenv PATH /Users/users/belikov/local/bin:$PATH). Note that you have to remove your old MySQL Python libraries, then take the new archive add_python_libs.tgz and put it in the same directory. See the instructions in Werkcollege 1.
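To check that the PATH change took effect, a short script can look the tools up. This is only a sketch: the executable names `mysql` and `R` are assumptions about what the course directory contains, and `locate_tools` is a hypothetical helper, not part of the course software.

```python
import os
import shutil

# Directory quoted in the course text; the executable names are assumptions.
COURSE_BIN = "/Users/users/belikov/local/bin"
TOOLS = ("mysql", "R")


def locate_tools(tools=TOOLS, extra_dir=COURSE_BIN):
    """Return {tool: full path or None}, searching extra_dir before PATH."""
    search_path = extra_dir + os.pathsep + os.environ.get("PATH", "")
    return {tool: shutil.which(tool, path=search_path) for tool in tools}


if __name__ == "__main__":
    for tool, location in locate_tools().items():
        print(f"{tool}: {location or 'not found'}")
```

A tool reported as "not found" means it is neither in the course directory nor anywhere else on your PATH.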
Alternatively, you can download and install the necessary software yourself:
MySQL | http://www.mysql.com/downloads/mysql/ | select your platform or compile from source |
R | http://cran-mirror.cs.uu.nl | from Dutch mirror in Utrecht |
dia | http://live.gnome.org/Dia/Download |
The evaluation (grading) of the course will be done on the basis of a written project to be handed in towards the end of the semester. More details regarding this will be given in the lectures and on this web-site when available.
Each week there will be 6 hours of classes (2 hours of lectures, 2 hours of lecture combined with practical work, and 2 hours of practical work). Some lectures and practical sessions will include tasks to be completed as homework.
There is no single textbook for this course; rather, a number of books are useful as references for various aspects of it. Individual lectures may also have suggested literature, so follow the lecture links below to see it.
The following are books that are useful for the topics of the course in general:
This book has a close relationship with the Weka data mining software.
New Trends in Data Warehousing and Data Analysis, Series: Annals of Information Systems , Vol. 3 , Kozielski, Stanislaw; Wrembel, Robert (Eds.) , Springer, 2009, ISBN: 978-0-387-87430-2
This book is more advanced.
Michael J. A. Berry & Gordon S. Linoff, "Data mining techniques", Wiley 2004, ISBN-13: 978-0-471-47064-9.
This book has a decidedly clear turn towards the business world and so offers a different view of the field.
Christopher M. Bishop, "Pattern recognition and machine learning", Springer 2006, ISBN-13: 978-0387-31073-2.
This book is focused on the methods used in data mining and is not strongly directed to particular applications.
This course is based on the Interacademia lecture course held in Spring 2010, Modern Data Mining in Astronomy,
and on the Virtual Observations lecture courses held in Groningen:
2007: Virtual Observations
2008: Virtual Observatories
Please select your examination tasks (3 tasks, in order of preference) and send the task numbers to Andrey Belikov.
Tasks already taken:
Name | Task number | Object |
---|---|---|
Daniel Siepman | 2 | Pleiades |
Boudewijn Hut | 2 | Hyades |
Sander Bus | 4 | |
Nancy Irisarri | 1 | |
Caroline Van Borm | 3 | NGC 3324 |
Mark Erends | 3 | NGC 6231 |
The person mainly responsible for each lecture is indicated in square brackets. The detailed plan of the individual lectures is still somewhat preliminary, but the overall content of the course is set. Click a lecture title to open the page for that lecture, which will also contain links, literature suggestions and, after the lecture has been given, the lecture slides.
Data processing in astronomy. Main steps in the development of data processing. Scope of the field (optical/infrared astronomy, radio astronomy, added-value study of catalogs, etc.). Ongoing and upcoming astronomical missions. Data volumes. Complexity of data processing in astronomy.
Databases: relational and object-oriented. Object-oriented approach to programming. Relational algebra. SQL: basic datatypes, operations. Joins in SQL. Embedded SQL.
Starting and stopping the MySQL server, creating a database, ingesting data, simple operations, data processing with SQL.
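The database steps of this first practical (create, ingest, query with a join) can be sketched in a few lines. This sketch makes several assumptions: it uses Python's built-in `sqlite3` module as a stand-in for the MySQL server, so there is no server to start or stop; the table layout is hypothetical; and the stars and magnitudes are invented purely for illustration.

```python
import sqlite3

# Invented demo rows: (cluster_id, name) and (star_id, name, cluster_id, vmag).
CLUSTERS = [(1, "Pleiades"), (2, "Hyades")]
STARS = [
    (1, "demo star A", 1, 4.0),
    (2, "demo star B", 1, 6.0),
    (3, "demo star C", 2, 5.0),
]


def build_database():
    """Create an in-memory database, define two tables and ingest the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cluster (cluster_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE star ("
        " star_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " cluster_id INTEGER REFERENCES cluster(cluster_id),"
        " vmag REAL)"
    )
    conn.executemany("INSERT INTO cluster VALUES (?, ?)", CLUSTERS)
    conn.executemany("INSERT INTO star VALUES (?, ?, ?, ?)", STARS)
    conn.commit()
    return conn


def cluster_summary(conn):
    """Join stars to clusters; return (cluster, star count, mean vmag) rows."""
    query = (
        "SELECT c.name, COUNT(*), AVG(s.vmag)"
        " FROM star AS s JOIN cluster AS c ON s.cluster_id = c.cluster_id"
        " GROUP BY c.name ORDER BY c.name"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in cluster_summary(build_database()):
        print(row)
```

The SQL statements themselves are standard enough that they should carry over to MySQL with little or no change; only the connection call differs.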
Subject of data mining. Components of data mining. Specific problems of data mining in Astronomy. Use cases for data mining in Astronomy.
ER-diagrams. SADT. UML. Object-oriented programming. Python.
Practical use of ER diagrams and UML data modeling. From ER and UML diagrams to database scheme and tables.
Archives and catalogs in Astronomy. Raw data archives, processed data archives. Classification of archives and catalogs.
Data archives and catalogs. CDS, ADS and other data centers.
Classification of data archives/catalogs. SDSS CasJobs.
Histograms. Kernel smoothing. Bayesian statistics.
Correlation. PCA. Methods of classification. Neural networks.
Practical use of statistics with SQL, R and Python.
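As a taste of the statistics block, three of the listed tools (histogram, kernel smoothing, correlation) are sketched here using only the Python standard library. The function names are hypothetical and the Gaussian kernel is just one possible choice; in the practical classes, R or a numerical Python library would normally do this work.

```python
import math


def histogram(values, n_bins, lo, hi):
    """Count values in n_bins equal-width bins covering [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # A value exactly at the upper edge goes into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def gaussian_kde(values, x, bandwidth):
    """Kernel density estimate at x, using a Gaussian kernel."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
    )


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

The bandwidth plays the same role for the kernel estimate as the bin width does for the histogram: both set how strongly the data are smoothed.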
Data processing pipelines. Main components of astronomical data processing. Data reduction.
Source extraction. Data formats. FITS.
SExtractor. FITS. Astrometry and photometry for SExtractor catalogs.
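To give an idea of what the FITS format looks like on disk: a header is a sequence of 80-character ASCII cards, closed by an END card and padded to a multiple of 2880 bytes. The sketch below builds and parses such a header with the standard library only. Both function names are hypothetical, and the parser is deliberately simplified: it ignores comments and would mishandle a quoted string value that contains a slash. A dedicated FITS library is the better choice for real data.

```python
CARD_LEN = 80      # every header card is 80 ASCII characters
BLOCK_LEN = 2880   # headers are padded to a multiple of 2880 bytes


def make_header(cards):
    """Build a header from (keyword, value string) pairs; illustration only."""
    text = "".join(
        f"{key:<8}= {value:>20}".ljust(CARD_LEN) for key, value in cards
    )
    text += "END".ljust(CARD_LEN)
    padding = -len(text) % BLOCK_LEN
    return (text + " " * padding).encode("ascii")


def parse_header(raw):
    """Return {keyword: value string} for the value cards before END."""
    header = {}
    for start in range(0, len(raw), CARD_LEN):
        card = raw[start:start + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY and blank cards carry no value
        # Simplification: everything after the first "/" is taken as comment.
        header[keyword] = card[10:].split("/", 1)[0].strip()
    return header


if __name__ == "__main__":
    raw = make_header([("SIMPLE", "T"), ("BITPIX", "16"), ("NAXIS", "0")])
    print(len(raw), parse_header(raw))
```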
Concept of Information System. Main components. Special features of IS in science, and in astronomy in particular. Introduction to Astro-WISE.
Astro-WISE: data storage, data processing, Grid structure. Implementation of pipelines. Data processing chain. Integration with external data sources. Data publishing.
Practical use of Astro-WISE.
VO concept and history. XML and VOTable. VO interfaces. UCD and utype.
VO tools.
VO use cases: search for supernovae and brown dwarfs.
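The idea behind VOTable, an XML table whose FIELD elements describe each column with a name, unit and UCD, can be sketched with Python's built-in XML parser. The sample document is a simplified, invented fragment: real VOTables declare an XML namespace and carry more metadata, and the coordinates are made up. The `read_votable` helper is hypothetical; dedicated VO libraries would be used in practice.

```python
import xml.etree.ElementTree as ET

# Simplified, invented VOTable fragment (no namespace, made-up coordinates).
SAMPLE = """
<VOTABLE version="1.2">
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="ra" datatype="double" unit="deg" ucd="pos.eq.ra;meta.main"/>
      <FIELD name="dec" datatype="double" unit="deg" ucd="pos.eq.dec;meta.main"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.5</TD><TD>-5.25</TD></TR>
          <TR><TD>56.75</TD><TD>24.1</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def local_name(tag):
    """Drop a '{namespace}' prefix so namespaced files are handled too."""
    return tag.rsplit("}", 1)[-1]


def read_votable(text):
    """Return (field descriptions, rows of cell strings) from a VOTable."""
    root = ET.fromstring(text.strip())
    fields, rows = [], []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "FIELD":
            fields.append(
                {key: elem.get(key) for key in ("name", "unit", "ucd")}
            )
        elif name == "TR":
            rows.append(
                [td.text for td in elem if local_name(td.tag) == "TD"]
            )
    return fields, rows


if __name__ == "__main__":
    fields, rows = read_votable(SAMPLE)
    print([f["name"] for f in fields], rows)
```

The UCD attached to each FIELD is what lets a VO tool recognise a column as right ascension or declination regardless of the name a particular catalog gives it.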
Grid and cloud computing. EGEE, EGI.
Use of Grid infrastructure in data processing and data storage.
Practical use of Grid computing. Submitting jobs to Grid.