Michiel Brentjens

Speed up your Glish

Your first Glish scripts that deal with actual data are probably slow as molasses. You probably use constructs similar to this:

    a := large_dataset.getcol('DATA');
    for (i in 1:len(a)) {
        a[i] := a[i]*2;
    }
    large_dataset.putcol('DATA', a);

Glish has much faster ways to handle this kind of operation. Just like Fortran 90, it is an array-oriented language that supports full-array operations. So instead of the loop above you could write something like this:

    a := large_dataset.getcol('DATA');
    a *:= 2;
    large_dataset.putcol('DATA', a);

This should give you a significant speedup. Using full-array operations as much as possible may even speed up your code by a factor of 100 to 1000, depending on what you are actually doing. Quite complicated things are possible using full-array operations. Consult the Glish manual for details.
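To give a flavour of what full-array operations can look like beyond a simple multiplication, here is a minimal sketch. The array values and the variable names b and neg are made up for illustration; only ordinary Glish array arithmetic, comparison, mask indexing and the sum and max functions are assumed.

```glish
# Made-up values standing in for a real data column.
a := [1.5, -2.0, 3.25, -4.0, 5.0];

# Element-wise arithmetic on the whole array, no loop needed.
b := 2*a + 1;

# A comparison yields a boolean mask with one entry per element.
neg := a < 0;

# Mask indexing: set every negative element to zero in one statement.
a[neg] := 0;

# Reductions work on the whole array as well.
print sum(a), max(a);
```

Each statement replaces what would otherwise be an explicit for loop over the elements.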

Chances are, however, that on really large datasets you still haven't noticed much improvement. On top of that, your hard disk may sound like a perpetual coffee grinder. These are typical symptoms of running out of RAM: your machine is swapping data back and forth between RAM and the (SLOWWWWW) hard disk. There are two things you can do to prevent this.

  • Increase the value of the system.resources.memory variable in your .glishrc and .aipsrc files to reflect the actual number of megabytes of RAM in your machine;
  • Process large table columns in chunks rather than as a whole, in order to minimize memory consumption.
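As an illustration of the first option, an entry in .aipsrc might look like the line below. The value 512 is a made-up example for a machine with 512 MB of RAM, and the "keyword: value" layout is assumed here to be the usual .aipsrc form. Check the documentation of your installation before relying on it.

```
system.resources.memory: 512
```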

An example of the second approach is shown below. Use this pattern for all operations on large data columns:

    copy_column := function(ms='', colfrom='', colto='')
    {
        t := table(ms, readonly=F);
        chunksize := 100.0e+6;   # Bytes
        # Number of array elements in one cell (one row) of the column.
        elem_per_row := prod(shape(t.getcell(colfrom, 1)));
        # Rows per chunk; never less than 1, otherwise the loop would not advance.
        nrows := max(1, floor(chunksize/elem_per_row/8/2));
        totalrows := t.nrows();
        start := 1;
        while (start <= totalrows) {
            print start, 'of', totalrows;
            # Shrink the last chunk so we do not read past the end of the table.
            if (totalrows - start + 1 < nrows) {
                nrows := totalrows - start + 1;
            }
            dcol := t.getcol(colfrom, start, nrows);
            t.putcol(colto, dcol, start, nrows);
            start := start + nrows;
        }
        t.done();
    }
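To make the chunk-size calculation in copy_column concrete, the sketch below works it through for a hypothetical column whose cells have shape [4, 64]. That shape is an assumption for illustration only. The divisor 8 is read here as bytes per element, and the extra factor 2 as a safety margin (for example for complex values or a temporary copy); the original code does not say which.

```glish
chunksize := 100.0e+6;            # memory budget in bytes, as in copy_column
elem_per_row := prod([4, 64]);    # hypothetical cell shape: 4 x 64 = 256 elements
nrows := floor(chunksize/elem_per_row/8/2);
print nrows;
```

With these numbers, 100e6 / 256 / 8 / 2 = 24414.0625, so each chunk holds 24414 rows. At 8 bytes per element that is roughly 50 MB per chunk, half of the stated budget.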