
Name

table_open, table_row, table_md2, table_close - table manipulation routines

get_atable, get_ftable, get_line, parse, strinsert - legacy table routines

Synopsis

Note: functions marked with a preceding dash in the list below have not been implemented yet!


#include <table.h>
#include <mdarray.h>
#include <extstring.h>

table  *table_open(stream instr, int mode);
table  *table_open1(stream instr, int mode, int nlines);
void    table_close(tableptr tptr);
size_t  table_nrows(tableptr tptr);
size_t  table_ncols(tableptr tptr);
string  table_row(tableptr tptr, int row);

string  table_line(table *t);
ssize_t table_line1(table *t, char **line, size_t *linelen, int newline);

mdarray2 table_md2rc(table *t);
mdarray2 table_md2cr(table *t);
- string *table_comments(table *t);
void table_reset(table *t);
void table_close(table *t);

- void table_set_valid_rows(int nrows, int *rows)
- void table_set_valid_cols(int ncols, int *cols)

- int table_next_row(table *t)
- int table_next_rows(table *t)
- int table_next_rowi(table *t)
- int table_next_rowr(table *t)

- string table_cols(table *t, int col)
- int  table_coli(table *t, int col)
- real table_colr(table *t, int col)

- string *table_colsp(table *t, int col)
- int *table_colip(table *t, int col)
- real *table_colrp(table *t, int col)

string table_row(table *t, int row)
string *table_rowsp(table *t, int row)
- int *table_rowip(table *t, int row)
- real *table_rowrp(table *t, int row)

- void table_set_ncols(int ncols)

table *table_cat(int ntable, table *tptr, int mode)

Legacy: (some of these might be deprecated in future)

int get_atable(stream instr, int ncol, int *colnr, real *coldat, int ndat)
int get_ftable(stream instr, int ncol, int *colpos, string *colfmt, real *coldat, int ndat)
int get_line(stream instr, char *line) // deprecated now
int parse(int linenr, char *line, double *dat, int ndat)
int strinsert(char *a, char *b, int n)
int iscomment(char *line)
void sanitize(char *line)

Using Tables

Tables can be arbitrarily large but, like the top level of NEMO structured files, do need to fit in memory. There can be some exceptions to this, and certain applications will stream data that do not fit in memory.

The particular table dialect should be detected automatically in files, and the number of columns and rows can be arbitrarily high. tabgen(1NEMO) has methods to make really large tables in both "column" and "row" space for testing purposes.

When a table comes from a pipe it cannot seek. With the new table I/O, however, the full table is read into memory, initially using a linked list, and can then be addressed row by row in random order.

An exception could be made where the table can be read in blocks of say 100 lines or so?
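
The linked-list approach described above can be sketched in plain C. This is only an illustration under stated assumptions, not the NEMO implementation: the names node, read_line and slurp_lines are hypothetical, and error checks on malloc and realloc are omitted for brevity.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    char *line;
    struct node *next;
} node;

/* Hypothetical helper: read one line of any length with fgetc.
 * Returns a 0-terminated string without the newline, or NULL at EOF. */
static char *read_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    int c = fgetc(fp);
    if (c == EOF) return NULL;
    char *s = malloc(cap);
    while (c != EOF && c != '\n') {
        if (len + 1 == cap) s = realloc(s, cap *= 2);  /* keep room for '\0' */
        s[len++] = (char)c;
        c = fgetc(fp);
    }
    s[len] = '\0';
    return s;
}

/* Hypothetical helper: read all lines of a (possibly non-seekable) stream
 * into a linked list, then flatten the list into an array of strings so
 * rows can be addressed in any order. The count is stored in *nlines. */
char **slurp_lines(FILE *fp, size_t *nlines)
{
    node *head = NULL, *tail = NULL;
    size_t n = 0;
    char *s;

    while ((s = read_line(fp)) != NULL) {
        node *nd = malloc(sizeof *nd);
        nd->line = s;
        nd->next = NULL;
        if (tail) tail->next = nd; else head = nd;
        tail = nd;
        n++;
    }

    char **rows = malloc((n ? n : 1) * sizeof *rows);
    size_t i = 0;
    while (head) {                 /* move the strings, free the nodes */
        node *next = head->next;
        rows[i++] = head->line;
        free(head);
        head = next;
    }
    *nlines = n;
    return rows;
}
```

The list is needed only while the number of lines is unknown; once the stream is exhausted, the array gives constant-time access to any row.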

Comment lines: there are too many conventions for this, but we need to cover them all. Lines starting with any of the characters

#!;/\|C
have all been used for comments. In addition, comments have been used at the end of a line, e.g.
1 2 3   # this is a comment
but see table(5NEMO) for the official rules.
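
The conventions listed above can be sketched as two small helpers. This is an illustration only: it does not reproduce the table(5NEMO) rules or the library's iscomment, and the names starts_comment and strip_trailing_comment are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the first character of the line is one
 * of the comment markers listed above (# ! ; / \ | C), else 0. */
int starts_comment(const char *line)
{
    return line[0] != '\0' && strchr("#!;/\\|C", line[0]) != NULL;
}

/* Hypothetical helper: cut a trailing "# ..." comment off a data line,
 * in place, together with the blanks in front of it. */
void strip_trailing_comment(char *line)
{
    char *hash = strchr(line, '#');
    if (hash == NULL) return;
    while (hash > line && (hash[-1] == ' ' || hash[-1] == '\t'))
        hash--;
    *hash = '\0';
}
```

Applied to the line shown above, strip_trailing_comment leaves just the three data columns.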

Description

This new table interface was introduced in Summer 2020 to create a more uniform and scalable ASCII table system. It will have no limit on the number of columns, the line length or the number of rows, perhaps at a slight cost in performance. It should also detect automatically if tables have Space Separated Values (SSV), Comma Separated Values (CSV) or Tab Separated Values (TSV).
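
How such a detection could work is sketched below. The rule used here (a comma anywhere wins, then a tab, otherwise spaces) is an assumption made for illustration, and guess_delimiter is a hypothetical name; the library may decide differently.

```c
/* Hypothetical helper: guess the column separator from one data line.
 * Returns ',' for CSV, '\t' for TSV and ' ' for SSV. */
char guess_delimiter(const char *line)
{
    int ntab = 0;
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == ',') return ',';     /* any comma: treat as CSV */
        if (*p == '\t') ntab++;
    }
    return ntab > 0 ? '\t' : ' ';      /* tabs but no commas: TSV */
}
```

A more careful detector would look at several data lines and check that the separator count is the same on each of them.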

table_open opens a file for reading. The returned table* pointer is used in all subsequent table_ routines. The mode controls how many lines from the table are allowed in internal buffers. A value of 0 means the whole table will be read into memory; a value of 1 will read the table line by line, controlled by the user (see table_line below). Performance will be better (?) if tables are read line by line (mode=1), or at least do not occupy memory for the whole table. Values larger than 1 are planned to hold small buffers of rows. Normally a table will be split into a "header" (comment lines) and "data" (rows of data), but with the special mode=-1 all lines are treated equally and can be obtained via table_row().

table_open1 is kept for compatibility with older software where the maximum number of lines nlines is given. It normally is only needed when the input file is a pipe and the whole file needs to be read, which is now supported.

table_line will read the next line from the table stream. If the file had been opened in mode=0 all lines have been read, and table_line would return NULL. Note that the returned string is 0-terminated, not newline terminated as getline(3) would do.

table_line1 will read the next line from the table stream using getline(3), but depending on the setting of newline, the string can still contain a newline character. To best mimic the behavior of getline(3), newline=1 needs to be set.

table_md2cr and table_md2rc are shortcut functions to convert an ASCII table immediately into two-dimensional mdarray(3NEMO) data, in [col][row] or [row][col] notation, respectively.
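
The difference between the two index orders can be shown with flat arrays. The sketch below does not use mdarray(3NEMO); transpose_rc_to_cr is a hypothetical helper that copies data stored as [row][col] into [col][row] order.

```c
/* Hypothetical helper: copy an nrows x ncols block stored row by row
 * (rc[row][col]) into a block stored column by column (cr[col][row]).
 * Both blocks are passed as flat arrays of nrows*ncols doubles. */
void transpose_rc_to_cr(int nrows, int ncols, const double *rc, double *cr)
{
    for (int i = 0; i < nrows; i++)
        for (int j = 0; j < ncols; j++)
            cr[j * nrows + i] = rc[i * ncols + j];
}
```

The [col][row] order keeps each column contiguous in memory, which is convenient when a whole column is passed on to a routine that expects a one-dimensional array.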

Any comment lines at the start of the file will be saved in a special set of comment lines, which can be extracted with table_comments. Finally, with table_close access to the table can be closed and any associated memory will be freed. In addition, table_reset can be used to reset array access (more on that later), in case the table needs to be re-read. For tables that are processed in streaming mode (e.g. filename="-") this will result in an error.

Once a table has been fully read into memory, table_nrows returns the number of (data, i.e. non-comment) rows (assuming non-streaming), and table_ncols the number of columns. By using table_set_valid_rows and/or table_set_valid_cols rows and/or columns can be selected for conversion, and this will also define the new value for nrows and ncols. When table_reset is called, these values are reset to their original value.

If the table is parsed line by line, some routines will not be accessible, since the table is not in memory.

Using table_next_row a new line can be read. This will return -1 upon end of file, 0 when the line is blank or contains no data, though it could contain comments (e.g. lines with # ! or ;), and 1 when a line was read. No parsing will be done. If parsing is done, the line will be tokenized in identical types (string, int or real), with table_next_rows, table_next_rowi, or table_next_rowr, respectively. The last line is always stored internally, and a pointer to the string can be retrieved with table_line for more refined user parsing.
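
Since table_next_row is not implemented yet, the return convention can only be sketched. The stand-in below works on a plain FILE pointer; next_row_sketch is a hypothetical name, and the fixed-size buffer is a simplification that the real interface is meant to avoid.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the -1/0/1 convention described above:
 * -1 at end of file, 0 for a blank or comment-only line, 1 for data.
 * The line, without its line ending, is left in buf. Assumes lines
 * shorter than buflen. */
int next_row_sketch(FILE *fp, char *buf, size_t buflen)
{
    if (fgets(buf, (int)buflen, fp) == NULL) return -1;
    buf[strcspn(buf, "\r\n")] = '\0';              /* drop CR and/or LF */
    const char *p = buf + strspn(buf, " \t");      /* skip leading blanks */
    if (*p == '\0' || strchr("#!;", *p) != NULL) return 0;
    return 1;
}
```

A caller would loop until -1 is returned and ignore the rows for which 0 is returned.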

Depending on which of the three types the line was parsed with, column elements can be retrieved with table_cols, table_coli, or table_colr, and if the whole table is available in memory, columns can also be retrieved in full via table_colsp, table_colip, or table_colrp.

The currently parsed row can be retrieved in full with (again, depending on type) table_rowsp, table_rowip, or table_rowrp, where the row number is ignored if the table is parsed row by row.

A possible future routine is table_set_ncols, to cover the case where a row can span multiple lines. By default each line is a row in the table.

Given a number of tables, the table_cat function will catenate them. mode=0 will catenate them vertically, i.e. append the rows, keeping the same number of columns, whereas mode=1 will catenate them horizontally, keeping the number of rows, but increasing the number of columns. These are similar to the unix programs cat(1) and paste(1) resp. It is currently considered an error if the tables are not conformant in size.
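
The two modes can be illustrated on purely numeric data. The sketch below is not table_cat: cat2 is a hypothetical helper that joins two flat row-major blocks of doubles and returns NULL where the text above speaks of an error.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical numeric analogue of the two modes. a is nra x nca and
 * b is nrb x ncb, both flat and row-major; the caller frees the result.
 * mode 0: stack b below a (needs nca == ncb), like cat(1).
 * mode 1: put b to the right of a (needs nra == nrb), like paste(1).
 * Returns NULL if the shapes are not conformant. */
double *cat2(int mode,
             const double *a, int nra, int nca,
             const double *b, int nrb, int ncb,
             int *nr, int *nc)
{
    double *out;

    if (mode == 0) {
        if (nca != ncb) return NULL;
        *nr = nra + nrb;
        *nc = nca;
        out = malloc((size_t)(*nr) * (*nc) * sizeof *out);
        memcpy(out, a, (size_t)nra * nca * sizeof *out);
        memcpy(out + (size_t)nra * nca, b, (size_t)nrb * ncb * sizeof *out);
        return out;
    }
    if (nra != nrb) return NULL;
    *nr = nra;
    *nc = nca + ncb;
    out = malloc((size_t)(*nr) * (*nc) * sizeof *out);
    for (int i = 0; i < nra; i++) {     /* interleave the rows of a and b */
        memcpy(out + (size_t)i * (*nc), a + (size_t)i * nca,
               (size_t)nca * sizeof *out);
        memcpy(out + (size_t)i * (*nc) + nca, b + (size_t)i * ncb,
               (size_t)ncb * sizeof *out);
    }
    return out;
}
```

Vertical catenation is a single copy per table, while horizontal catenation has to interleave the rows, which is one reason the conformance check matters.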

The original legacy table routines remain available, though they should implement the new API, as it better deals with tables of unknown size in a pipe.

Both get_atable and get_ftable parse an ASCII table, pointed to by the instr stream, into ncol columns and up to ndat rows of real numbers in memory. The input table may contain comment lines, as well as columns which are not numbers. Badly parsed lines are simply skipped. Other common parameters to both routines are coldat, ncol and ndat: coldat is an array of ncol pointers to previously allocated data, each of them ndat real elements. The number of valid rows read is then returned. If this number is negative, it means more data is available, but could not be read because ndat was exhausted. Upon the next call ndat must be set to negative, to recover the last line read on the previous call, and continue reading the table without missing a line. CAVEAT: this only works if instr has not changed.

get_atable parses the table in free format. colnr is an array of length ncol of the column numbers to read (1 being the first column). If any of the colnr is 0, it is interpreted as referring to the line number in the original input file (including/excluding comment and empty lines), 1 being the first line, and the corresponding entry in coldat is set as such. Columns are separated by whitespace or commas.
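
The column selection described above can be sketched for a single line. This is not the get_atable source: pick_columns is a hypothetical helper, the limits of 64 columns and 1023 characters are arbitrary, and because strtok merges adjacent separators, empty fields are not preserved.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mimicking the column selection described above.
 * Splits a copy of line on blanks, tabs and commas, and stores the
 * requested columns in out[]. colnr[k] is 1-based; 0 stores linenr.
 * Returns 1 on success, 0 if a requested column is missing or is not
 * a number (such a line would be skipped). */
int pick_columns(const char *line, int linenr,
                 int ncol, const int *colnr, double *out)
{
    char copy[1024], *tok[64];
    int ntok = 0;

    strncpy(copy, line, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';
    for (char *t = strtok(copy, " \t,\n"); t != NULL && ntok < 64;
         t = strtok(NULL, " \t,\n"))
        tok[ntok++] = t;

    for (int k = 0; k < ncol; k++) {
        if (colnr[k] == 0) { out[k] = linenr; continue; }
        if (colnr[k] < 0 || colnr[k] > ntok) return 0;
        char *end;
        out[k] = strtod(tok[colnr[k] - 1], &end);
        if (end == tok[colnr[k] - 1] || *end != '\0') return 0;
    }
    return 1;
}
```

With colnr = {3, 0, 1} and line number 7, the line "1.5, 2 3e2" yields the values 300, 7 and 1.5.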

get_ftable parses the table in fixed format. colpos is an array with positions in the rows to start reading (1 being the first position), colfmt an array of pointers to the format string used to parse a real number (note real normally requires %lf). If any of the colpos is 0, it is interpreted as referring to the line number in the original input file (including comment lines), 1 being the first line, and the corresponding entry in coldat is set as such.
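
Reading one value at a fixed position amounts to an sscanf call on an offset into the line. The sketch below is not the get_ftable source, and read_fixed is a hypothetical helper; it only shows why the format needs %lf when the target is a double.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one real number starting at the 1-based
 * character position pos of line, using a scanf format such as "%lf"
 * or "%5lf". Returns 1 if a value was parsed, else 0. */
int read_fixed(const char *line, int pos, const char *fmt, double *val)
{
    if (pos < 1 || (size_t)pos > strlen(line)) return 0;
    return sscanf(line + pos - 1, fmt, val) == 1;
}
```

A format with a field width, such as "%4lf", limits the number of characters read, which is what makes it possible to split columns that are not separated by blanks.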

get_line gets the next line from a stream instr and stores it in line. It returns the length of the string read, or 0 at end of file. This routine is deprecated; the standard getline(3) should be used instead.

parse parses the character string in line into the double array dat, which has at most ndat entries. Parsing means that %n refers to column n in the character string (n must be larger than 0). Also %0 may be referenced, meaning the current line number, to be entered in the argument linenr.

strinsert inserts the string b into a, replacing n characters of a.
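
The description leaves open which n characters are replaced. The sketch below assumes they are the first n characters of a; that assumption, the name replace_prefix and the return value are illustrative and not taken from the library source.

```c
#include <string.h>

/* Hypothetical re-implementation, assuming the n characters replaced are
 * the first n of a. The buffer behind a must have room for the result.
 * Returns the change in length of a (0 if n is out of range). */
int replace_prefix(char *a, const char *b, int n)
{
    size_t alen = strlen(a), blen = strlen(b);
    if (n < 0 || (size_t)n > alen) return 0;
    memmove(a + blen, a + n, alen - (size_t)n + 1);  /* tail plus '\0' */
    memcpy(a, b, blen);
    return (int)blen - n;
}
```

For example, replacing the first 2 characters of "%1+%2" with "3.14" gives "3.14+%2".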

iscomment returns 1 if the line appears to be a comment (starts with ’;’, ’#’, ’!’ or a blank/newline).

sanitize takes a line with a DOS (CR/LF), Mac (CR) or Unix (LF) line ending and 0-terminates it.
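
One way to get this effect is to cut the string at the first CR or LF. The helper below is a minimal sketch of that idea with a hypothetical name, chop_eol; it is not the sanitize source.

```c
#include <string.h>

/* Hypothetical sketch: cut the line at the first CR or LF, so that DOS,
 * Mac and Unix line endings all yield the same 0-terminated string. */
void chop_eol(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

A line without any line ending is left unchanged, because strcspn then returns the index of the existing terminator.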

Examples

Some examples drafted, based on the new API presented.

An example reading in a full table into a two dimensional mdarray2, and adding 1 to each element:

    table *t = table_open(filename, 0);
    mdarray2 d2 = table_md2rc(t,0,0,0,0);
    int ncols = table_ncols(t);
    int nrows = table_nrows(t);
    table_close(t);                  // close only after the last use of t
    
    for (int i=0; i<nrows; i++) 
      for (int j=0; j<ncols; j++)
        d2[i][j] += 1.0;             // d2[row][col]

Here is an example of reading the table line by line, without any parsing, but skipping comment lines. This can be done in line-by-line streaming mode, not allocating space for the whole table, for which mode=1 is needed:

    table *t = table_open(filename, 1);
    int nrows = 0;
    string s;
    
    while ( (s=table_line(t)) ) {
        if (iscomment(s)) continue;  // skip comment lines
        nrows++;
        printf("%s\n", s);
    }
    
    table_close(t);
    dprintf(0,"Read %d lines\n",nrows);

Dealing with (and preserving) comments while reading in the whole table:

    table *t = table_open1(filename, 0, 0);
    int nrows = table_nrows(t);
    int ncols = table_ncols(t);  // this triggers a column counter
    
    string *sp = table_comments(t);   // not implemented yet
    while (*sp)
      printf("%s\n", *sp++);
    for (int j=0; j<nrows; j++) {
      real *rp = table_rowrp(t, j);
      for (int i=0; i<ncols; i++)
        printf("%g ", rp[i]);
      printf("\n");
    }
    table_close(t);

Xsv

An interesting package for dealing with tables is the XSV program, for example:


cat AAPL.csv | xsv table | head -2
cat AAPL.csv | xsv slice -i 1 | xsv table
cat AAPL.csv | xsv slice -i 1 | xsv flatten
cat AAPL.csv | xsv count

Performance

An anecdotal comparison of the table I/O routines with Python can be found in $NEMO/scripts/csh/tabstat.py, which seems to indicate that the C code is about 4 times faster than numpy.

Diagnostics

Low-level catastrophes (e.g. bad filenames, parsing errors, wrong delimiters) generate messages via error(3NEMO).

See Also

mdarray(3NEMO), nemoinp(3NEMO), burststring(3NEMO), fits(5NEMO), table(5NEMO), ascii(7)


https://github.com/BurntSushi/xsv
https://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c_user/cfitsio.html
https://www.gnu.org/software/gnuastro/manual/html_node/Tables.html

Files


src/kernel/tab      table.c 

Author

Peter Teuben

Update History


xx-sep-88    V1.0 written    PJT
6-aug-92    documented get_Xtable functions      PJT
1-sep-95    added iscomment()    PJT
12-jul-03    fixed reading large table buffering    PJT
aug-2020    designing new table system    Sathvik/PJT
5-may-2022    finalizing implementation of table2    PJT/Parker/Yuzhu
31-dec-2022    add sanitize() to 0-terminate any style text    PJT

