The VisAD TextAdapter
January, 2001
updated: December, 2004

Contents

  Introduction
  I.   Text File Formats
  II.  Line 1
       IIa. 2-D arrays
       IIb. 1-, 2-, or 3-D points
  III. Line 2
       IIIa. for 2-D arrays
       IIIb. for 1-, 2-, or 3-D points
  IV.  Examples

Introduction

The VisAD TextAdapter is designed to let you quickly read in data that are stored as an ASCII text file. We fully expect this class to continue to grow to accommodate other common variations of text file formats.

Two example files (example1.txt and example2.csv) are also included in the release. They are most conveniently tested with the VisAD SpreadSheet or the Jython (Python) interface. Start visad.python.JPythonFrame, or start Jython from the command line, and use a sequence like:

    >>> from visad.python.JPythonMethods import *
    >>> a = load("example1.txt")
    >>> plot(a)
    >>> clearplot()
    >>> b = load("example2.csv")
    >>> plot(b)
    >>> clearplot()

Or simply load these files into the SpreadSheet, and then experiment with the mappings!

I. Text file formats

The text files usually consist of 2 header lines followed by the data. Optional comment lines may be interspersed throughout. The data portion of the file may contain blank-, comma-, semicolon-, or tab-separated values. At present only numeric data can be read. The model for these files is spreadsheet (Excel, etc.) output: column-oriented values.

Comment lines are lines that start with #, !, or %.

The file extensions recognized by the VisAD DefaultFamily and the TextAdapter are:

    .bsv -> blank-separated values
    .tsv -> tab-separated values
    .csv -> comma-separated values

In addition, the VisAD DefaultFamily will recognize the extension .txt and invoke the TextAdapter. In this case, however, the TextAdapter attempts to sense the delimiter using the hierarchy:

    tab, semicolon, comma, blank

That is, if a tab character appears in the line, tab will be used. If not, the TextAdapter looks for a semicolon. Otherwise, if a comma appears in the line, comma will be used.
If neither a tab, a semicolon, nor a comma appears, blank will be used.

We have tried to minimize the modifications you might have to make to existing files. The general layout is:

    Line 1:    functional description of the data in "VisAD" lingo (aka the "MathType")
    Line 2:    column headers, which name each parameter and possibly give it a physical unit, using the delimiter defined above (tab, semicolon, comma, or blank)
    Line 3-n:  the data values, with delimiters as defined above (for filenames without recognized extensions, the delimiter used in the data section does not have to be the same as the one used on Line 2)

Please refer to the VisAD Library Developers Guide, section "3.1 MathTypes", for information on how to define the functional description. Also, take a look at the examples at the end of this file.

Also note that if you use the TextAdapter constructor directly, the "Line 1" and "Line 2" values do not have to be in the file; alternate signatures allow these values to be passed as arguments. However, if your text file is read through the VisAD DefaultFamily (as in the SpreadSheet), you must provide this information in the text file itself.

II. Line 1 (ignoring any preceding comment lines)

This line specifies a functional description of the data, using the VisAD "MathType" string. There are two categories of data that may be represented in these text files:

    1) 2-D arrays of a single parameter, or
    2) 1-, 2-, or 3-D (domain) points of one or more parameters.

IIa. 2-D arrays

In this case, the "VisAD" functional description looks like:

    (x,y)->(temperature)
    (Longitude, Latitude)->(speed)

The data portion of the file then contains one value per "x" sample on each line, and one line per "y" sample. See Examples #2 and #6, below. Only the "y" domain component may have its values defined in the file.
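As an illustration of the layout described so far (comment lines, the tab/semicolon/comma/blank hierarchy for .txt files, and the x-by-y shape of a 2-D array), here is a minimal stand-alone Python sketch. It is not the TextAdapter's code, which is Java. The helper names (sense_delimiter, split_fields, split_text_file, is_sampling_spec, grid_shape) are hypothetical; the delimiter is assumed to be sensed line by line; and grid_shape assumes the single range parameter is the last entry on Line 2, with every column entry before it (skip or y) excluded from the array.

```python
# Illustrative sketch of the TextAdapter file layout; not VisAD's Java code.

COMMENT_PREFIXES = ("#", "!", "%")


def sense_delimiter(line):
    """Pick the .txt delimiter: tab, then semicolon, then comma.

    Returns None to mean "fall back to blanks" (runs of spaces).
    """
    for delim in ("\t", ";", ","):
        if delim in line:
            return delim
    return None


def split_fields(line):
    """Split one line with its sensed delimiter and trim each field."""
    delim = sense_delimiter(line)
    if delim is None:
        return line.split()
    return [field.strip() for field in line.split(delim)]


def split_text_file(text):
    """Return (line1, line2_fields, numeric_rows) for a .txt-style file."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Comment lines start with #, ! or %; blank lines are also dropped here
    # (an assumption of this sketch).
    lines = [ln for ln in lines if ln and not ln.startswith(COMMENT_PREFIXES)]
    line1, line2 = lines[0], lines[1]
    # The delimiter is sensed per line, so data lines may differ from Line 2.
    rows = [[float(v) for v in split_fields(ln)] for ln in lines[2:]]
    return line1, split_fields(line2), rows


def is_sampling_spec(field):
    """True for Line 2 entries like Longitude(-130:-40), which are not columns."""
    name = field.split("[", 1)[0]
    return "(" in name and ":" in name


def grid_shape(line2_fields, rows):
    """(x_count, y_count) for a 2-D array whose range name is the last entry."""
    columns = [f for f in line2_fields if not is_sampling_spec(f)]
    leading = len(columns) - 1  # skip (or y) columns before the range name
    return len(rows[0]) - leading, len(rows)


# Example #2 from Section IV.
example2 = """(x,y)->(value)
skip,skip,skip,skip,value
0 , 0 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99
1 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 , 98
"""
line1, fields, rows = split_text_file(example2)
print(line1, fields, grid_shape(fields, rows))  # 7 x-values per line, 2 lines
```

Applied to Example #2, the four skipped columns leave 7 of the 11 values on each line, so the array is 7 (x) by 2 (y).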
2-D arrays are implied when:

    * there are 2 domain components,
    * there is only one range component, and
    * there is more than one domain sampling value for the first domain component (that is, more than one data value on a line in the text file).

IIb. 1-, 2-, or 3-D points

Just about every other form of text file falls into this category. Examples of VisAD functional descriptions:

    (x)->(temperature, dewpoint, speed)
    (x,y,z)->(temperature, speed)

At least one of the domain variables (x,y,z) _must_ be defined by data in the file. See Examples #1, #3, #4, and #5, below.

You may also use Text types for these data. For example:

    (Latitude,Longitude)->(City(Text))

Strictly speaking, you may have a domain with more than 3 components. If you do this, however, the TextAdapter will not be able to optimize the construction of the sampling set, and will use either a LinearNDSet (if you supply simple ranges for all domain components) or an IrregularSet (if one or more of the domain components has values specified in the file).

III. Line 2 (ignoring any comment lines that might come before)

The second line of the text file defines which parameter each column of the data portion contains. (Note that, as with "Line 1", an alternate form of the constructor is available so this information can be passed as an argument rather than being read from the file.)

If you need to specify other information for a parameter, use a blank-separated sequence of phrases of the form "key=value", enclosed in square brackets after the parameter name (see the short examples below). The possible keys are:

    key       value
    ------    ----------------------------------------------------------
    unit      name of Unit (default = no unit)
    miss      value to be treated as missing (default = no missing values)
    scale     value that each datum is multiplied by (default = 1.0)
    offset    value that is added to each scaled datum (default = 0.0)
    error     value of the estimated error for this parameter (default = none)
              ** In this release, only range error estimates are implemented.
    interval  either 'true' or 'false', to indicate whether this parameter is an _interval_, like a difference (default = false)
              ** The following is _not_ implemented in this release:
    pos       column-oriented location of the data values, in the form first:last (if present for one item, this MUST be supplied for all items!)
    fmt       format pattern, using SimpleDateFormat-type patterns (without spaces, unless the whole pattern is enclosed in double quotes as described below). Commas should also be avoided in the format pattern, especially if comma is used as the header column delimiter.
    value     a fixed value that is used for this field

When a value is defined, there should not be a corresponding value for this field in the data lines. For example, if you had 5 parameters defined in Line 2 (e.g., p1,p2,p3,p4,p5) and one of them (e.g., p3) had a value attribute, then the data lines would contain only the values for the other parameters:

    p1,p2,p4,p5
    p1,p2,p4,p5
    ...

When using the value attribute, you can also include, in any subsequent line, phrases of the form:

    name=value

where name is one of the parameter names. This allows you to reset the fixed value that is used. For example, this facility could be used if you had a set of observations from a single station:

    (index) -> (Longitude,Latitude,Time,T)
    Longitude[unit="degrees west" value="110"],Latitude[unit="deg" value="40"],Time[fmt="yyyy-MM-dd HH:mm:ss z"],T[unit="celsius"]
    2007-02-20 11:00:00 MST,13.3
    2007-02-20 11:00:00 MST,-2.0
    Latitude=30 Longitude=100
    2007-02-20 11:00:00 MST,5.0
    ...

Note: You don't have to have the value attribute in Line 2. You can just do:

    (index) -> (Longitude,Latitude,Time,T)
    Longitude[unit="degrees west"],Latitude[unit="deg"],Time[fmt="yyyy-MM-dd HH:mm:ss z"],T[unit="celsius"]
    Longitude=110 Latitude=40
    2007-02-20 11:00:00 MST,13.3
    2007-02-20 11:00:00 MST,-2.0
    Latitude=30 Longitude=100
    2007-02-20 11:00:00 MST,5.0
    ...
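The attribute keys and the value/reset rules above can be illustrated with a small plain-Python sketch. It is a hedged illustration, not the TextAdapter's Java implementation: parse_header_entry, to_number and read_records are hypothetical names. The sketch assumes a comma-delimited Line 2, compares the raw datum with miss before applying scale and offset, keeps record fields as strings, and does not parse times (fmt is only carried along as an attribute).

```python
# Sketch of Line 2 attributes and fixed "value" fields; not VisAD's Java code.
import math
import re

# key=value pairs; a value may be wrapped in double quotes to embed blanks.
ATTR_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_header_entry(entry):
    """Split 'Name[key=value ...]' into ('Name', {key: value})."""
    name, _, rest = entry.strip().partition("[")
    attrs = {k: v.strip('"') for k, v in ATTR_RE.findall(rest.rstrip("]"))}
    return name.strip(), attrs


def to_number(raw, attrs):
    """Apply miss, then scale and offset, to one datum (missing -> NaN)."""
    if "miss" in attrs and float(raw) == float(attrs["miss"]):
        return math.nan
    scale = float(attrs.get("scale", 1.0))
    offset = float(attrs.get("offset", 0.0))
    return float(raw) * scale + offset


def read_records(line2, data_lines):
    """Yield one dict per data line, filling in the fixed ("value") fields."""
    entries = [parse_header_entry(e) for e in line2.split(",")]
    names = [name for name, _ in entries]
    fixed = {name: attrs["value"] for name, attrs in entries if "value" in attrs}
    for line in data_lines:
        tokens = line.split()
        # A line made only of name=value phrases resets the fixed values.
        if tokens and all("=" in t and t.split("=", 1)[0] in names for t in tokens):
            for t in tokens:
                key, val = t.split("=", 1)
                fixed[key] = val
            continue
        # The remaining (non-fixed) parameters take the data values in order.
        free = [n for n in names if n not in fixed]
        record = dict(zip(free, (v.strip() for v in line.split(","))))
        record.update(fixed)
        yield record


line2 = ('Longitude[unit="degrees west" value="110"],Latitude[unit="deg" value="40"],'
         'Time[fmt="yyyy-MM-dd HH:mm:ss z"],T[unit="celsius"]')
data = ["2007-02-20 11:00:00 MST,13.3",
        "2007-02-20 11:00:00 MST,-2.0",
        "Latitude=30 Longitude=100",
        "2007-02-20 11:00:00 MST,5.0"]
for rec in read_records(line2, data):
    print(rec)
```

The same read_records call also covers the variant without value attributes on Line 2, because a leading "Longitude=110 Latitude=40" line fills the fixed values before the first data line is read.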
Three short examples:

    a, b, c, temperature[unit=degC err=.1 miss=999.9], speed[m/s]
    Longitude[scale=-1], Latitude, temp[unit=degK miss=999.9], dewPoint[unit=degK miss=999.9]
    Time[fmt=yyyy-MM-dd'T'HH:mm:ss'Z'], Longitude, Latitude, Pressure

(Note that in the second example, "scale=-1" for Longitude inverts the sign of the values read from the file.)

(You might also note that "C" is not the VisAD unit name for degrees Celsius: "C" means Coulomb.)

For some values you might need to embed a space. You may do this by enclosing the entire value in double quote marks, as in: unit="international inch". If you do this, however, be careful about ambiguities when using the blank-separated-values format.

As with Line 1, there are two cases to consider when defining the contents of this line:

IIIa. 2-D arrays

In this case, _only_ one range parameter is permitted. You may have 0 or 1 domain parameter names, as well as any "column skip" dummy names (these are permitted only _before_ the actual range parameter name, though). In the simplest case:

    temperature[unit=degF]

says that the data contain only values of temperature in degrees F. The domain parameters defined in Line 1 will be computed from the number of items per line and the number of lines of text.

If you need to skip some columns before the values of the variable start, just put in a "skip" name for each column. For example:

    skip, skip, skip, temperature[unit=degF]

indicates that the values in the first three columns should be skipped, and the rest of the values on each line will make up the "columns" of the 2-D array. The name "skip" can be _any_ unused name.

IIIb. 1-, 2-, or 3-D points

In this case, you define which column corresponds to each parameter you named on Line 1. The order doesn't matter, only that the correct column is identified. If there are columns of data that are to be ignored, just use a "skip" name that was not defined on Line 1.
For example:

    x, y, skip, temperature[unit=degC], skip, pressure[unit=hPa]

In this case, the name "skip" is a filler that indicates which column(s) should be skipped.

If you are dealing with Text type data, you must include the (Text) phrase here as well:

    Longitude, Latitude, skip, City(Text)

(Note that with this "Text" form, each data item MUST be enclosed in double-quote marks; see Example #7, below.)

In both cases, you may also use this line of text to define the values of the domain component samplings, in the form:

    name(first:last)

This means that:

    a) "name" is a domain component;
    b) the (sampling) values of "name" are NOT read from the file, but are computed from the range "first to last" and the number of lines of text (number of samples) or, in the case of 2-D arrays, possibly the number of range values on each line (see below); and
    c) this name is ignored when counting or locating the columns of other parameters.

If the name of a domain component variable does NOT appear on Line 2, its values are assumed to be 0:(N-1), where "N" is the number of samples (number of lines) in the file. There is one exception: in the case of 2-D arrays, the first domain component is assumed to apply to the number of range values on each line of text, _not_ the number of samples or lines of text.

If you have only one value per input line of text, and you have a 2-D array, you may optionally add a third specification:

    name(first:last:number)

where 'number' is the number of values for this domain component. See Example #9, below.

Finally, if you need to combine the sampling range with other information about the parameter, it would look like:

    x(1.0:13.7)[unit=cm]

IV. Examples

Here are a few examples taken from the beginning of some files:

Example #1 - Simple CSV file, for a function value=f(x) <==

    (x)->(value)
    value
    0
    7.2
    -9.1

Example #2 - CSV file of a 2-D array <==

    (x,y)->(value)
    skip,skip,skip,skip,value
    0 , 0 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99
    1 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 , 98

Example #3 - a ".txt" file of two range components (note that the delimiters used on "Line 2" are different from the ones used in the data) <==

    (x)->(value1, value2)
    skip value1 skip skip skip skip value2
    100 , 0 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99
    101 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 , 98

Example #4 - CSV file of two range components located at 2-D coordinates <==

    (x,y)->(value_a, value_b)
    y,x,p,value_a[unit="degC"],p,p,p,value_b[unit=degF]
    0 , 0 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99
    1 , 17 , 34 , 50 , 64 , 76 , 86 , 93 , 98 , 99 , 98

Example #5 - BSV file of real data <==

    % Retrieval statistics for mlw_K+ir3_2a.ret :
    % Zbottom threshold = 0.0 km liqclouds=1
    % IWP IWP errors (dB) Dme errors (dB)
    % (g/m^2) mean rms median mean rms median
    (IPW)->(IWP_Error, Dme_error, IWP_Error_mean, Dme_error_mean)
    IPW[g/m^2] IWP_Error_mean p IWP_Error Dme_error_mean p Dme_error
    1.41 7.092 7.890 6.831 0.746 1.768 1.139 717

Example #6 - BSV file of a 2-D grid, with the locations given as Lat/Lon values <==

    (Longitude,Latitude)->(value)
    Longitude(-130:-40) Latitude(20:60) p value[unit=degC]
    0 0 17 34 50 64 76 86 93 98 99
    1 17 34 50 64 76 86 93 98 99 98

Example #7 - TXT file of point (in situ) data with non-numeric range values <==

    (Longitude, Latitude) -> (City(Text))
    Latitude, Longitude, City(Text)
    -12.3, 130.8, "Darwin"
    -16.9, 146.5, "Cairns"
    -23.6, 133.9, "Alice Springs"
    -33.9, 151.2, "Sydney"
    -37.6, 144.9, "Melbourne"

Example #8 - CSV file of time series point data with time format specified <==

    (Time->(Latitude,Longitude,Pressure))
    Time[fmt=yyyy-MM-dd'T'HH:mm:ss'Z'],Latitude,Longitude,Pressure
    2003-05-02T07:00:00Z,-10.0,150.0,998.0
    2003-05-02T10:00:00Z,-10.5,150.5,997.0
    2003-05-02T13:00:00Z,-11.0,151.0,996.0

Example #9 - Text file of a 2-D grid, with one value per line <==

    (Longitude,Latitude)->Altitude
    Longitude(-107.495:-103.605:109),Latitude(38.5572:41.487:109), Altitude[unit=m]
    3060.
    3043.
    2949.
    2865.
    ....
    ....
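The name(first:last) and name(first:last:number) rule from Section III can be sketched in plain Python, using the numbers from Examples #6 and #9. This is an illustration under stated assumptions, not the TextAdapter's Java code: the names parse_sampling, linear_samples and default_samples are hypothetical, and the samples are assumed to be evenly spaced with both first and last included (consistent with the 0:(N-1) default for N samples).

```python
# Sketch of name(first:last[:number]) sampling; not VisAD's Java code.
import re

# name(first:last) or name(first:last:number), optionally followed by [attributes]
SPEC_RE = re.compile(r"^\s*(\w+)\(\s*([^:()]+):([^:()]+)(?::\s*(\d+))?\s*\)")


def parse_sampling(field):
    """Return (name, first, last, number or None), or None if field is a column."""
    m = SPEC_RE.match(field)
    if m is None:
        return None
    name, first, last, number = m.groups()
    return name, float(first), float(last), int(number) if number else None


def linear_samples(first, last, number):
    """number evenly spaced values from first to last, both ends included."""
    if number == 1:
        return [float(first)]
    step = (last - first) / (number - 1)
    return [first + i * step for i in range(number)]


def default_samples(n):
    """Values used when a domain name is absent from Line 2: 0:(N-1)."""
    return linear_samples(0.0, n - 1.0, n)


# Example #6: 10 values per line (after the skipped "p" column), 2 lines.
lons6 = linear_samples(-130.0, -40.0, 10)
lats6 = linear_samples(20.0, 60.0, 2)

# Example #9: the number of samples is given explicitly.
name, first, last, n = parse_sampling("Longitude(-107.495:-103.605:109)")
lons9 = linear_samples(first, last, n)
print(name, n, lons9[0], lons9[-1])
```

In Example #6 the first domain component (Longitude) takes its count from the 10 range values per line, giving steps of 10 degrees, while Latitude takes its count from the 2 lines. In Example #9 each line holds a single value, so the counts must be given explicitly; the sketch generates only the two coordinate axes, not the assignment of values to grid points.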