fsread / xread#
Read numbers and strings from a file into 2D float and string arrays
This module was written by Matthias Cuntz while at Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany, and continued while at Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nancy, France.
- copyright:
Copyright 2009-2022 Matthias Cuntz, see AUTHORS.rst for details.
- license:
MIT License, see LICENSE for details.
The following functions are provided
|
Read numbers and strings from a file into 2D float and string arrays |
|
Read floats from a file into 2D float array |
|
Read strings from a file into 2D string array |
|
Read numbers and strings from Excel file into 2D float and string arrays |
|
Wrapper for |
|
Wrapper for |
- History
Written fread and sread Jul 2009 by Matthias Cuntz (mc (at) macu (dot) de)
Keyword transpose, Feb 2012, Matthias Cuntz
Ported to Python 3, Feb 2013, Matthias Cuntz
Removed bug when nc is list and contains 0, Nov 2014, Matthias Cuntz
Keyword hskip, Nov 2014, Matthias Cuntz
Do not use function lif, Feb 2015, Matthias Cuntz
nc can be tuple, Feb 2015, Matthias Cuntz
Large rewrite of code to improve speed: keep everything list until the very end, Feb 2015, Matthias Cuntz
Written fsread Feb 2015 by Matthias Cuntz (mc (at) macu (dot) de)
nc<=-1 removed in case of nc is list, Nov 2016, Matthias Cuntz
Added xread from modifying fsread, Feb 2017, Matthias Cuntz
range instead of np.arange, Nov 2017, Matthias Cuntz
Keywords cname, sname, hstrip, rename file to infile, Nov 2017, Matthias Cuntz
full_header=True returns vector of strings, Nov 2017, Matthias Cuntz
NA -> NaN, i.e. R to Python convention in xread, Feb 2019, Matthias Cuntz
Ignore unicode characters on read, Jun 2019, Matthias Cuntz
Make ignoring unicode characters campatible with Python 2 and Python 3, Jul 2019, Matthias Cuntz
Keywords encoding, errors with codecs module, Aug 2019, Matthias Cuntz
Return as list keyword, Dec 2019, Stephan Thober
Return as array as default, Jan 2020, Matthias Cuntz
Using numpy docstring format, May 2020, Matthias Cuntz
Use openpyxl for xlsx files in xread, Jul 2020, Matthias Cuntz
flake8 compatible xread, Jul 2020, Matthias Cuntz
flake8 compatible fsread, Mar 2021, Matthias Cuntz
Preserve trailing whitespace in column delimiters, Mar 2021, Matthias Cuntz
Code refactoring, Sep 2021, Matthias Cuntz
Cleaner code by using local functions, Dec 2021, Matthias Cuntz
Make float and string code symmetric in behaviour, Dec 2021, Matthias Cuntz
Always return float and string in fsread, Dec 2021, Matthias Cuntz
Removed reform option, Dec 2021, Matthias Cuntz
Return always string array if not return as list option is set; strarr is only used with header=True now, Dec 2021, Matthias Cuntz
fread and sread are simple calls of fsread, Dec 2021, Matthias Cuntz
header returns also 2D arrays by default, Dec 2021, Matthias Cuntz
More consistent docstrings, Jan 2022, Matthias Cuntz
Merged xread into module, Jan 2022, Matthias Cuntz
Use iterators to read rows in Excel file, Jan 2022, Matthias Cuntz
Always close open files, Jan 2022, Matthias Cuntz
Default fill_value is NaN, Jan 2022, Matthias Cuntz
Remove read_only mode for openpyxl because closing is disabled in this case, Jan 2022, Matthias Cuntz
NA -> NaN, i.e. R to Python convention in fsread, Aug 2022, Matthias Cuntz
Correct docstring of strip keyword, Mar 2023, Matthias Cuntz
Assure str(fill_value) in sread, Aug 2024, Matthias Cuntz
Changed deprecated numpy.in1d to numpy.isin, Aug 2024, Matthias Cuntz
- fread(infile, nc=0, cname=None, snc=0, sname=None, **kwargs)[source]#
Read floats from a file into 2D float array
Columns can be picked specifically by index or name. The header can be read separately with the (almost) same call as reading the floats.
- Parameters:
infile (str) – Source file name
nc (int or iterable, optional) – Number of columns to be read as floats [default: all (nc=0)]. nc can be an int or a vector of column indexes, starting with 0. nc<=0 reads all columns.
cname (iterable of str, optional) – Columns for floats can be chosen by the values in the first header line; must be an iterable with strings.
snc (int or iterable, optional) – Not used in fread; will be silently ignored.
sname (iterable of str, optional) – Not used in fread; will be silently ignored.
**kwargs (dict, optional) – All other keywords will be passed to fsread.
- Returns:
Array of numbers in file, or header.
- Return type:
array of floats
Notes
If header==True then skip is counterintuitive because it is actually the number of header rows to be read. This is to be able to have the exact same call of the function, once with header=False and once with header=True.
Blank lines are not filled but are taken as end of file if fill=True.
Examples
Create some data
>>> filename = 'test.dat' >>> with open(filename,'w') as ff: ... print('head1 head2 head3 head4', file=ff) ... print('1.1 1.2 1.3 1.4', file=ff) ... print('2.1 2.2 2.3 2.4', file=ff)
Read sample file in different ways
>>> # data >>> print(fread(filename, skip=1)) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4]] >>> print(fread(filename, skip=2)) [[2.1 2.2 2.3 2.4]] >>> print(fread(filename, skip=1, cskip=1)) [[1.2 1.3 1.4] [2.2 2.3 2.4]] >>> print(fread(filename, nc=2, skip=1, cskip=1)) [[1.2 1.3] [2.2 2.3]] >>> print(fread(filename, nc=[1,3], skip=1)) [[1.2 1.4] [2.2 2.4]] >>> print(fread(filename, nc=1, skip=1)) [[1.1] [2.1]] >>> print(fread(filename, nc=1, skip=1, squeeze=True)) [1.1 2.1]
>>> # header >>> print(fread(filename, nc=2, skip=1, header=True)) [['head1', 'head2']] >>> print(fread(filename, nc=2, skip=1, header=True, full_header=True)) ['head1 head2 head3 head4'] >>> print(fread(filename, nc=1, skip=2, header=True)) [['head1'], ['1.1']] >>> print(fread(filename, nc=1, skip=2, header=True, squeeze=True)) ['head1', '1.1'] >>> print(fread(filename, nc=1, skip=2, header=True, strarr=True)) [['head1'] ['1.1']]
Create data with blank lines
>>> with open(filename, 'a') as ff: ... print('', file=ff) ... print('3.1 3.2 3.3 3.4', file=ff)
>>> print(fread(filename, skip=1)) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4]] >>> print(fread(filename, skip=1, skip_blank=True, comment='#!')) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4] [3.1 3.2 3.3 3.4]]
Create data with comment lines
>>> with open(filename, 'a') as ff: ... print('# First comment', file=ff) ... print('! Second 2 comment', file=ff) ... print('4.1 4.2 4.3 4.4', file=ff)
>>> print(fread(filename, skip=1)) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4]] >>> print(fread(filename, skip=1, nc=[2], skip_blank=True, comment='#')) [[1.3] [2.3] [3.3] [2. ] [4.3]] >>> print(fread(filename, skip=1, skip_blank=True, comment='#!')) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4] [3.1 3.2 3.3 3.4] [4.1 4.2 4.3 4.4]] >>> print(fread(filename, skip=1, skip_blank=True, comment=('#','!'))) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4] [3.1 3.2 3.3 3.4] [4.1 4.2 4.3 4.4]] >>> print(fread(filename, skip=1, skip_blank=True, comment=['#','!'])) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4] [3.1 3.2 3.3 3.4] [4.1 4.2 4.3 4.4]]
Add a line with fewer columns
>>> with open(filename, 'a') as ff: ... print('5.1 5.2', file=ff)
>>> print(fread(filename, skip=1)) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4]] >>> print(fread(filename, skip=1, skip_blank=True, comment='#!', ... fill=True, fill_value=-1)) [[ 1.1 1.2 1.3 1.4] [ 2.1 2.2 2.3 2.4] [ 3.1 3.2 3.3 3.4] [ 4.1 4.2 4.3 4.4] [ 5.1 5.2 -1. -1. ]]
>>> # transpose >>> print(fread(filename, skip=1)) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4]] >>> print(fread(filename, skip=1, transpose=True)) [[1.1 2.1] [1.2 2.2] [1.3 2.3] [1.4 2.4]]
Create some more data with Nan and Inf
>>> filename1 = 'test1.dat' >>> with open(filename1,'w') as ff: ... print('head1 head2 head3 head4', file=ff) ... print('1.1 1.2 1.3 1.4', file=ff) ... print('2.1 nan Inf "NaN"', file=ff)
Treat Nan and Inf with automatic strip of “ and ‘
>>> print(fread(filename1, skip=1, transpose=True)) [[1.1 2.1] [1.2 nan] [1.3 inf] [1.4 nan]]
Create some more data with escaped numbers
>>> filename2 = 'test2.dat' >>> with open(filename2,'w') as ff: ... print('head1 head2 head3 head4', file=ff) ... print('"1.1" "1.2" "1.3" "1.4"', file=ff) ... print('2.1 nan Inf "NaN"', file=ff)
Strip
>>> print(fread(filename2, skip=1, transpose=True, strip='"')) [[1.1 2.1] [1.2 nan] [1.3 inf] [1.4 nan]]
Create more data with an extra (shorter) header line
>>> filename3 = 'test3.dat' >>> with open(filename3,'w') as ff: ... print('Extra header', file=ff) ... print('head1 head2 head3 head4', file=ff) ... print('1.1 1.2 1.3 1.4', file=ff) ... print('2.1 2.2 2.3 2.4', file=ff)
>>> print(fread(filename3, skip=2, hskip=1)) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4]] >>> print(fread(filename3, nc=2, skip=2, hskip=1, header=True)) [['head1', 'head2']]
>>> # cname >>> print(fread(filename, cname='head2', skip=1, skip_blank=True, ... comment='#!', squeeze=True)) [1.2 2.2 3.2 4.2 5.2] >>> print(fread(filename, cname=['head1','head2'], skip=1, ... skip_blank=True, comment='#!')) [[1.1 1.2] [2.1 2.2] [3.1 3.2] [4.1 4.2] [5.1 5.2]] >>> print(fread(filename, cname=['head1','head2'], skip=1, skip_blank=True, ... comment='#!', header=True)) [['head1', 'head2']] >>> print(fread(filename, cname=['head1','head2'], skip=1, skip_blank=True, ... comment='#!', header=True, full_header=True)) ['head1 head2 head3 head4'] >>> print(fread(filename, cname=[' head1','head2'], skip=1, ... skip_blank=True, comment='#!', hstrip=False)) [[1.2] [2.2] [3.2] [4.2] [5.2]]
Clean up doctest
>>> import os >>> os.remove(filename) >>> os.remove(filename1) >>> os.remove(filename2) >>> os.remove(filename3)
- fsread(infile, nc=0, cname=None, snc=0, sname=None, skip=0, cskip=0, hskip=0, separator=None, squeeze=False, skip_blank=False, comment=None, fill=False, fill_value=nan, sfill_value='', strip=None, hstrip=True, encoding='ascii', errors='ignore', header=False, full_header=False, transpose=False, strarr=False, return_list=False)[source]#
Read numbers and strings from a file into 2D float and string arrays
Columns can be picked specifically by index or name. The header can be read separately with the (almost) same call as reading the numbers or string.
- Parameters:
infile (str) – Source file name
nc (int or iterable, optional) – Number of columns to be read as floats [default: none (nc=0)]. nc can be an int or a vector of column indexes, starting with 0. If snc!=0, then nc must be iterable, or -1 to read all other columns as floats. If both nc and snc are int, then first snc string columns will be read and then nc float columns will be read.
cname (iterable of str, optional) – Columns for floats can be chosen by the values in the first header line; must be an iterable with strings.
snc (int or iterable, optional) – Number of columns to be read as strings [default: none (snc=0)]. snc can be an int or a vector of column indexes, starting with 0. If nc!=0, then snc must be iterable, or -1 to read all other columns as strings. If both nc and snc are int, then first snc string columns will be read and then nc float columns will be read.
sname (iterable of str, optional) – Columns for strings can be chosen by the values in the first header line; must be an iterable with strings.
skip (int, optional) – Number of lines to skip at the beginning of file (default: 0)
cskip (int, optional) – Number of columns to skip at the beginning of each line (default: 0)
hskip (int, optional) – Number of lines in skip that do not belong to header (default: 0)
separator (str, optional) – Column separator. If not given, columns separators are (in order): comma (‘,’), semicolon (‘;’), whitespace.
squeeze (bool, optional) – If set to True, the 2-dim array will be cleaned of degenerated dimension, possibly resulting in a vector, otherwise output is always 2-dimensional.
skip_blank (bool, optional) – Continues reading after a blank line if True, else stops reading at the first blank line (default).
comment (iterable, optional) – Line gets excluded if the first character is in comment sequence. Sequence must be iterable such as string, list and tuple, .e.g ‘#’ or [‘#’, ‘!’].
fill (bool, optional) – Fills in fill_value if True and not enough columns in input line, else raises ValueError (default).
fill_value (float, optional) – Value to fill in float array in empty cells or if not enough columns in line and fill==True (default: numpy.nan).
sfill_value (str, optional) – Value to fill in string array in empty cells or if not enough columns in line and fill==True (default: ‘’).
strip (str, optional) – Strip float columns with str.strip(strip). If strip is None, quotes “ and ‘ are stripped from input fields (default), otherwise the character in strip is stripped from the input fields. strip has to be set explicitly to also strip string columns. If strip is set to False then nothing is stripped and reading is about 30% faster.
hstrip (bool, optional) – Strip header cells to match cname if True (default), else take header cells literally.
encoding (str, optional) – Specifies the encoding which is to be used for the file (default: ‘ascii’). Any encoding that encodes to and decodes from bytes is allowed.
errors (str, optional) – Errors may be given to define the error handling during encoding of the file. Possible values are ‘strict’, ‘replace’, and ‘ignore’ (default).
header (bool, optional) –
Return header strings instead of numbers/strings in rest of file. This allows to use (almost) the same call to get values and header:
head, shead = fsread(ifile, nc=1, snc=1, header=True) data, sdata = fsread(ifile, nc=1, snc=1) temp = data[:, head[0].index('temp')]
full_header (bool, optional) – Header will be a list of the header lines if set.
transpose (bool, optional) – fsread reads in row-major format, i.e. the first dimension are the rows and second dimension are the columns out(:nrow, :ncol). This will be transposed to column-major format out(:ncol, :nrow) if transpose is set.
strarr (bool, optional) – Return header as numpy array rather than list.
return_list (bool, optional) – Return lists rather than arrays.
- Returns:
First array is also string if header. Array is replaced by an empty string if this output is not demanded such as with nc=0.
- Return type:
array of floats, array of strings
Notes
If header==True then skip is counterintuitive because it is actually the number of header rows to be read. This is to be able to have the exact same call of the function, once with header=False and once with header=True.
Blank lines are not filled but are taken as end of file if fill=True.
Examples
Create some data
>>> filename = 'test.dat' >>> with open(filename,'w') as ff: ... print('head1 head2 head3 head4', file=ff) ... print('1.1 1.2 1.3 1.4', file=ff) ... print('2.1 2.2 2.3 2.4', file=ff)
Read sample with fread - see fread for more examples
>>> a, sa = fsread(filename, nc=[1,3], skip=1) >>> print(a) [[1.2 1.4] [2.2 2.4]] >>> print(sa) [] >>> a, sa = fsread(filename, nc=2, skip=1, header=True) >>> print(a) [['head1', 'head2']] >>> print(sa) []
Read sample with sread - see sread for more examples
>>> a, sa = fsread(filename, snc=[1,3], skip=1) >>> print(a) [] >>> print(sa) [['1.2' '1.4'] ['2.2' '2.4']]
Create some mixed data
>>> with open(filename,'w') as ff: ... print('head1 head2 head3 head4', file=ff) ... print('01.12.2012 1.2 name1 1.4', file=ff) ... print('01.01.2013 2.2 name2 2.4', file=ff)
Read float and string columns in different ways
>>> a, sa = fsread(filename, nc=[1,3], skip=1) >>> print(a) [[1.2 1.4] [2.2 2.4]] >>> print(sa) [] >>> a, sa = fsread(filename, nc=[1,3], snc=[0,2], skip=1) >>> print(a) [[1.2 1.4] [2.2 2.4]] >>> print(sa) [['01.12.2012' 'name1'] ['01.01.2013' 'name2']] >>> a, sa = fsread(filename, nc=[1,3], snc=-1, skip=1) >>> print(sa) [['01.12.2012' 'name1'] ['01.01.2013' 'name2']] >>> a, sa = fsread(filename, nc=-1, snc=[0,2], skip=1) >>> print(a) [[1.2 1.4] [2.2 2.4]]
>>> a, sa = fsread(filename, nc=[1,3], snc=-1, skip=1, return_list=True) >>> print(a) [[1.2, 1.4], [2.2, 2.4]] >>> print(sa) [['01.12.2012', 'name1'], ['01.01.2013', 'name2']]
Read header
>>> a, sa = fsread(filename, nc=[1,3], snc=[0,2], skip=1, header=True) >>> print(a) [['head2', 'head4']] >>> print(sa) [['head1', 'head3']] >>> a, sa = fsread(filename, nc=[1,3], snc=[0,2], skip=1, header=True, ... squeeze=True) >>> print(a) ['head2', 'head4'] >>> print(sa) ['head1', 'head3']
Create some mixed data with missing values
>>> with open(filename,'w') as ff: ... print('head1,head2,head3,head4', file=ff) ... print('01.12.2012,1.2,name1,1.4', file=ff) ... print('01.01.2013,,name2,2.4', file=ff)
>>> a, sa = fsread(filename, nc=[1,3], skip=1, fill=True, fill_value=-1) >>> print(a) [[ 1.2 1.4] [-1. 2.4]] >>> print(sa) [] >>> a, sa = fsread(filename, nc=[1,3], skip=1, fill=True, fill_value=-1, ... strarr=True) >>> print(a) [[ 1.2 1.4] [-1. 2.4]] >>> print(sa) []
Read data using column names
>>> a, sa = fsread(filename, cname='head2', snc=[0,2], skip=1, fill=True, ... fill_value=-1, squeeze=True) >>> print(a) [ 1.2 -1. ] >>> print(sa) [['01.12.2012' 'name1'] ['01.01.2013' 'name2']] >>> a, sa = fsread(filename, cname=['head2','head4'], snc=-1, skip=1, ... fill=True, fill_value=-1) >>> print(a) [[ 1.2 1.4] [-1. 2.4]] >>> print(sa) [['01.12.2012' 'name1'] ['01.01.2013' 'name2']] >>> # header >>> a, sa = fsread(filename, nc=[1,3], sname=['head1','head3'], skip=1, ... fill=True, fill_value=-1, strarr=True, header=True) >>> print(a) [['head2' 'head4']] >>> print(sa) [['head1' 'head3']] >>> a, sa = fsread(filename, cname=['head2','head4'], snc=-1, skip=1, ... header=True, full_header=True) >>> print(a) ['head1,head2,head3,head4'] >>> print(sa) [] >>> a, sa = fsread(filename, cname=['head2','head4'], snc=-1, skip=1, ... fill=True, fill_value=-1, header=True, full_header=True) >>> print(a) ['head1,head2,head3,head4'] >>> print(sa) [] >>> a, sa = fsread(filename, cname=[' head2','head4'], snc=-1, skip=1, ... fill=True, fill_value=-1, hstrip=False) >>> print(a) [[1.4] [2.4]] >>> print(sa) [['01.12.2012' '1.2' 'name1'] ['01.01.2013' '' 'name2']]
Clean up doctest
>>> import os >>> os.remove(filename)
- sread(infile, nc=0, cname=None, snc=0, sname=None, fill_value='', sfill_value='', header=False, full_header=False, **kwargs)[source]#
Read strings from a file into 2D string array
Columns can be picked specifically by index or name. The header can be read separately with the (almost) same call as reading the strings.
- Parameters:
infile (str) – Source file name
nc (int or iterable, optional) – Number of columns to be read as strings [default: all (nc=0)]. nc can be an int or a vector of column indexes, starting with 0. nc<=0 reads all columns. snc takes precedence if nc and snc are set.
cname (iterable of str, optional) – Columns for floats can be chosen by the values in the first header line; must be an iterable with strings. sname takes precedence if cname and sname are set.
snc (int or iterable, optional) – Number of columns to be read as strings [default: all (snc=0)]. snc can be an int or a vector of column indexes, starting with 0. snc<=0 reads all columns. snc takes precedence if nc and snc are set.
sname (iterable of str, optional) – Columns for strings can be chosen by the values in the first header line; must be an iterable with strings. sname takes precedence if cname and sname are set.
fill_value (float, optional) – Value to fill in string array in empty cells or if not enough columns in line and fill==True (default: ‘’). sfill_value takes precedence if fill_value and sfill_value are set.
sfill_value (str, optional) – Value to fill in string array in empty cells or if not enough columns in line and fill==True (default: ‘’). sfill_value takes precedence if fill_value and sfill_value are set.
fill_value – value to fill in array in empty cells or if not enough columns in line and fill==True (default: ‘’).
header (bool, optional) –
Return header strings instead of strings in rest of file. This allows to use (almost) the same call to get values and header:
shead = sread(ifile, nc=2, header=True) sdata = sread(ifile, nc=2) date = sdata[:, head[0].index('Datetime')]
full_header (bool, optional) – Header will be a list of the header lines if set.
**kwargs (dict, optional) – All other keywords will be passed to fsread.
- Returns:
Array of strings in file, or of header.
- Return type:
array of strings
Notes
If header==True then skip is counterintuitive because it is actually the number of header rows to be read. This is to be able to have the exact same call of the function, once with header=False and once with header=True.
Blank lines are not filled but are taken as end of file if fill=True.
Examples
Create some data
>>> filename = 'test.dat' >>> with open(filename,'w') as ff: ... print('head1 head2 head3 head4', file=ff) ... print('1.1 1.2 1.3 1.4', file=ff) ... print('2.1 2.2 2.3 2.4', file=ff)
Read sample file in different ways
>>> # data >>> print(sread(filename, skip=1)) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '2.2' '2.3' '2.4']] >>> print(sread(filename, skip=2, return_list=True)) [['2.1', '2.2', '2.3', '2.4']] >>> print(sread(filename, skip=2)) [['2.1' '2.2' '2.3' '2.4']] >>> print(sread(filename, skip=1, cskip=1)) [['1.2' '1.3' '1.4'] ['2.2' '2.3' '2.4']] >>> print(sread(filename, nc=2, skip=1, cskip=1)) [['1.2' '1.3'] ['2.2' '2.3']] >>> print(sread(filename, nc=[1,3], skip=1)) [['1.2' '1.4'] ['2.2' '2.4']] >>> print(sread(filename, nc=1, skip=1)) [['1.1'] ['2.1']] >>> print(sread(filename, nc=1, skip=1, squeeze=True)) ['1.1' '2.1']
>>> # header >>> print(sread(filename, nc=2, skip=1, header=True)) [['head1', 'head2']] >>> print(sread(filename, nc=2, skip=1, header=True, full_header=True)) ['head1 head2 head3 head4'] >>> print(sread(filename, nc=1, skip=2, header=True)) [['head1'], ['1.1']] >>> print(sread(filename, nc=1, skip=2, header=True, squeeze=True)) ['head1', '1.1'] >>> print(sread(filename, nc=1, skip=2, header=True, squeeze=True, ... strarr=True)) ['head1' '1.1'] >>> print(sread(filename, nc=1, skip=2, header=True, squeeze=True, ... transpose=True)) ['head1', '1.1']
Data with blank lines
>>> with open(filename, 'a') as ff: ... print('', file=ff) ... print('3.1 3.2 3.3 3.4', file=ff)
>>> print(sread(filename, skip=1)) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '2.2' '2.3' '2.4']] >>> print(sread(filename, skip=1, skip_blank=True)) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '2.2' '2.3' '2.4'] ['3.1' '3.2' '3.3' '3.4']] >>> print(sread(filename, skip=1)) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '2.2' '2.3' '2.4']] >>> print(sread(filename, skip=1, transpose=True)) [['1.1' '2.1'] ['1.2' '2.2'] ['1.3' '2.3'] ['1.4' '2.4']] >>> print(sread(filename, skip=1, transpose=True)) [['1.1' '2.1'] ['1.2' '2.2'] ['1.3' '2.3'] ['1.4' '2.4']]
Data with comment lines
>>> with open(filename, 'a') as ff: ... print('# First comment', file=ff) ... print('! Second second comment', file=ff) ... print('4.1 4.2 4.3 4.4', file=ff)
>>> print(sread(filename, skip=1)) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '2.2' '2.3' '2.4']] >>> print(sread(filename, skip=1, skip_blank=True, comment='#')) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '2.2' '2.3' '2.4'] ['3.1' '3.2' '3.3' '3.4'] ['!' 'Second' 'second' 'comment'] ['4.1' '4.2' '4.3' '4.4']] >>> print(sread(filename, skip=1, skip_blank=True, comment='#!')) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '2.2' '2.3' '2.4'] ['3.1' '3.2' '3.3' '3.4'] ['4.1' '4.2' '4.3' '4.4']] >>> print(sread(filename, skip=1, skip_blank=True, comment=('#','!'))) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '2.2' '2.3' '2.4'] ['3.1' '3.2' '3.3' '3.4'] ['4.1' '4.2' '4.3' '4.4']] >>> print(sread(filename, skip=1, skip_blank=True, comment=['#','!'])) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '2.2' '2.3' '2.4'] ['3.1' '3.2' '3.3' '3.4'] ['4.1' '4.2' '4.3' '4.4']]
Data with escaped numbers
>>> filename2 = 'test2.dat' >>> with open(filename2,'w') as ff: ... print('"head1" "head2" "head3" "head4"', file=ff) ... print('"1.1" "1.2" "1.3" "1.4"', file=ff) ... print('2.1 nan Inf "NaN"', file=ff)
>>> print(sread(filename2, skip=1, transpose=True, strip='"')) [['1.1' '2.1'] ['1.2' 'nan'] ['1.3' 'Inf'] ['1.4' 'NaN']]
Data with an extra (shorter) header line
>>> filename3 = 'test3.dat' >>> with open(filename3,'w') as ff: ... print('Extra header', file=ff) ... print('head1 head2 head3 head4', file=ff) ... print('1.1 1.2 1.3 1.4', file=ff) ... print('2.1 2.2 2.3 2.4', file=ff)
>>> print(sread(filename3, skip=2, return_list=True)) [['1.1', '1.2', '1.3', '1.4'], ['2.1', '2.2', '2.3', '2.4']] >>> print(sread(filename3, skip=2, hskip=1)) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '2.2' '2.3' '2.4']] >>> print(sread(filename3, nc=2, skip=2, hskip=1, header=True)) [['head1', 'head2']]
Data with missing values
>>> filename4 = 'test4.dat' >>> with open(filename4,'w') as ff: ... print('Extra header', file=ff) ... print('head1,head2,head3,head4', file=ff) ... print('1.1,1.2,1.3,1.4', file=ff) ... print('2.1,,2.3,2.4', file=ff)
>>> print(sread(filename4, skip=2, return_list=True)) [['1.1', '1.2', '1.3', '1.4'], ['2.1', '', '2.3', '2.4']] >>> print(sread(filename4, skip=2, fill=True, fill_value='-1')) [['1.1' '1.2' '1.3' '1.4'] ['2.1' '-1' '2.3' '2.4']]
>>> # cname >>> print(sread(filename, cname='head2', skip=1, skip_blank=True, ... comment='#!', squeeze=True)) ['1.2' '2.2' '3.2' '4.2'] >>> print(sread(filename, cname=['head1','head2'], skip=1, skip_blank=True, ... comment='#!')) [['1.1' '1.2'] ['2.1' '2.2'] ['3.1' '3.2'] ['4.1' '4.2']] >>> print(sread(filename, cname=['head1','head2'], skip=1, skip_blank=True, ... comment='#!', header=True)) [['head1', 'head2']] >>> print(sread(filename, cname=['head1','head2'], skip=1, skip_blank=True, ... comment='#!', header=True, full_header=True)) ['head1 head2 head3 head4'] >>> print(sread(filename, cname=[' head1','head2'], skip=1, ... skip_blank=True, comment='#!', hstrip=False)) [['1.2'] ['2.2'] ['3.2'] ['4.2']]
Clean up doctest
>>> import os >>> os.remove(filename) >>> os.remove(filename2) >>> os.remove(filename3) >>> os.remove(filename4)
- xread(infile, sheet=None, nc=0, cname=None, snc=0, sname=None, skip=0, cskip=0, hskip=0, squeeze=False, fill=False, fill_value=nan, sfill_value='', strip=None, hstrip=True, header=False, full_header=False, transpose=False, strarr=False, return_list=False)[source]#
Read numbers and strings from Excel file into 2D float and string arrays
This routine is analog to fsread but for Excel files.
Columns can be picked specifically by index or name. The header can be read separately with the (almost) same call as reading the numbers or string.
- Parameters:
infile (str) – Excel source file name
sheet (str or int, optional) – Name or number of Excel sheet (default: first sheet)
nc (int or iterable, optional) – Number of columns to be read as floats [default: none (nc=0)]. nc can be an int or a vector of column indexes, starting with 0. If snc!=0, then nc must be iterable, or -1 to read all other columns as floats. If both nc and snc are int, then first snc string columns will be read and then nc float columns will be read.
cname (iterable of str, optional) – Columns for floats can be chosen by the values in the first header line; must be an iterable with strings.
snc (int or iterable, optional) – Number of columns to be read as strings [default: none (snc=0)]. snc can be an int or a vector of column indexes, starting with 0. If nc!=0, then snc must be iterable, or -1 to read all other columns as strings. If both nc and snc are int, then first snc string columns will be read and then nc float columns will be read.
sname (iterable of str, optional) – Columns for strings can be chosen by the values in the first header line; must be an iterable with strings.
skip (int, optional) – Number of lines to skip at the beginning of file (default: 0)
cskip (int, optional) – Number of columns to skip at the beginning of each line (default: 0)
hskip (int, optional) – Number of lines in skip that do not belong to header (default: 0)
squeeze (bool, optional) – If set to True, the 2-dim array will be cleaned of degenerated dimension, possibly resulting in a vector, otherwise output is always 2-dimensional.
fill (bool, optional) – Fills in fill_value if True and not enough columns in input line, else raises ValueError (default).
fill_value (float, optional) – Value to fill in float array in empty cells or if not enough columns in line and fill==True (default: numpy.nan).
sfill_value (str, optional) – Value to fill in string array in empty cells or if not enough columns in line and fill==True (default: ‘’).
strip (str, optional) – Strip float columns with str.strip(strip). If strip is None, quotes “ and ‘ are stripped from input fields (default), otherwise the character in strip is stripped from the input fields. strip has to be set explicitly to also strip string columns. If strip is set to False then nothing is stripped and reading is about 30% faster for text files.
hstrip (bool, optional) – Strip header cells to match cname if True (default), else take header cells literally.
header (bool, optional) –
Return header strings instead of numbers/strings in rest of file. This allows to use (almost) the same call to get values and header:
head, shead = xread(ifile, nc=1, snc=1, header=True) data, sdata = xread(ifile, nc=1, snc=1) temp = data[:, head[0].index('temp')]
full_header (bool, optional) – Header will be a list of the header lines if set.
transpose (bool, optional) – fsread reads in row-major format, i.e. the first dimension are the rows and second dimension are the columns out(:nrow, :ncol). This will be transposed to column-major format out(:ncol, :nrow) if transpose is set.
strarr (bool, optional) – Return header as numpy array rather than list.
return_list (bool, optional) – Return lists rather than arrays.
- Returns:
First array is also string if header==True. The array of floats or of strings is replaced by an empty list if the output is not demanded, e.g. the array of float is set to [] if nc=0.
- Return type:
array of floats, array of strings
Notes
If header==True then skip is counterintuitive because it is actually the number of header rows to be read. This is to be able to have the exact same call of the function, once with header=False and once with header=True.
xread
needs modulexlrd
for reading xls-files, and moduleopenpyxl
for reading xlsx-files. Raises IOError during read if relevant module is not installed.Examples
Using xlrd for xls files
>>> filename = 'test_readexcel.xls' >>> dat, sdat = xread(filename, skip=1, nc=-1) >>> print(dat) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4] [3.1 3.2 3.3 3.4] [4.1 4.2 4.3 4.4]] >>> print(sdat) [] >>> dat, sdat = xread(filename, skip=1, nc=[2], squeeze=True) >>> print(dat) [1.3 2.3 3.3 4.3] >>> dat, sdat = xread(filename, skip=1, cname=['head1', 'head2']) >>> print(dat) [[1.1 1.2] [2.1 2.2] [3.1 3.2] [4.1 4.2]] >>> dat, sdat = xread(filename, sheet='Sheet3', nc=[1], snc=[0, 2], skip=1, ... squeeze=True) >>> print(dat) [1.2 2.2 3.2 4.2] >>> print(sdat) [['name1' 'name5'] ['name2' 'name6'] ['name3' 'name7'] ['name4' 'name8']] >>> dat, sdat = xread(filename, sheet=2, cname='head2', snc=[0, 2], skip=1, ... squeeze=True) >>> print(dat) [1.2 2.2 3.2 4.2] >>> print(sdat) [['name1' 'name5'] ['name2' 'name6'] ['name3' 'name7'] ['name4' 'name8']] >>> dat, sdat = xread(filename, sheet='Sheet2', cname=['head2', 'head4'], ... snc=[0, 2], skip=1, fill=True, fill_value=-9, ... sfill_value='-8') >>> print(dat) [[-9. 1.4] [ 2.2 2.4] [ 3.2 3.4] [ 4.2 4.4]] >>> print(sdat) [['1.1' '1.3'] ['2.1' '2.3'] ['3.1' '-8'] ['4.1' '4.3']] >>> dat, sdat = xread(filename, sheet='Sheet2', cname=['head2', 'head4'], ... snc=[0, 2], skip=1, header=True) >>> print(dat) [['head2', 'head4']] >>> print(sdat) [['head1', 'head3']] >>> dat, sdat = xread(filename, sheet='Sheet2', cname=['head2', 'head4'], ... snc=[0, 2], skip=1, header=True, squeeze=True) >>> print(dat) ['head2', 'head4'] >>> print(sdat) ['head1', 'head3'] >>> dat, sdat = xread(filename, sheet='Sheet2', nc=-1, skip=1, header=True) >>> print(dat) [['head1', 'head2', 'head3', 'head4']] >>> dat, sdat = xread(filename, sheet='Sheet2', cname=[' head2', 'head4'], ... snc=[0, 2], skip=1, fill=True, fill_value=-9, ... sfill_value='-8', hstrip=False) >>> print(dat) [[1.4] [2.4] [3.4] [4.4]]
Using openpyxl for xlsx files
>>> filename = 'test_readexcel.xlsx' >>> dat, sdat = xread(filename, skip=1, nc=-1) >>> print(dat) [[1.1 1.2 1.3 1.4] [2.1 2.2 2.3 2.4] [3.1 3.2 3.3 3.4] [4.1 4.2 4.3 4.4]] >>> print(sdat) [] >>> dat, sdat = xread(filename, skip=1, nc=[2], squeeze=True) >>> print(dat) [1.3 2.3 3.3 4.3] >>> dat, sdat = xread(filename, skip=1, cname=['head1', 'head2']) >>> print(dat) [[1.1 1.2] [2.1 2.2] [3.1 3.2] [4.1 4.2]] >>> dat, sdat = xread(filename, sheet='Sheet3', nc=[1], snc=[0, 2], skip=1, ... squeeze=True) >>> print(dat) [1.2 2.2 3.2 4.2] >>> print(sdat) [['name1' 'name5'] ['name2' 'name6'] ['name3' 'name7'] ['name4' 'name8']] >>> dat, sdat = xread(filename, sheet=2, cname='head2', snc=[0, 2], skip=1, ... squeeze=True) >>> print(dat) [1.2 2.2 3.2 4.2] >>> print(sdat) [['name1' 'name5'] ['name2' 'name6'] ['name3' 'name7'] ['name4' 'name8']] >>> dat, sdat = xread(filename, sheet='Sheet2', cname=['head2', 'head4'], ... snc=[0, 2], skip=1, fill=True, fill_value=-9, ... sfill_value='-8') >>> print(dat) [[-9. 1.4] [ 2.2 2.4] [ 3.2 3.4] [ 4.2 4.4]] >>> print(sdat) [['1.1' '1.3'] ['2.1' '2.3'] ['3.1' '-8'] ['4.1' '4.3']] >>> dat, sdat = xread(filename, sheet='Sheet2', cname=['head2', 'head4'], ... snc=[0, 2], skip=1, header=True) >>> print(dat) [['head2', 'head4']] >>> print(sdat) [['head1', 'head3']] >>> dat, sdat = xread(filename, sheet='Sheet2', cname=['head2', 'head4'], ... snc=[0, 2], skip=1, header=True, squeeze=True) >>> print(dat) ['head2', 'head4'] >>> print(sdat) ['head1', 'head3'] >>> dat, sdat = xread(filename, sheet='Sheet2', nc=-1, skip=1, header=True) >>> print(dat) [['head1', 'head2', 'head3', 'head4']] >>> print(sdat) [] >>> dat, sdat = xread(filename, sheet='Sheet2', cname=[' head2', 'head4'], ... snc=[0, 2], skip=1, fill=True, fill_value=-9, ... sfill_value='-8', hstrip=False) >>> print(dat) [[1.4] [2.4] [3.4] [4.4]]