Dios Docs
The whole package dios is mainly a container for the class dios.DictOfSeries.
DictOfSeries
class dios.DictOfSeries(data=None, columns=None, index=None, itype=None, cast_policy='save', fastpath=False)
Bases: dios.base._DiosBase
A data frame where every column has its own index.
DictOfSeries is a collection of pd.Series objects that aims to be as similar as possible to pd.DataFrame. Its advantage over pd.DataFrame is that every column has its own row index, whereas pd.DataFrame provides a single row index shared by all columns. This solves problems with unaligned data and with data whose length varies widely between columns.
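The sketch below shows the difference using plain pandas. It builds a dict of two Series with different indexes and then the equivalent pd.DataFrame. It does not use dios. The dict named `columns` only illustrates the one-index-per-column idea, and the DataFrame shows the NaN padding that a single shared index forces:

```python
import pandas as pd

# Two series with different lengths and non-overlapping indexes.
a = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
b = pd.Series([10.0, 20.0], index=[5, 6])

# One index per column: a plain dict keeps each series untouched.
# (This dict only illustrates the idea; it is not a DictOfSeries.)
columns = {"a": a, "b": b}
lengths = {name: len(s) for name, s in columns.items()}

# One shared index: pd.DataFrame reindexes both columns to the union
# [0, 1, 2, 5, 6] and pads every gap with NaN.
df = pd.DataFrame(columns)
padded = int(df.isna().sum().sum())

print(lengths)   # {'a': 3, 'b': 2}
print(df.shape)  # (5, 2)
print(padded)    # 5
```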
Indexing with di[], di.loc[] and di.iloc[] should work analogously to these methods from pd.DataFrame. The indexer can be a single label, a slice, a list-like, a boolean list-like, or a boolean DictOfSeries/pd.DataFrame, and can be used to selectively get or set data.
- Parameters
data (array-like, Iterable, dict, or scalar value) – Contains data stored in Series.
columns (array-like) – Column labels to use for resulting frame. Will default to RangeIndex(0, 1, 2, …, n) if no column labels are provided.
index (Index or array-like) – Index to use to reindex every given series during init. Ignored if omitted.
itype (Itype, pd.Index, Itype-string-repr or type) – Every series that is inserted must have an index of this type or of one of its subtypes. If None, the itype is inferred as soon as the first non-empty series is inserted.
cast_policy ({'save', 'force', 'never'}, default 'save') – Policy used for (down-)casting the index of a series if its type does not match the itype.
Attributes Summary
aloc – Access a group of rows and columns by label(s) or a boolean array with automatic alignment of indexers.
at – Access a single value for a row/column label pair.
cast_policy – The policy to use for casting new columns if their initial itype does not fit.
columns – The column labels of the DictOfSeries.
debugDf – Alias for to_df() as property, for debugging purposes.
dtypes – Return pandas.Series with the dtypes of all columns.
empty – Indicator whether DictOfSeries is empty.
iat – Access a single value for a row/column pair by integer position.
iloc – Purely integer-location based indexing for selection by position.
indexes – Return pandas.Series with the indexes of all columns.
itype – The Itype of the DictOfSeries.
lengths – Return pandas.Series with the length of all columns.
loc – Access a group of rows and columns by label(s) or a boolean array.
size
values – Return a numpy.array of numpy.arrays with the values of all columns.
Methods Summary
all([axis]) – Return whether all elements are True, potentially over an axis.
any([axis]) – Return whether any element is True, potentially over an axis.
apply(func[, axis, raw, args]) – Apply a function along an axis of the DictOfSeries.
astype(dtype[, copy, errors]) – Cast the data to the given data type.
clear()
combine_first(other[, keepna]) – Update null elements with value in the same location in other.
copy([deep]) – Make a copy of this DictOfSeries’ indices and data.
copy_empty([columns]) – Return a new DictOfSeries object with the same properties as the original.
dropempty – Drop empty columns.
dropna([inplace]) – Drop NaNs from a DictOfSeries.
equals(other) – Test whether two DictOfSeries contain the same elements.
for_each(attr_or_callable, **kwds) – Apply a callable or a pandas.Series method or property on each column.
get(key[, default])
hasnans([axis, drop_empty]) – Return a boolean Series along an axis, which indicates whether it contains NA entries.
index_of([method]) – Return a single index with indices from all columns.
isdata() – Alias for notna(drop_empty=True).
isempty() – Return a boolean Series, which indicates whether a column is empty.
isin(values) – Return a boolean dios that indicates whether the corresponding value is in the given array-like.
isna([drop_empty]) – Return a boolean DictOfSeries which indicates NA positions.
isnull([drop_empty]) – Alias for isna().
items()
iterrows([fill_value, squeeze]) – Iterate over DictOfSeries rows as (index, pandas.Series/DictOfSeries) pairs.
keys()
mask(cond[, other, inplace]) – Replace values where the condition is True.
max([axis, skipna])
memory_usage([index, deep])
min([axis, skipna])
notempty() – Return a boolean Series, which indicates whether a column is not empty.
notna([drop_empty]) – Return a boolean DictOfSeries which indicates non-NA positions.
notnull([drop_empty]) – Alias, see notna().
pop(*args)
popitem()
reduce_columns(func[, initial, skipna]) – Reduce all columns to a single pandas.Series by a given function.
setdefault(key[, default])
squeeze([axis]) – Squeeze 1-dimensional axis objects into scalars.
to_csv(*args, **kwargs) – Write object to a comma-separated values (csv) file.
to_df([how]) – Transform DictOfSeries to a pandas.DataFrame.
to_dios() – A dummy to allow unconditional to_dios calls on pd.DataFrame, pd.Series and dios.DictOfSeries.
to_string([max_rows, min_rows, max_cols, …]) – Pretty print a dios.
update(other)
where(cond[, other, inplace]) – Replace values where the condition is False.
Attributes Documentation
aloc
Access a group of rows and columns by label(s) or a boolean array with automatic alignment of indexers.
See indexing docs.
at
Access a single value for a row/column label pair.
See indexing docs.
cast_policy
The policy to use for casting new columns if their initial itype does not fit.
See Itype documentation for more info.
columns
The column labels of the DictOfSeries.
debugDf
Alias for to_df() as property, for debugging purposes.
dtypes
Return pandas.Series with the dtypes of all columns.
empty
Indicator whether DictOfSeries is empty.
- Returns
If DictOfSeries is empty, return True, if not return False.
- Return type
bool
See also
DictOfSeries.dropempty
drop empty columns
DictOfSeries.dropna
drop NaNs from a DictOfSeries
pandas.Series.dropna
drop NaNs from a Series
Notes
If DictOfSeries contains only NaNs, it is still not considered empty. See the example below.
Examples
An example of an actually empty DictOfSeries.
>>> di_empty = DictOfSeries(columns=['A'])
>>> di_empty
Empty DictOfSeries
Columns: ['A']
>>> di_empty.empty
True
If we only have NaNs in our DictOfSeries, it is not considered empty! We need to drop the NaNs to make the DictOfSeries empty:
>>> di = DictOfSeries({'A' : [np.nan]})
>>> di
    A |
===== |
0 NaN |
>>> di.empty
False
>>> di.dropna().empty
True
iat
Access a single value for a row/column pair by integer position.
See indexing docs.
iloc
Purely integer-location based indexing for selection by position.
See indexing docs.
indexes
Return pandas.Series with the indexes of all columns.
itype
The Itype of the DictOfSeries. See Itype documentation for more info.
lengths
Return pandas.Series with the length of all columns.
loc
Access a group of rows and columns by label(s) or a boolean array.
See indexing docs.
size
values
Return a numpy.array of numpy.arrays with the values of all columns.
The outer array has one entry per column; each inner array holds the values of that column.
Methods Documentation
all(axis=0)
Return whether all elements are True, potentially over an axis.
Returns True unless there is at least one element within a series or along a DictOfSeries axis that is False or equivalent (e.g. zero or empty).
- Parameters
axis ({0 or ‘index’, 1 or ‘columns’, None}, default 0) –
- Indicate which axis or axes should be reduced.
0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the union of all column indexes.
None : reduce all axes, return a scalar.
- Returns
- Return type
pandas.Series
See also
pandas.Series.all()
Return True if all elements are True.
any()
Return True if one (or more) elements are True.
any(axis=0)
Return whether any element is True, potentially over an axis.
Returns False unless there is at least one element within a series or along a DictOfSeries axis that is True or equivalent (e.g. non-zero or non-empty).
- Parameters
axis ({0 or ‘index’, 1 or ‘columns’, None}, default 0) –
- Indicate which axis or axes should be reduced.
0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the union of all column indexes.
None : reduce all axes, return a scalar.
- Returns
- Return type
pandas.Series
See also
pandas.Series.any()
Return whether any element is True.
all()
Return True if all elements are True.
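The three axis options of all() and any() can be illustrated on a plain dict of boolean Series. This pandas-only sketch assumes that, for axis=1, each row label is reduced over exactly those columns that contain it. It illustrates the described semantics and is not the dios implementation:

```python
import pandas as pd

# Hypothetical boolean columns with different indexes (plain dict, not dios).
cols = {
    "a": pd.Series([True, True], index=[0, 1]),
    "b": pd.Series([True, False], index=[1, 2]),
}

# axis=0: reduce each column separately; result is indexed by column labels.
all_axis0 = pd.Series({name: bool(s.all()) for name, s in cols.items()})

# axis=1: reduce across columns for each row label; result is indexed by the
# union of all column indexes. Stacking the columns vertically and grouping
# by row label only looks at the columns that actually contain that label.
stacked = pd.concat(list(cols.values()))
all_axis1 = stacked.groupby(level=0).all()
any_axis1 = stacked.groupby(level=0).any()

# axis=None: reduce everything to a single scalar.
all_none = bool(stacked.all())
```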
apply(func, axis=0, raw=False, args=(), **kwds)
Apply a function along an axis of the DictOfSeries.
- Parameters
func (callable) – Function to apply on each column.
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Axis along which the function is applied:
0 or ‘index’: apply function to each column.
1 or ‘columns’: NOT IMPLEMENTED
raw (bool, default False) –
Determines if row or column is passed as a Series or ndarray object:
False: passes each row or column as a Series to the function.
True: the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function, this will achieve much better performance.
args (tuple) – Positional arguments to pass to func in addition to the array/series.
**kwds – Additional keyword arguments to pass as keyword arguments to func.
- Returns
Result of applying func along the given axis of the DictOfSeries.
- Return type
Series or DictOfSeries
- Raises
NotImplementedError –
if axis is ‘columns’ or 1
See also
DictOfSeries.for_each()
apply pd.Series methods or properties to each column
Examples
We use the example DictOfSeries from indexing.
>>> di = di[:5]
>>> di
    a |    b |     c |     d |
===== | ==== | ===== | ===== |
0   0 | 2  5 | 4   7 | 6   0 |
1   7 | 3  6 | 5  17 | 7   1 |
2  14 | 4  7 | 6  27 | 8   2 |
3  21 | 5  8 | 7  37 | 9   3 |
4  28 | 6  9 | 8  47 | 10  4 |
>>> di.apply(max)
columns
a    28
b     9
c    47
d     4
dtype: int64
>>> di.apply(pd.Series.count)
columns
a    5
b    5
c    5
d    5
dtype: int64
One can pass keyword arguments directly:
>>> di.apply(pd.Series.value_counts, normalize=True)
      a |      b |       c |      d |
======= | ====== | ======= | ====== |
 7  0.2 | 7  0.2 |  7  0.2 | 4  0.2 |
14  0.2 | 6  0.2 | 37  0.2 | 3  0.2 |
21  0.2 | 5  0.2 | 47  0.2 | 2  0.2 |
28  0.2 | 9  0.2 | 27  0.2 | 1  0.2 |
 0  0.2 | 8  0.2 | 17  0.2 | 0  0.2 |
Or define your own function:
>>> di.apply(lambda s : 'high' if max(s) > 10 else 'low')
columns
a    high
b     low
c    high
d     low
dtype: object
More advanced functions that return a list-like can also be given. Note that the returned lists do not need to have the same length.
>>> func = lambda s : ('high', max(s), min(s)) if min(s) > (max(s)//2) else ('low',max(s))
>>> di.apply(func)
     a |       b |      c |      d |
====== | ======= | ====== | ====== |
0  low | 0  high | 0  low | 0  low |
1   28 | 1     9 | 1   47 | 1    4 |
       | 2     5 |        |        |
combine_first(other, keepna=False)
Update null elements with value in the same location in other.
Combine two DictOfSeries objects by filling null values in one DictOfSeries with non-null values from other DictOfSeries. The row and column indexes of the resulting DictOfSeries will be the union of the two.
- Parameters
keepna (bool, default False) – By default, NaNs are updated from other and new value-index pairs from other are inserted. If set to True, NaNs are not updated and only new value-index pairs are inserted.
other (DictOfSeries) – Provided DictOfSeries to use to fill null values.
- Returns
- Return type
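The two keepna modes can be sketched for a single column using pandas alone. `combine_first_column` is a hypothetical helper written for this illustration and is not part of dios. It covers one pair of columns only, not the column union of two whole DictOfSeries:

```python
import numpy as np
import pandas as pd

def combine_first_column(s, other, keepna=False):
    """Hypothetical per-column sketch of combine_first.

    keepna=False: NaNs in `s` are filled from `other`, and labels that only
    exist in `other` are added.
    keepna=True: NaNs in `s` are kept; only labels missing from `s` are added.
    """
    if not keepna:
        return s.combine_first(other)
    new_labels = other.index.difference(s.index)
    return pd.concat([s, other.loc[new_labels]]).sort_index()

s = pd.Series([1.0, np.nan], index=[0, 1])
other_col = pd.Series([9.0, 9.0, 9.0], index=[0, 1, 2])

filled = combine_first_column(s, other_col)               # 1.0, 9.0, 9.0
kept = combine_first_column(s, other_col, keepna=True)    # 1.0, NaN, 9.0
```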
copy(deep=True)
Make a copy of this DictOfSeries’ indices and data.
- Parameters
deep (bool, default True) – Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.
- Returns
copy
- Return type
DictOfSeries
See also
pandas.DataFrame.copy()
copy_empty(columns=True)
Return a new DictOfSeries object with the same properties as the original.
- Parameters
columns (bool, default True) – If True, the copy will have the same columns as the original, but empty.
- Returns
empty copy
- Return type
DictOfSeries
Examples
>>> di = DictOfSeries({'A': range(2), 'B': range(3)})
>>> di
   A |    B |
==== | ==== |
0  0 | 0  0 |
1  1 | 1  1 |
     | 2  2 |
>>> empty = di.copy_empty()
>>> empty
Empty DictOfSeries
Columns: ['A', 'B']
The properties are the same, e.g.:
>>> empty.itype == di.itype
True
>>> empty.cast_policy == di.cast_policy
True
>>> empty.dtypes == di.dtypes
columns
A    True
B    True
dtype: bool
equals(other)
Test whether two DictOfSeries contain the same elements.
This function allows two DictOfSeries to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type, but the elements within the columns must be the same dtype.
- Parameters
other (DictOfSeries) – The other DictOfSeries to compare with.
- Returns
True if all elements are the same in both DictOfSeries, False otherwise.
- Return type
bool
for_each(attr_or_callable, **kwds)
Apply a callable or a pandas.Series method or property on each column.
- Parameters
attr_or_callable (Any) – A pandas.Series attribute or any callable to apply on each column. A series attribute can be any property, field or method and can also be specified as a string. If a callable is given, it must take a pandas.Series as its only positional argument.
**kwds (any) – keyword arguments passed to the callable
- Returns
A series with the results, indexed by the column labels.
- Return type
pandas.Series
See also
DictOfSeries.apply()
Apply functions to columns and convert result to DictOfSeries.
Examples
>>> d = DictOfSeries([range(3), range(4)], columns=['a', 'b'])
>>> d
   a |    b |
==== | ==== |
0  0 | 0  0 |
1  1 | 1  1 |
2  2 | 2  2 |
     | 3  3 |
Use with a callable:
>>> d.for_each(max)
columns
a    2
b    3
dtype: object
...or with a string denoting a pd.Series attribute, which is therefore the same as giving the attribute itself:
>>> d.for_each('max')
columns
a    2
b    3
dtype: object
>>> d.for_each(pd.Series.max)
columns
a    2
b    3
dtype: object
Both also work with properties:
>>> d.for_each('dtype')
columns
a    int64
b    int64
dtype: object
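The dispatch described above, between callables, methods given by name and properties given by name, can be sketched on a plain dict of Series. `for_each_sketch` is a hypothetical helper that mirrors the documented behaviour and is not the dios implementation:

```python
import pandas as pd

def for_each_sketch(cols, attr_or_callable, **kwds):
    """Apply a callable, or a pandas.Series attribute given by name, per column."""
    results = {}
    for name, s in cols.items():
        if isinstance(attr_or_callable, str):
            attr = getattr(s, attr_or_callable)
            # Methods are called with the keyword arguments, properties are read.
            results[name] = attr(**kwds) if callable(attr) else attr
        else:
            # Callables receive the column as their only positional argument.
            results[name] = attr_or_callable(s, **kwds)
    # Indexed by the column labels, like the documented return value.
    return pd.Series(results, dtype=object)

cols = {"a": pd.Series(range(3)), "b": pd.Series(range(4))}
maxima = for_each_sketch(cols, "max")
same = for_each_sketch(cols, pd.Series.max)
dtypes = for_each_sketch(cols, "dtype")
```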
hasnans(axis=0, drop_empty=False)
Return a boolean Series along an axis, which indicates whether it contains NA entries.
index_of(method='all')
Return a single index with indices from all columns.
- Parameters
method (string, default 'all') –
‘all’ : get all indices from all columns
‘union’ : alias for ‘all’
‘shared’ : get indices that are present in every column
‘intersection’ : alias for ‘shared’
‘uniques’ : get indices that are only present in a single column
‘non-uniques’ : get indices that are present in more than one column
- Returns
A single duplicate-free index, combining the indices of all columns according to method.
- Return type
pd.Index
Examples
We use the example DictOfSeries from indexing.
>>> di
    a |      b |      c |     d |
===== | ====== | ====== | ===== |
0   0 |  2   5 |  4   7 |  6  0 |
1   7 |  3   6 |  5  17 |  7  1 |
2  14 |  4   7 |  6  27 |  8  2 |
3  21 |  5   8 |  7  37 |  9  3 |
4  28 |  6   9 |  8  47 | 10  4 |
5  35 |  7  10 |  9  57 | 11  5 |
6  42 |  8  11 | 10  67 | 12  6 |
7  49 |  9  12 | 11  77 | 13  7 |
8  56 | 10  13 | 12  87 | 14  8 |
9  63 | 11  14 | 13  97 | 15  9 |
>>> di.index_of()
RangeIndex(start=0, stop=16, step=1)
>>> di.index_of("shared")
Int64Index([6, 7, 8, 9], dtype='int64')
>>> di.index_of("uniques")
Int64Index([0, 1, 14, 15], dtype='int64')
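The index_of methods correspond to ordinary set operations on the column indexes. `index_of_sketch` below is a hypothetical pandas-only helper. Its columns reuse the index layout of the example table in this entry, and it yields the same labels as the doctest output, possibly as a different Index subclass:

```python
from collections import Counter

import pandas as pd

def index_of_sketch(cols, method="all"):
    """Hypothetical re-implementation of the index_of() methods for a dict of Series."""
    indexes = [s.index for s in cols.values()]
    if method in ("all", "union"):
        result = indexes[0]
        for idx in indexes[1:]:
            result = result.union(idx)
        return result
    if method in ("shared", "intersection"):
        result = indexes[0]
        for idx in indexes[1:]:
            result = result.intersection(idx)
        return result
    # Count in how many columns each label occurs.
    counts = Counter(label for idx in indexes for label in idx.unique())
    if method == "uniques":
        return pd.Index(sorted(k for k, n in counts.items() if n == 1))
    if method == "non-uniques":
        return pd.Index(sorted(k for k, n in counts.items() if n > 1))
    raise ValueError(f"unknown method: {method!r}")

# Same index layout as the example DictOfSeries (values do not matter here).
cols = {
    "a": pd.Series(range(10), index=range(0, 10)),
    "b": pd.Series(range(10), index=range(2, 12)),
    "c": pd.Series(range(10), index=range(4, 14)),
    "d": pd.Series(range(10), index=range(6, 16)),
}
```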
isin(values)
Return a boolean dios that indicates whether the corresponding value is in the given array-like.
iterrows(fill_value=nan, squeeze=True)
Iterate over DictOfSeries rows as (index, pandas.Series/DictOfSeries) pairs. This may be very expensive in terms of performance and/or memory.
- Parameters
fill_value (scalar, default numpy.nan) –
Fill value for a row entry if the column does not have an entry at the current index location. This ensures that the returned row always contains all columns. If None is given, no value is filled.
If fill_value=None and squeeze=True, the resulting row (a pandas.Series) may differ in length between iterator calls. That is because an entry that is not present in a column will also not be present in the resulting row.
squeeze (bool, default True) –
True: A pandas.Series is returned for each row.
False: A single-rowed DictOfSeries is returned for each row.
- Yields
index (label) – The index of the row.
data (Series or DictOfSeries) – The data of the row as a Series if squeeze is True, as a DictOfSeries otherwise.
See also
DictOfSeries.iteritems()
Iterate over (column name, Series) pairs.
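The following is a rough model of iterrows with squeeze=True. It walks the union of all column indexes and builds one pandas.Series per label. `iterrows_sketch` is a hypothetical helper that assumes exactly this behaviour. It also shows why rows can differ in length when fill_value=None:

```python
import numpy as np
import pandas as pd

def iterrows_sketch(cols, fill_value=np.nan):
    """Hypothetical sketch: yield (label, row) pairs over the union of all indexes.

    With fill_value=None, labels missing in a column are simply left out,
    so the yielded rows can differ in length.
    """
    indexes = [s.index for s in cols.values()]
    union = indexes[0]
    for idx in indexes[1:]:
        union = union.union(idx)
    for label in union:
        row = {}
        for name, s in cols.items():
            if label in s.index:
                row[name] = s.loc[label]
            elif fill_value is not None:
                row[name] = fill_value
        yield label, pd.Series(row)

cols = {"a": pd.Series([1, 2], index=[0, 1]), "b": pd.Series([3], index=[1])}
rows = list(iterrows_sketch(cols))
sparse = list(iterrows_sketch(cols, fill_value=None))
```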
mask(cond, other=nan, inplace=False)
Replace values where the condition is True.
- Parameters
cond (bool DictOfSeries, Series, array-like, or callable) – Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the DictOfSeries and should return boolean DictOfSeries or array. The callable must not change input DictOfSeries (though dios doesn’t check it). If cond is a bool Series, every column is (row-)aligned against it, before the boolean values are evaluated. Missing indices are treated like False values.
other (scalar, Series, DictOfSeries, or callable) – Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the DictOfSeries and should return scalar or DictOfSeries. The callable must not change input DictOfSeries (though dios doesn’t check it). If other is a Series, every column is (row-)aligned against it, before the values are written. NaNs are written for missing indices.
inplace (bool, default False) – Whether to perform the operation in place on the data.
- Returns
- Return type
See also
where()
Mask data where condition is False
reduce_columns(func, initial=None, skipna=False)
Reduce all columns to a single pandas.Series by a given function.
Apply a function of two pandas.Series as arguments, cumulatively to all columns, from left to right, so as to reduce the columns to a single pandas.Series. If initial is present, it is placed before the columns in the calculation, and serves as a default when the columns are empty.
- Parameters
func (function) – The function must take two identically indexed pandas.Series and should return a single pandas.Series with the same index.
initial (column-label or pd.Series, default None) – The series to start with. If None, a dummy series is created with the indices of all columns and the first seen values.
skipna (bool, default False) – If True, skip NaN values.
- Returns
A series with the reduced result and the index of the start series, defined by initial.
- Return type
pandas.Series
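The left-to-right folding can be sketched with functools.reduce. `reduce_columns_sketch` is hypothetical, and it rests on several assumptions:

- It assumes initial=None.
- It builds the start series from the first seen value per label, as described above.
- It ignores skipna.
- It uses np.fmax (element-wise maximum) only as an example func.

```python
import functools

import numpy as np
import pandas as pd

def reduce_columns_sketch(cols, func):
    """Hypothetical sketch of reduce_columns(func) with initial=None.

    The start series holds the first seen (non-NaN) value for every label of
    the union index; each column is aligned to that index and folded in from
    left to right with func.
    """
    series = list(cols.values())
    start = pd.concat(series).groupby(level=0).first()
    aligned = [s.reindex(start.index) for s in series]
    return functools.reduce(func, aligned, start)

cols = {
    "a": pd.Series([1.0, 5.0], index=[0, 1]),
    "b": pd.Series([3.0, 2.0], index=[1, 2]),
}
# np.fmax takes the element-wise maximum and ignores NaN where possible.
result = reduce_columns_sketch(cols, np.fmax)
```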
to_csv(*args, **kwargs)
Write object to a comma-separated values (csv) file.
Changed in version 0.24.0: The order of arguments for Series was changed.
- Parameters
path_or_buf (str or file handle, default None) –
File path or object, if None is provided the result is returned as a string. If a file object is passed it should be opened with newline=’’, disabling universal newlines.
Changed in version 0.24.0: Was previously named “path” for Series.
sep (str, default ',') – String of length 1. Field delimiter for the output file.
na_rep (str, default '') – Missing data representation.
float_format (str, default None) – Format string for floating point numbers.
columns (sequence, optional) – Columns to write.
header (bool or list of str, default True) –
Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
Changed in version 0.24.0: Previously defaulted to False for Series.
index (bool, default True) – Write row names (index).
index_label (str or sequence, or False, default None) – Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
mode (str) – Python write mode, default ‘w’.
encoding (str, optional) – A string representing the encoding to use in the output file, defaults to ‘utf-8’.
compression (str or dict, default 'infer') –
If str, represents compression mode. If dict, value at ‘method’ is the compression mode. Compression mode may be any of the following possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression mode is ‘infer’ and path_or_buf is path-like, then detect compression mode from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’ or ‘.xz’. (otherwise no compression). If dict given and mode is one of {‘zip’, ‘gzip’, ‘bz2’}, or inferred as one of the above, other entries passed as additional compression options.
Changed in version 1.0.0: May now be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.
Changed in version 1.1.0: Passing compression options as keys in dict is supported for compression modes ‘gzip’ and ‘bz2’ as well as ‘zip’.
quoting (optional constant from csv module) – Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
quotechar (str, default '"') – String of length 1. Character used to quote fields.
line_terminator (str, optional) –
The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (‘\n’ for Linux, ‘\r\n’ for Windows, i.e.).
Changed in version 0.24.0.
chunksize (int or None) – Rows to write at a time.
date_format (str, default None) – Format string for datetime objects.
doublequote (bool, default True) – Control quoting of quotechar inside a field.
escapechar (str, default None) – String of length 1. Character used to escape sep and quotechar when appropriate.
decimal (str, default '.') – Character recognized as decimal separator. E.g. use ‘,’ for European data.
errors (str, default 'strict') –
Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
New in version 1.1.0.
- Returns
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
- Return type
None or str
See also
read_csv()
Load a CSV file into a DataFrame.
to_excel()
Write DataFrame to an Excel file.
Examples
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
...                    'mask': ['red', 'purple'],
...                    'weapon': ['sai', 'bo staff']})
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
Create ‘out.zip’ containing ‘out.csv’:
>>> compression_opts = dict(method='zip',
...                         archive_name='out.csv')
>>> df.to_csv('out.zip', index=False,
...           compression=compression_opts)
to_df(how='outer')
Transform DictOfSeries to a pandas.DataFrame.
Because a pandas.DataFrame cannot handle Series of different length, but DictOfSeries can, the missing data is filled with NaNs or is dropped, depending on the keyword how.
- Parameters
how ({'outer', 'inner'}, default 'outer') –
Defines how the resulting DataFrame index is generated:
‘outer’: The indices of all columns, merged into one index, are used. If a column misses values at a new index location, NaNs are filled in.
‘inner’: Only indices that are present in all columns are used. No filling is needed, but values are dropped if a column has indices that are not present in all other columns.
- Returns
transformed data
- Return type
pandas.DataFrame
Examples
Missing data locations are filled with NaNs:
>>> a = pd.Series(11, index=range(2))
>>> b = pd.Series(22, index=range(3))
>>> c = pd.Series(33, index=range(1,9,3))
>>> di = DictOfSeries(dict(a=a, b=b, c=c))
>>> di
    a |     b |     c |
===== | ===== | ===== |
0  11 | 0  22 | 1  33 |
1  11 | 1  22 | 4  33 |
      | 2  22 | 7  33 |
>>> di.to_df()
columns     a     b     c
0        11.0  22.0   NaN
1        11.0  22.0  33.0
2         NaN  22.0   NaN
4         NaN   NaN  33.0
7         NaN   NaN  33.0
or dropped if how='inner':
>>> di.to_df(how='inner')
columns   a   b   c
1        11  22  33
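The two modes of how correspond to an outer join and an inner join of the row indexes. The following pandas-only sketch reproduces the example above with pd.concat. It is not necessarily how to_df is implemented:

```python
import pandas as pd

a = pd.Series(11, index=range(2))
b = pd.Series(22, index=range(3))
c = pd.Series(33, index=range(1, 9, 3))
cols = {"a": a, "b": b, "c": c}

# how='outer': union of all indexes, gaps become NaN (so ints turn into floats).
outer = pd.concat(cols, axis=1, join="outer", sort=True)
# how='inner': only labels present in every column, nothing has to be filled.
inner = pd.concat(cols, axis=1, join="inner")
```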
to_dios()
A dummy to allow unconditional to_dios calls on pd.DataFrame, pd.Series and dios.DictOfSeries.
to_string(max_rows=None, min_rows=None, max_cols=None, na_rep='NaN', show_dimensions=False, method='indexed', no_value=' ', empty_series_rep='no data', col_delim=' | ', header_delim='=', col_space=None)
Pretty print a dios.
- if method == indexed (default):
every column is represented by its own index and the corresponding values
- if method == aligned [2]:
one (!) global index is generated and values from a column appear at the corresponding index location.
- Parameters
max_cols – no more than max_cols columns are printed [1]
max_rows – see min_rows [1]
min_rows – no more than min_rows rows are printed if the rows of any series exceed max_rows [1]
na_rep – all NaN values are replaced by na_rep. Default NaN
empty_series_rep – Ignored if not method=‘indexed’. Empty series are represented by the string in empty_series_rep
col_delim (str) – Ignored if not method=‘indexed’. col_delim is inserted between all columns.
header_delim (str or None) – Ignored if not method=‘indexed’. If not None, header_delim is inserted between the column names (header) and the data. The string is repeated up to the width of the column.
no_value – Ignored if not method=‘aligned’. Value that indicates that no entry is present in the underlying series. Bear in mind that this should differ from na_rep; otherwise you cannot distinguish missing values from NaN values.
Notes
[1]: defaults to the corresponding value in dios_options
[2]: the common parameters are directly passed to pd.DataFrame.to_string(..) under the hood if method is aligned
where(cond, other=nan, inplace=False)
Replace values where the condition is False.
- Parameters
cond (bool DictOfSeries, Series, array-like, or callable) – Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the DictOfSeries and should return boolean DictOfSeries or array. The callable must not change input DictOfSeries (though dios doesn’t check it). If cond is a bool Series, every column is (row-)aligned against it, before the boolean values are evaluated. Missing indices are treated like False values.
other (scalar, Series, DictOfSeries, or callable) – Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the DictOfSeries and should return scalar or DictOfSeries. The callable must not change input DictOfSeries (though dios doesn’t check it). If other is a Series, every column is (row-)aligned against it, before the values are written. NaNs are written for missing indices.
inplace (bool, default False) – Whether to perform the operation in place on the data.
- Returns
- Return type
See also
mask()
Mask data where condition is True
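This sketch covers the case where cond is a boolean Series. In that case where() and mask() share one alignment rule: labels missing from cond count as False. The rule is shown per column. `where_column` and `mask_column` are hypothetical helpers built on pandas.Series.where and pandas.Series.mask, not dios functions:

```python
import numpy as np
import pandas as pd

def where_column(s, cond, other=np.nan):
    """Keep values where the aligned condition is True, replace the rest."""
    # Labels missing from `cond` are treated like False.
    aligned = cond.reindex(s.index, fill_value=False).astype(bool)
    return s.where(aligned, other)

def mask_column(s, cond, other=np.nan):
    """Replace values where the aligned condition is True, keep the rest."""
    aligned = cond.reindex(s.index, fill_value=False).astype(bool)
    return s.mask(aligned, other)

s = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
cond = pd.Series([True, False], index=[0, 1])  # label 2 is missing

kept = where_column(s, cond)    # 1.0, NaN, NaN
masked = mask_column(s, cond)   # NaN, 2.0, 3.0
```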
example_DictOfSeries
dios.example_DictOfSeries()
Return an example dios.
- Returns
an example
- Return type
DictOfSeries
Examples
>>> from dios import example_DictOfSeries
>>> di = example_DictOfSeries()
>>> di
    a |      b |      c |     d |
===== | ====== | ====== | ===== |
0   0 |  2   5 |  4   7 |  6  0 |
1   7 |  3   6 |  5  17 |  7  1 |
2  14 |  4   7 |  6  27 |  8  2 |
3  21 |  5   8 |  7  37 |  9  3 |
4  28 |  6   9 |  8  47 | 10  4 |
5  35 |  7  10 |  9  57 | 11  5 |
6  42 |  8  11 | 10  67 | 12  6 |
7  49 |  9  12 | 11  77 | 13  7 |
8  56 | 10  13 | 12  87 | 14  8 |
9  63 | 11  14 | 13  97 | 15  9 |
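To experiment without dios, the same columns can be rebuilt from plain pandas Series read off the table above. The to_df example passes dict(a=a, b=b, c=c) to DictOfSeries. Passing this dict the same way should presumably produce the same object, but that call is not exercised here. The DataFrame at the end only shows how sparse a single shared index would be:

```python
import pandas as pd

# Column values and indexes as shown in the example table.
cols = {
    "a": pd.Series(range(0, 70, 7), index=range(0, 10)),
    "b": pd.Series(range(5, 15), index=range(2, 12)),
    "c": pd.Series(range(7, 107, 10), index=range(4, 14)),
    "d": pd.Series(range(10), index=range(6, 16)),
}

# Forcing the columns onto one shared index: the union has 16 labels,
# but every column only covers 10 of them.
df = pd.DataFrame(cols)
```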
Most of the magic happens when getting and setting elements. To select any combination of columns and rows, read the documentation about indexing:
Pandas-like indexing
[] and .loc[], .iloc[] and .at[], .iat[] should behave exactly like their counterparts from pandas.DataFrame. They can take as indexer:
lists, array-like objects and in general all iterables
boolean lists and iterables
slices
scalars and any hashable object
Most indexers are directly passed to the underlying column series or row series, depending on the position of the indexer and the complexity of the operation. For .loc, .iloc, .at and .iat, the first position is the row indexer and the second the column indexer. The second can be omitted and will default to slice(None). Examples:
di.loc[[1,2,3], ['a']] : select labels 1, 2, 3 from column a
di.iloc[[1,2,3], [0,3]] : select positions 1, 2, 3 from the columns 0 and 3
di.loc[:, 'a':'c'] : select all rows from columns a to c
di.at[4,'c'] : select the element with label 4 in column c
di.loc[:] -> di.loc[:,:] : select everything.
Scalar indexing always returns a pandas Series if the other indexer is a non-scalar. If both indexers are scalars, the element itself is returned. In all other cases a dios is returned. For more pandas-like indexing magic and the differences between the indexers, see the pandas documentation.
Note:
In contrast to pandas.DataFrame, .loc[:] and .loc[:, :] always behave identically. The same applies to iloc and aloc. For example, for two pandas.DataFrames df1 and df2 with different columns, df1.loc[:, :] = df2 does align the columns, but df1.loc[:] = df2 does not.
Whether this is the desired behavior or a bug, I couldn’t verify so far. – Bert Palm
2D-indexer
dios[boolean dios-like] (as a single key): dios accepts a boolean 2D-indexer (boolean pandas.DataFrame or boolean Dios).
Columns and rows from the indexer align with the dios. This means that only matching columns are selected, and in these columns, rows are selected where i) the indices match and ii) the value in the boolean indexer is True. There is no difference between missing indices and present indices with False values.
Values from unselected rows and columns are dropped, but empty columns are still preserved, so the resulting Dios always has the same column dimension as the initial dios.
Note: This is exactly the same behavior as pandas.DataFrame’s handling of a 2D-indexer, except that pandas.DataFrame fills in numpy.nan at missing locations and therefore also fills up whole missing columns with numpy.nan.
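The alignment rules for a boolean 2D-indexer can be sketched on plain dicts of Series. `select_with_bool_2d` is a hypothetical helper that models the boolean indexer as a dict of boolean Series. It mirrors the behaviour described above and is not the dios code:

```python
import pandas as pd

def select_with_bool_2d(cols, mask_cols):
    """Hypothetical sketch of dios[boolean 2D-indexer] on plain dicts of Series."""
    result = {}
    for name, s in cols.items():
        m = mask_cols.get(name)
        if m is None:
            # Column absent from the indexer: nothing selected, column kept empty.
            result[name] = s.iloc[0:0]
            continue
        # Row labels missing from the indexer behave exactly like False.
        aligned = m.reindex(s.index, fill_value=False).astype(bool)
        result[name] = s[aligned]
    return result

cols = {
    "a": pd.Series([1, 2, 3], index=[0, 1, 2]),
    "b": pd.Series([4, 5], index=[0, 1]),
}
# Boolean indexer: label 2 is missing for 'a', column 'b' is missing entirely.
mask_cols = {"a": pd.Series([True, False], index=[0, 1])}
selected = select_with_bool_2d(cols, mask_cols)
```

The result keeps both columns, so the column dimension matches the input, as stated above.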
setting values
Setting values with [] and .loc[], .iloc[] and .at[], .iat[] works like in pandas.
With .at/.iat only single items can be set; for the other indexers, the right-hand side values can be:
scalars: these are broadcast to the selected positions
lists: the length of the list must match the number of indexed columns. The items can be anything that can be applied to a series with the respective indexing method (loc, iloc, []).
dios: the number of columns must match the number of indexed columns. Columns do not align, they are just iterated. Rows do align: rows that are present on the right but not on the left are ignored. Rows that are present on the left (bear in mind: these rows were explicitly chosen for writing!) but not on the right are filled with NaNs, like in pandas.
pandas.Series: the column indexer must be a scalar(!); the series is passed down and set with loc, iloc or [] by the pandas Series, where it may align, depending on the method.
Examples:
dios.loc[2:5, 'a'] = [1,2,3] is the same as a=dios['a']; a.loc[2:5]=[1,2,3]; dios['a']=a
dios.loc[2:5, :] = 99 : set 99 on rows 2 to 5 on all columns
Special indexer .aloc
In addition to the pandas-like indexers, we have an .aloc[..] (align locator) indexing method. Unlike with .iloc and .loc, indexers fully align if possible, and 1D array-likes can be broadcast to multiple columns at once. This method also handles missing indexer items gracefully.
It is used like .loc, so a single indexer (.aloc[indexer]) or a tuple of row indexer and column indexer (.aloc[row-indexer, column-indexer]) can be given. It can also handle boolean and non-boolean 2D-indexers.
The main purposes of .aloc are:
to select gracefully, so that rows or columns that were given as indexer but don’t exist do not raise an error
to align series/dios indexers
vertical broadcasting, i.e. setting multiple columns at once with a list-like value
Aloc usage
aloc is called like loc, either with a single key that acts as row indexer, aloc[rowkey], or with a tuple of row indexer and column indexer, aloc[rowkey, columnkey]. A 2D-indexer (like a dios or df) can also be given, but only as a single key, like .aloc[2D-indexer], or with the special column key ..., the ellipsis (.aloc[2D-indexer, ...]). The ellipsis may change how the 2D-indexer is interpreted; this will be explained later in detail.
If a normal (non-2D) row indexer is given but no column indexer, the latter defaults to :, i.e. slice(None), so .aloc[row-indexer] becomes .aloc[row-indexer, :], which means that all columns are used.
In general, a normal row indexer is applied to every column that was chosen by the column indexer, but for each column separately.
A first example may give a rough idea:
>>> s = pd.Series([11] * 4 )
>>> di = DictOfSeries(dict(a=s[:2]*6, b=s[2:4]*7, c=s[:2]*8, d=s[1:3]*9))
>>> di
a | b | c | d |
===== | ===== | ===== | ===== |
0 66 | 2 77 | 0 88 | 1 99 |
1 66 | 3 77 | 1 88 | 2 99 |
>>> di.aloc[[1,2], ['a', 'b', 'd', 'x']]
a | b | d |
===== | ===== | ===== |
1 66 | 2 77 | 1 99 |
| | 2 99 |
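The per-column behaviour of this example can be mimicked on a plain dict of pandas Series. aloc_sketch is a hypothetical helper that only covers a list row indexer combined with a list column indexer; it illustrates the semantics and does not reproduce the dios implementation.

```python
import pandas as pd


def aloc_sketch(dos, rows, cols):
    # Hypothetical helper: unknown column labels are skipped, and the row
    # labels are applied to each column separately, keeping only the labels
    # that actually exist in that column's index.
    result = {}
    for name, s in dos.items():
        if name not in cols:
            continue
        result[name] = s[s.index.isin(rows)]
    return result


s = pd.Series([11] * 4)
dos = dict(a=s[:2] * 6, b=s[2:4] * 7, c=s[:2] * 8, d=s[1:3] * 9)
out = aloc_sketch(dos, [1, 2], ["a", "b", "d", "x"])
# out: a -> {1: 66}, b -> {2: 77}, d -> {1: 99, 2: 99}; 'x' is silently ignored
```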
The return type¶
Unlike the other two indexer methods, loc and iloc, aloc never returns a single item;
the return type is either a pandas.Series, iff the column indexer is a single key (e.g. 'a'), or a dios otherwise.
The row indexer plays no role in the choice of return type.
Note for the curious:
This is because a scalar (.aloc[key]) is translated to .loc[key:key] under the hood.
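The difference between a scalar lookup and the key:key slice can be seen on a plain pandas Series. This sketch uses only standard pandas behaviour: a label slice on a sorted index returns a (possibly empty) Series and does not raise for a missing label.

```python
import pandas as pd

s = pd.Series([66, 66], index=[0, 1])

# Plain scalar lookup: returns one value, or raises for a missing label.
value = s.loc[1]
try:
    s.loc[5]
    missing_raised = False
except KeyError:
    missing_raised = True

# The key:key slice form always returns a Series, possibly empty, and does
# not raise for a missing label on a sorted index.
hit = s.loc[1:1]    # Series holding only label 1
miss = s.loc[5:5]   # empty Series
```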
Indexer types¶
The .aloc-specific indexers are listed below. Any indexer that is not listed (slice, boolean lists, …)
but is known to work with .loc is treated as if it were passed to .loc, which is what actually happens
under the hood.
Some indexers are linked to later sections, where a more detailed explanation and examples are given.
Special column indexers are:
list / array-like (or any iterable object): Only labels that are present in the columns are used, others are ignored.
pd.Series: .values are taken from the series and handled like a list.
scalar (or any hashable obj): Select a single column if the label is present, otherwise nothing.
Special row indexers are:
list / array-like (or any iterable object): Only rows whose indices are present in the index of the column are used, others are ignored. A dios is returned.
scalar (or any hashable obj): Select a single row from a column if the value is present in the index of the column, otherwise nothing is selected. [1]
pd.Series: align the index of the given Series with the column, which means only common indices are used. The actual values of the series are ignored(!).
boolean pd.Series: like pd.Series, but only True values are evaluated. False values are equivalent to missing indices. To treat a boolean series as a normal indexer series, as described above, one can use .aloc(usebool=False)[boolean pd.Series].
Special 2D indexers are:
.aloc[boolean dios-like]: works the same as di[boolean dios-like] (see there). In brief: full alignment; items are selected where the index is present and the value is True.
.aloc[dios-like, ...] (with Ellipsis): align in columns and rows, ignore its values. Per common column, the common indices are selected. The ellipsis forces aloc to ignore the values, so a boolean dios can be treated as a non-boolean one. Alternatively, .aloc(usebool=False)[boolean dios-like] can be used. [2]
.aloc[nested list-like]: The inner lists are used as aloc list row indexers (see there) on all columns. One list per column, which implies that the outer list has the same length as the number of columns.
Special handling of 1D values
Values that are list- or array-like, which includes pd.Series, are set on all selected columns. A pd.Series aligns
the same way it does in s1.loc[:] = s2. See also the cookbook.
Aloc overview table¶
example | type | on | like .loc | handling | conditions / hints | link |
---|---|---|---|---|---|---|
Column indexer | | | | | | |
.aloc[any, 'a'] | scalar | columns | no | select graceful | - | cols |
.aloc[any, 'b':'z'] | slice | columns | yes | slice | - | cols |
.aloc[any, ['a','c']] | list-like | columns | no | filter graceful | - | cols |
.aloc[any, [True,False]] | bool list-like | columns | yes | take True's | length must match number of columns | cols |
.aloc[any, s] | Series | columns | no | like list | only s.values are evaluated | cols |
.aloc[any, bs] | bool Series | columns | yes | like bool-list | see there | cols |
Row indexer | | | | | | |
.aloc[7, any] | scalar | rows | no | translate to .loc[key:key] | - | rows |
.aloc[3:42, any] | slice | rows | yes | slice | - | |
.aloc[[1,2,24], any] | list-like | rows | no | filter graceful | - | rows |
.aloc[[True,False], any] | bool list-like | rows | yes | take True's | length must match number of (all selected) columns | blist |
.aloc[s, any] | Series | rows | no | like .loc[s.index] | - | ser |
.aloc[bs, any] | bool Series | rows | no | align + just take True's | evaluate usebool-keyword | ser |
.aloc[[[s],[1,2,3]], any] | nested list-like | both | ? | one row-indexer per column | outer length must match number of (selected) columns | nlist |
2D-indexer | | | | | | |
.aloc[di] | dios-like | both | no | full align | - | |
.aloc[di, ...] | dios-like | both | no | full align | ellipsis has no effect | |
.aloc[di>5] | bool dios-like | both | no | full align + take True's | evaluate usebool-keyword | |
.aloc[di>5, ...] | (bool) dios-like | both | no | full align, no bool evaluation | - | |
Example dios¶
Unless stated otherwise, the dios used in the examples looks like this:
>>> dictofser
a | b | c | d |
===== | ====== | ====== | ===== |
0 0 | 2 5 | 4 7 | 6 0 |
1 7 | 3 6 | 5 17 | 7 1 |
2 14 | 4 7 | 6 27 | 8 2 |
3 21 | 5 8 | 7 37 | 9 3 |
4 28 | 6 9 | 8 47 | 10 4 |
5 35 | 7 10 | 9 57 | 11 5 |
6 42 | 8 11 | 10 67 | 12 6 |
7 49 | 9 12 | 11 77 | 13 7 |
8 56 | 10 13 | 12 87 | 14 8 |
or the short version:
>>> di
a | b | c | d |
===== | ==== | ===== | ===== |
0 0 | 2 5 | 4 7 | 6 0 |
1 7 | 3 6 | 5 17 | 7 1 |
2 14 | 4 7 | 6 27 | 8 2 |
3 21 | 5 8 | 7 37 | 9 3 |
4 28 | 6 9 | 8 47 | 10 4 |
The example dios can be obtained via a function:
>>> from dios import example_DictOfSeries
>>> mydios = example_DictOfSeries()
or be generated manually like this:
>>> a = pd.Series(range(0, 70, 7))
>>> b = pd.Series(range(5, 15, 1))
>>> c = pd.Series(range(7, 107, 10))
>>> d = pd.Series(range(0, 10, 1))
>>> for i, s in enumerate([a,b,c,d]): s.index += i*2
>>> dictofser = DictOfSeries(dict(a=a, b=b, c=c, d=d))
>>> di = dictofser[:5]
Select columns, gracefully¶
One can use .aloc[:, key] to select single columns gracefully.
The underlying pandas.Series is returned if the key exists.
Otherwise an empty pandas.Series with dtype=object is returned.
>>> di.aloc[:, 'a']
0 0
1 7
2 14
3 21
4 28
Name: a, dtype: int64
>>> di.aloc[:, 'x']
Series([], dtype: object)
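The graceful single-column selection can be mimicked with a plain dict of pandas Series holding the same data as the short example dios. select_column is a hypothetical helper used for illustration only.

```python
import pandas as pd

# Plain-dict copy of the short example dios `di`: one pd.Series per column,
# each index shifted by 0, 2, 4 and 6 as in the example.
SPEC = {"a": (0, 7), "b": (5, 1), "c": (7, 10), "d": (0, 1)}  # (start, step)
di = {
    name: pd.Series([start + step * k for k in range(5)], index=range(2 * i, 2 * i + 5))
    for i, (name, (start, step)) in enumerate(SPEC.items())
}


def select_column(dos, key):
    # Hypothetical helper mimicking .aloc[:, key]: return the column if the
    # label exists, otherwise an empty object Series instead of a KeyError.
    if key in dos:
        return dos[key]
    return pd.Series([], dtype=object)


col_a = select_column(di, "a")  # values 0, 7, 14, 21, 28 on index 0..4
col_x = select_column(di, "x")  # empty Series, dtype object
```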
Multiple columns
Just like selecting single columns gracefully, but with an array-like indexer. A dios with the subset of existing columns is returned. If no key is present, an empty dios is returned.
>>> di.aloc[:, ['c', 99, None, 'a', 'x', 'y']]
a | c |
===== | ===== |
0 0 | 4 7 |
1 7 | 5 17 |
2 14 | 6 27 |
3 21 | 7 37 |
4 28 | 8 47 |
>>> di.aloc[:, ['x', 'y']]
Empty DictOfSeries
Columns: []
>>> s = pd.Series(dict(a='a', b='x', c='c', foo='d'))
>>> di.aloc[:, s]
a | c | d |
===== | ===== | ===== |
0 0 | 4 7 | 6 0 |
1 7 | 5 17 | 7 1 |
2 14 | 6 27 | 8 2 |
3 21 | 7 37 | 9 3 |
4 28 | 8 47 | 10 4 |
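A minimal sketch of this graceful multi-column selection on a plain dict of pandas Series. select_columns is hypothetical; note that, matching the outputs above, the result keeps the column order of the container rather than the order of the indexer, and only the values of a Series indexer are used.

```python
import pandas as pd

# Plain-dict copy of the short example dios `di`.
SPEC = {"a": (0, 7), "b": (5, 1), "c": (7, 10), "d": (0, 1)}  # (start, step)
di = {
    name: pd.Series([start + step * k for k in range(5)], index=range(2 * i, 2 * i + 5))
    for i, (name, (start, step)) in enumerate(SPEC.items())
}


def select_columns(dos, keys):
    # Hypothetical helper mimicking .aloc[:, list-like]: labels that are not
    # columns are skipped, and the result keeps the dict's own column order.
    # For a pd.Series indexer only its values are used, its index is ignored.
    if isinstance(keys, pd.Series):
        keys = keys.tolist()
    wanted = list(keys)
    return {name: s for name, s in dos.items() if name in wanted}


picked = select_columns(di, ["c", 99, None, "a", "x", "y"])  # columns a, c
nothing = select_columns(di, ["x", "y"])                      # {}
by_values = select_columns(di, pd.Series(dict(a="a", b="x", c="c", foo="d")))  # a, c, d
```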
Boolean indexing, indexing with pd.Series and slice indexer
A boolean indexer, for example [True, False, True, False], must have the same length as the number
of columns; then only the columns where the indexer has a True value are selected.
If the key is a pandas.Series, its values are used for indexing; in particular, the Series' index is ignored. If the series has boolean values, it is treated like a boolean indexer, otherwise it is treated like an array-like indexer.
An easy way to select all columns is to use null slices, like .aloc[:,:], or even simpler .aloc[:].
This is just like one would do with loc or iloc. Of course, slicing with boundaries also works,
e.g. .aloc[:, 'a':'f'].
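The boolean and slice column indexers can be sketched on a plain dict of pandas Series as well. select_columns_bool and slice_columns are hypothetical helpers; the slice variant relies on pandas' Index.slice_indexer, which accepts bounds that are not present as long as the labels are sorted.

```python
import pandas as pd

# Plain-dict copy of the short example dios `di`.
SPEC = {"a": (0, 7), "b": (5, 1), "c": (7, 10), "d": (0, 1)}  # (start, step)
di = {
    name: pd.Series([start + step * k for k in range(5)], index=range(2 * i, 2 * i + 5))
    for i, (name, (start, step)) in enumerate(SPEC.items())
}


def select_columns_bool(dos, mask):
    # Hypothetical helper mimicking .aloc[:, boolean list-like]: one flag per
    # column, in column order. A boolean pd.Series contributes only its values.
    if isinstance(mask, pd.Series):
        mask = mask.tolist()
    if len(mask) != len(dos):
        raise IndexError(f"expected {len(dos)} flags, got {len(mask)}")
    return {name: s for (name, s), keep in zip(dos.items(), mask) if keep}


def slice_columns(dos, start, stop):
    # Label slice over the column names, inclusive like .loc; the bounds
    # need not exist as long as the column labels are sorted.
    names = pd.Index(list(dos))
    return {name: dos[name] for name in names[names.slice_indexer(start, stop)]}


flagged = select_columns_bool(di, [True, False, True, False])  # columns a, c
bounded = slice_columns(di, "a", "f")                          # all four columns
```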
Selecting rows the smart way¶
For scalar and array-like indexers with label values, the keys are handled gracefully, just like with array-like column indexers.
>>> di.aloc[1]
a | b | c | d |
==== | ======= | ======= | ======= |
1 7 | no data | no data | no data |
>>> di.aloc[99]
Empty DictOfSeries
Columns: ['a', 'b', 'c', 'd']
>>> di.aloc[[3,6,7,18]]
a | b | c | d |
===== | ==== | ===== | ==== |
3 21 | 3 6 | 6 27 | 6 0 |
| 6 9 | 7 37 | 7 1 |
The length of columns can differ:
>>> di.aloc[[3,6,7,18]].aloc[[3,6]]
a | b | c | d |
===== | ==== | ===== | ==== |
3 21 | 3 6 | 6 27 | 6 0 |
| 6 9 | | |
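Graceful row selection boils down to filtering each column's index separately. In this sketch on a plain dict of pandas Series, select_rows is a hypothetical helper: list-likes are filtered with Index.isin, and a scalar is turned into the key:key slice mentioned in the return-type note.

```python
import pandas as pd

# Plain-dict copy of the short example dios `di`.
SPEC = {"a": (0, 7), "b": (5, 1), "c": (7, 10), "d": (0, 1)}  # (start, step)
di = {
    name: pd.Series([start + step * k for k in range(5)], index=range(2 * i, 2 * i + 5))
    for i, (name, (start, step)) in enumerate(SPEC.items())
}


def select_rows(dos, keys):
    # Hypothetical helper mimicking .aloc[scalar] and .aloc[list-like] on rows:
    # every column keeps only the labels it actually has; missing ones are skipped.
    if pd.api.types.is_scalar(keys):
        # a scalar key behaves like the slice key:key, so it never raises
        return {name: s.loc[keys:keys] for name, s in dos.items()}
    labels = list(keys)
    return {name: s[s.index.isin(labels)] for name, s in dos.items()}


picked = select_rows(di, [3, 6, 7, 18])
chained = select_rows(picked, [3, 6])   # columns may end up with different lengths
one = select_rows(di, 1)                # only column a has label 1
```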
Boolean array-likes as row indexer¶
For array-like indexers that hold boolean values, the length of the indexer must match the length of every column to be indexed.
>>> di.aloc[[True,False,False,True,False]]
a | b | c | d |
===== | ==== | ===== | ==== |
0 0 | 2 5 | 4 7 | 6 0 |
3 21 | 5 8 | 7 37 | 9 3 |
If the lengths do not match, an IndexError is raised:
>>> di.aloc[[True,False,False]]
Traceback (most recent call last):
...
IndexError: failed for column a: Boolean index has wrong length: 3 instead of 5
This can be tricky, especially if the columns have different lengths:
>>> difflen
a | b | c | d |
===== | ==== | ===== | ==== |
0 0 | 2 5 | 4 7 | 6 0 |
1 7 | 3 6 | 6 27 | 7 1 |
2 14 | 4 7 | | 8 2 |
>>> difflen.aloc[[False,True,False]]
Traceback (most recent call last):
...
IndexError: Boolean index has wrong length: 3 instead of 2
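The positional nature of a boolean row mask, and why uneven columns break it, can be sketched on a plain dict of pandas Series. select_rows_bool is a hypothetical helper; its error message is its own and not the one dios produces.

```python
import numpy as np
import pandas as pd

# Plain-dict copy of the short example dios `di`.
SPEC = {"a": (0, 7), "b": (5, 1), "c": (7, 10), "d": (0, 1)}  # (start, step)
di = {
    name: pd.Series([start + step * k for k in range(5)], index=range(2 * i, 2 * i + 5))
    for i, (name, (start, step)) in enumerate(SPEC.items())
}


def select_rows_bool(dos, mask):
    # Hypothetical helper mimicking .aloc[boolean list-like] on rows: the same
    # positional mask is applied to every column, so its length has to equal
    # the length of each column.
    flags = np.asarray(mask, dtype=bool)
    result = {}
    for name, s in dos.items():
        if len(flags) != len(s):
            raise IndexError(
                f"column {name!r}: boolean index has length {len(flags)}, expected {len(s)}"
            )
        result[name] = s[flags]
    return result


sel = select_rows_bool(di, [True, False, False, True, False])
# first and fourth row of every column, e.g. a -> {0: 0, 3: 21}, b -> {2: 5, 5: 8}
```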
pandas.Series and boolean pandas.Series as row indexer¶
When a pandas.Series is used as row indexer with aloc, all its magic comes to light.
The index of the given series aligns itself with the index of each column separately and is used as a filter this way.
>>> s = di['b'] + 100
>>> s
2 105
3 106
4 107
5 108
6 109
Name: b, dtype: int64
>>> di.aloc[s]
a | b | c | d |
===== | ==== | ===== | ==== |
2 14 | 2 5 | 4 7 | 6 0 |
3 21 | 3 6 | 5 17 | |
4 28 | 4 7 | 6 27 | |
| 5 8 | | |
| 6 9 | | |
As seen in the example above, the series' values are ignored completely. The functionality
is similar to s1.loc[s2.index], where s1 and s2 are pandas.Series, s2 is the indexer and s1 is each column
in turn; unlike with plain .loc, labels of s2 that are missing in s1 do not raise an error.
If the indexer series holds boolean values, these are not ignored.
The series aligns the same way as explained above, but additionally only the True values are evaluated.
Thus False values are treated like missing indices. The behavior here is analogous to s1.loc[s2[s2].index].
>>> boolseries = di['b'] > 6
>>> boolseries
2 False
3 False
4 True
5 True
6 True
Name: b, dtype: bool
>>> di.aloc[boolseries]
a | b | c | d |
===== | ==== | ===== | ==== |
4 28 | 4 7 | 4 7 | 6 0 |
| 5 8 | 5 17 | |
| 6 9 | 6 27 | |
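Both cases can be sketched on a plain dict of pandas Series. select_rows_by_series is a hypothetical helper; the usebool parameter only imitates the usebool keyword mentioned above and is not the dios signature.

```python
import pandas as pd

# Plain-dict copy of the short example dios `di`.
SPEC = {"a": (0, 7), "b": (5, 1), "c": (7, 10), "d": (0, 1)}  # (start, step)
di = {
    name: pd.Series([start + step * k for k in range(5)], index=range(2 * i, 2 * i + 5))
    for i, (name, (start, step)) in enumerate(SPEC.items())
}


def select_rows_by_series(dos, indexer, usebool=True):
    # Hypothetical helper mimicking .aloc[pd.Series]: the indexer's index is
    # aligned with every column and its values are ignored, unless they are
    # boolean (and usebool is True), in which case only True labels count.
    if usebool and pd.api.types.is_bool_dtype(indexer):
        labels = indexer[indexer].index    # analogous to s2[s2].index
    else:
        labels = indexer.index             # analogous to s2.index
    # isin() instead of .loc[labels] so that missing labels are simply skipped
    return {name: s[s.index.isin(labels)] for name, s in dos.items()}


by_index = select_rows_by_series(di, di["b"] + 100)   # values ignored, labels 2..6 used
by_flags = select_rows_by_series(di, di["b"] > 6)     # only labels 4, 5, 6 used
```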
Evaluating boolean values is a very handy feature, as multiple conditions can easily be combined and written as a one-liner:
>>> di.aloc[di['b'] > 6]
a | b | c | d |
===== | ==== | ===== | ==== |
4 28 | 4 7 | 4 7 | 6 0 |
| 5 8 | 5 17 | |
| 6 9 | 6 27 | |
>>> di.aloc[(di['a'] > 6) & (di['b'] > 6)]
a | b | c | d |
===== | ==== | ==== | ======= |
4 28 | 4 7 | 4 7 | no data |
Note:
Nevertheless, something like
di.aloc[di['a'] > di['b']]
does not work, because the comparison fails unless the two series have the same index. In that case, DictOfSeries.index_of() might be worth checking out.
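Why one form works and the other does not follows from plain pandas rules, as this small check shows: logical operators like & align differently indexed boolean Series (labels present in only one operand end up False), whereas comparison operators like > refuse to compare differently labelled Series and raise a ValueError.

```python
import pandas as pd

# Plain-dict copy of the short example dios `di`.
SPEC = {"a": (0, 7), "b": (5, 1), "c": (7, 10), "d": (0, 1)}  # (start, step)
di = {
    name: pd.Series([start + step * k for k in range(5)], index=range(2 * i, 2 * i + 5))
    for i, (name, (start, step)) in enumerate(SPEC.items())
}

# Logical operators align differently-indexed boolean Series; labels present
# in only one operand end up False in the combined mask.
cond = (di["a"] > 6) & (di["b"] > 6)
common_true = list(cond[cond].index)

# Comparison operators, by contrast, refuse to align.
try:
    di["a"] > di["b"]
    comparison_failed = False
except ValueError:
    comparison_failed = True
```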
Nested-lists as row indexer¶
It is possible to pass different array-like indexers to different columns by using nested lists as indexer. The outer list's length must match the number of columns of the dios. The items of the outer list must all be array-like and not further nested, for example lists, pandas.Series, boolean lists or pandas.Series, numpy.arrays, … Every inner list-like item is applied as row indexer to the corresponding column.
>>> di
a | b | c | d |
===== | ==== | ===== | ===== |
0 0 | 2 5 | 4 7 | 6 0 |
1 7 | 3 6 | 5 17 | 7 1 |
2 14 | 4 7 | 6 27 | 8 2 |
3 21 | 5 8 | 7 37 | 9 3 |
4 28 | 6 9 | 8 47 | 10 4 |
>>> di.aloc[ [di['a'], [True,False,True,False,False], [], [7,8,10]] ]
a | b | c | d |
===== | ==== | ======= | ===== |
0 0 | 2 5 | no data | 7 1 |
1 7 | 4 7 | | 8 2 |
2 14 | | | 10 4 |
3 21 | | | |
4 28 | | | |
>>> ar = np.array([2,3])
>>> di.aloc[[ar, ar+1, ar+2, ar+3]]
a | b | c | d |
===== | ==== | ===== | ==== |
2 14 | 3 6 | 4 7 | 6 0 |
3 21 | 4 7 | 5 17 | |
Even though this looks like a 2D indexer (those are explained in the next section), it is not. In contrast to a 2D indexer, a column key can also be provided to pre-filter the columns.
>>> di.aloc[[ar, ar+1, ar+3], ['a','b','d']]
a | b | d |
===== | ==== | ==== |
2 14 | 3 6 | 6 0 |
3 21 | 4 7 | |
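A sketch of the nested-list indexer on a plain dict of pandas Series: one inner indexer per column, paired by position. select_rows_nested is hypothetical; how it dispatches between boolean masks, Series and label lists is an assumption chosen to reproduce the outputs above.

```python
import numpy as np
import pandas as pd

# Plain-dict copy of the short example dios `di`.
SPEC = {"a": (0, 7), "b": (5, 1), "c": (7, 10), "d": (0, 1)}  # (start, step)
di = {
    name: pd.Series([start + step * k for k in range(5)], index=range(2 * i, 2 * i + 5))
    for i, (name, (start, step)) in enumerate(SPEC.items())
}


def select_rows_nested(dos, indexers):
    # Hypothetical helper mimicking .aloc[nested list-like]: one row indexer
    # per column, paired by position. Boolean lists act as positional masks,
    # a pd.Series contributes its index (only True labels if it is boolean),
    # and anything else is a list of labels; missing labels are skipped.
    if len(indexers) != len(dos):
        raise ValueError("outer length must match the number of columns")
    result = {}
    for (name, s), idx in zip(dos.items(), indexers):
        if isinstance(idx, pd.Series):
            labels = idx[idx].index if pd.api.types.is_bool_dtype(idx) else idx.index
            result[name] = s[s.index.isin(labels)]
            continue
        arr = np.asarray(idx)
        if arr.dtype == bool:
            result[name] = s[arr]
        else:
            result[name] = s[s.index.isin(arr)]
    return result


mixed = select_rows_nested(di, [di["a"], [True, False, True, False, False], [], [7, 8, 10]])
ar = np.array([2, 3])
shifted = select_rows_nested(di, [ar, ar + 1, ar + 2, ar + 3])
# pre-filtering the columns first, like the column key in the example above
prefiltered = select_rows_nested({k: di[k] for k in ["a", "b", "d"]}, [ar, ar + 1, ar + 3])
```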
The power of 2D-indexer¶
Overview:
.aloc[bool-dios] | 1. align columns, 2. align rows, 3. just take True's -- [1] |
.aloc[dios, ...] (use Ellipsis) | 1. align columns, 2. align rows, (3.) ignore values -- [1] |
[1] evaluate usebool-keyword
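Since the detailed description of this section is still marked as to-do, the following is only a sketch of the three steps listed in the overview, on a plain dict of pandas Series. select_by_2d and its usebool parameter are hypothetical stand-ins for .aloc[dios-like] and .aloc[dios-like, ...].

```python
import pandas as pd

# Plain-dict copy of the short example dios `di`.
SPEC = {"a": (0, 7), "b": (5, 1), "c": (7, 10), "d": (0, 1)}  # (start, step)
di = {
    name: pd.Series([start + step * k for k in range(5)], index=range(2 * i, 2 * i + 5))
    for i, (name, (start, step)) in enumerate(SPEC.items())
}


def select_by_2d(dos, indexer, usebool=True):
    # Hypothetical helper mimicking .aloc[dios-like]:
    # 1. align columns: only labels present in both are kept,
    # 2. align rows: per column, only labels present in both indexes are kept,
    # 3. for a boolean indexer with usebool=True, keep only rows flagged True;
    #    otherwise the values are ignored (as described for .aloc[dios, ...]).
    result = {}
    for name, s in dos.items():
        if name not in indexer:
            continue
        idx = indexer[name]
        if usebool and pd.api.types.is_bool_dtype(idx):
            idx = idx[idx]
        result[name] = s[s.index.isin(idx.index)]
    return result


mask = {"a": di["a"] > 5, "b": di["b"] > 5, "x": pd.Series([True])}
flagged = select_by_2d(di, mask)                 # a: labels 1..4, b: labels 3..6
ignored = select_by_2d(di, mask, usebool=False)  # a and b keep all their rows
```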
T_O_D_O
Cookbook¶
Recipes¶
select common rows from all columns
align columns to another column
align columns to a given index
align dios with dios
get/set values by condition
apply a value to multiple columns
apply an array-like value to multiple columns
nan-policy - mask vs. drop values, when NaNs are inserted (mv to Readme ??)
itype - when to use, pitfalls and best practices
changing the index of series’ in dios (one, some, all)
changing the dtype of series’ in dios (one, some, all)
changing properties of series’ in dios (one, some, all)
T_O_D_O
Broadcast array-likes to multiple columns¶
T_O_D_O
For the idea behind the Itype concept and its usage read:
Itype¶
DictOfSeries holds multiple series, and each series can have a different index length
and index type. Differing index lengths are either solved by some aligning magic, or simply fail if
aligning makes no sense (e.g. assigning the very same list to series of different lengths, see .aloc).
A bigger challenge is the type of the index. If one series has an alphabetical index and another one
a numeric index, selecting along columns can fail in many scenarios. To keep track of the
types of the indexes, or to prohibit the insertion of a non-fitting index type,
we introduce the itype. It can be set on creation of a dios and also changed during usage.
On change of the itype, all indexes of all series in the dios are cast to a new fitting type,
if possible. Different cast mechanisms are available.
If an itype prohibits certain types of indexes and a series with a non-fitting index type is inserted, an implicit type cast is done (with or without a warning) or an error is raised. The warning/error policy can be adjusted via global options.
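The core of the itype idea can be sketched without dios: model an itype as a predicate on the index dtype and check it whenever a series is inserted. Everything here (ITYPES, TypedColumns, raising TypeError, skipping the check for empty series) is a hypothetical illustration built on pandas' pd.api.types helpers; the real Itype classes, cast policies and warnings of dios are not reproduced.

```python
import pandas as pd

# Hypothetical itypes, modelled as predicates on the index dtype.
ITYPES = {
    "numeric": pd.api.types.is_numeric_dtype,
    "datetime": pd.api.types.is_datetime64_any_dtype,
}


class TypedColumns(dict):
    # Hypothetical container: a dict of Series that rejects any non-empty
    # series whose index does not fit the configured itype (no casting here).
    def __init__(self, itype):
        super().__init__()
        self.itype = itype
        self.check = ITYPES[itype]

    def __setitem__(self, key, series):
        if not series.empty and not self.check(series.index):
            raise TypeError(f"index of column {key!r} does not fit itype {self.itype!r}")
        super().__setitem__(key, series)


cols = TypedColumns("numeric")
cols["a"] = pd.Series([1, 2], index=[0, 1])  # integer index fits "numeric"
```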
For the implemented methods and module functions, i.e. the full module API, see:
API¶
Functions¶
Check if obj is an instance of the given itype or its str-alias was given.
Check if obj is a subclass or an instance of a subclass of the given itype.
Check if obj is a subclass or an instance of the given itype or any of its subtypes.
Return the corresponding Itype.
Cast a series (more precisely, the type of its index) to fit the itype of a dios.
Return an example dios.
Classes¶
A data frame where every column has its own index.
storage class for string values for dios_options
storage class for the keys in dios_options
storage class for the keys in dios_options
Variables¶
Options dictionary for module dios.
or browse the Index.