Pandas-like indexing

[], .loc[], .iloc[], .at[] and .iat[] should behave exactly like their counterparts from pandas.DataFrame. They accept as indexer:

  • lists, array-like objects and in general all iterables

  • boolean lists and iterables

  • slices

  • scalars and any hashable object

Most indexers are passed directly to the underlying column series or row series, depending on the position of the indexer and the complexity of the operation. For .loc, .iloc, .at and .iat the first position is the row indexer and the second is the column indexer. The second can be omitted and then defaults to slice(None). Examples:

  • di.loc[[1,2,3], ['a']] : select labels 1,2,3 from column a

  • di.iloc[[1,2,3], [0,3]] : select positions 1,2,3 from the columns 0 and 3

  • di.loc[:, 'a':'c'] : select all rows from columns a to c

  • di.at[4,'c'] : select the element with label 4 in column c

  • di.loc[:] -> di.loc[:,:] : select everything.

Scalar indexing always returns a pandas Series if the other indexer is a non-scalar. If both indexers are scalars, the element itself is returned. In all other cases a dios is returned. For more pandas-like indexing magic and the differences between the indexers, see the pandas documentation.
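The return-type rules can be checked against plain pandas, which these indexers are meant to mirror. The following is a minimal sketch that uses a pandas.DataFrame as a stand-in for a dios; the frame df and its contents are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data; a pandas.DataFrame is used instead of a dios.
df = pd.DataFrame({"a": [0, 7, 14, 21], "b": [5, 6, 7, 8]}, index=[1, 2, 3, 4])

item = df.loc[2, "a"]            # both indexers scalar: the element itself
column = df.loc[[1, 2, 3], "a"]  # scalar column indexer: a pandas.Series
row = df.loc[2, ["a", "b"]]      # scalar row indexer: a pandas.Series
frame = df.loc[[1, 2], ["a"]]    # no scalar indexer: a 2D container

print(type(item).__name__, type(column).__name__, type(frame).__name__)
```

With a dios, the last case would return a dios instead of a DataFrame.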

Note:

In contrast to pandas.DataFrame, .loc[:] and .loc[:, :] always behave identically. The same applies to iloc and aloc. For example, given two pandas.DataFrames df1 and df2 with different columns, df1.loc[:, :] = df2 aligns the columns, but df1.loc[:] = df2 does not.

Whether this is the desired behavior or a bug, I couldn’t verify so far. – Bert Palm

2D-indexer

dios[boolean dios-like] (as single key): a dios accepts a boolean 2D-indexer (a boolean pandas.DataFrame or a boolean Dios).

Columns and rows of the indexer align with the dios. This means that only matching columns are selected, and within these columns only those rows are selected where i) the index matches and ii) the value in the boolean indexer is True. Indices that are missing from the indexer are treated the same as indices that are present but hold False.

Values from unselected rows and columns are dropped, but empty columns are preserved, so the resulting Dios always has the same column dimension as the initial dios.

Note: This is exactly how pandas.DataFrame handles a 2D-indexer, except that pandas.DataFrame fills missing locations with numpy.nan and therefore also fills whole missing columns with numpy.nan.
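The selection rule above can be sketched on a plain dict of pandas.Series. This is only an illustration of the described semantics, not the dios implementation; the helper select_bool_2d and the sample data are assumptions made for this example.

```python
import pandas as pd

def select_bool_2d(data, mask):
    """Sketch of boolean 2D selection on a plain dict of pandas.Series."""
    result = {}
    for name, column in data.items():
        if name not in mask:
            # Column missing in the indexer: keep it, but without rows.
            result[name] = column.iloc[:0]
            continue
        flags = mask[name]
        # Labels whose value in the indexer is True.
        wanted = flags.index[flags.to_numpy(dtype=bool)]
        # Keep rows whose label is present in the indexer AND flagged True.
        result[name] = column[column.index.isin(wanted)]
    return result

data = {
    "a": pd.Series([0, 7, 14], index=[0, 1, 2]),
    "b": pd.Series([5, 6], index=[2, 3]),
}
mask = {"a": pd.Series([True, False, True], index=[1, 2, 5])}

selected = select_bool_2d(data, mask)
print(selected["a"])  # only label 1 survives
print(selected["b"])  # empty, but the column is still there
```

Label 5 is True in the indexer but absent from column a, and label 0 is absent from the indexer, so both are dropped in the same way as the explicit False at label 2.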

Setting values

Setting values with [], .loc[], .iloc[], .at[] and .iat[] works like in pandas. With .at/.iat only single items can be set; for the others, the right-hand side value can be:

  • scalars: these are broadcasted to the selected positions

  • lists: the length of the list must match the number of indexed columns. The items can be anything that can be applied to a series with the respective indexing method (loc, iloc, []).

  • dios: the number of columns must match the number of indexed columns. Columns do not align, they are just iterated. Rows do align. Rows that are present on the right but not on the left are ignored. Rows that are present on the left (bear in mind: these rows were explicitly chosen for writing!) but not on the right are filled with NaNs, like in pandas.

  • pandas.Series: the column indexer must be a scalar(!). The series is passed down and set with loc, iloc or [] by the pandas Series, where it may align, depending on the method.

Examples:

  • dios.loc[2:5, 'a'] = [1,2,3] is the same as a=dios['a']; a.loc[2:5]=[1,2,3]; dios['a']=a

  • dios.loc[2:5, :] = 99 : set 99 on rows 2 to 5 on all columns

Special indexer .aloc

In addition to the pandas-like indexers there is the .aloc[..] (align locator) indexing method. Unlike .iloc and .loc, its indexers fully align if possible, and 1D-array-likes can be broadcast to multiple columns at once. This method also handles missing indexer items gracefully. It is used like .loc, so a single indexer (.aloc[indexer]) or a tuple of row indexer and column indexer (.aloc[row-indexer, column-indexer]) can be given. It can also handle boolean and non-boolean 2D-indexers.

The main purpose of .aloc is:

  • to select gracefully, so that rows or columns that are given as indexer but do not exist do not raise an error

  • to align series/dios indexers

  • to broadcast vertically, i.e. to set multiple columns at once with a list-like value

Aloc usage

aloc is called like loc: with a single key that acts as row indexer (aloc[rowkey]), or with a tuple of row indexer and column indexer (aloc[rowkey, columnkey]). A 2D-indexer (like a dios or df) can also be given, but only as a single key, like .aloc[2D-indexer], or with the special column key ..., the ellipsis (.aloc[2D-indexer, ...]). The ellipsis may change how the 2D-indexer is interpreted; this is explained in detail later.

If a normal (not 2D) row indexer is given but no column indexer, the latter defaults to :, aka. slice(None), so .aloc[row-indexer] becomes .aloc[row-indexer, :], which means that all columns are used. In general, a normal row indexer is applied to every column chosen by the column indexer, but to each column separately.

A first example may give a rough idea:

>>> s = pd.Series([11] * 4 )
>>> di = DictOfSeries(dict(a=s[:2]*6, b=s[2:4]*7, c=s[:2]*8, d=s[1:3]*9))
>>> di
    a |     b |     c |     d | 
===== | ===== | ===== | ===== | 
0  66 | 2  77 | 0  88 | 1  99 | 
1  66 | 3  77 | 1  88 | 2  99 | 


>>> di.aloc[[1,2], ['a', 'b', 'd', 'x']]
    a |     b |     d | 
===== | ===== | ===== | 
1  66 | 2  77 | 1  99 | 
      |       | 2  99 | 

The return type

Unlike the other two indexer methods loc and iloc, it is not possible to get a single item returned; the return type is either a pandas.Series, iff the column indexer is a single key (e.g. 'a'), or a dios otherwise. The row indexer plays no role in the choice of the return type.

Note for the curious:

This is because a scalar (.aloc[key]) is translated to .loc[key:key] under the hood.
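The effect of the key:key translation can be seen with a plain pandas.Series. This is a small sketch, assuming a sorted integer index as in the examples of this document; the series s is made up for illustration.

```python
import pandas as pd

s = pd.Series([0, 7, 14, 21, 28], name="a")  # sorted integer index 0..4

one_row = s.loc[1:1]    # a Series holding only label 1, not the scalar 7
no_row = s.loc[99:99]   # label not present: an empty Series, no KeyError

print(len(one_row), len(no_row))
```

A label slice never collapses to a scalar, which is why a scalar row key cannot yield a single item here.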

Indexer types

In the following, the .aloc-specific indexers are listed. Any indexer that is not listed below (slice, boolean lists, …) but is known to work with .loc is treated as if it were passed to .loc, which is what actually happens under the hood.

Some indexers are linked to later sections, where a more detailed explanation and examples are given.

Special column indexers are:

  • list / array-like (or any iterable object): Only labels that are present in the columns are used, others are ignored.

  • pd.Series : .values are taken from series and handled like a list.

  • scalar (or any hashable obj) : Select a single column, if label is present, otherwise nothing.

Special row indexers are:

  • list / array-like (or any iterable object): Only rows whose indices are present in the index of the column are used; others are ignored. A dios is returned.

  • scalar (or any hashable obj) : Select a single row from a column, if the value is present in the index of the column, otherwise nothing is selected. [1]

  • pd.Series : align the index of the given Series with the column, which means only common indices are used. The actual values of the series are ignored(!).

  • boolean pd.Series : like pd.Series, but only True values are evaluated. False values are equivalent to missing indices. To treat a boolean series as a normal indexer series, as described above, one can use .aloc(usebool=False)[boolean pd.Series].

Special 2D-indexers are:

  • .aloc[boolean dios-like] : works like di[boolean dios-like] (see there). In brief: full align, select items where the index is present and the value is True.

  • .aloc[dios-like, ...] (with Ellipsis) : Align in columns and rows, ignore its values. Per common column, the common indices are selected. The ellipsis forces aloc to ignore the values, so a boolean dios can be treated as a non-boolean one. Alternatively, .aloc(usebool=False)[boolean dios-like] can be used.[2]

  • .aloc[nested list-like] : The inner lists are used as aloc-list-row-indexer (see there) on all columns, one list per column, which implies that the outer list has the same length as the number of columns.

Special handling of 1D-values

Values that are list- or array-like, which includes pd.Series, are set on all selected columns. pd.Series align like s1.loc[:] = s2 does. See also the cookbook.
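What aligning like s1.loc[:] = s2 means can be shown with two plain pandas.Series. This is a minimal sketch of the pandas behavior the text refers to; the series s1 and s2 are made up for illustration, and float values are used so that the NaN fill does not change the dtype.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
s2 = pd.Series([10.0, 20.0], index=[1, 5])

# Rows align on the index: label 1 is overwritten, labels 0 and 2 have no
# partner in s2 and become NaN, label 5 does not exist in s1 and is ignored.
s1.loc[:] = s2
print(s1)
```

With .aloc the same aligned assignment would be applied to every selected column.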

Aloc overview table

example                   | type             | on      | like .loc | handling                       | conditions / hints                                 | link
========================= | ================ | ======= | ========= | ============================== | ================================================== | =====
Column indexer
.aloc[any, 'a']           | scalar           | columns | no        | select graceful                | -                                                  | cols
.aloc[any, 'b':'z']       | slice            | columns | yes       | slice                          | -                                                  | cols
.aloc[any, ['a','c']]     | list-like        | columns | no        | filter graceful                | -                                                  | cols
.aloc[any, [True,False]]  | bool list-like   | columns | yes       | take True's                    | length must match nr of columns                    | cols
.aloc[any, s]             | Series           | columns | no        | like list                      | only s.values are evaluated                        | cols
.aloc[any, bs]            | bool Series      | columns | yes       | like bool-list                 | see there                                          | cols
Row indexer
.aloc[7, any]             | scalar           | rows    | no        | translate to .loc[key:key]     | -                                                  | rows
.aloc[3:42, any]          | slice            | rows    | yes       | slice                          | -                                                  |
.aloc[[1,2,24], any]      | list-like        | rows    | no        | filter graceful                | -                                                  | rows
.aloc[[True,False], any]  | bool list-like   | rows    | yes       | take True's                    | length must match length of (all selected) columns | blist
.aloc[s, any]             | Series           | rows    | no        | like .loc[s.index]             | -                                                  | ser
.aloc[bs, any]            | bool Series      | rows    | no        | align + just take True's       | evaluate usebool-keyword                           | ser
.aloc[[[s],[1,2,3]], any] | nested list-like | both    | ?         | one row-indexer per column     | outer length must match nr of (selected) columns   | nlist
2D-indexer
.aloc[di]                 | dios-like        | both    | no        | full align                     | -                                                  |
.aloc[di, ...]            | dios-like        | both    | no        | full align                     | ellipsis has no effect                             |
.aloc[di>5]               | bool dios-like   | both    | no        | full align + take True's       | evaluate usebool-keyword                           |
.aloc[di>5, ...]          | (bool) dios-like | both    | no        | full align, no bool evaluation | -                                                  |

Example dios

The Dios used in the examples, unless stated otherwise, looks like so:

>>> dictofser
    a |      b |      c |     d | 
===== | ====== | ====== | ===== | 
0   0 | 2    5 | 4    7 | 6   0 | 
1   7 | 3    6 | 5   17 | 7   1 | 
2  14 | 4    7 | 6   27 | 8   2 | 
3  21 | 5    8 | 7   37 | 9   3 | 
4  28 | 6    9 | 8   47 | 10  4 | 
5  35 | 7   10 | 9   57 | 11  5 | 
6  42 | 8   11 | 10  67 | 12  6 | 
7  49 | 9   12 | 11  77 | 13  7 | 
8  56 | 10  13 | 12  87 | 14  8 | 

or the short version:

>>> di
    a |    b |     c |     d | 
===== | ==== | ===== | ===== | 
0   0 | 2  5 | 4   7 | 6   0 | 
1   7 | 3  6 | 5  17 | 7   1 | 
2  14 | 4  7 | 6  27 | 8   2 | 
3  21 | 5  8 | 7  37 | 9   3 | 
4  28 | 6  9 | 8  47 | 10  4 | 

The example Dios can be obtained via a function:

from dios import example_DictOfSeries
mydios = example_DictOfSeries()

or generated manually like so:

>>> import pandas as pd
>>> from dios import DictOfSeries
>>> a = pd.Series(range(0, 70, 7))
>>> b = pd.Series(range(5, 15, 1))
>>> c = pd.Series(range(7, 107, 10))
>>> d = pd.Series(range(0, 10, 1))
>>> for i, s in enumerate([a,b,c,d]): s.index += i*2
>>> dictofser = DictOfSeries(dict(a=a, b=b, c=c, d=d))
>>> di = dictofser[:5]

Select columns, gracefully

One can use .aloc[:, key] to select single columns gracefully. The underlying pandas.Series is returned if the key exists. Otherwise an empty pandas.Series with dtype=object is returned.

>>> di.aloc[:, 'a']
0     0
1     7
2    14
3    21
4    28
Name: a, dtype: int64

>>> di.aloc[:, 'x']
Series([], dtype: object)

Multiple columns

This works just like selecting single columns gracefully, but with an array-like indexer. A dios is returned, with a subset of the existing columns. If no key is present, an empty dios is returned.

>>> di.aloc[:, ['c', 99, None, 'a', 'x', 'y']]
    a |     c | 
===== | ===== | 
0   0 | 4   7 | 
1   7 | 5  17 | 
2  14 | 6  27 | 
3  21 | 7  37 | 
4  28 | 8  47 | 

>>> di.aloc[:, ['x', 'y']]
Empty DictOfSeries
Columns: []

>>> s = pd.Series(dict(a='a', b='x', c='c', foo='d'))
>>> di.aloc[:, s]
    a |     c |     d | 
===== | ===== | ===== | 
0   0 | 4   7 | 6   0 | 
1   7 | 5  17 | 7   1 | 
2  14 | 6  27 | 8   2 | 
3  21 | 7  37 | 9   3 | 
4  28 | 8  47 | 10  4 | 

Boolean indexing, indexing with pd.Series and slice indexer

A boolean indexer, for example [True, False, True, False], must have the same length as the number of columns. Only columns where the indexer has a True value are selected.

If the key is a pandas.Series, its values are used for indexing; in particular, the index of the Series is ignored. If a series has boolean values it is treated like a boolean indexer, otherwise it is treated as an array-like indexer.

An easy way to select all columns is to use null slices, like .aloc[:,:] or even simpler .aloc[:]. This is just what one would do with loc or iloc. Of course, slicing with boundaries also works, e.g. .aloc[:, 'a':'f'].

Selecting Rows a smart way

For scalar and array-like indexer with label values, the keys are handled gracefully, just like with array-like column indexers.

>>> di.aloc[1]
   a |       b |       c |       d | 
==== | ======= | ======= | ======= | 
1  7 | no data | no data | no data | 

>>> di.aloc[99]
Empty DictOfSeries
Columns: ['a', 'b', 'c', 'd']

>>> di.aloc[[3,6,7,18]]
    a |    b |     c |    d | 
===== | ==== | ===== | ==== | 
3  21 | 3  6 | 6  27 | 6  0 | 
      | 6  9 | 7  37 | 7  1 | 

The length of columns can differ:

>>> di.aloc[[3,6,7,18]].aloc[[3,6]]
    a |    b |     c |    d | 
===== | ==== | ===== | ==== | 
3  21 | 3  6 | 6  27 | 6  0 | 
      | 6  9 |       |      | 
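The graceful row filter shown above can be approximated with plain pandas. The following is a sketch on a dict of pandas.Series that mimics the short example dios; the helper select_rows_gracefully is hypothetical and not part of dios.

```python
import pandas as pd

# The short example dios, rebuilt as a plain dict of pandas.Series.
columns = {
    "a": pd.Series([0, 7, 14, 21, 28], index=range(0, 5)),
    "b": pd.Series([5, 6, 7, 8, 9], index=range(2, 7)),
    "c": pd.Series([7, 17, 27, 37, 47], index=range(4, 9)),
    "d": pd.Series([0, 1, 2, 3, 4], index=range(6, 11)),
}

def select_rows_gracefully(columns, labels):
    """Per column, keep the rows whose label is in `labels`; ignore the rest."""
    return {name: col[col.index.isin(labels)] for name, col in columns.items()}

picked = select_rows_gracefully(columns, [3, 6, 7, 18])
for name, col in picked.items():
    print(name, col.to_dict())
```

Label 18 exists in no column and is silently skipped, and each column keeps only the labels it actually has, which is why the resulting columns differ in length.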

Boolean array-likes as row indexer

For array-like indexer that hold boolean values, the length of the indexer and the length of all column(s) to index must match.

>>> di.aloc[[True,False,False,True,False]]
    a |    b |     c |    d | 
===== | ==== | ===== | ==== | 
0   0 | 2  5 | 4   7 | 6  0 | 
3  21 | 5  8 | 7  37 | 9  3 | 

If the length does not match, an IndexError is raised:

>>> di.aloc[[True,False,False]]
Traceback (most recent call last):
  ...
  IndexError: failed for column a: Boolean index has wrong length: 3 instead of 5

This can be tricky, especially if columns have different length:

>>> difflen
    a |    b |     c |    d | 
===== | ==== | ===== | ==== | 
0   0 | 2  5 | 4   7 | 6  0 | 
1   7 | 3  6 | 6  27 | 7  1 | 
2  14 | 4  7 |       | 8  2 | 

>>> difflen.aloc[[False,True,False]]
Traceback (most recent call last):
  ...
  IndexError: Boolean index has wrong length: 3 instead of 2
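The same length check exists in plain pandas, which suggests where the error originates. This is a small sketch with a single pandas.Series; that dios merely forwards the pandas error per column is an assumption, and the exact message may differ between pandas versions.

```python
import pandas as pd

column = pd.Series([0, 7, 14, 21, 28])

try:
    column.loc[[True, False, False]]  # 3 flags for 5 rows
    raised = False
except IndexError as err:
    raised = True
    print(err)
```

Since a boolean list is applied to every selected column, it can only work if all of these columns have exactly that length.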

pandas.Series and boolean pandas.Series as row indexer

When using a pandas.Series as row indexer with aloc, all its magic comes to light. The index of the given series aligns with the index of each column separately and is used as a filter this way.

>>> s = di['b'] + 100
>>> s
2    105
3    106
4    107
5    108
6    109
Name: b, dtype: int64

>>> di.aloc[s]
    a |    b |     c |    d | 
===== | ==== | ===== | ==== | 
2  14 | 2  5 | 4   7 | 6  0 | 
3  21 | 3  6 | 5  17 |      | 
4  28 | 4  7 | 6  27 |      | 
      | 5  8 |       |      | 
      | 6  9 |       |      | 

As seen in the example above, the values of the series are ignored completely. The functionality is similar to s1.loc[s2.index], where s1 and s2 are pandas.Series, s2 is the indexer and s1 is one column after the other.

If the indexer series holds boolean values, these are not ignored. The series aligns the same way as explained above, but additionally only the True values are evaluated. Thus False values are treated like missing indices. The behavior here is analogous to s1.loc[s2[s2].index].

>>> boolseries = di['b'] > 6
>>> boolseries
2    False
3    False
4     True
5     True
6     True
Name: b, dtype: bool

>>> di.aloc[boolseries]
    a |    b |     c |    d | 
===== | ==== | ===== | ==== | 
4  28 | 4  7 | 4   7 | 6  0 | 
      | 5  8 | 5  17 |      | 
      | 6  9 | 6  27 |      | 
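The row filter described in this section can be sketched for a single column with plain pandas. The helper align_rows and its usebool parameter are hypothetical; they only mimic the described behavior, including the effect of .aloc(usebool=False), and are not the dios implementation.

```python
import pandas as pd

def align_rows(column, indexer, usebool=True):
    """Sketch: filter `column` by the index of a pandas.Series indexer."""
    if usebool and indexer.dtype == bool:
        # Keep only the True entries; False then acts like a missing label.
        indexer = indexer[indexer]
    # Only labels common to both indices survive; the values are ignored.
    return column[column.index.isin(indexer.index)]

a = pd.Series([0, 7, 14, 21, 28], index=range(0, 5), name="a")
b = pd.Series([5, 6, 7, 8, 9], index=range(2, 7), name="b")

print(align_rows(a, b + 100))               # labels 2, 3, 4
print(align_rows(a, b > 6))                 # label 4 only
print(align_rows(a, b > 6, usebool=False))  # labels 2, 3, 4 again
```

Using index.isin instead of s1.loc[s2.index] keeps the sketch graceful: labels of the indexer that are missing from the column are skipped instead of raising.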

Evaluating boolean values is a very handy feature, as it can easily be used with multiple conditions and also lends itself to writing those as one-liners:

>>> di.aloc[di['b'] > 6]
    a |    b |     c |    d | 
===== | ==== | ===== | ==== | 
4  28 | 4  7 | 4   7 | 6  0 | 
      | 5  8 | 5  17 |      | 
      | 6  9 | 6  27 |      | 

>>> di.aloc[(di['a'] > 6) & (di['b'] > 6)]
    a |    b |    c |       d | 
===== | ==== | ==== | ======= | 
4  28 | 4  7 | 4  7 | no data | 

Note:

Nevertheless, something like di.aloc[di['a'] > di['b']] does not work, because the comparison fails as long as the two series objects do not have the same index. In that case, one may want to check out DictOfSeries.index_of().

Nested-lists as row indexer

It is possible to pass different array-like indexers to different columns by using nested lists as indexer. The length of the outer list must match the number of columns of the dios. All items of the outer list must be array-like and not further nested, for example a list, a pandas.Series, a boolean list or boolean pandas.Series, a numpy.array… Every inner list-like item is applied as row indexer to the corresponding column.

>>> di
    a |    b |     c |     d | 
===== | ==== | ===== | ===== | 
0   0 | 2  5 | 4   7 | 6   0 | 
1   7 | 3  6 | 5  17 | 7   1 | 
2  14 | 4  7 | 6  27 | 8   2 | 
3  21 | 5  8 | 7  37 | 9   3 | 
4  28 | 6  9 | 8  47 | 10  4 | 

>>> di.aloc[ [di['a'], [True,False,True,False,False], [], [7,8,10]] ]
    a |    b |       c |     d | 
===== | ==== | ======= | ===== | 
0   0 | 2  5 | no data | 7   1 | 
1   7 | 4  7 |         | 8   2 | 
2  14 |      |         | 10  4 | 
3  21 |      |         |       | 
4  28 |      |         |       | 

>>> ar = np.array([2,3])
>>> di.aloc[[ar, ar+1, ar+2, ar+3]]
    a |    b |     c |    d | 
===== | ==== | ===== | ==== | 
2  14 | 3  6 | 4   7 | 6  0 | 
3  21 | 4  7 | 5  17 |      | 

Even though this looks like a 2D-indexer, which is explained in the next section, it is not. In contrast to a 2D-indexer, a column key can also be provided to pre-filter the columns.

>>> di.aloc[[ar, ar+1, ar+3], ['a','b','d']]
    a |    b |    d | 
===== | ==== | ==== | 
2  14 | 3  6 | 6  0 | 
3  21 | 4  7 |      | 
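The one-indexer-per-column rule can be sketched on a plain dict of pandas.Series. The helper select_nested is hypothetical and handles only label-based inner indexers, not boolean ones; it illustrates the pairing of columns and inner lists rather than the dios implementation.

```python
import numpy as np
import pandas as pd

# The short example dios, rebuilt as a plain dict of pandas.Series.
columns = {
    "a": pd.Series([0, 7, 14, 21, 28], index=range(0, 5)),
    "b": pd.Series([5, 6, 7, 8, 9], index=range(2, 7)),
    "c": pd.Series([7, 17, 27, 37, 47], index=range(4, 9)),
    "d": pd.Series([0, 1, 2, 3, 4], index=range(6, 11)),
}

def select_nested(columns, indexers):
    """Apply the i-th label indexer gracefully to the i-th column."""
    if len(indexers) != len(columns):
        raise ValueError("need exactly one row indexer per column")
    return {
        name: col[col.index.isin(idx)]
        for (name, col), idx in zip(columns.items(), indexers)
    }

ar = np.array([2, 3])
picked = select_nested(columns, [ar, ar + 1, ar + 2, ar + 3])
for name, col in picked.items():
    print(name, col.to_dict())
```

Pre-filtering the columns, as with a column key, corresponds to passing a smaller dict together with a correspondingly shorter outer list.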

The power of 2D-indexer

Overview:

.aloc[bool-dios]                | 1. align columns, 2. align rows, 3. just take True's | [1]
.aloc[dios, ...] (use Ellipsis) | 1. align columns, 2. align rows, (3.) ignore values  | [1]

[1] evaluate usebool-keyword

T_O_D_O