PEP 357 -- Allowing Any Object to be Used for Slicing

PEP:	357
Title:	Allowing Any Object to be Used for Slicing
Version:	17a68e052d4f
Last-Modified:	2007-06-19 04:20:07 +0000 (Tue, 19 Jun 2007)
Author:	Travis Oliphant <oliphant at ee.byu.edu>
Status:	Final
Type:	Standards Track
Created:	09-Feb-2006
Python-Version:	2.5
Post-History:

Abstract

    This PEP proposes adding an nb_index slot in PyNumberMethods and an
    __index__ special method so that arbitrary objects can be used
    whenever integers are explicitly needed in Python, such as in slice
    syntax (from which the slot gets its name).

Rationale

    Currently integers and long integers play a special role in
    slicing in that they are the only objects allowed in slice
    syntax. In other words, if X is an object implementing the
    sequence protocol, then X[obj1:obj2] is only valid if obj1 and
    obj2 are both integers or long integers.  There is no way for obj1
    and obj2 to tell Python that they could be reasonably used as
    indexes into a sequence.  This is an unnecessary limitation.

    In NumPy, for example, there are 8 different integer scalars
    corresponding to unsigned and signed integers of 8, 16, 32, and 64
    bits.  These type-objects could reasonably be used as integers in
    many places where Python expects true integers but cannot inherit from 
    the Python integer type because of incompatible memory layouts.  
    There should be some way to tell Python that an object can behave
    like an integer.

    It is not possible to use the nb_int (and __int__ special method)
    for this purpose because that method is used to *coerce* objects
    to integers.  It would be inappropriate to allow every object that
    can be coerced to an integer to be used as an integer everywhere
    Python expects a true integer.  For example, if __int__ were used
    to convert an object to an integer in slicing, then float objects
    would be allowed in slicing and x[3.2:5.8] would not raise an error
    as it should.
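
    The distinction between coercion and "already an integer" can be
    checked interactively.  The following is a small sketch, assuming a
    current Python 3 interpreter (where int and long are unified) in
    which this PEP is implemented; the helper slice_with is
    hypothetical and exists only for the illustration.

```python
# float supports coercion through int(), but it does not define __index__.
value = 3.2

coerced = int(value)                            # coercion truncates the value
float_has_index = hasattr(value, "__index__")   # float offers no __index__
int_has_index = hasattr(3, "__index__")         # int does


def slice_with(seq, start, stop):
    """Return seq[start:stop], or the error text if the slice is rejected."""
    try:
        return seq[start:stop]
    except TypeError as err:
        return "TypeError: %s" % err


items = list(range(10))
accepted = slice_with(items, 3, 5)        # integer endpoints are fine
rejected = slice_with(items, 3.2, 5.8)    # float endpoints raise TypeError
```

    Because slicing consults __index__ rather than __int__, the float
    endpoints are refused even though int(3.2) succeeds.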

Proposal

    Add an nb_index slot to PyNumberMethods, and a corresponding
    __index__ special method.  Objects could define a function to
    place in the nb_index slot that returns a Python integer
    (either an int or a long). This integer can 
    then be appropriately converted to a Py_ssize_t value whenever 
    Python needs one such as in PySequence_GetSlice, 
    PySequence_SetSlice, and PySequence_DelSlice.
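
    From the Python side, the proposal amounts to the following
    sketch.  The class Int8Like is a hypothetical stand-in for a
    fixed-width integer scalar (it is not NumPy's implementation), and
    the code assumes a current Python 3 interpreter.

```python
class Int8Like:
    """Hypothetical 8-bit integer scalar that is not a subclass of int."""

    def __init__(self, value):
        if not -128 <= value <= 127:
            raise ValueError("value out of range for 8 bits")
        self._value = value

    def __index__(self):
        # Must return a true Python integer, not another integer-like object.
        return self._value


letters = list("abcdefgh")

sliced = letters[Int8Like(2):Int8Like(5)]   # slice endpoints
single = letters[Int8Like(-1)]              # plain sequence index
repeated = "ab" * Int8Like(3)               # sequence repetition count
```

    The object is never converted by the caller; the interpreter asks
    for its integer value wherever it needs one.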

Specification

    1) The nb_index slot will have the following signature  

       PyObject *index_func (PyObject *self)

       The returned object must be a Python IntType or 
       Python LongType. NULL should be returned on
       error with an appropriate error set. 

    2) The __index__ special method will have the signature

       def __index__(self):
           return obj
       
       where obj must be either an int or a long.

    3) Three new abstract C-API functions will be added:

       a) The first checks to see if the object supports the index
          slot and if it is filled in. 

          int PyIndex_Check(obj)

          This will return true if the object defines the nb_index
          slot.  

       b) The second is a simple wrapper around the nb_index call that
          raises PyExc_TypeError if the call is not available or if it
          doesn't return an int or long.  Because the
          PyIndex_Check is performed inside the PyNumber_Index call
          you can call it directly and manage any error rather than
          check for compatibility first.

          PyObject *PyNumber_Index (PyObject *obj)

       c) The third call helps deal with the common situation of
          actually needing a Py_ssize_t value from the object to use for
          indexing or other needs.

          Py_ssize_t PyNumber_AsSsize_t(PyObject *obj, PyObject *exc)

          The function calls the nb_index slot of obj if it is
          available and then converts the returned Python integer into
          a Py_ssize_t value.  If this goes well, then the value is
          returned.  The second argument allows control over what
          happens if the integer returned from nb_index cannot fit
          into a Py_ssize_t value.

          If exc is NULL, then the returned value will be clipped to
          PY_SSIZE_T_MAX or PY_SSIZE_T_MIN depending on whether the
          nb_index slot of obj returned a positive or negative
          integer.  If exc is non-NULL, then it is the error object
          that will be set to replace the PyExc_OverflowError that was
          raised when the Python integer or long was converted to Py_ssize_t.

    4) A new operator.index(obj) function will be added that calls
       the equivalent of obj.__index__() and raises an error if obj
       does not implement the special method.
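
    The behaviour specified for PyIndex_Check and PyNumber_AsSsize_t
    can be modelled in pure Python.  This is only an illustrative
    sketch, not the C implementation: index_check and as_ssize_t are
    hypothetical names, sys.maxsize is used as a stand-in for
    PY_SSIZE_T_MAX, and a current Python 3 interpreter is assumed.

```python
import operator
import sys

# Stand-ins for PY_SSIZE_T_MAX and PY_SSIZE_T_MIN.
SSIZE_T_MAX = sys.maxsize
SSIZE_T_MIN = -sys.maxsize - 1


def index_check(obj):
    """Model of PyIndex_Check: does the type fill in the index slot?"""
    return hasattr(type(obj), "__index__")


def as_ssize_t(obj, exc=None):
    """Model of PyNumber_AsSsize_t(obj, exc).

    operator.index raises TypeError if obj has no __index__ or if
    __index__ does not return an integer.
    """
    value = operator.index(obj)
    if SSIZE_T_MIN <= value <= SSIZE_T_MAX:
        return value
    if exc is None:
        # No error requested: clip according to the sign of the value.
        return SSIZE_T_MAX if value > 0 else SSIZE_T_MIN
    # Otherwise report the overflow using the caller's exception type.
    raise exc("integer does not fit into Py_ssize_t")


in_range = as_ssize_t(5)
clipped_high = as_ssize_t(2 ** 100)
clipped_low = as_ssize_t(-(2 ** 100))
```

    Passing an exception type such as IndexError as the second
    argument corresponds to a caller that must treat overflow as an
    error rather than clip silently.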

Implementation Plan

    1) Add the nb_index slot in object.h and modify typeobject.c to 
       create the __index__ method

    2) Change the ISINT macro in ceval.c to ISINDEX and alter it to 
       accommodate objects with the index slot defined.

    3) Change the _PyEval_SliceIndex function to accommodate objects
       with the index slot defined.

    4) Change all builtin objects (e.g. lists) that use the as_mapping
       slots for subscript access and special-case integers so that
       they check for the slot as well.

    5) Add the nb_index slot to integers and long integers
       (which just return themselves)

    6) Add PyNumber_Index C-API to return an integer from any 
       Python Object that has the nb_index slot.  

    7) Add the operator.index(x) function.

    8) Alter arrayobject.c and mmapmodule.c to use the new C-API for their
       subscripting and other needs.

    9) Add unit-tests

Discussion Questions

    Speed: 

    Implementation should not slow down Python because integers and long
    integers used as indexes will complete in the same number of
    instructions.  The only change will be that what used to generate
    an error will now be acceptable.

    Why not use nb_int which is already there?

    The nb_int method is used for coercion and so means something
    fundamentally different than what is requested here.  This PEP
    proposes a way for something that *can* already be thought of as
    an integer to communicate that information to Python when it needs
    an integer.  The biggest example of why using nb_int would be a bad
    thing is that float objects already define the nb_int method, but
    float objects *should not* be used as indexes in a sequence.

    Why the name __index__?

    Some questions were raised regarding the name __index__ when other
    interpretations of the slot are possible.  For example, the slot
    can be used any time Python requires an integer internally (such
    as in "mystring" * 3).  The name was suggested by Guido because
    slicing syntax is the biggest reason for having such a slot and 
    in the end no better name emerged. See the discussion thread:
    http://mail.python.org/pipermail/python-dev/2006-February/thread.html#60594 
    for examples of names that were suggested such as "__discrete__" and 
    "__ordinal__".

    Why return PyObject * from nb_index?  

    Initially Py_ssize_t was selected as the return type for the
    nb_index slot.  However, this led to an inability to track and
    distinguish overflow and underflow errors without ugly and brittle
    hacks. As the nb_index slot is used in at least 3 different ways 
    in the Python core (to get an integer, to get a slice end-point, 
    and to get a sequence index), there is quite a bit of flexibility
    needed to handle all these cases.  Having that flexibility is
    critical.  For example, the initial implementation that returned
    Py_ssize_t for nb_index led to the discovery that on a 32-bit
    machine with >=2GB of RAM s = 'x' * (2**100) worked but len(s) was
    clipped at 2147483647.
    Several fixes were suggested but eventually it was decided that
    nb_index needed to return a Python Object similar to the nb_int
    and nb_long slots in order to handle overflow correctly.
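
    The three uses mentioned above treat an oversized value
    differently, which is why the slot has to hand back a full Python
    integer.  The sketch below assumes CPython 3 behaviour; the class
    Huge is hypothetical.

```python
import operator


class Huge:
    """Hypothetical object whose index value cannot fit in Py_ssize_t."""

    def __index__(self):
        return 2 ** 100


text = "abc"

# 1. Getting an integer: the full arbitrary-precision value is preserved.
full_value = operator.index(Huge())

# 2. Getting a slice end-point: the value is clipped, so the slice works.
clipped_slice = text[:Huge()]

# 3. Getting a sequence index: overflow is reported as an error.
try:
    text[Huge()]
    index_outcome = "no error"
except IndexError:
    index_outcome = "IndexError"
```

    A slot returning Py_ssize_t could not serve all three cases at
    once, because the overflow would already have happened before the
    caller could decide how to handle it.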

    Why can't __index__ return any object with the nb_index method?

    This would allow infinite recursion in many different ways that are not
    easy to check for.  This restriction is similar to the requirement that 
    __nonzero__ return an int or a bool.
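
    The restriction can be observed through operator.index.  In this
    sketch (assuming a current Python 3 interpreter; all class names
    and the helper try_index are hypothetical), an __index__ method
    that returns anything other than a true integer is rejected, even
    when the returned object itself defines __index__.

```python
import operator


class Three:
    def __index__(self):
        return 3                 # a true integer: accepted


class ReturnsFloat:
    def __index__(self):
        return 3.0               # not an integer: rejected


class Chained:
    def __index__(self):
        return Three()           # integer-like, but still not an integer


def try_index(obj):
    """Return the index value, or the name of the exception raised."""
    try:
        return operator.index(obj)
    except TypeError:
        return "TypeError"


direct = try_index(Three())
from_float = try_index(ReturnsFloat())
from_chain = try_index(Chained())
```

    Because the result is never passed through __index__ a second
    time, no chain of objects can send the interpreter into a loop.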

Reference Implementation

    Submitted as patch 1436368 to SourceForge.

Copyright

    This document is placed in the public domain.
