ENH: Type annotations for Index #36708

Open
@itamarst

Description


Is your feature request related to a problem?

As described in #26766, it would be good to have type annotations for Index.

Describe the solution you'd like

I would like the type of the publicly exposed elements to be part of the Index type. For example, both of these Index instances contain Timestamp objects from the user's perspective, regardless of the internal implementation:

>>> import pandas as pd
>>> import numpy as np
>>> import datetime
>>> pd.Index([np.datetime64(100, "s")])[0]
Timestamp('1970-01-01 00:01:40')
>>> pd.Index([datetime.datetime(2020, 9, 28)])[0]
Timestamp('2020-09-28 00:00:00')

Because Index returns different subclasses of itself, getting the type checker to acknowledge this correctly is tricky. What's more, a naive "Index[S], because it was created from List[S]" approach won't work: Index([np.datetime64(100, "s")]) contains Timestamp instances, at least as far as the user is concerned, and Timestamp is very much not an np.datetime64.
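To make the mismatch concrete, here is a minimal sketch of the naive approach. It assumes a hypothetical NaiveIndex that is generic in the type of the list it is built from and, like pandas, boxes datetime64-style values into datetime objects on access. A stand-in datetime64 class (an int of seconds) replaces NumPy so the sketch runs without dependencies. A type checker infers NaiveIndex[datetime64] for the index, yet the element handed back at runtime is a datetime:

```python
from datetime import datetime, timezone
from typing import Any, Generic, List, TypeVar

S = TypeVar("S")


class datetime64(int):
    """Hypothetical stand-in for np.datetime64: seconds since the epoch."""


class NaiveIndex(Generic[S]):
    """Hypothetical naive design: Index[S] because it was built from List[S]."""

    def __init__(self, values: List[S]) -> None:
        self.values = values

    def first(self) -> S:
        value: Any = self.values[0]
        if isinstance(value, datetime64):
            # Box datetime64-style values on access, as pandas does with
            # Timestamp. Because value is typed Any, the checker cannot see
            # that this breaks the declared return type S.
            value = datetime.fromtimestamp(value, tz=timezone.utc)
        return value


idx = NaiveIndex([datetime64(100)])  # a type checker infers NaiveIndex[datetime64]
element = idx.first()                # declared type: datetime64
print(type(element).__name__)        # prints "datetime"
```

The annotation promises datetime64 while users actually receive a datetime, which is exactly the gap the protocol-based design tries to close.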

Here is the only solution I've come up with that works. See also python/mypy#9482: at the moment there is no way to make this work without splitting Index into a parent class that implements __new__ and a subclass that does all the work.
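The runtime half of that split relies on a standard Python rule: if __new__ returns an instance of the class being called (an instance of a subclass counts), Python then calls __init__ on the returned object, so the chosen subclass's __init__ runs. If __new__ returned something that is not such an instance, __init__ would be skipped. A minimal sketch with hypothetical class names (Base, IntLike, Other):

```python
from typing import Any, List


class Base:
    """Hypothetical parent that only picks a concrete subclass in __new__."""

    def __new__(cls, values: List[Any]) -> "Base":
        # Route to a subclass based on the data; object.__new__ only allocates.
        target = IntLike if all(isinstance(v, int) for v in values) else Other
        return object.__new__(target)


class IntLike(Base):
    def __init__(self, values: List[Any]) -> None:
        self.values = [int(v) for v in values]


class Other(Base):
    def __init__(self, values: List[Any]) -> None:
        self.values = list(values)


a = Base([1, 2])
b = Base(["x"])
print(type(a).__name__, type(b).__name__)  # prints "IntLike Other"
```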

The basic idea is to define a protocol, IndexType, that describes the element type users see. This is a simplified sketch, since demonstrating it with the real code would be much harder:

from datetime import datetime
from typing import Generic, List, Protocol, TypeVar, Union, overload  # typing.Protocol needs Python 3.8+ (else typing_extensions)

# T appears only in return position in the protocol, so mypy requires it to be covariant.
T = TypeVar("T", covariant=True)
S = TypeVar("S")

class datetime64(int):
    """Stand-in for np.datetime64."""


class IndexType(Protocol[T]):
    def first(self) -> T: ...


class Index:

    @overload
    def __new__(cls, values: List[datetime64]) -> "Datetime64Index": ...
    @overload
    def __new__(cls, values: List[datetime]) -> "Datetime64Index": ...
    @overload
    def __new__(cls, values: List[S]) -> "DefaultIndex[S]": ...

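    def __new__(cls, values):
        # Runtime dispatch: choose the concrete subclass from the data. Because the
        # returned object is an instance of Index, Python then calls its __init__.
        if type(values[0]) in (datetime, datetime64):
            cls = Datetime64Index
        else:
            cls = DefaultIndex
        return object.__new__(cls)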


class DefaultIndex(Index, Generic[S]):
    def __init__(self, values: List[S]):
        self.values = values

    def first(self) -> S:
        return self.values[0]


class Datetime64Index(DefaultIndex):

    def __init__(self, values: Union[List[datetime], List[datetime64]]):
        # Store everything internally as (stand-in) datetime64 seconds.
        self.values: List[datetime64] = [
            datetime64(o.timestamp()) if isinstance(o, datetime) else o
            for o in values
        ]

    def first(self) -> datetime:
        # Box on access, the way pandas hands users a Timestamp.
        return datetime.fromtimestamp(self.values[0])


# A type checker should accept these:
a: IndexType[datetime] = Index([datetime64(100)])
b: IndexType[datetime] = Index([datetime(2000, 10, 20)])
c: IndexType[bool] = Index([True])

# ...and reject these (they still run at runtime, since annotations are not enforced):
d: IndexType[datetime] = Index(["a"])
e: IndexType[bool] = Index(["a"])

API breaking implications

Hopefully nothing.

Describe alternatives you've considered

I tried many other ways of structuring this. None of them worked except this variant.

Additional context

Part of my motivation is to use type checking to help users verify whether switching from Pandas to Pandas-alikes such as Modin or Dask works, by having all of these libraries use matching type annotations.

As such, just saying "this API accepts an Index" is not good enough. Some Pandas APIs have special cases for, e.g., Index[bool], so annotations need some way of indicating the Index's element type to be sufficiently helpful.
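Because IndexType is a Protocol, matching is structural: any class with compatible methods satisfies it, whichever library it comes from. A minimal sketch of how an API with a boolean special case might be annotated, assuming a simplified protocol that adds a hypothetical tolist() method and hypothetical names (PandasLikeIndex, OtherLibIndex, select); none of these are real pandas, Modin or Dask APIs. Sequence is used rather than List because a covariant type variable cannot appear inside the invariant List:

```python
from typing import Generic, List, Protocol, Sequence, TypeVar

T = TypeVar("T", covariant=True)
S = TypeVar("S")


class IndexType(Protocol[T]):
    """Simplified protocol: first() as in the sketch, plus a hypothetical tolist()."""

    def first(self) -> T: ...

    def tolist(self) -> Sequence[T]: ...


class PandasLikeIndex(Generic[S]):
    """Hypothetical stand-in for one library's Index implementation."""

    def __init__(self, values: List[S]) -> None:
        self._values = values

    def first(self) -> S:
        return self._values[0]

    def tolist(self) -> Sequence[S]:
        return list(self._values)


class OtherLibIndex(Generic[S]):
    """Hypothetical Pandas-alike Index: no shared base class, only matching methods."""

    def __init__(self, values: List[S]) -> None:
        self._data = tuple(values)

    def first(self) -> S:
        return self._data[0]

    def tolist(self) -> Sequence[S]:
        return self._data


def select(data: Sequence[str], mask: IndexType[bool]) -> List[str]:
    """Hypothetical API whose bool-index special case is visible in its signature."""
    return [item for item, keep in zip(data, mask.tolist()) if keep]


rows = ["a", "b", "c"]
print(select(rows, PandasLikeIndex([True, False, True])))  # prints "['a', 'c']"
print(select(rows, OtherLibIndex([False, True, False])))   # prints "['b']"
# A type checker should reject the next call: IndexType[str] is not IndexType[bool].
# select(rows, PandasLikeIndex(["x", "y", "z"]))
```

Both implementations type-check against the same annotation without inheriting from each other, which is the property that would let users compare libraries through shared annotations.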

What I'd like

Some feedback on whether this approach is something you all would be OK with. If so, I can try to implement it for the real Index classes.

Metadata

Metadata

Assignees

No one assigned

Labels: Enhancement, Typing (type annotations, mypy/pyright type checking)

