
Eliminate overhead of fetching and testing NULL attributes in STORE_ATTR specializations for new objects #144141

@markshannon


Consider

class C:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
C(1,2,3)

This produces a trace looking something like this:

...
_CHECK_AND_ALLOCATE_OBJECT
_CREATE_INIT_FRAME
_PUSH_FRAME
# Some guards
_LOAD_FAST_BORROW_1
_LOAD_FAST_BORROW_0
# Some more guards
_STORE_ATTR_INSTANCE_VALUE
# Some more guards
_LOAD_FAST_BORROW_2
_LOAD_FAST_BORROW_0
# Some more guards
_STORE_ATTR_INSTANCE_VALUE
# Some more guards
_LOAD_FAST_BORROW_3
_LOAD_FAST_BORROW_0
# Some more guards
_STORE_ATTR_INSTANCE_VALUE
...

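Each of those _STORE_ATTR_INSTANCE_VALUE uops loads the old value from memory, tests it for NULL to decide whether to update the insertion order, and then conditionally decrefs it.
But in this case we know that the old value was NULL, because the object has only just been allocated and none of its attributes have been set yet, so we can simply overwrite it.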
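As a rough illustration of how an optimizer could know this, here is a minimal sketch of a pass over a straight-line trace that touches a single, freshly allocated object. Everything in it is hypothetical: the uop enum, the known-NULL store variant it rewrites to, and the bitmask of written slots are assumptions for illustration, not CPython's optimizer data structures. A real pass would also have to forget this knowledge whenever other code could modify the object (for example, across a call the object escapes to); the sketch ignores that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical uop kinds. The names echo the trace above, but this enum
   is an illustration, not CPython's real uop table. */
typedef enum {
    OP_ALLOCATE_OBJECT,                /* stands in for _CHECK_AND_ALLOCATE_OBJECT */
    OP_STORE_ATTR_INSTANCE_VALUE,      /* generic store: loads, tests and decrefs old value */
    OP_STORE_ATTR_INSTANCE_VALUE_NULL  /* hypothetical variant: old value known to be NULL */
} OpKind;

typedef struct {
    OpKind kind;
    int slot;   /* inline value index written by a store (unused for allocation) */
} Uop;

/* Walk a trace that operates on one object. After the allocation every
   slot is known to be NULL; each store clears that knowledge for its own
   slot. A store to a slot that is still known to be NULL is rewritten to
   the cheaper variant. Assumes at most 64 slots. */
static void specialize_stores(Uop *trace, int n)
{
    bool fresh = false;              /* has the allocation been seen? */
    unsigned long long written = 0;  /* bit i set => slot i may be non-NULL */

    for (int i = 0; i < n; i++) {
        switch (trace[i].kind) {
        case OP_ALLOCATE_OBJECT:
            fresh = true;
            written = 0;
            break;
        case OP_STORE_ATTR_INSTANCE_VALUE: {
            assert(trace[i].slot >= 0 && trace[i].slot < 64);
            unsigned long long bit = 1ULL << trace[i].slot;
            if (fresh && !(written & bit)) {
                trace[i].kind = OP_STORE_ATTR_INSTANCE_VALUE_NULL;
            }
            written |= bit;
            break;
        }
        default:
            break;
        }
    }
}
```

For the C(1,2,3) trace, the three stores to slots 0, 1 and 2 would all be rewritten; a later second store to self.a would stay on the generic path, since by then the slot may hold a value.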
So we can replace this:

        PyObject **value_ptr = (PyObject**)(((char *)owner_o) + offset);
        PyObject *old_value = *value_ptr;
        FT_ATOMIC_STORE_PTR_RELEASE(*value_ptr, PyStackRef_AsPyObjectSteal(value));
        if (old_value == NULL) {
            PyDictValues *values = _PyObject_InlineValues(owner_o);
            Py_ssize_t index = value_ptr - values->values;
            _PyDictValues_AddToInsertionOrder(values, index);
        }
        Py_XDECREF(old_value);

with this:

        PyObject **value_ptr = (PyObject**)(((char *)owner_o) + offset);
        FT_ATOMIC_STORE_PTR_RELEASE(*value_ptr, PyStackRef_AsPyObjectSteal(value));
        PyDictValues *values = _PyObject_InlineValues(owner_o);
        Py_ssize_t index = value_ptr - values->values;
        _PyDictValues_AddToInsertionOrder(values, index);

On AArch64, this reduces the number of machine instructions for the uop from 48 to 26.
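To see where the savings come from, the two bodies can be modelled outside the interpreter. This sketch uses a toy object with a plain int reference count and a toy stand-in for PyDictValues; ToyValues, add_to_insertion_order and the other names are hypothetical, and the layout is not CPython's. It only shows which work the known-NULL store skips: the load of the old value, the NULL test, and the decref.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy object: just a reference count. */
typedef struct { int refcnt; } Obj;

static void toy_xdecref(Obj *o)
{
    if (o != NULL) {
        o->refcnt--;
    }
}

/* Hypothetical stand-in for PyDictValues: inline value slots plus a
   record of the order in which slots were first set. */
#define NSLOTS 8
typedef struct {
    uint8_t order[NSLOTS];  /* slot indices in first-insertion order */
    uint8_t size;           /* number of entries used in order[] */
    Obj *values[NSLOTS];
} ToyValues;

static void add_to_insertion_order(ToyValues *v, size_t index)
{
    assert(v->size < NSLOTS);
    v->order[v->size++] = (uint8_t)index;
}

/* Generic store (mirrors the "before" code): the slot may already hold a
   value. Steals the caller's reference to value. */
static void store_generic(ToyValues *v, size_t index, Obj *value)
{
    Obj *old = v->values[index];
    v->values[index] = value;
    if (old == NULL) {
        add_to_insertion_order(v, index);
    }
    toy_xdecref(old);
}

/* Known-NULL store (mirrors the "after" code): the caller guarantees the
   slot is empty, so there is no branch and no decref. */
static void store_known_null(ToyValues *v, size_t index, Obj *value)
{
    /* Debug-only check of the optimizer's guarantee; compiled out with NDEBUG. */
    assert(v->values[index] == NULL);
    v->values[index] = value;
    add_to_insertion_order(v, index);
}
```

Calling store_known_null on a slot that is already set would break both the insertion order and the reference count, which is why the choice has to rest on the optimizer knowing the object is fresh.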

The same reasoning applies to _STORE_ATTR_SLOT, where it reduces the number of machine instructions from 32 to 14.
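The slot case is simpler, because there is no insertion order to maintain: the only work the known-NULL version drops is reading the old value and releasing it. A sketch under the same toy assumptions, using a hypothetical ToyInstance type whose __slots__-style fields live at fixed offsets:

```c
#include <assert.h>
#include <stddef.h>

/* Toy object: just a reference count. */
typedef struct { int refcnt; } Obj;

/* Hypothetical instance with two slot fields at fixed offsets. */
typedef struct {
    Obj *x;
    Obj *y;
} ToyInstance;

static Obj **slot_ptr(ToyInstance *self, size_t offset)
{
    return (Obj **)((char *)self + offset);
}

/* Generic slot store: read the old value, overwrite, release the old one. */
static void store_slot_generic(ToyInstance *self, size_t offset, Obj *value)
{
    Obj **p = slot_ptr(self, offset);
    Obj *old = *p;
    *p = value;
    if (old != NULL) {
        old->refcnt--;
    }
}

/* Known-NULL slot store: nothing to read or release. */
static void store_slot_known_null(ToyInstance *self, size_t offset, Obj *value)
{
    Obj **p = slot_ptr(self, offset);
    assert(*p == NULL);  /* debug-only check of the optimizer's guarantee */
    *p = value;
}
```

For example, store_slot_known_null(&inst, offsetof(ToyInstance, y), &obj) writes the y field of a freshly zeroed instance directly.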

See also #134584

We can probably remove some of those guards as well, but that's a separate issue.

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)