⚡️ Speed up method _LegacyExperimentService._execution_to_column_named_metadata by 20% #45

Open
codeflash-ai[bot] wants to merge 1 commit into codeflash-ai/python-aiplatform:main from codeflash-ai/python-aiplatform:codeflash/optimize-_LegacyExperimentService._execution_to_column_named_metadata-mglgx3jd

codeflash-ai[bot] commented on Oct 10, 2025

📄 20% (0.20x) speedup for _LegacyExperimentService._execution_to_column_named_metadata in google/cloud/aiplatform/metadata/metadata.py

⏱️ Runtime: 1.17 milliseconds → 976 microseconds (best of 203 runs)

📝 Explanation and details

The optimization replaces the expensive ".".join([metadata_type, key]) string operation with simple string concatenation using the + operator.

Key changes:

  • Pre-computes metadata_type_dot = metadata_type + '.' once outside the loop instead of creating a list and joining it for every key
  • Uses direct concatenation metadata_type_dot + key instead of ".".join([metadata_type, key])

Why this is faster:

  • The original expression builds a temporary list [metadata_type, key] for every key, and str.join() then iterates over that list to build the final string
  • Simple string concatenation with + is a more direct operation that avoids the list creation and iteration overhead
  • Pre-computing the dot-appended metadata type eliminates redundant string operations in the loop
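
The change described above can be sketched as two standalone functions. This is a minimal reconstruction, not the code in metadata.py: the names column_names_join and column_names_concat are hypothetical, and the prefix handling (strip filter_prefix from keys that start with it, keep other keys unchanged) is an assumption inferred from the comments in the generated tests.

```python
from typing import Any, Dict, Optional


def column_names_join(
    metadata_type: str,
    metadata: Dict[str, Any],
    filter_prefix: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the original: builds a list and joins it per key."""
    columns = {}
    for key, value in metadata.items():
        # Assumed prefix handling: strip the prefix only when the key starts with it.
        if filter_prefix and key.startswith(filter_prefix):
            key = key[len(filter_prefix):]
        columns[".".join([metadata_type, key])] = value
    return columns


def column_names_concat(
    metadata_type: str,
    metadata: Dict[str, Any],
    filter_prefix: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the optimized version: one prefix string, then `+`."""
    metadata_type_dot = metadata_type + "."  # computed once, outside the loop
    columns = {}
    for key, value in metadata.items():
        if filter_prefix and key.startswith(filter_prefix):
            key = key[len(filter_prefix):]
        columns[metadata_type_dot + key] = value
    return columns


# Both variants should produce the same column names for the same input.
sample = {"input:alpha": 1, "gamma": 2}
assert column_names_join("param", sample, "input:") == column_names_concat(
    "param", sample, "input:"
)
```

The only difference between the two functions is how the column name is built; everything else in the loop is kept identical so the comparison isolates that one expression.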

Performance gains:
The optimization shows speedups in nearly all generated test cases: roughly 1-14% for small inputs and 17-30% for inputs with 100 to 1000 keys, where the loop runs frequently. The line profiler shows the critical line (building the column name) dropped from 36.8% to 32.3% of total runtime, with overall function time reduced by ~10%. Empty dicts run 15-18% slower (roughly 0.1μs per call) because the prefix string is pre-computed even when the loop never executes, but workloads with many keys benefit clearly.
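
To check figures like these on another machine, a micro-benchmark along the following lines may help. It is a sketch only: it times the two string-building expressions in isolation with the standard timeit module rather than the real method, the helper name bench is hypothetical, and absolute numbers will vary with the Python version and hardware.

```python
import timeit


def bench(n_keys, repeat=5, number=200):
    """Time join-based vs concatenation-based name building for n_keys keys."""
    metadata_type = "param"
    keys = [f"key_{i}" for i in range(n_keys)]

    def with_join():
        # A new two-element list is built and joined for every key.
        return [".".join([metadata_type, k]) for k in keys]

    def with_concat():
        # The dotted prefix is built once, then concatenated per key.
        prefix = metadata_type + "."
        return [prefix + k for k in keys]

    # Sanity check: both strategies must yield identical names.
    assert with_join() == with_concat()

    join_t = min(timeit.repeat(with_join, repeat=repeat, number=number))
    concat_t = min(timeit.repeat(with_concat, repeat=repeat, number=number))
    return join_t, concat_t


for n in (0, 1, 1000):
    join_t, concat_t = bench(n)
    print(f"{n:>5} keys: join {join_t:.6f}s, concat {concat_t:.6f}s")
```

Taking the minimum over several repeats reduces noise from other processes, which matters at the sub-microsecond scale of the small test cases.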

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 57 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from typing import Dict, Optional, Union

# imports
import pytest  # used for our unit tests
from google.cloud.aiplatform.metadata.metadata import _LegacyExperimentService

# unit tests

# Basic Test Cases

def test_basic_single_key():
    # Single key-value pair, no filter
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"alpha": 0.1}
    ); result = codeflash_output # 1.02μs -> 963ns (6.13% faster)

def test_basic_multiple_keys():
    # Multiple key-value pairs, no filter
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "metric", {"accuracy": 0.95, "loss": 0.05}
    ); result = codeflash_output # 1.11μs -> 1.06μs (4.51% faster)

def test_basic_with_filter_prefix():
    # Keys with filter prefix, should remove prefix
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"input:alpha": 1, "input:beta": 2}, filter_prefix="input:"
    ); result = codeflash_output # 2.04μs -> 1.95μs (5.09% faster)

def test_basic_with_partial_prefix():
    # Only keys starting with prefix should be filtered
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"input:alpha": 1, "gamma": 2}, filter_prefix="input:"
    ); result = codeflash_output # 1.89μs -> 1.75μs (8.05% faster)

def test_basic_empty_metadata():
    # Empty metadata dict should return empty dict
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "metric", {}, filter_prefix="input:"
    ); result = codeflash_output # 734ns -> 866ns (15.2% slower)

def test_basic_different_types():
    # Values can be int, float, str
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"a": 1, "b": 2.5, "c": "hello"}
    ); result = codeflash_output # 1.24μs -> 1.16μs (7.45% faster)

# Edge Test Cases

def test_edge_prefix_not_present():
    # Prefix provided but no keys start with it
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"alpha": 1, "beta": 2}, filter_prefix="input:"
    ); result = codeflash_output # 1.49μs -> 1.43μs (3.77% faster)

def test_edge_prefix_is_empty_string():
    # Empty string as prefix, should not filter anything
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"input:alpha": 1, "beta": 2}, filter_prefix=""
    ); result = codeflash_output # 1.33μs -> 1.20μs (11.0% faster)

def test_edge_key_is_only_prefix():
    # Key is exactly the prefix, should become empty string
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"input:": 42}, filter_prefix="input:"
    ); result = codeflash_output # 1.57μs -> 1.49μs (5.59% faster)

def test_edge_key_is_prefix_and_more():
    # Key is prefix plus more, should remove only the prefix
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"input:alpha": 10, "input:": 20}, filter_prefix="input:"
    ); result = codeflash_output # 1.95μs -> 1.76μs (10.2% faster)

def test_edge_key_is_empty():
    # Key is empty string
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"": "empty"}
    ); result = codeflash_output # 949ns -> 884ns (7.35% faster)

def test_edge_metadata_type_empty():
    # Metadata type is empty string
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "", {"alpha": 1, "beta": 2}
    ); result = codeflash_output # 1.11μs -> 974ns (14.1% faster)

def test_edge_metadata_type_special_chars():
    # Metadata type contains special characters
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "type$", {"alpha": 1}
    ); result = codeflash_output # 920ns -> 884ns (4.07% faster)

def test_edge_key_special_chars():
    # Keys contain special characters
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"a.b": 1, "c-d": 2}
    ); result = codeflash_output # 1.14μs -> 1.05μs (8.48% faster)

def test_edge_key_with_dot():
    # Key contains dot, should not split
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"input.alpha": 7}, filter_prefix="input."
    ); result = codeflash_output # 1.73μs -> 1.61μs (7.25% faster)

def test_edge_key_with_multiple_prefixes():
    # Key contains multiple prefixes, only first occurrence is removed
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"input:input:alpha": 5}, filter_prefix="input:"
    ); result = codeflash_output # 1.56μs -> 1.44μs (8.36% faster)

def test_edge_value_is_none():
    # Value is None, should be preserved
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"alpha": None}
    ); result = codeflash_output # 953ns -> 931ns (2.36% faster)

def test_edge_value_is_bool():
    # Value is boolean
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"alpha": True, "beta": False}
    ); result = codeflash_output # 1.14μs -> 1.04μs (9.00% faster)

def test_edge_value_is_list_or_dict():
    # Value is list or dict (should be preserved as is)
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"alpha": [1,2,3], "beta": {"x": 1}}
    ); result = codeflash_output # 1.08μs -> 1.03μs (4.46% faster)

def test_edge_filter_prefix_is_none():
    # filter_prefix is None, should not filter anything
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"input:alpha": 1, "beta": 2}, filter_prefix=None
    ); result = codeflash_output # 1.39μs -> 1.26μs (10.5% faster)

def test_edge_metadata_is_not_dict():
    # Metadata is not a dict, should raise AttributeError
    with pytest.raises(AttributeError):
        _LegacyExperimentService._execution_to_column_named_metadata(
            "param", ["alpha", "beta"], filter_prefix="input:"
        ) # 1.45μs -> 1.48μs (2.56% slower)


def test_edge_metadata_type_is_none():
    # metadata_type is None, should raise TypeError in join
    with pytest.raises(TypeError):
        _LegacyExperimentService._execution_to_column_named_metadata(
            None, {"alpha": 1}
        ) # 3.01μs -> 1.54μs (95.5% faster)


def test_large_many_keys():
    # Large number of keys, check performance and correctness
    metadata = {f"input:key_{i}": i for i in range(1000)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 191μs -> 163μs (17.4% faster)

def test_large_no_filter_prefix():
    # Large number of keys, no filtering
    metadata = {f"key_{i}": i for i in range(1000)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "metric", metadata
    ); result = codeflash_output # 94.9μs -> 73.5μs (29.1% faster)

def test_large_mixed_prefix():
    # Large number of keys, some with prefix, some without
    metadata = {}
    for i in range(500):
        metadata[f"input:key_{i}"] = i
    for i in range(500, 1000):
        metadata[f"key_{i}"] = i
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 161μs -> 133μs (20.7% faster)
    expected = {f"param.key_{i}": i for i in range(1000)}

def test_large_long_keys_and_values():
    # Large keys and string values
    metadata = {f"input:{'x'*50}_{i}": "y"*100 for i in range(100)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 23.4μs -> 20.0μs (17.4% faster)
    expected = {f"param.{('x'*50)}_{i}": "y"*100 for i in range(100)}

def test_large_all_keys_are_prefix():
    # Every generated key is the same string (the prefix), so the dict collapses to one entry
    metadata = {"input:": i for i in range(100)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 1.62μs -> 1.52μs (6.99% faster)
    expected = {"param.": i for i in range(100)}

def test_large_all_keys_empty():
    # Every generated key is the empty string, so the dict collapses to one entry
    metadata = {"": i for i in range(100)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata
    ); result = codeflash_output # 972ns -> 919ns (5.77% faster)
    expected = {"param.": i for i in range(100)}

def test_large_values_are_large_lists():
    # Values are large lists
    metadata = {f"input:key_{i}": list(range(100)) for i in range(10)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 3.76μs -> 3.33μs (12.8% faster)
    expected = {f"param.key_{i}": list(range(100)) for i in range(10)}

def test_large_keys_with_special_chars():
    # Large number of keys with special characters
    metadata = {f"input:key_{i}@!": i for i in range(100)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 22.7μs -> 19.4μs (17.0% faster)
    expected = {f"param.key_{i}@!": i for i in range(100)}
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Dict, Optional, Union

# imports
import pytest  # used for our unit tests
from google.cloud.aiplatform.metadata.metadata import _LegacyExperimentService

# unit tests

# Basic Test Cases

def test_basic_single_entry_no_prefix():
    # Single key-value, no prefix filtering
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"foo": 42}
    ); result = codeflash_output # 950ns -> 951ns (0.105% slower)

def test_basic_multiple_entries_no_prefix():
    # Multiple key-value pairs, no prefix filtering
    metadata = {"foo": 1, "bar": "baz", "qux": 3.14}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "metric", metadata
    ); result = codeflash_output # 1.27μs -> 1.19μs (7.32% faster)

def test_basic_with_prefix_removal():
    # Keys with prefix, should be removed
    metadata = {"input:foo": 123, "input:bar": 456}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 1.97μs -> 1.81μs (8.82% faster)

def test_basic_with_partial_prefix_removal():
    # Only keys that start with prefix should be changed
    metadata = {"input:foo": 1, "bar": 2}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 1.84μs -> 1.77μs (4.06% faster)

def test_basic_empty_metadata():
    # Empty dict should return empty dict
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {}
    ); result = codeflash_output # 572ns -> 690ns (17.1% slower)

def test_basic_empty_prefix_string():
    # Empty prefix string should not remove anything
    metadata = {"foo": 1, "bar": 2}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix=""
    ); result = codeflash_output # 1.29μs -> 1.22μs (6.26% faster)

def test_basic_non_string_values():
    # Test with various value types
    metadata = {"foo": 0, "bar": 1.1, "baz": "test"}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata
    ); result = codeflash_output # 1.22μs -> 1.12μs (8.18% faster)

# Edge Test Cases

def test_edge_prefix_longer_than_key():
    # Prefix longer than key, should not match and not remove
    metadata = {"f": 1}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="foobar"
    ); result = codeflash_output # 1.26μs -> 1.24μs (1.29% faster)

def test_edge_prefix_matches_entire_key():
    # Prefix is exactly the key, should remove all and leave empty key
    metadata = {"foo": 5, "bar": 6}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {"foo": 5}, filter_prefix="foo"
    ); result = codeflash_output # 1.62μs -> 1.52μs (6.86% faster)

def test_edge_prefix_is_none():
    # Prefix is None, should not remove anything
    metadata = {"input:foo": 1}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix=None
    ); result = codeflash_output # 1.12μs -> 1.06μs (5.83% faster)

def test_edge_key_with_multiple_prefixes():
    # Key contains prefix multiple times, only leading prefix is removed
    metadata = {"input:input:foo": 99}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 1.62μs -> 1.58μs (2.54% faster)

def test_edge_metadata_type_empty_string():
    # Empty metadata_type should result in keys starting with "."
    metadata = {"foo": 1}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "", metadata
    ); result = codeflash_output # 983ns -> 881ns (11.6% faster)

def test_edge_key_is_empty_string():
    # Key is empty string
    metadata = {"": 123}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata
    ); result = codeflash_output # 958ns -> 891ns (7.52% faster)

def test_edge_value_is_none():
    # Value is None, should be preserved
    metadata = {"foo": None}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata
    ); result = codeflash_output # 977ns -> 886ns (10.3% faster)

def test_edge_non_ascii_characters():
    # Key and value contain non-ASCII characters
    metadata = {"输入:测试": "值"}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="输入:"
    ); result = codeflash_output # 2.28μs -> 2.16μs (5.65% faster)

def test_edge_key_with_dot_in_name():
    # Key contains a dot, should not be split
    metadata = {"foo.bar": 77}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata
    ); result = codeflash_output # 959ns -> 919ns (4.35% faster)

def test_edge_value_is_bool():
    # Value is boolean
    metadata = {"foo": True, "bar": False}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata
    ); result = codeflash_output # 1.08μs -> 1.05μs (2.66% faster)

def test_edge_value_is_list_or_dict():
    # Value is a list or dict (should be preserved as is)
    metadata = {"foo": [1, 2], "bar": {"baz": 3}}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata
    ); result = codeflash_output # 1.07μs -> 1.05μs (1.91% faster)

def test_edge_prefix_is_empty_and_key_is_prefix():
    # Prefix is empty and key is empty, should result in "param."
    metadata = {"": 111}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix=""
    ); result = codeflash_output # 1.12μs -> 1.06μs (6.23% faster)

# Large Scale Test Cases

def test_large_scale_many_keys_no_prefix():
    # Large number of keys, no prefix
    metadata = {f"key{i}": i for i in range(1000)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata
    ); result = codeflash_output # 94.6μs -> 72.7μs (30.1% faster)
    # All keys should be present and correctly mapped
    for i in range(1000):
        pass

def test_large_scale_many_keys_with_prefix():
    # Large number of keys with prefix
    metadata = {f"input:key{i}": i for i in range(1000)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 190μs -> 160μs (18.2% faster)
    for i in range(1000):
        pass

def test_large_scale_mixed_keys_with_and_without_prefix():
    # Mix of keys with and without prefix
    metadata = {f"input:key{i}": i for i in range(500)}
    metadata.update({f"key{i}": i for i in range(500, 1000)})
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix="input:"
    ); result = codeflash_output # 160μs -> 133μs (20.4% faster)
    for i in range(500):
        pass
    for i in range(500, 1000):
        pass

def test_large_scale_empty_metadata():
    # Empty metadata at large scale (should still be empty)
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", {}
    ); result = codeflash_output # 576ns -> 702ns (17.9% slower)

def test_large_scale_long_prefix():
    # Prefix is long, only matching keys should be changed
    prefix = "verylongprefix:"
    metadata = {f"{prefix}key{i}": i for i in range(500)}
    metadata.update({f"otherkey{i}": i for i in range(500, 1000)})
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix=prefix
    ); result = codeflash_output # 162μs -> 137μs (18.0% faster)
    for i in range(500):
        pass
    for i in range(500, 1000):
        pass

def test_large_scale_all_keys_are_prefix():
    # Every generated key is the same string (the prefix), so the dict collapses to one entry
    prefix = "foo"
    metadata = {"foo": i for i in range(1000)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata, filter_prefix=prefix
    ); result = codeflash_output # 1.65μs -> 1.58μs (4.63% faster)

def test_large_scale_different_metadata_types():
    # Test with different metadata_type values
    metadata = {f"key{i}": i for i in range(10)}
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "param", metadata
    ); result_param = codeflash_output # 2.05μs -> 1.88μs (8.93% faster)
    codeflash_output = _LegacyExperimentService._execution_to_column_named_metadata(
        "metric", metadata
    ); result_metric = codeflash_output # 1.47μs -> 1.35μs (8.90% faster)
    # Keys should be different
    for i in range(10):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
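
The output comparison that comment refers to can be sketched as follows. This is not Codeflash's harness: original and optimized are hypothetical reconstructions of the two versions (with prefix handling assumed from the test comments), and run is a hypothetical helper that records either the return value or the exception type so both kinds of outcome can be compared.

```python
def original(metadata_type, metadata, filter_prefix=None):
    """Hypothetical reconstruction of the join-based version."""
    columns = {}
    for key, value in metadata.items():
        if filter_prefix and key.startswith(filter_prefix):
            key = key[len(filter_prefix):]
        columns[".".join([metadata_type, key])] = value
    return columns


def optimized(metadata_type, metadata, filter_prefix=None):
    """Hypothetical reconstruction of the concatenation-based version."""
    metadata_type_dot = metadata_type + "."  # evaluated before the loop
    columns = {}
    for key, value in metadata.items():
        if filter_prefix and key.startswith(filter_prefix):
            key = key[len(filter_prefix):]
        columns[metadata_type_dot + key] = value
    return columns


def run(func, *args):
    """Return ('ok', result) or ('raised', exception type) for comparison."""
    try:
        return ("ok", func(*args))
    except Exception as exc:
        return ("raised", type(exc))


# Inputs taken from the generated tests above.
cases = [
    ("param", {"alpha": 0.1}, None),
    ("param", {"input:alpha": 1, "gamma": 2}, "input:"),
    ("metric", {}, "input:"),
    ("", {"alpha": 1, "beta": 2}, None),
    ("param", {"输入:测试": "值"}, "输入:"),
    ("param", ["alpha", "beta"], "input:"),  # both raise AttributeError
    (None, {"alpha": 1}, None),              # both raise TypeError
]
for case in cases:
    assert run(original, *case) == run(optimized, *case), case

# An input not covered by the generated tests, where the sketches differ:
print(run(original, None, {}))   # the loop body never runs, so nothing fails
print(run(optimized, None, {}))  # `None + "."` is evaluated before the loop
```

Under these assumptions, the two sketches agree on every input listed in the generated tests. They differ only when metadata_type is not a string and metadata is empty, because the pre-computed concatenation runs even when the loop does not.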

To edit these changes, run git checkout codeflash/optimize-_LegacyExperimentService._execution_to_column_named_metadata-mglgx3jd, commit your changes, and push.

codeflash-ai[bot] requested a review from mashraf-222 on October 10, 2025 23:19
codeflash-ai[bot] added the "⚡️ codeflash" label (Optimization PR opened by Codeflash AI) on Oct 10, 2025