Add first-class embedding primitives across network API, DSL, graph_ops, and pipeline #1270

Merged
SkBlaz merged 3 commits into SkBlaz/py3plex:master from SkBlaz/py3plex:copilot/add-node-and-edge-embedding-support on Mar 14, 2026

Copilot AI commented Mar 14, 2026

This PR makes embeddings a first-class ML primitive in py3plex with multilayer-native behavior and end-to-end integration across the builder/DSL architecture, graph_ops, and pipelines. It introduces a unified embedding surface while reusing and extending existing embedding infrastructure (netmf, metapath2vec, DSL embedding stage).

  • New embedding package (py3plex/ml/embedding)

    • Added model modules and shared infrastructure:
      • base.py (base interface + result type re-export)
      • node2vec.py, deepwalk.py, netmf.py, line.py, metapath2vec.py
      • multiplex.py (multiplex-aware variants)
      • trainer.py, utils.py, similarity.py, evaluation.py
    • Added package exports in py3plex/ml/__init__.py and py3plex/ml/embedding/__init__.py.
  • Unified network entry point

    • Added multi_layer_network.embed(...) to instantiate and run embedding models via a single API.
    • Supports core methods (node2vec, deepwalk, netmf, line, metapath2vec) and multiplex-aware methods (multiplex_node2vec, supra_adjacency, layer_regularized).
  • EmbeddingResult promoted to first-class result object

    • Extended py3plex/embeddings/base.py with:
      • direct indexing (embedding[node_id])
      • vectors, nodes, dimension
      • to_pandas(), to_numpy(), to_arrow()
      • similarity search (similarity, knn, most_similar)
      • clustering helper (cluster)
      • persistence (save/load for parquet/arrow/npy/npz)
  • DSL integration

    • Extended EmbeddingSpec and Q.embed(...) to accept node2vec-, DeepWalk-, and LINE-oriented parameters (p, q, window_size, negative_samples, workers, order) plus aliases for the dimension argument (e.g. dim).
    • Updated executor embedding stage to route through network.embed(...) and expose embedding vectors via query attributes for downstream export/use.
  • graph_ops integration

    • Added NodeFrame.embed(...) to attach embedding vectors as an embedding column in node frames.
  • Pipeline integration

    • Added a NodeEmbedding pipeline step configured by method and dimensions.
    • Exposed NodeEmbedding in public exports and added compatibility shim for py3plex.pipeline.steps.embedding.
  • Docs and package surface

    • Updated embedding exports and AGENTS summary note for the new first-class embedding primitive.
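The unified `net.embed(...)` entry point is described as a single API that instantiates and runs a model chosen by name. A minimal sketch of that name-based dispatch pattern follows; `register`, `embed`, and `ToyDeepWalk` are hypothetical names rather than py3plex internals, and the toy model returns zero vectors instead of training anything.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a method name to the class that implements it.
_EMBEDDING_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Class decorator that records an embedding model under `name`."""
    def decorator(cls):
        _EMBEDDING_REGISTRY[name] = cls
        return cls
    return decorator


@register("deepwalk")
class ToyDeepWalk:
    """Placeholder model: returns zero vectors instead of training."""

    def __init__(self, dimensions: int = 128, **params: Any):
        self.dimensions = dimensions
        self.params = params  # remaining hyperparameters, kept for inspection

    def fit_transform(self, network):
        return {node: [0.0] * self.dimensions for node in network}


def embed(network, method: str = "node2vec", **params: Any):
    """Look up `method`, construct the model with `params`, and run it."""
    try:
        model_cls = _EMBEDDING_REGISTRY[method]
    except KeyError:
        known = ", ".join(sorted(_EMBEDDING_REGISTRY))
        raise ValueError(f"unknown embedding method {method!r}; known: {known}") from None
    return model_cls(**params).fit_transform(network)
```

Registering each class under its method name keeps the entry point free of long if/elif chains; each supported method (node2vec, multiplex_node2vec, and so on) would get its own entry.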

Example

```python
from py3plex.core.multinet import multi_layer_network
from py3plex.dsl import Q

net = multi_layer_network(directed=False, network_type="multilayer")
# ... add/load nodes+edges ...

# Unified entry point: returns an EmbeddingResult keyed by (node, layer)
emb = net.embed(method="node2vec", dimensions=128, walk_length=40, num_walks=10, seed=42)
top = emb.most_similar(("Alice", "social"), k=10)

# The same model through the DSL
result = (
    Q.nodes()
    .embed("node2vec", dim=128, p=1.0, q=1.0, walk_length=40, num_walks=10)
    .execute(net)
)
```
Original prompt

This section details the original issue to resolve.

<issue_title>emb fclass</issue_title>
<issue_description>Implement first-class node and edge embedding support in py3plex as a core ML primitive integrated with the builder/DSL architecture.

Goals:

Embeddings must work naturally with multilayer networks.

The API must integrate with the existing DSL (Q), pipelines, and graph_ops.

The backend should support scalable implementations (NumPy/JAX/PyTorch optional).

Embeddings must be reusable across downstream tasks (link prediction, clustering, classification).

Design and implement the following components.


Create a new module:

py3plex/ml/embedding/

Submodules:

base.py
node2vec.py
deepwalk.py
netmf.py
line.py
metapath2vec.py
multiplex.py
trainer.py
utils.py


Define a base embedding interface.

File: base.py

Requirements:

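```python
class BaseEmbedding:
    """Interface shared by all embedding models."""

    name: str  # method identifier, e.g. "node2vec"

    def fit(self, network):
        """Learn embeddings from `network`."""
        raise NotImplementedError

    def transform(self, nodes=None):
        """Return vectors for `nodes` (all fitted nodes when None)."""
        raise NotImplementedError

    def fit_transform(self, network):
        """Fit on `network`, then return vectors for every node."""
        self.fit(network)
        return self.transform()

    def get_embedding(self, node):
        """Return the vector of a single node."""
        raise NotImplementedError

    def to_pandas(self):
        """Return the embeddings as a pandas DataFrame."""
        raise NotImplementedError

    def to_numpy(self):
        """Return the embeddings as a NumPy array."""
        raise NotImplementedError
```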

All embedding models must inherit from this base.


Add a unified embedding entry point on the network object.

Extend multi_layer_network with:

```python
net.embed(
    method="node2vec",
    dimensions=128,
    walk_length=40,
    num_walks=10,
    context_size=10,
    workers=4,
)
```

Return object:

EmbeddingResult


Implement EmbeddingResult.

Capabilities:

```python
embedding.vectors
embedding.nodes
embedding.dimension

embedding[node_id]

embedding.to_pandas()
embedding.to_numpy()
embedding.to_arrow()

embedding.similarity(node_a, node_b)
embedding.knn(node, k=10)
```

Storage:

dict[(node_id, layer)] -> vector

Support multiplex nodes.
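A minimal sketch of the storage layout above, assuming NumPy: vectors are stacked into one matrix and a dict maps each `(node_id, layer)` key to its row, so a multiplex node that appears in two layers gets two independent vectors. `ToyEmbeddingResult` and `layers_of` are hypothetical names used only for illustration.

```python
import numpy as np


class ToyEmbeddingResult:
    """Sketch of an embedding container keyed by (node_id, layer)."""

    def __init__(self, vectors):
        # vectors: dict mapping (node_id, layer) -> 1-D array-like
        self.nodes = list(vectors)
        self._row = {key: i for i, key in enumerate(self.nodes)}
        # Stack all vectors into one matrix so bulk operations stay vectorized.
        self.vectors = np.vstack([np.asarray(vectors[key], dtype=float) for key in self.nodes])

    @property
    def dimension(self):
        return self.vectors.shape[1]

    def __getitem__(self, key):
        return self.vectors[self._row[key]]

    def to_numpy(self):
        return self.vectors

    def layers_of(self, node_id):
        """Multiplex view: {layer: vector} for every layer the node appears in."""
        return {layer: self[(n, layer)] for n, layer in self.nodes if n == node_id}
```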


Implement Node2Vec.

File: node2vec.py

Features:

biased random walks

parameters p, q

skipgram training

negative sampling

support multiplex networks

Interface:

```python
Node2VecEmbedding(
    dimensions=128,
    walk_length=80,
    num_walks=10,
    p=1.0,
    q=1.0,
    window_size=10,
    negative_samples=5,
)
```

Steps:

  1. generate random walks

  2. build training corpus

  3. train skipgram

  4. produce embeddings
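The steps start with biased random walks. Below is a hedged sketch of the node2vec second-order walk on an unweighted graph stored as `{node: set(neighbors)}` (an assumed input format, not py3plex's internal one): after a uniform first step, returning to the previous node gets weight 1/p, moving to a neighbor of the previous node gets weight 1, and moving further away gets weight 1/q. The function names are hypothetical, and skip-gram training on the walks is a separate step.

```python
import random


def node2vec_walk(adj, start, walk_length, p=1.0, q=1.0, rng=random):
    """One node2vec walk on an unweighted graph given as {node: set(neighbors)}."""
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        prev = walk[-2]
        # Unnormalized weights: 1/p to return, 1 to stay near prev, 1/q to move outward.
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)
            elif nxt in adj[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, walk_length, p=1.0, q=1.0, seed=None):
    """num_walks passes over all nodes in shuffled order, one walk per node per pass."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = sorted(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, walk_length, p, q, rng))
    return walks
```

With p = q = 1 every weight equals 1, so the walk reduces to DeepWalk's uniform random walk.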


Implement DeepWalk.

File: deepwalk.py

Same interface as Node2Vec, but without the bias parameters p and q.


Implement NetMF.

File: netmf.py

Requirements:

spectral approximation of DeepWalk

use sparse matrices

truncated SVD

Interface:

```python
NetMFEmbedding(
    dimensions=128,
    window=10,
    negative=1,
)
```
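A hedged sketch of the small-window NetMF computation, assuming SciPy and an undirected graph with no isolated nodes. It builds M = vol(G)/(bT) · (Σ_{r=1..T} P^r) D^{-1} with P = D^{-1}A using sparse products, applies log(max(M, 1)), and keeps the top singular vectors via truncated SVD; `window` plays the role of T, `negative` the role of b, and the function name is hypothetical. The powers of P fill in as T grows, so this form suits small windows only.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds


def netmf_small_window(adj, dimensions=2, window=10, negative=1.0):
    """NetMF for small windows: factorize log(max(M, 1)) with truncated SVD.

    M = vol(G) / (b * T) * (sum_{r=1..T} P^r) D^{-1}, with P = D^{-1} A.
    Assumes an undirected graph with no isolated nodes (every degree > 0).
    """
    A = sp.csr_matrix(adj, dtype=float)
    deg = np.asarray(A.sum(axis=1)).ravel()
    vol = deg.sum()
    d_inv = sp.diags(1.0 / deg)
    P = (d_inv @ A).tocsr()  # sparse random-walk transition matrix
    S = sp.csr_matrix(A.shape, dtype=float)
    P_r = sp.identity(A.shape[0], format="csr")
    for _ in range(window):
        P_r = P_r @ P  # P^r, which fills in as r grows
        S = S + P_r
    M = sp.csr_matrix((vol / (negative * window)) * (S @ d_inv))
    # log(max(M, 1)) maps zero entries to log(1) = 0, so only stored values change.
    M.data = np.log(np.maximum(M.data, 1.0))
    M.eliminate_zeros()
    U, s, _ = svds(M, k=dimensions)
    return U * np.sqrt(s)  # embedding = U_d * sqrt(Sigma_d)
```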


Implement LINE.

File: line.py

Support:

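order=1 (first-order proximity: directly connected nodes)
order=2 (second-order proximity: nodes with shared neighborhoods)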

Optimization:

negative sampling

stochastic gradient descent
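For order=1, LINE models an edge (i, j) as σ(u_i · u_j) and fits it with negative sampling and SGD. A minimal NumPy sketch under these assumptions: nodes are integers 0..num_nodes-1, negatives are drawn proportional to degree^0.75, and only first-order proximity is covered (order=2 would add a separate context vector per node). `line_first_order` is a hypothetical name.

```python
import numpy as np


def line_first_order(edges, num_nodes, dimensions=8, negative=5, epochs=50, lr=0.025, seed=0):
    """LINE with order=1: SGD with negative sampling over an undirected edge list."""
    rng = np.random.default_rng(seed)
    emb = (rng.random((num_nodes, dimensions)) - 0.5) / dimensions
    deg = np.zeros(num_nodes)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    noise = deg ** 0.75
    noise /= noise.sum()  # negatives drawn proportional to degree^0.75
    labels = np.zeros(negative + 1)
    labels[0] = 1.0  # first target is the true neighbor, the rest are negatives

    for _ in range(epochs):
        for idx in rng.permutation(len(edges)):
            u, v = edges[idx]
            if rng.random() < 0.5:
                u, v = v, u  # undirected edge: train both orientations
            targets = np.concatenate(([v], rng.choice(num_nodes, size=negative, p=noise)))
            scores = emb[targets] @ emb[u]
            # Gradient ascent on log sigma(u.v) + sum log sigma(-u.n), scaled by lr.
            g = lr * (labels - 1.0 / (1.0 + np.exp(-scores)))
            grad_u = g @ emb[targets]
            np.add.at(emb, targets, np.outer(g, emb[u]))
            emb[u] += grad_u
    return emb
```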


Implement MetaPath2Vec for multiplex networks.

File: metapath2vec.py

Features:

meta-path guided random walks

layer-aware walks

Example metapath:

["author","paper","venue","paper","author"]

Interface:

```python
MetaPath2VecEmbedding(
    metapaths=[...],
    dimensions=128,
    walk_length=40,
)
```
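A hedged sketch of a meta-path guided walk, assuming each node has a single type given by a `node_type` dict and the meta-path is symmetric (it starts and ends with the same type, like the author-paper-venue-paper-author example), so it can be tiled across a long walk. Using the layer as the node type in a multiplex network is one possible mapping, not something the issue specifies; `metapath_walk` is a hypothetical name.

```python
import random


def metapath_walk(adj, node_type, start, metapath, walk_length, rng=None):
    """Walk that only steps to neighbors whose type matches the next metapath entry.

    Assumes metapath[0] == metapath[-1] so the pattern can repeat cyclically.
    """
    rng = rng or random.Random()
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match metapath[0]")
    cycle = metapath[:-1]  # drop the repeated endpoint so the pattern tiles
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = sorted(n for n in adj[walk[-1]] if node_type[n] == wanted)
        if not candidates:
            break  # no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk
```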


Implement multiplex-aware embeddings.

File: multiplex.py

Add:

MultiplexNode2Vec
SupraAdjacencyEmbedding
LayerRegularizedEmbedding

Capabilities:

cross-layer transitions

layer weighting

inter-layer edge handling

Random walks must support (node, layer) states.
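One possible design for walks over (node, layer) states, sketched under stated assumptions: each layer has its own intra-layer adjacency dict, inter-layer edges are implicit couplings between copies of the same node, and a single hypothetical `switch_prob` controls how often the walker changes layer. Per-layer weights could replace the uniform choice of target layer to implement layer weighting.

```python
import random


def multiplex_walk(layers, start, walk_length, switch_prob=0.2, rng=None):
    """Random walk over (node, layer) states.

    layers: {layer_name: {node: set(neighbors)}}, intra-layer adjacency only.
    With probability switch_prob the walker jumps to the same node in another
    layer where that node exists; otherwise it takes an intra-layer step.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < walk_length:
        node, layer = walk[-1]
        other_layers = sorted(l for l in layers if l != layer and node in layers[l])
        if other_layers and rng.random() < switch_prob:
            walk.append((node, rng.choice(other_layers)))  # cross-layer transition
            continue
        neighbors = sorted(layers[layer].get(node, ()))
        if not neighbors:
            if not other_layers:
                break  # dead end in every layer
            walk.append((node, rng.choice(other_layers)))
            continue
        walk.append((rng.choice(neighbors), layer))  # intra-layer step
    return walk
```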


Implement embedding trainer.

File: trainer.py

Responsibilities:

random walk generation

negative sampling

batching

parallelization

APIs:

generate_walks(network)
train_skipgram(walks)
optimize_embeddings()

Allow backend selection:

backend="numpy"
backend="jax"
backend="torch"
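The `train_skipgram(walks)` responsibility can be illustrated with a NumPy-only skip-gram with negative sampling, roughly what a `backend="numpy"` path might do. This is a sketch with hypothetical parameter names: it does no batching or parallelism, and a JAX or PyTorch backend would express the same updates as vectorized batches.

```python
import numpy as np


def train_skipgram(walks, dimensions=16, window_size=2, negative=5, epochs=5, lr=0.025, seed=0):
    """Skip-gram with negative sampling over node walks (plain NumPy SGD)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({node for walk in walks for node in walk})
    index = {node: i for i, node in enumerate(vocab)}
    counts = np.zeros(len(vocab))
    for walk in walks:
        for node in walk:
            counts[index[node]] += 1
    noise = counts ** 0.75
    noise /= noise.sum()  # unigram^0.75 noise distribution, as in word2vec
    center = (rng.random((len(vocab), dimensions)) - 0.5) / dimensions
    context = np.zeros((len(vocab), dimensions))

    # (center, context) pairs from a symmetric window around each walk position.
    pairs = []
    for walk in walks:
        ids = [index[node] for node in walk]
        for pos, c in enumerate(ids):
            lo, hi = max(0, pos - window_size), min(len(ids), pos + window_size + 1)
            pairs.extend((c, ids[j]) for j in range(lo, hi) if j != pos)
    pairs = np.array(pairs, dtype=int).reshape(-1, 2)

    labels = np.zeros(negative + 1)
    labels[0] = 1.0  # first target is the observed context, the rest are negatives
    for _ in range(epochs):
        for c, o in pairs[rng.permutation(len(pairs))]:
            targets = np.concatenate(([o], rng.choice(len(vocab), size=negative, p=noise)))
            scores = context[targets] @ center[c]
            g = lr * (labels - 1.0 / (1.0 + np.exp(-scores)))
            grad_center = g @ context[targets]
            np.add.at(context, targets, np.outer(g, center[c]))
            center[c] += grad_center
    return {node: center[index[node]] for node in vocab}
```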


Add DSL integration.

Extend Q builder.

Example:

```python
result = (
    Q.nodes()
    .embed("node2vec", dim=128)
    .execute(net)
)
```

Output:

QueryResult with embedding vectors

Columns:

node
layer
embedding


Add graph_ops integration.

Example:

```python
(
    nodes(net)
    .embed(method="node2vec", dim=64)
    # kmeans is not defined here; assumed to be a user-supplied clustering function
    .mutate(cluster=lambda x: kmeans(x["embedding"]))
)
```

Embedding must appear as column:

embedding


Add pipeline step.

File:

py3plex/pipeline/steps/embedding.py

Step:

```python
NodeEmbedding(
    method="node2vec",
    dimensions=128,
)
```

Usage:

```python
Pipeline([
    ("embed", NodeEmbedding(method="node2vec")),
    ("cluster", NodeClustering()),
])
```


Add similarity utilities.

File:

similarity.py

Functions:

cosine_similarity
euclidean_distance
dot_similarity

Add methods:

embedding.similarity(a,b)
embedding.most_similar(node,k=10)
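The three functions and `most_similar` can be very small. A NumPy sketch, assuming vectors live in a plain `{key: vector}` dict; these signatures are illustrative and not necessarily the ones in `similarity.py`.

```python
import numpy as np


def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_distance(a, b):
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def dot_similarity(a, b):
    return float(np.dot(a, b))


def most_similar(vectors, key, k=10, metric=cosine_similarity):
    """Return the k entries of {key: vector} most similar to `key`, excluding itself.

    `metric` must be a similarity (higher means closer); a distance such as
    euclidean_distance would need ascending order instead.
    """
    query = vectors[key]
    scored = [(other, metric(query, vec)) for other, vec in vectors.items() if other != key]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```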


Add clustering helpers.

embedding.cluster(method="kmeans", k=10)
embedding.cluster(method="spectral")

Return:

node -> cluster_id
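For `method="kmeans"`, the node -> cluster_id contract can be sketched with plain Lloyd iterations in NumPy (a real implementation might delegate to scikit-learn instead; spectral clustering is not covered here). `kmeans_cluster` is a hypothetical name.

```python
import numpy as np


def kmeans_cluster(vectors, k, iterations=100, seed=0):
    """Lloyd's k-means on {node: vector}; returns {node: cluster_id}."""
    nodes = list(vectors)
    X = np.vstack([np.asarray(vectors[n], dtype=float) for n in nodes])
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k distinct points as seeds
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return {node: int(label) for node, label in zip(nodes, labels)}
```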


Add persistence.

EmbeddingResult must support:

embedding.save("embeddings.parquet")
embedding.load("embeddings.parquet")

Formats:

parquet
arrow
numpy
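Of the listed formats, NumPy's `.npz` needs no extra dependency. A hedged sketch: keys are split into parallel `node` and `layer` string arrays stored next to the vector matrix, so loading does not require `allow_pickle`. Converting IDs to strings is an assumption of this sketch (non-string node IDs would not round-trip), the function names are hypothetical, and parquet or arrow output would go through pandas or pyarrow instead.

```python
import numpy as np


def save_npz(path, vectors):
    """Save {(node_id, layer): vector} as parallel arrays in a .npz file."""
    keys = list(vectors)
    np.savez(
        path,
        node=np.array([str(n) for n, _ in keys]),
        layer=np.array([str(l) for _, l in keys]),
        vectors=np.vstack([np.asarray(vectors[key], dtype=float) for key in keys]),
    )


def load_npz(path):
    """Rebuild the {(node_id, layer): vector} dict written by save_npz."""
    with np.load(path) as data:
        return {
            (str(n), str(l)): vec
            for n, l, vec in zip(data["node"], data["layer"], data["vectors"])
        }
```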


Add evaluation utilities.

File:

evaluation.py

Functions:

evaluate_link_prediction()
evaluate_node_classification()
evaluate_clustering()

Metrics:

ROC-AUC
F1
NMI
ARI
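As an illustration of `evaluate_link_prediction()`, the sketch below scores held-out positive edges and sampled non-edges by the dot product of endpoint embeddings, then computes ROC-AUC as the probability that a random positive outscores a random negative (ties count one half). The signatures and the dot-product scorer are assumptions; F1, NMI, and ARI are not shown.

```python
import numpy as np


def roc_auc(pos_scores, neg_scores):
    """ROC-AUC as P(positive score > negative score), with ties counted as 0.5."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    # Broadcasting compares every positive against every negative.
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


def evaluate_link_prediction(vectors, pos_edges, neg_edges):
    """Score held-out edges by the dot product of their endpoint embeddings."""
    def score(edges):
        return [float(np.dot(vectors[u], vectors[v])) for u, v in edges]
    return roc_auc(score(pos_edges), score(neg_edges))
```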


Performance requirements.

random walk generation parallelized

sparse matrix operations

avoid dense adjacency matrices

s...



Copilot AI and others added 2 commits March 14, 2026 11:03
Co-authored-by: SkBlaz <10035780+SkBlaz@users.noreply.github.com>
Co-authored-by: SkBlaz <10035780+SkBlaz@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add first-class node and edge embedding support in py3plex" to "Add first-class embedding primitives across network API, DSL, graph_ops, and pipeline" Mar 14, 2026
Copilot AI requested a review from SkBlaz March 14, 2026 11:09
SkBlaz marked this pull request as ready for review March 14, 2026 19:43
SkBlaz merged commit e7e7347 into master Mar 14, 2026
33 checks passed

Development

Successfully merging this pull request may close these issues.

emb fclass

2 participants
