Skip to main content

Stack Exchange Network

Stack Exchange network consists of 183 Q&A communities including Stack Overflow, the largest, most trusted online community for developers to learn, share their knowledge, and build their careers.

Visit Stack Exchange
Asked
Viewed 4k times
1
\$\begingroup\$

I have a matrix of ~4.5 million vector [4.5mil, 300] and I want to calculate the distance between a vector of length 300 against all the entries in the matrix.

I got some great performance time using the answers from the following post: Efficient numpy cosine distance calculation.

from scipy import spatial


def cos_matrix_multiplication(vector, matrix):

    v = vector.reshape(1, -1)
    scores=spatial.distance.cdist(matrix_1, v, 'cosine').reshape(-1)
    return scores

Here, vector is a NumPy array [300,1] and matrix_1 is a NumPy matrix [4.5mil, 300]

Using this I can calculate scores for the entire matrix (4.5 mil records) in about 19secs. I have been trying to optimize it even further but don't seem to make any progress.

I want to know if it is feasible to convert this code into a cython code or use Process/Thread pool to speed up even more.

I split my matrix into smaller chunks (500K each) and used ThreadPoolExecutor as follows:

matrix_list=[mat_small1,mat_small2,....,mat_small9]

import concurrent.futures
def cos_thread():
    neighbors=[]

    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:

        future_to_list = {executor.submit(cos_matrix_multiplication,vec,mat_col): mat_col for mat_col in matrix_list}

        for future in concurrent.futures.as_completed(future_to_list):

            data = future.result()

            neighbors.extend(data)

    return neighbors

I can currently compute the entire ~4.5 million cosines in ~5 seconds.

\$\endgroup\$
2
  • 1
    \$\begingroup\$ Work through cython tutorials. And study the internals of cdist You can't just throw your code at cython. \$\endgroup\$
    hpaulj
    –  hpaulj
    2016-11-16 03:20:21 +00:00
    Commented Nov 16, 2016 at 3:20
  • 1
    \$\begingroup\$ If you can split the matrix into parts, throw it on a multiprocessing pool? \$\endgroup\$
    Simon
    –  Simon
    2016-11-16 08:04:02 +00:00
    Commented Nov 16, 2016 at 8:04

1 Answer 1

2
\$\begingroup\$

Does scipy.spatial.distance.cdist run multiple cores in parallel on your machine ? On my mac with Accelerate framework, it runs all 4 cores, but equivalent numpy seems to run only 1. (This is regardless of VECLIB_MAXIMUM_THREADS, which I don't understand.)

If the big matrix doesn't change very often, normalize it once outside the loop:

A /= np.linalg.norm( A, axis=1 )  # <-- once

def cosdist( A, b, dtype=np.float32 ):
    """ (A . b) / |b| -- A normalized """
    Adotb = A.dot(b) / 
    Adotb /= np.linalg.norm(b)
    return Adotb

Since A.dot(b) and norm(A) take roughly the same time, this runs about twice as fast -- on 1 core.

How much memory do you have ? 1M x 300 x 8 bytes is 2.4 Gbytes, which may be a reasonable chunk size; 4.5M, 11Gbytes, will be memory-bound. Can you monitor memory usage / swapping ?

Using np.float32 instead of the default np.float64 might be faster, or allow bigger chunks. But scipy cdist seems to convert float32 to float64, slower.

Numpy and scipy link to libraries for BLAS, Basic Linear Algebra Subprograms. These libs are usually vendor-tuned, faster than C or Cython loops. To see what numpy / scipy link to:

from numpy import __config__
print "numpy blas:\n", __config__.get_info( "blas_opt" )

from scipy import __config__
print "scipy blas:\n", __config__.get_info( "blas_opt" )

Even though these are the same (Accelerate framework on my mac), numpy and scipy have different wrappers around BLAS (numpy/linalg/linalg/lapack_lite.so, scipy/linalg/cython_blas.so) with different overheads.

Summary: numpy / scipy dot and norm to BLAS to multiple cores is a messy business.
Don't trust any runtimes that you haven't run yourself.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Morty Proxy This is a proxified and sanitized view of the page, visit original site.