
MRE:

use std::sync::Arc;
use std::thread;

fn main() {
    let arr1 = Arc::new(vec![10; 100]);
    let arr2 = Arc::new(vec![20; 100]);
    let arr3 = Arc::new(vec![00; 100]); // Going to need interior mutability. Might need to use
                                        // Mutex/RwLock. Need to find a way to lock a slice instead
                                        // of the entire structure.

    let mut handles = vec![];

    for i in 0..4 {
        let chunk_beg = 25 * i;
        let chunk_end = chunk_beg + 25;

        let arr1_clone = Arc::clone(&arr1);
        let arr2_clone = Arc::clone(&arr2);
        let arr3_clone = Arc::clone(&arr3);

        handles.push(thread::spawn(move || {
            for idx in chunk_beg..chunk_end {
                arr3_clone[idx] = arr1_clone[idx] + arr2_clone[idx];
            }
        }));
    }

    for handle in handles {
        handle.join();
    }
}

Error:

error[E0596]: cannot borrow data in an `Arc` as mutable
  --> src/main.rs:23:17
   |
23 |                 arr3_clone[idx] = arr1_clone[idx] + arr2_clone[idx];
   |                 ^^^^^^^^^^ cannot borrow as mutable
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Arc<Vec<i32>>`

What I am trying not to do: lock arr3_clone in its entirety, because at that point the threads just become sequential.
What I am trying to do: find a way to write into individual, non-overlapping slices of arr3 through the arr3_clone indirection.

2 Comments

  • The easiest way to do this is with rayon. If you want to do it manually, you can use split_at_mut to get independent mutable references to each chunk, and scoped threads so that you can capture plain references in your threads (a sketch of that manual approach is included below). – Jmb, Apr 12 at 7:19
  • @user2722968 Interior mutability is not always !Sync. Atomic integers, Mutex, and RwLock all provide thread-safe (and therefore Sync) interior mutability. – cdhowie, Apr 12 at 15:44
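
A minimal sketch of that manual split_at_mut + scoped-threads approach, assuming the same four 25-element chunks as in the question:

use std::thread;

fn main() {
    let arr1 = vec![10; 100];
    let arr2 = vec![20; 100];
    let mut arr3 = vec![0; 100];

    thread::scope(|scope| {
        // Split arr3 into four disjoint mutable slices, one per thread.
        let (a, rest) = arr3.split_at_mut(25);
        let (b, rest) = rest.split_at_mut(25);
        let (c, d) = rest.split_at_mut(25);

        for (i, chunk3) in [a, b, c, d].into_iter().enumerate() {
            // Shared slices of the inputs covering the same index range.
            let src1 = &arr1[i * 25..(i + 1) * 25];
            let src2 = &arr2[i * 25..(i + 1) * 25];
            scope.spawn(move || {
                for (idx, out) in chunk3.iter_mut().enumerate() {
                    *out = src1[idx] + src2[idx];
                }
            });
        }
    });

    dbg!(arr3);
}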

1 Answer

Technically, it is possible to use interior mutability for this task; simply make arr3 out of atomic integers. (An atomic integer type is sort of like a Mutex, but it is far cheaper, takes up no extra space, and cannot actually be locked; you only get to pick from a set of basic operations on it, each of which happens “instantly”. The non-thread-safe but more generic equivalent of atomics is Cell, which may be informative to study.)

use std::sync::atomic::{AtomicI32, Ordering};

...

let arr3 = Arc::new((0..100).map(|_| AtomicI32::new(0)).collect::<Vec<_>>());

...

arr3_clone[idx].store(arr1_clone[idx] + arr2_clone[idx], Ordering::Relaxed);
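
Put together, the MRE with arr3 built from AtomicI32 could look roughly like this (a sketch that keeps the manual chunking and thread count from the question; Relaxed is enough here because join() already synchronizes the threads with main):

use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let arr1 = Arc::new(vec![10; 100]);
    let arr2 = Arc::new(vec![20; 100]);
    // arr3 holds atomics, so each element can be written through a shared reference.
    let arr3 = Arc::new((0..100).map(|_| AtomicI32::new(0)).collect::<Vec<_>>());

    let mut handles = vec![];

    for i in 0..4 {
        let chunk_beg = 25 * i;
        let chunk_end = chunk_beg + 25;

        let arr1_clone = Arc::clone(&arr1);
        let arr2_clone = Arc::clone(&arr2);
        let arr3_clone = Arc::clone(&arr3);

        handles.push(thread::spawn(move || {
            for idx in chunk_beg..chunk_end {
                // store() takes &self, so no mutable borrow of the Vec is needed.
                arr3_clone[idx].store(arr1_clone[idx] + arr2_clone[idx], Ordering::Relaxed);
            }
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }
}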

However, this is less efficient than it could be. Scoped threads allow you to borrow data inside the threads, so you don't need Arc at all: you can access the vectors directly, borrow each chunk as a separate &mut [i32], and avoid interior mutability entirely.

use std::thread;

fn main() {
    let arr1 = vec![10; 100];
    let arr2 = vec![20; 100];
    let mut arr3 = vec![0; 100];

    thread::scope(|scope| {
        for ((chunk1, chunk2), chunk3) in arr1
            .chunks(25)
            .zip(arr2.chunks(25))
            .zip(arr3.chunks_mut(25))
        {
            scope.spawn(move || {
                for ((elem1, elem2), elem3) in chunk1.iter().zip(chunk2).zip(chunk3) {
                    *elem3 = elem1 + elem2;
                }
            });
        }
    });

    dbg!(arr3);
}

An even better solution is to use rayon, which has two advantages: a thread pool, and a parallel iterator API. rayon will automatically distribute the work among an appropriate number of threads.

use rayon::prelude::*;

fn main() {
    let arr1 = vec![10; 100];
    let arr2 = vec![20; 100];
    let mut arr3 = vec![0; 100];

    arr1.par_chunks(25)
        .zip(arr2.par_chunks(25))
        .zip(arr3.par_chunks_mut(25))
        .for_each(|((chunk1, chunk2), chunk3)| {
            for ((elem1, elem2), elem3) in chunk1.iter().zip(chunk2).zip(chunk3) {
                *elem3 = elem1 + elem2;
            }
        });

    dbg!(arr3);
}

I'm still using chunks in this example because, for an operation as small as a few additions, it's useful to give the compiler the opportunity to unroll or otherwise optimize the inner loop by expressing it as a plain loop rather than a parallel iteration. You should tune the chunk size to whatever is fastest for your actual operation, and if the per-element operation is expensive, you can skip the chunks entirely:

arr1.par_iter()
    .zip(arr2.par_iter())
    .zip(arr3.par_iter_mut())
    .for_each(|((elem1, elem2), elem3)| {
        *elem3 = elem1 + elem2;
    });

And you can replace the pre-created arr3 with a collect operation:

let arr3: Vec<i32> = arr1.par_iter()
    .zip(arr2.par_iter())
    .map(|(elem1, elem2)| elem1 + elem2)
    .collect();

3 Comments

Uh, how is using Atomic integers different from 100 Mutexes per array? (Genuine question, I'm pretty new to the Rust multi-threading scene)
@kesarling accessing a mutex requires an atomic operation plus possibly a system call plus the actual operation you want to do plus another atomic operation plus possibly another system call. An atomic operation is just the operation.
@kesarling Mutexes are costly to lock and take up more space than their data — atomic types don’t. Atomic types are what mutexes are built on, and they are the thread-safe version of Cell. They’re not a good solution here, but they’re an option that I mentioned because you started by asking about interior mutability.
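
To make the difference described in these comments concrete, here is a minimal sketch contrasting a Mutex-guarded element with an atomic one (the names are illustrative):

use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Mutex;

fn main() {
    // One element guarded by a Mutex: every write has to lock (an atomic
    // operation, plus possibly a system call under contention), do the store,
    // and then unlock again; the Mutex also occupies extra space next to the i32.
    let slot_mutex = Mutex::new(0i32);
    *slot_mutex.lock().unwrap() = 10 + 20;

    // The same element as an atomic: the write is a single atomic store,
    // and AtomicI32 is exactly the size of an i32.
    let slot_atomic = AtomicI32::new(0);
    slot_atomic.store(10 + 20, Ordering::Relaxed);

    assert_eq!(*slot_mutex.lock().unwrap(), slot_atomic.load(Ordering::Relaxed));
}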

