Skip to main content

Stack Exchange Network

Stack Exchange network consists of 183 Q&A communities including Stack Overflow, the largest, most trusted online community for developers to learn, share their knowledge, and build their careers.

Visit Stack Exchange
Asked
Modified 7 days ago
Viewed 1k times
7
\$\begingroup\$

I am trying to solve the 909th challenge of Project Euler. It is basically about replacing specific patterns in a string until no pattern is found. A given string like "S(S)(S(S))(S(Z))(A)(0)" is called an L_expression and the rules for replacement are: $$\begin{array}{lcl} A(u) & \longrightarrow & u+1 \\ Z(u)(v) & \longrightarrow & v \\ S(u)(v)(w) &\longrightarrow & v(u(v)(w)) \end{array}$$

At first I thought regex would do the job, and it did for the two given examples, "S(Z)(A)(0)" and "S(S)(S(S))(S(Z))(A)(0)". But for the challenge itself, it turned out the iterations are far too many. So in the end, I wrote a C program and tried to make it as fast and optimized as possible:

#include <string.h>
#include <stdio.h>

static int find_patterns(const char* input, char* output, size_t size)
{
    size_t last_x = 0, last_y = 0;
    for (size_t p, i = 0; (p = i) < size; ++i)
    {
        char c = input[p++], next = input[p++];
        if ((c < 'A' || next != '(') && (c != '+' || next < '0' || next > '9'))
        {
            continue; // look for either "A(", "S(", "Z(", "+d"
        }
        if (c == 'S')
        {
            int depth = 1, args = 0, lu = 0, lv = 0, lw = 0;
            char *u = input + p, *v = NULL, *w = NULL, *y = output + last_y;
            do {
                switch (input[p])
                {
                case '(':
                    ++depth;
                    break;
                case ')':
                    if (--depth > 0) break;
                    if (depth < 0 || (args < 2 && input[p + 1] != '(')) {
                        p = size; // exit the loop
                        break;
                    }
                    switch (++args)
                    {
                    case 1:
                        lu = p - i - 2;
                        v = input + p + 2;
                        break;
                    case 2:
                        lv = p - i - 4 - lu;
                        w = input + p + 2;
                        break;
                    case 3: // found pattern S(..)(..)(..)
                        lw = p - i - 6 - lu - lv;
                        memcpy(y, input + last_x, i - last_x);
                        y += i - last_x;
                        memcpy(y, v, lv); // replace with v(u(v)(w))
                        y += lv;
                        memcpy(y, u - 1, lu + 1);
                        y += lu + 1;
                        memcpy(y, v - 1, lv + lw + 4);
                        y += lv + lw + 5;
                        y[-1] = ')';
                        last_x = p + 1;
                        last_y = y - output;
                        i = p;
                        p = size;
                    }
                    break;
                }
            } while (p++ < size);
        }
        else if (c == 'Z')
        {
            int depth = 1, args = 0, lu = 0, lv = 0;
            char *u = input + p, *v = NULL, *y = output + last_y;
            do {
                switch (input[p])
                {
                case '(':
                    ++depth;
                    break;
                case ')':
                    if (--depth > 0) break;
                    if (depth < 0 || (!args && input[p + 1] != '(')) {
                        p = size; // exit the loop
                        break;
                    }
                    switch (++args)
                    {
                    case 1:
                        lu = p - i - 2;
                        v = input + p + 2;
                        break;
                    case 2: // found pattern Z(..)(v)
                        lv = p - i - 4 - lu;
                        memcpy(y, input + last_x, i - last_x);
                        y += i - last_x;
                        memcpy(y, v, lv); // replace with v
                        y += lv;
                        last_x = p + 1;
                        last_y = y - output;
                        i = p;
                        p = size;
                    }
                    break;
                }
            } while (p++ < size);
        }
        else if (c == 'A')
        {
            int depth = 1, lu = 0;
            char *u = input + p, *y = output + last_y;
            do {
                switch (input[p])
                {
                case '(':
                    ++depth;
                    break;
                case ')':
                    if (--depth > 0) break;
                    if (depth < 0) {
                        p = size; // exit the loop
                        break;
                    }
                    lu = p - i - 2; // found pattern A(u)
                    memcpy(y, input + last_x, i - last_x);
                    y += i - last_x;
                    memcpy(y, u, lu); // replace with u+1
                    y += lu + 2;
                    y[-2] = '+';
                    y[-1] = '1';
                    last_x = p + 1;
                    last_y = y - output;
                    i = p;
                    p = size;
                    break;
                }
            } while (p++ < size);
        }
        else // c == '+': look for u+v where both are numbers
        {
            size_t u = 0, v = 0, d = 1, lu;
            for (p = i; p--; d *= 10) {
                c = input[p] - '0';
                if (c >= 0 && c < 10) u += d * c;
                else break;
            }
            for (lu = i - p - 1, p = i + 1; 1; ++p) {
                c = input[p] - '0';
                if (c >= 0 && c < 10) v = v * 10 + c;
                else break;
            }
            if (lu && v) // both u and v are non-negative numbers
            {
                char str[20], *y = output + last_y;
                for (u += v, d = 0; u; u /= 10) { // convert u+v to string
                    str[d++] = u % 10 + '0';
                }
                for (str[d--] = v = 0; v < d; ) {
                    u = str[v];
                    str[v++] = str[d];
                    str[d--] = u;
                }
                memcpy(y, input + last_x, i - last_x - lu);
                y += i - last_x - lu;
                memcpy(y, str, lu = strlen(str));
                y += lu;
                last_x = p;
                last_y = y - output;
                i = p - 1;
            }
        }
    }
    if (last_x < size) {
        memcpy(output + last_y, input + last_x, size - last_x);
    }
    output[size - last_x + last_y] = 0;
    return last_x > 0;
}

#define MAX_ITERATIONS 1E6 // (size_t)(-2)

int main()
{
    const char
        *test1 = "S(Z)(A)(0)",
        *test2 = "S(S)(S(S))(S(Z))(A)(0)",
        *test3 = "S(S)(S(S))(S(S))(S(Z))(A)(0)";
    char str[2][0x5000] = { { 0 } };
    size_t iterations, i, max_size = 0;
    strcpy(*str, test3);

    for (iterations = i = 0; ++iterations < MAX_ITERATIONS; i = 1 - i)
    {
        size_t size = strlen(str[i]);
        if (size > max_size) {
            max_size = size;
        }
        if (!find_patterns(str[i], str[1 - i], size)) {
            break;
        }
    }
    printf("result of L-actions:\n%s\nmax size: %d, #iterations: %d\n",
        str[1 - i], max_size, iterations);
    return 0;
}

The problem is, the code still takes ages to complete like 10⁸ iterations and even so, we still won't reach anywhere near the solution. I have tried configuring GCC/Clang flags for speed optimizations and don't know what else I should do.

\$\endgroup\$
1
  • 10
    \$\begingroup\$ For most Project Euler problems, a “straight forward” algorithm is far too slow, no matter how optimized the implementation is. The challenge of those problems is to find a better algorithm. \$\endgroup\$
    Martin R
    –  Martin R
    2025-08-26 17:40:31 +00:00
    Commented Aug 26 at 17:40

4 Answers 4

12
\$\begingroup\$

Simplify error checking

        if ((c < 'A' || next != '(') && (c != '+' || next < '0' || next > '9'))
        {
            continue; // look for either "A(", "S(", "Z(", "+d"
        }

This doesn't help you. You're doing a bunch of extra checks for invalid input. If you found it, you shouldn't continue, you should exit. However, this doesn't check for all invalid input, e.g. B, C, [, or a. An easier way would be to remove that block and replace

        else // c == '+': look for u+v where both are numbers

with

        else if (c == '+')

and add a new clause at the end

        else 
        {
            // this should never happen,
            // the current character wasn't A, S, Z, or +
            exit(-1);
        }

Then you'd check if c was A, S, Z, or +. If not, you'd stop. Either the input was malformed or you processed it incorrectly. Either way, no reason to continue.

You'd do something similar for next, which must be ( for A, S, or Z and a digit for +.

This would be better for two reasons:

  1. It checks for all invalid values of c.
  2. It checks for the valid values first.

Your code does two checks every time before it starts looking for valid values. However, invalid values should not happen with proper input and processing. Move that check last rather than doing it first. You'll save one to two comparisons each iteration of that loop in the happy path.

Write before you unroll

One of the reasons why this is hard to read is that you unrolled your helper functions into the code you're showing us. Consider the following

switch (input[i]) {
    case 'A':
        verify('(', input[i + 1]);
        int end = find_matching_closing_parenthesis(input, i + 1);
        replace(i, end, i + 2, end - 1, '+1');
        break;

That would be much easier for us to read than what you have, although presumably it's what you're actually doing.

Also, switch statements may be optimized more than an if/else. They can be implemented with a jump table which does fewer calculations to determine the next line of code to run.

Write that code first. Get it working. Then, if you need slightly faster code, you can unroll those function calls. However, please realize that the compiler can also inline function calls. It's not at all obvious that it will be faster to unroll them in the code. It might be. We don't know what the compiler is doing. But this is something that you should absolutely profile before you try to optimize. This is also true of my switch suggestion. Profile it both ways.

You're doing micro-optimizations, but you're not showing us that they're helping. It's not strange for the compiler optimizations to be superior to your manual optimizations. That's why you profile, to tell the difference.

Consider tokenization

            size_t u = 0, v = 0, d = 1, lu;
            for (p = i; p--; d *= 10) {
                c = input[p] - '0';
                if (c >= 0 && c < 10) u += d * c;
                else break;
            }
            for (lu = i - p - 1, p = i + 1; 1; ++p) {
                c = input[p] - '0';
                if (c >= 0 && c < 10) v = v * 10 + c;
                else break;
            }
            if (lu && v) // both u and v are non-negative numbers
            {
                char str[20], *y = output + last_y;
                for (u += v, d = 0; u; u /= 10) { // convert u+v to string
                    str[d++] = u % 10 + '0';

Here, you convert strings to numbers. And then you convert numbers to strings. Why? You want to work with numbers (why you convert the input string to numbers). There is no reason to change the numbers back to strings until you're ready to output them. You have to input as strings, but you don't have to convert back to strings while processing.

Instead, tokenize the entire thing. Convert all the numbers to integer values mod \$10^9\$ (you are supposed to be returning the answer mod \$10^9\$). Operate on them that way. Only convert back to a string at the very end. Use printf. Again, profile that, but I would expect the built-in to have optimizations over your version. Note that \$10^9\$ fits in thirty bits, so one thirty-one bit signed (or thirty-two bit unsigned, since there's no subtraction) integer is sufficient.

It might make sense to get rid of the parentheses as part of the tokenization and read into a tree structure instead.

Micro-optimizing

Also

                c = input[p] - '0';
                if (c >= 0 && c < 10) u += d * c;
                else break;

That might be faster as

                if (input[p] < '0' || input[p] > '9') {
                    break;
                }
                u += d * (input[p] - '0');

If not faster, it's more readable.

There's also a built-in isdigit that would tell you if it's a digit. Again, possibly faster. Profile.

Don't brute force Euler

Your solution is basically about iterating as fast as you can. However, Euler is almost always about making observations about how to short cut calculations.

This is a branch of math called combinator calculus (not to be confused with combinatorics or combinatorials). You'll also see it referred to as SKI. SKI defines three functions: a substitution S (corresponds to S in the Euler problem); a combinator K (corresponds to Z in the Euler problem); an identity I, which can also be written SK.

The main characteristic of I is that it takes its argument and returns it. So I(x) is just x. The Euler problem doesn't define I. You are supposed to compose it. Then you scan the input for it. Rather than iterating multiple times, each time you find I in the input, you can replace it with its argument. That reduces your iterations.

The problem is likely to find the set of optimizations like this that you can check. Rather than iterating to the solution, they expect you to substitute to the solution.

Project Euler wants you to make mathematical insights, not computing insights. If you start veering off into optimizing computation, you're probably on the wrong track.

I'm not familiar with this branch of mathematics, so I can't help you with more specific insight. I'm simply confident that computation is not the problem here. Understanding how the math works is. I only made suggestions for your computation to help with other problems. This one probably needs a better algorithm, not better computation.

\$\endgroup\$
3
  • \$\begingroup\$ Thank you, those are truly valuable advices. Though I just figured out how to dramatically reduce the number of iterations. I'll add an answer later. \$\endgroup\$
    polfosol
    –  polfosol
    2025-08-27 04:16:52 +00:00
    Commented Aug 27 at 4:16
  • \$\begingroup\$ The compiler will hopefully make asm like c = input[p] - '0';. The way the OP wrote it is how I'd write it if thinking about the asm I wanted the compiler to make: range-check with SUB then unsigned compare. double condition checking in assembly . (And see NASM Assembly convert input to integer? for what efficient asm for this specific case looks like. Basically the same, it's very convenient that if in the range, we want to map it to a 0-indexed position in the range, i.e. the SUB result.) \$\endgroup\$
    Peter Cordes
    –  Peter Cordes
    2025-08-28 20:29:19 +00:00
    Commented Aug 28 at 20:29
  • \$\begingroup\$ But anyway, modern ahead-of-time compilers like GCC, Clang, and MSVC should all optimize either way, doing common-subexpression-elimination between the c = input[p] - '0' and the range check if you write it that way. Even if c is signed so doing input[p] - '0' before the range-check can produce a negative. Hopefully none of them get distracted by trying to take advantage of signed-overflow being UB. Especially not on a normal machine with char narrower than int so overflow is impossible. Anyway, your way is not worse, possibly better, despite being farther from the optimal asm. \$\endgroup\$
    Peter Cordes
    –  Peter Cordes
    2025-08-28 20:35:08 +00:00
    Commented Aug 28 at 20:35
9
\$\begingroup\$

I took a look over your code and my main problem is that I fail to understand what your code does, thus I cannot give hints on the performance problem.
I am also not familiar with this particular challenge, so my challenge related answers may not fit to the nature of the problem.

  • It is difficult to identify what the variables stand for and thus what they should do.
  • The many nesting levels make it difficult to see what starts and ends where.
  • The big and multiple nested loops (together with the prior two problems) make it difficult to identify how the string is processed in which order.
  • The multiple definitions and initializations in the same code line can be quite confusing while this brings no benefit to the functionality or performance.

I do give credit that you implemented an iterative approach instead of a recursive one, which is probably a good approach even though I don't yet see if the recursion depth would be a problem or not.

  • Add more comments about the many not obvious parts (e.g. which characters fall into the category c < 'A', does that include or exclude the numbers and brackets?; where does lw = p - i - 6 - lu - lv point to and so on)
  • As the entire code is in the same module it is quite easy for the compiler to do some obvious optimizations like function inlining. I recommend to put the three different replacement rules into separate functions to improve readability and let the compiler put it back together.
  • I recommend to give all the single-character variable names a more meaningful name unless this name is part of the general challenge description (like u, v and w)

So I am aware that my answer does not answer your direct question, but following my proposals might help others to help with your performance question.
Don't underestimate the benefits of maintainable code. Reading and understanding code is a part of maintaining.

\$\endgroup\$
2
  • 5
    \$\begingroup\$ I know you are embarrassed that you can't offer more advice, but I genuinely believe that "this code is hard to understand" is extremely valuable feedback (even though it hurts when I receive it!). If polfosol takes that criticism and posts a new question with named functions, improved identifiers and (where all else fails) explanatory comments, then a more informed review is likely to be the result. \$\endgroup\$
    Toby Speight
    –  Toby Speight
    2025-08-26 19:22:47 +00:00
    Commented Aug 26 at 19:22
  • \$\begingroup\$ In my experience, you can take any code and make it maybe ten times faster, making it unreadable in the process. That may be a good compromise. However, no you cannot improve your code through deeper insights anymore. And for Euler problems a factor ten improvement is very often not enough. So you do this only as the very last step after you run completely out of ideas. \$\endgroup\$
    gnasher729
    –  gnasher729
    2025-08-28 20:53:24 +00:00
    Commented Aug 28 at 20:53
4
\$\begingroup\$

Analyze looking for intelligence

One answer already mentioned that your approach is basically brute force. This is due to simply iterating through all.

Looking at the possible steps:

  • A(u) → u+1
  • Z(u)(v) → v
  • S(u)(v)(w) → v(u(v)(w))

You are reducing a sequence.

You'll notice that applying A will not change the number of sequence items; applying Z will remove one item from the sequence; applying S will add one item to the sequence.

Relying on:

Note: it can be proved that the L-expression in question can only be transformed a finite number of times, and the final result does not depend on the order of the transformations.

One should prefer taking reductions in the order Z - A - S.

By the way: this problem reeks of combinators.

As S can explosively ("exponentially") increase the number of steps, this is important.

So my suggestion is to write your algorithm in high level language using elementary data structures (list probably), and iterate picking Z/A/S. With sufficient abstraction reduce_by_rule, change_list_xxx.

Such code better explains things. And describes an optimally chosen algorithm.

After that a C version is easier. Your code now is a huge monolithic block, which is hard to communicate.

You could start with a high level language like Java, Python or whatever, and just use lists.

You could also start formulating data types. Schematically/sketchily that would be like:

Pseudo-Code!

interface V {
}

static class W implements V {
    String value;

    W(String value) {
        this.value = value;
    }
}

static class U implements V {
    int value;

    U(int value) {
        this.value = value; // + 1 = Automatic anticipatory reduction under "A(u) --> u+1"?
    }
}

static final W A = new W("A");
static final W Z = new W("Z");
static final W S = new W("S");
static final U zero = new U(0);

class P implements V {
    private List<V> args;

    P(List<V> args) {
        this.args = new ArrayList<>(args);
    }
}

P p(V... args) {
    return new P(List.of(args));
}

void reduce(List<V> input) {
    ...
}

private void main() {
    String[] rules = new String[]{
        "A(u) → u+1",
        "Z(u)(v) → v",
        "S(u)(v)(w) → v(u(v)(w))"
    };
    //String input = "S(S)(S(S))(S(Z))(A)(0)";
    List<V> input = new ArrayList<>(List.of(S, p(S), p(S, p(S)), p(S, p(Z)), p(A), p(zero)));
    reduce(input);
}

The C view

Here you should deal with "(...)" as done with P p(...) above. It hints of a list structure. Instead of trying to live with handling "A(" you could convert the input string to something useful.

As one can imagine from the terrible S, the overhead from any translation away of string manipulations poses an acceptable cost. Pure string manipulation as here, is less readable and blocks easy smart optimizations.

\$\endgroup\$
2
  • \$\begingroup\$ Ah I forgot that I was going to add an answer applying some of @mdfst13 suggestions and a few that I discovered myself. Anyway, my improved method was basically similar to your suggestion and it did reduce the number of iterations by an order of magnitude. But in the end, I ran the optimized code for some 10 hours and it still didn't converge. So I guess this will additionally need some high level math or something to further reduce the iterations. \$\endgroup\$
    polfosol
    –  polfosol
    2025-10-07 13:28:37 +00:00
    Commented Oct 7 at 13:28
  • \$\begingroup\$ Also one can easily implement such looping problems wrong and achieve an endless loop. Storing every seqence encoded as string or number in a database may find erroneous repetitions. When not C the language might be slow. \$\endgroup\$
    Joop Eggen
    –  Joop Eggen
    2025-10-07 13:45:31 +00:00
    Commented Oct 7 at 13:45
3
\$\begingroup\$

I don't spend a lot of time on Code Review, but I noticed this question in HNQ and let it stew in the background for a month or so before I found a solution. This answer isn't a review of the code in the question, but it should help to point you toward a solution.

The first thing I noticed is that those definitions look an awful lot like lambda expressions, so I thought I'd try just laying them out in a functional language. The one I know best is F#. Here's a copy of an F# interactive session:

Microsoft (R) F# Interactive version 12.9.101.0 for F# 9.0
Copyright (c) Microsoft Corporation. All Rights Reserved.

For help type #help;;

> let a u = u + 1;;     
val a: u: int -> int

> let z u v = v;;
val z: u: 'a -> v: 'b -> 'b

> let s u v w = v (u v w);;
val s: u: (('a -> 'b) -> 'c -> 'a) -> v: ('a -> 'b) -> w: 'c -> 'b

> s z a 0;;
val it: int = 1

> s(s)(s(s))(s(z))(a)(0);;
val it: int = 6

> s s (s s) (s z) a 0;;
val it: int = 6

>

Of course, s s (s s) (s s) (s z) a 0 took too long to evaluate, but I found this extremely useful for experimentation.

The problem is, the code still takes ages to complete like 10⁸ iterations and even so, we still won't reach anywhere near the solution.

Indeed. Consider that A is the increment operation. The example that evaluates to 6 transforms to A(A(A(A(A(A(0)))))) before it yields the final result. The figure of which you need to calculate the last 9 digits is actually larger than 2⁶⁴, so even if you ignore the iterations that get you to the multiple instances of A, and assuming your computer can count to 2³² in one second, you're talking a lower bound of a few hundred years just to get from A(A(A(...(A(A(A(0))))...))) to the final answer. The actual figure is more likely to be in the tens of thousands of years. As mdfst13 and Martin R have noted, optimization is not the answer here.

Instead, the solution is to figure out what these things mean and then see how that leads you to a more efficient workable solution. Why does s s (s s) (s z) a 0 evaluate to 6? Once I hit on the key insight, I was able to write a function that gets to the result in about 4 milliseconds. I don't want to give too much away, but I'll add some more hints if you ask for them. For now, here are the things that helped me find the solution:

  • Experimenting with different expressions using the F# functions to see how changes to the expressions affected the end result.
  • Reading about (with thanks to mdfst13) combinators and SKI. (Note that the definitions here are somewhat different from S and K of SKI, and that leads me to the next point.)
  • Realizing that the letters Z and S are not chosen by accident: in fact they reflect a simple and fundamental meaning.
  • Doing a few transformations manually to see the patterns that result in intermediate stages.

In fact, with regard to the last point, I first solved the problem by transforming s s (s s) (s s) (s z) a 0 manually after having hit on the key insight. This took maybe 20 or 30 minutes or an hour using the F# functions and an IDE for syntax checking (to make sure I didn't drop any parentheses, for example). Once I had done that, I had the numeric answer to the problem, but I wrote a function to do the transformation for any L-expression. (I wrote the function in F# using its pattern matching facility, and I defined the expressions as a discriminated union type rather than using strings, which I expect would be a bit more cumbersome.)

If you haven't yet solved this problem and are still thinking about doing so, good luck. It was a rewarding experience.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Morty Proxy This is a proxified and sanitized view of the page, visit original site.