2

I have a Python method that takes a list of tuples of the form (string, float) and returns a string built from the leading strings that, combined, stay close to a given word limit. I am not splitting sentences to preserve the output length, but making sure to stay within a sentence length from the desired output length.

For example:
s: [('Where are you',1),('What about the next day',2),('When is the next event',3)]

max_length : 5
output : 'Where are you What about the next day'

max_length : 3
output: 'Where are you'

This is what I am doing:

l = 0
output = []
for s in s_tuples:
    if l <= max_length:
        output.append(s[0])
        l += len(get_words_from(s[0]))
return ''.join(output)

Is there a smarter way to make sure the output word length does not exceed max_length other than stopping when the length is reached?

4
  • I don't understand the output for max_length 5. Isn't 'When is the next event' of length 5, too? EDIT: okay I've got it.
    – jmilloy, Nov 8, 2011 at 20:39
  • @atlantis: Your variable name "max_length" AND your "would not exceed a certain limit" AND your "make sure the output word length does not exceed max_length" contradict what you say in comments. Please edit your question so that it is consistent with what you really want to do.
    – John Machin, Nov 8, 2011 at 20:54
  • So, are you looking for the shortest set of strings that has at least the given number of words? That is what your example seems to be doing. Also, what is the point of the number in the pairs? Must we choose the first string before any of the others?
    – 101100, Nov 8, 2011 at 21:43
  • @10100: "seems to be" != "is"
    – John Machin, Nov 8, 2011 at 22:23

5 Answers

2

First, I see no reason to defer the breaking out of loop if the maximum length is reached to the next iteration.

So, altering your code, I come up with the following code:

s_tuples = [('Where are you',1),('What about the next day',2),('When is the next event',3)]


def get_words_number(s):
    return len(s.split())


def truncate(s_tuples, max_length):
    tot_len = 0
    output = []
    for s in s_tuples:
        output.append(s[0])
        tot_len += get_words_number(s[0])
        if tot_len >= max_length:
            break
    return ' '.join(output)


print(truncate(s_tuples, 3))

Second, I really don't like that a temporary object output is created. We can feed the join method with the iterator which iterates over the initial list without duplicating the information.

def truncate(s_tuples, max_length):

    def stop_iterator(s_tuples):
        tot_len = 0
        for s,num in s_tuples:
            yield s
            tot_len += get_words_number(s)
            if tot_len >= max_length:
                break

    return ' '.join(stop_iterator(s_tuples))


print(truncate(s_tuples, 3))

Also, in your examples, the output is slightly bigger than the set maximum of words. If you want the number of words never to exceed the limit (while still being as large as possible), then just put yield after checking against the limit:

def truncate(s_tuples, max_length):

    def stop_iterator(s_tuples):
        tot_len = 0
        for s,num in s_tuples:
            tot_len += get_words_number(s)
            if tot_len >= max_length:
                if tot_len == max_length:
                    yield s
                break
            yield s

    return ' '.join(stop_iterator(s_tuples))


print(truncate(s_tuples, 5))
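
The "never exceed the limit" variant can also be written without an explicit loop. This is only a sketch: the name truncate_at_most is hypothetical, and it assumes a word is anything str.split() separates, as get_words_number does above.

```python
from itertools import accumulate, takewhile


def truncate_at_most(s_tuples, max_length):
    """Join leading strings while the running word count stays <= max_length."""
    texts = [text for text, _num in s_tuples]
    # Running total of words after each string.
    running = accumulate(len(text.split()) for text in texts)
    # Keep (text, total) pairs only while the total is within the limit.
    kept = takewhile(lambda pair: pair[1] <= max_length, zip(texts, running))
    return ' '.join(text for text, _total in kept)


s_tuples = [('Where are you', 1), ('What about the next day', 2), ('When is the next event', 3)]
print(truncate_at_most(s_tuples, 5))
print(truncate_at_most(s_tuples, 8))
```

With a limit of 5 only the first string fits (3 words); with a limit of 8 the first two fit exactly.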

10 Comments

In your final snippet, you will never get exactly the max_length.
@JohnMachin You are right. I'll edit the answer. It's usually good practice to check the solution on the 'corner' cases (which I didn't do here).
Now re-read the first sentence of your answer and apply it to your final snippet.
@JohnMachin Sorry, but it's already implemented. What was in the OP's question is that he calculated l+=... and did nothing with it on the current iteration; its usage was deferred till the next iteration, where he compared it with max_length. In my code these operations come together.
Sorry*2, but l+=... is irrelevant. The issue is that in your final snippet, when tot_len == max_length, it does not break after yielding, it goes around once more (if the input is not depleted) and uselessly calculates the length of the next item. This behaviour qualifies as "defer the breaking out of loop if the maximum length is reached to the next iteration"
1

One smarter way would be to break out of the loop as soon as you exceed max_length, that way you are not looping over the rest of the list for no reason:

for s in s_tuples:
    if l > max_length:
        break
    output.append(s[0])
    l += len(get_words_from(s[0]))
return ' '.join(output)


1

Your code doesn't stop when the limit is reached. "max_length" is a bad name ... it is NOT a "maximum length", your code allows it to be exceeded (as in your first example) -- is that deliberate? "l" is a bad name; let's call it tot_len. You even keep going when tot_len == max_length. Your example shows joining with a space but your code doesn't do that.

You probably need something like:

tot_len = 0
output = []
for s in s_tuples:
    if tot_len >= max_length:
        break
    output.append(s[0])
    tot_len += len(get_words_from(s[0]))
return ' '.join(output)
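
For anyone who wants to run it, here is a self-contained sketch of the loop above wrapped in a function. The name take_until_limit is made up for illustration, and str.split() stands in for get_words_from, whose definition the question doesn't show.

```python
def take_until_limit(s_tuples, limit):
    """Join leading strings, stopping once the running word count reaches limit."""
    tot_len = 0
    output = []
    for text, _num in s_tuples:
        # Check before appending, so nothing is added once the limit is reached.
        if tot_len >= limit:
            break
        output.append(text)
        tot_len += len(text.split())  # assumption: words are whitespace-separated
    return ' '.join(output)


s_tuples = [('Where are you', 1), ('What about the next day', 2), ('When is the next event', 3)]
print(take_until_limit(s_tuples, 5))
print(take_until_limit(s_tuples, 3))
```

This reproduces both examples from the question: two strings for a limit of 5, one string for a limit of 3.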


1

What is max_length supposed to control? The total number of words in the returned list? I would have expected a max_length of five to yield only 5 words, not 8.

EDIT: I would keep two lists around since I think it's easy to read, but some might not like the additional overhead:

def restrictWords(givenList, whenToStop):
    outputList = []
    wordList = []
    for pair in givenList:
        stringToCheck = pair[0]
        listOfWords = stringToCheck.split()
        for word in listOfWords:
            wordList.append(word)
        outputList.append( stringToCheck )
        if len( wordList ) >= whenToStop:
            break
    return outputList

so for

testList = [ ('one two three',1),
             ('four five',2),
             ('six seven eight nine',3) ]

2 should give you ['one two three']
3 should give you ['one two three']
4 should give you ['one two three', 'four five']
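
If the second list is the overhead you'd rather avoid, a running count does the same job. A rough sketch (restrict_words and its parameter names are my own, not from the question):

```python
def restrict_words(given_list, when_to_stop):
    """Return leading strings until their total word count reaches when_to_stop."""
    output_list = []
    word_count = 0  # replaces the separate list of words
    for text, _num in given_list:
        output_list.append(text)
        word_count += len(text.split())
        if word_count >= when_to_stop:
            break
    return output_list


test_list = [('one two three', 1),
             ('four five', 2),
             ('six seven eight nine', 3)]
print(restrict_words(test_list, 4))
```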

5 Comments

Yes, max_length does control number of words in the output but not precisely. I am not splitting sentences to preserve the output length, but making sure to stay within a sentence length from the desired output length. for output length 5, I can't split the second sentence.
When you post an answer, it should be an actual attempt to answer the question. As it is now, it should be a comment. If this is just some kind of placeholder answer, you should never do this.
@atlantis okay i see, so it's not the maximum length, but it indicates which string will be your last one.
@JeffMercado well i'm still continuing to type the answer i was going to provide, but i wanted to see if i could get more information from the op while i finished it up. i don't have the ability to comment yet. i guess i should just not answer in the future, sorry.
Understandable. Please try to complete your answer then now that you have the information you wanted. Otherwise it will end up being deleted.
0

If NumPy is available, the following solution using a list comprehension works.

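import numpy as np

def truncate(s_tuples, max_length):
    # Cumulative word counts of the strings, in order.
    s_cumlen = np.cumsum([len(s[0].split()) for s in s_tuples])
    # Number of strings whose cumulative count is still below max_length;
    # this is also the index of the last clause to append.
    append_until = np.sum(s_cumlen < max_length)
    return ' '.join([s[0] for s in s_tuples[:append_until + 1]])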

For clarity: s_cumlen contains the cumulative sums of the word counts of your strings.

>>> s_cumlen
array([ 3,  8, 13])
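
If NumPy is not available, the same cumulative-sum idea can be sketched with itertools.accumulate from the standard library. The function name truncate_accumulate is hypothetical.

```python
from itertools import accumulate


def truncate_accumulate(s_tuples, max_length):
    """Same cumulative-sum approach as above, using only the standard library."""
    texts = [text for text, _num in s_tuples]
    # Cumulative word counts, e.g. 3, 8, 13 for the question's example.
    cum_lens = accumulate(len(text.split()) for text in texts)
    # Count the strings whose cumulative length is still below the limit.
    below = sum(1 for total in cum_lens if total < max_length)
    # Include one more string, the one that reaches or crosses the limit.
    return ' '.join(texts[:below + 1])


s_tuples = [('Where are you', 1), ('What about the next day', 2), ('When is the next event', 3)]
print(truncate_accumulate(s_tuples, 5))
```

Like the NumPy version, this always includes at least the first string, even when max_length is 0.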

