Python NLTK :: Intersecting words and sentences

I'm using NLTK, a toolkit for working with text corpora, and I've defined a function that intersects a user's input words with the words of Shakespeare's Hamlet.

import random
from nltk.corpus import gutenberg

def shakespeareOutput(userInput):

    user = userInput.split()
    user = random.sample(set(user), 3)

    # load Hamlet as a list of sentences, each a list of word tokens
    play = gutenberg.sents('shakespeare-hamlet.txt')

    # lowercase every token
    hamlet = map(lambda sublist: map(str.lower, sublist), play)

print hamlet returns:

[ ['[', 'the', 'tragedie', 'of', 'hamlet', 'by', 'william', 'shakespeare', '1599', ']'],
['actus', 'primus', '.'],
['scoena', 'prima', '.'],
['enter', 'barnardo', 'and', 'francisco', 'two', 'centinels', '.'],
['barnardo', '.'],
['who', "'", 's', 'there', '?']...['finis', '.'],
['the', 'tragedie', 'of', 'hamlet', ',', 'prince', 'of', 'denmarke', '.']]
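The setup above depends on Python 2 behavior (note the `print hamlet` statement): there, map returns a list and random.sample accepts a set. As a hedged sketch assuming Python 3 instead, the hypothetical helper prepare below performs the same two steps. In Python 3, map is lazy, and since Python 3.11 random.sample rejects sets, so the sketch sorts the distinct words and uses a nested list comprehension. The sentences argument stands in for gutenberg.sents('shakespeare-hamlet.txt'). Like the question's code, it does not lowercase the user's words.

```python
import random


def prepare(user_input, sentences, k=3):
    """Hypothetical helper mirroring the question's setup, written for Python 3.

    `sentences` stands in for gutenberg.sents('shakespeare-hamlet.txt'):
    a list of sentences, each a list of string tokens.
    """
    # random.sample needs a sequence (sets are rejected from Python 3.11 on),
    # so sort the distinct words first; min() avoids a ValueError when the
    # input has fewer than k distinct words
    distinct = sorted(set(user_input.split()))
    user = random.sample(distinct, min(k, len(distinct)))

    # map() is lazy in Python 3, so build real lists with a comprehension;
    # the result can be iterated repeatedly and supports list.count()
    hamlet = [[token.lower() for token in sent] for sent in sentences]
    return user, hamlet


user, hamlet = prepare("The Actus Primus", [['Actus', 'Primus', '.'], ['Finis', '.']])
# hamlet == [['actus', 'primus', '.'], ['finis', '.']]
# user holds the three input words in random order
```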

I would like to find and return the sentence that contains the most occurrences of the user's words. I am trying:

    bestCount = 0
    for sent in hamlet:
        currentCount = len(set(user).intersection(sent))
        if currentCount > bestCount:
            bestCount = currentCount
            answer = ' '.join(sent)
            return ''.join(answer).lower(), bestCount

calling the function:

    shakespeareOutput("The Actus Primus")

returns:

['The', 'Actus', 'Primus'] None

What am I doing wrong?

Thanks in advance.


ANSWERS:


Your way of evaluating currentCount is wrong. set.intersection returns a set, so its len() is the number of distinct user words found in the sentence, not the total number of times those words occur in it.

>>> s = [1,1,2,3,3,4]
>>> u = set([1,4])
>>> u.intersection(s)
set([1, 4])    # len() is 2, but the elements of s found in u occur 3 times (1 twice, 4 once)
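To count every occurrence rather than only the distinct matches, one option is a collections.Counter of the sequence, looked up once per distinct user word. A minimal sketch on the same toy data (written for Python 3, though it also runs on Python 2.7):

```python
from collections import Counter

s = [1, 1, 2, 3, 3, 4]
u = set([1, 4])

# number of *different* elements of u that appear in s
distinct_matches = len(u.intersection(s))   # 2

# Counter maps each element of s to how often it occurs; missing keys give 0
counts = Counter(s)
total_matches = sum(counts[x] for x in u)   # 1 occurs twice, 4 once: 3
```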

Use the following code.

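    # hamlet is all lowercase, so lowercase the user words too
    user = [w.lower() for w in user]
    bestCount = 0
    answer = None    # stays None if no sentence contains any user word

    for sent in hamlet:
        # count every occurrence of each distinct user word in this sentence
        currentCount = sum(sent.count(w) for w in set(user))
        if currentCount > bestCount:
            bestCount = currentCount
            answer = ' '.join(sent)

    # return only after every sentence has been checked
    return answer, bestCount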
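To try the search without downloading the NLTK corpus, it can be sketched as a standalone function. The name best_sentence and the sample list are hypothetical stand-ins (the sample is copied from the printed output in the question), and the random choice of three words is left out so the result is deterministic. Because the comparison is a strict >, the earliest sentence wins a tie.

```python
def best_sentence(user_input, sentences):
    """Hypothetical helper: return (sentence, count) for the sentence with the
    most occurrences of the user's words, or (None, 0) if nothing matches.

    `sentences` is a list of token lists, like gutenberg.sents(...) returns.
    """
    # lowercase both sides so "The" matches "the"
    words = set(w.lower() for w in user_input.split())
    best, best_count = None, 0
    for sent in sentences:
        tokens = [t.lower() for t in sent]
        # every token that is a user word counts, including repeats
        count = sum(1 for t in tokens if t in words)
        if count > best_count:
            best, best_count = ' '.join(tokens), count
    return best, best_count


sample = [
    ['[', 'the', 'tragedie', 'of', 'hamlet', 'by', 'william', 'shakespeare', '1599', ']'],
    ['actus', 'primus', '.'],
    ['scoena', 'prima', '.'],
]
result = best_sentence("The Actus Primus", sample)
# result == ('actus primus .', 2)
```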

