Report: A Natural Language Query System in Python/NLTK

In my second semester, as part of my Processing Formal and Natural Languages course, I had to complete two assignments: one related to formal languages and one related to natural languages. This report covers the second, in which I scored 92%.


In this assignment, you will use Python and NLTK to construct a system that reads simple facts and then answers questions about them. You can think of it as a simple form of both machine reading and question answering.

Your completed system will enable dialogues such as the following:

$$ John is a duck.
$$ Mary is a duck.
$$ John is purple.
$$ Mary flies.
$$ John likes Mary.
$$ Who is a duck?
John  Mary
$$ Who likes a duck who flies?

$$ Which purple ducks fly?

Sentences submitted by the user are either statements or questions. Statements have a very simple form, but the system uses them to learn what words are in the language and what parts of speech they have. (For example, from the statements above, the system learns that duck is a noun, fly is an intransitive verb and so on.) Questions can have a much more complex form, but can only use words and names that the system has already learned from the statements it has seen.
In Part A, you will develop the machinery for processing statements. This will include a simple data structure for storing the words encountered (a lexicon), and another for storing the content of the statements (a fact base). You will also write some code to extract a verb stem from its 3rd person singular form (e.g. flies → fly).
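
To give a feel for the Part A machinery, here is a minimal sketch of my own, written for this report and not taken from my submitted code. The class names, method names and tag strings ("name", "noun" and so on) are all hypothetical, and the stemming function is a simplified heuristic rather than the rule set the assignment specified. It ignores irregular verbs entirely.

```python
from collections import defaultdict


class Lexicon:
    """Remembers which word stems have been seen under which part-of-speech tag."""

    def __init__(self):
        self._stems_by_tag = defaultdict(set)

    def add(self, stem, tag):
        self._stems_by_tag[tag].add(stem)

    def get_all(self, tag):
        # Sorted so that the result is deterministic
        return sorted(self._stems_by_tag[tag])


class FactBase:
    """Stores unary facts, e.g. duck(John), and binary facts, e.g. like(John, Mary)."""

    def __init__(self):
        self._unary = set()
        self._binary = set()

    def add_unary(self, pred, entity):
        self._unary.add((pred, entity))

    def add_binary(self, pred, subj, obj):
        self._binary.add((pred, subj, obj))

    def query_unary(self, pred, entity):
        return (pred, entity) in self._unary

    def query_binary(self, pred, subj, obj):
        return (pred, subj, obj) in self._binary


def verb_stem(verb):
    """Simplified heuristic: 3rd person singular form -> stem (regular verbs only)."""
    if verb.endswith("ies") and len(verb) > 4:
        return verb[:-3] + "y"      # flies -> fly
    if verb.endswith(("ches", "shes", "sses", "xes", "zzes")):
        return verb[:-2]            # watches -> watch
    if verb.endswith("s") and not verb.endswith("ss"):
        return verb[:-1]            # likes -> like
    return verb


# Recording "Mary flies." and "John likes Mary." by hand
lexicon, facts = Lexicon(), FactBase()
lexicon.add("John", "name")
lexicon.add("Mary", "name")
lexicon.add(verb_stem("flies"), "intransitive_verb")
lexicon.add(verb_stem("likes"), "transitive_verb")
facts.add_unary(verb_stem("flies"), "Mary")
facts.add_binary(verb_stem("likes"), "John", "Mary")
```

Storing facts as tuples in sets keeps membership checks cheap and makes repeated statements harmless, which seemed a reasonable fit for such a small fact base.
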
Parts B to D develop the machinery for questions. Part B is concerned with part-of-speech tagging of questions, allowing for ambiguity and also taking account of singular and plural forms for nouns and verbs. In Part C you are given a context-free grammar for the question language, along with a parser, courtesy of NLTK. Your task is to write some Python code that does agreement checking on the resulting parse trees, in order to recognize that e.g. "Which ducks flies?" is ungrammatical. Agreement checking is used in the system to eliminate certain impossible parse trees. In Part D, you will give a semantics for questions, in the form of a Python function that translates them into lambda expressions. These lambda expressions are then processed by NLTK to transform them into logical formulae; the answer to the question is then computed by a back-end model checker which is provided for you.
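
The agreement check from Part C can be illustrated with a small sketch. Again, this is my own illustration for this report and not the submitted code. To keep it free of dependencies it uses plain nested tuples where the real system works on NLTK parse trees, and the tag names (Ns/Np for singular/plural nouns, Is/Ip and Ts/Tp for verbs) and node labels (S, NP, VP) are assumptions made for the example. It also assumes that the first number-marked word in a phrase is its head, which is a simplification.

```python
# Hypothetical tag scheme: trailing "s" = singular, trailing "p" = plural
NUMBER = {"Ns": "s", "Np": "p", "Is": "s", "Ip": "p", "Ts": "s", "Tp": "p"}


def is_leaf(node):
    # A leaf is (tag, word); an internal node is (label, child, child, ...)
    return len(node) == 2 and isinstance(node[1], str)


def number(node):
    """Number feature of a subtree: 's', 'p', or None if it carries no number."""
    if is_leaf(node):
        return NUMBER.get(node[0])
    # Simplification: the first child that carries a number acts as the head
    for child in node[1:]:
        feature = number(child)
        if feature is not None:
            return feature
    return None


def agrees(node):
    """True if, in every S node, the subject NP and the VP agree in number."""
    if is_leaf(node):
        return True
    children = node[1:]
    if not all(agrees(child) for child in children):
        return False
    if node[0] == "S":
        features = {number(c) for c in children if c[0] in ("NP", "VP")}
        features.discard(None)
        return len(features) <= 1
    return True


# "Which ducks fly?" versus "Which ducks flies?"
good = ("S", ("WHICH", "which"), ("NP", ("Np", "ducks")), ("VP", ("Ip", "fly")))
bad = ("S", ("WHICH", "which"), ("NP", ("Np", "ducks")), ("VP", ("Is", "flies")))

# Keep only the parse trees that pass the check
surviving = [tree for tree in (good, bad) if agrees(tree)]
```

Filtering the candidate trees this way is how the check eliminates impossible parses: here only the tree for "Which ducks fly?" survives.
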
Finally, in Part E, you are invited to supply a short comment on ways in which the resulting system might be improved.



Due to University regulations, my code is only available on request, so if you’re interested, please contact me.
