Assignment 3, CS3346A, 2013

Assigned: Oct 30, 2013
Due date: Nov 10, 2013 (extended), by midnight
Electronic submission: see the submission instructions.
Individual effort (no group work)
Total marks: 15% of the final grade

Updates: none.

Question 1:
Consider the following grammar rules:

s->np, vp
np->det, n
np->det, adjp
adjp->adj, n
pp->p, np
comp->p, vp
vp->v, pp
vp->v, comp

(a) Using the grammar rules above, draw the syntactic tree for the sentence "The bird tried to escape from the strong cage".
(b) Based on the syntactic tree you produced, give the corresponding labeled bracketing.
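
If you would like to sanity-check your answer, the phrase-structure rules above can be encoded as a context-free grammar in NLTK, as in the minimal Python sketch below. Note that the lexical productions (det -> 'the', n -> 'bird', and so on) are not part of the given grammar; they are assumptions added only so that the sketch runs.

# A minimal sketch: encoding the grammar rules of Question 1 in NLTK.
# The lexical rules (det -> 'the', n -> 'bird', ...) are NOT given in the
# assignment; they are assumptions added only to make the grammar runnable.
import nltk

grammar = nltk.CFG.fromstring("""
s    -> np vp
np   -> det n | det adjp
adjp -> adj n
pp   -> p np
comp -> p vp
vp   -> v pp | v comp
det  -> 'the'
n    -> 'bird' | 'cage'
adj  -> 'strong'
v    -> 'tried' | 'escape'
p    -> 'to' | 'from'
""")

parser = nltk.ChartParser(grammar)
tokens = "the bird tried to escape from the strong cage".split()  # lowercased for simplicity
for tree in parser.parse(tokens):
    tree.pretty_print()  # ASCII drawing of the syntactic tree
    print(tree)          # bracketed (labeled-bracket style) output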

Question 2:
Use the Stanford parser and the CCG parser to parse the following sentence:

"A significant increase in glutamate levels was induced in the normal dentate gyrus during 10-min ischemia."

The part-of-speech (POS) tags produced by the Stanford parser for this sentence are:
A/DT
significant/JJ
increase/NN
in/IN
glutamate/JJ
levels/NNS
was/VBD
induced/VBN
in/IN
the/DT
normal/JJ
dentate/JJ
gyrus/NNS
during/IN
10-min/JJ
ischemia/NN

The POS tags produced by the CCG parser for this sentence are:
A/DT
significant/JJ
increase/NN
in/IN
glutamate/NN
levels/NNS
was/VBD
induced/VBN
in/IN
the/DT
normal/JJ
dentate/NN
gyrus/NN
during/IN
10-min/NN
ischemia/NN

Treat the tagging produced by the CCG parser as the gold standard and evaluate the tagging performance of the Stanford parser in terms of precision, recall, and F1-score. Give the confusion matrix showing the numbers of true positives, false positives, false negatives, and true negatives.
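
One way to organize such an evaluation is sketched below in Python, treating the CCG tags as gold and computing a confusion matrix plus per-tag precision, recall, and F1 in a one-vs-rest fashion; this is one reasonable reading of the question, not the only one, and the tag sequences are simply copied from the lists above.

# A minimal sketch: evaluating the Stanford tags against the CCG tags
# treated as gold. Per-tag scores are computed one-vs-rest; this is one
# reasonable reading of "precision/recall/F1", not the only possible one.
from collections import Counter

stanford = "DT JJ NN IN JJ NNS VBD VBN IN DT JJ JJ NNS IN JJ NN".split()
ccg_gold = "DT JJ NN IN NN NNS VBD VBN IN DT JJ NN NN IN NN NN".split()

# Confusion matrix indexed by (gold tag, predicted tag).
confusion = Counter(zip(ccg_gold, stanford))
for (gold, pred), count in sorted(confusion.items()):
    print(f"gold={gold:4s} predicted={pred:4s} count={count}")

# Per-tag precision, recall, and F1 (one-vs-rest).
for tag in sorted(set(ccg_gold) | set(stanford)):
    tp = sum(g == tag and p == tag for g, p in zip(ccg_gold, stanford))
    fp = sum(g != tag and p == tag for g, p in zip(ccg_gold, stanford))
    fn = sum(g == tag and p != tag for g, p in zip(ccg_gold, stanford))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    print(f"{tag:4s} P={precision:.2f} R={recall:.2f} F1={f1:.2f}")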

Question 3:
For each of the problems below, approximately what order of n-gram model should we use, and should the model operate at the level of characters or words? There are no absolutely right or wrong answers, but some answers are more likely to be right than others. Explain your reasoning.

(a) Generating a nonsense word that looks English-like in its spelling (see the character-level sketch after this list).

(b) Predicting whether the English morpheme '-s' will be pronounced as /z/ or /s/. (Listen carefully to some English plural nouns to figure out the pattern: places, papers, tips, cats, dogs, etc.)

(c) Predicting the next character in a partially-typed word.

(d) Context-sensitive spell-checking (e.g., correcting mistakes like "their" for "there").

(e) Checking the coherence of a translation of a sentence from Chinese to English.

(f) Checking the coherence of a speech recognition engine's output in English.

(g) Checking the coherence of an English text that has been transcribed with Optical Character Recognition from a page image.

(h) Generating a fake academic paper by training a language model on a corpus of papers. (See an actual example at http://pdos.csail.mit.edu/scigen/. They use a context-free grammar, but you could do a pretty good job with n-grams.)

(i) Predicting whether an e-mail is spam.

(j) Automatic capitalization in English. (Type in lowercase in a word processor like MS Word and observe where it auto-capitalizes.)
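
As a concrete illustration for a problem like (a), the sketch below generates nonsense words from a character-level bigram model. The toy training word list and the choice of a bigram (order-2) model are assumptions made purely for illustration; the question still asks you to argue what order would actually work well.

# A minimal sketch for (a): generating nonsense words with a character-level
# bigram model. The training word list and the bigram order are assumptions
# for illustration only.
import random
from collections import defaultdict

words = ["bird", "cage", "strong", "escape", "paper", "place", "tried"]  # toy corpus (assumption)

# Collect bigram continuations, using '^' and '$' as word-boundary markers.
nexts = defaultdict(list)
for w in words:
    chars = ["^"] + list(w) + ["$"]
    for a, b in zip(chars, chars[1:]):
        nexts[a].append(b)

def generate_word():
    """Sample characters one at a time until the end-of-word marker."""
    out, current = [], "^"
    while True:
        current = random.choice(nexts[current])
        if current == "$":
            return "".join(out)
        out.append(current)

print([generate_word() for _ in range(5)])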

Question 4:
Recall that [s] and [/s] are symbols denoting the start and end of a sentence, respectively. Punctuation is ignored here. Consider the following probabilities:

(a) The probability that someone asks "why do you" as a complete question.

(b) The probability that you hear "why do you" as a snippet of a conversation as you walk across the Green.

(c) The probability that you hear a question beginning "why do you".

Match each of them with one of the probability expressions below, and briefly explain your reasoning; a small sketch of how such expressions are estimated from bigram counts follows the list.

1. P(why)P(do|why)P(you|do)
2. P(why|[s])P(do|why)P(you|do)
3. P(why|[s])P(do|why)P(you|do)P([/s]|you)
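
For reference, the sketch below shows how bigram chain expressions of this form are estimated from counts, with [s] and [/s] marking sentence boundaries. The toy corpus and the maximum-likelihood estimates are assumptions for illustration and do not by themselves answer the matching question.

# A minimal sketch: maximum-likelihood bigram estimates over a toy corpus,
# with [s] / [/s] marking sentence boundaries. The corpus is an assumption,
# used only to show how expressions 1-3 would be evaluated.
from collections import Counter

corpus = [
    "[s] why do you ask [/s]",
    "[s] why do you [/s]",
    "[s] do you like it [/s]",
]

# Concatenating sentences is a simplification; it adds harmless [/s]->[s] bigrams.
tokens = " ".join(corpus).split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def p(word, given=None):
    """P(word) or P(word | given), estimated by maximum likelihood."""
    if given is None:
        return unigrams[word] / len(tokens)
    return bigrams[(given, word)] / unigrams[given]

# Expression 1: P(why) P(do|why) P(you|do)
print(p("why") * p("do", "why") * p("you", "do"))
# Expression 2: P(why|[s]) P(do|why) P(you|do)
print(p("why", "[s]") * p("do", "why") * p("you", "do"))
# Expression 3: P(why|[s]) P(do|why) P(you|do) P([/s]|you)
print(p("why", "[s]") * p("do", "why") * p("you", "do") * p("[/s]", "you"))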