Question 1:
Consider the following grammar rules:
s->np, vp
np->det, n
np->det, adjp
adjp->adj, n
pp->p, np
comp->p, vp
vp->v, pp
vp->v, comp
(a) Using the grammar rules above, draw the syntactic tree for the
sentence "The bird tried to escape from the strong cage".
(b) Give the labeled bracketing that corresponds to the syntactic tree
you produced.
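As an illustration of the notation only, here is a minimal Python sketch that stores the rules above as data, checks that a tree uses only those rules, and prints a labeled bracketing. The nested-tuple tree encoding, the square-bracket output style, the helper names `conforms` and `bracket`, and the toy phrase "the cat" are all assumptions made for this sketch; your course may expect a different bracket style.

```python
# Grammar rules from Question 1, stored as (left-hand side, right-hand side) pairs.
RULES = {
    ("s", ("np", "vp")),
    ("np", ("det", "n")),
    ("np", ("det", "adjp")),
    ("adjp", ("adj", "n")),
    ("pp", ("p", "np")),
    ("comp", ("p", "vp")),
    ("vp", ("v", "pp")),
    ("vp", ("v", "comp")),
}

def is_leaf(tree):
    """A leaf is (category, word), e.g. ("det", "the")."""
    return len(tree) == 2 and isinstance(tree[1], str)

def conforms(tree):
    """True if every internal node of the tree is licensed by a rule in RULES."""
    if is_leaf(tree):
        return True
    label, *children = tree
    rhs = tuple(child[0] for child in children)
    return (label, rhs) in RULES and all(conforms(c) for c in children)

def bracket(tree):
    """Render a tree as a labeled bracketing such as [np [det the] [n cat]]."""
    if is_leaf(tree):
        return f"[{tree[0]} {tree[1]}]"
    label, *children = tree
    return f"[{label} " + " ".join(bracket(c) for c in children) + "]"

# Hypothetical toy phrase, not the sentence from the question.
toy = ("np", ("det", "the"), ("n", "cat"))
print(conforms(toy))  # True
print(bracket(toy))   # [np [det the] [n cat]]
```

The same two helpers can be applied to a full tree for the sentence in (a) once you have drawn it.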
Question 2:
Use the Stanford Parser and the CCG Parser to parse the following sentence:
"A significant increase in glutamate levels was induced in the normal dentate gyrus during 10-min ischemia."
The part-of-speech (POS) tags provided by the Stanford Parser for the sentence are:
A/DT
significant/JJ
increase/NN
in/IN
glutamate/JJ
levels/NNS
was/VBD
induced/VBN
in/IN
the/DT
normal/JJ
dentate/JJ
gyrus/NNS
during/IN
10-min/JJ
ischemia/NN
The POS tags provided by the CCG Parser for the sentence are:
A/DT
significant/JJ
increase/NN
in/IN
glutamate/NN
levels/NNS
was/VBD
induced/VBN
in/IN
the/DT
normal/JJ
dentate/NN
gyrus/NN
during/IN
10-min/NN
ischemia/NN
Consider the tagging made by the CCG Parser as the "gold standard" and
evaluate the tagging performance of the Stanford Parser in terms of Precision,
Recall, and F1-score. Give the confusion matrix that shows the number of
true positives, false positives, false negatives, and true negatives.
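One possible way to organize the counting is sketched below in Python. It treats each tag one-vs-rest, which is an assumption: the question does not say whether scores should be reported per tag or aggregated, so check which convention your course expects. The tag sequences are copied from the two lists above; the function names `per_tag_counts` and `prf` are made up for this sketch.

```python
from collections import Counter

# Tag sequences copied from the two listings above, in sentence order.
stanford = "DT JJ NN IN JJ NNS VBD VBN IN DT JJ JJ NNS IN JJ NN".split()
ccg_gold = "DT JJ NN IN NN NNS VBD VBN IN DT JJ NN NN IN NN NN".split()

def per_tag_counts(pred, gold):
    """One-vs-rest counts for each tag: tag -> (TP, FP, FN, TN)."""
    assert len(pred) == len(gold)
    counts = {}
    for tag in sorted(set(pred) | set(gold)):
        tp = sum(p == tag and g == tag for p, g in zip(pred, gold))
        fp = sum(p == tag and g != tag for p, g in zip(pred, gold))
        fn = sum(p != tag and g == tag for p, g in zip(pred, gold))
        tn = len(gold) - tp - fp - fn
        counts[tag] = (tp, fp, fn, tn)
    return counts

def prf(tp, fp, fn):
    """Precision, recall and F1; defined as 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Tag-by-tag confusion matrix: (gold tag, predicted tag) -> count.
confusion = Counter(zip(ccg_gold, stanford))

counts = per_tag_counts(stanford, ccg_gold)
for tag, (tp, fp, fn, tn) in counts.items():
    p, r, f = prf(tp, fp, fn)
    print(f"{tag:4} TP={tp} FP={fp} FN={fn} TN={tn}  P={p:.2f} R={r:.2f} F1={f:.2f}")
```

The off-diagonal entries of `confusion` show which gold tags the Stanford Parser replaced with which other tags.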
Question 3:
Approximately what order of n-gram should we use to model each of these
problems? Should the model operate at the level of characters or words?
There are no absolute right or wrong answers, but some answers are more
likely to be right than others. Explain your reasoning.
(a) Generating a nonsense English word that looks English-like in its spelling.
(b) Predicting whether the English morpheme '-s' will be pronounced as z or s. (Listen carefully to some English plural nouns to figure out a pattern: places, papers, tips, cats, dogs, etc.)
(c) Predicting the next character in a partially-typed word.
(d) Context-sensitive spell-check (e.g., correcting mistakes like 'their' for 'there').
(e) Checking the coherence of a translation of a sentence from Chinese to English.
(f) Checking the coherence of a speech recognition engine's output in English.
(g) Checking the coherence of an English text that has been transcribed with Optical Character Recognition from a page image.
(h) Generating a fake academic paper by training a language model on a corpus of papers. (See an actual example at http://pdos.csail.mit.edu/scigen/. They use a context-free grammar, but you could do a pretty good job with n-grams.)
(i) Predicting whether an e-mail is spam.
(j) Automatic capitalization in English. (Type in lowercase in a word processor like MS Word and observe where it auto-capitalizes.)
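To make the character-versus-word distinction concrete, here is a small Python sketch of n-gram extraction that works on either level. The function name `ngrams`, the "#" word-boundary marker, and the example inputs are assumptions chosen for illustration; the sketch does not suggest an answer for any of the items above.

```python
from collections import Counter

def ngrams(seq, n):
    """All contiguous n-grams of seq, as tuples. seq may be a string or a list."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

# Character level: a string is a sequence of characters.
# "#" is a hypothetical marker for the start and end of a word.
char_trigrams = ngrams("#cats#", 3)
print(char_trigrams)

# Word level: split the text into tokens first.
tokens = "the cat saw the cat".split()
word_bigram_counts = Counter(ngrams(tokens, 2))
print(word_bigram_counts.most_common(1))
```

The order n fixes how much context the model sees: an n-gram model conditions each unit on the previous n-1 units, whether those units are characters or words.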
Question 4:
Recall that [s] and [/s] are symbols that denote the start and end of a sentence, respectively. We ignore punctuation here.
Consider these expressions:
(a) The probability that someone asks 'why do you' as a complete question.
(b) The probability that you hear 'why do you' as a snippet of a conversation as you walk across the Green.
(c) The probability that you hear a question beginning 'why do you'.
Match them with the probabilities below, and briefly explain.
1. P(why)P(do|why)P(you|do)
2. P(why|[s])P(do|why)P(you|do)
3. P(why|[s])P(do|why)P(you|do)P([/s]|you)
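The three numbered expressions can be evaluated on a toy corpus to see how the [s] and [/s] terms change the result. The Python sketch below uses maximum-likelihood estimates; the three-sentence corpus is invented for illustration, and computing the unigram P(why) over real words only (excluding [s] and [/s]) is an assumption of this sketch.

```python
from collections import Counter
from fractions import Fraction

START, END = "[s]", "[/s]"

# Hypothetical toy corpus; punctuation is ignored, as in the question.
corpus = [
    ["why", "do", "you"],
    ["why", "do", "you", "ask"],
    ["do", "you", "know", "why"],
]

unigrams = Counter(w for sent in corpus for w in sent)
total_words = sum(unigrams.values())

bigrams = Counter()   # (w1, w2) -> count
history = Counter()   # w1 -> number of times w1 is followed by something
for sent in corpus:
    padded = [START, *sent, END]
    for w1, w2 in zip(padded, padded[1:]):
        bigrams[(w1, w2)] += 1
        history[w1] += 1

def p_uni(w):
    """Unigram estimate P(w), over real words only (an assumption)."""
    return Fraction(unigrams[w], total_words)

def p_bi(w2, w1):
    """Bigram estimate P(w2 | w1)."""
    return Fraction(bigrams[(w1, w2)], history[w1])

expr1 = p_uni("why") * p_bi("do", "why") * p_bi("you", "do")
expr2 = p_bi("why", START) * p_bi("do", "why") * p_bi("you", "do")
expr3 = expr2 * p_bi(END, "you")

print(expr1, expr2, expr3)  # 2/11 4/9 4/27
```

Expression 3 is expression 2 multiplied by one more probability, so it can never be larger than expression 2, whatever the corpus.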