Saturday, February 5, 2011

Position Swap (Transformational Grammar)

a. "there is a cat," said the man.
b. the man said "there is a cat".

In the direct speech above, the positions of the agent, the reporting verb and the speech are swapped. We may refer to this as a transformation. Basically, the strings are composed of three variables. The first variable is the agent: the one who utters the speech, usually a human. Syntactically, it is realized as an NP. The second variable is the reporting verb. Reporting verbs are verbs used in reported speech, such as say, reply, shout, etc. The third variable is the speech uttered by the agent. In direct speech it is marked by quotes ("speech"). The speech can be a single word, a phrase or a clause.

First variable (V0): agent

Second variable (V1): Reporting verb

Third variable (V2): speech <"speech">

The aim is to swap the order V2+V1+V0 to V0+V1+V2. In this way, sentence (a) is changed to sentence (b). This takes several steps, which we are going to carry out with an LGG (local grammar graph).


First, the LGG must be designed to recognize V2+V1+V0.

Second, the positions of the variables must be swapped. To do this, we must set up a perimeter around each variable, called a variable function and marked by red brackets. How is a variable set up?

$ + variable name + opening bracket + box filled with lexicon + $ + variable name (must be the same as the previous one) + closing bracket, for example $var0( ... $var0)
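As a rough analogy outside Unitex, a named group in a Python regular expression plays a similar role to a variable perimeter: it opens, encloses a pattern, closes, and can be referred to by name afterwards. The sketch below is not Unitex syntax. The pattern itself is only an illustrative assumption, and the group name var0 is borrowed from the graph's naming.

```python
import re

# Analogy only: a named group opens with (?P<var0> and closes with ),
# much like a Unitex variable opens with $var0( and closes with $var0).
# The pattern is an assumption: any run of characters up to the comma.
pattern = re.compile(r'"(?P<var0>[^",]+),"')

match = pattern.search('"there is a cat," said the man.')
if match:
    # The captured text can now be reused by name.
    print(match.group("var0"))
```

The group stops right before the comma, so neither the comma nor the quotation marks end up inside the captured text.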

LGG for Direct Speech Transformation


Speech is marked by quotes (opening and closing). However, in Unitex a quote must be escaped with a backslash (\). Therefore, in the box they are written as \". Inside the quotes there is the speech sequence. This sequence begins with an uppercase letter just after the opening bracket; a token starting with an uppercase letter is written as <PRE>. Speech is not limited to one word; it can also be a phrase or a clause. Because the length of the sequence is unlimited, we put a loop on the box that matches the remaining tokens, so that it accepts a sequence of tokens of any length. The sequence is ended by a comma and a quotation mark.

Unfortunately, there is no dictionary set for reporting verbs. However, a verb is most likely a reporting verb when it occurs right after speech. Therefore, I just put <V>, indicating any verb. The agent is really open to further development: my LGG captures only two NP patterns.

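Third (after the long explanation, here is the climax), we must swap the order. In the graph, the variables are numbered in the order in which they occur in the input, so the numbering is the reverse of the V0 to V2 labels above: var0 is set on the speech, var1 on the reporting verb and var2 on the agent. var0 is set up excluding the quotation marks. Why? Because a variable that reached the closing quotation mark would also include the comma inside the speech. Therefore, it spans from the beginning of the speech to right before the comma. The following output is given so that the positions are swapped: $var2$ $var1$ "$var0$".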
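For readers without Unitex at hand, the swap can be imitated in Python. This is only a minimal sketch, not the graph itself. The short verb list and the determiner + noun agent pattern are assumptions made for illustration, and the function name swap is hypothetical.

```python
import re

# Hypothetical stand-in for the graph, with the same variable names.
# The verb list and the agent pattern are illustrative assumptions.
DIRECT_SPEECH = re.compile(
    r'"(?P<var0>[^",]+),"\s+'             # speech, up to but excluding the comma
    r'(?P<var1>said|replied|shouted)\s+'  # reporting verb
    r'(?P<var2>(?:the|a|an)\s+\w+)'       # agent: determiner + noun
)

def swap(sentence: str) -> str:
    """Reorder speech + verb + agent into agent + verb + quoted speech."""
    # Mirrors the output $var2$ $var1$ "$var0$".
    return DIRECT_SPEECH.sub(r'\g<var2> \g<var1> "\g<var0>"', sentence)

print(swap('"there is a cat," said the man.'))
```

Because var0 stops before the comma, the comma is dropped from the rewritten sentence while the final full stop, which lies outside the match, is left in place.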

From the concordance under the LGG, you can see that it successfully transforms sentences composed of direct speech + reporting verb + agent (e.g. "there is a cat," said the man) into agent + reporting verb + direct speech (e.g. the man said "there is a cat").
