Author: ASmallNumberOfMonkeys
Started: 26/11/06
Last Edited: 28/11/06
Published: 26/11/06
Revision: 6
The Moving Digit Writes II

Outline: An article on computer generated poetry, part 2. First draft and written pretty quickly.
Why: An exercise; may be interesting too.
Review: Anything at all. Brutal as you like.

The Markov Chain
So far we've covered a basic word selection process and phonetic and metric analysis. We left word selection at a fairly primitive stage, so let's consider something more sophisticated. It's now time to mention Andrey Markov. He was a Russian professor of mathematics, and yes, he was a poet too. His ideas were far-reaching and are used in many fields that require statistical probability modelling. While his interest in poetry was at times mathematical, he had no computers in his day, so he wasn't designing poetry engines. One of his theories has been developed into what we now call a Markov chain. Unless I see it as code I have trouble with the equations too, so I'll try to explain it in words.

If we take the preceding paragraph, you will notice that some words appear more than once, e.g. 'so', 'a', 'I', 'it' and so on. Now imagine making a chain out of the words in that paragraph. Where we meet a duplicate word, we don't add a new link. Instead we go back along the chain to where the word first appeared and connect the end we were about to extend to that earlier link. The word that follows the duplicate is also connected to that point. We end up with something that might look like a flail or, more importantly, a tree of words with lots of looped branches. That is a fairly basic description of how a Markov chain may be built.

Now that we have this thing, what do we do with it? Start at any link and move to the next one until you come to a point with multiple connections. Select one of the next links at random and you can carry on traversing the chain. If you write the words down as you go, you are creating random streams of words. With a large text, words such as 'and' and 'the', along with punctuation marks, would offer a fair selection of next words, and if there are enough of them we can choose only words that meet certain criteria. Perhaps we want a rhyming word with a specific metric signature.
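Here is a minimal Python sketch of that idea. It assumes a first-order, word-level chain, where each next word depends only on the current one, and it is not Word Bench's code. It does not literally loop links back. Instead it stores the same structure as a dictionary that maps each word to the list of words that followed it. Keeping duplicates in those lists means that a word which often follows 'the' is proportionally more likely to be chosen during the walk.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in `text`.

    Duplicates are kept, so a word that follows 'the' three times is three
    times as likely to be picked when the chain is walked.
    """
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def walk(chain, start, length, rng=random):
    """Traverse the chain from `start`, choosing a random next link at each junction."""
    output = [start]
    word = start
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: the word only occurred at the end of the text
            break
        word = rng.choice(followers)
        output.append(word)
    return output


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the cat"
    chain = build_chain(sample)
    print(chain["the"])  # ['cat', 'mat', 'dog', 'cat']
    print(" ".join(walk(chain, "the", 8, random.Random(1))))
```

Splitting on whitespace leaves punctuation attached to words. Tokenising punctuation separately would let full stops and commas act as the busy junctions mentioned above.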
Maybe we have an assonance or consonance pattern we wish to find. We can do this to try to produce some poetic effect in the output stream of words. If we have a list of criteria, we can give them an order of precedence: if the first criterion cannot be met, we apply the next, and so on. If the worst comes to the worst, we can just take what we can get, or give up and start again. We have a fair amount of choice.

We can do a few more things with these chains, such as adding them together to produce combined chains. If we have a number of individual chains, we can weight each one (making its words a more or less probable choice) and use a probability-based selection process.

Poetry engines

Many people have used this system to produce poetry engines, and I have used it myself, incorporating the word and chain selection processes I've just described. Of the examples you can find on the internet, the most notable is Ray Kurzweil's Cybernetic Poet. It is quite interesting but not open source, so I haven't studied its code. Ray is quite an interesting man too: he's a poet as well, and something of a driving force in the field of artificial intelligence. The other big project is Gnoetry, which is open source. I have looked at some of it, and it takes some different approaches from the ones I have used. I'm not sure whether either project is still going, as I haven't seen any development on Gnoetry or on Cybernetic Poet for some time.

There are limitations to the Markov chain approach: proper nouns and phrases from the training text can leak out and give the game away. The text is not quite original. It isn't downright plagiarism, but it captures something of the style of the training texts. The great advantage is that we can sidestep the question of grammar and let the word order captured in the Markov chain take over this complex task.
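Selection with an order of precedence, and chains added together with weights, might be sketched as follows. The criteria are hypothetical, spelling-based predicates that stand in for real rhyme, metre, assonance or consonance tests. The chain format, with each word mapped to follower counts, is an assumption chosen so that weighted chains can simply be summed. Neither is taken from Word Bench.

```python
import random


def pick_next(candidates, criteria, rng=random):
    """Pick a follower meeting the highest-precedence criterion that anything meets.

    `criteria` is a list of predicates, most wanted first. If none can be met we
    take what we can get (any candidate); with no candidates at all we return
    None so the caller can give up and start again.
    """
    for test in criteria:
        matching = [word for word in candidates if test(word)]
        if matching:
            return rng.choice(matching)
    return rng.choice(candidates) if candidates else None


def merge_chains(weighted_chains):
    """Add chains together, scaling each chain's counts by its weight.

    Each chain maps word -> {next_word: count}; a heavier chain makes its
    transitions a more probable choice in the merged result.
    """
    merged = {}
    for chain, weight in weighted_chains:
        for word, followers in chain.items():
            row = merged.setdefault(word, {})
            for nxt, count in followers.items():
                row[nxt] = row.get(nxt, 0) + count * weight
    return merged


def weighted_next(merged, word, rng=random):
    """Pick a follower of `word` with probability proportional to its weight."""
    row = merged.get(word)
    if not row:
        return None
    followers = list(row)
    return rng.choices(followers, weights=[row[f] for f in followers], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical, spelling-based stand-ins for real criteria, in order of precedence.
    criteria = [lambda w: w.endswith("ay"), lambda w: w.endswith("ight")]
    print(pick_next(["night", "May", "sun"], criteria, rng))  # May
    alice = {"the": {"rabbit": 3, "queen": 1}}  # toy chains, not real training data
    sonnets = {"the": {"summer": 2}}
    merged = merge_chains([(alice, 1.0), (sonnets, 2.0)])
    print(merged)  # {'the': {'rabbit': 3.0, 'queen': 1.0, 'summer': 4.0}}
    print(weighted_next(merged, "the", rng))
```

Scaling counts by a weight before adding them is one simple way to let one training text dominate another. Normalising each chain first would stop a very long text from swamping a short one whatever the weights.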
Here is a sample of raw Markov-generated poetry. The training text is 'Alice in Wonderland' by Lewis Carroll, and the pattern used was Shakespeare's Sonnet 18. The software used to generate it was my poetry engine, 'Word Bench'.

Her eyes appeared on where you thinking I?
We went mad after her so dreadfully:
it's pleased so rich and walking hand on my,
two people had found and then turn them free:
wouldn't be much to fall a moment's pause,
that rabbit cried the most important air;
for anything had a fish footman because,
no more, but after thinking I declare;
might find another figure my dear paws,
all is another of this down a court,
all in hand round your finger for your jaws,
for your flamingo was at home this short;
I was this caused some wine', and half my plan,
bill had not as, while two were down off than.

Analysing poetry

This brings me to the next point for discussion: how did I get a pattern for Shakespeare's Sonnet 18, and what exactly is a 'pattern'? We need to 'parse' a piece of text and carefully examine every part of it. The computer splits the text into lines and treats each line as a separate entity. The main reason for this is to reconstruct the form from its component parts. It's also very useful when looking for end rhymes, although a better system would look for more than just end rhymes and would search the whole text. If I do another iteration of development, this will be a primary consideration. A well-featured poetry analyser should do this in any case, and any of us could benefit from using a tool like that to hone our efforts. For now, though, it's just end rhymes, and these are stored as strings of text. For a Shakespearean sonnet the string would be 'A,B,A,B,C,D,C,D,E,F,E,F,G,G', which any poet would recognise as the rhyme scheme of a 14-line poem. The template produced by the analysis of Shakespeare's Sonnet 18 looks like the text below.
PRP$ NNS VBD IN WRB PRP VBG NN PUN <10:0,1,01,0,0,0,10,1>
PRP VBD JJ IN PRP RB RB PUN <10:0,1,1,10,0,1,100>
POS VBN RB JJ CC VBG NN IN PRP$ PUN <10:1,1,1,1,0,10,1,0,0>
CD NNS VBD VBN CC RB VB PRP JJ PUN <10:0,10,1,1,0,1,1,0,1>
NN VB JJ TO VB DT POS NN PUN <10:12,1,1,0,1,0,10,1>
IN NN VBD DT RBS JJ NN PUN <10:0,10,1,0,1,010,1>
IN NN VBD DT NN NN IN PUN <11:0,100,1,0,1,12,01>
DT JJR PUN CC IN VBG NN VB PUN <10:0,1,0,10,10,1,01>
MD VB DT NN PRP$ JJ NNS PUN <10:0,1,210,10,0,1,1>
DT VBZ DT IN DT RB DT NN PUN <10:0,1,010,0,0,1,0,1>
DT IN NN NN PRP$ NN IN PRP$ NNS PUN <10:0,0,1,1,0,10,0,0,1>
IN PRP$ NN VBD IN NN DT JJ PUN <10:0,0,210,1,0,1,0,1>
NN VBD DT VBN DT POS PUN CC NN PRP$ NN PUN <10:1,1,0,1,0,1,0,1,0,1>
NN VBD RB IN PUN IN CD VBD RB IN IN PUN <10:1,1,1,0,0,0,1,1,0,0>

Grammar Taggers

This is where things get a bit tricky. You'll notice the various symbols PRP$, NNS, VBD and so on. There is one of these for each word, representing its grammatical part of speech. PUN is just punctuation. NN marks a singular noun, and it appears in numerous places it shouldn't, which highlights some of the failures of the system. I'm not nearly clever enough to write a grammar parser for the English language. That's not me putting myself down. It is simply a hugely complex task requiring a knowledge of English grammar that's beyond me, as well as a good understanding of software development, which fortunately I do have to some degree. But I know of a man who has both, and on the open source scene I think he's the best there is at the moment: Eric Brill. I am using a version of his 'Grammatical Part of Speech Tagger'. The mistakes it makes are not entirely Eric's fault. As I said, this is a complex area, and the tagger has to be trained to recognise grammar. On top of that, I have hand-translated it from one computer language to another. I don't think I've made too much of a hash of it, as it seems to do the same things as the original, but all the same, we do need a better solution.
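The template lines above can be read back mechanically. The Python sketch below assumes the layout shown: one tag per token, PUN for punctuation, then a <syllables:stress,...> block with one stress string per word. As the next section explains, those digits come from the CMU Pronouncing Dictionary (0 unstressed, 1 primary, 2 secondary).

The sketch does three things. It pairs each word tag with its stress string to give generation 'slots'. It reconciles a tagger's tag against a dictionary's tag list. It fills a slot by dictionary substitution, using an assonance filter with a consonance fallback.

None of this is Word Bench's code, and several parts are assumptions:
- The LEXICON is a hypothetical, hand-written stand-in; the CMU dictionary itself carries no grammar tags.
- Assonance is assumed to mean sharing the reference word's primary-stressed vowel.
- Consonance is crudely simplified to sharing its final consonant.

```python
import re

# A template line: space-separated tags, then <syllables:stress,stress,...>.
TEMPLATE_LINE = re.compile(r"^(.+?)\s*<(\d+):([0-2,]*)>$")

# Hypothetical lexicon: word -> (tags, most probable first; CMU-style ARPAbet phones).
# The CMU Pronouncing Dictionary has no tags, so these tag lists are invented.
LEXICON = {
    "summer": (["NN"], ["S", "AH1", "M", "ER0"]),
    "hunter": (["NN"], ["HH", "AH1", "N", "T", "ER0"]),
    "river": (["NN"], ["R", "IH1", "V", "ER0"]),
    "butter": (["NN", "VB"], ["B", "AH1", "T", "ER0"]),
    "winter": (["NN", "VB"], ["W", "IH1", "N", "T", "ER0"]),
    "day": (["NN"], ["D", "EY1"]),
}


def parse_template_line(text):
    """Split a template line into (tags, syllable count, per-word stress strings)."""
    match = TEMPLATE_LINE.match(text.strip())
    if match is None:
        raise ValueError(f"not a template line: {text!r}")
    stresses = [s for s in match.group(3).split(",") if s]
    return match.group(1).split(), int(match.group(2)), stresses


def template_slots(tags, syllables, stresses):
    """Pair each non-punctuation tag with its stress string, checking the counts agree."""
    word_tags = [t for t in tags if t != "PUN"]
    if len(word_tags) != len(stresses) or sum(map(len, stresses)) != syllables:
        raise ValueError("tags, stresses and syllable count disagree")
    return list(zip(word_tags, stresses))


def is_vowel(phone):
    return phone[-1].isdigit()  # ARPAbet vowels end in a stress digit 0, 1 or 2


def stress_signature(phones):
    """E.g. S AH1 M ER0 -> '10'."""
    return "".join(p[-1] for p in phones if is_vowel(p))


def stressed_vowel(phones):
    """Primary-stressed vowel without its digit, else the first vowel, else None."""
    primary = [p[:-1] for p in phones if p.endswith("1")]
    vowels = [p[:-1] for p in phones if is_vowel(p)]
    return (primary or vowels or [None])[0]


def last_consonant(phones):
    """Final consonant phone: a deliberately crude stand-in for consonance."""
    consonants = [p for p in phones if not is_vowel(p)]
    return consonants[-1] if consonants else None


def reconcile_tag(tagger_tag, dictionary_tags):
    """Keep the tagger's tag if the dictionary allows it, else the dictionary's first tag."""
    if tagger_tag in dictionary_tags or not dictionary_tags:
        return tagger_tag
    return dictionary_tags[0]


def substitute(tag, signature, reference, lexicon, exclude=()):
    """Words fitting a (tag, signature) slot, filtered by assonance with `reference`.

    If no word shares the reference's stressed vowel, fall back to consonance
    (same final consonant). An empty result is a failure left for the caller.
    """
    fitting = [w for w, (tags, phones) in lexicon.items()
               if tag in tags and stress_signature(phones) == signature and w not in exclude]
    vowel = stressed_vowel(reference)
    assonant = [w for w in fitting if stressed_vowel(lexicon[w][1]) == vowel]
    if assonant:
        return assonant
    final = last_consonant(reference)
    return [w for w in fitting if last_consonant(lexicon[w][1]) == final]


if __name__ == "__main__":
    line_11 = "DT IN NN NN PRP$ NN IN PRP$ NNS PUN <10:0,0,1,1,0,10,0,0,1>"
    tag, signature = template_slots(*parse_template_line(line_11))[5]
    print(tag, signature)  # NN 10
    print(substitute(tag, signature, LEXICON["summer"][1], LEXICON, exclude={"summer"}))
    # ['hunter', 'butter']
    print(reconcile_tag("JJ", LEXICON["butter"][0]))  # NN
```

Run against the Sonnet 18 template above, template_slots accepts every line. In other words, the tags, stress strings and syllable counts shown are mutually consistent.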
The grammar symbols are not used in the Markov generation, but I will come to where they are used shortly.

Metrical lookup analysis

You will notice the parts that look like <10:0,0,210,1,0,1,0,1>. These are the metric signatures of each line, broken down into words. The '<10:' part states that there are ten syllables in the line, and 0,0,210,1,0,1,0,1 are the signatures of the individual words: 0 is unstressed, 1 is primary stress and 2 is secondary stress. This information comes from the CMU Pronouncing Dictionary.

Generation by dictionary substitution

If we can get a fairly accurate part-of-speech tag and metric signature for a word, then there's the possibility of looking up a similar word in a dictionary. We can decide which criteria satisfy our 'similar' filter: metre, rhyme, assonance, consonance and alliteration are all possible with a decent phonetic dictionary. The only obstacle is that we have to supply correct grammatical tags to the dictionary. If we do, we can use the grammar tagger and the dictionary lookup (as used during analysis and template generation) to double-check each other. It's still not one hundred percent accurate, but at least it works to some degree. A word in the dictionary may have a number of tags, so we won't know which is the right one. If the grammar tagger comes up with a tag, we can check whether the dictionary agrees that it's a possibility. If the dictionary disagrees with the tagger, we take the first tag in the dictionary's list, assuming we ordered them by probability.

Template of Shakespeare's Sonnet 18 used for dictionary lookup generation, with the assonance filter applied:

With rads recessed than an it any bye?
Thou arched bushed anti she asked stupidly:
husk's crimped through bake both trotting thumb up why,
nine inches deep flopped or `off vaunt they'll the:
sunlight rate charged to ask' them pennant's sides,
at any imped this be projected changed;
at emery dealt such tress sunshine besides,
her sheer, or any taming dong arranged;
must be' producing people' their backs watch,
the act another up four west this dressed,
both his dregs ram mine farmer if a watch',
lest she emergence lined due light much rest;
snow poled an air that case, or `off thine come,
go wore fist 'till, as a hitched whilst due some.

Excerpt of another dictionary lookup generation. The assonance filter was applied, there were a few failures, so the consonance filter was applied and regeneration was performed to fill them in. The template was a familiar Christmas song.

where drunk nor twelve aground none thrust Sees trick such rood and as Far Phantom Braced was beating back my arm a back was' ground and As Buddhism drink his arm a graced inter that lovely blank and as we've golfed congealing shucks, jungle bolls, nothing bee Jungle all a back yuck, what I can hews to ask' in a three blasts spartan Slash-b Jungle bolls, better plan' Jungle all a back yes, what I can peps to ask' then such one arm spartan slam yuck

I can start editing the templates and introducing strict rhythms, and come up with something like this.
Owl madly taste bottoms fish namely took
novelty all mining here's shooting queens
chow panted thieves mildly makes people shook
bodily hedge sewing loaned fasting means
quake any good jousting stone wrestling be
scrupulous bruise owing plumb ketchup clear
French lesson book anti fluff hangers he'd
seventy east payments meet funky fear
it' inward 'way enter hank sutures pour
theatre kelped duly blight sowing cakes
leaves anti lumps custom yawn cleanly floor
usual slit beeper eight tremble cakes'
lies newly tried owing group boodle share
solidly orderly insolent hare

But by now you'll be getting tired of reams of meaningless rubbish, and that's where I come to the end of my exploration of the subject.

The Future

It holds some promise of greater sophistication, and maybe someone will find a neat trick to play with a neural network or genetic algorithms. Perhaps the very first system I described could be set up using the grammatical analysis along with some form of AI. One thing that is distinctly lacking in what you have seen here is context, and there is an interesting project, WordNet, available both online and offline. Some folks may already be looking at it with a view to poetry generation. It's an exciting development, and I'd love to see it generating poetry with a subject. There are also some very exciting developments in artificial intelligence that model human behaviour and personality. Ray Kurzweil has recently suggested that artificial intelligence will outstrip human intelligence in the not too distant future, and I'm afraid it's not a load of old coprolites, though I do wonder about some of his predictions; after all, he's just human, right?