
Extended non-Fiction [Other] Moderators for this section: ochsterboxter

The Moving Digit Writes II


Outline: An article on computer-generated poetry, part 2. First draft, written pretty quickly.
Why: An exercise; may be interesting too.
Review: Anything at all. Brutal as you like.
The Markov Chain

So we’ve covered a basic word selection process and phonetic and metric analysis. Let’s now consider some more sophisticated word selection, as we left it at a fairly primitive stage. It’s time to mention Andrey Markov. He was a Russian professor of mathematics, and yes, he was a poet too. His ideas were far-reaching and are used in many fields requiring statistical probability modelling. While his interest in poetry was at times mathematical, there were no computers in his day, so he wasn’t designing poetry engines. One of his theories has been developed into what we now call a Markov chain. Unless I see it as code I have trouble with the equations too, so I’ll try to explain in words.

If we take the preceding paragraph you will notice that some words appear more than once, e.g. ‘so’, ‘a’, ‘I’, ‘it’ and so on. Now imagine making a chain out of the words from that paragraph, one link per word. Where we meet a duplicate word we don’t insert a new link. Instead we go back along the chain to where the word first appeared and connect the end we were about to extend to that existing link. The word that follows the duplicate is then also connected to that link. We end up with something that might look like a flail or, more importantly, a tree of words with lots of looped branches. That is a pretty basic way to describe how a Markov chain may be built.

Now we have this thing, what do we do with it? Well, you start at any link in the chain and move to the next link until you come to a point where there are multiple connections. By selecting the next link at random there, you may continue to traverse the chain. And if you were writing the words down as you went, you’d be creating random streams of words.
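For readers who, like the author, find code easier than equations, the chain and the random walk described above can be sketched in a few lines of Python. This is only an illustration and not the Word Bench code: the names build_chain and walk and the sample sentence are made up for the example, and punctuation is ignored.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # Repeated followers are kept, so common continuations are picked more often.
        chain[current].append(following)
    return dict(chain)

def walk(chain, start, length, rng=random):
    """Follow links from `start`, choosing at random wherever a link branches."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return output

sample = "the cat sat on the mat and the dog sat on the cat"
chain = build_chain(sample)
print(" ".join(walk(chain, "the", 8, random.Random(1))))
```

Every duplicate word shares one entry in the dictionary, which is the "go back along the chain" step; the branching happens wherever an entry has more than one follower.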

You can imagine that with a large text the words ‘and’, ‘the’ and the punctuation marks would each provide a fair selection of next words, and if we have enough of them we can select only words that fit a criterion. Perhaps we want a rhyming word with a specific metric signature. Maybe we have an assonance or consonance pattern we wish to find. We can do this to try to produce some poetic effect in the output stream of words. If we have a list of criteria and give them an order of precedence, then when the first criterion cannot be met we may apply the next, and so on. If the worst comes to the worst we can just take what we can get, or give up and start again. We have a fair amount of choice. We can do a few more things with these chains, like adding them together to produce sum chains. If we have a number of individual chains we can also choose between them by a probability selection process, weighting each chain to make it a more or less probable choice.
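The precedence idea can be sketched as below. It is a minimal illustration under loud assumptions: choose_next and pick_chain are hypothetical names, and the two criteria (ends_like and has_letters) are crude spelling-based stand-ins for real rhyme and metric tests, which would need a phonetic dictionary.

```python
import random

def choose_next(candidates, criteria, rng=random):
    """Try each criterion in order of precedence; otherwise take what we can get."""
    for criterion in criteria:
        matches = [word for word in candidates if criterion(word)]
        if matches:
            return rng.choice(matches)
    # No criterion could be met: fall back to any candidate, or None to signal "start again".
    return rng.choice(candidates) if candidates else None

def pick_chain(chains, weights, rng=random):
    """Choose one of several chains, with heavier weights making a chain more probable."""
    return rng.choices(chains, weights=weights, k=1)[0]

# Hypothetical stand-in criteria based on spelling, not sound.
def ends_like(suffix):
    return lambda word: word.endswith(suffix)

def has_letters(count):
    return lambda word: len(word) == count

followers = ["cat", "mat", "dog", "house"]
print(choose_next(followers, [ends_like("ouse"), has_letters(3)]))
```

Here the first criterion is satisfied by exactly one follower, so that word is chosen; had nothing ended in "ouse", the three-letter test would have been tried next.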

Poetry engines

Many people have used this system to produce poetry engines, and I have used it myself, incorporating the word and chain selection processes I’ve just described. The most notable example one can find on the internet is Ray Kurzweil’s Cybernetic Poet, which is quite interesting but not open source, so I’ve not studied the code for this project. Actually Ray is quite an interesting man too; he’s a poet as well and something of a driving force in the field of artificial intelligence. The other big project is something called Gnoetry. This is open source, I have looked at some of it, and they take some different approaches to the ones I have used. I’m not sure if either project is still going, as I’ve not seen any development on Gnoetry or indeed Cybernetic Poet for some time.

There are limitations to the Markov chain approach, as proper nouns and whole phrases can obviously leak out and give the game away. The text is not quite original: not downright plagiarism, but capturing something of the style of the training texts. The great advantage is that we can sidestep the question of dealing with the grammar and allow the word organisation in the Markov chain to take over this complex task.

Here is a sample of raw Markov generated poetry; the training text is ‘Alice in Wonderland’ by Lewis Carroll and the pattern used was Shakespeare’s Sonnet 18. The software used to generate this was my poetry engine ‘Word Bench’.

Her eyes appeared on where you thinking I?
We went mad after her so dreadfully:
it's pleased so rich and walking hand on my,
two people had found and then turn them free:
wouldn't be much to fall a moment's pause,
that rabbit cried the most important air;
for anything had a fish footman because,
no more, but after thinking I declare;
might find another figure my dear paws,
all is another of this down a court,
all in hand round your finger for your jaws,
for your flamingo was at home this short;
I was this caused some wine', and half my plan,
bill had not as, while two were down off than.



Analysing poetry


This brings me on to the next point for discussion: how did I get a pattern for Shakespeare’s Sonnet 18? And what exactly is a ‘pattern’? Well, we need to ‘parse’ a piece of text and carefully examine each and every part of it. The computer splits the text into lines and treats each line as a separate entity. The main reason for this is to reconstruct the form in terms of its component parts. It’s also very useful when looking for end rhymes, although a better system would look for more than just rhymes and would look throughout the whole text. If I do another iteration of development this will be a primary consideration. A well-featured poetry analyser should do this in any case, and any of us could benefit from using a tool like that to hone our efforts. But for now it’s just end rhymes, and these are stored as strings of text. For a Shakespearian sonnet the string would be

‘A,B,A,B,C,D,C,D,E,F,E,F,G,G’

which any poet would recognise as a rhyme signature for a 14 line poem. The template produced by the analysis of Shakespeare’s Sonnet 18 looks like the text below.
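As an illustration of how such a rhyme string might be produced from end words, here is a small Python sketch. It makes several assumptions: the four pronunciations are hand-typed in the style of the CMU Pronouncing Dictionary, the four lines are invented, and two words are treated as rhyming when they match from the last primary-stressed vowel onwards, which is a simplification. The names rhyme_key and rhyme_signature are made up for the example.

```python
import string

# Hand-typed entries in the style of the CMU Pronouncing Dictionary (illustrative only).
PRONUNCIATIONS = {
    "day": ["D", "EY1"],
    "may": ["M", "EY1"],
    "date": ["D", "EY1", "T"],
    "late": ["L", "EY1", "T"],
}

def rhyme_key(phonemes):
    """Phonemes from the last primary-stressed vowel to the end of the word."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i].endswith("1"):
            return tuple(phonemes[i:])
    return tuple(phonemes)  # no primary stress marked: compare the whole word

def rhyme_signature(lines):
    """Label each non-blank line's end rhyme A, B, C... in order of first appearance."""
    labels = {}
    signature = []
    for line in lines:
        last_word = line.split()[-1].strip(string.punctuation).lower()
        key = rhyme_key(PRONUNCIATIONS[last_word])
        if key not in labels:
            labels[key] = string.ascii_uppercase[len(labels)]
        signature.append(labels[key])
    return ",".join(signature)

quatrain = [
    "We walked all day,",
    "and stayed out late;",
    "the month was May,",
    "I missed the date.",
]
print(rhyme_signature(quatrain))  # A,B,A,B
```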

PRP$ NNS VBD IN WRB PRP VBG NN PUN <10:0,1,01,0,0,0,10,1>
PRP VBD JJ IN PRP RB RB PUN <10:0,1,1,10,0,1,100>
POS VBN RB JJ CC VBG NN IN PRP$ PUN <10:1,1,1,1,0,10,1,0,0>
CD NNS VBD VBN CC RB VB PRP JJ PUN <10:0,10,1,1,0,1,1,0,1>
NN VB JJ TO VB DT POS NN PUN <10:12,1,1,0,1,0,10,1>
IN NN VBD DT RBS JJ NN PUN <10:0,10,1,0,1,010,1>
IN NN VBD DT NN NN IN PUN <11:0,100,1,0,1,12,01>
DT JJR PUN CC IN VBG NN VB PUN <10:0,1,0,10,10,1,01>
MD VB DT NN PRP$ JJ NNS PUN <10:0,1,210,10,0,1,1>
DT VBZ DT IN DT RB DT NN PUN <10:0,1,010,0,0,1,0,1>
DT IN NN NN PRP$ NN IN PRP$ NNS PUN <10:0,0,1,1,0,10,0,0,1>
IN PRP$ NN VBD IN NN DT JJ PUN <10:0,0,210,1,0,1,0,1>
NN VBD DT VBN DT POS PUN CC NN PRP$ NN PUN <10:1,1,0,1,0,1,0,1,0,1>
NN VBD RB IN PUN IN CD VBD RB IN IN PUN <10:1,1,1,0,0,0,1,1,0,0>


Grammar Taggers

This is where things get a bit tricky. You’ll notice the various symbols: PRP$, NNS, VBD and so on. There is one of these for each word, representing its grammatical part of speech. PUN is just punctuation. NN is a noun tag (a singular common noun in the Penn Treebank tag set) and it appears in numerous places it shouldn’t, which highlights some of the failures of the system. I’m not nearly clever enough to write a grammar parser for the English language. Not that I’m putting myself down; it is simply a hugely complex task requiring a knowledge of English grammar that’s beyond me, and a good understanding of software development, which fortunately I do have to some degree. But I know of a man who has both, and on the open source scene I think he’s the best there is at the moment: Eric Brill. I am using a version of his ‘Grammatical Part of Speech Tagger’. The mistakes it makes are not entirely Eric’s fault. As I said, this is a complex area and the ‘Tagger’ has to be trained to recognise grammar. Coupled with that, I have hand-translated it from one computer language to another. I don’t think I’ve hashed it up too badly, as it seems to do the same things as the original, but all the same we do need a better solution. The grammar symbols are not used in the Markov generation, but I will come to where they are used shortly.

Metrical lookup analysis

You will notice the parts that look like <10:0,0,210,1,0,1,0,1>; these are the metric signatures of each line broken down into words. The <10: states that there are ten syllables in the line, and 0,0,210,1,0,1,0,1 gives the signature of each word in turn. 0 is unstressed, 1 is primary stress and 2 is secondary stress. This information comes from the CMU Pronouncing Dictionary.
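To make the notation concrete, here is a short Python sketch that reads one template line and checks that the word signatures add up to the stated syllable count. The regular expression is a guess at the format based only on the lines shown above, and parse_template_line and stress_pattern are hypothetical names. The second function shows how a signature can be read off a CMU-style entry, where each vowel carries a stress digit.

```python
import re

# Assumed layout: space-separated tags, then <syllable count:comma-separated word signatures>.
TEMPLATE_LINE = re.compile(r"^(?P<tags>.*?)\s*<(?P<count>\d+):(?P<stress>[0-2,]+)>$")

def parse_template_line(line):
    """Split a template line into its tags, syllable count and per-word signatures."""
    match = TEMPLATE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a template line: {line!r}")
    tags = match.group("tags").split()
    syllables = int(match.group("count"))
    words = match.group("stress").split(",")
    return tags, syllables, words

def stress_pattern(phonemes):
    """Collect the stress digits from CMU-style phonemes, one digit per vowel."""
    return "".join(p[-1] for p in phonemes if p[-1].isdigit())

tags, syllables, words = parse_template_line(
    "PRP VBD JJ IN PRP RB RB PUN <10:0,1,1,10,0,1,100>"
)
print(syllables, sum(len(w) for w in words))   # 10 10
print(stress_pattern(["AH0", "B", "AW1", "T"]))  # 01 (the word "about")
```

Note that the line has eight tags but only seven word signatures, because the PUN tag stands for punctuation and carries no syllables.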

Generation by dictionary substitution

If we can get a fairly accurate part of speech tag for a word, along with its metric signature, then there’s the possibility of looking in a dictionary for a word that is similar. We may set up whatever criteria satisfy our ‘similar’ filter. Metrics, rhyme, assonance, consonance and alliteration are all possible with a decent phonetic dictionary.
The only obstacle is that we have to supply correct grammatical tags to the dictionary, but if we do this we can combine the grammar tagger and the dictionary lookup used during the analysis and template generation to double check each other. It is still not one hundred percent accurate, but at least it works to some degree. A word in the dictionary may have a number of tags, so on its own the dictionary won’t tell us which is the right one. If the grammar tagger comes up with a tag, we can check whether the dictionary agrees that it’s a possibility. If the dictionary disagrees with the tagger, then we take the first one in the dictionary’s list of tags, assuming we ordered them probabilistically.
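The double check and the lookup might be sketched like this in Python. Everything here is an assumption made for illustration: the toy dictionary, its tags and stress signatures are typed in by hand, and reconcile_tag and substitutes are hypothetical names rather than anything from Word Bench or the Brill tagger.

```python
# Toy dictionary: word -> (possible tags, most likely first; stress signature).
DICTIONARY = {
    "rabbit": (["NN"], "10"),
    "garden": (["NN", "VB"], "10"),
    "walking": (["VBG", "NN"], "10"),
    "declare": (["VB"], "01"),
    "fish": (["NN", "VB"], "1"),
}

def reconcile_tag(word, tagger_tag):
    """Keep the tagger's tag if the dictionary allows it, else use the dictionary's first tag."""
    entry = DICTIONARY.get(word)
    if entry is None:
        return tagger_tag  # unknown word: nothing to check the tagger against
    tags, _stress = entry
    return tagger_tag if tagger_tag in tags else tags[0]

def substitutes(tag, stress):
    """Words that share both a part of speech tag and a stress signature."""
    return sorted(
        word
        for word, (tags, word_stress) in DICTIONARY.items()
        if tag in tags and word_stress == stress
    )

print(reconcile_tag("fish", "JJ"))   # NN, because the dictionary overrules the tagger
print(substitutes("NN", "10"))       # ['garden', 'rabbit', 'walking']
```

Further filters such as assonance or consonance would be applied to the list returned by substitutes, in the same order-of-precedence style as the word selection for the chains.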

Template of Shakespeare’s Sonnet 18 used for dictionary lookup generation with assonance filter applied.

With rads recessed than an it any bye?
Thou arched bushed anti she asked stupidly:
husk's crimped through bake both trotting thumb up why,
nine inches deep flopped or `off vaunt they'll the:
sunlight rate charged to ask' them pennant's sides,
at any imped this be projected changed;
at emery dealt such tress sunshine besides,
her sheer, or any taming dong arranged;
must be' producing people' their backs watch,
the act another up four west this dressed,
both his dregs ram mine farmer if a watch',
lest she emergence lined due light much rest;
snow poled an air that case, or `off thine come,
go wore fist 'till, as a hitched whilst due some.


Excerpt of another dictionary lookup generation. Assonance was applied first; there were a few failures, so consonance was applied and regeneration was performed to fill in the gaps.
The template was a familiar Christmas song.

where drunk nor twelve aground
none thrust Sees trick such rood
and as Far Phantom Braced
was beating back my arm
a back was' ground and As
Buddhism drink his arm
a graced inter that lovely blank
and as we've golfed congealing

shucks, jungle bolls, nothing bee
Jungle all a back
yuck, what I can hews to ask'
in a three blasts spartan Slash-b
Jungle bolls, better plan'
Jungle all a back
yes, what I can peps to ask'
then such one arm spartan slam yuck



I can start editing the templates and introducing strict rhythms and come up with something like this.

Owl madly taste bottoms fish namely took
novelty all mining here's shooting queens
chow panted thieves mildly makes people shook
bodily hedge sewing loaned fasting means
quake any good jousting stone wrestling be
scrupulous bruise owing plumb ketchup clear
French lesson book anti fluff hangers he'd
seventy east payments meet funky fear
it' inward 'way enter hank sutures pour
theatre kelped duly blight sowing cakes
leaves anti lumps custom yawn cleanly floor
usual slit beeper eight tremble cakes'
lies newly tried owing group boodle share
solidly orderly insolent hare


But by now you’ll be getting tired of reams of meaningless rubbish and that’s where I come to the end of my exploration of the subject.

The Future

It holds some promise of greater sophistication, and maybe someone will find a neat trick to play with a neural network or genetic algorithms. Perhaps the very first system I described could be set up using the grammatical analysis along with some form of AI. One thing that is distinctly lacking in what you have seen here is context, and there is an interesting project, WordNet, available on and offline. Some folks may already be looking at this with a view to poetry generation. It’s an exciting development and I’d love to see it generating poetry with a subject. Also there are some very exciting developments in artificial intelligence modelling human behaviour and personality. Ray Kurzweil has recently suggested that artificial intelligence will outstrip human intelligence in the not too distant future, and I’m afraid it’s not a load of old coprolites, though I do wonder about some of his predictions; after all he’s just human, right?

Inker

[Sun Nov 26, 2006 10:50 pm]

This is all too weird for me, Mark. I prefer the human touch. What with sheep roaming the countryside with words making poems and now this, it does appear to want to mock the art.

What would Coleridge, Shakespeare, etc, etc make of it all?

An interesting subject, but a shame these people cannot put their skills to better use, in my opinion. Thanks for posting and showing us what's going on out there! I'm quite happy with self-composing, much more fun.
ASmallNumberOfMonkeys

[Tue Nov 28, 2006 1:01 am]

Thanks for looking in on this, Inker, and for your take on it. I enjoy writing English a lot more than code too. But I do like asking questions, and if there’s no one at hand to give me an answer and I have the means to discover something for myself, I’ll go for it.

Those sheep! The funding for that enterprise was scandalous. However, the idea was an old one. William Burroughs, John Cage (but with music scores) and David Bowie all used similar techniques long ago, but with bits of paper, not sheep.

As for what those venerable pillars of our art would think, I think they’d be quite interested by the idea. And I can’t help disagreeing about it being a waste of skills. One day you will be talking to your computer; it’s research like this that is paving the way. Not mine of course; I’ve just dabbled with it out of interest, but the techniques I describe are fundamental to developing a viable machine speech interface.

All the best

Mark
Roy

[Tue Nov 28, 2006 10:32 am]

Part 2 certainly got heavier. I had to read paragraph two at least three times before I could (more or less) understand. Still, it's intriguing to think what will soon be possible. I couldn't help noticing the similarity in the first 'Alice' piece with some of the nonsense poems Barrie and others produced at the time. In a sense, they were attempting a very similar process, though never quite able to switch off their own 'context filters.'

Thanks for these illuminating essays, Mark. I'll be linking.
_________________
Roy

www.royeveritt.com
Inker

[Tue Nov 28, 2006 8:56 pm]

We do talk to computers now. Automated answer services: press this button, push that one. You can dictate to a computer and it will type for you. But I imagine you meant the AI rather than functional elements.

If you'd shown this to Byron, he'd have had a hissy fit. Wordsworth could not take any criticism of his work so I would imagine he'd have been horrified that a machine could produce work, lol.

When I was at school, one of the Christmas games was to take a piece of paper, write half a sentence and see if we could make sense with someone else's half. Quite good fun, and works on the same principle as the programme, so definitely nothing new there.

Do you remember the Tomorrow's World programme about art (I think it was TW) where they got monkeys to paint? They then put the 'masterpieces of modern art' in a gallery where they were quite popular. If a monkey can paint, I suppose AI could compose literature.

Enjoyed the research, but will it ever be the best thing since sliced bread? Time will tell.

Best wishes
Inker
Logicus tracticus

[Tue Nov 28, 2006 9:11 pm]

Interesting in small bytes or bites hence will be nipping back every so often as links lead to interesting bits as well.
_________________
read once for meter, twice for rhythm
thrice for rhyme, then again for
leisure or measure of pleasure;
you: parasites of no consequence:
Larkin
ASmallNumberOfMonkeys

[Wed Nov 29, 2006 8:04 pm]

And thanks again, Roy, for persevering with part 2. I’ve been trying for a while to dispense with diagrams wherever possible in my technical explanations. I can develop my language skills surreptitiously while at work. The Markov chain explanation will get a bit of a re-write soon, as I may be able to describe it slightly better. I may revisit the project soon, as I have new ideas, tools, information, and a host of unanswered questions.

Mark
ASmallNumberOfMonkeys

[Wed Nov 29, 2006 8:20 pm]

Hi Inker you’re right about AI there. I’m not sure where it’s all going to end up; but I’m sure when it arrives we’ll have exactly the same number of problems as before.

I take your point about Byron but, funnily enough, if you take it with a pinch of salt, Countess Ada Lovelace was not only Byron’s only legitimate child, she was also the first computer programmer.

And I do remember the monkey artists fooling the experts; it was magic. Now of course things have moved on and elephants are painting these days. They seem slightly better than the monkeys too!

If they ever manage artificial consciousness we’ll have a problem or two but I think we’ll be in business for some time yet.

Mark
ASmallNumberOfMonkeys

[Wed Nov 29, 2006 8:27 pm]

Thanks for looking in, Logi; I’m pleased you’re finding this interesting. It will be some time before one of these gizmos manages to intertwine words the way you do, and I have given it some thought. As I said, I’ll try to make paragraph two a bit easier to read, so watch out for that when it comes.

All the best

Mark
Logicus tracticus

[Wed Nov 29, 2006 8:51 pm]

And I was wondering how one of mine would come out the other end (Hieronymus Bosch/Kingsley Amis).

This is an early effort from 2002, my writing with a speech-to-text program. Mucked-up Puck: I tried reading A Midsummer Night's Dream to train it to recognise my speech patterns. Strange, very strange. The two works have had me going from link to link.
I found a couple of old links that I have in my browser favourites. I have still yet to set up the "new visio" speech package. I was lucky enough to download an image file for it to a spare hard disc for the laptop, so it is the 64 bit edition, and it is easy enough to switch between the two.

Why not highlight "I" or the three-letter words in the first paragraph in red, also including "of", which often goes unread? Have you tried inputting the "words" wtih the frsit and lat ltetres jubeld up? Azeamig tnihg is biarn can srot it, but can computer?
ASmallNumberOfMonkeys

[Thu Nov 30, 2006 1:20 am]

Thanks Logi; if I have to I may resort to colours.

I had one of those speech to text programs too. I had to take my budgies out of the room as I was reading out the training text. It wasn’t until it was fully trained and the birds were back in the room that I discovered what utter rubbish birds go on about all day long. They often repeat the same word over and over, followed by bursts of jumbled disconnected words, and never a reference to me. Not even a ‘hello, nice to see you Mark’. After all the classics, poetry and drama we’ve listened to on the computer together, one would have imagined that they would have grasped some elementary grammar. Oh well… they seem to know what they’re on about, but it beats me how.

The jumbled up letters problem would be too much for the Markov chain training to deal with, as every permutation would seem to be a new word. The brain does it by pattern matching and probability. A neural net could possibly do the same thing, but it’s a very complex problem for software and a bit like reading handwriting.