Thursday, October 11, 2007

Generating Random Words

I was screwing around this morning and I needed some random words to test something with. The words needed to be real words, not just random sequences of characters (btw, you can generate a random sequence of 8 characters from the shell using jot -r -c 8 a z | rs -g 0 8). In this case, I decided to simply grab a random word from /usr/share/dict/words.

Hmm, but how do I grab a random word from a file? My solution was to generate a random number in the range [1..n] where n is the number of lines in the file, cat -n the file so that line numbers are printed, grep for the line matching the random number, then print out the second column. It looks like this:

$ n=$(cat /usr/share/dict/words | wc -l)
$ cat -n /usr/share/dict/words | grep -w $(jot -r 1 1 $n) | cut -f2
$ cat -n /usr/share/dict/words | grep -w $(jot -r 1 1 $n) | cut -f2
$ cat -n /usr/share/dict/words | grep -w $(jot -r 1 1 $n) | cut -f2
$ cat -n /usr/share/dict/words | grep -w $(jot -r 1 1 $n) | cut -f2
$ cat -n /usr/share/dict/words | grep -w $(jot -r 1 1 $n) | cut -f2

Now, this solution is certainly not cryptographically sound, but it should serve for quick, ad-hoc testing.
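For anyone without jot, here is a minimal sketch of the same idea (pick a line number in [1..n], then print that line) using only wc and awk. The function name random_line is made up for illustration, and awk's rand() stands in for jot, so this is not the exact pipeline above.

```shell
# random_line FILE: print one randomly chosen line from FILE.
# Hypothetical helper; awk's rand() replaces jot here (an assumption, not the original method).
random_line() {
    file=$1
    # Count the lines; redirecting stdin keeps the file name out of wc's output.
    n=$(wc -l < "$file")
    # Pick k in [1..n]. srand() seeds from the clock, so two calls
    # within the same second will pick the same line number.
    k=$(awk -v n="$n" 'BEGIN { srand(); print int(rand() * n) + 1 }')
    # Print line k only, then stop reading the file.
    awk -v k="$k" 'NR == k { print; exit }' "$file"
}
```

Usage would be something like random_line /usr/share/dict/words. Because it selects by line number rather than by pattern, it should not be confused by lines that themselves contain numbers.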


Ben Hoskings said...

Thanks for mentioning jot, I've never heard of that before. I thought maybe this would be useful, same thing without having to generate line numbers:

n=`cat /usr/share/dict/words | wc -l`; head -n`jot -r 1 1 $n` /usr/share/dict/words | tail -n1

slacy said...

Wow, that cat -n | grep stuff seems roundabout. Oh, and jot does seem useful, but I was surprised that it's not installed by default and is in the package 'athena-jot', so I'm concerned that it's non-standard. How about this (it includes some bash-isms, I think):

$ n=$(cat /usr/share/dict/words | wc -l)
$ l=$(( ($RANDOM * 32768 + $RANDOM) % n + 1 ))
$ cat /usr/share/dict/words | head -n $l | tail -n 1

That'll even work for files of any form of input (like files that contain numbers, which your original solution may choke on)

Unixjunkie said...

Jot is standard and installed by default on Mac OS X. There are a ton of ways to do this; I was only interested in showing one straightforward way.

Luke said...

No, no! You broke the golden rule!

never grep the cat
never grep the cat
never grep the cat

Grep does work with text files. You simply pass the file name as the second parameter.

Otherwise, cool script!

Unixjunkie said...

Perhaps grep means something different to you; in which case, I would probably agree ;-) However, saying that you should _never_ cat and then grep is just bad advice. Of course grep takes file names as arguments—most Unix tools do. In this case, I was grepping for the line number that was produced by "cat -n". I wasn't simply using cat to spit out the contents of the file (though I agree that "cat foo | grep bar" is unnecessary because "grep bar foo" does the job with one less process). Grep matches patterns, not line numbers, so the cat -n was producing a pattern that grep could match. So, in this case, grepping the cat is a valid solution. Another good solution is to use the head/tail tricks as others have already pointed out.

Anonymous said...

sed `perl -e "print $RANDOM"`"q;d" /usr/share/dict/words

Rob Menke said...

Yep, that's a classic Perl one-liner:

perl -nle '$word = $_ if rand($.) < 1; END { print $word }' /usr/share/dict/words

You could do the same in awk. In fact, I believe I first saw this done in awk.

The inductive proof of why this works is left as an exercise for those who like inductive proofs.
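A minimal sketch of what the awk version might look like, wrapped in a shell function whose name (random_line_onepass) is made up for illustration. It mirrors the Perl logic: line k replaces the current pick with probability 1/k. The short form of the argument is that line k is kept when read with probability 1/k and then survives each later line j with probability (j-1)/j, which multiplies out to 1/n.

```shell
# random_line_onepass FILE: print one uniformly chosen line from FILE
# in a single pass, without counting the lines first.
# Hypothetical helper name; the awk program follows the Perl one-liner.
random_line_onepass() {
    awk 'BEGIN { srand() }              # seed from the clock
         rand() * NR < 1 { pick = $0 }  # true with probability 1/NR
         END { print pick }' "$1"
}
```

Usage would be something like random_line_onepass /usr/share/dict/words. Since srand() seeds from the time of day, two runs within the same second will print the same word.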

Anonymous said...

Hi Rob, I agree with you, but the perl in my sample code was only a hint that you could do something far more complex there. Indeed, you could write it as follows:

sed $(echo $RANDOM)"q;d" /usr/share/dict/words

The point here is the sed command-line tool.

rob menke said...

Ah, sorry. I had no idea that $RANDOM was a ksh variable, as I usually avoid shell specific features when programming.

The problem with using $RANDOM in that way is that you only will select out of the first 32767 entries, and woe if you actually get a zero back from $RANDOM as sed will choke on that.
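One way to sidestep both problems, sketched under the assumption that each $RANDOM value is a 15-bit integer (0 to 32767): combine two values into one 30-bit number, reduce it modulo the line count, and add 1 so the result is never zero. The helper name line_index is hypothetical, and it takes the random values as arguments so the arithmetic itself stays portable.

```shell
# line_index R1 R2 N: map two 15-bit random values to a line number in [1..N].
# Hypothetical helper; R1 and R2 are assumed to be in the range 0..32767.
line_index() {
    # R1 * 32768 + R2 spans 0..2^30-1; the +1 shifts the result to 1..N,
    # so sed never sees a zero address.
    echo $(( ($1 * 32768 + $2) % $3 + 1 ))
}
```

In bash or ksh this could be used as sed "$(line_index "$RANDOM" "$RANDOM" "$n")q;d" /usr/share/dict/words, with n holding the number of lines in the file. The modulo introduces a slight bias when n does not divide 2^30 evenly, which should not matter for ad-hoc testing.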

Anonymous said...

Here comes the sun; to generate a random integer:

sed `perl -e "print 1 + int(rand(99999))"`"q;d" /usr/share/dict/words

Many shells provide RANDOM. It's a useful shell variable whose integer value changes each time it's referenced.

Anonymous said...

A shell RANDOM:

$(dd if=/dev/urandom count=1 2>/dev/null | od -t u1 | awk 'NR==1 {print $2$4}')
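A variation on the same idea, as a rough sketch: ask od to read exactly four bytes and print them as one unsigned integer, rather than gluing two decimal byte values together. The function name random_u32 is made up for illustration, and /dev/urandom is assumed to exist.

```shell
# random_u32: print one unsigned 32-bit integer read from /dev/urandom.
# Hypothetical helper name; assumes /dev/urandom is available.
random_u32() {
    # -A n drops the offset column, -N 4 reads exactly four bytes,
    # -t u4 prints them as a single unsigned decimal number;
    # tr then removes everything that is not a digit (padding, newline).
    od -A n -N 4 -t u4 /dev/urandom | tr -dc '0-9'
}
```

A line number in [1..n] could then be derived with $(( $(random_u32) % n + 1 )), again with a small modulo bias.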

jeff said...

I have about 250 JPEG files whose names I want to randomize. What is the code to rename the files in this list to random names?