- The dictionary used by the Unix command spell is in the file
/usr/dict/words. Write a small C program that calculates the
frequency of each character.
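#include <ctype.h>
#include <stdio.h>

/* Count how often each byte value occurs on standard input,
   e.g.  ./charfreq < /usr/dict/words  */
int main(void) {
    int c;
    long charCount[256] = {0};   /* one counter per byte value */
    int i;

    while ((c = getchar()) != EOF)
        ++charCount[c];          /* getchar() returns 0..255, or EOF */

    for (i = 0; i < 256; ++i) {
        if (charCount[i] == 0)
            continue;
        if (isprint(i))
            printf("%15ld\t%c\n", charCount[i], i);
        else                      /* e.g. newline is shown as \012 */
            printf("%15ld\t\\%03o\n", charCount[i], i);
    }
    return 0;
}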
- Using Huffman coding, encode the phrase "USC wins".
00110 0110 10101 111 001011 1001 0100 0110
(one codeword per character: U, S, C, space, w, i, n, s)
- Compare the performance of compress, compact, and gzip by calculating
the compression ratio for each on /usr/dict/words.
Also record the time each one takes using the command "time".
- To do this, create a symbolic link to the dictionary
from your home directory with the command
ln -s /usr/dict/words words
- Then run each command on the file under time and record the elapsed time, for example:
time compact words
Use the manual page (man 1 time) to interpret the output:
u = user time, s = system time, followed by the total (elapsed) time.
- Check the sizes with the command wc -c (byte count).
- You will need to use uncompact, uncompress, and gunzip to get the original file back.
What I got was:
  program     compressed size (bytes)   time (s)
  compress    102727                    1.91
  compact     111646                    5.43
  gzip         79269                    5.76
Note that these are the actual compressed file sizes, not ratios.
Compression ratios are calculated by
CR = (OriginalSize - NewSize)/OriginalSize
- Find a spelling/grammar checker on a PC or Mac.
- Which word processor are you using?
- What is the grammar checker's evaluation of
An hoarse is one thee gulf curse.
Out the window, the bird flew.
- How does the spelling checker respond to
fastly, greenly, and the phrase "et cetera" itself?
I didn't do this one.
- Extra credit (3 points): A digram is a two-character sequence.
- Using /usr/dict/words, calculate a static model
for digram compression.
- For the top ten digrams, compute a Huffman code.