PROGRAMMING LANGUAGES Harder version of Assignment # 2 -------------------------------------------------------- Write a C program which reads in text from the standard input (eg. typed at the keyboard until the EOF indicator (control-Z on DOS, control-D on UNIX) or redirected from a file with the '<' symbol) and prints out three lists of words: 1) A list (in alphabetical order) of the distinct words appearing in the text. (A word occurring more than once in the text will only be output once.) 2) The same list as in 1) but in the order that the words appeared in the text. (Note: the number of times each word occurs is to be ignored - we are only concerned here with the order of occurrence.) 3) A list (in alphabetical order) of the words that end with the letter 'e'. Use a page of text from your favourite novel to test your program. For example, if the input text is: She thinks she said that she sells sea shells by the sea shore. Did she ? The output would be: ALPHABETICAL ORDER ORDERED BY OCCURRENCE ENDING WITH E ------------------ --------------------- ------------- by she she did thinks shore said said the sea that sells sells she sea shells shells shore by that the the shore thinks did Notes: 1. The following characters may appear in the file: alphabetic (a-z, A-Z), whitespace (space, tab, newline), punctuation (period, comma, semi-colon, question mark) 2. A word is defined to be a sequence of alphabetic characters. Ie. words are delimited by whitespace or punctuation characters. Note that the program should handle words of arbitrary length. 3. Upper and lower case characters are not distinguished; for example, "She" and "she" are the same word. 4. Don't worry about making your output look like the example above - just do whatever seems the easiest. 5. Note that the program should handle an arbitrarily long input text but to make things simpler, just store the first 500 distinct words. If there are more than 500 distinct words in the text, the remaining words are ignored. You may assume that input lines are never more than 256 characters long (before the carriage return) and that words are not hypenated. Implementation Hints: First off, you have to input the words from the standard input. One way of doing this would be to use 'gets()' to input lines of text, then break each line up into words. Or maybe it is easier to use 'getchar()' and accumulate words character by character. For each word, convert to lower case, then check if it has already been stored. If not, use 'malloc()' to allocate space for the word and then copy it into the allocated space. Save the pointer to the allocated space in an array of pointers. Note: Do NOT use an array of fixed sized strings like char array[500][20] since this wastes space for short words. After all the text has been input, make a copy of the array of pointers and then use 'qsort()' to sort this array according to alphabetical order. Note that the standard functions 'isspace()', 'ispunct()', 'tolower()' will likely be useful. The function 'qsort()' requires a "comparison function" as its 4th argument. A suitable comparison function for comparing strings can be written using 'strcmp()'. Note that the comparison function for use by 'qsort()' will be passed pointers to the elements of the array, NOT the elements themselves. The array we are sorting here will be an array of strings, - ie. the elements will be of type char * - so the things being passed to the comparison function will be of type char ** (two *'s !). Note the example using 'qsort()' on the course web page". As a matter of style, you should write your program using several small functions, each doing a well-defined task. Your main() function should be short and consist mainly of calls to the functions that actually do the work. Avoid using global or file-scope variables - instead you should pass things around via function parameters. Most people will find it easiest to develop the program in a series of steps that result in working programs of increasing functionality: 1) Make a program that reads in the text, splits each line up into words, and prints out these words on the screen as they are found. 2) Enhance this program to use an array to store the words that are found, checking that the word is not already in the array before storing it. Print out the array of words at the end of the program (without worrying about the order of the words). 3) Enhance this program to sort the array (using qsort) before printing out the words. 4) Enhance this program to make a copy of the array before sorting it so that you can print out the words in their original order even though the array has been re-ordered by qsort. 5) Enhance this program to meet the remaining requirements listed above. Concepts required: ----------------- character handling, arrays, strings, pointers, malloc, qsort