Histogramming is the process of generating a frequency table for the keys in a set. When the set is a text file, histogramming produces a table of every word used in the file and the number of times (the frequency) each occurs. In this project, you need to write a program that reads a text file and generates a word histogram of it. The output is a list of all words that appear in the file, in alphabetical order, together with the number of times each word occurs.
Use C++ to implement a binary search tree that stores and searches the words, which are ordered lexicographically (as in a standard English dictionary). Each node should store a frequency count. In addition to the histogram list, the output should also include some information about your binary search tree, such as the total number of nodes, the maximum depth, and the average depth (over all nodes).
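As a rough illustration of the tree described above, here is a minimal sketch, not a required design. The names `Node`, `WordTree`, `TreeStats` and `Histogram` are hypothetical, the root is assumed to sit at depth 0 (the request does not say which convention to use), and C++14 is assumed for `std::make_unique`. The tree is not balanced, and the traversals are recursive.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node: one distinct word plus how often it has been seen.
struct Node {
    std::string word;
    std::size_t count = 1;
    std::unique_ptr<Node> left, right;
    explicit Node(std::string w) : word(std::move(w)) {}
};

// Summary figures for the tree. Assumption: the root has depth 0.
struct TreeStats {
    std::size_t nodes = 0;
    std::size_t maxDepth = 0;
    std::size_t depthSum = 0;
    double averageDepth() const {
        return nodes ? static_cast<double>(depthSum) / static_cast<double>(nodes) : 0.0;
    }
};

using Histogram = std::vector<std::pair<std::string, std::size_t>>;

class WordTree {
public:
    // Insert a word, or bump its count if it is already in the tree.
    void insert(const std::string& word) {
        std::unique_ptr<Node>* link = &root_;
        while (*link) {
            // std::string::compare orders by character code, so in ASCII
            // digits sort before lowercase letters.
            const int cmp = word.compare((*link)->word);
            if (cmp == 0) { ++(*link)->count; return; }
            link = (cmp < 0) ? &(*link)->left : &(*link)->right;
        }
        *link = std::make_unique<Node>(word);
    }

    // In-order traversal yields the words in lexicographic order.
    Histogram histogram() const {
        Histogram out;
        inorder(root_.get(), out);
        return out;
    }

    TreeStats stats() const {
        TreeStats s;
        measure(root_.get(), 0, s);
        // Sanity check: a tree with n nodes has depth at most n - 1.
        assert(s.nodes == 0 || s.maxDepth < s.nodes);
        return s;
    }

private:
    static void inorder(const Node* n, Histogram& out) {
        if (!n) return;
        inorder(n->left.get(), out);
        out.emplace_back(n->word, n->count);
        inorder(n->right.get(), out);
    }

    static void measure(const Node* n, std::size_t depth, TreeStats& s) {
        if (!n) return;
        ++s.nodes;
        s.depthSum += depth;
        if (depth > s.maxDepth) s.maxDepth = depth;
        measure(n->left.get(), depth + 1, s);
        measure(n->right.get(), depth + 1, s);
    }

    std::unique_ptr<Node> root_;
};
```

A driver would call `insert` once per word read from the file, then print the pairs returned by `histogram()` followed by the figures from `stats()`. Because the tree is unbalanced, input that is already sorted degenerates into a linked list, which is visible in the maximum and average depth figures.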
Some restrictions apply to the text file. You can assume that only lowercase letters, digits, spaces, line breaks and punctuation marks (use only ; , ; . ? !) are used. Spaces, line breaks and punctuation marks are used only as word separators. A word can only start with a letter and can have up to 16 letters or digits (for words longer than 16 characters, only the first 16 characters are used). Digits precede letters lexicographically. A sample file is attached.
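The word rules above could be applied while reading the input, along the lines of the following sketch. The names `extractWords` and `kMaxWordLen` are hypothetical. Two points are assumptions rather than part of the request: any character that is not a letter or digit is treated as a separator, and a token that begins with a digit is skipped entirely, since the request does not say how such tokens should be handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Only the first 16 characters of a word are kept.
constexpr std::size_t kMaxWordLen = 16;

// Split a stream into words made of letters and digits.
// Assumption: everything else (spaces, line breaks, punctuation) separates words.
// Assumption: a token that starts with a digit is dropped as a whole.
std::vector<std::string> extractWords(std::istream& in) {
    std::vector<std::string> words;
    std::string current;
    bool skipping = false;  // true while inside a token that began with a digit
    char ch;
    while (in.get(ch)) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c)) {
            if (skipping) continue;
            if (current.empty() && std::isdigit(c)) {
                skipping = true;
                continue;
            }
            // Characters beyond the limit are read but not stored.
            if (current.size() < kMaxWordLen) current.push_back(ch);
        } else {
            if (!current.empty()) {
                assert(current.size() <= kMaxWordLen);
                words.push_back(current);
            }
            current.clear();
            skipping = false;
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Convenience overload for in-memory text.
std::vector<std::string> extractWords(const std::string& text) {
    std::istringstream in(text);
    return extractWords(in);
}
```

For a file, the stream overload can be given a `std::ifstream` opened on the input path. Truncating at read time means that two long words sharing the same first 16 characters are counted as one word.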
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
UNIX