GeneIndex©
GeneIndex started from a question of biological interest:
"What are the frequencies and positions of all words of a given length in
a DNA sequence?" The GeneIndex program aims to answer this question.
It also lets users visualize the results.
Its running time and memory use both grow linearly with the length of the
DNA sequence: if N is the sequence length, both the time and space
complexities are O(N). The program provides the following functions:
- Compute the frequencies and positions of all words of a given length, and output the results selectively (frequencies, positions, or both) to the standard output by default, or to a text file specified by the user;
- Retrieve the frequency and positions of a particular word, or of a group of words listed in a given file, and output the corresponding data, selected as described above, to the standard output or a text file;
- View the frequencies of all words in the DNA sequence using Tcl/Tk. Optionally, frequencies can be displayed in descending order, in alphabetical order of the words, or in the order in which the words occur from left to right in the DNA sequence;
- Visualize the distribution of all words along the sequence using Tcl/Tk.
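The indexing, lookup and ordering functions listed above can be illustrated with a short Python sketch. It is not GeneIndex's own code: the function names `build_index`, `lookup`, `lookup_many` and `ordered`, the 1-based positions, skipping words that contain characters other than A, C, G, T (such as N), and reading "left to right" as "by first occurrence" are all assumptions. Slicing each window costs O(k), so one pass costs O(N*k), which is O(N) for a fixed word length.

```python
from collections import defaultdict

VALID_BASES = frozenset("ACGT")


def build_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in seq to its 1-based start positions.

    One left-to-right pass. Words containing characters other than
    A, C, G, T (e.g. N) are skipped; this is an assumption of the sketch.
    """
    if not 1 <= k <= 32:
        raise ValueError("word length must be between 1 and 32")
    seq = seq.upper()
    index = defaultdict(list)  # word -> list of start positions
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if VALID_BASES.issuperset(word):
            index[word].append(i + 1)  # store a 1-based start position
    return dict(index)


def lookup(index: dict[str, list[int]], word: str) -> tuple[int, list[int]]:
    """Return (frequency, positions) for one word; (0, []) if it never occurs."""
    positions = index.get(word.upper(), [])
    return len(positions), positions


def lookup_many(index: dict[str, list[int]], words: list[str]) -> dict[str, tuple[int, list[int]]]:
    """Look up a group of words, e.g. lines read from a user-supplied file."""
    return {w.upper(): lookup(index, w) for w in words}


def ordered(index: dict[str, list[int]], order: str) -> list[tuple[str, int]]:
    """List (word, frequency) pairs in one of three display orders."""
    items = list(index.items())
    if order == "frequency":    # descending frequency, ties broken alphabetically
        items.sort(key=lambda kv: (-len(kv[1]), kv[0]))
    elif order == "alphabet":   # alphabetical order of the words
        items.sort(key=lambda kv: kv[0])
    elif order == "position":   # by first occurrence, left to right
        items.sort(key=lambda kv: kv[1][0])
    else:
        raise ValueError(f"unknown order: {order}")
    return [(word, len(pos)) for word, pos in items]


if __name__ == "__main__":
    idx = build_index("ACGTACGA", 3)
    print(lookup(idx, "ACG"))         # (2, [1, 5])
    print(ordered(idx, "frequency"))  # [('ACG', 2), ('CGA', 1), ('CGT', 1), ('GTA', 1), ('TAC', 1)]
```

A single dictionary from word to position list serves all three outputs: the frequency is just the length of the list, so frequencies and positions never have to be kept in sync.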

Currently, the program handles word lengths from 1 to 32. A README file included in the downloadable tarball explains how to compile the code; usage is straightforward. A Windows version and a parallelized version are also available for download, but they have not been tested extensively. Please read the license information before downloading. This package is distributed under an Open Source ["BSD style"] license. If you use and improve this package, please send us a note at ["hpc" at iu.edu]. Your contributions are welcome!
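The 1-to-32 range is consistent with packing each base into 2 bits, so that a whole word fits in one 64-bit integer (32 x 2 = 64 bits). The text does not say how GeneIndex stores words, so the following Python sketch of that scheme is an assumption, including the A=0, C=1, G=2, T=3 code and the hypothetical helper names `encode`, `decode` and `rolling_codes`. Updating the code with a shift and a mask costs O(1) per base, giving O(N) time regardless of word length.

```python
from collections import Counter

# Hypothetical 2-bit code per base; GeneIndex's actual encoding is not documented here.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(word: str) -> int:
    """Pack a word into an integer, 2 bits per base, first base in the high bits."""
    value = 0
    for base in word.upper():
        value = (value << 2) | CODE[base]
    return value


def decode(value: int, k: int) -> str:
    """Unpack the low 2*k bits of value back into a length-k word."""
    bases = []
    for _ in range(k):
        bases.append(BASES[value & 3])
        value >>= 2
    return "".join(reversed(bases))


def rolling_codes(seq: str, k: int):
    """Yield (1-based start, code) for each length-k word, with O(1) work per base.

    A non-ACGT character (e.g. N) restarts the window.
    """
    if not 1 <= k <= 32:
        raise ValueError("word length must be between 1 and 32")
    mask = (1 << (2 * k)) - 1  # keep only the low 2k bits (at most 64)
    value = 0
    run = 0  # number of consecutive valid bases ending at the current position
    for i, base in enumerate(seq.upper()):
        code = CODE.get(base)
        if code is None:
            value, run = 0, 0
            continue
        value = ((value << 2) | code) & mask  # drop the oldest base, append the new one
        run += 1
        if run >= k:
            yield i - k + 2, value  # word occupies 0-based indices i-k+1 .. i


if __name__ == "__main__":
    counts = Counter(code for _, code in rolling_codes("ACGTACGA", 3))
    print({decode(c, 3): n for c, n in counts.items()})
    # {'ACG': 2, 'CGT': 1, 'GTA': 1, 'TAC': 1, 'CGA': 1}
```

In a fixed-width language such as C, the same left shift on a `uint64_t` discards the oldest base automatically, and a 33rd base would no longer fit, which would account for the upper limit of 32.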





