USER'S GUIDE AND TUTORIAL FOR
                      PC-GenoGraphics Version I

                 Ray Hagstrom, Ross Overbeek, and Morgan Price
                     Argonne National Laboratory
                     Argonne, IL 60439












        CHAPTER 1      GETTING READY

        CHAPTER 2      INSTALLATION

        CHAPTER 3      USING PC-GenoGraphics AS A VIEWING TOOL

        CHAPTER 4      USING PC-GenoGraphics AS A DATA DISPLAY TOOL

        CHAPTER 5      USING PC-GenoGraphics AS A DATA SEARCH TOOL

        CHAPTER 6      SEARCHING SEQUENCE

        CHAPTER 7      USING PC-GenoGraphics AS AN INTERACTIVE LOGBOOK

        CHAPTER 8      USING PC-GenoGraphics AS A VISUAL REASONING TOOL
                        FOR SEQUENCE

        CHAPTER 9      WORKING REPETITIVE PROCEDURES FROM SCRIPT FILES

        CHAPTER 10     PRINTING THINGS AND DOING OTHER THINGS

        CHAPTER 11     THE EASY WAY TO START YOUR OWN *.ALL FILE

        CHAPTER 12     GORY DETAILS ABOUT *.ZWD FILES AND *.ALL FILES
                        AND *.UPD FILES





        FIGURE 1   Screen Requesting Identification of Best Graphics
                   Mode.   This screen comes up from GGSETUP after
                   initial questions identifying the source and target
                   disk names and directories.  The correct response is
                   to type

                                D<Enter>

                   to allow you to investigate another possible description
                   of the video adapter in your PC.   You should keep
                   trying new video adapters until you find the one
                   which works best with your machine.

        FIGURE 2   Screen with Candidate Video Adapter Descriptions.
                   Presented by GGSETUP after you have identified the
                   video mode to be investigated.  Use the up and down
                   arrow keys to highlight your next guess at the identity
                   of your video adapter.   Hit

                                <Enter>

                   to investigate the applicability of the highlighted
                   mode.

        FIGURE 3   Screen Showing Performance of Video Mode under GGSETUP
                   Investigation.  Your screen should look very much like
                   this if you are investigating some video mode which
                   operates successfully on your machine.   You should
                   move the cursor into and out of the blinking square
                   and the legend at the top should change.   Black/white
                   video modes will not produce the square of 16 brightly
                   colored blocks.   If the blocks are produced, there
                   should be 16 distinct colors visible.   YOU CAN ALWAYS
                   EXIT FROM THIS SCREEN (EVEN IF THE DISPLAY IS INACCURATE)
                   BY HITTING

                                <Ctrl-C>

        FIGURE 4   Second Screen Showing Performance of Video Mode under
                   GGSETUP Investigation.   These tiles are drawn shortly
                   after the previous screen is exited.   This screen will
                   spontaneously disappear after a few seconds, and the
                   user will be asked to confirm whether the video mode
                   under investigation is performing adequately.   If you
                   answer 'N', the screen depicted in Figure 1 will
                   reappear and you can make another trial.   IF YOU ANSWER
                   'Y', INSTALLATION WILL PROCEED WITH THE PRESUMPTION THAT
                   THE VIDEO MODE WHICH YOU JUST SAW IS THE BEST ONE.

        FIGURE 5   Full-screen Display of E.coli dataset distributed with
                   some copies of PC-GenoGraphics.

        FIGURE 6   Full-screen Display of GDB human dataset distributed with
                   some copies of PC-GenoGraphics.

        FIGURE 7   Full-screen Display of HIV virus dataset distributed with
                   some copies of PC-GenoGraphics.

        FIGURE 8   Full-screen Display of AATEST dataset distributed with
                   some copies of PC-GenoGraphics.   This dataset is of
                   no biological interest, but should be studied thoroughly
                   to understand PC-GenoGraphics.

        FIGURE 9   Information Databox Attached to Object m3o1 in AATEST.
                   To fetch this databox you should position the cursor
                   over object m3o1, and click any mouse button.   If you
                   do not have a mouse attached to your PC, you can navigate
                   the cursor by holding <Shift> and pressing the arrow
                   keys; the equivalent of clicking a mouse button is
                   <Shift-Enter>.













           CHAPTER 1 GETTING READY

           Introduction

           This short chapter tells you what sorts of resources you need
        to exploit PC-GenoGraphics and, perhaps more important for the
        beginner, what sorts of resources you do NOT need.  You will
        learn enough about what PC-GenoGraphics needs from a PC to make
        an intelligent choice if you have more than one PC on which you
        could install it.  After completing this chapter you will be
        able to install PC-GenoGraphics onto your PC.

           PC-GenoGraphics is an integrated software package which
        allows what we call Visual Reasoning to be performed on genomic
        data which have been properly prepared.  Included in
        PC-GenoGraphics is a set of properly prepared descriptions of
        some sample organisms together with tools which allow the user
        to custom-prepare whatever genomic data are at hand.  There are,
        of course, two principal subdivisions of PC-GenoGraphics:  the
        programs themselves (which manipulate the genomic data, but
        which contain zero specific information about any genome) and
        the data (which contain all of the information specific to the
        genome, but which cannot be visualized without the aid of the
        programs).

           PC-GenoGraphics is designed to function as well as possible
        on the most primitive PCs and is, furthermore, able to exploit
        the special capabilities of the most advanced units.  The idea
        here is that you might run PC-GenoGraphics in the role of an
        Interactive Logbook, visualizing and querying your personal data
        on some very primitive PC in your wetlab where graphics
        performance, etc. are not critical.  You might equally run it on
        some high-powered machine to do complex queries to unified
        datasets concerning the organization of genomes as a whole.  The
        amount of information which you can learn from PC-GenoGraphics
        rises with the quality of the PC you place it on.  In general,
        you want to pick the PC with the best available color CRT screen
        and several MB of free disk storage space.

           MOST MODERN PCS ARE MORE THAN ADEQUATE TO RUN
        PC-GenoGraphics.  IT IS RECOMMENDED AT THIS POINT THAT YOU SKIP
        IMMEDIATELY TO CHAPTER 2 AND TRY A BLIND INSTALLATION WITHOUT
        REFERRING TO THE FORMAL TECHNICAL DISCUSSION WHICH FOLLOWS
        BELOW, RETURNING TO SLUG OUT THE DETAILS ONLY IF THERE ARE
        PROBLEMS WITH THE BLIND INSTALLATION.

           System Requirements

           1.)  PC-GenoGraphics runs exclusively on DOS-based PC
        machines (IBM PCs or compatibles); these machines comprise about
        90% of all computers in the world.  Most notably, it does NOT
        run on Apple, Macintosh, or UNIX machines of any kind.  If you
        have any doubt whether your intended machine carries DOS, the
        answer is certainly YES if it responds when you type at the
        command line:

                VER<Enter>

        with a message like:

                MS-DOS Version 4.01

        The answer is certainly YES if your machine is running
        Microsoft Windows.  The answer is most likely YES if you have
        programs like Lotus-123, Excel, Paradox, WordPerfect, Word, DB,
        Harvard Graphics, Applause, Hollywood, Quattro, WordStar, Corel
        Draw, etc.  If you are still in doubt, look at the package from
        any software which is installed on the PC, and if it is intended
        for IBMs, Compatibles, or Clones, then the answer is certainly
        YES.

           2.)  PC-GenoGraphics is intended to run on the vast majority
        of DOS-based PC machines, but it has only been tested thoroughly
        on machines with PC-DOS 3.30, MS-DOS 4.01, and DR-DOS 6.0.  We
        believe that PC-GenoGraphics should run on machines which have
        the following operating system installations:

                MS-DOS Version 3.**
                MS-DOS Version 4.**
                MS-DOS Version 5.**
                MS-DOS Version 6.**
                PC-DOS Version 3.**
                PC-DOS Version 4.**
                PC-DOS Version 5.**
                PC-DOS Version 6.**
                DR-DOS Version 5.**
                DR-DOS Version 6.**

        The * symbols above are wildcards, each of which can be matched
        by any digit; thus "MS-DOS Version 4.01" is OK while "MS-DOS
        Version 2.10" is not likely to work with PC-GenoGraphics.  To
        find out
        exactly what DOS version you have installed on your PC you must
        get to the DOS prompt and type:

                VER<Enter>

           3.)  While a hard disk is not strictly required for
        PC-GenoGraphics to operate in principle, the fact is that
        operating strictly from floppies makes performance so slow as to
        compromise scientific utility for all but the very smallest
        genomes.  It is, we think, practical to do only the most modest
        Interactive Logbook functions without a hard-disk.  If you are
        interested in getting a floppies-only distribution, get in touch
        with us.

           4.)  A certain amount of free RAM is also required to run
        PC-GenoGraphics; the present minimum is about 520K.  You can
        always tell how much free RAM there is by getting to the DOS
        prompt and typing:

                CHKDSK<Enter>

        The response will look something like this:

                Volume SYSTEM DISK created 06-23-1989 12:28p
                Volume Serial Number is 180A-1A2E

                  33431552 bytes total disk space
                     73728 bytes in 3 hidden files
                    174080 bytes in 77 directories
                  30570496 bytes in 1539 user files
                    266240 bytes in bad sectors
                   2347008 bytes available on disk

                      2048 bytes in each allocation unit
                     16324 total allocation units on disk
                      1146 available allocation units on disk

                    653312 total bytes memory
                    549248 bytes free

           Your "bytes free" line is the amount of available RAM.  If
        you do not actually have enough "bytes free", you can ALWAYS get
        nearly the amount shown in your "total bytes memory" line by the
        following procedure or a simple variant:

           You get to the DOS prompt and move to the boot directory,
        usually by typing:

                CD C:\<Enter>
                C:<Enter>

           Next YOU MUST SAVE the system files:

                COPY CONFIG.SYS CONFIG.OLD<Enter>
                COPY AUTOEXEC.BAT AUTOEXEC.OLD<Enter>

           Next create new system files:

                COPY CON CONFIG.GG<Enter>
                FILES = 15<Enter>
                BUFFERS = 15<Enter>
                <Ctrl-Z><Ctrl-Z><Enter>

                COPY CON AUTOEXEC.GG<Enter>
                REM Empty startup file<Enter>
                <Ctrl-Z><Ctrl-Z><Enter>

                COPY CONFIG.GG CONFIG.SYS<Enter>
                COPY AUTOEXEC.GG AUTOEXEC.BAT<Enter>

           Next re-boot your system:

                <Ctrl-Alt-Delete>

        and, when everything is settled down again the CHKDSK command
        should reveal the largest practical amount of RAM which can be
        accessed on your machine.  If this new amount is adequate, you
        should work out a compromise with your collaborators to "kick
        out as much of the resident codes and devices as is required to
        free up the memory"... your negotiating position starts from
        here:  The only devices which you can even use while running
        PC-GenoGraphics are a so-called "disk-caching routine" such as
        SUPERPCK, Vcache, Cache86, DCACHE, Lightning, or FAST TRAX (all
        of which are recommended for speed) or a mouse (which is
        optional at that); absolutely no TSR of any kind is required for
        PC-GenoGraphics.  If, on the other hand, the new value is still
        inadequate to meet the requirements of PC-GenoGraphics, you must
        try another machine.  IN ANY EVENT, YOU SHOULD REVERSE THE
        REVISION OF YOUR OPERATING SYSTEM FILES BY TYPING:

                COPY CONFIG.OLD CONFIG.SYS<Enter>
                COPY AUTOEXEC.OLD AUTOEXEC.BAT<Enter>
                <Ctrl-Alt-Delete>

           5.)  Finally, there is the question of CRT display/hardcopy
        graphics performance.  Virtually any graphics monitor and
        adapter combination should work at some level with
        PC-GenoGraphics.  Most graphics monitor and adapter combinations
        (other than those sold directly on IBM brand PCs) can be
        exploited fully, at truly superior performance, if you know how
        to describe your combination during the GGSETUP procedure for
        PC-GenoGraphics.  Our GGSETUP procedure is sufficiently robust
        that you can figure out by trial and error what graphics monitor
        and adapter combination you actually have.
        Your installation will go more smoothly, however, if you take
        the time to look up the brand of whatever video adapter card is
        present in your machine, and, best of all, determine how much
        video-RAM is present on your board.  IF YOU HAVE A TRUE IBM
        BRAND MACHINE, TO WHICH NO UPGRADE OF THE VIDEO ADAPTER CARD HAS
        BEEN MADE, YOU WILL BE STUCK WITH ONE OF THE STANDARD VIDEO
        PROTOCOLS AND THERE WILL BE NO USE IN TRYING TO GET IMPROVED
        VIDEO PERFORMANCE; THE GOOD NEWS PART OF THIS IS THAT YOUR
        INSTALLATION PROCEDURE WILL BE SIMPLE.
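
           The checks described in items 2 and 4 above can be sketched
        in a few lines of Python, purely as an illustration; this is
        not part of PC-GenoGraphics.  The sketch assumes that the VER
        response names the system exactly as in the table of item 2
        (your machine may word it differently), that "520K" means
        520 x 1024 bytes, and that the CHKDSK report contains a
        "bytes free" line as in the sample.  All function names are
        hypothetical.

```python
import re

# Operating systems listed in item 2; each set holds the major versions
# that the table there marks as expected to work.
SUPPORTED_DOS = {
    "MS-DOS": {3, 4, 5, 6},
    "PC-DOS": {3, 4, 5, 6},
    "DR-DOS": {5, 6},
}

# Assumption: "520K" is read as 520 * 1024 bytes.
MIN_FREE_RAM_BYTES = 520 * 1024


def dos_version_supported(ver_line):
    """Return True if a VER response such as 'MS-DOS Version 4.01'
    matches one of the wildcard entries in the table."""
    match = re.search(r"(MS-DOS|PC-DOS|DR-DOS)\s+Version\s+(\d+)\.(\d+)", ver_line)
    if match is None:
        return False
    family = match.group(1)
    major = int(match.group(2))
    return major in SUPPORTED_DOS[family]


def free_ram_bytes(chkdsk_text):
    """Extract the 'bytes free' figure from a CHKDSK report, or None."""
    match = re.search(r"(\d+)\s+bytes free", chkdsk_text)
    return int(match.group(1)) if match else None


def enough_ram(chkdsk_text):
    """Return True if the report shows at least the minimum free RAM."""
    free = free_ram_bytes(chkdsk_text)
    return free is not None and free >= MIN_FREE_RAM_BYTES


if __name__ == "__main__":
    sample = "    653312 total bytes memory\n    549248 bytes free\n"
    print(dos_version_supported("MS-DOS Version 4.01"))
    print(dos_version_supported("MS-DOS Version 2.10"))
    print(free_ram_bytes(sample), enough_ram(sample))
```

           With the sample CHKDSK report shown in item 4, the "bytes
        free" figure of 549248 exceeds the assumed minimum of 532480
        bytes, so that machine would pass the memory check.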

           Summary of Requirements:

                1.)IBM PC, XT, AT, PS/2, or 100% compatible

                2.)DOS version 3 or later

                3.)520K Minimum free RAM

                4.)Some CRT graphics monitor and video adapter (it will
                help, but it is not necessary to know which kind you
                have)

                5.)Floppy disk drive to load distribution disks

           Summary of Recommended Capabilities

                1.)Hard disk with several MB free space

                2.)Disk cache software such as SUPERPCK, Vcache,
                Cache86, DCACHE, Lightning, or FAST TRAX.

                3.)Good quality color graphics such as SVGA 1024x768
                color display with at least 256KB V-RAM; 1MB is best.

                4.)At least 1MB of RAM with EMM driver installed in the
                system.

                5.)Mouse with driver installed into system

                6.)Laser Printer (HP-LJ2 or compatible)


           As of this writing, complete systems with more than ample
        computing speed (16 MHz 386SX) and full graphics capabilities
        are selling for almost exactly $1000 at aggressive mail-order
        distributors while laser printers are going for about $600.

           Summary of what is NOT REQUIRED and does NOT provide ANY
        advantage to the operation of PC-GenoGraphics:

                1.)Microsoft Windows

                2.)Network Connections of any kind

                3.)TSR routines of any kind (except disk caching
                software)

                4.)Optical Disk Drive

                5.)IBM's 8514A video adapter














CHAPTER 2   INSTALLATION

           This chapter tells how to move PC-GenoGraphics from the
        distribution floppy disks onto your PC.  After completing this
        installation procedure, you will be ready to operate
        PC-GenoGraphics at its full range of query and display
        functions.

           MOST EXPERIENCED USERS WILL KNOW HOW TO MOUNT PC-GenoGraphics
        WITHOUT GOING THROUGH THIS CHAPTER.  INSERT DISK1 INTO YOUR
        FLOPPY DRIVE, SET DEFAULT TO THAT DRIVE, AND TYPE GGSETUP.  IF
        THIS PROCEEDS REASONABLY, YOU MAY SKIP IMMEDIATELY TO CHAPTER 3
        AND RESUME THE TUTORIAL.

           The PC-GenoGraphics distribution contains programs and data.
        The programs allow visualization of whatever datasets are
        loaded, but the programs have no information specific to any
        organism.  All organism-specific information is contained within
        the datasets.  Datasets for tiny genomes such as viruses require
        less disk space than the programs.  Datasets for larger
        organisms such as E.coli and humans are much larger than the
        programs.  The user exercises discretion over which datasets to
        install.  If available disk storage space is a limitation for
        your PC you may choose not to install the larger datasets.

           To install PC-GenoGraphics:

                1.)Start your PC

                2.)Get to the DOS prompt (totally exit from Windows or
                any other shell, if present) so that you get a prompt
                like this:

                                C:>

                3.)Place the distribution disk #1 into a proper floppy
                drive (typically drive A:  or drive B:).  If you are in
                doubt as to which letter applies, type

                                DIR A:<Enter>

                and if the little light on the drive with the disk
                inserted lights up, you have disk A:.  If not, you may
                get the abominable DOS message:

                                Not ready reading drive A
                                Abort, Retry, Fail?

                in which event you should type

                                A

                to abort and then try

                                DIR B:<Enter>

                etc. until you have located the proper floppy drive
                name.  WE WILL CONTINUE THIS DISCUSSION AS IF THE
                INSTALLATION HAPPENS FROM DRIVE B:, but you must use the
                correct letter from your system.

                4.)Next we will establish what part of the hard-disk
                (typically C:, D:, or E:)  will take the installed copy
                of PC-GenoGraphics.  To find out how much free disk
                space exists on hard-drive C:  type

                        CD C:\<Enter>
                        C:<Enter>
                        CHKDSK<Enter>

                The response should look roughly like this:

                        Volume SYSTEM DISK created 06-23-1989 12:28p
                        Volume Serial Number is 180A-1A2E

                          33431552 bytes total disk space
                             73728 bytes in 3 hidden files
                            174080 bytes in 77 directories
                          30570496 bytes in 1539 user files
                            266240 bytes in bad sectors
                           2347008 bytes available on disk

                              2048 bytes in each allocation unit
                             16324 total allocation units on disk
                              1146 available allocation units on disk

                            653312 total bytes memory
                            549248 bytes free


                Here, the crucial line is the one labelled "bytes
                available on disk".  The exact amount of disk space
                available which you require depends strongly upon how
                you plan to use PC-GenoGraphics, but the wisest choice
                is to repeat the above procedure for D:, E:, etc. and
                choose the one with the largest "bytes available on
                disk".  REMEMBER THIS LETTER.  In any event you will
                want at least 1 megabyte for the minimal installation of
                PC-GenoGraphics, at least 2 MB more for the minimum with
                E.coli, 4 MB more for the minimum with the GDB Human
                maps, and 14 MB more for the complete human system
                including sequence.  If you get a message like:

                        Invalid drive specification

                in response to the above procedure, that is a sign that
                there is no opportunity to use that letter at all.  The
                most common situation is for C:  to be the only usable
                letter at all.  If "bytes available on disk" is less
                than your desired number of MB for all available
                letters, you can still always get PC-GenoGraphics to run
                by deleting un-needed files from the hard-drive.  These
                do not need to be lost permanently provided you use the
                BACKUP command to save them to floppies first.  BACKUP
                and ERASE files from the hard-disk until the CHKDSK
                command shows enough "bytes available on disk".

                5.)Now type

                        B:<Enter>
                        GGSETUP<Enter>

                At this point you will start the installation procedure.
                There will be the usual sorts of queries asking for the
                drive designator (typically A or B) for the source
                floppy, the destination hard drive designator for the
                installation (typically C or D), and the destination
                subdirectory (typically GG).  The meaning of the
                destination subdirectory is that all programs and
                distribution data will be installed into the disk region
                which is reached by

                        CD C:\GG<Enter>
                        C:<Enter>

                or

                        CD D:\GG<Enter>
                        D:<Enter>

                After this comes a somewhat mysterious question about
                "high-contrast" displays.  The answer to this is almost
                always no, "N".  A "yes" answer is indicated when your
                PC has a "color" display which exhibits these "colors"
                only as distinct shades of gray, or an actual color
                display, such as a transparency projection panel, on
                which dark colors are difficult to differentiate under
                bad lighting conditions.  Usually laptop computers or
                other systems with high-quality LCD panel displays use
                this standard.  What does NOT indicate a "yes" answer is
                a true monochrome display (one in which only two colors
                are present, black&white or black&green or black&amber,
                for instance).

                Now come three informative screens containing
                abbreviated information about the installation and
                operation of PC-GenoGraphics.  Each of these requires an
                answer of 'Y' before progress continues.

                Next the graphic devices query panel shown in Fig.1 will
                come up.  In order to identify the best performing
                graphics mode, you will select menu item "D".  This will
                open up a large, multi-screen display, like the one shown
                in Fig.2, which you navigate using the arrow keys until
                a promising video mode is highlighted.  You then select
                it with the <Enter> key.  After a few seconds, a small
                box from MetaWindows will appear in the middle of your
                screen, to be followed a few seconds later with a
                display like that shown in Fig.3.

                AT THIS POINT, YOU CAN ALMOST ALWAYS CONTINUE THE
                INSTALLATION BY SIMPLY HITTING:

                        <Enter>
                        <Ctrl-C>
                        Y

                AND PICKING UP THE TUTORIAL AT CHAPTER 3.

                The only opportunity you would be passing up would be
                that you would temporarily be stuck with operating
                merely at the best IBM-STANDARD video mode accessible to
                your machine.  It may be that your machine's best
                operation is no better than this, but almost all
                machines less than 5 years old (except those with the
                IBM brand) perform much better than this.  You can, of
                course, always operate with the default mode,
                re-installing to get the best performance from your
                machine at some later time.

                If the candidate video mode you have selected is
                sufficiently far from the hardware configuration
                actually present on your machine, you may have a totally
                different presentation, perhaps a uniform blank screen,
                perhaps a chaotic display of blinking patterns.  In
                either case, you want to return to select another
                candidate video mode...this can almost always be done by
                hitting <Ctrl-C>; the visual image may start fluttering
                a bit and the menu will come back in a few more seconds.
                In the most extreme cases of unfortunate guesses at the
                identity of your video adapter, your PC can actually
                "lock up", requiring a re-boot...either simply enter the
                keyboard command <Ctrl-Alt-Delete>, or hit the PC
                control panel button "Reset", or turn the power to the
                PC off and then back on.

                Your goal here is to get that video mode which produces
                a proper screen like Fig.3 and which has the highest
                resolution.  A mediocre resolution is 320x240; a high
                resolution is 1024x768.  The number of colors desired is
                at least 16 (unless you actually have a Black&White
                monitor).  NO BENEFIT IS OBTAINED from selecting a 256
                color mode over a 16 color mode, so you should prefer
                any 16 color mode that has higher resolution.  You are
                allowed arbitrarily many attempts to find the best
                performing video mode.
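
           The selection rule just described (keep only the modes
        which produce a proper Fig.3 screen, require at least 16 colors
        unless the monitor is Black&White, then take the highest
        resolution) can be sketched as follows.  This is only an
        illustration in Python and is not how GGSETUP itself works; the
        mode names are hypothetical, and comparing resolutions by total
        pixel count is an assumption.

```python
from typing import List, NamedTuple, Optional


class VideoMode(NamedTuple):
    name: str                # hypothetical label for a candidate mode
    width: int               # horizontal resolution in pixels
    height: int              # vertical resolution in pixels
    colors: int              # number of simultaneous colors
    displays_properly: bool  # True if the test screen looked like Fig.3


def best_mode(modes: List[VideoMode],
              monochrome_monitor: bool = False) -> Optional[VideoMode]:
    """Pick the highest-resolution mode that displayed properly.

    Modes with fewer than 16 colors are skipped unless the monitor is
    monochrome.  Color counts above 16 earn no extra credit, so a
    16 color mode beats a 256 color mode of lower resolution.
    """
    usable = [
        m for m in modes
        if m.displays_properly and (monochrome_monitor or m.colors >= 16)
    ]
    if not usable:
        return None
    # Assumption: "highest resolution" means the largest pixel count.
    return max(usable, key=lambda m: m.width * m.height)


if __name__ == "__main__":
    trials = [
        VideoMode("mode A", 320, 240, 256, True),
        VideoMode("mode B", 1024, 768, 16, True),
        VideoMode("mode C", 1280, 1024, 16, False),  # garbled screen
    ]
    print(best_mode(trials))
```

           In this made-up set of trials the 1024x768 mode with 16
        colors is chosen: the larger mode never displayed properly, and
        the 256 color mode offers no benefit at its lower resolution.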



           At this point, you should test the mouse (if any) to verify
        that it moves the cursor.

           How to Wake up your Mouse if it is Dead:

                A MOUSE OR TRACKBALL POINTING DEVICE IS NOT REQUIRED TO
                USE PC-GenoGraphics, BUT IT FACILITATES ACCESS
                SIGNIFICANTLY, ESPECIALLY FOR THE LESS
                COMPUTER-EXPERIENCED USER.  If the cursor does not
                respond to mouse movement, you will need to activate the
                mouse and/or install the mouse into your system.  First,
                exit GGSETUP by hitting <Ctrl-C> and when the query
                about endorsing the performance comes up, hit <Ctrl-C>
                again.  When the DOS prompt appears, type whatever
                command activates the mouse on your system, typically:

                        MOUSE<Enter>

                Some "feel-good" message about "mouse successfully
                installed" should come up.  If not, or if after
                repeating the GGSETUP procedure, testing the mouse still
                yields no response, you will want to install the mouse
                onto your system.  This is done by editing the infamous
                file, CONFIG.SYS to insert a line directing that the
                description of the mouse be included into your operating
                system.  The new line is typically something like:

                DEVICE = C:\DOS\MSMOUSE.SYS

                or

                DEVICE = C:\DOS\PS2MOUSE.SYS

                After editing CONFIG.SYS, you must re-boot your machine
                (<Ctrl-Alt-Del>) to place the changes into effect.

           Once you have found the best performing video mode, you
        answer "Y" to the question about endorsing the performance and
        the rest of the installation is straightforward:  The only
        remaining options are to decide which of the datasets included
        in the distribution are to be installed on your hard-disk.
        These are straightforward "Y" or "N" answers which are dictated
        by your needs and the disk space available on your disk.  The
        datasets presently distributed with PC-GenoGraphics are:


                GDB       Mapping data from the Human Genome as of
                          HGM-11
                HC21      Mapping data plus sequence data for Human
                          Chromosome 21
                ECOLI_R1  Kenn Rudd's curated version of Kohara's famous
                          mapping of E.coli together with sequence placed
                          on the map.

           At this point PC-GenoGraphics should be installed onto your
        system.  Pitfalls which indicate to the contrary include odious
        messages about your disk being full.  Obviously, nothing
        PC-GenoGraphics can do will relieve disk space congestion, and
        we recommend using BACKUP and ERASE to move unnecessary files off
        your hard-disk.  Once you have cleared enough disk space, simply
        start our PC-GenoGraphics installation procedure over again from
        the beginning.  There are things which you can do to relieve
        disk crowding which do not require deleting any files.  There
        are so-called disk-compression programs such as DiskMax and
        SuperStore which are available at any reasonable PC software
        vendor, typically priced in the $50 range or cheaper.  These
        typically allow twice as much information to be stored on any
        given hard-disk with at worst modest loss of performance.
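
           The disk-space budget given in step 4 of the installation
        procedure can be sketched as simple arithmetic.  The Python
        below is an illustration only.  It assumes that 1 MB means
        1,048,576 bytes, that each "MB more" figure is added on top of
        the 1 MB minimal installation, and that the figures for
        separate datasets simply add up; the dataset labels are
        hypothetical names chosen for this example.

```python
MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes

# Minimal installation of the programs, in MB (from step 4).
BASE_MB = 1

# Additional space per option, in MB (from step 4).  The keys are
# hypothetical labels, not names used by GGSETUP.
EXTRA_MB = {
    "ECOLI": 2,            # minimum with E.coli
    "GDB": 4,              # minimum with the GDB Human maps
    "HUMAN_COMPLETE": 14,  # complete human system including sequence
}


def required_bytes(datasets):
    """Bytes needed for the base installation plus the chosen datasets."""
    return (BASE_MB + sum(EXTRA_MB[name] for name in datasets)) * MB


def fits(free_bytes, datasets):
    """True if 'bytes available on disk' covers the chosen installation."""
    return free_bytes >= required_bytes(datasets)


if __name__ == "__main__":
    free = 2347008  # "bytes available on disk" in the sample CHKDSK report
    print(fits(free, []))         # programs only
    print(fits(free, ["ECOLI"]))  # programs plus E.coli
```

           For the sample CHKDSK report shown earlier (2347008 bytes
        available), the minimal installation fits, but adding E.coli
        would need 3 MB (3145728 bytes) under these assumptions and
        does not.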













CHAPTER 3   USING PC-GenoGraphics AS A VIEWING TOOL

           This chapter teaches you how to select a chromosome for
        viewing and how to enter elementary commands to set the field of
        view and how to exit PC-GenoGraphics.  PC-GenoGraphics has
        comprehensive built-in context-sensitive help.  The bottom of
        the PC screen always displays some context-sensitive hint.  In
        addition, errors cause more extensive help messages to appear.
        Every menu has a help item attached.  EXPERIENCED USERS MAY BE
        ABLE TO DO THIS ON THEIR OWN AND MIGHT RESUME THE TUTORIAL AT
        CHAPTER 4.

           The most elementary use for PC-GenoGraphics is to visualize
        genomic information.  The basic technique in dealing with
        genomic data is to take advantage of the linear organization of
        chromosomes.  We will universally represent the length of the
        genome as a horizontal stripe across most of the video display
        with 5' at the left and 3' at the right of the display.  Various
        information may be represented by different stripes at distinct
        heights on the screen, but they all cover the same horizontal
        space.  Fig.5 provides a specific example of data included in
        Kenn Rudd's compilation of data on the well-mapped E.coli
        organism; Fig.6 provides a specific example of data present in
        GDB and GenBank concerning the much less-accurately mapped human
        genome, while Fig.7 provides a specific example of the
        completely sequenced genome of an HIV virus.

           You will notice immediately the principal obstacle in graphic
        display of genomic data:  Even the tiny virus has nearly ten
        thousand basepairs in its genome, while even the most advanced
        graphics displays have fewer than one thousand pixels available to
        display this length of data.  The time-honored method of dealing
        with this sort of discrepancy is by enabling the user to "zoom"
        in to fine scales when fine detail is required while still being
        able to search around for data which may be "offscreen" because
        of zooming activity.  PC-GenoGraphics has extensive zooming and
        panning capabilities which comprise the bulk of the pure
        visualization manipulation available to the user.
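
           The size of this discrepancy is easy to estimate.  The
        following is a minimal sketch in Python (not part of
        PC-GenoGraphics); the function name and the 1000-pixel screen
        width are our own illustrative assumptions.  It computes how
        many basepairs must share a single pixel for a given field of
        view, expressed in the fractional units used by the zoom
        commands in this chapter.

```python
def basepairs_per_pixel(genome_length_bp, screen_pixels, left=0.0, right=1.0):
    """Basepairs represented by one pixel column for the view [left, right].

    left and right are fractions of the whole genome, so the full
    view is [0.0, 1.0].  All names here are hypothetical.
    """
    if not 0.0 <= left < right <= 1.0:
        raise ValueError("view must satisfy 0.0 <= left < right <= 1.0")
    visible_bp = (right - left) * genome_length_bp
    return visible_bp / screen_pixels


# A ten-thousand-basepair genome on an assumed 1000-pixel-wide display:
print(basepairs_per_pixel(10_000, 1000))             # full view
print(basepairs_per_pixel(10_000, 1000, 0.20, 0.60)) # zoomed view
```

           Zooming in reduces the number of basepairs per pixel, which
        is why fine detail only becomes visible at fine scales.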

           We will explore how to use the zooming capability of
        PC-GenoGraphics by explicit tutorial example on the artificial
        system AATEST which is included in all distributions of
        PC-GenoGraphics.  First, we need to get AATEST onto the screen.

                1.)Activate PC-GenoGraphics:

                        CD C:\GG<Enter>   (or whatever you actually used)
                        C:<Enter>
                        GG<Enter>

                At this point a screen with an open menu should appear.

                BELOW, KEYSTROKES ARE GIVEN, BUT YOU CAN (AND SHOULD) DO
                EXACTLY THE SAME THINGS WITH THE MOUSE (IF YOU HAVE ONE)
                BY POINTING AT THE LABELLED BUTTONS OR MENU ITEMS WHICH
                HAVE THE APPROPRIATE CAPITALIZED AND UNDERLINED LETTERS
                IN THEIR LABELS:

                2.)Choose file to view:  The following command picks the
                default directory and then the top file named in the
                list and chooses not to load any "Update File".  It also
                chooses to load all of the maps in the selected file.

                        <Enter>
                        G
                        N
                        G

                At this point a screen looking like Fig.8 should appear.
                You may now skip down to item 3.) and continue the
                tutorial without loss.

                In general, this procedure can be modified by standard
                "look and feel" techniques to alter the selected
                conditions:

                        D:\MYDIR\MYSUB<Enter>

                replaces the first command above if you want to look at
                files in that particular subdirectory.

                To select other than the first file in the list, you
                must promote your desired filename to lie within the
                small box above the list.  This is done by navigating
                the cursor to point at the desired filename within the
                large box (using the slider bar at the right to scroll
                the list if necessary) and clicking the cursor on the
                desired filename.  When the desired filename has been
                promoted, it is endorsed by hitting the "Go" button:

                        G

                After this, a list of potential update files is
                presented and the desired update filename, if any, is
                likewise promoted and is endorsed by hitting the "Go"
                button:

                        G

                If no update file is desired, simply hit the "None"
                button:

                        N

                If a new update file is desired, hit the "Anew" button
                and type in its name:

                        A
                        newname<Enter>

                At this point you will be presented with a list of maps
                which are contained in the file you have selected.
                These represent unified classes of data items attached
                to that file.  Any combination of these maps (other than
                none at all) may be selected for viewing.  Unselected
                maps play no further role in the session.

                You can always reinitiate this entire file selection
                procedure by invoking the File option in the Files menu.

                        F
                        F

                3.)Practice Keyboard Zoom:

                        Z
                        Z
                        K
                        0.20<Enter>
                        0.60<Enter>

                Notice that this zooms in on the range [0.20,0.60] where
                the whole possible length of the "chromosome" under
                study is always [0.0,1.0].  Obviously, keyboard entry is
                the most cumbersome and the most precise mechanism for
                specifying a zoomed field of view.  Try one or two more
                keyboard zooms of your own choosing.  Notice two
                features of the zoomed views:  First, that irregular
                shapes such as triangles are always drawn to fit within
                the boundaries of the screen quite regardless of whether
                their full length hangs outside the screen width.
                Second, that when a shape hangs outside the screen
                width, this fact is indicated by the addition of a
                "continuation arrow" drawn off-scale (in cerise on color
                displays, elsewise in black) in the direction of the
                overflow.  This protocol obtains no matter what method
                is used to invoke the zoomed view.
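
                The clipping behaviour just described can be sketched
                as follows.  This is an illustration in Python under
                our own assumptions, not the program's actual code:
                the function name and the tuple layout are
                hypothetical.  Given an object's extent and the
                current view, it returns the part that is drawn and
                whether a continuation arrow is needed on each side.

```python
def clip_to_view(obj_left, obj_right, view_left, view_right):
    """Clip an object's extent to the current field of view.

    Returns (drawn_left, drawn_right, arrow_left, arrow_right), or
    None when no part of the object lies inside the view.  The two
    booleans say whether the object hangs off the left or right edge.
    """
    if obj_right <= view_left or obj_left >= view_right:
        return None  # entirely off screen
    drawn_left = max(obj_left, view_left)
    drawn_right = min(obj_right, view_right)
    return (drawn_left, drawn_right,
            obj_left < view_left, obj_right > view_right)


# An object spanning [0.10, 0.30] seen in the zoomed view [0.20, 0.60]
# is drawn from 0.20 to 0.30 with a continuation arrow on the left.
print(clip_to_view(0.10, 0.30, 0.20, 0.60))
```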

                4.)Practice Unzoom:

                        Z
                        U

                Each time you do this you should recover from memory the
                zoomed view you had once further in the past (up to a
                maximum of 20).  Notice on the menu where the Unzoom
                option is finally represented, that there is a "U" in
                parentheses.  This means that you actually did not need
                to invoke the menu explicitly, rather, "U" is a
                so-called "hot-key" and that you could have typed a "U"
                while viewing the screen when no menus of any kind
                obtrude into the viewing area and the Unzoom would have
                been implemented.

                5.)Practice Rezoom

                        Z
                        R

                This precisely undoes the action of the last Unzoom.
                "R" is the hot-key for this command.

                6.)Practice fullView zoom

                        Z
                        V

                This always shows the full view regardless of previous
                zoom history.  "V" is the hot-key.
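
                One way to picture how Unzoom, Rezoom, and fullView
                interact is as a pair of stacks of remembered views.
                The Python sketch below is only a model under our own
                assumptions: the class and method names are
                hypothetical, and we assume that fullView is recorded
                like any other zoom and that a new zoom discards the
                views available to Rezoom.  Only the limit of 20
                remembered views comes from the text above.

```python
class ZoomHistory:
    """Toy model of the Unzoom / Rezoom / fullView commands."""

    MAX_REMEMBERED = 20  # the guide states a maximum of 20 earlier views

    def __init__(self):
        self.view = (0.0, 1.0)   # current view, as fractions of the genome
        self._past = []          # views that Unzoom can return to
        self._undone = []        # views that Rezoom can return to

    def zoom(self, left, right):
        self._past.append(self.view)
        del self._past[:-self.MAX_REMEMBERED]  # forget the oldest views
        self._undone.clear()                   # assumption: Rezoom list is reset
        self.view = (left, right)

    def unzoom(self):
        if self._past:
            self._undone.append(self.view)
            self.view = self._past.pop()
        return self.view

    def rezoom(self):
        if self._undone:
            self._past.append(self.view)
            self.view = self._undone.pop()
        return self.view

    def full_view(self):
        self.zoom(0.0, 1.0)
        return self.view
```

                In this model, two zooms followed by one Unzoom and
                one Rezoom leave you in the second zoomed view again.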

                7.)Practice Mouse zoom

                        Z
                        Z
                        M

                Notice that when the item is selected, the cursor shape
                changes from an arrowhead to a cross...THIS IS HOW YOU
                CAN TELL IF A MOUSE EVENT IS AWAITED TO DEFINE A ZOOM
                WINDOW.  When the cross cursor is obtained, you can move
                the cursor to one side of the range which you want to
                view, click the mouse button once, move the cursor to
                the other side of the range which you want to view, and
                click the mouse once more.  "M" is the hot-key for this
                option.  ("M" is the most important hot-key to remember,
                by far).  IF YOU DO NOT HAVE A MOUSE:  You can still
                activate mouse functions (although in a somewhat
                cumbersome manner) by holding the <Shift> key while
                pressing the arrow keys for motion or holding the
                <Shift> key while hitting the <Enter> key instead of
                pressing the mouse button.

                8.)Practice panning:

                First zoom to full screen:

                        V

                Notice the little box with horizontal stripes in the
                menu bar at the top of your screen.  Watch this box
                while you zoom in to the range [0.20,0.60] using
                keyboard zoom as above:

                        Z
                        Z
                        K
                        0.20<Enter>
                        0.60<Enter>

                Notice that the bright stripes (which represent the
                fraction of the whole genome visible at present) have
                been narrowed, being partially blacked out on the left
                and on the right.  Notice that 20% of the left of the
                stripes is obscured and 40% of the right.  This protocol
                obtains no matter what method is used to invoke the
                zoomed view.

                Now pan to the right (hot-key <PageUp>):

                        Z
                        Z
                        R

                The field under view should jump over 25% of its width
                to the right.  Repeated <PageUp> hits will walk the
                length of the chromosome.

                Now pan to the left (hot-key <PageDown>):

                        Z
                        Z
                        L
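
                The arithmetic behind panning and the striped
                indicator box can be sketched in Python as below.
                This is an illustration only; the function names are
                hypothetical, and we assume that a pan is simply cut
                short when it reaches either end of the chromosome.

```python
def pan(view, direction, step=0.25):
    """Shift the view by `step` of its own width.

    direction is +1 to pan right and -1 to pan left.  The shift is
    limited so that the view never leaves the range [0.0, 1.0]
    (an assumption about behaviour at the ends).
    """
    left, right = view
    shift = direction * step * (right - left)
    shift = max(-left, min(shift, 1.0 - right))
    return (left + shift, right + shift)


def obscured_fractions(view):
    """Fractions of the indicator stripes blacked out on the left and right."""
    left, right = view
    return (left, 1.0 - right)


view = (0.20, 0.60)
print(obscured_fractions(view))  # 20% hidden on the left, 40% on the right
print(pan(view, +1))             # one quarter of the view width to the right
```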

                To exit PC-GenoGraphics, point at the large button in
                the upper right of the display with your mouse and click
                once or simply type the key-stroke <ESC> whenever no
                menu items are obtruding into your screen.














           CHAPTER 4 USING PC-GenoGraphics AS A DATA DISPLAY TOOL

           This chapter teaches you how to visualize the data-box which
        is attached to visual objects in the PC-GenoGraphics screen.
        This activity is fairly complicated and all users are
        recommended to go through this part of the tutorial.  After this
        chapter, you should be able to call up any data-box, to view all
        of the data which it contains, and to search for specific
        information within the data-box.

           The visual displays in PC-GenoGraphics are organized
        horizontally as copies of the genome under study.  Different
        data are organized into vertically distinct stripes running the
        width of the screen.  These horizontal stripes are segregated
        into "maps" each of which consists of some number of what we
        call "submaps".  Placed upon the length of each submap are
        various blobs (which we call "objects") that represent parts of
        the data.  The logical scheme is that objects which are similar
        are all present on a single map and distributed among enough
        submaps in that map that no two objects actually overlap
        (simply sharing edges is not an actual overlap).  THUS,
        EACH DATUM IS ASSOCIATED WITH AN OBJECT WHICH DEFINES THE
        LOCATION OF THE DATUM ALONG THE LENGTH OF THE CHROMOSOME, AND
        EACH OBJECT IS ASSIGNED TO SOME PARTICULAR MAP AND SUBMAP TO
        FACILITATE IDENTIFYING ITS SIGNIFICANCE AND TO ALLOW ITS VISUAL
        DISTINGUISHABILITY, RESPECTIVELY.
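
           To make the non-overlap rule concrete, here is a minimal
        sketch in Python of one way objects could be spread across
        submaps.  It is an assumption for illustration only: the
        function name and the greedy strategy are ours, and
        PC-GenoGraphics data files may assign submaps by some other
        means.  Objects are given as (name, left, right) in fractional
        genome units.

```python
def assign_submaps(objects):
    """Place each object on the first submap where it overlaps nothing.

    objects is a list of (name, left, right).  Two objects that merely
    share an edge are not considered to overlap.  Returns a dict that
    maps each object name to a submap index starting at 0.
    """
    submap_ends = []   # right edge of the last object placed on each submap
    assignment = {}
    for name, left, right in sorted(objects, key=lambda o: (o[1], o[2])):
        for index, end in enumerate(submap_ends):
            if end <= left:              # sharing an edge is allowed
                submap_ends[index] = right
                assignment[name] = index
                break
        else:
            submap_ends.append(right)    # open a new submap
            assignment[name] = len(submap_ends) - 1
    return assignment


example = [("a", 0.0, 0.3), ("b", 0.3, 0.5), ("c", 0.2, 0.4)]
print(assign_submaps(example))
```

           In the example, "a" and "b" share an edge and so can sit on
        the same submap, while "c" overlaps "a" and needs a second one.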

           Let us see how to recover the data associated with objects on
        our visual display.  We will again use AATEST:

                1.)Activate PC-GenoGraphics:

                        CD C:\GG<Enter>   (or whatever you actually used)
                        C:<Enter>
                        GG<Enter>

                At this point a screen with an open menu should appear.

                BELOW, KEYSTROKES ARE GIVEN, BUT YOU CAN (AND SHOULD) DO
                EXACTLY THE SAME THINGS WITH THE MOUSE (IF YOU HAVE ONE)
                BY POINTING AT THE LABELLED BUTTONS OR MENU ITEMS WHICH
                HAVE THE APPROPRIATE CAPITALIZED AND UNDERLINED LETTERS
                IN THEIR LABELS:

                2.)Choose the file AATEST to view:

                        <Enter>
                        G
                        N
                        G
                        V

                At this point a screen looking like Fig.8 should appear.
                Notice that there are three separate maps, labelled
                "DNA1", "RNA1", and "PEP".  The first two maps have one
                submap each while the third map has three submaps.

                3.)Now use the mouse (or the combination of <Shift> and
                arrow keys) to navigate the cursor to point at the
                medium sized, triangular object labelled m3o1 in the
                bottom left quadrant of your screen.  Once you have the
                cursor positioned on this object, press the mouse button
                (or the combination of <Shift> and <Enter>).  This will
                activate the data attached to that object to be
                displayed superimposed over the visual image of the
                maps, submaps, and objects.  The screen should look
                something like Fig.9.


                4.)Examine the head-line at the top of the new data-box:
                This contains the complete name and location of the
                selected object.  NOTICE THAT THE COORDINATES WHICH
                DEFINE THE POSITION OF THE VIEWED OBJECT ARE IN UNITS IN
                WHICH THE ENTIRE RANGE OF THE GENOME UNDER CONSIDERATION
                IS EXACTLY [0.00 , 1.00].

                5.)The large scrollable field of several text lines
                which dominates the data-box contains arbitrary
                information which is attached to the chosen object.
                Because of the wide range of possibilities, a number of
                controls are present to allow you to navigate these
                data.  The most elementary form of motion through the
                text in the data-box is by "grabbing" the slider handle
                (the light part) in the slider control to the right in
                the data-box, holding with mouse button down, and
                pulling the slider bar upwards or downwards on the
                screen by moving the mouse position.  The text will
                scroll in response to this action.  THIS IS THE MOST
                CUMBERSOME ACTIVITY IMPOSED UPON USERS LACKING A MOUSE
                because they must use the arrow keys to move the
                highlight to the top or bottom of the large text block;
                arrow presses which would take the highlight off screen
                then scroll the text block one line.

                From left to right along the bottom of the data-box are
                buttons:


                     Move   This allows the user to look through the
                            data-box and to reposition it if desired
                            to get visual access to the underlying
                            maps.  To return the data-box to
                            visibility, hit any mouse key or hit
                            <Enter>.  YOU CANNOT CONTINUE ANY ACTION
                            OTHER THAN MOVING THE DATA-BOX UNTIL IT
                            IS RE-MATERIALIZED IN THIS WAY.

                <<     >>   Notice that the text in the large block
                            is organized in a special hierarchy.
                            Each line in the display is either
                            left-justified or it has a leading blank
                            character.  Those lines which are
                            left-justified have, in general, some
                            number of leading ">" characters ranging
                            from zero to three.  All lines with
                            leading blanks are to be thought of as
                            continuation lines of the preceding
                            left-justified line.  The notion here is
                            that data with one or zero ">" characters
                            in their left-justified line are the most
                            salient and should always be visible.
                            Data with two such ">" characters are
                            intermediate in saliency while large
                            datasets (such as sequence) which are to
                            be viewed rarely have left-justified
                            lines with three ">" characters.  Notice
                            that as the data-box comes up, the "<"
                            button is initially "ghosted out".  This
                            means that the "<" button is not
                            pressable.  If you press the ">" button
                            once, however, data at the intermediate
                            level of saliency will be interleaved
                            into the display.  Hitting ">" again will
                            bring up the sequence data attached to
                            this object.  In general, ">" will bring
                            up more detailed versions of the data
                            while "<" will bring up more condensed
                            versions.

              pgUp   pgDn   At whatever ">" level you are viewing the
                            data in a data-box, it is possible that
                            there are too many data to be viewed as a
                            unified text even when the scrollbar to
                            the right of the text-block is used.  In
                            this case, the resulting text is
                            paginated, and you can move forward one
                            page with "D" and backwards one page with
                            "U".

                      Add   Discussion of this feature will be
                            deferred to Chapter 7.

                     Find   This button activates searching of the
                            data contained in the data-box under
                            view.  To demonstrate usage of "Find", we
                            first will get to the top line at the
                            highest level of saliency:

                                <
                                <
                                U
                                U
                                U

                            Now initiate the search:

                                F
                                ELVIS<Enter>


                            This will find all occurrences of the word
                            "ELVIS" present in the comment as viewed
                            at the most salient level.  Notice that
                            there are none, and that you are notified
                            of this lamentable fact by a briefly
                            appearing informative text box.  Next we
                            will drop down to the bottom level of
                            saliency and try again:

                                >
                                >
                                U
                                U
                                U
                                F
                                ELVIS<Enter>

                            Notice that this important peptide
                            feature does appear in our model dataset.
                            Notice also that the top line settles on
                            the first occurrence of "ELVIS" in the
                            dataset under investigation.  Here is how
                            to find the next occurrence (if any) of
                            "ELVIS":  Navigate the cursor to light up
                            the second line of the large display area
                            in the data-box, and click a mouse button
                            or <Enter>, then

                                F
                                ELVIS<Enter>

                            To find yet another occurrence of ELVIS,
                            repeat the above procedure, this time
                            allowing the search to extend across
                            multiple pages which would require you to
                            issue the command "D" for access:

                                F
                                ELVIS<Enter>
                                O

                            Notice that the last command orders the
                            search to span the pagination of the text
                            and the next occurrence of "ELVIS" would
                            be located, although there are no more in
                            this particular data-box.

                     dumP   You may want to preserve a copy of
                            certain data contained within a data-box.
                            This command saves the entire contents of
                            all pages of the present data text which
                            would be visible at the present level of
                            ">" and "<" setting.  You are queried for
                            the name of the destination file.

                                P
                                MORGAN.TXT<Enter>

                     Quit   This closes the data-box and allows you
                            to continue other activities on the
                            graphics viewing screen.  YOU CANNOT
                            PERFORM ANY ACTIVITIES OUTSIDE THOSE IN
                            THE OPEN DATA-BOX UNTIL YOU HAVE CLOSED
                            IT BY HITTING THE Quit BUTTON.

                                Q
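
                The saliency scheme described above for the "<" and
                ">" buttons can be sketched in Python as follows.
                This is our own illustration, not the program's code;
                the function name and the sample lines are
                hypothetical.  Level 1 shows header lines with zero or
                one ">" character, level 2 adds those with two, and
                level 3 adds those with three.  Continuation lines
                (leading blank) follow their header.

```python
def visible_lines(lines, level):
    """Return the data-box lines visible at a given saliency level (1 to 3)."""
    shown = []
    keep = False
    for line in lines:
        if not line.startswith(" "):
            # A left-justified header line: count its leading ">" characters.
            depth = len(line) - len(line.lstrip(">"))
            keep = max(depth, 1) <= level
        # Continuation lines inherit the decision made for their header.
        if keep:
            shown.append(line)
    return shown


sample = [
    ">GOLD",
    " a short, always visible note",
    ">>GOLD_PRICES",
    " an intermediate comment",
    ">>>SEQUENCE",
    " ACGTACGT",
]
for level in (1, 2, 3):
    print(level, visible_lines(sample, level))
```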













           CHAPTER 5 USING PC-GenoGraphics AS A DATA SEARCH TOOL

           This chapter teaches the user how to identify which
        PC-GenoGraphics objects are interesting on the basis of their
        names or the text contents of their data-boxes.  After
        completing this chapter, the user should know how to select sets
        of objects, how to tell which are selected, how to concentrate
        on selected objects, and how to search keyword-indexed text data
        attached to any objects.  The user will also know how to
        restrict the range of searches.  This is specialized material
        and is recommended for all users.

           So far, precious little we have done is at all specific to
        the actual data attached to the various objects in our display.
        In fact, a number of data searching tools are included in
        PC-GenoGraphics which allow considerable opportunities to probe
        the data and to generate visual displays in response to those
        queries.  This class of capabilities elevates PC-GenoGraphics
        above the rank of a mere visualization tool.


           Selecting Objects By Name

           Our most primitive class of data search is what we call
        "selecting" some class of objects.  This selection process
        elevates objects to a more visible status and makes them more
        readily addressable.  Let us start with the simplest class of
        object selection, selection by Name.  Our first exercise will be
        to select two objects by name.  Invoke PC-GenoGraphics, and
        select the file AATEST.ALL, then type:

                 S
                 N
                 m1o3<Enter>
                 m2o2<Enter>
                 <Enter>

           Notice that there are two lower-case "o" (not zero)
        characters in the above.

           Notice that the two selected objects blink on the screen.
        Notice also that, after the blinking has settled down, the
        colors of selected objects are inverted from what they would
        have been had they not been selected.  You can always get the
        selected objects visually to reveal themselves without the
        trouble of re-drawing the whole screen by typing the hot-key "B"
        whenever the viewing screen is unobstructed by menus or
        information boxes.  Note that every other time you hit the
        hot-key "B", the final state of the selected objects is reversed
        from the previous final state.  Of course, the file AATEST
        describes objects which are so large on the screen that their
        names are clearly legible, but more realistic biological systems
        do not maintain this convenience and selecting objects by name
        will point out the interesting regions of the chromosome for
        further investigation.


           Zooming on to Selected Objects

           Now, while two objects are selected, we can exploit their
        status and zoom in on them.  Issue the commands:

                Z
                Z
                S

           This zooms in to the smallest screen which will contain ALL
        of the selected objects.  Notice, of course, that the left and
        right ends of the screen both have selected objects butting up
        to them.  Now issue the commands

                Z
                Z
                N

           This zooms in so that the full width of the screen is spanned
        by the Next selected object (next in left to right order), m2o2.
        This valuable zoom action has hot-key "N".  Issuing this command
        sequence again:

                Z
                Z
                N

        will zoom in on object m1o3.  Similarly, the command sequence

                Z
                Z
                P

        zooms in so that the full width of the screen is spanned by
        the Previous selected object, m2o2.  Here, the hot-key is "P".

           This combination of selecting objects by name followed by
        ZZN, ZZP command sequences facilitates access to the data-boxes
        for objects whose name is known.
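
           The "smallest screen which will contain ALL of the selected
        objects" is simply the range from the leftmost left edge to the
        rightmost right edge.  A minimal Python sketch, with a
        hypothetical function name and made-up coordinates:

```python
def zoom_to_selected(selected):
    """Smallest view (left, right) containing every selected object.

    selected is a list of (left, right) extents in fractional genome
    units.  The coordinates used below are invented for illustration.
    """
    if not selected:
        raise ValueError("no objects are selected")
    left = min(obj_left for obj_left, _ in selected)
    right = max(obj_right for _, obj_right in selected)
    return (left, right)


print(zoom_to_selected([(0.50, 0.70), (0.10, 0.20)]))
```

           This is why, after ZZS, a selected object touches each end of
        the screen.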

           Unselecting Previously Selected Objects

           If you continue to select more objects, say, by issuing the
        commands

                S
                N
                m3o3<Enter>
                <Enter>

        the newly selected objects are simply appended to the
        previous list of selected objects.  If you wish to start over,
        de-selecting all previously selected objects, the commands are:

                S
                U

           Notice that issuing this "Unselect" command does not fold up
        its underlying menu, instead leaving it open to continue further
        selection activity.  If you simply wish to close this menu box
        and continue with other activities, the menu command is "Back"

                B

        THIS PROTOCOL IS USED ON ALL MENUS IN PC-GenoGraphics.

           Issue the following commands now, before continuing to the
        next step in this tutorial.

                B
                V
                S
                U
                B

           Selecting Objects by Contents

           Of course, we do not always know the names of objects which
        contain sought information, we usually know something else about
        the information.  This sort of search is what we call "selecting
        objects by contents".  This is facilitated by a two-level
        hierarchical index structure for the bulk of data attached to
        objects.  Recall from the description of the organization of the
        large text-block in a data-box that each line of the text-block
        is attached to a header line (the preceding left-justified
        line) which has some number of ">" characters at its head
        followed immediately by some arbitrary "keyword".  Let us start
        by searching the file AATEST for all objects which contain
        information under keyword GOLD and which refer to LONDON
        somewhere else in the chosen header line or its subsequent
        continuation lines.

                S
                C
                Gold<Enter>
                LONDON<Enter>
                A

           The last menu with three pushbuttons (All, Could, Must)
        defines the range of the genome to search for the desired
        object.  "All" means that the entire genome should be searched
        regardless of what is visible on the screen.  "Must" means to
        limit the search only to objects which are totally contained
        within the present screen image.  "Could" means to limit the
        search only to objects of which any part is visible on the
        present screen.  These directives are quite useful in providing
        PC-GenoGraphics intellectual guidance in speeding up its
        searching procedures.
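
           The three range directives amount to three different tests
        of an object's extent against the present screen.  The Python
        sketch below is an illustration under our own assumptions (the
        function name and the treatment of objects that only touch the
        screen edge are hypothetical):

```python
def in_search_range(obj, view, mode):
    """Decide whether an object is searched under All, Could, or Must.

    obj and view are (left, right) pairs in fractional genome units.
    """
    obj_left, obj_right = obj
    view_left, view_right = view
    if mode == "All":
        return True                      # search the entire genome
    if mode == "Must":
        # the object must lie totally within the present screen
        return view_left <= obj_left and obj_right <= view_right
    if mode == "Could":
        # any part of the object is visible on the present screen
        return obj_left < view_right and obj_right > view_left
    raise ValueError("mode must be 'All', 'Could', or 'Must'")


view = (0.20, 0.60)
for obj in [(0.10, 0.30), (0.30, 0.50), (0.70, 0.90)]:
    print(obj, [in_search_range(obj, view, m) for m in ("All", "Could", "Must")])
```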

           Notice that three objects are selected by this search, m1o2,
        m1o3, and m3o1.  They all contain news of gold prices in London.
        Point the cursor at m1o2, one of these selected objects, and
        click to reveal the underlying data-box.  You will notice the
        line ">GOLD" at the top level of saliency, but no mention of
        London is visible at this top level...this is not the source of
        our successful selection.  Hit ">" once, and a comment headed by
        the line ">>GOLD_PRICES" will appear; this is the line whose
        continuation lines actually refer to London, and this is the
        source of our successful selection.  Notice that the keyword
        found need not be a complete precise match to the keyword
        sought, but rather that the found keyword can have more letters
        following after a prefix which is a precise, case-insensitive,
        match to the sought keyword.  I.e.  ">>GOLDBUG" will provide a
        match to sought keywords "GoLd", "goldb", "G" , or "", for that
        matter, but it will NOT match sought keywords "GOLDBAG",
        "GOLDBUGS", or ">>GOLDBUG".  Likewise for the sought string:  We
        could find all objects with any keyword starting with "GOLD" by
        commands:

                S
                U
                C
                Gold<Enter>
                <Enter>
                A

        while we could find all objects which mention "GOLD" (and
        "Goldfarb", and "Rhinegold", etc. etc.)  QUITE REGARDLESS OF
        WHAT THEIR HEADLINE SAYS by commands:

                S
                U
                C
                <Enter>
                GOLD<Enter>
                A
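
           The two matching rules just described can be sketched in
        Python.  This is a minimal illustration, not the program's
        actual code:  the function names are invented here, and we
        assume (from the "Goldfarb" and "Rhinegold" examples) that the
        contents test is a case-insensitive substring test.

```python
def keyword_of(headline):
    """Strip the leading '>' saliency marks and return the keyword."""
    parts = headline.lstrip(">").split()
    return parts[0] if parts else ""


def keyword_matches(headline, sought):
    """Prefix rule: sought must be a case-insensitive prefix of the keyword."""
    return keyword_of(headline).upper().startswith(sought.upper())


def contents_match(text, sought):
    """Contents rule (assumed): sought may occur anywhere, ignoring case."""
    return sought.upper() in text.upper()
```

        With these definitions ">>GOLDBUG" is matched by "GoLd", "goldb",
        "G" and "", but not by "GOLDBAG", "GOLDBUGS" or ">>GOLDBUG".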

           If the data attached to objects are intelligently
        organized with keywords and regularized spelling, one can
        already attain considerable navigability in databases which
        describe genomes where exact sequence placement is not yet
        important.  A beautiful query to do with our distribution of the
        Human Genome data from GDB (files GDB or HUMAN, if you have
        them) is to visualize all data relevant to zinc-fingers (more
        precisely, relevant to zinc in any way) by the commands

                S
                C
                <Enter>
                zinc<Enter>
                A

           Another such query, this time against the E.coli database (in
        files ECOLI_R1 or ECOLI_H1, if you have them) is to locate all
        objects containing reference to phages:

                S
                C
                <Enter>
                phage<Enter>
                A

           The appropriate sites light up along the genome, and are
        accessible to further investigation using the ZZN and ZZP
        commands to track down the interesting objects, clicking the
        cursor on those objects to bring up their data-boxes, and using
        ">" and "F" commands automatically to scan the text-blocks for
        the relevant information.













           CHAPTER 6 SEARCHING SEQUENCE

           This chapter teaches the user how to search arbitrary objects
        for a wide range of possibly imprecise sequence patterns.  The
        user will learn an entire language to define such queries, which
        we call "punits".  The user will learn how to identify whether
        DNA, RNA, or peptides are to be searched, how to restrict the
        range of searches, and how to formulate queries.  This material
        will be new to all users.

           So far, there has been no mention of specializations in
        PC-GenoGraphics which reflect the genomic nature of the data.
        In fact, it is a general rule that "PC-GenoGraphics knows
        nothing about genomes".  In particular, PC-GenoGraphics has no
        explicit understanding of the concept of basepairs, or of genes,
        etc.  PC-GenoGraphics does understand keywords and their
        attached comment lines, and these more universal constructs must
        serve to convey the more specialized meanings.  The exception to
        this rule is for sequence data.  We have implemented versatile
        and efficient search mechanisms for sequence data and learning
        how to manipulate these tools will open to many users their
        highest level of exploitation of PC-GenoGraphics.

           Before getting down to techniques, let us examine sequence
        data attached to a data box:  To do this, load the file AATEST
        into PC-GenoGraphics and click the cursor on the object labelled
        "m2o1" which is a striped rounded rectangle in the left half of
        the screen inside the large box (map) labelled "RNA1".  The
        text-block should be viewed at the lowest level of saliency:

                >
                >

        and you should see a small block of RNA sequence displayed
        within the text-block...use the <Down-Arrow> or scrollbar if
        necessary.  Notice how this sequence is displayed:  It has a
        header line that is special.  The actual sequence is attached to
        a line which looks like this

                >>>X 985795

           The special character represented above as an X probably
        looks somewhat different on your screen.  The preceding line is
        also related to the sequence data, but it is more normal and has
        a higher level of saliency (we always like to have this be at
        the most salient level) and looks like this:

                >SEQ_RNA  start: 985795  end: 985984

           Of course this line is visible at all levels of saliency
        while the unusual line (and the actual sequence attached to it)
        is visible only at the lowest level of saliency.  Notice that
        the sequence contains the subsequence UUUUGUUCAG in its third
        block of ten basepairs.  Let us rediscover this subsequence with
        a simple sequence search.


           The Simplest Sequence Search

           Quit the data-box (command "Q") and invoke the commands:

                Q
                U
                Q
                UUUUGUUCAG<Enter>
                C
                A

           The last menu with three pushbuttons (All, Could, Must)
        defines the range of the genome to search for the desired
        object.  "All" means that the entire genome should be searched
        regardless of what is visible on the screen.  "Must" means to
        limit the search only to objects which are totally contained
        within the present screen image.  "Could" means to limit the
        search only to objects of which any part is visible on the
        present screen.  These directives are quite useful in providing
        PC-GenoGraphics intellectual guidance in speeding up its
        searching procedures.

           The next to last menu with two pushbuttons (Ok and Cancel)
        defines whether overlapping matches are regarded as distinct.
        For instance:  searching for AGA in the sequence GCAGAGA will
        yield only the first match if "Cancel" is issued, but will yield
        both if "Ok" is issued.
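
           The difference between the two settings can be reproduced
        with a few lines of Python.  This sketch handles exact matches
        only; the function name find_matches and the flag overlapping
        (True standing for "Ok", False for "Cancel") are assumptions of
        the example.

```python
def find_matches(pattern, sequence, overlapping):
    """Return the 0-based start positions of exact matches of pattern."""
    positions = []
    if not pattern:
        return positions
    start = 0
    while True:
        hit = sequence.find(pattern, start)
        if hit < 0:
            return positions
        positions.append(hit)
        # Resume one base later to allow overlaps,
        # or just past the match to forbid them.
        start = hit + 1 if overlapping else hit + len(pattern)
```

        For AGA in GCAGAGA this returns one position when overlaps are
        forbidden and two when they are allowed.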

           The above finds all occurrences of this subsequence on the
        forward strand of all DNA or RNA sequence entries in the file
        AATEST.  Notice the information box which appears monitoring
        progress of your search.  We will come to the proper use of this
        box later.  When the search is done, the information box will
        disappear, the screen will be re-drawn and a new gray bar
        running the height of the display will appear for each match.
        Needless to say, the left-to-right position of this bar
        corresponds to the location of the sought subsequence relative
        to the length of its containing object and that object's
        placement along the map to which it is assigned.  Although this
        subregion is not associated with any particular object
        previously present on the screen (notice in particular that the
        object m2o1 is not blinking), the gray stripe itself is a new
        object which is "selected".  Thus, for instance, the command to
        zoom to the next selected object:

                Z
                Z
                N

        will fill the screen at just the position of the matching
        subsequence and, of course, the whole screen will be grayed as
        well.


           More Complicated Searches for DNA/RNA Patterns

           The previous search was for a precisely known subsequence;
        rarely do we know precisely what we seek!  PC-GenoGraphics
        offers a wide variety of imprecise searching options.  These
        comprise a small language which is directly based upon that
        created by Searle:  Each DNA/RNA sequence query is structured as
        a set of text blobs (we call them "punits") separated by <Space>
        characters.  The notion is that a successful match for the whole
        query requires a consecutive set of subsequences each of which
        matches the consecutive punits of the query.  The simplest
        example above has only one punit which was "UUUUGUUCAG".  In
        general, we allow punits to be defined in several ways:

                1.)Explicit Sequence With Ambiguity Codes.  We would
                have matched the same subsequence as above (and possibly
                other sequences as well) if the query were for
                YYYYRYYYRR or UUNUGNUNAG.  Our complete list of RNA/DNA
                sequence ambiguity codes follows:

                                +-----+------+
                                |Code | Match|
                                +-----+------+
                                |   A | A    |
                                |   B |  CGT |
                                |   C |  C   |
                                |   D | A GT |
                                |   G |   G  |
                                |   H | AC T |
                                |   K |   GT |
                                |   M | AC   |
                                |   N | ACGT |
                                |   R | A G  |
                                |   S |  CG  |
                                |   T |    T |
                                |   U |    T |
                                |   V | ACG  |
                                |   W | A  T |
                                |   Y |  C T |
                                +-----+------+

                Notice that "U" and "T" are totally equivalent in our
                query language.
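
                The table above translates directly into a lookup.
                The Python sketch below is only an illustration of
                position-by-position matching; the names CODES and
                punit_matches are invented for this example.

```python
# Ambiguity codes from the table above; U is treated exactly like T.
CODES = {
    "A": "A", "B": "CGT", "C": "C", "D": "AGT", "G": "G", "H": "ACT",
    "K": "GT", "M": "AC", "N": "ACGT", "R": "AG", "S": "CG", "T": "T",
    "U": "T", "V": "ACG", "W": "AT", "Y": "CT",
}


def punit_matches(punit, target):
    """True if the concrete bases in target match punit base by base."""
    if len(punit) != len(target):
        return False
    for code, base in zip(punit.upper(), target.upper()):
        base = "T" if base == "U" else base
        if base not in CODES[code]:
            return False
    return True
```

                Both YYYYRYYYRR and UUNUGNUNAG match the subsequence
                UUUUGUUCAG under this test.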

                2.)Alternative Matching Possibilities (OR).  We can
                combine two alternative definitions of one punit by
                placing them into the construct

                        ( punitA | punitB )

                this is read "punitA OR punitB" and means that a match
                is acceptable if EITHER of the two criteria are met.
                With this construct it is possible (at reduced search
                speed) to recreate all of the ambiguity codes.  For
                instance ((A | C) | G) is equivalent to the ambiguity
                code V.

                Notice that the peptide codons are now uniquely
                representable using these techniques:

                          +-------+-------+-------------+
                          |Peptide|Abbrev.|RNA/DNA code |
                          +-------+-------+-------------+
                          |  Ala  |  A    | GCN         |
                          |  Arg  |  R    | (CGN | AGR) |
                          |  Asn  |  N    | AAY         |
                          |  Asp  |  D    | GAY         |
                          |  Cys  |  C    | TGY         |
                          |  Gln  |  Q    | CAR         |
                          |  Glu  |  E    | GAR         |
                          |  Gly  |  G    | GGN         |
                          |  His  |  H    | CAY         |
                          |  Ile  |  I    | ATH         |
                          |  Leu  |  L    | (TTR | CTN) |
                          |  Lys  |  K    | AAR         |
                          |  Met  |  M    | ATG         |
                          |  Phe  |  F    | TTY         |
                          |  Pro  |  P    | CCN         |
                          |  Ser  |  S    | (TCN | AGY) |
                          |  Str  |       | RTG         |
                          |  Ter  |       | (TAR | TGA) |
                          |  Thr  |  T    | ACN         |
                          |  Trp  |  W    | TGG         |
                          |  Tyr  |  Y    | TAY         |
                          |  Unk  |  X    | NNN         |
                          |  Val  |  V    | GTN         |
                          +-------+-------+-------------+
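
                Entries of this table can be checked by expanding a
                pattern into the concrete triplets it stands for.  The
                Python sketch below is an illustration only:  the
                function name expand is invented, and it handles a
                single level of "( ... | ... )", not nested ORs.

```python
from itertools import product

# Ambiguity codes as listed earlier in this chapter (U is read as T).
CODES = {
    "A": "A", "B": "CGT", "C": "C", "D": "AGT", "G": "G", "H": "ACT",
    "K": "GT", "M": "AC", "N": "ACGT", "R": "AG", "S": "CG", "T": "T",
    "U": "T", "V": "ACG", "W": "AT", "Y": "CT",
}


def expand(punit):
    """Expand 'GCN' or '(TTR | CTN)' into a set of concrete triplets."""
    alternatives = punit.replace("(", "").replace(")", "").split("|")
    codons = set()
    for alternative in alternatives:
        choices = [CODES[code] for code in alternative.strip().upper()]
        codons.update("".join(bases) for bases in product(*choices))
    return codons
```

                For instance, expand("(TTR | CTN)") yields the six
                leucine codons.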

                3.)Ellipses.  One frequently wants to allow some number
                of base positions to be "skipped over", i.e. to be
                matched regardless of their identity.  For instance, one
                often does not care about the identity of the
                nucleotides in the loop when searching for the classical
                sequence indicating a "hairpin" in RNA secondary
                structure.  Our query punit which does this is the
                ellipsis, such as:

                        4...16

                which is two non-negative integers separated by three
                periods; the first cannot be larger than the second.  An
                explicit use of an ellipsis is to search for certain
                hairpins, such as:

                        AAGCT 4...6 AGCTT

                Notice that this query has three punits, the middle of
                which is an ellipsis, and the outer two of which are
                reverse complements.  It will match any hairpin with the
                specified sequences on both sides of the ladder and any
                length of loop ranging from 4 to 6, inclusive.  Notice
                that the above query produces exactly the same result as
                the more complicated three punit query:

                        AAGCT ((NNNN | NNNNN) | NNNNNN) AGCTT
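
                Readers who know regular expressions may find it
                helpful to think of an ellipsis as a bounded wildcard.
                The Python sketch below translates a query made only
                of literal sequences and ellipses into a regular
                expression.  The function name query_to_regex is
                invented here, and the sketch says nothing about how
                PC-GenoGraphics performs the search internally.

```python
import re


def query_to_regex(query):
    """Translate literal punits and ellipses into a regular expression."""
    parts = []
    for punit in query.split():
        ellipsis = re.fullmatch(r"(\d+)\.\.\.(\d+)", punit)
        if ellipsis:
            low, high = int(ellipsis.group(1)), int(ellipsis.group(2))
            if low > high:
                raise ValueError("first integer cannot exceed the second")
            # Skip between low and high bases of any identity.
            parts.append("[ACGT]{%d,%d}" % (low, high))
        else:
            # Literal sequence; U and T are equivalent.
            parts.append(re.escape(punit.upper().replace("U", "T")))
    return "".join(parts)
```

                The hairpin query AAGCT 4...6 AGCTT becomes
                AAGCT[ACGT]{4,6}AGCTT, to be applied to a target written
                with T rather than U.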

                4.)Specified Limits on Mismatches, Inserts and Deletes.
                Any single punit can be matched imprecisely by
                specifying a maximum number of mismatches, insertions
                and deletions which are required to transform the target
                into a precise match to the specified subsequence.  Our
                language allows this to be specified by appending a
                bracketed addendum such as:

                        AGCTT[1,2,3]

                the order of the numerical arguments is

                        [#Mismatches , #Insertions , #Deletions]

                Virtually any target subsequence will match the
                preceding punit because the [1,2,3] condition is so
                sloppy compared to the length of the specified sequence
                AGCTT.  Examples of imprecise matches to AGCTT are given
                below:


                       |----------|---------|---------|
                       |Specified | [ , , ] | Matches |
                       |----------|---------|---------|
                       | AGCTT    | [1,0,0] | AGCTT   |
                       | AGCTT    | [1,0,0] | AGTTT   |
                       | AGCTT    | [1,0,0] | ACCTT   |
                       | AGCTT    | [1,0,0] | AGCCT   |
                       | AGCTT    | [1,0,0] | AGCAT   |
                       | AGCTT    | [0,1,0] | AGCTT   |
                       | AGCTT    | [0,1,0] | AGACTT  |
                       | AGCTT    | [0,1,0] | AGCTAT  |
                       | AGCTT    | [0,1,0] | AAGCTT  |
                       | AGCTT    | [0,0,1] | AGCTT   |
                       | AGCTT    | [0,0,1] | ACTT    |
                       | AGCTT    | [0,0,1] | AGTT    |
                       | AGCTT    | [0,0,1] | AGCT    |
                       |----------|---------|---------|

                Notice that the first and last nucleotides always are
                precisely matched and that the exact match always
                matches.
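
                The rows of the table can be reproduced with a small
                recursive test.  This Python sketch is our reading of
                the table, not the program's algorithm:  an insertion
                is taken to be an extra base in the target and a
                deletion a specified base missing from the target.
                The name fuzzy_match is invented, and the sketch does
                not separately enforce the observation about the first
                and last nucleotides.

```python
from functools import lru_cache


def fuzzy_match(pattern, target, mismatches, insertions, deletions):
    """True if target equals pattern within the given edit budgets."""

    @lru_cache(maxsize=None)
    def walk(i, j, mm, ins, dele):
        # i indexes the pattern, j the target; budgets count down.
        if i == len(pattern) and j == len(target):
            return True
        if i < len(pattern) and j < len(target):
            if pattern[i] == target[j]:
                if walk(i + 1, j + 1, mm, ins, dele):
                    return True
            elif mm > 0 and walk(i + 1, j + 1, mm - 1, ins, dele):
                return True
        # Extra base in the target (insertion).
        if j < len(target) and ins > 0 and walk(i, j + 1, mm, ins - 1, dele):
            return True
        # Specified base missing from the target (deletion).
        if i < len(pattern) and dele > 0 and walk(i + 1, j, mm, ins, dele - 1):
            return True
        return False

    return walk(0, 0, mismatches, insertions, deletions)
```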

                5.)Weighted Matching.  Especially when recognizing
                certain motifs, etc. it is convenient to allow quite
                general scoring algorithms to be implemented.  At each
                position, we need not require any specific nucleotide
                to be present, but can instead accumulate a score
                based upon what is there, and accept as a match any
                target pattern whose total exceeds a specified score.
                For a single position, the weight list

                        (20,40,10,30)

                accumulates a score of 20 for A, 40 for C, 10 for G, and
                30 for T, respectively.  A series of these scores is
                accumulated within curlies and is tested as follows:

                        {(20,40,10,30),(10,10,80,0),(22,28,43,7)}>60

                will be matched by NGN, CTA, CGC, etc.  It will not be
                matched by GTT, TAT, CAT, etc.
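
                The arithmetic behind these examples is easy to check.
                The Python sketch below sums the per-position weights
                in the order (A, C, G, T) and compares the total with
                the threshold.  The names weighted_score,
                weighted_match and WEIGHTS are invented for this
                illustration.

```python
BASES = "ACGT"

# The example query {(20,40,10,30),(10,10,80,0),(22,28,43,7)}>60.
WEIGHTS = [(20, 40, 10, 30), (10, 10, 80, 0), (22, 28, 43, 7)]


def weighted_score(weights, target):
    """Sum the scores; weights holds one (A, C, G, T) tuple per position."""
    if len(weights) != len(target):
        raise ValueError("need exactly one weight tuple per base")
    total = 0
    bases = target.upper().replace("U", "T")
    for position_weights, base in zip(weights, bases):
        total += position_weights[BASES.index(base)]
    return total


def weighted_match(weights, target, threshold):
    """Accept the target only if its score exceeds the threshold."""
    return weighted_score(weights, target) > threshold
```

                CTA scores 40 + 0 + 22 = 62 and is accepted, while CAT
                scores 40 + 10 + 7 = 57 and is rejected.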

                6.)Labelled Punits.  It is convenient to save re-typing
                sequence patterns by assigning labels to various punits.
                This is done in our language by the syntax

                        p6=

                where the 6 could be replaced by any non-negative
                integer.  The "p6" could be used at any point in later
                definitions within the query to stand for another copy
                of the punit to which it was attached with the "=".  For
                example:

                        p1=AAGCT GAG p1

                is precisely the same as

                        AAGCT GAG AAGCT

                or, of course

                        AAGCTGAGAAGCT

                7.)Reverse Complement Operator.  Not only can we use the
                labels on punits to call out for repeats of some
                previously specified punit, but also for the reverse
                complement of some previously specified punit.  This is
                particularly useful in specifying searches for
                substrings with likely secondary structure such as our
                first hairpin search above:

                        AAGCT 4...6 AGCTT

                which could have been written more compactly as:

                        p1=AAGCT 4...6 ~p1


                A more general use of this construct is to find ANY
                hairpin with ladder of length ranging from 10 to 12 and
                loop ranging from 4 to 8:

                        p1=10...12 4...8 ~p1
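
                For labels attached to literal sequences, both the
                repeat and the reverse complement amount to a textual
                substitution, sketched below in Python.  The names
                reverse_complement and expand_labels are invented for
                this illustration.  The sketch deliberately refuses a
                label attached to anything other than a literal
                sequence (such as the ellipsis in the last query above),
                since that case requires remembering what was actually
                matched.

```python
import re

# A<->T and C<->G; U is read as T, so its complement is A.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement of a concrete DNA/RNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def expand_labels(query):
    """Replace pN and ~pN by the literal punit labelled with pN=."""
    labels = {}
    out = []
    for punit in query.split():
        definition = re.fullmatch(r"(p\d+)=(\S+)", punit)
        reference = re.fullmatch(r"(~?)(p\d+)", punit)
        if definition:
            label, literal = definition.group(1), definition.group(2)
            if not literal.isalpha():
                raise ValueError("sketch handles literal sequences only")
            labels[label] = literal
            out.append(literal)
        elif reference:
            literal = labels[reference.group(2)]
            if reference.group(1):
                literal = reverse_complement(literal)
            out.append(literal)
        else:
            out.append(punit)
    return " ".join(out)
```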


           Searching Peptide Sequence

           In order to activate peptide searching mode, you will issue
        commands:

                Q
                T
                P

           Now all subsequent sequence queries will use peptide
        protocols.  These are a subset of the above DNA/RNA protocols,
        modified as follows:

                1.)Explicit Sequence With Ambiguity Codes.  All letter
                codes are now considered to be the standard peptide
                abbreviations and "1...1" becomes the only "wildcard"
                character:

                            +-------+-------+
                            |Peptide|Abbrev.|
                            +-------+-------+
                            |  Ala  |  A    |
                            |  Arg  |  R    |
                            |  Asn  |  N    |
                            |  Asp  |  D    |
                            |  Cys  |  C    |
                            |  Gln  |  Q    |
                            |  Glu  |  E    |
                            |  Gly  |  G    |
                            |  His  |  H    |
                            |  Ile  |  I    |
                            |  Leu  |  L    |
                            |  Lys  |  K    |
                            |  Met  |  M    |
                            |  Phe  |  F    |
                            |  Pro  |  P    |
                            |  Ser  |  S    |
                            |  Thr  |  T    |
                            |  Trp  |  W    |
                            |  Tyr  |  Y    |
                            |  Unk  | 1...1 |
                            |  Val  |  V    |
                            +-------+-------+

                2.)Alternative Matching Possibilities (OR).  We can
                combine two alternative definitions of one punit by
                placing them into the construct

                        ( punitA | punitB )

                this is read "punitA OR punitB" and means that a match
                is acceptable if EITHER of the two criteria are met.

                3.)Ellipses.  One frequently wants to allow some number
                of peptides to be "skipped over", i.e. to be matched
                regardless of what their identity is.  Our query punit
                which does this is the ellipsis such as:

                        4...16

                which is two non-negative integers separated by three
                periods; the first cannot be larger than the second.

                4.)Specified Limits on Mismatches, Inserts and Deletes.
                Not implemented for peptides.

                5.)Weighted Matching NOT YET IMPLEMENTED FOR PEPTIDES.

                6.)Labelled Punits---NOT YET IMPLEMENTED FOR PEPTIDES.
                It is convenient to save re-typing sequence patterns by
                assigning labels to various punits.  This is done in our
                language by the syntax

                        p6=

                where the 6 could be replaced by any non-negative
                integer.  The "p6" could be used at any point in later
                definitions within the query to stand for another copy
                of the punit to which it was attached with the "=".  For
                example:

                        p1=ELVIS GAG p1

                is precisely the same as

                        ELVIS GAG ELVIS

                or, of course

                        ELVISGAGELVIS

                7.)Reverse Complement Operator.  Not applicable to
                peptides.

           NOTICE THAT WHEN YOU WISH TO REVERT TO SEARCHING DNA/RNA
        SEQUENCES, YOU MUST TOGGLE:

                Q
                T
                D












           CHAPTER 7 USING PC-GenoGraphics AS AN INTERACTIVE LOGBOOK

           In this chapter, the user will learn how to attach
        annotations to objects in PC-GenoGraphics.  Also, some pithy
        hints on how to organize such annotations are provided.

           So far, the capabilities of PC-GenoGraphics which we have
        considered have been aimed at reasoning with "dead" datasets
        which apparently must have been provided in the original
        distribution of PC-GenoGraphics!  In fact, you are allowed
        considerable leeway to annotate and modify existing datasets
        and even to create your own totally new maps.  The
        first class of this interactive updating of the datasets is
        simple annotation of objects and implements what we consider to
        be an elementary interactive logbook.


           Update Files:  What they Mean and How to Use them

           Restart PC-GenoGraphics from the DOS command line, but this
        time we will open a new "update file" (called MYADDS.UPD in this
        example) to hold our additions to the distribution file AATEST.
        This is done by the following sequence of commands:


                CD C:\GG<Enter>     (or whatever area you actually used)
                C:<Enter>
                GG<Enter>
                <Enter>
                G
                A
                MYADDS<Enter>
                G

           The first thing to understand about an update file like
        MYADDS.UPD is that IT APPLIES ONLY TO THE DATAFILE *.ALL FOR
        WHICH IT WAS CREATED.  You will notice that, although we chose
        to create a totally new update file called MYADDS.UPD for this
        session, we might have considered continuing updates which were
        started by the authors of PC-GenoGraphics at Argonne National
        Laboratory.  Three such files are included on the distribution:
        MORGAN.UPD, RAY.UPD, and ROSS.UPD.  Had you tried to select
        ROSS.UPD or RAY.UPD, PC-GenoGraphics would have balked.  The
        reason is that ROSS.UPD updates the file ECOLI_H1 while RAY.UPD
        updates the file COLORS.  In the present tutorial, we have
        chosen to create a totally new update file, MYADDS.UPD rather
        than continuing our addenda onto the file MORGAN.UPD.  For
        reasons which will become apparent later, it is also not allowed
        for any new update file to have the same first name (AATEST.UPD
        in the present case) as the datafile, *.ALL, which it updates.

           With our update file open, let us now perform some simulated
        scholarship by appending annotations onto object m1o3.  Navigate
        the cursor to point at object m1o3, and click the cursor on it.
        This brings up the data-box attached to the object, and at the
        highest level of saliency there is only one comment line which
        is about gold.  Next, go down in saliency:

                >
                >

        to see the full text-block of a few lines.  Now to start
        adding annotation, we hit the "Add" button

                A

        and a new class of data-entry box will appear.  At this
        point, you can start entering any style of text material which
        you wish to addend to the text-block in the data-box attached to
        object m1o3.  A rude text editor with simple arrowkey
        navigation, etc. is enabled; the main limitation is that the
        total amount of information added before hitting "Ok" cannot
        exceed about 3000 characters.  Notice that a suggested first
        line containing a datestamp has been tentatively included into
        your update.  Rather than just entering text arbitrarily, WE
        STRONGLY RECOMMEND THAT YOU ADOPT THE FOLLOWING SORT OF PROTOCOL
        WHEN ANNOTATING FILES:

                1.)All users who wish to annotate a given file share the
                SAME *.UPD file.

                2.)The timestamp line is ALWAYS included in every
                update (even empty updates.)

                3.)The next line (after the timestamp line) contains an
                identification AT THE HIGHEST LEVEL OF SALIENCY of who
                is doing the update, like this:

                        >RAY_UPDATE

                or

                        >MORGAN_UPDATE

                and that subsequent lines obey the full ">" protocol to
                identify their saliency and contain keywords that start
                with the name of the scholar and include other key
                information afterwards:

                        >>RAY_GOLD_TACTICS
                         This price seems kind of low relative to
                         Singapore on the same day
                        >>RAY_GOLD_STRATEGY
                         The announcement of new supplies in the long
                         term from Africa, suggest a moderate long-term
                         downtrend.   See object m3o1 on map PEP for
                         further information.

                Notice how the continuation lines for each of the two
                headlines are indented with leading " " characters.
                Notice how each headline starts with a number of ">"
                characters to indicate the saliency of it and its
                continuation lines.  Notice how the keywords
                RAY_GOLD_TACTICS and RAY_GOLD_STRATEGY are chosen to
                facilitate search in an orderly way.

                4.)Bitter experience has shown us that larding the
                text-blocks in data-boxes with blank lines with the
                intention of aiding readability is actually quite
                COUNTERPRODUCTIVE.
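
           If you prepare annotation text outside PC-GenoGraphics
        before typing it in, a small helper can keep it in line with
        the protocol above.  The Python sketch below is purely
        illustrative:  the function name format_annotation and its
        arguments are invented, and it does not produce the timestamp
        line, which the program itself suggests.

```python
MAX_CHARS = 3000  # approximate per-update limit mentioned in the text


def format_annotation(scholar, notes):
    """Build annotation lines following the recommended protocol.

    scholar is a name such as 'RAY'; notes is a list of
    (keyword_suffix, [text lines]) pairs.
    """
    name = scholar.upper()
    lines = [">%s_UPDATE" % name]                 # who is updating
    for suffix, text_lines in notes:
        lines.append(">>%s_%s" % (name, suffix.upper()))
        for text in text_lines:
            if text.strip():                      # no blank lines
                lines.append(" " + text.strip())  # continuation line
    block = "\n".join(lines)
    if len(block) > MAX_CHARS:
        raise ValueError("annotation exceeds about 3000 characters")
    return block
```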

           Add some set of comments like the above to object m1o3.  When
        done, close the data entry box.  This requires using the mouse
        (or <Shift> and arrowkeys) to navigate to the "Ok" button and
        click a mouse button (or hit <Shift-Enter>).  The visual screen
        should return and now, when you click the mouse on object m1o3,
        you should see (when the saliency level has been chosen
        correctly) the information which you just addended apparently
        placed democratically at the end of the visual text-block.
        Further annotations of this kind will accumulate with their own
        date stamps, etc. right below the ones you have just added.

           In fact, in the present release of PC-GenoGraphics, the
        annotations addended as above are not eligible to be found by
        "Select Contents" commands.  This limitation will, no doubt, be
        overcome in later releases, but at present you will need to
        "re-compile" your updates to create a new *.ALL file in which
        your annotations are completely democratically included.  To
        perform this procedure, you need to exit PC-GenoGraphics,
        returning to the DOS prompt by hitting <Esc> when no windows
        obtrude on the graphic map display or by navigating the cursor
        into the box in the upper right corner and clicking the cursor.
        Compiling the corrections is a two-step process requiring
        invoking "GGALLZWD first_name_of_ALL_file
        first_name_of_UPD_file", followed by invoking "GGTRANS
        first_name_of_UPD_file".

           In our tutorial example this is precisely:

        GGALLZWD AATEST MYADDS<Enter>
        GGTRANS MYADDS<Enter>

           After this computation, which ranges from non-trivial to
        truly massive in size, two new files will emerge:  MYADDS.ZWD
        from the first step, and MYADDS.ALL from the second.  ONE THING
        TO KEEP IN MIND IS THAT THE LARGE COMPUTATIONS IMPLIED BY THIS
        ACTIVITY REQUIRE TRULY SIGNIFICANT AMOUNTS OF DISK SPACE.  You
        can erase the no longer needed intermediate file MYADDS.ZWD
        without any loss at this point if disk space is a problem.

           The next invocation of GG will contain a new file to select
        from on just the same footing as our original AATEST, namely,
        MYADDS.  Invoke GG now, and at the presentation of the input
        file name menu, navigate the cursor to point at the name
        MYADDS.ALL in the scrollable list, and click the cursor.  The
        name MYADDS.ALL should appear in the small box at the top,
        which indicates that it will be loaded when the "Go" button is
        pressed.  Before the visual display comes up, another menu is
        offered, this time allowing you to select one of the *.UPD
        update files from which to draw addenda.  You will see
        MYADDS.UPD among this list.  Of course, that is not even an
        eligible selection at this point because MYADDS.UPD applies
        updates to AATEST.ALL, and you have selected the file
        MYADDS.ALL to display.  In fact no *.UPD is at present
        eligible, although you could always open a totally new update
        file (using the button "Anew") for upcoming annotations to
        MYADDS.ALL.

           Whether or not you choose to open another update file, the
        new comments which we appended above are now accessible by
        Select on Contents just like any other comment.  Thus the
        commands:

                S
                C
                RAY_GOLD<Enter>
                <Enter>
                A

        will select the object m1o3, and if you open its data-box,
        the "Find" button will function like normal for the new
        comments:

                >
                >
                F
                AFRICA<Enter>

        will locate the strategic addendum entered above.














           CHAPTER 8 USING PC-GenoGraphics AS A VISUAL REASONING TOOL
        FOR SEQUENCE

           In this chapter, the user will learn how to create new maps
        based on the results of sequence queries.  This may be the most
        elaborate capability of PC-GenoGraphics and will require study
        from any user who cares to use it.

           In Chapter 6 we discussed how to search for subsequences
        within the sequence data attached to objects' data-boxes.  At
        that point these searches seemed to be rather dead-end.  The
        vertical stripes at the positions of the sequence matches grayed
        out the correct part of the screen, but (apart from being able
        to zoom to the next selected object, etc.) no further recourse
        to these sequence matches was evident in that presentation.  In
        this chapter we will learn how to promote those sequence matches
        to fully viewable and queryable objects on a new map.

           This method of computing new maps has been rather well
        automated, but you will notice that it demands considerable
        resources from your PC.  If you intend to do much of this
        with our larger datasets you will probably need a fast 386 or
        486 machine with much disk space free in a single partition
        (about 30 MB for E.coli and about 150 MB for the whole human
        genome).  On the other hand, working fluently with viruses and
        plasmids ought to be possible on pretty much any platform.

           The procedure is simple enough.  Let us re-load AATEST:

                1.)Re-load AATEST with a new update file:

                        CD C:\GG<Enter>           (or whatever)
                        C:<Enter>
                        GG<Enter>
                        <Enter>
                        G
                        A
                        ABTEST<Enter>
                        G

                At this point a screen looking like Fig. 8 should appear.

                2.)Open a new map to take the computed locations:

                        Q
                        M
                        <Enter>
                        <Enter>

                3.)Perform a sequence query:

                        Q
                        Q
                        UUUUGUUCAG<Enter>
                        C
                        A

                At this point a couple of vertical gray stripes should
                appear.

                4.)Save these positions into ABTEST.UPD and exit
                PC-GenoGraphics:

                        Q
                        S
                        <Enter>
                        <Enter>
                        <Enter>
                        <Enter>
                        <Esc>

                At this point we have included the description of a
                totally new map containing two new objects which are the
                two sequence matches.  The other information entered
                above will be included strategically into the new map as
                well.

                5.)Create the new files ABTEST.ZWD and ABTEST.ALL

                        GGALLZWD AATEST ABTEST<Enter>
                        GGTRANS ABTEST<Enter>

                6.)And, if desired, erase the unnecessary file:

                        ERASE ABTEST.ZWD<Enter>

           Now when you activate PC-GenoGraphics, you will be offered a
        new file to view, ABTEST.ALL.  This file will contain the new
        map.













           CHAPTER 9 WORKING REPETITIVE PROCEDURES FROM SCRIPT FILES

           In this chapter, the user will learn how to invoke
        pre-prepared sets of queries or any other class of GenoGraphics
        commands, for that matter.  In addition, explicit examples will
        be given for the most frequent application of this capability,
        namely, launching large batteries of sequence queries and saving
        their results.

           In many cases, the user may wish to perform a long set of
        PC-GenoGraphics commands which require a good deal of computation
        time and which accumulate their consequences in a way that the
        user need not be present except to issue the commands.  An ideal
        example of this would be for users who might generate various
        pieces of new sequence data and then want to perform the same
        set of sequence queries to each one of these.  The first step,
        of course, would be to cast the new data into a form suitable
        for use by PC-GenoGraphics (See Chapters 11 and 12).  Then one
        could, in principle, memorize the long list of queries and enter
        each one successively and have PC-GenoGraphics show the results
        interactively.  But, in fact, this job is best done by a script
        of PC-GenoGraphics commands stored once in a file.  This chapter
        will teach you to use such files and to make new ones of your
        own.

           Let us consider a precise set of commands which creates a
        new map called "MINE" in the currently active update file,
        then searches the whole genome for all occurrences of the DNA
        subsequence "GAATCGAATC" on either strand, without overlaps,
        saves the answers as blue arrows on two different submaps of
        the presently open map in the active update file, and finally
        closes that open map.  The reverse strand is covered by
        searching for the reverse complement, "GATTCGATTC".  This takes
        the commands:

                Q
                T
                D
                B
                Q
                M
                MINE<Enter>
                My GAATC query, like my advisor taught me.<Enter>
                Q
                Q
                GAATCGAATC<Enter>
                C
                A
                Q
                S
                rightarrow<Enter>
                solid<Enter>
                blue<Enter>
                FWD<Enter>
                Q
                Q
                GATTCGATTC<Enter>
                C
                A
                Q
                S
                leftarrow<Enter>
                solid<Enter>
                blue<Enter>
                REV<Enter>
                Q
                E


           This whole procedure is probably within reason for human
        entry once or twice, but if you want to do this a thousand
        times, you should prepare a file, *.INP, perhaps MINE.INP in
        this case, with your word-processor which contains the
        following:

Q
T
D
B
Q
M
"MINE"
"My GAATC query, like my advisor taught me."
Q
Q
"GAATCGAATC"
C
A
Q
S
"rightarrow"
"solid"
"blue"
"FWD"
Q
Q
"GATTCGATTC"
C
A
Q
S
"leftarrow"
"solid"
"blue"
"REV"
Q
E

           Now, whenever you want to do this query, you issue:

                F
                I
                MINE.INP<Enter>

        and your PC will writhe around doing your glorious query for
        as long as it takes.  One query like the above takes a minute or
        two on a modern PC, maybe ten minutes or more on an old one.  We
        have provided a few sample queries in our distribution.
        Thousands of such queries have been compiled by David Ghosh of
        NCBI and translated by ourselves and are available separately
        from us.  A thousand queries is an overnight job, but the whole
        task runs unattended.
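
           For users who would rather generate such script files than
        type them, the following is a minimal sketch in Python (which
        is not part of PC-GenoGraphics).  It assumes the *.INP layout
        is exactly that of the MINE.INP listing above; the function
        names are our own invention, and the CR/LF line endings are an
        assumption about what a DOS program expects.

```python
# Hypothetical helper, not part of the PC-GenoGraphics distribution.
# It reproduces the MINE.INP listing for an arbitrary DNA pattern.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def make_inp_lines(map_name, description, seq, color="blue"):
    """Build the lines of a script file modeled on MINE.INP."""
    lines = ["Q", "T", "D", "B"]
    # Open a new map with a short name and a one-line description.
    lines += ["Q", "M", f'"{map_name}"', f'"{description}"']
    searches = [
        (seq.upper(), "rightarrow", "FWD"),
        (reverse_complement(seq), "leftarrow", "REV"),
    ]
    for pattern, shape, submap in searches:
        # Run the sequence query, then save the matches into a submap.
        lines += ["Q", "Q", f'"{pattern}"', "C", "A"]
        lines += ["Q", "S", f'"{shape}"', '"solid"', f'"{color}"', f'"{submap}"']
    # Close the open map.
    lines += ["Q", "E"]
    return lines


def write_inp(path, lines):
    """Write the script with CR/LF line endings (assumed, for DOS)."""
    with open(path, "w", newline="\r\n") as handle:
        handle.write("\n".join(lines) + "\n")
```

           Calling make_inp_lines("MINE", "My GAATC query, like my
        advisor taught me.", "GAATCGAATC") yields the 32 lines listed
        above, and write_inp("MINE.INP", ...) stores them on disk.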













           CHAPTER 10 PRINTING THINGS AND DOING OTHER THINGS

           In this chapter we will learn how to make hardcopy of
        PC-GenoGraphics displays and how temporarily to exit
        PC-GenoGraphics to do simple DOS tasks like getting directory
        lists, printing documents, viewing or deleting files, etc.

           If your PC has a printer attached you may be able to get
        hardcopy images of PC-GenoGraphics screens.  We support only the
        two most common standards of printers, namely the Epson standard
        for dot-matrix printers and the HP-LJ2 standard for laser
        printers.  These two standards cover most but not all PC
        printers.  We regret that supporting other printers is not
        cost-effective.  As of this writing, the street price for an
        excellent laser printer is about $600.  When you have a
        PC-GenoGraphics screen which you want to save, the commands are
        in the "Files" menubar:

                F
                P

           At this point, you will be offered a large pushbutton panel
        to describe your printer.  IF YOU DESCRIBE IT INACCURATELY AND
        CONTINUE THE PRINT ACTIVITY, IT IS POSSIBLE TO HANG YOUR PC,
        REQUIRING A RE-BOOT.  You can exit this menu without printing by
        hitting <Esc>.  You need to know:

                1.)Whether your printer is attached to a serial port or
                to a parallel port:  The parallel port connector to your
                PC is nearly two inches wide, while the serial port
                connector is about 1 inch wide.

                2.)Which port your printer is connected to.  LPT1 is
                the most common for parallel ports; COM1 or COM2 are
                the most common for serial ports.

                3.)How large you want the image to be.  For Epson
                printers "Low" is the largest, and "Hi" is the smallest.
                For HP-LJ2 printers, "75" is the largest and "300" is
                the smallest.  The actual size of your image is
                determined by a combination of these choices and the
                resolution of your screen.  Try the smallest sizes first
                and escalate while your images fit on a single sheet.

                4.)The orientation of the image on the paper.  Portrait
                mode places the image so that it would appear properly
                oriented in a normal book page.  Landscape mode is
                sideways.

           When these selections have been made, hit the button "Print"
        and the screen should re-paint (in monochrome) and the data are
        transferred to your printer.  This process can take a few
        minutes.  Completion is announced by your screen returning to
        normal.

           The above is the only way PC-GenoGraphics supports hardcopy
        of its graphic screen images, and it works only when no menus
        or information boxes obtrude into the image.  If you wish, for
        instance, to get a hardcopy of the contents of a data-box, the
        above procedure is no use.  This sort of function is done with
        the DOS shell
        command in PC-GenoGraphics.  First you must save the desired
        text-block with "dumP" while the data-box is open; you will be
        asked to name the destination file, say you called it
        MYFILE.TXT.  Then quit the data-box ("Quit") and invoke the DOS
        shell and issue the print command:

                F
                S
                PRINT MYFILE.TXT<Enter>

           IT IS ONLY FAIR TO WARN YOU THAT THE PERFORMANCE OF THIS DOS
        SHELL IS SOMEWHAT TRICKY AND MANY POSSIBILITIES WILL HANG YOUR
        MACHINE, REQUIRING A RE-BOOT.  For instance, the above PRINT
        command will probably hang your machine unless you have enabled
        your printer by issuing the command:

                PRINT<Enter>

        to the DOS prompt BEFORE activating PC-GenoGraphics.  The
        general rule in using the DOS shell from PC-GenoGraphics is to
        learn a repertoire of simple tasks that work, and if losing your
        session would be unacceptable, AVOID USING THE DOS SHELL
        altogether.  It is always possible totally to exit
        PC-GenoGraphics, do your DOS task and re-activate
        PC-GenoGraphics afterwards.

           Last, but not least, is the question of setting the text
        labels on the PC-GenoGraphics graphics screen to their most
        legible size.  PC-GenoGraphics allows three different fonts
        (Small, Medium, and Large) to be substituted.  Obviously, using
        too large a font will make some of the labels so big that they
        are unable to fit within their natural location so that they are
        omitted altogether from the display.  Also, some PCs may have
        too little memory available to hold the larger font sizes.  To
        select the medium size font:

                S
                F
                M

           If too much memory is required, PC-GenoGraphics will notify
        you and switch back to the smallest size.













           CHAPTER 11 THE EASY WAY TO START YOUR OWN *.ALL FILE

           If you are in the fortunate position of creating or curating
        completely sequenced objects, we have shortcut the process of
        installing your data into PC-GenoGraphics.  You first prepare
        your sequence data into some file, say MYSEQ.TXT, with a
        word-processor in the format

-----------------------------------------------------------(start)
Arbitrary line of information
ACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTAC
ACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTAC
ACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTAC
ACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTAC
ACGTACGTACACGTACGTACACGTACGTACACGTACGTACACGTACGTACA
-------------------------------------------------------------(end)

           Notice, 60 characters of sequence per line.  Notice, one and
        only one information line at the top.  Notice, NOTHING ELSE
        ALLOWED.  Now issue the DOS command:

                GGTXTZWD MYSEQ.TXT

        and answer the questions about what type of sequence and its
        strand.  Now you will have a new file, MYSEQ.ZWD which is
        further processed with the DOS command:

                GGTRANS MYSEQ

        which produces the new file MYSEQ.ALL, which is viewable and
        queryable on an equal footing with any other *.ALL file.
        Usually the next thing to do would be to compute some site maps
        with a set of canned query files as in Chapter 9.
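
           If your sequence lives in some other program, a short
        script can lay it out in the format above.  The following is a
        minimal sketch in Python (not part of PC-GenoGraphics); the
        function name is hypothetical, and the default width simply
        follows the 60-characters-per-line rule stated above.

```python
# Hypothetical helper, not part of the PC-GenoGraphics distribution.

def format_sequence_file(info_line, sequence, width=60):
    """Return the text of an input file: one information line, then sequence rows."""
    if "\n" in info_line or "\r" in info_line:
        raise ValueError("the information line must be a single line")
    # Remove any blanks or line breaks already present in the sequence.
    cleaned = "".join(sequence.split())
    rows = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([info_line] + rows) + "\n"
```

           The returned text can be written to MYSEQ.TXT and then
        handed to GGTXTZWD as described above.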














           CHAPTER 12 GORY DETAILS ABOUT *.ZWD FILES AND *.ALL FILES AND
                              *.UPD FILES

           In this chapter, the user will learn the maximum capabilities
        to manipulate and create totally new or customized maps for
        PC-GenoGraphics.  Very few users will have need for this
        material.

           So far, we have only considered the sorts of operations which
        can be performed starting with some *.ALL file which is included
        into your distribution.  This is not a limitation of
        PC-GenoGraphics.  In fact, you can create arbitrary maps,
        submaps, and objects of your own, either to add into the
        distributed *.ALL files, or else to create your own totally
        independent *.ALL files.  This chapter will outline how to do
        arbitrary compilations which are fully visualizable,
        annotatable, and queryable using PC-GenoGraphics.  This aspect
        of PC-GenoGraphics use is what we call curatorship, and you will
        find that distributing your compiled *.ALL files (together with
        PC-GenoGraphics) allows your correspondents an unparalleled access
        to your intellectual work.

           The structure which we have seen previously has concentrated
        upon the *.ALL file (AATEST, for example), perhaps as updated by
        one of its corresponding *.UPD files (MYADDS.UPD, for example).
        Recall that, if fully democratic querying and display capability
        is required, the combination can be promoted to a new *.ALL file
        by DOS commands as follows:

                GGALLZWD AATEST MYADDS<Enter>
                GGTRANS MYADDS<Enter>

           Recall, further, that the first line above creates an
        intermediate file (MYADDS.ZWD in this example) and that the
        second line creates the final file (MYADDS.ALL in this example).
        In fact, you could have done the above procedure even though no
        *.UPD file exists (you are asked to confirm that this is what
        you want to do first).  Let us do this for AATEST:

                GGALLZWD AATEST BBTEST<Enter>
                Y<Enter>

           This creates the intermediate file (BBTEST.ZWD in this
        example) which corresponds precisely to AATEST.ALL without any
        modifications at all.  Of course one could, at this point, issue
        the GGTRANS BBTEST command and recover a new file, BBTEST.ALL,
        which contains precisely the same information as AATEST.ALL.  In
        fact, what we wish to do is to examine the "intermediate" file,
        BBTEST.ZWD.  If possible, print out a copy of BBTEST.ZWD (6
        pages total) to facilitate this part of our tutorial.

           Describing Objects:

           Skip down to the 10th and 11th lines, a somewhat inauspicious
        looking business:

m1o1 DNA1|s1o1 1 12.000000 44.000000 rightarrow solid cerise
0000000093

           This specifies the visual aspects of the object, labelled
        "m1o1" which is the furthest to the upper left in the dataset
        specified by BBTEST.ZWD, i.e. the right-pointing triangle in
        Fig. 8.  This line breaks into 8 strings of characters (separated
        by spaces) on the first line and one more string on the next line.
        The first line specifies the visual aspect of this object:

        m1o1         The name which appears on the object when space allows.
        DNA1|s1o1    Another name used only for linking connections to it.
        1            Which submap does this object lie in (NO DECIMAL POINT).
        12.000000    Left coordinate of the object.
        44.000000    Right coordinate.
        rightarrow   Object shape.
        solid        Object fill pattern.
        cerise       Object color.


           Clearly we need to understand what this submap number means
        and what coordinate system is used, but the rest is clear.  The
        second line specifies how many lines below define the data-box
        attached to this object:

        0000000093   93 lines of data follow (NO DECIMAL POINT).

           Needless to say, the next 93 lines (which are dominated by a
        long DNA sequence) define what comes up in this object's
        data-box.  After these lines, the next object description
        starts:

m1o2 DNA1|s1o2 1 44.000000 64.000000 ldee solid blue
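
           To make the field layout concrete, here is a minimal sketch
        in Python (not part of PC-GenoGraphics) which splits the two
        header lines of an object into named fields.  The field names
        are our own, and we assume that none of the strings contains an
        embedded space.

```python
# Hypothetical reader for the two header lines of an object in a *.ZWD file.
from typing import NamedTuple


class ObjectHeader(NamedTuple):
    label: str        # name shown on the object when space allows
    link_name: str    # name used only for linking connections
    submap: int       # submap number (no decimal point)
    left: float       # left coordinate
    right: float      # right coordinate
    shape: str
    fill: str
    color: str
    data_lines: int   # number of data-box lines that follow


def parse_object_header(first_line, second_line):
    """Split the 8 strings of the first line and the line count of the second."""
    fields = first_line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 strings, found {len(fields)}")
    label, link_name, submap, left, right, shape, fill, color = fields
    count = second_line.strip()
    if not submap.isdigit() or not count.isdigit():
        raise ValueError("submap number and line count must be plain integers")
    return ObjectHeader(label, link_name, int(submap), float(left),
                        float(right), shape, fill, color, int(count))
```

           Applied to the two lines shown for m1o1 above, this gives a
        submap number of 1, coordinates 12.0 and 44.0, and a count of
        93 data-box lines.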

           Let us examine the structure of the 93 lines attached to the
        first object:

----------------------------------------------------------(start)
>GOLD
>ACCESSION M69872
ref: EMBO J. 137:183-193(1998)
 Notice obvious Typographical error here
>SEQ_DNA start: 982632 length: 4246
 000982682 GATCCCTCGTTCCGTCTTGTCGGAACTGGATATGATGGTCGGGAAAATCC
 000982732 TCTGTTATCTCTATCTACGCCGGAACGGCTGGCGAATGAGGGGATTTTCA
 000982782 CCCAGCAGGAACTGTACGACGAACTGCTCACCCTGGCCGATGAAGCAAAA
 :
 000986832 GAAACTGGCGATCAGTTCCCGCAGCGTGGCGAACATCATTCGCAAAACCA
 000986882 TTCAGCGCGAGCAGAACCGTATCCGTATGCTCAACCAGGGGTTGCA
>>GOLD_PRICES
   Selected world gold prices, Monday:
   Hong Kong late: $355.65, off $1.50.
----------------------------------------------------------(end)

           The first four lines are textual data.  Notice that they are
        arranged into three logical comment lines (as discussed in
        CHAPTER 4) with successive keywords "GOLD","ACCESSION", and
        "ref:".  Notice that the last line is a continuation of the
        logical comment line headed by the keyword "ref:".  Notice that
        all three lines are at the same level of saliency (the highest
        level).  In general, textual data are quite unrestricted in
        format and content with the following limitations:

                1.)  Lines are no more than 64 characters wide.

                2.)  Only honest text characters are allowed:

                       A-Z
                       a-z
                       0-9
                       ~!@#$%^&*()_+{}|:"<>?`-=[]\;',./
                       <Space>

                In particular <Tab>, <Enter>, <Ctrl>, and <Alt>
                characters are NOT allowed.

                3.)  New logical comment lines are headed by a
                left-justified line with some number (zero to three,
                inclusive) of ">" characters identifying the saliency,
                followed by the keyword and then, optionally, a <Space>
                character and any desired legal text characters.

                4.)  Continuation lines within a logical comment line
                are always headed by at least one <Space> character.
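
           These restrictions are easy to check mechanically before
        running GGTRANS.  The following is a minimal sketch in Python
        (not part of PC-GenoGraphics); the function names are
        hypothetical, and the checks cover only the four rules listed
        above.

```python
# Hypothetical checker for data-box text lines in a *.ZWD file.
import string

# The "honest text characters" listed above, plus <Space>.
ALLOWED = set(string.ascii_letters + string.digits
              + "~!@#$%^&*()_+{}|:\"<>?`-=[]\\;',./ ")
MAX_WIDTH = 64


def check_text_line(line):
    """Return a list of problems found in one line (empty if none)."""
    problems = []
    if len(line) > MAX_WIDTH:
        problems.append(f"line is {len(line)} characters wide (limit {MAX_WIDTH})")
    bad = sorted({ch for ch in line if ch not in ALLOWED})
    if bad:
        problems.append("illegal characters: " + ", ".join(repr(ch) for ch in bad))
    return problems


def saliency_marks(line):
    """Return the number of leading '>' marks, or None for a continuation line."""
    if line.startswith(" "):
        return None
    marks = len(line) - len(line.lstrip(">"))
    if marks > 3:
        raise ValueError("at most three '>' characters are allowed")
    return marks
```

           For example, check_text_line applied to a line containing a
        <Tab> character reports one problem, and saliency_marks
        distinguishes headlines from continuation lines.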

           If you obey these restrictions and choose your keywords and
        saliency levels intelligently, your data will be highly
        accessible to any interested scientist.  Sequence data are much
        more restrictive in format:

>SEQ_DNA  start: 982632    length: 4246

           Any given sequence fragment must lie within a single logical
        comment line with a header having one of the following keywords:

            CCW__DNA
            CCW__PEP
            CCW__RNA
            CCW_DNA
            CCW_PEP
            CCW_RNA
            CW__DNA
            CW__PEP
            CW__RNA
            CW_DNA
            CW_PEP
            CW_RNA
            FWD_DNA
            FWD_PEP
            FWD_RNA
            REV_DNA
            REV_PEP
            REV_RNA
            SEQ_DNA
            SEQ_PEP
            SEQ_RNA

           These define the sequence type (DNA, RNA, or peptide) and its
        strand or its direction of expression.  As usual, a <Space>
        character following the keyword can be followed (within the same
        line) by arbitrary text.  The very next line must be of very
        precisely determined format:  It MUST be led by a <Space>
        character.  This MUST be followed by a (multi-digit) integer (NO
        DECIMAL POINT) which identifies the sequence position of the
        FIRST element of sequence...if the sequence is "unplaced", this
        value MUST be -1.  This integer MUST be followed by one or two
        <Space> characters.  Then must follow EXACTLY 50 characters of
        sequence; NO <Space> characters, etc. are allowed within it.  NO
        OTHER INFORMATION can follow at the end of a line.  This same
        format is followed for each subsequent line of sequence data
        until the last one which, of course, need not have the full 50
        characters of sequence.  NO BLANK LINES can intervene.  After
        the last line
        of sequence, the next line must be the headline for a new
        logical comment line or else it must start the description of
        the next object...NO BLANK LINES can intervene.
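
           Because this layout is so unforgiving, it may be worth
        checking a block of sequence lines before handing the file to
        GGTRANS.  The following is a minimal sketch in Python (not
        part of PC-GenoGraphics); the function name is hypothetical,
        and the pattern encodes only the rules stated above.

```python
# Hypothetical checker for the sequence lines under a keyword such as SEQ_DNA.
import re

# One <Space>, an integer (or -1 if unplaced), one or two <Space>s, residues.
SEQ_LINE = re.compile(r" (-1|\d+) {1,2}(\S+)")


def check_sequence_lines(lines, width=50):
    """Return a list of problems found in a block of sequence lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        match = SEQ_LINE.fullmatch(line)
        if match is None:
            problems.append(f"line {number}: does not match the required layout")
            continue
        count = len(match.group(2))
        is_last = number == len(lines)
        # Every line but the last must carry exactly `width` residues.
        if count > width or (count < width and not is_last):
            problems.append(f"line {number}: {count} residues, expected {width}")
    return problems
```

           An empty list means that no problem was found.  A blank
        line, a missing leading <Space>, or trailing text after the
        sequence each produce a layout complaint.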

           Last, but not least, the line count (93 lines of data in this
        example), MUST be correct!

           Notice that this sequence data entry is followed, in our
        example, by another logical comment line (with two continuation
        lines)

>>GOLD_PRICES
   Selected world gold prices, Monday:
   Hong Kong late: $355.65, off $1.50.



           Describing Maps and Submaps:

           Now scan upwards to lines 5 through 9 of your file BBTEST.ZWD:

MAPBEGIN
DNA1 0.000000 100.000000 1 black FALSE
Full name of map 1
DNA
0000000000

           These lines describe the first map (of three) in the file
        BBTEST.ZWD.  A new map description is ALWAYS introduced by the
        MAPBEGIN line.  The next line contains 6 strings (separated by
        <Space> characters); in this example, these are:

DNA1         Map name (Appears to the left when possible)
0.000000     Lowest coordinate on this map
100.000000   Highest coordinate
1            Number of Submaps (NO DECIMAL POINT)
black        Outline color for all objects on this map (MUST BE "black")
FALSE        Hint about display,
               TRUE  means "This map can be vertically compressed"
               FALSE means "This map should not be vertically compressed"

           The next line:

Full name of map 1

        which can contain no more than 60 characters is an
        abbreviated description of the map which is used to allow the
        end-user to decide whether this map is of interest or not.  This
        is followed, in this example by two more lines, one indicating
        the name of the (only) submap on this map, and the next telling
        how many lines follow giving the long description of the map (in
        this case, none).

DNA
0000000000

           A more general example of map description is for the third
        map in the file BBTEST.ZWD:


MAPBEGIN
PEP 0 100 3 black FALSE
Full name of map of peptide data
PEP1
PEP2
PEP3
1
Miscellaneous map information

           Notice here that three submaps are called out, and that three
        lines follow the full-name line, one to name each of these
        submaps, and that there is one line in the longer description
        of the map.

           Another totally crucial line tells when the description of
        one map actually ends:

MAPEND

           Your *.ZWD file will never succeed until there is one MAPEND
        line for each MAPBEGIN line!
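
           The map description can be read back with a few lines of
        code.  The following is a minimal sketch in Python (not part of
        PC-GenoGraphics) which parses the lines following a MAPBEGIN
        line and counts the MAPBEGIN and MAPEND markers.  The names
        are hypothetical, and the marker count naively assumes that
        no data-box line consists of the word MAPBEGIN or MAPEND alone.

```python
# Hypothetical reader for the map description in a *.ZWD file.

def parse_map_header(lines):
    """Parse the lines after MAPBEGIN, through the end of the long description."""
    name, low, high, submap_count, outline, compressible = lines[0].split()
    count = int(submap_count)
    # The long-description line count follows the submap names.
    description_length = int(lines[2 + count])
    first = 3 + count
    return {
        "name": name,
        "low": float(low),
        "high": float(high),
        "outline": outline,
        "compressible": compressible == "TRUE",
        "full_name": lines[1],
        "submaps": lines[2:2 + count],
        "description": lines[first:first + description_length],
    }


def map_markers_balanced(zwd_lines):
    """Return True if there is one MAPEND line for each MAPBEGIN line."""
    begins = sum(1 for line in zwd_lines if line.strip() == "MAPBEGIN")
    ends = sum(1 for line in zwd_lines if line.strip() == "MAPEND")
    return begins == ends
```

           Applied to the PEP example above, parse_map_header returns
        the three submap names PEP1, PEP2, and PEP3 and a one-line
        long description.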


           Describing Connections Between Objects:

           Last, but not least, you have the capability to describe
        connections between objects on various maps and submaps.  The
        section in BBTEST.ZWD which does this is:

CONNBEGIN
DNA1   DNA1|s1o1   PEP    PEP|s2o1   0
DNA1   DNA1|s1o1   PEP    PEP|s1o2   0
RNA1   RNA1|s1o3   DNA1   DNA1|s1o2  1
PEP    PEP|s3o3    DNA1   DNA1|s1o3  2
RNA1   RNA1|s1o1   DNA1   DNA1|s1o4  3
PEP    PEP|s3o2    DNA1   DNA1|s1o5  0
CONNEND

           Notice that this section starts with CONNBEGIN and ends with
        CONNEND.  Your *.ZWD file CANNOT BE INTERPRETED IF IT DOES NOT
        HAVE THESE, even if no connections are described between them.  The
        present example describes 6 connections.  Notice that a
        connection connects two objects and that the description has two
        strings for each object.  These two strings are somewhat
        redundant, the first is the name of the map containing the
        object, such as:

                PEP

        while the second is made up from this same map name, a "|"
        character, an "s" character, an integer (NO DECIMAL POINT)
        identifying which of that map's submaps contains this object, an
        "o" character, and another integer identifying which object on
        the submap is desired:

                PEP|s2o1

           In addition, each connection has another string which is an
        integer whose value does not matter at present.  If you leave
        this out, however, your *.ZWD file will never succeed!
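
           A connection line can be checked for the redundancy just
        described.  The following is a minimal sketch in Python (not
        part of PC-GenoGraphics); the names are hypothetical, and the
        pattern assumes that map names contain neither spaces nor the
        "|" character.

```python
# Hypothetical reader for one line between CONNBEGIN and CONNEND.
import re

# Map name, "|", "s", submap number, "o", object number.
OBJECT_ID = re.compile(r"(?P<map>[^|\s]+)\|s(?P<submap>\d+)o(?P<object>\d+)")


def parse_connection(line):
    """Return ((map, submap, object), (map, submap, object), trailing_integer)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 strings, found {len(fields)}")
    ends = []
    for map_name, object_id in ((fields[0], fields[1]), (fields[2], fields[3])):
        match = OBJECT_ID.fullmatch(object_id)
        if match is None:
            raise ValueError(f"malformed object name: {object_id}")
        if match.group("map") != map_name:
            raise ValueError(f"{object_id} does not belong to map {map_name}")
        ends.append((map_name, int(match.group("submap")), int(match.group("object"))))
    # The last string must be an integer, although its value is unused.
    return ends[0], ends[1], int(fields[4])
```

           Applied to the third connection above, this returns the
        object at submap 1, position 3 of map RNA1, the object at
        submap 1, position 2 of map DNA1, and the trailing integer 1.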


           This extremely rigid format definition reflects the original
        intent of the *.ZWD file, namely, to facilitate automatic
        transfer of data from orderly databases to PC-GenoGraphics.  For
        small sized datasets, the motivated scientist who is willing to
        hew to this format will be able to create arbitrary maps, etc.
        using a standard word-processor or text editor to create the
        *.ZWD file.  We are preparing visual tools which allow the user
        to create maps, etc. without any knowledge of the underlying
        file formats, etc.


        FIGURE 1   Screen Requesting Identification of Best Graphics
                   Mode.   This screen comes up from GGSETUP after
                   initial questions identifying the source and target
                   disk names and directories.  The correct response is
                   to type

                                D<Enter>

                   to allow you to investigate another possible description
                   of the video adapter in your PC.   You should keep
                   trying new video adapters until you find the one
                   which works best with your machine.


        FIGURE 2   Screen with Candidate Video Adapter Descriptions.
                   Presented by GGSETUP after you have identified the
                   video mode to be investigated.  Use the up and down
                   arrow keys to highlight your next guess at the identity
                   of your video adapter.   Hit

                                <Enter>

                   to investigate the applicability of the highlighted
                   mode.


        FIGURE 3   Screen Showing Performance of Video Mode under GGSETUP
                   Investigation.  Your screen should look very much like
                   this if you are investigating some video mode which
                   operates successfully on your machine.   You should
                   move the cursor into and out of the blinking square
                   and the legend at the top should change.   Black/white
                   video modes will not produce the square of 16 brightly
                   colored blocks.   If the blocks are produced, there
                   should be 16 distinct colors visible.   YOU CAN ALWAYS
                   EXIT FROM THIS SCREEN (EVEN IF THE DISPLAY IS INACCURATE)
                   BY HITTING

                                <Ctrl-C>


        FIGURE 4   Second Screen Showing Performance of Video Mode under
                   GGSETUP Investigation.   These tiles are drawn shortly
                   after the previous screen is exited.   This screen will
                   spontaneously disappear after a few seconds, and the
                   user will be asked to confirm whether the video mode
                   under investigation is performing adequately.   If you
                   answer 'N', the screen depicted in Figure 1 will
                   reappear and you can make another trial.   IF YOU ANSWER
                   'Y', INSTALLATION WILL PROCEED WITH THE PRESUMPTION THAT
                   THE VIDEO MODE WHICH YOU JUST SAW IS THE BEST ONE.


        FIGURE 5   Full-screen Display of E.coli dataset distributed with
                   some copies of PC-GenoGraphics.


        FIGURE 6   Full-screen Display of GDB human dataset distributed with
                   some copies of PC-GenoGraphics.


        FIGURE 7   Full-screen Display of HIV virus dataset distributed with
                   some copies of PC-GenoGraphics.


        FIGURE 8   Full-screen Display of AATEST dataset distributed with
                   some copies of PC-GenoGraphics.   This dataset is of
                   no biological interest, but should be studied thoroughly
                   to understand PC-GenoGraphics.


        FIGURE 9   Information Databox Attached to Object m3o1 in AATEST.
                   To fetch this databox you should position the cursor
                   over object m3o1, and click any mouse button.   If you
                   do not have a mouse attached to your PC, you can navigate
                   the cursor by holding <Shift> and pressing the arrow
                   keys; the equivalent of clicking a mouse button is
                   <Shift-Enter>.

