## Main window

This window is the one you see when you start the graphical interface. The
window is divided into three parts. The top frame allows you to select
the input file in the **Input filename** textbox and specify the
transformation you want to perform on the values in the input file before
starting the clustering in the **Transformation** combo box. Note that
the transformation is selected automatically if the extension of your
input file is one of the known ones (see Supported input formats):
E-values obtained from FASTA files and files ending in `.blast` will be
transformed using a non-linear similarity transformation, while the similarity
values in `.sim` will be left intact.

Before the analysis, a symmetrisation process will be performed. The symmetrisation
process takes all A-B pairs of elements, compares the similarity values given for A-B
and for B-A, and keeps either the higher or the lower of the two. The symmetrisation
method can be set using the **Symmetrisation** combo box. It is safe to use the
max(A, B) symmetrisation method in most cases.

Warning

There is a common catch associated with the min(A, B) symmetrisation method. If you don't specify a similarity value for a pair of elements in the input file, it is assumed to be zero. Hence, if there exists a pair of elements A and B where a similarity value is specified for A-B but not for B-A, the specified value will be ignored: the missing one is treated as zero, which is smaller than any positive weight, so the symmetrised similarity becomes zero.
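The difference between the two symmetrisation methods, and the catch with missing values, can be illustrated with a minimal Python sketch. This is not SCPS code: the function name `symmetrise` and the dictionary-based input are assumptions made for illustration, and the scores are made up.

```python
def symmetrise(similarities, method="max"):
    """Symmetrise a directed similarity table (illustrative, not SCPS code).

    `similarities` maps (a, b) pairs to non-negative scores. A pair that is
    absent from the table is treated as 0.0, as described in the text.
    """
    combine = {"max": max, "min": min}[method]
    result = {}
    for (a, b) in similarities:
        key = tuple(sorted((a, b)))  # one entry per unordered pair
        if key in result:
            continue
        forward = similarities.get((a, b), 0.0)
        backward = similarities.get((b, a), 0.0)
        result[key] = combine(forward, backward)
    return result


# Made-up scores: A-B is given in both directions, A-C only in one.
scores = {
    ("A", "B"): 0.8,
    ("B", "A"): 0.6,
    ("A", "C"): 0.5,  # no C-A entry in the input
}

print(symmetrise(scores, "max"))  # A-C keeps its value of 0.5
print(symmetrise(scores, "min"))  # A-C drops to 0.0
```

With the max method the one-sided A-C value survives, whereas with the min method it is replaced by the implicit zero of the missing C-A entry.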

The middle frame in the main window is used to select the algorithm you
wish to run and also tune its behaviour. The algorithm can be selected using
the **Clustering algorithm** combo box. The **Number of clusters** combo box
allows you to tune how the algorithm decides the number of clusters. The
following methods are available (but not all clustering algorithms support
all the methods, so some may be hidden depending on your algorithm selection):

- **Automatic**: The algorithm will try to select the number of clusters automatically. The exact details of the process differ for each algorithm. Please refer to the Clustering algorithms section for more details.
- **Exactly**: The algorithm will try to create exactly *k* clusters, where *k* is a number given by you in a textbox that appears when you select this option in the **Number of clusters** combo box. Note that sometimes the requested cluster count cannot be satisfied; e.g., you cannot create 5 clusters when your original similarity dataset consists of more than 5 connected components. In this case, the algorithm will try to get as close to the desired number of clusters as possible.
- **At most**: This method is similar to **Automatic**, but it imposes an upper bound on the number of clusters. It is highly advised to use this option when you run the spectral clustering algorithm on large datasets (larger than a thousand sequences or so), as it saves time and resources: the algorithm will calculate only the top *k* eigenvalues and eigenvectors, which can be done more efficiently for sparse input matrices.
- **Manually**: This method is supported only for the spectral clustering algorithm: it will calculate the top 100 eigenvalues and eigenvectors and lets you select the number of clusters based on the eigenvalues and eigengaps. Due to the fact that only the top 100 eigenvalues are computed, this method is suitable for datasets where you don't expect more than a hundred clusters. See the section on the Cluster count selector window for more details.
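To give an idea of what selecting a cluster count from eigenvalues and eigengaps can look like, here is a minimal sketch of the generic eigengap heuristic in Python. It is not necessarily what SCPS does internally: the function names, the `max_clusters` parameter (loosely mirroring an upper bound on the cluster count), and the eigenvalues are all assumptions made for illustration.

```python
def eigengaps(eigenvalues):
    """Return the gaps between consecutive eigenvalues, largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    return [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]


def suggest_cluster_count(eigenvalues, max_clusters=None):
    """Suggest k as the position of the largest eigengap (generic heuristic).

    If `max_clusters` is given, only gaps that would lead to at most
    that many clusters are considered.
    """
    gaps = eigengaps(eigenvalues)
    if max_clusters is not None:
        gaps = gaps[:max_clusters]
    if not gaps:
        return 1
    largest = max(range(len(gaps)), key=gaps.__getitem__)
    return largest + 1  # a gap after the k-th eigenvalue suggests k clusters


# Made-up eigenvalues: three values close to 1, then a clear drop.
values = [1.0, 0.98, 0.95, 0.40, 0.35, 0.30]

print(suggest_cluster_count(values))                  # 3
print(suggest_cluster_count(values, max_clusters=2))  # 2
```

A large gap after the first few eigenvalues is usually read as a sign that the data separate well into that many clusters; when no gap stands out, inspecting the eigenvalues by hand is a reasonable fallback.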

You can also tweak the advanced parameters of each algorithm in the
Advanced algorithm parameters window that is shown after clicking on the
**Parameters...** button. Refer to the Clustering algorithms section
for more details on the parameter names and values you can use there.

The computation can be started by clicking on the **Start** button. A progress
bar at the bottom of the window will show you the approximate progress of the
computation. Note that it is not possible to estimate the remaining time for
eigenvector calculations accurately, hence the progress bar will not move while
the eigenvectors are calculated for the spectral clustering process. It will,
however, display the exact progress when performing a connected component
analysis with automatic cluster count selection. When the calculation is
finished, the result viewer window will be shown; please refer to the Result
viewer section for more details.

The **Show log** menu item in the **Window** menu can be used to display
diagnostic messages that may help you check what is going on behind the
scenes. In general, you shouldn't need this window unless you suspect something
is wrong with SCPS and you wish to file a bug report.