Frequently Asked Questions

What is PSPACE?
The Protein Structure Space Explorer, or PSPACE, is a web application for user-driven exploration of the protein universe. More detail is available on the about page and in the tutorials.
What is an MPSS?
A Map of Protein Structure Space (MPSS), or Structure Space Map (SSM), is a visual representation of the simultaneous interrelationships (structural distances) between large groups of proteins. In PSPACE, the primary UI element is an MPSS displayed using an interactive 3D scatter plot.
What are the requirements to use PSPACE?
At minimum, a web browser which supports JavaScript, Adobe Flash (version 10 or higher) and a Java plugin. PSPACE was tested with Google Chrome and Mozilla Firefox.
Do you recommend a particular browser?
We recommend Google Chrome be used with PSPACE. In our testing, the performance of the user interface is much higher in Chrome than in any other browser.
Is there a way to speed up the UI?
The performance of the UI is determined by the client device (CPU speed) and the implementation of the Flash plugin. The plugin provided with Chrome appears to offer the best performance.
How can additional (non-reference) structures be specified?
Users of PSPACE can specify structures during job submission ("upload custom data"). Up to two files may be provided, each of which may be a ZIP archive of multiple PDB-format files (.pdb, .ent), an individual PDB file (.pdb, .ent, .ent.gz), or a text file containing a comma-separated list of 4- or 5-character structure codes (.csv). Chains from different files are labeled with distinct group IDs in the PSPACE UI. In a CSV file, any lines starting with # are treated as comments and ignored.
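As an illustration of the CSV format described above, the following is a minimal sketch of how such a file could be read. It is not the parser PSPACE uses. The function name parse_structure_codes, the code pattern (a 4-character PDB ID with an optional fifth chain character) and the choice to silently skip malformed fields are all assumptions made for the example.

```python
import re

# Assumption: a structure code is a 4-character PDB ID, optionally followed
# by a single chain character (5 characters in total).
CODE_PATTERN = re.compile(r"^[0-9A-Za-z]{4}[0-9A-Za-z]?$")


def parse_structure_codes(text):
    """Return the structure codes found in CSV text, skipping # comment lines."""
    codes = []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and lines starting with # carry no codes.
        if not line or line.startswith("#"):
            continue
        for field in line.split(","):
            code = field.strip()
            # Assumption: fields that do not look like a code are dropped.
            if CODE_PATTERN.match(code):
                codes.append(code)
    return codes


# Illustrative file contents only.
example = """# chains of interest
1ATP, 2SRCA
# a whole entry (every chain of the first model)
1CRN
"""
print(parse_structure_codes(example))
```

A file like the example mixes 4-character codes (whole entries) with 5-character codes (single chains), which is the distinction that matters for the chain limit.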
What limitations are there on user-supplied structures?
Up to 20 additional (non-reference) chains can be aligned and mapped. If a PDB ID without a chain letter is specified, or a structure file with multiple chains is uploaded, every chain (of the first model) will be aligned and counts towards the limit of 20. Chains must have unique 5-character PDB codes, and are removed from the list if they share a PDB ID with the reference set. If the ID is not in the reference set, but is in the PDB itself, then the structure is referenced from a mirror of the PDB. Chains are also removed when the structure file fails to validate with the BioJava structure file parser.
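The rules above can be checked before submitting a job. The sketch below is a hypothetical pre-check, not PSPACE's own logic: the function name filter_user_chains is invented, input is assumed to be already expanded to 5-character chain codes, duplicates are assumed to be dropped, and exceeding the limit is assumed to be an error (the FAQ does not say how the server reacts).

```python
MAX_USER_CHAINS = 20  # limit stated in the FAQ


def filter_user_chains(chain_codes, reference_pdb_ids):
    """Drop duplicate chains and chains whose PDB ID is in the reference set."""
    # PDB IDs are compared case-insensitively; chain characters are kept as given.
    reference = {pdb_id.upper() for pdb_id in reference_pdb_ids}
    seen = set()
    kept = []
    for code in chain_codes:
        pdb_id = code[:4].upper()
        key = pdb_id + code[4:]
        if pdb_id in reference or key in seen:
            continue
        seen.add(key)
        kept.append(key)
    # Assumption: treat an over-long list as an error rather than truncating it.
    if len(kept) > MAX_USER_CHAINS:
        raise ValueError(
            f"{len(kept)} chains supplied, but at most {MAX_USER_CHAINS} are allowed"
        )
    return kept


# Illustrative input: one duplicate and one chain from a reference entry.
print(filter_user_chains(["1atpE", "1ATPE", "2SRCA"], {"2src"}))
```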
How long does it take to execute jobs with user-supplied structures?

Short answer: The duration t of a job with N reference structures and M user structures can be estimated using this formula:

t = ( N*M + M*(M - 1)/2 ) * 1 s

Long answer: Structure alignments are computationally expensive. Dali uses Monte Carlo matrix search methods to search protein interatomic distance matrices for similar submatrices, while CE and FATCAT employ dynamic programming. The complexity depends on the lengths of the proteins being aligned, but the average rate for PDBSelect25 is about 1 pairwise alignment per second. For a reference set containing 4,000 chains, this works out to a total time of 66.7 minutes per user structure supplied. If the maximum of 20 structures is supplied, an additional 3.17 minutes are required to align the user structures against one another. Alignments using Dali typically complete more quickly than those using FATCAT or CE, but only because Dali aligns only the 1% most similar pairs based on sequence and secondary structure considerations.
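The estimate can be reproduced with a few lines of Python. This is only a sketch of the formula from the short answer. The function name estimate_job_seconds and the seconds_per_alignment parameter are introduced for illustration, and the default of 1 second per alignment is the PDBSelect25 average quoted above, so real jobs may differ.

```python
def estimate_job_seconds(n_reference, m_user, seconds_per_alignment=1.0):
    """Estimate job duration: t = (N*M + M*(M - 1)/2) * seconds per alignment."""
    # Each user structure is aligned against every reference structure.
    user_vs_reference = n_reference * m_user
    # Each unordered pair of user structures is aligned once.
    user_vs_user = m_user * (m_user - 1) // 2
    return (user_vs_reference + user_vs_user) * seconds_per_alignment


# One user structure against 4,000 reference chains: about 66.7 minutes.
print(estimate_job_seconds(4000, 1) / 60)

# User-vs-user alignments alone for 20 structures: 190 s, about 3.17 minutes.
print(estimate_job_seconds(0, 20) / 60)

# Full job with 20 user structures, in hours.
print(estimate_job_seconds(4000, 20) / 3600)
```

The user-versus-reference term dominates: with 4,000 reference chains, the 190 user-versus-user alignments add little to the total.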
Where can I find the case studies from the paper?
The MPSS which are examined at a high level are pre-calculated, "default" MPSS which can be seen by submitting jobs without user-specified structures. Three case studies using specific chains outside of a reference set are also presented:
What is the difference between a raw score and a probability score?
"Raw scores" are the unique measures of pairwise similarity given as output by each method, while probability scores attempt to represent the statistical significance of a result. The Z-score is an example of such a score. The precise definitions of the raw and probability scores used by each of the aligners can be found in the appropriate references.
Which score type should I use (raw similarity or probability score)?
Probability scores take the statistical significance of a result into account, but typically also use a weight determined by the length of the protein chains. The MPSS produced using either score type are visually similar in general, and to date the full implications of the measure used for low-dimensional projection of inter-protein distances are unknown.
Which alignment method should I use (Dali, FATCAT or CE)?
We recommend FATCAT for most uses. Although each alignment method provides a different perspective on protein space, MPSS using FATCAT perform very well at the superfamily level. Furthermore, while Dali jobs complete more quickly due to an internal similarity cutoff, MPSS which use FATCAT are much less crowded in terms of the placement of the majority of structures and as such are better suited to visual analysis. CE is a completely rigid aligner and does not seem to capture "twilight zone" similarities very well, although the noise reduction from low-dimensional projection significantly improves performance in this regard.
Which MDS method should I use (classical or stress majorization)?
There are extreme qualitative differences between MPSS produced using classical MDS and those using stress majorization. The former have been studied in significantly more detail and are known to perform well for classification of proteins at the superfamily level. Stress majorization, however, results in an objectively smaller total difference between inter-protein distances in the MPSS and the original pairwise data.
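The "total difference" mentioned above is usually quantified as stress. The sketch below computes unweighted raw stress, the sum of squared differences between the original pairwise distances and the distances in the projected map. The exact stress definition and weighting used by PSPACE are not stated here, so this form, the function name raw_stress and the toy data are assumptions for illustration.

```python
import math
from itertools import combinations


def raw_stress(distances, coords):
    """Sum of squared differences between target and embedded distances.

    distances: dict mapping index pairs (i, j) with i < j to the original distance
    coords: list of coordinate tuples, one per structure, in the projected map
    """
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        embedded = math.dist(coords[i], coords[j])
        total += (distances[(i, j)] - embedded) ** 2
    return total


# Toy data: three "structures" whose pairwise distances form a 3-4-5 triangle.
target = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}

# A 2D layout that reproduces every distance has zero stress.
print(raw_stress(target, [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]))

# Squeezing the same points onto a line distorts one distance and raises the stress.
print(raw_stress(target, [(0.0,), (3.0,), (4.0,)]))
```

A lower value means the map distances track the original pairwise data more closely, which is the sense in which a stress-minimizing layout is "objectively" closer than one chosen by another criterion.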
What dimensionalities are supported for low-dimensional projection?
The PSPACE UI can display projections in 2 or 3 dimensions, but has been particularly optimized for MPSS in 3 dimensions. In general, projections using fewer than 2 dimensions discard too much useful information, while higher dimensions 1) contain relatively small amounts of information and 2) are non-trivial to visualize and explore.