Features & Details
Paracel BLAST Overview
Researchers are now regularly investigating the largest and most complex genome sequences on the planet. As the size and complexity of sequences increase, the large database and high throughput requirements of performing even a simple BLAST sequence similarity search can dramatically escalate.
The best solution to these increased computational demands is the use of high-performance parallel computing to split large, complex tasks across multiple processors. Paracel BLAST is the only commercially supported BLAST solution for parallel computing. Paracel recoded the original NCBI BLAST source code from the ground up, eliminating important performance bottlenecks and optimizing it for use on parallel platforms.
- Work with queries 100 times larger than before
Paracel BLAST incorporates a proprietary query-chopping and results-reassembly process that allows you to search chromosome-sized queries in parallel without loss of accuracy.
- Search large databases without loss of performance
Searches that previously failed because of their sheer size can be executed rapidly with Paracel BLAST. Researchers can routinely complete searches that would be prohibitively long using traditional methods.
- Automatic parallelization, queuing, and scheduling
Paracel BLAST automatically executes your searches in parallel on multiple processors, giving you the highest performance without complex administration.
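Paracel's query chopping and results reassembly are proprietary, so the following is only a generic sketch of the idea, not the product's algorithm. It assumes fixed-length, overlapping pieces and hits reported as half-open (start, end, subject) tuples in piece-local coordinates; the names chop_query and reassemble are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Hit = Tuple[int, int, str]  # (start, end, subject id), half-open, assumed format


def chop_query(seq: str, chunk_len: int, overlap: int) -> Iterator[Tuple[int, str]]:
    """Yield (offset, piece) pairs covering seq with overlapping windows."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")
    step = chunk_len - overlap
    offset = 0
    while True:
        yield offset, seq[offset:offset + chunk_len]
        if offset + chunk_len >= len(seq):
            break  # the last window reached the end of the query
        offset += step


def reassemble(hits_per_piece: Iterable[Tuple[int, List[Hit]]]) -> List[Hit]:
    """Shift piece-local hits to full-query coordinates and drop duplicates
    that were reported twice because they fall inside an overlap."""
    merged = set()
    for offset, hits in hits_per_piece:
        for start, end, subject in hits:
            merged.add((start + offset, end + offset, subject))
    return sorted(merged)
```

Each piece can be searched on a separate processor. An alignment that extends past an overlap would also need stitching across pieces, which this sketch does not attempt.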
Paracel BLAST scales up to run very rapidly on multi-CPU systems. A 32-CPU Linux cluster runs a large database search in several hours that would take days running NCBI code on an ordinary single-processor system.
Download PDF White Papers:
Bioinformatics Clusters In Action
Enhancing BLAST Performance by Using the Paracel Filtering Package
Enhancements in Parallelism
One of the key reasons why Paracel BLAST is far superior to other solutions is its ability to run very large sequences on massively parallel systems. Paracel BLAST's parallel-computing enhancements include:
- Incorporates a custom scheduler that is tightly coupled with the application to promote job parallelism and dynamically control query parallelism.
- Performs splitting and merging of results internally, alleviating the need for external parsing, and generates the report only once.
- Splits the database dynamically, saving the user the time of manually splitting and assigning processes at the time of database loading, and re-splitting when the number of processors changes.
- Keeps track of which nodes have which sections of the database loaded into their RAM, allowing the integrated scheduler to choose the best node to perform the given task.
- Incorporates query chopping to segment very large query sequences and allow separate processors to search individual pieces.
- Uses a proprietary query packing algorithm to search multiple queries in one pass of the database, generating the same results, in the same report formats, as if queries hadn't been packed.
- Partitions jobs over multiple processors. Paracel BLAST tends to over-partition so that, instead of becoming idle, each node has a new task to work on immediately after finishing its initial task.
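Two of the ideas in the list above, over-partitioning and choosing work based on what a node already holds in RAM, can be illustrated with a small sketch. This is not Paracel's scheduler: the over-partitioning factor of 4, the (segment, query id) task shape, and the names partition and pick_task are all assumptions made for the example.

```python
from typing import List, Set, Tuple

Task = Tuple[str, int]  # (database segment name, query id), hypothetical shape


def partition(n_items: int, n_workers: int, factor: int = 4) -> List[Tuple[int, int]]:
    """Split range(n_items) into roughly n_workers * factor contiguous
    half-open ranges, so workers that finish early can pick up more work."""
    n_tasks = max(1, min(n_items, n_workers * factor))
    base, extra = divmod(n_items, n_tasks)
    ranges, start = [], 0
    for i in range(n_tasks):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + size))
        start += size
    return ranges


def pick_task(pending: List[Task], node_cache: Set[str]) -> Task:
    """Prefer a pending task whose database segment the node already has
    loaded; otherwise fall back to the oldest pending task."""
    for i, (segment, _query_id) in enumerate(pending):
        if segment in node_cache:
            return pending.pop(i)
    return pending.pop(0)
```

With more tasks than processors, a node that finishes its first range simply calls pick_task again rather than waiting for slower nodes.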
Integrated Filtering using the Paracel Filtering Package
The presence of contaminants and repeats can significantly skew search results. In addition to vector information and other contaminants, 45-50% of the human genome consists of repeats that create false-positive hits in the results. Traditionally, researchers had to sift through these results by hand to find the hits they wanted.
Thankfully, this is no longer the case: the Paracel Filtering Package (PFP) masks genome-wide repeats and both high- and low-complexity regions from a BLAST search. PFP filters contaminants and repetitive regions from query sequences, which eliminates undesired hits against those contaminants and repetitive regions from the BLAST search results. It also allows searches to succeed that would otherwise have failed if the repetitive regions had not been removed. Additionally, PFP's integration into Paracel BLAST condenses the two-step process of cleaning the query and then using the cleaned query in a search into a one-step, automatic process.
PFP is also highly customizable. Users may choose to filter, mask, or trim contaminants, repeats, and/or low-complexity regions.
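To make the difference between two of these options concrete, here is a minimal sketch of masking and trimming. It does not reproduce PFP's behavior or parameters: it assumes flagged regions arrive as half-open (start, end) intervals, that masking means overwriting flagged bases with "N", and that trimming means cutting flagged regions that touch either end of the sequence. The function names are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open [start, end) interval flagged for removal


def mask(seq: str, regions: List[Region], mask_char: str = "N") -> str:
    """Overwrite flagged positions with mask_char; sequence length is kept,
    so hit coordinates still refer to the original query."""
    out = list(seq)
    for start, end in regions:
        for i in range(max(start, 0), min(end, len(seq))):
            out[i] = mask_char
    return "".join(out)


def trim(seq: str, regions: List[Region]) -> str:
    """Cut flagged regions that touch the start or end of the sequence;
    flagged regions in the interior are left alone."""
    lo, hi = 0, len(seq)
    for start, end in sorted(regions):
        if start <= lo < end:
            lo = end  # extend the cut at the 5' end
    for start, end in sorted(regions, key=lambda r: r[1], reverse=True):
        if start < hi <= end:
            hi = start  # extend the cut at the 3' end
    return seq[lo:max(lo, hi)]
```

For example, with vector sequence flagged at both ends, mask keeps the query length unchanged while trim returns only the insert between the flagged ends.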
PFP is integrated with Paracel BLAST so that it can be invoked automatically as part of the BLAST search. PFP includes several parameter files that have been pre-configured for various commonly researched organisms. Users can also create their own parameter files to perform the desired filtering and masking.