Socrates: Identification of genomic rearrangements in tumour genomes by re-aligning soft-clipped reads.

The new release depends on Java 1.8. Please use this version if possible.

A Python driver script, "Socrates", is included to execute the programs while retaining the cross-platform compatibility of Java. To use Socrates without the driver script, the Java class path needs to be set:

    socrates=$(dirname "$0")/
    libs=${socrates}lib/sam-1.77.jar:${socrates}lib/commons-lang3-3.1.jar:${socrates}lib/commons-cli-1.2.jar:${socrates}lib/picard-1.85.jar:${socrates}lib/snappy-java-1.0.3-rc3.jar
    java -Xmx4g -cp ${socrates}bin:$libs net.wehi.socrates.[PROG] [OPTIONS]

where PROG is one of BamStratifier, RealignmentBAM, RealignmentClustering and AnnotatePairedClusters.

Long soft-clip length: length threshold of a long soft clip (SC) [default: 25 (bp)]. A reasonable threshold helps remove low-quality soft clips that could lead to erroneous breakpoint calls. Studies have shown that the longer a sequence, the more likely it can be uniquely placed in a genome. An early study demonstrated that while the percentage of unique mappings improves with increasing read length, the rate of gain diminishes past 25 nt (80% at 25 nt and 90% at 40 nt). If the value of this parameter is too low, many non-unique soft clips will be produced, increasing system requirements and processing time and reducing the reliability of downstream results. On the other hand, too high a value yields few long soft clips and hence risks missing breakpoints.

-b, --base-quality    Minimum average base quality score of soft-clipped sequence [default: 5].

Percent identity: minimum alignment percent identity to reference [default: 95 (%)] and minimum realignment percent identity to reference [default: 95 (%)]. We often observe a higher-than-expected base mismatch rate for reads in satellite, centromeric and telomeric regions, where the correctness of alignments can be contentious. A minimum percent identity threshold is equivalent to a maximum allowable mismatch rate (a 95% identity threshold allows a mismatch rate of at most 5%) and can greatly reduce these erroneous alignments.

Mapping quality: higher mapping quality, while it may not guarantee unique alignment, is sufficient to exclude multi-mapping anchor alignments from further analysis for Bowtie2- and BWA-aligned reads.

Further options:
Maximum realignment support to search for short SC cluster [default: 30].
Size of flank for promiscuity filter [default: 50 (bp)].
Use only proper pair 5' SC and anomalous pair 3' SC [default: false].
-n, --normal    Socrates paired breakpoint calls for normal sample.

Merging re-alignments: the merging program combines the soft-clip re-alignment BAM file with anchor alignment information. It has a built-in sorting mechanism and can therefore take unsorted, raw re-alignment output from the aligner. While it accepts an input BAM file from the standard input channel, doing so requires more system memory for buffering.
output_bam    Output re-alignment BAM with anchor info merged.

The rearrangement predictor is the main part of the algorithm. It clusters the split reads and then pairs clusters to form the output. Two files are generated: the paired and unpaired outputs. The paired output contains the best results, while the unpaired output contains any soft clips that realigned anywhere else during the re-alignment stage of the algorithm. The unpaired results are included for completeness, as they can contain useful information, but overall this output consists mostly of false positives caused by mapping errors and other artefacts.

short SC cluster: in most experiments the most prevalent type of cluster pairing, yet the least trustworthy. Only one side of the breakpoint is supported by realigned split reads; the other side is supported only by short pieces of soft-clipped sequence. This kind of cluster pairing makes Socrates very sensitive, but introduces false positives.

The paired output contains various columns of information that describe the location of the breakpoint and the level of support, including:
C1_anchor - a second locus describing the anchor region of the cluster. The anchor is defined by the reads that were mapped in the initial alignment and whose soft clips formed the earlier columns. The position is the consensus of the positions before the first soft-clipped base.
C1_anchor_dir - analogous to the above, this field describes whether the anchor region is upstream ("+") or downstream ("-") of the breakpoint in the reference.
C1_long_support - the number of "long" soft clips (as specified during the run of Socrates) that support cluster C1. This number is at least 1, as there would be no cluster without a realigned soft clip.
C1_long_support_bases - the number of nucleotides in the long soft clips counted in the preceding column.
C1_short_support_max_len - the length of the longest soft clip in this support group.

Micro-homology: "Xbp homology found! (XXX)" means the two joined regions are identical for X bases across the break. Therefore, the true location of the breakpoint is only known within those boundaries.
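For readers who prefer to launch the programs from Python, the class-path snippet can be mirrored as in the sketch below. It is not the bundled "Socrates" driver script: the helper name `build_command` is hypothetical, and it assumes the jar layout listed in the shell snippet and the platform's class-path separator (`os.pathsep`, which is ":" on Unix).

```python
import os
import shlex

# Jar files listed in the class-path snippet, relative to the Socrates install directory.
LIB_JARS = [
    "lib/sam-1.77.jar",
    "lib/commons-lang3-3.1.jar",
    "lib/commons-cli-1.2.jar",
    "lib/picard-1.85.jar",
    "lib/snappy-java-1.0.3-rc3.jar",
]

# The four programs named in the documentation.
PROGRAMS = {"BamStratifier", "RealignmentBAM", "RealignmentClustering", "AnnotatePairedClusters"}


def build_command(socrates_dir, prog, options, max_heap="4g"):
    """Return a java argument list equivalent to the shell snippet (hypothetical helper)."""
    if prog not in PROGRAMS:
        raise ValueError(f"unknown program: {prog}")
    # Class path: compiled classes in bin/ first, then the bundled jars.
    entries = [os.path.join(socrates_dir, "bin")]
    entries += [os.path.join(socrates_dir, jar) for jar in LIB_JARS]
    classpath = os.pathsep.join(entries)
    return ["java", f"-Xmx{max_heap}", "-cp", classpath, f"net.wehi.socrates.{prog}", *options]


if __name__ == "__main__":
    # Print a shell-quoted command line; subprocess.run(cmd, check=True) would execute it.
    cmd = build_command("/opt/socrates", "RealignmentBAM", [])
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids shell quoting problems when paths contain spaces.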

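The long soft-clip length threshold and the minimum average base quality can be illustrated with a small classifier. This is a sketch under stated assumptions, not Socrates' implementation: a soft clip is given as its base string plus one Phred quality per base, the length threshold is treated as "at least 25 bp", and the function name `classify_soft_clip` is hypothetical.

```python
def classify_soft_clip(bases, quals, long_threshold=25, min_avg_quality=5):
    """Label a soft clip as 'long', 'short' or 'rejected' (illustrative only).

    bases: soft-clipped sequence; quals: Phred base qualities, one per base.
    Defaults mirror the documented defaults (25 bp, average base quality 5).
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have equal length")
    if not bases:
        return "rejected"
    # A low average base quality suggests sequencing error rather than a real junction.
    if sum(quals) / len(quals) < min_avg_quality:
        return "rejected"
    # Assumption: clips of at least `long_threshold` bases count as long.
    return "long" if len(bases) >= long_threshold else "short"
```

For example, a 30-base clip with quality 30 at every base is labelled long, a 4-base clip short, and a 30-base clip with quality 2 at every base is rejected.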

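The equivalence between a minimum percent identity and a maximum mismatch rate can be checked with a short sketch. It assumes a gapless alignment of equal-length read and reference segments; real alignments with insertions and deletions, and Socrates' own definition of identity, may count positions differently. Both function names are hypothetical.

```python
def percent_identity(read, ref):
    """Percentage of aligned positions where read and reference bases agree (gapless)."""
    if len(read) != len(ref) or not read:
        raise ValueError("expects a non-empty, equal-length gapless alignment")
    matches = sum(1 for a, b in zip(read.upper(), ref.upper()) if a == b)
    return 100.0 * matches / len(read)


def passes_identity(read, ref, min_identity=95.0):
    """True if identity meets the threshold, i.e. mismatch rate <= 100 - min_identity (%)."""
    return percent_identity(read, ref) >= min_identity
```

With a 40-base alignment, 2 mismatches give exactly 95% identity and pass the default threshold, while 3 mismatches give 92.5% and fail.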

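The C1_anchor description says the reported position is the consensus of the positions before the first soft-clipped base. The sketch below assumes "consensus" means the most frequent position among supporting reads, with ties broken toward the smaller coordinate; both the tie rule and the function name are assumptions, and the input is a hypothetical list of per-read coordinates.

```python
from collections import Counter


def consensus_anchor_position(positions):
    """Most frequent position among supporting reads; ties go to the smaller position.

    positions: reference coordinate of the last aligned base before each read's soft clip
    (hypothetical input; how Socrates derives these coordinates is not shown here).
    """
    if not positions:
        raise ValueError("need at least one supporting read")
    counts = Counter(positions)
    best = max(counts.values())
    # Among the positions reaching the top count, pick the smallest for determinism.
    return min(pos for pos, n in counts.items() if n == best)
```

For example, supporting reads clipped at 1000, 1000, 1001 and 999 give a consensus of 1000.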

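The micro-homology annotation can be made concrete with one simple way of measuring it, sketched under explicit assumptions: reference sequence for both breakpoints is available, all four flanks are already oriented as they appear in the rearranged sequence, and the junction joins `side1_up` to `side2_down`. The breakpoint can then slide left while the bases before both breakpoints agree, and right while the bases after both breakpoints agree. This is an illustration, not Socrates' algorithm; all names are hypothetical.

```python
def common_suffix_len(a, b):
    """Number of identical trailing bases shared by a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def common_prefix_len(a, b):
    """Number of identical leading bases shared by a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def microhomology(side1_up, side1_down, side2_up, side2_down):
    """Return (length, sequence) of homology at the junction side1_up + side2_down.

    side1_up / side1_down: reference just before / after breakpoint 1.
    side2_up / side2_down: reference just before / after breakpoint 2.
    """
    left = common_suffix_len(side1_up, side2_up)        # break can shift left this far
    right = common_prefix_len(side1_down, side2_down)   # or right this far
    # Slice explicitly so that left == 0 yields an empty string.
    homology = side1_up[len(side1_up) - left:] + side1_down[:right]
    return left + right, homology
```

For instance, joining "AAACGT" to "CCCC" when the partner breakpoint is preceded by "GGGCGT" gives 3 bp of homology ("CGT"): placing the break anywhere within those three bases produces the same joined sequence.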