Genomic Data Processing 52: Running cs-bwamem on a Cluster (10 Million 100 bp Reads)
Published: 2021-03-10 00:38:39 | Category: Big Data | Source: compiled from the web
1. Generate simulated reads with ART:

    art_illumina -ss HS20 -i GRCH38BWAindex/GRCH38chr1L3556522.fna -l 100 -c 10000000 -o g38L100c10000000Nhs20

2. Upload the FASTQ file to HDFS, specifying the number of partitions (21 in this run):

    spark-submit --class cs.ucla.edu.bwaspark.BWAMEMSpark \
      --master spark://masterIP:7077 \
      /home/hadoop/xubo/tools/cloud-scale-bwamem-0.2.1/target/cloud-scale-bwamem-0.2.0-assembly.jar \
      upload-fastq 0 21 g38L100c10000000Nhs20.fq \
      /xubo/data/alignment/cs-bwamem/fastq/g38L100c10000000Nhs20.fastq

Replace masterIP with the real IP address of the master node.

3. Align the reads with cs-bwamem:

    spark-submit --executor-memory 4g --class cs.ucla.edu.bwaspark.BWAMEMSpark \
      --total-executor-cores 20 --master spark://masterIP:7077 \
      --conf spark.driver.host=masterIP --conf spark.driver.cores=4 \
      --conf spark.driver.maxResultSize=4g --conf spark.storage.memoryFraction=0.7 \
      --conf spark.akka.threads=2 --conf spark.akka.frameSize=1024 \
      /home/hadoop/xubo/tools/cloud-scale-bwamem-0.2.1/target/cloud-scale-bwamem-0.2.0-assembly.jar \
      cs-bwamem -bfn 1 -bPSW 1 -sbatch 10 -bPSWJNI 1 -oChoice 2 \
      -oPath hdfs://masterIP:9000/xubo/data/alignment/cs-bwamem/fastq/g38L100c10000000Nhs20.adam \
      -localRef 1 -isSWExtBatched 1 0 GRCH38BWAindex/GRCH38chr1L3556522.fasta \
      /xubo/data/alignment/cs-bwamem/fastq/g38L100c10000000Nhs20.fastq

4. Merge the output:

    spark-submit --executor-memory 4g --class cs.ucla.edu.bwaspark.BWAMEMSpark \
      --total-executor-cores 4 --master spark://masterIP:7077 \
      --conf spark.driver.host=masterIP --conf spark.driver.cores=4 \
      --conf spark.driver.maxResultSize=6g --conf spark.storage.memoryFraction=0.7 \
      --conf spark.akka.threads=2 --conf spark.akka.frameSize=1024 \
      /home/hadoop/xubo/tools/cloud-scale-bwamem-0.2.1/target/cloud-scale-bwamem-0.2.0-assembly.jar \
      merge hdfs://masterIP:9000 \
      /xubo/data/alignment/cs-bwamem/fastq/g38L100c10000000Nhs20.adam \
      /xubo/data/alignment/cs-bwamem/fastq/g38L100c10000000Nhs20.merge.adam

5. Count the records.

Analysis: distinct shows that the data contains no duplicate records.

Reference:
https://github.com/ytchen0323/cloud-scale-bwamem/blob/453a4178cd6c3fb3a0fe99df816e6cd7df521b95/src/main/scala/cs/ucla/edu/bwaspark/Usage.scala

Appendix:
+--------------------+---------+---------+----+------------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+
| contig| start| end|mapq| readName| sequence| qual|cigar|basesTrimmedFromStart|basesTrimmedFromEnd|readPaired|properPair|readMapped|mateMapped|firstOfPair|secondOfPair|failedVendorQualityChecks|duplicateRead|readNegativeStrand|mateNegativeStrand|primaryAlignment|secondaryAlignment|supplementaryAlignment|mismatchingPositions|origQual| attributes|recordGroupName|recordGroupSequencingCenter|recordGroupDescription|recordGroupRunDateEpoch|recordGroupFlowOrder|recordGroupKeySequence|recordGroupLibrary|recordGroupPredictedMedianInsertSize|recordGroupPlatform|recordGroupPlatformUnit|recordGroupSample|mateAlignmentStart|mateAlignmentEnd|mateContig|
+--------------------+---------+---------+----+------------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+
|[chr1,248956422,n...| 14809719| 14809819| 60|chr1-6811129|CATGTACAGTGCTCGCC...|CEDA8DA@E<DDED.DE...| 100M| 0| 0| false| false| true| false| false| false| false| false| true| false| true| false| false| 34T46C18| null|NM:i:2 AS:i:90 XS...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...| 80256345| 80256445| 60|chr1-6811128|AGTAGGCTTGGAGAAGA...|D?BCED>D=@DDEC9CF...| 100M| 0| 0| false| false| true| false| false| false| false| false| true| false| true| false| false| 100| null|NM:i:0 AS:i:100 X...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|173354132|173354232| 60|chr1-6811126|AGCACAATCAGAAATAA...|?@@FFDFF2?HHHJ3JJ...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| true| false| false| 29T70| null|NM:i:1 AS:i:95 XS...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...| 67191506| 67191606| 60|chr1-6811125|CATTCCCATTGGCTTTT...|?C@F=DEFHHHHHFHIH...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| true| false| false| 100| null|NM:i:0 AS:i:100 X...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143209655|143209755| 5|chr1-6811124|ATTCCATTTGATGACAA...|D5>D:DBCCAC5>,(AC...| 100M| 0| 0| false| false| true| false| false| false| false| false| true| false| true| false| false| 31A44C23| null|NM:i:2 AS:i:90 XS...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|125183587|125183687| 0|chr1-6811124|ATTCCATTTGATGACAA...|D5>D:DBCCAC5>,(AC...| 100M| 0| 0| false| false| true| false| false| false| false| false| true| false| false| true| false| 31A0A43C23| null|NM:i:3 AS:i:85 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|125166941|125167041| 0|chr1-6811124|ATTCCATTTGATGACAA...|D5>D:DBCCAC5>,(AC...| (remaining columns lost in the source)
|[chr1,248956422,n...|125181832|125181932| 0|chr1-6811124|ATTCCATTTGATGACAA...|D5>D:DBCCAC5>,(AC...| (remaining columns lost in the source)
|[chr1,248956422,n...|143271137|143271237| 0|chr1-6811124|TGGAATCATGATCAAAT...|C@CFFFFDGH:HAJJIJ...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| false| true| false| 23G43T0T31| null|NM:i:3 AS:i:85 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143211045|143211145| 0|chr1-6811124|ATTCCATTTGATGACAA...|D5>D:DBCCAC5>,(AC...| 100M| 0| 0| false| false| true| false| false| false| false| false| true| false| false| true| false| 27A3A0A43C23| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143189224|143189324| 0|chr1-6811124|TGGAATCATGATCAAAT...|C@CFFFFDGH:HAJJIJ...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| false| true| false| 23G22G20T0T31| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143233259|143233359| 0|chr1-6811124|TGGAATCATGATCAAAT...|C@CFFFFDGH:HAJJIJ...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| false| true| false| 6T16G43T0T31| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143274817|143274917| 0|chr1-6811124|TGGAATCATGATCAAAT...|C@CFFFFDGH:HAJJIJ...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| false| true| false| 23G14T28T0T31| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143215364|143215464| 0|chr1-6811124|TGGAATCATGATCAAAT...|C@CFFFFDGH:HAJJIJ...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| false| true| false| 6T16G43T0T31| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|125180499|125180599| 0|chr1-6811124|ATTCCATTTGATGACAA...|D5>D:DBCCAC5>,(AC...| 100M| 0| 0| false| false| true| false| false| false| false| false| true| false| false| true| false| 31A0A43C16A6| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143267274|143267374| 0|chr1-6811124|TGGAATCATGATCAAAT...|C@CFFFFDGH:HAJJIJ...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| false| true| false| 6T16G43T0T31| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143226169|143226269| 0|chr1-6811124|TGGAATCATGATCAAAT...|C@CFFFFDGH:HAJJIJ...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| false| true| false| 23G14T28T0T31| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143264022|143264122| 0|chr1-6811124|TGGAATCATGATCAAAT...|C@CFFFFDGH:HAJJIJ...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| false| true| false| 6T16G43T0T31| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143223869|143223969| 0|chr1-6811124|TGGAATCATGATCAAAT...|C@CFFFFDGH:HAJJIJ...| 100M| 0| 0| false| false| true| false| false| false| false| false| false| false| false| true| false| 23G25C17T0T31| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
|[chr1,248956422,n...|143258530|143258630| 0|chr1-6811124|ATTCCATTTGATGACAA...|D5>D:DBCCAC5>,(AC...| 100M| 0| 0| false| false| true| false| false| false| false| false| true| false| false| true| false| 31A0A17G25C23| null|NM:i:4 AS:i:80 RG...| foo| null| null| null| null| null| null| null| null| null| bar| null| null| null|
+--------------------+---------+---------+----+------------+--------------------+--------------------+-----+---------------------+-------------------+----------+----------+----------+----------+-----------+------------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+---------------------------+----------------------+-----------------------+--------------------+----------------------+------------------+------------------------------------+-------------------+-----------------------+-----------------+------------------+----------------+----------+
only showing top 20 rows
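The post does not preserve the code used for the count in step 5, so the following is only a minimal sketch of the idea in plain Python, not the author's Spark code. It compares the total number of records with the number of distinct records, using a few (readName, start, mismatchingPositions) values copied from the appendix table. The function name duplicate_report and the choice of these three columns are assumptions made for illustration. On a Spark DataFrame the analogous comparison would be count() against distinct().count().

```python
from collections import Counter

# (readName, start, mismatchingPositions) copied from seven appendix rows.
ROWS = [
    ("chr1-6811129", 14809719, "34T46C18"),
    ("chr1-6811128", 80256345, "100"),
    ("chr1-6811126", 173354132, "29T70"),
    ("chr1-6811125", 67191506, "100"),
    ("chr1-6811124", 143209655, "31A44C23"),
    ("chr1-6811124", 125183587, "31A0A43C23"),
    ("chr1-6811124", 143211045, "27A3A0A43C23"),
]


def duplicate_report(rows):
    """Return (total, distinct, multi) for a list of record tuples.

    total    -- number of records
    distinct -- number of unique records (whole-tuple equality)
    multi    -- read names that occur in more than one record
    """
    total = len(rows)
    distinct = len(set(rows))
    per_read = Counter(name for name, _start, _md in rows)
    multi = {name: n for name, n in per_read.items() if n > 1}
    return total, distinct, multi


if __name__ == "__main__":
    total, distinct, multi = duplicate_report(ROWS)
    print("total records:   ", total)
    print("distinct records:", distinct)
    print("reads with several alignments:", multi)
```

In this sample the total equals the distinct count, so no record is duplicated, even though the read chr1-6811124 appears several times: each of its rows is a different alignment (one primary, the others secondary) at a different start position.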