Genomic Data Processing 49: cloud-scale-bwamem Runs Successfully
Published: 2021-03-10 19:30:40 Category: Big Data Source: compiled from the web
1. First generate the data with ART (see the previous post).

2. Upload the FASTQ file to HDFS:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ spark-submit --class cs.ucla.edu.bwaspark.BWAMEMSpark --master local[2] /home/hadoop/xubo/tools/cloud-scale-bwamem-0.2.1/target/cloud-scale-bwamem-0.2.0-assembly.jar upload-fastq 0 1 fastq/G38L100c1Nhs20.fastq /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq
command: upload-fastq
Map('isPairEnd -> 0, 'filePartNum -> 1, 'inFilePath1 -> fastq/G38L100c1Nhs20.fastq, 'outFilePath -> /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq)
Upload FASTQ command line arguments: 0 1 fastq/G38L100c1Nhs20.fastq /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq 250000
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Upload FASTQ to HDFS Finished!!!
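The upload command in step 2 takes four positional arguments after `upload-fastq`; the `Map(...)` line echoed by the tool names them isPairEnd, filePartNum, inFilePath1 and outFilePath. As a rough sketch, the command can be assembled from named values and printed before it is actually run. The function name `build_upload_cmd` and the variable names are illustrative only and are not part of cloud-scale-bwamem.

```shell
#!/usr/bin/env bash
# Sketch: assemble the upload-fastq command from named values and print it.
# The positional order follows the Map(...) echo in step 2.
# build_upload_cmd and the variable names are hypothetical helpers.

JAR=/home/hadoop/xubo/tools/cloud-scale-bwamem-0.2.1/target/cloud-scale-bwamem-0.2.0-assembly.jar

build_upload_cmd() {
  local is_pair_end="$1"    # 0 = single-end, as in the post
  local file_part_num="$2"  # number of file parts
  local in_fastq="$3"       # local FASTQ path
  local out_hdfs="$4"       # destination path on HDFS
  # printf repeats the format for every argument, so this prints the
  # words separated by single spaces.
  printf '%s ' spark-submit \
    --class cs.ucla.edu.bwaspark.BWAMEMSpark \
    --master 'local[2]' \
    "$JAR" upload-fastq \
    "$is_pair_end" "$file_part_num" "$in_fastq"
  printf '%s\n' "$out_hdfs"
}

# Dry run: print the single-end upload command used in step 2.
build_upload_cmd 0 1 fastq/G38L100c1Nhs20.fastq \
  /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq
```

Printing the command first makes it easy to compare against the echoed `Map(...)` values before submitting the job.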
3. Run the alignment:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ spark-submit --executor-memory 2g --class cs.ucla.edu.bwaspark.BWAMEMSpark --total-executor-cores 2 --master local[2] --conf spark.driver.host=**MasterIP** --conf spark.driver.cores=2 --conf spark.driver.maxResultSize=2g --conf spark.storage.memoryFraction=0.7 --conf spark.akka.threads=2 --conf spark.akka.frameSize=1024 /home/hadoop/xubo/tools/cloud-scale-bwamem-0.2.1/target/cloud-scale-bwamem-0.2.0-assembly.jar cs-bwamem -bfn 1 -bPSW 1 -sbatch 10 -bPSWJNI 1 -oChoice 2 -oPath hdfs://**MasterIP**:9000/xubo/11.adam -localRef 1 -isSWExtBatched 1 0 GRCH38BWAindex/GRCH38chr1L3556522.fasta /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq
command: cs-bwamem
Map('isPSWJNI -> 1, 'localRef -> 1, 'batchedFolderNum -> 1, 'isPSWBatched -> 1, 'subBatchSize -> 10, 'inFASTQPath -> /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq, 'inFASTAPath -> GRCH38BWAindex/GRCH38chr1L3556522.fasta, 'outputPath -> hdfs://**MasterIP**:9000/xubo/11.adam, 'isSWExtBatched -> 1, 'isPairEnd -> 0, 'outputChoice -> 2)
CS- BWAMEM command line arguments: false GRCH38BWAindex/GRCH38chr1L3556522.fasta /xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq 1 true 10 true ./target/jniNative.so 2 hdfs://**MasterIP**:9000/xubo/11.adam
HDFS master: hdfs://Master:9000
Input HDFS folder number: 1
Head line: @RG ID:foo SM:bar
Read Group ID: foo
Load Index Files
Load BWA-MEM options
Output choice: 2
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
[WARNING] Avro: Invalid default for field comment: null not a "bytes"
CS-BWAMEM Finished!!!
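The alignment command in step 3 uses the **MasterIP** placeholder in two places (`spark.driver.host` and `-oPath`), and its short flags appear to correspond to the option names echoed in the `Map(...)` line. The sketch below supplies the address once and prints the resulting command. The function name `build_align_cmd`, the variable names and the example address 192.0.2.10 (a documentation-only address) are assumptions for illustration. The flag-to-name pairing in the comments is inferred from the echoed names and is not taken from the tool's documentation.

```shell
#!/usr/bin/env bash
# Sketch: build the cs-bwamem alignment command with the master address
# supplied once. build_align_cmd and the variables below are hypothetical.

JAR=/home/hadoop/xubo/tools/cloud-scale-bwamem-0.2.1/target/cloud-scale-bwamem-0.2.0-assembly.jar
REF=GRCH38BWAindex/GRCH38chr1L3556522.fasta
FASTQ=/xubo/data/alignment/cs-bwamem/fastq/g38L100c1Nhs20upload.fastq

build_align_cmd() {
  local master_ip="$1"
  local out_path="hdfs://${master_ip}:9000/xubo/11.adam"
  # Flag -> echoed option name (inferred from the names, not documented here):
  #   -bfn     batchedFolderNum    -bPSW    isPSWBatched
  #   -sbatch  subBatchSize        -bPSWJNI isPSWJNI
  #   -oChoice outputChoice        -oPath   outputPath
  # The trailing positionals are isPairEnd, inFASTAPath, inFASTQPath.
  printf '%s ' spark-submit --executor-memory 2g \
    --class cs.ucla.edu.bwaspark.BWAMEMSpark \
    --total-executor-cores 2 --master 'local[2]' \
    --conf "spark.driver.host=${master_ip}" \
    --conf spark.driver.cores=2 \
    --conf spark.driver.maxResultSize=2g \
    --conf spark.storage.memoryFraction=0.7 \
    --conf spark.akka.threads=2 \
    --conf spark.akka.frameSize=1024 \
    "$JAR" cs-bwamem \
    -bfn 1 -bPSW 1 -sbatch 10 -bPSWJNI 1 \
    -oChoice 2 -oPath "$out_path" \
    -localRef 1 -isSWExtBatched 1 \
    0 "$REF"
  printf '%s\n' "$FASTQ"
}

# Dry run with a placeholder address; substitute your own master's IP.
build_align_cmd 192.0.2.10
```

Keeping the address in one variable avoids the case where `spark.driver.host` and the HDFS output URI end up pointing at different hosts.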
Parquet reader messages logged at the end of the run:

Jun 3, 2016 11:32:26 AM INFO: parquet.hadoop.ParquetInputFormat: Total input paths to process : 1
Jun 3, 2016 11:32:27 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jun 3, 2016 11:32:27 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 1 records.
Jun 3, 2016 11:32:27 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jun 3, 2016 11:32:27 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 17 ms. row count = 1

Note: replace **MasterIP** with the address of your own master node.

4. Inspect the ADAM file:

package org.bdgenomics.avocado.cli

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.bdgenomics.adam.rdd.ADAMContext._

/**
  * Created by xubo on 2016/5/27.
  * Download from HDFS the data that has been aligned by avocado.
  * run: success
  */
object parquetRead2csbwamem {
  def main(args: Array[String]): Unit = {
    // Run locally with 4 threads; the app name is the object name without the trailing '$'.
    val conf = new SparkConf().setMaster("local[4]").setAppName(this.getClass().getSimpleName().filter(!_.equals('$')))
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    println("start:")
    // Replace **MasterIp** with your master's address and point the path at your own output.
    val file = "hdfs://**MasterIp**:9000/xubo/14.adam/0"
    // Read the Parquet-backed ADAM records as a DataFrame.
    val df3 = sqlContext.read.option("mergeSchema", "true").parquet(file)
    // df3.printSchema()
    df3.show()
    println("end")
    sc.stop()
  }
}

Result:
df3.show() prints a single row; its 40 columns are listed here one per line:

contig: [chr1,248956422,n...
start: 225496693
end: 225496793
mapq: 60
readName: chr1-1 RG ID:foo ...
sequence: CATATTTACCAATTAAA...
qual: @C@D@FFDFHHHHIJ.J...
cigar: 100M
basesTrimmedFromStart: 0
basesTrimmedFromEnd: 0
readPaired: false
properPair: false
readMapped: true
mateMapped: false
firstOfPair: false
secondOfPair: false
failedVendorQualityChecks: false
duplicateRead: false
readNegativeStrand: false
mateNegativeStrand: false
primaryAlignment: true
secondaryAlignment: false
supplementaryAlignment: false
mismatchingPositions: 61A38
origQual: null
attributes: NM:i:1 AS:i:95 XS...
recordGroupName: foo
recordGroupSequencingCenter: null
recordGroupDescription: null
recordGroupRunDateEpoch: null
recordGroupFlowOrder: null
recordGroupKeySequence: null
recordGroupLibrary: null
recordGroupPredictedMedianInsertSize: null
recordGroupPlatform: null
recordGroupPlatformUnit: null
recordGroupSample: bar
mateAlignmentStart: null
mateAlignmentEnd: null
mateContig: null

end
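The fields of the result row can be checked against each other with a little arithmetic. The sketch below assumes that the `end` coordinate is exclusive and that cs-bwamem keeps BWA-MEM's default scoring (match +1, mismatch penalty 4). Both are assumptions, not something the post states. The MD parsing only handles a single-mismatch string such as 61A38.

```shell
#!/usr/bin/env bash
# Sketch: cross-check values copied from the result row.
start=225496693
end=225496793
md=61A38        # mismatchingPositions
read_len=100    # cigar 100M

# Reference span covered by the alignment (assumes an exclusive end).
span=$((end - start))

# Split the single-mismatch MD string into the match runs around the letter.
left=${md%%[A-Z]*}    # digits before the mismatched base
right=${md##*[A-Z]}   # digits after the mismatched base
md_len=$((left + 1 + right))

# Expected score with one mismatch under the assumed default scoring.
score=$(( (read_len - 1) * 1 - 4 ))

echo "span=$span md_len=$md_len score=$score"
```

With these assumptions the span and the MD length both come out equal to the 100-base read, and the computed score agrees with the AS:i:95 and NM:i:1 tags in the attributes column.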