650 / 2019-04-29 18:46:53
Using PacBio Isoform Sequencing data to improve the de novo assembly of RNA-seq data
PacBio, RNA-seq, Transcriptome
Abstract accepted
Chi-Fa Huang / Academia Sinica
De novo assembly of RNA-seq data provides a means to study transcriptomes without a reference genome sequence. However, assembling short reads is highly challenging because of the complexity of the transcriptome. To address this, we sequenced the leaf transcriptome of Gynandropsis gynandra, a species related to the model plant Arabidopsis thaliana, with both PacBio Iso-seq and Illumina RNA-seq, and established a pipeline to improve the short-read de novo assembly. First, the longest transcripts were selected from the PacBio data as representative transcripts, and the CLC Genomics Workbench was used to assemble the RNA-seq reads. We then predicted CDSs from the PacBio transcripts and CLC contigs and collected those that each cover more than 90% of the length of their target Arabidopsis gene into a dataset, which we refer to as the full-length CDS dataset. Second, the RNA-seq reads that did not map to the full-length CDSs were assembled de novo, and the newly recovered full-length CDSs were added to the full-length CDS dataset. Third, the remaining PacBio Iso-seq transcripts and CLC contigs were assembled with CAP3. Finally, we constructed a new G. gynandra CDS dataset by combining the CAP3 results with the full-length CDS dataset and removing redundant transcripts. With these steps, we reconstructed over 80% of the CDSs in the G. gynandra leaf transcriptome at full length, which significantly improves on the result obtained by assembling the RNA-seq data alone with current assembly tools.
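The two selection rules in the first step (longest transcript as representative, and the more-than-90% coverage cutoff for full-length CDSs) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the names `CdsHit`, `pick_longest` and `select_full_length`, the grouping of PacBio transcripts into clusters, and the way coverage is computed (aligned target positions divided by target length) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdsHit:
    """One predicted CDS and its best Arabidopsis match (hypothetical layout)."""
    cds_id: str       # ID of a CDS predicted from a PacBio transcript or CLC contig
    target_gene: str  # matched Arabidopsis gene
    aligned_len: int  # number of target positions covered by the alignment
    target_len: int   # full length of the Arabidopsis CDS


def pick_longest(clusters):
    """Return, for each cluster, the ID of its longest transcript.

    `clusters` maps a cluster ID to a list of (transcript_id, sequence) pairs.
    How transcripts are grouped into clusters is assumed, not taken from the abstract.
    """
    return {
        cluster_id: max(members, key=lambda m: len(m[1]))[0]
        for cluster_id, members in clusters.items()
    }


def select_full_length(hits, min_cov=0.9):
    """Keep hits whose coverage of the target gene is strictly above min_cov."""
    return [h for h in hits if h.aligned_len / h.target_len > min_cov]


if __name__ == "__main__":
    clusters = {"c1": [("pb_a", "ATGAAA"), ("pb_b", "ATGAAATTTGGG")]}
    print(pick_longest(clusters))

    hits = [
        CdsHit("pb_1", "AT1G01010", 950, 1000),
        CdsHit("clc_7", "AT1G01020", 500, 1000),
    ]
    print([h.cds_id for h in select_full_length(hits)])
```

In a real pipeline the coverage values would come from an alignment of the predicted CDSs against Arabidopsis genes; the abstract does not name the tool used for that step, so none is assumed here.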
Important dates
  • Conference dates: June 16 to June 21, 2019
  • Initial draft submission deadline: May 1, 2019
  • Registration deadline: June 21, 2019