Question: Trying to upload my own reference genome for Bowtie2 and BWA, having trouble with FTP in FileZilla
hafsa2009 wrote (8 months ago):

Hello there, I am new to Galaxy, and so far I'm loving all of the features. As a final-year university student, I'm comparing how Bowtie2 and BWA align genomes. Here is the problem: I'm trying to add my own reference genome, which is only 20 base pairs long and looks like this:

AGCTTTTCATTCTTCTGACT

I have made it a FASTA file and downloaded FileZilla, then used usegalaxy.org as the host name along with my Galaxy login credentials. The software reports 'cannot connect to server'. It's very infuriating, as I have been trying to resolve this issue for hours with no luck. Using the provided E. coli genomes works against my reads, but I would like to use the sequence provided above as the reference. Is there any way I can load the reference genome above without going through all the hassle of FileZilla? I'm using Windows 10. I would really appreciate your response. Thank you in advance.

Tags: rna-seq, bwa, alignment, bowtie

I've been frustrated by the server issue for several days as well... I hope the Galaxy team can help.

Reply written 8 months ago by wxh810

Have you had any luck so far? I'm still not able to connect to the server.

Reply written 8 months ago by hafsa2009

Not yet....T^T......

Reply written 8 months ago by wxh810
Jennifer Hillman Jackson, United States, wrote (8 months ago):

Hello,

I just connected to Galaxy Main (https://usegalaxy.org, FTP server: usegalaxy.org) with FileZilla and transferred a BAM dataset, and all went OK. I used all the FileZilla default settings and accepted the security certificate.

For larger uploads, for right now, the server may disconnect and reconnect, requiring that the security certificate be re-accepted before the transfer resumes. This can occur several times during longer transfers, and we are working to improve that as an active priority.

If your data are under 2 GB, the option to browse and load local files is available, which might be a good choice for a small FASTA file. If it is just a single short sequence, you could even copy and paste it into the Upload tool as another alternative. Make sure it is formatted correctly with a title line (see https://galaxyproject.org/learn/datatypes/#fasta). Hosting data at a public site and loading by URL is also always an option (see https://galaxyproject.org/support/loading-data/).

Anyone having trouble with FileZilla/FTP should double-check their settings against the usage described here: https://galaxyproject.org/ftp-upload/. Then confirm that there is no firewall on the network you are connecting from (check with your local admins). Recent Q&A on the topic: https://lists.galaxyproject.org/pipermail/galaxy-dev/2018-March/026245.html

Thanks! Jen, Galaxy team

hafsa2009 wrote (8 months ago):

Thank you for your reply. I have followed the guidelines for making a FASTA file, i.e. putting the '<' and a unique identifier such as sequence1, then the reference genome, and uploading it as a FASTA file by pasting it. When I ran Bowtie2, this is the error message I received:

Could not display BAM file, error was:
file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False

The BWA aligner prints the outcome in SAM format, which is what I am looking for, but the POS is always 0. I'm guessing that means the reads did not map?

QNAME   FLAG    RNAME   POS MAPQ    CIGAR   MRNM    MPOS    ISIZE   SEQ QUAL    OPT
@HD VN:1.3 SO:coordinate
@SQ SN:sequence1 LN:0
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa samse localref.fa first.sai /jetstream/scratch0/main/jobs/18913641/inputs/dataset_24305944.dat
gi|110640213|ref|NC_008253.1|_418_952_1:0:0_1:0:0_0 4   *   0   0   *   *   0   0   TCTTG   22222   
gi|110640213|ref|NC_008253.1|_31_476_0:0:0_0:0:0_1  4   *   0   0   *   *   0   0   TTTTG   22222   
gi|110640213|ref|NC_008253.1|_210_743_2:0:0_1:1:0_2 4   *   0   0   *   *   0   0   CTTGA   22222   
gi|110640213|ref|NC_008253.1|_239_759_1:0:0_2:1:0_3 4   *   0   0   *   *   0   0   ACCTT   22222   
gi|110640213|ref|NC_008253.1|_334_899_2:0:0_1:0:0_4 4   *   0   0   *   *   0   0   AGTTC   22222   
gi|110640213|ref|NC_008253.1|_330_843_1:0:0_2:0:0_5 4   *   0   0   *   *   0   0   CTTCT   22222   
gi|110640213|ref|NC_008253.1|_182_657_3:0:0_0:0:0_6 4   *   0   0   *   *   0   0   ACTTT   22222   
gi|110640213|ref|NC_008253.1|_400_815_0:0:0_2:0:0_7 4   *   0   0   *   *   0   0   TCTTT   22222   
gi|110640213|ref|NC_008253.1|_388_857_1:0:0_2:0:0_8 4   *   0   0   *   *   0   0   TTACT   22222   
gi|110640213|ref|NC_008253.1|_369_908_0:0:0_0:0:0_9 4   *   0   0   *   *   0   0   TCTTC   22222   
gi|110640213|ref|NC_008253.1|_353_846_3:0:0_0:0:0_a 4   *   0   0   *   *   0   0   TTACT   22222   
gi|110640213|ref|NC_008253.1|_488_915_0:0:0_1:0:0_b 4   *   0   0   *   *   0   0   AAATT   22222   
gi|110640213|ref|NC_008253.1|_295_724_0:0:0_0:0:0_c 4   *   0   0   *   *   0   0   GGCTT   22222   
gi|110640213|ref|NC_008253.1|_293_822_2:0:0_1:0:0_e 4   *   0   0   *   *   0   0   TCGAC   22222   
gi|110640213|ref|NC_008253.1|_211_590_1:0:0_2:0:0_f 4   *   0   0   *   *   0   0   TATTG   22222   
gi|110640213|ref|NC_008253.1|_255_785_2:0:0_0:1:0_10    4   *   0   0   *   *   0   0   GGTTT   22222   
gi|110640213|ref|NC_008253.1|_9_480_2:0:0_0:0:0_11  4   *   0   0   *   *   0   0   CATTT   22222   
gi|110640213|ref|NC_008253.1|_366_839_1:0:0_2:0:0_12    4   *   0   0   *   *   0   0   GCTGT   22222   
gi|110640213|ref|NC_008253.1|_510_992_3:0:0_2:0:0_13    4   *   0   0   *   *   0   0   ACTTC   22222

Thanks, Hafsa

The custom genome format should be like this. I am not sure if "<" was a typo or not.

>sequence1
AGCTTTTCATTCTTCTGACT
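
As a rough illustration of that layout, here is a minimal Python sketch that checks pasted text before it goes into the Upload tool. The function name check_fasta is hypothetical (it is not part of Galaxy, BWA, or Bowtie2), and limiting the sequence to A, C, G, T, and N is an assumption made for this example; real FASTA files may also contain other IUPAC codes.

```python
def check_fasta(text):
    """Return a list of problems found in FASTA-formatted text (empty if none).

    Hypothetical helper for illustration only. Assumes a nucleotide
    sequence restricted to A, C, G, T and N (upper or lower case).
    """
    # Drop blank lines and surrounding whitespace before checking.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["input is empty"]

    problems = []
    if not lines[0].startswith(">"):
        problems.append("first line must be a title line starting with '>'")

    allowed = set("ACGTNacgtn")
    seq_len = 0
    for number, ln in enumerate(lines, start=1):
        if ln.startswith(">"):
            continue  # title line, not sequence
        bad = sorted(set(ln) - allowed)
        if bad:
            problems.append(
                "line %d contains unexpected characters: %s" % (number, "".join(bad))
            )
        else:
            seq_len += len(ln)

    if seq_len == 0 and not problems:
        problems.append("no sequence found after the title line")
    return problems


if __name__ == "__main__":
    print(check_fasta(">sequence1\nAGCTTTTCATTCTTCTGACT\n"))
    print(check_fasta("<sequence1\nAGCTTTTCATTCTTCTGACT\n"))
```

An empty list means the text passed these basic checks; it does not guarantee that a mapper will accept or index the file.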

This is a very short sequence to map against and the "reads" are also very short. It looks like you are not getting any hits (FLAG "4" means not mapped).
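
To make the FLAG check concrete, here is a small Python sketch that counts mapped and unmapped records in SAM text by testing the 0x4 bit of the FLAG column. It also collects the reference lengths from the @SQ header lines, since the header in the post above reports LN:0 for sequence1, which may be worth a second look. The function name summarize_sam is hypothetical, and splitting on any whitespace (instead of strictly on tabs) is an assumption made so that text pasted from a web page still parses.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def summarize_sam(sam_text):
    """Return (reference_lengths, mapped_count, unmapped_count) for SAM text.

    Hypothetical helper for illustration only. Fields are split on any
    whitespace, which is an assumption for pasted text; real SAM files
    are tab-delimited.
    """
    ref_lengths = {}
    mapped = 0
    unmapped = 0
    for line in sam_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.startswith("@"):
            # Header line: only @SQ (reference sequence) entries are used here.
            if fields[0] == "@SQ":
                tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
                if "SN" in tags and "LN" in tags:
                    ref_lengths[tags["SN"]] = int(tags["LN"])
            continue
        # Skip anything that is not an alignment record, such as a
        # column-title row where the FLAG field is not a number.
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        if int(fields[1]) & UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
    return ref_lengths, mapped, unmapped


if __name__ == "__main__":
    example = (
        "@SQ\tSN:sequence1\tLN:0\n"
        "read1\t4\t*\t0\t0\t*\t*\t0\t0\tTCTTG\t22222\n"
    )
    print(summarize_sam(example))
```

If every record lands in the unmapped count, as in the output posted above, then no read was placed on the reference.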

BWA isn't really designed to do this type of mapping, nor is Bowtie2 (which just happened to fail when there were no hits to report at all). Both expect NGS fastq reads as input and genomes/transcriptomes as a target.

If you want to compare the two tools, create test data with longer sequence content that mimics real usage.

Reply written 8 months ago by Jennifer Hillman Jackson (modified 8 months ago)