Question: Uploading Data Through the URL Method
Xiangming Ding (United States) wrote 7.1 years ago:
Hi Galaxy, I am a new user and ran into a problem I could not find in the FAQ. I wanted to upload data from the DDBJ DRA dataset to Galaxy through the URL method. The file is around 800 MB, but after uploading, the FASTQ file was only around 2 MB. Is it possible to upload a large file to Galaxy through the URL method, or should I download the file to my PC and then upload it to Galaxy through FTP? Thanks, Xiangming
galaxy • 2.2k views
modified 7.1 years ago by Jennifer Hillman Jackson • written 7.1 years ago by Xiangming Ding
Jennifer Hillman Jackson (United States) wrote 7.1 years ago:
Hello Xiangming, Data files can be loaded using a URL on the "Get Data => Upload" form; both FTP and HTTP connections are supported, as briefly described on that form. If you are still having issues, there may be a problem with the file compression or the connection. Downloading locally and then using Galaxy's FTP upload function is certainly an option: http://wiki.g2.bx.psu.edu/Learn/Upload%20via%20FTP Best, Jen, Galaxy team -- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org/wiki/Support
written 7.1 years ago by Jennifer Hillman Jackson
Hi, I am using the Galaxy public server. Is there a way to access output files (via FTP, perhaps) so I can bulk-download them to my computer? I am over my quota and want to get data off of Galaxy, but would prefer not to do this one file at a time. Similarly, is there a way to access a directory (via unix, FTP, etc.) to rename files quickly while they are on Galaxy? Renaming each output file within Galaxy (e.g. the multiple files output by cuffdiff) is very inefficient and time-consuming. Thanks. Rich
written 7.1 years ago by Richard Mark White
Hi Rich, This is a good question! Many people have been asking about this. To download data with a unix command-line method, please try wget or curl, for example:

unix% wget 'url_for_the_dataset'
unix% wget 'url_for_the_history'

To capture the URL for a dataset, right-click on the disk icon for the dataset and select "Copy link location". To capture the URL for an entire history, select "Options -> Export to File"; the middle panel will display a link. A downloaded history can be loaded into a local Galaxy instance, where the datasets can be managed (copied/renamed) or the histories archived. Hopefully this helps you and others who are managing larger datasets and histories. Best, Jen, Galaxy team -- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org/wiki/Support
written 7.1 years ago by Jennifer Hillman Jackson
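Once datasets are downloaded locally with wget or curl, the bulk renaming Rich asks about can be scripted in the shell. A minimal sketch, assuming "Galaxy<N>-[name]" style download filenames; the two files below are invented stand-ins for cuffdiff outputs:

```shell
# Create fake downloads to rename (the names are made up for illustration).
mkdir -p downloads
touch 'downloads/Galaxy12-[Cuffdiff_gene_exp].tabular' \
      'downloads/Galaxy13-[Cuffdiff_isoform_exp].tabular'

# Strip the "Galaxy<N>-[...]" wrapper, keeping the bracketed name + extension.
for f in downloads/Galaxy*.tabular; do
    new=$(echo "$f" | sed 's/Galaxy[0-9]*-\[\(.*\)\]/\1/')
    mv -- "$f" "$new"
done
ls downloads
```

The same loop pattern works for any consistent naming scheme; only the sed expression needs adjusting.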
Hi, I was nearing my disk quota (at 97%), so I deleted a large number of datasets using "delete permanently", but my usage did not go down at all. Is there a delay in this happening, or is there some way to purge the files? Richard
written 7.0 years ago by Richard Mark White
Hi Richard, Yes, it takes a short time for the UI counts to update. If you deleted permanently, then the result should be what you expected. Should the quota count remain high by tomorrow, that would point to an issue with lingering data counting toward the quota. Places to search for unexplained disk use:

1 - Older pre-quota "deleted" datasets that were not permanently deleted. You can check for these in View Histories -> advanced -> deleted. The far right "Status" column will note deleted vs. permanently deleted.

2 - Shared histories can count towards a quota. If a shared history is not needed, or only portions of it are, copy out what you want to use and ask the user who shared the data to unshare you, so you don't get stuck with the entire history in your quota. Shared histories/data and quotas are somewhat tricky to tune, and better solutions may be developed as the details are worked out, but this is the current implementation.

A good feature to know about: an imported dataset from a public Data Library never counts towards your quota (if left unmodified). You have probably seen this, but for others reading the thread, this wiki has many details and tips for managing data: http://galaxyproject.org/wiki/Learn/Managing%20Datasets

One last comment - it would be very helpful for us if questions were sent with the mailing list as a "to" recipient, so that our ticket tracker picks them up. Hopefully this helps! Please feel free to ask if you need more help or if the disk size is not what you expect after the counts refresh. Best, Jen, Galaxy team -- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org/wiki/Support
written 7.0 years ago by Jennifer Hillman Jackson
Hi, Thanks for the info; I did what you suggested, but still no luck. I deleted everything, and when I add up the data totals in my active histories (I have nothing shared) it comes to 376 GB, but I am still showing 100% usage. Any ideas? Rich
written 7.0 years ago by Richard Mark White
Hi Rich, This sounds correct. The quota on Main is set to 250 GB, so anything at or over that amount will show as 100% usage. http://galaxyproject.org/wiki/Main#User_data_and_job_quotas If I misunderstood your question, please provide more details. Thanks! Jen, Galaxy team -- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org/wiki/Support
written 7.0 years ago by Jennifer Hillman Jackson
Hi Richard, It can take a bit if you delete a large amount of data at once. Did your usage eventually decrease? --nate
written 7.0 years ago by Nate Coraor
Yup... took a while, but eventually resolved. Thanks. Rich
written 7.0 years ago by Richard Mark White
Hi, My sequencing core returns FASTQ files to me in *.txt.tar.gz format. When I upload this to Galaxy, it unzips, but the file is apparently still tarred and cannot be read. Is it possible to upload this format, or do I need to untar and unzip it first (which is less than ideal)? Rich
written 7.0 years ago by Richard Mark White
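Since Galaxy uncompresses a plain .gz on upload but does not unpack tar archives, one workaround is to repack locally: untar once, then gzip the bare file. A sketch that simulates the core's delivery first ('s_1_sequence.txt' is a made-up file name):

```shell
# Simulate the core's delivery: a FASTQ-in-.txt file inside a tar.gz.
printf '@r1\nACGT\n+\nIIII\n' > s_1_sequence.txt
tar czf reads.txt.tar.gz s_1_sequence.txt
rm s_1_sequence.txt

# Workaround: untar locally, then gzip the bare file before uploading.
tar xzf reads.txt.tar.gz
gzip -f s_1_sequence.txt          # yields s_1_sequence.txt.gz
ls s_1_sequence.txt.gz
```

The resulting .gz file uploads as a single compressed dataset that Galaxy can uncompress on its own.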
I have been unable to access Galaxy for the past several hours. Are others having the same issue? Rich
written 7.0 years ago by Richard Mark White
Hi Rich, Our core router has crashed, we're working on the problem and hope to have it fixed within the next few hours. Sorry for the inconvenience. --nate
written 7.0 years ago by Nate Coraor
Hi, I have generated a transcript file using cufflinks for the human (hg19) and zebrafish (Zv9) assemblies. When I try to display the cufflinks "assembled transcripts" in UCSC, I get an error in the UCSC browser and it won't display the transcripts. Human: "GFF/GTF group NM_005638 on chrX+, this line is on chrY+, all group members must be on same seq and strand". Zebrafish: "GFF/GTF group vapb on chr6+, this line is on chr7-, all group members must be on same seq and strand". Any ideas? Rich
written 6.9 years ago by Richard Mark White
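This UCSC error generally means two records share the same group (gene_id) but fall on different chromosomes or strands. The offending groups can be listed locally before attempting the display; a sketch with invented records, where "NM_005638" deliberately reproduces the multi-chromosome case from the error message:

```shell
# Build a tiny cufflinks-style GTF (tab-delimited, 9 columns).
printf 'chrX\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "NM_005638";\n'  > transcripts.gtf
printf 'chrY\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "NM_005638";\n' >> transcripts.gtf
printf 'chr1\tCufflinks\texon\t500\t600\t.\t+\t.\tgene_id "good_gene";\n' >> transcripts.gtf

# Report every gene_id seen with more than one chromosome/strand combination.
awk -F'\t' '{
    match($9, /gene_id "[^"]*"/)
    g = substr($9, RSTART + 9, RLENGTH - 10)   # text between the quotes
    key = $1 $7                                # chromosome + strand
    if ((g in seen) && seen[g] != key) bad[g] = 1
    seen[g] = key
}
END { for (g in bad) print g }' transcripts.gtf
```

Any gene_id this prints would need to be renamed or filtered out of the GTF before UCSC will display the track.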
Hi, Is anyone else having trouble connecting to main.g2.bx.psu.edu for FTP uploads? I have not been able to connect since yesterday. Rich
written 6.7 years ago by Richard Mark White
Hi Rich, It's back up now. --nate
written 6.7 years ago by Nate Coraor
Hi, the file name is spr097786.fastq.bz2. After upload it showed as spr097786.fastq, and it only contained around 5000 sequence reads. I also tried to upload through FTP: I downloaded the file to my computer and then uploaded it to Galaxy's FTP site. The total 800 MB file was uploaded to the FTP site successfully, but when I transferred the file to the history I hit the same problem - only 5000 sequence reads were moved to the history. I don't know whether it is because of the bz2 file extension, or whether I should try another compressed file extension. Xiangming
written 7.1 years ago by Xiangming Ding
Hello Xiangming, When you uncompress the archive locally, does it contain a single file with more than 5000 reads? The consistent results and the even number of reads (5000) may mean that the archive contains more than one file. Currently, Galaxy will only load the first file in an archive. Hopefully this helps, or you have already found the solution. Take care, Jen, Galaxy team -- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org/wiki/Support
written 7.1 years ago by Jennifer Hillman Jackson
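A quick local check for the situation Jen describes: count the reads inside the .bz2 before uploading, without extracting it to disk. A sketch with a tiny made-up FASTQ standing in for spr097786.fastq.bz2:

```shell
# Build a two-read FASTQ and compress it the same way the DRA file is.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n' > sample.fastq
bzip2 -f sample.fastq                       # produces sample.fastq.bz2

# FASTQ uses 4 lines per record, so reads = total lines / 4.
lines=$(bzcat sample.fastq.bz2 | wc -l)
echo "reads: $((lines / 4))"
```

If this count matches what Galaxy reports (~5000) rather than the expected total, the problem is in the file itself, not in the upload.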

Hi Jen

I have a local galaxy instance installed on a cluster.

I've been trying to upload data using URL but kept getting this error: "unable to fetch <url of data> [Errno socket error] [Errno 110] connection timed out"

I've tried with and without file compression format and still gives me that error.

Am I supposed to modify or edit a config file?

Please advise.

Thanks.

Best regards

modified 4.2 years ago • written 4.2 years ago by epic
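An "[Errno 110] connection timed out" from a cluster-hosted Galaxy usually means the Galaxy server itself cannot reach the URL, which is common when cluster nodes sit behind a proxy or firewall. A diagnostic sketch to run on the Galaxy host; the URL is a placeholder, and the assumption that Galaxy's URL fetch honours the standard proxy variables depends on your setup:

```shell
# 1. Check the standard proxy variables that command-line tools (and,
#    typically, Python's urllib) honour; "<unset>" flags a missing one.
printf 'http_proxy=%s\n'  "${http_proxy-<unset>}"
printf 'https_proxy=%s\n' "${https_proxy-<unset>}"
printf 'ftp_proxy=%s\n'   "${ftp_proxy-<unset>}"

# 2. Try the same fetch outside Galaxy to separate a network problem from
#    a Galaxy configuration problem (uncomment; DATA_URL is hypothetical):
# DATA_URL='ftp://example.org/path/to/data.fastq.gz'
# curl -sSI --max-time 15 "$DATA_URL"
```

If the curl test also times out from the Galaxy host, the fix lies with the cluster's network/proxy configuration rather than with Galaxy.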
Simao Lee wrote 7.1 years ago:
You can download DRA files directly by FTP to Galaxy. Just paste the FTP address directly into the file box when using "Upload from my computer". Best, Simon
written 7.1 years ago by Simao Lee
Powered by Biostar version 16.09