Question: ref genome using Rsync
2.2 years ago, ChickenRNA50 wrote:

Hi, I am trying to download the chicken reference genome to my local instance of Galaxy using rsync. I am currently downloading only the allfasta data tables, and after more than 24 hours the job still says it is running. Is this normal? Is there a faster way to bring the reference genome into a local Galaxy instance to perform RNA-Seq analysis?

Thank you

2.2 years ago, Jennifer Hillman Jackson25k (United States) wrote:

Hello,

Instead of using the Rsync server, consider installing the genome with Data Managers (DMs), sourced from the Tool Shed (http://usegalaxy.org/toolshed). Install DMs like any other tool, using the Admin functions.

You'll need these DMs at a minimum. Run them first, in this order:

  • Fasta fetcher. There are two, and often both are needed. If the genome is not listed in the builds list in the Upload tool, use the one that creates a "dbkey".
  • SAM indexer
  • Picard indexer
  • 2bit indexer

Then get the DMs that create indexes for the tools you want to use. Run these after the others have completed for the best results.
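The ordering described above can be written down as a small checklist helper. This is only a sketch: the labels below are hypothetical shorthand for the Data Managers named in the list, not real Tool Shed repository names, and `bowtie2_indexer` stands in for whichever tool-specific DM you need.

```python
# Hypothetical shorthand labels for the Data Managers listed above.
# Real Tool Shed repository names differ; look them up in the Tool Shed.
BASE_ORDER = ["fasta_fetcher", "sam_indexer", "picard_indexer", "twobit_indexer"]


def plan_runs(tool_specific):
    """Return the base Data Managers in order, followed by tool-specific ones."""
    extras = [dm for dm in tool_specific if dm not in BASE_ORDER]
    return BASE_ORDER + extras


if __name__ == "__main__":
    # Print a numbered run plan for one genome.
    for step, dm in enumerate(plan_runs(["bowtie2_indexer"]), start=1):
        print(step, dm)
```

The point is simply that the tool-specific DMs come last, after the fasta and basic indexes exist.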

Thanks, Jen, Galaxy team

PS: I will look into the Rsync server issues in the meantime. Even so, using DMs directly is the best choice.


Thank you so much for your prompt response, Jennifer! Do I need different DMs for different tools, like Tophat? I am trying to set things up so that they are similar to usegalaxy.org, where the reference genome is available in a dropdown in the tools. Or do I have to set them up each time? Are there any tutorials you would suggest on this? I am fairly new to all this, so thank you for your patience and guidance.

written 2.2 years ago by ChickenRNA50

This is the help for Data Managers: https://wiki.galaxyproject.org/Admin/Tools/DataManagers

The idea is to load the genome. This can be any fasta file, including the same ones as on Galaxy Main, if you wish. The full name indicates the exact build. Find this information in the Upload tool or by clicking the pencil icon for any dataset (the list of genomes is included in both places), or you can add your own custom genome ("dbkey").

To index for tools, do the first steps (load the fasta, build the basic indexes), then proceed to the tool-specific DMs. At this time, the indexing has to be performed per genome. Running this kind of processing as a workflow is an enhancement the team is considering to make it all go more smoothly. You could also do the indexing using a script you create with the Galaxy API: https://wiki.galaxyproject.org/Develop/API.
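A minimal sketch of what such a script might start from, assuming Data Managers can be run like other tools through a POST to `/api/tools` with `history_id`, `tool_id`, and `inputs`, using an admin API key passed as the `key` query parameter. The server address, key, history ID, tool ID, and the `dbkey` input name are all placeholders; the real values come from your own instance and from the specific DM's form.

```python
import json
import urllib.parse
import urllib.request


def build_dm_request(galaxy_url, api_key, history_id, tool_id, inputs):
    """Build (without sending) a POST to /api/tools that would run one tool."""
    query = urllib.parse.urlencode({"key": api_key})
    payload = {"history_id": history_id, "tool_id": tool_id, "inputs": inputs}
    return urllib.request.Request(
        url=galaxy_url.rstrip("/") + "/api/tools?" + query,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_dm_request(
        "http://localhost:8080",   # assumed address of a local Galaxy
        "ADMIN_API_KEY",           # placeholder
        "HISTORY_ID",              # placeholder
        "DATA_MANAGER_TOOL_ID",    # placeholder; copy the real ID from your instance
        {"dbkey": "galGal4"},      # hypothetical input name
    )
    print(req.get_method(), req.full_url)
    # To actually submit the job: urllib.request.urlopen(req)
```

Looping this over a list of tool IDs, one genome at a time, would be one way to script the per-genome indexing.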

Once the indexes are created, they persist on your instance. For example, if Tophat indexes are created (with the Bowtie2 DM, using the option "Include Tophat indexes" on the form), they will be available to all users on that instance.
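If you want to check which genomes a data table currently lists, a short sketch like the following can read a `.loc` file. It assumes the tab-separated four-column layout commonly used for the all_fasta table (unique build ID, dbkey, display name, file path); the sample line and its path are made up for illustration, and the location of the `.loc` file on your instance will vary.

```python
def read_loc(text):
    """Parse tab-separated .loc content into dicts, skipping comments and blanks."""
    fields = ("value", "dbkey", "name", "path")
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) >= len(fields):
            rows.append(dict(zip(fields, parts)))
    return rows


# Made-up sample content; a real file would be read with open(...).read().
SAMPLE = (
    "#<unique_build_id>\t<dbkey>\t<display_name>\t<file_path>\n"
    "galGal4\tgalGal4\tChicken (galGal4)\t/data/galGal4/seq/galGal4.fa\n"
)

if __name__ == "__main__":
    for row in read_loc(SAMPLE):
        print(row["dbkey"], "->", row["path"])
```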

modified 2.2 years ago • written 2.2 years ago by Jennifer Hillman Jackson25k

Update: If the chicken genome(s) are named like galGal3, galGal4, etc., they are sourced from UCSC. This is a specific data source choice on the form of the fasta-fetching Data Manager.

If you are unsure about the source of any genome found at http://usegalaxy.org, a Google search for the build name will usually locate it, but please feel free to write back and we can help guide you.

modified 2.2 years ago • written 2.2 years ago by Jennifer Hillman Jackson25k

Thanks again, Jennifer. Is there a particular way of knowing which indexes specific tools use?

written 2.2 years ago by ChickenRNA50

The tool name and the Data Manager are usually named in a way that makes this clear. Is there one you are confused about?

written 2.2 years ago by Jennifer Hillman Jackson25k

No, I am just in the process of installing all the DMs that you mentioned. I am confused about setting up the indexes, but it may become clear once I get going.

written 2.2 years ago by ChickenRNA50

Jennifer, also, does installing these DMs take longer than installing tools from the Tool Shed? It is taking much longer in my case, so I am not sure whether this is normal or something is wrong on my end. Thanks

written 2.2 years ago by ChickenRNA50

Not that I have ever noticed, but different tools have different dependencies. The overall load on the Tool Shed can also be a factor as well as how many tools are loaded simultaneously. I suggest starting the install for those you want and then allowing them to complete. Once done, check the status for each to ensure all went as expected.

Update: I re-read your post. Do you mean that the actual jobs performing the indexing are long running? If so, these will consume about the same resources (memory, time, compute) as running the indexing command on the command line. Some indexes do take time. If any fail for lack of resources, there was probably not enough available to run the single job (sometimes) or the concurrent jobs. Just re-run those. I have had a few genomes consume a very large amount of memory, but these were generally very large and/or highly fragmented genomes. Adjusting the parameters usually helps, as does providing more resources (most often memory). If the parameters are unclear, examine the target tool's documentation. The manual/help should describe how indexes are best created for particular genome build types, along with recommended resources.
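For a sense of what the command-line equivalents look like, here is a sketch that only assembles the commands without running them. The file names are examples, `picard.jar` is assumed to be in the working directory, and the Picard arguments use the older `R=`/`O=` style; check each tool's own documentation for the options that suit your genome.

```python
import shlex


def index_commands(fasta, prefix):
    """Return stand-alone commands that build common indexes for one FASTA file."""
    return {
        "sam": ["samtools", "faidx", fasta],
        "2bit": ["faToTwoBit", fasta, prefix + ".2bit"],
        "picard": ["java", "-jar", "picard.jar", "CreateSequenceDictionary",
                   "R=" + fasta, "O=" + prefix + ".dict"],
        "bowtie2": ["bowtie2-build", fasta, prefix],
    }


if __name__ == "__main__":
    # Print each command as a shell-ready line; nothing is executed here.
    for name, cmd in index_commands("galGal4.fa", "galGal4").items():
        print(name + ":", " ".join(shlex.quote(part) for part in cmd))
```

Timing one of these by hand on the same machine gives a rough baseline for how long the matching Data Manager job should take.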

modified 2.2 years ago • written 2.2 years ago by Jennifer Hillman Jackson25k

Hi Jennifer, my problem was that it was taking a long time to install the DMs from the Tool Shed. I will follow your advice, stop them all, and install them one by one.

written 2.2 years ago by ChickenRNA50

You shouldn't need to install the tools one at a time. Just allow them to complete, and delete/reinstall any that fail.

written 2.2 years ago by Jennifer Hillman Jackson25k

Jennifer, sorry about all the basic and simple questions, but I am very new to all this. I have my local Galaxy running in a VM on Windows (I was going to dual boot into Linux to use Galaxy, but it worked in the VM). I selected all the DMs that you mentioned for installation using my toolshed, and it was taking several hours (when I download tools it usually doesn't take that long). So I am sure there is something wrong, but I am lost as to how to identify it in order to fix it. Thank you so much for all your help. I am learning loads, but the learning curve seems to be steep.

written 2.2 years ago by ChickenRNA50

I am not clear on what you mean by "my toolshed". Could you explain more if the rest of this does not help?

The DMs should be installed from the Main Tool Shed (hosted at http://usegalaxy.org/toolshed). This is the default Tool Shed accessed through the Admin install tools function on a local/cloud Galaxy obtained from http://getgalaxy.org (within a VM or not; Galaxy is not supported on Windows directly). Doing this ensures that the most current version of the DM that works with the most current Galaxy release will install and run correctly.

written 2.2 years ago by Jennifer Hillman Jackson25k

By "my toolshed", I meant the Tool Shed as accessed from my local instance of Galaxy when I download the DMs. When I install them and go to "monitor installing tool shed repositories", the status for the DMs is "cloning" or "installing dependent repositories" (the exact wording may not be right) for several hours (>24 hours).

Would you suggest dual booting my computer and running this on Linux? Could this be caused by running it through a VM?

written 2.2 years ago by ChickenRNA50

I am not sure what is going on, but I have asked our team for input to help troubleshoot. Typically, when tools take this long to load, an uninstall/reinstall can help, but that might be another dead end considering the Windows/VM factors. More feedback soon. - Jen

PS: Using a Docker image (as asked in your other post here: https://biostar.usegalaxy.org/p/19402/) is one solution to try in the meantime, and may be what the team recommends anyway.

written 2.2 years ago by Jennifer Hillman Jackson25k

Thank you, Jennifer. I am eagerly waiting to hear what the team says about this issue. If running through a VM is the problem, I will dual boot into Linux to run this.

modified 2.2 years ago • written 2.2 years ago by ChickenRNA50