Hello,

I got an unexpected scientific result from a simple "Get Data" from the UCSC Table Browser with Galaxy. I retrieved the mouse mm9 RepeatMasker track, filtered on repClass = LINE SINE LTR DNA.

If I ask for the output as sequences, I get 1 454 739 sequences, but if I ask for the same data in BED format I get roughly 3 600 000 items, which is the closest to the "summary/statistics" of the dataset (item count = 3 493 484).

Why is there a difference between the FASTA file and the BED file?

Thank you,
Rita
Hello Rita,

The UCSC Table Browser has a limit on the amount of output that can be extracted in any single query. Without seeing your history, my initial suspicion is that both of the queries timed out, the first sooner than the second.

Comparing the number of items between the original UCSC table and the final dataset in Galaxy is a good place to start. You could also check the last few lines of the dataset to see if the data ends abruptly (sometimes with a message), using "Text Manipulation -> Select last lines from a dataset" on the last 10 lines or so, after converting to tabular format first if necessary.

As long as the entire track is under 50G, you could consider loading the flat text file of the data instead. This is available on the UCSC Downloads server. Download it using their instructions and then upload it into Galaxy using FTP:

UCSC:
See "Downloads" on the left tool menu
Navigate to genome, build, annotation database, and target table
Help is at:

Best wishes for your project,

Jen
Galaxy team

--
Jennifer Jackson
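For anyone who prefers to run the two checks suggested above on downloaded copies of the datasets, here is a minimal Python sketch. It is only an illustration, not a Galaxy or UCSC tool: the function names are made up for this example, and it assumes one '>' header per FASTA record and one feature per BED line, with "track", "browser", and "#" lines not counted as features.

```python
from collections import deque
from typing import Iterable, List


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records, assuming each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def count_bed_records(lines: Iterable[str]) -> int:
    """Count BED features, skipping blank lines and track/browser/comment lines."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "track ", "browser ")):
            continue
        count += 1
    return count


def last_lines(lines: Iterable[str], n: int = 10) -> List[str]:
    """Return the last n lines, to inspect whether the data ends abruptly."""
    return list(deque((line.rstrip("\n") for line in lines), maxlen=n))
```

Each function accepts any iterable of lines, so an open file handle works directly, for example `with open("rmsk.bed") as fh: print(count_bed_records(fh))` (the file name here is hypothetical). If the FASTA and BED counts differ, or the last lines contain a truncated record or a message, that would be consistent with an incomplete transfer.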
