<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Setting up Galaxy</title>
	<atom:link href="http://www.cassj.co.uk/blog/?feed=rss2&#038;p=359" rel="self" type="application/rss+xml" />
	<link>http://www.cassj.co.uk/blog/?p=359</link>
	<description>The sum total of interesting things I know</description>
	<lastBuildDate>Wed, 28 Jul 2010 22:56:55 +0100</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: cass</title>
		<link>http://www.cassj.co.uk/blog/?p=359&#038;cpage=1#comment-66</link>
		<dc:creator>cass</dc:creator>
		<pubDate>Wed, 08 Jul 2009 14:31:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.cassj.co.uk/blog/?p=359#comment-66</guid>
		<description>They gave us a grant, so it&#039;s currently free.  I need to work out how we&#039;re going to use it in the long term though. 

For doing data analysis, it&#039;s much easier than booking time on a cluster at work, as I can install exactly what I want without having to go through the cluster admin and I don&#039;t have to wait in line. Unless we start doing a lot more data analysis, I think it&#039;s still going to work out much cheaper than buying in our own hardware and it&#039;s certainly going to be less hassle. I&#039;m envisaging just running instances of the Galaxy and analysis AMIs when we&#039;re actually doing the alignment, peak finding and so on.  I&#039;ll probably keep the short read data as BAM or BioHDF on S3 so people can get at them and run their own analysis on EC2 if they want to.  

I also need to set up a LIMS-type server for managing and sharing our data. Probably with some kinda REST API so other stuff (network building tools etc) can query the data (e.g. &quot;What experiments have you done that involve NRSF, as RDF?&quot; or &quot;Give me all the features between these chr co-ords for NRSF binding in NS5 cell lines as BED, or SAM or something&quot;). We can host this kind of thing at work and it doesn&#039;t really need to scale much. So I imagine I&#039;ll have the analysis pipeline hand me back a list of peaks (genome pos, plus score, pvalue etc) and some metadata describing the analysis workflow and dump that into my LIMS thing.

I&#039;ll bring my laptop to the biogeeks thing if you want to have a play with the AWS stuff.

I guess I&#039;ll probably just stick the short read data onto EC2 for analysis and have it spit out the binding peaks and analysis metadata back to me.</description>
		<content:encoded><![CDATA[<p>They gave us a grant, so it&#8217;s currently free.  I need to work out how we&#8217;re going to use it in the long term though. </p>
<p>For doing data analysis, it&#8217;s much easier than booking time on a cluster at work, as I can install exactly what I want without having to go through the cluster admin and I don&#8217;t have to wait in line. Unless we start doing a lot more data analysis, I think it&#8217;s still going to work out much cheaper than buying in our own hardware and it&#8217;s certainly going to be less hassle. I&#8217;m envisaging just running instances of the Galaxy and analysis AMIs when we&#8217;re actually doing the alignment, peak finding and so on.  I&#8217;ll probably keep the short read data as BAM or BioHDF on S3 so people can get at them and run their own analysis on EC2 if they want to.  </p>
<p>I also need to set up a LIMS-type server for managing and sharing our data. Probably with some kinda REST API so other stuff (network building tools etc) can query the data (e.g. &#8220;What experiments have you done that involve NRSF, as RDF?&#8221; or &#8220;Give me all the features between these chr co-ords for NRSF binding in NS5 cell lines as BED, or SAM or something&#8221;). We can host this kind of thing at work and it doesn&#8217;t really need to scale much. So I imagine I&#8217;ll have the analysis pipeline hand me back a list of peaks (genome pos, plus score, pvalue etc) and some metadata describing the analysis workflow and dump that into my LIMS thing.</p>
<p>I&#8217;ll bring my laptop to the biogeeks thing if you want to have a play with the AWS stuff.</p>
<p>I guess I&#8217;ll probably just stick the short read data onto EC2 for analysis and have it spit out the binding peaks and analysis metadata back to me.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Clegg</title>
		<link>http://www.cassj.co.uk/blog/?p=359&#038;cpage=1#comment-64</link>
		<dc:creator>Andrew Clegg</dc:creator>
		<pubDate>Tue, 07 Jul 2009 15:09:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.cassj.co.uk/blog/?p=359#comment-64</guid>
		<description>I remember we were talking about the potential of AWS at that Perl night... You said you were worried about the sheer data transfer/storage costs for ChIP-seq data. Did it turn out to be fairly cost-effective after all?</description>
		<content:encoded><![CDATA[<p>I remember we were talking about the potential of AWS at that Perl night&#8230; You said you were worried about the sheer data transfer/storage costs for ChIP-seq data. Did it turn out to be fairly cost-effective after all?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
