<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Tim Bull &#187; celery</title>
	<atom:link href="http://timbull.com/feed/?tag=celery" rel="self" type="application/rss+xml" />
	<link>http://timbull.com</link>
	<description>This WordPress.com site is the cat’s pajamas</description>
	<lastBuildDate>Mon, 25 Mar 2013 19:17:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='timbull.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Tim Bull &#187; celery</title>
		<link>http://timbull.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://timbull.com/osd.xml" title="Tim Bull" />
	<atom:link rel='hub' href='http://timbull.com/?pushpress=hub'/>
		<item>
		<title>Reflections on start-up life: Week 25</title>
		<link>http://timbull.com/2010/05/11/reflections-on-start-up-life-week-25/</link>
		<comments>http://timbull.com/2010/05/11/reflections-on-start-up-life-week-25/#comments</comments>
		<pubDate>Tue, 11 May 2010 10:14:15 +0000</pubDate>
		<dc:creator>Tim Bull</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[celery]]></category>
		<category><![CDATA[processing]]></category>
		<category><![CDATA[tribalytic]]></category>

		<guid isPermaLink="false">http://timbull.com/reflections-on-start-up-life-week-25</guid>
		<description><![CDATA[The first full week back from San Francisco is done. Boy, what a change of pace! After a hectic time rushing from meeting to meeting, it's now back to the challenge of coding and trying to re-engage a meeting schedule here back in Australia. The c...]]></description>
				<content:encoded><![CDATA[<p>The first full week back from San Francisco is done.  Boy, what a change of pace!
<p />
<div>After a hectic time rushing from meeting to meeting, it&#039;s now back to the challenge of coding and trying to re-establish a meeting schedule here in Australia.  The challenge has been switching back to a self-motivated mode.</div>
<p />
<div>Really our biggest challenge now is focus.  It&#039;s not that we don&#039;t know what we need to do, it&#039;s that there is so much of it.  I&#039;ve said to a few people that we could literally employ six others full time &#8211; not that we have the money to do that yet!  The trick is balancing the longer term things that need focus now in order to happen in three months against the shorter term critical things that get us into the hands of end users.</div>
<p />
<div>Despite all this, we are closing in rapidly on a product that we can charge money for.  We are getting closer to our customers, we are finding out what they need and what they like (and what they&#039;re not so sure of) about Tribalytic.  Yet there is still so much more to go to leap the gap.</div>
<p />
<div>One of the best things about this blog is the ability to go back and review history.  I was about to say &quot;It&#039;s been so long since we&#039;ve updated the product&quot; yet the current beta was released in Week 21 (<a href="http://timbull.com/reflections-on-start-up-life-week-21">http://timbull.com/reflections-on-start-up-life-week-21</a>) which was only four weeks ago!  Since then, we&#039;ve spent a week with customers, close to three weeks on the road and now here we are.</div>
<p />
<div>In that time, our own experiences and the feedback on Tribalytic have shown several shortcomings that need to be addressed, so we are back into a round of engineering (a somewhat shorter one this time) to address these.  While technically challenging, these are essential for the accuracy and performance of the product.  Our next beta (fingers crossed for next week) will support:</div>
<div>
<ol>
<li>A near real time index of Australian Twitter users and a greatly expanded index.</li>
<li>A revised stemmer, which means that Bunnings won&#039;t be treated as Bun and related to &quot;Hot, Cross and Easter&quot; (of all bugs this one makes me laugh the most). For those interested, the basic stemming algorithm we were using drops the s, then the ing, then the double n to get to a word &quot;stem&quot; or root.  Works great for walking, walked and walks, but has obvious issues for Bunnings.</li>
<li>Some level of boolean logic in our search engine (AND, OR and NOT), e.g. @kevinruddpm OR rudd</li>
</ol>
</div>
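<div>To make the Bunnings bug concrete, here is a minimal sketch of a suffix stripper that follows only the three steps described above. It is an illustration, not our actual stemmer: the function name <span style="font-family:courier new, monospace;">naive_stem</span> and the lower-casing are assumptions, and it makes no attempt to handle endings such as &quot;ed&quot;.</div>

```python
def naive_stem(word):
    """Strip a trailing 's', then 'ing', then collapse a doubled final letter."""
    stem = word.lower()
    if stem.endswith("s"):
        stem = stem[:-1]
    if stem.endswith("ing"):
        stem = stem[:-3]
    # Collapse a doubled final letter, so "bunn" becomes "bun".
    if len(stem) > 1 and stem[-1] == stem[-2]:
        stem = stem[:-1]
    return stem


for word in ("walking", "walks", "Bunnings"):
    print(word, "becomes", naive_stem(word))
```

<div>Each rule is reasonable on its own; it is the combination of all three firing on one proper noun that turns a hardware store into a bread roll.</div>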
<div>The more astute of you will have noticed that this post is also pretty much two days late (it usually goes out first thing Monday morning).  Last week turned into a long week.  On a personal note (I can&#039;t comment much on Alex&#039;s technical struggles, although he&#039;s had no shortage of challenges as well) I&#039;ve been challenged with getting our processing pipeline live.  This is the thing that collects all the tweets and keeps everything up to date.  It would run for a couple of hours, then choke up and die.  Finally this morning I nutted it out after working through close to a million lines of log file to locate the issue.</div>
<p />
<div>Over 5 days of wasted elapsed time for (I kid you not) one parameter on a command.  Not even one line of code, it was 12 characters (including the spaces). (For the technically minded, I hadn&#039;t set a timeout on the socket call to Twitter and on rare occasions the call would just never connect and not fail so it caused the thread to block and never return &#8211; adding a timeout resolved the issue).</div>
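<div>For anyone who wants to see the failure mode, here is a small self-contained sketch using Python&#039;s standard socket module. It is not the pipeline code: the local &quot;silent&quot; listener, the function name and the half-second timeout are all assumptions chosen so the example runs without touching the network.</div>

```python
import socket

# A local listener that accepts connections at the TCP level but never replies,
# standing in for a remote API that has gone silent.
silent_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent_server.bind(("127.0.0.1", 0))
silent_server.listen(1)
host, port = silent_server.getsockname()


def fetch_with_timeout(host, port, timeout_seconds):
    """Return received bytes, or None if nothing arrives within the timeout."""
    client = socket.create_connection((host, port), timeout=timeout_seconds)
    try:
        return client.recv(1024)
    except socket.timeout:
        # Without a timeout this recv() would block the thread forever.
        return None
    finally:
        client.close()


result = fetch_with_timeout(host, port, timeout_seconds=0.5)
print("timed out" if result is None else result)
silent_server.close()
```

<div>The point is only that a blocking call with no timeout has no way to fail, so the thread that made it never comes back.</div>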
<p />
<div>Finally the pipeline is now up and running and that&#039;s one less issue I need to concern myself with.</div>
<p />
<div>Working with Alex continues to be a lot of fun. It&#039;s more challenging now he&#039;s back in Beijing and I&#039;m here in Melbourne, but we continue to be able to work through issues via email and Skype without too many problems.  Probably the most challenging thing is that in the last couple of weeks we&#039;ve passed some kind of event horizon where we no longer understand what each other is coding! The complexities of the engine and its search are a black box to me, while the processing pipeline and the overall processing architecture are just not worth Alex investing his time in understanding while there&#039;s a search engine to work on.</div>
<p />
<div>I&#039;m often amused by our conversations &#8211; my favourite of the last week (recorded here for my memory more than anything else) is the detailed discussion on how to weight relevant people around search terms.  If two people talk about magnum and icecream 10 times, but one person mentions magnum 7 times and icecream 3 times and the other magnum 3 times and icecream 7 times, who should be listed first? </div>
<p />
<div>What if a user searched for icecream and magnum vs magnum and icecream?  What if one person&#039;s conversations were all in the first few days and the other person&#039;s in the last two?  For the record &#8211; there are no good answers to these questions, but it does show the rabbit holes you can work yourselves into.</div>
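<div>Purely as an illustration of why term order matters, here is one possible scoring rule. It is not what our engine does: the <span style="font-family:courier new, monospace;">score</span> function and the decay value of 0.5 are made up for the example. Each term&#039;s mention count is weighted by where the term appears in the query.</div>

```python
def score(mentions, query_terms, decay=0.5):
    """Weight each term's mention count by its position in the query."""
    total = 0.0
    for position, term in enumerate(query_terms):
        total += mentions.get(term, 0) * (decay ** position)
    return total


person_a = {"magnum": 7, "icecream": 3}
person_b = {"magnum": 3, "icecream": 7}

for query in (("magnum", "icecream"), ("icecream", "magnum")):
    ranked = sorted(
        [("A", score(person_a, query)), ("B", score(person_b, query))],
        key=lambda pair: pair[1],
        reverse=True,
    )
    print(query, ranked)
```

<div>Under this rule the two people swap places when the query order is reversed, which may or may not be what a user expects.</div>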
<p />
<div>We continue to meet with investors and still have some meetings in the pipeline.  There seems to be good consensus that we have a business, the question we now need to address (at least for Venture level investors) is do we have a $100 Million business (seriously&#8230; they want to see the possibility).</div>
<p />
<div><i>Highlights</i></div>
<div>
<ul>
<li>Several meetings with some great people.</li>
<li>Receiving feedback on our reports and the product.</li>
<li>Getting back to coding &#8211; feeling like we are progressing the product towards our customers.</li>
<li>Now have a Tribalytic blog (<a href="http://blog.tribalytic.com">http://blog.tribalytic.com</a>) and Twitter account (<a href="http://twitter.com/tribalytic">http://twitter.com/tribalytic</a>) so please follow them!</li>
</ul>
</div>
<p />
<div><i>Lowlights</i></div>
<p />
<div>
<ul>
<li>Processing pipeline problems win hands down for this one.</li>
</ul>
</div>
<p />
<div><i>Goal this week</i></div>
<p />
<div>
<ul>
<li>Get the next beta of the engine ready for release.</li>
<li>Keep meeting with people.</li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://timbull.com/2010/05/11/reflections-on-start-up-life-week-25/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/45bce8c85db792fa9373bee604141b29?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">tbull001</media:title>
		</media:content>
	</item>
		<item>
		<title>Build a processing queue with multi-threading and spread over multiple servers in less than a day using RabbitMQ and Celery.</title>
		<link>http://timbull.com/2010/03/09/build-a-processing-queue-with-multi-threading-30252/</link>
		<comments>http://timbull.com/2010/03/09/build-a-processing-queue-with-multi-threading-30252/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 02:27:58 +0000</pubDate>
		<dc:creator>Tim Bull</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ampq]]></category>
		<category><![CDATA[celery]]></category>
		<category><![CDATA[rabbitmq]]></category>
		<category><![CDATA[technical]]></category>

		<guid isPermaLink="false">http://timbull.com/build-a-processing-queue-with-multi-threading</guid>
		<description><![CDATA[As we move through the development cycle we now have many of the essential processing modules we need for Tribalytic, but we also have a few challenges we need to deal with as well: We need to collect data faster than we can process it on a single ...]]></description>
				<content:encoded><![CDATA[<p>As we move through the development cycle we now have many of the essential processing modules we need for Tribalytic, but we also have a few challenges we need to deal with as well:
<ol>
<li>We need to collect data faster than we can process it on a single processor.  Requirement &#8211; we need to be able to collect the data and then spread it out for processing over multiple processors / servers.</li>
<li>Some servers have a quota on how much they can process per hour for certain things (related to API limits set by Twitter etc.) Requirement &#8211; servers need to know their available limits and not take on more than they can process.</li>
<li>We need to be able to schedule some tasks on a regular basis.</li>
</ol>
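<p>To make the second requirement concrete, here is a minimal standard-library sketch of a worker that tracks its own hourly allowance and refuses work once it is spent. The class name <span style="font-family:courier new, monospace;">HourlyQuota</span>, the fixed one-hour window and the injectable clock are assumptions for illustration only; this is not how Celery implements rate limits.</p>

```python
import time


class HourlyQuota:
    """Allow at most `limit` jobs in any fixed one-hour window."""

    def __init__(self, limit, clock=time.time):
        self.limit = limit
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def try_acquire(self):
        """Return True and count the job if quota remains, else False."""
        now = self.clock()
        if now - self.window_start >= 3600:
            # A new hour has started, so the allowance resets.
            self.window_start = now
            self.used = 0
        if self.used == self.limit:
            return False
        self.used += 1
        return True


quota = HourlyQuota(limit=2)
accepted = [quota.try_acquire() for _ in range(3)]
print(accepted)
```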
<p>There are lots of different ways that this could be done, but after some preliminary research, I&#039;m going to set out to do it using two key pieces of technology:<br /> 
<ol>
<li><a href="http://www.rabbitmq.com/" target="_blank">RabbitMQ</a> &#8211; An AMQP (Advanced Message Queuing Protocol) server.  Robust message queues help a LOT with point one in particular.</li>
<li><a href="http://ask.github.com/celery/getting-started/introduction.html" target="_blank">Celery</a> &#8211; A pythonic / Django friendly task scheduler and queuing interface to Rabbit / AMQP. Celery provides the magic for points two and three. </li>
</ol>
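<p>If the queue idea is new to you, the shape of it can be sketched on a single machine with nothing but Python&#039;s standard library. This is only an analogy: an in-process <span style="font-family:courier new, monospace;">queue.Queue</span> stands in for RabbitMQ, threads stand in for worker servers, and the worker names and the squaring &quot;job&quot; are made up.</p>

```python
import queue
import threading

jobs = queue.Queue()
results = []
results_lock = threading.Lock()


def worker(name):
    """Pull jobs until a None sentinel arrives, recording who handled what."""
    while True:
        job = jobs.get()
        if job is None:
            break
        with results_lock:
            results.append((name, job * job))


workers = [threading.Thread(target=worker, args=("worker-%d" % i,)) for i in range(3)]
for thread in workers:
    thread.start()

# The collector only enqueues; it never waits for processing to finish.
for job in range(10):
    jobs.put(job)
for _ in workers:
    jobs.put(None)  # one stop signal per worker
for thread in workers:
    thread.join()

print(sorted(value for _, value in results))
```

<p>The collector and the workers only share the queue, which is what lets you add more workers (or, with a real broker, more machines) without touching the collecting code.</p>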
<p>The goal of this document is both to record what I&#039;ve learnt so I can replicate it in our production environment and, having digested the various documentation, to lay out the &quot;easy&quot; steps so that you are able to implement something that solves some common problems quickly.
<p /> There are all sorts of reasons why I&#039;ve preselected these technologies, which aren&#039;t the point of this post &#8211; this post documents the follow-on question: having selected this approach, how easy is it to implement?  Firstly a few notes about my setup:<br /> 
<ol>
<li>I have two machines I&#039;ll be configuring this on, both running the Ubuntu 9.10 Karmic Koala release.  I&#039;ll call them <i>Server</i> and <i>Laptop</i>.  For the record, <i>Server</i> is a PC with 4 cores and 6 GB of RAM.  <i>Laptop</i> is an MSI laptop with a single core (1.3 GHz) and 2 GB of RAM.  When you see <i>Server</i> and <i>Laptop</i>, replace them with your own machine names. </li>
<li>Python and Django are already pre-configured and working.</li>
</ol>
<p>Finally, there are a lot of great resources out there, but the few I relied on the most (and where some of the content here is adapted from) are:
<p />
<ol>
<li><a href="http://www.rabbitmq.com" target="_blank">http://www.rabbitmq.com</a></li>
<li><a href="http://ask.github.com/celery/index.html" target="_blank">http://ask.github.com/celery/index.html</a></li>
<li> <a href="http://groups.google.com/group/celery-users" target="_blank">http://groups.google.com/group/celery-users</a> </li>
</ol>
<p>Installation Steps
<p />I set up <i>Server</i> first.
<ul>
<li>Install Celery</li>
</ul>
<ol>
<ul>
<li>To do this, I used easy_install for Celery as follows.</li>
</ul>
</ol>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ sudo easy_install celery</span> </p></blockquote>
<div>
<ul>
<li>Next install RabbitMQ &#8211; Originally I simply used Synaptic Package Manager in Ubuntu, searched for Rabbit and installed it and its dependencies, but I noticed this is version 1.6.0 and the <a href="http://www.rabbitmq.com/server.html" target="_blank">latest here is 1.7.2</a> at time of writing, so I downloaded and installed the latest package instead.  I doubt it will make much difference which you use.<br />  </li>
<li>Test the installation &#8211; using the following steps below which I largely copied from <a href="http://ask.github.com/celery/getting-started/broker-installation.html" target="_blank">here </a></li>
</ul>
<p>Configure the security / vhost etc.  Note this is an important step you&#039;ll almost certainly come back to later if you stuff up <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"> $ sudo rabbitmqctl add_user myuser mypassword<br /> $ sudo rabbitmqctl add_vhost myvhost<br />$ sudo rabbitmqctl set_permissions -p myvhost myuser &quot;&quot; &quot;.*&quot; &quot;.*&quot; </p></blockquote>
<p></div>
<p>At this point the server should actually be running (this confused me first time, I think the Rabbit Controller starts the server to add the hosts etc.).  You can check if there is a server running as follows:
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl status</span></p></blockquote>
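<p>If you would rather check from Python (for example from a deployment script), a rough equivalent is to see whether anything is accepting TCP connections on the AMQP port. This is a sketch under assumptions: the function name is made up, 5672 is the default AMQP port, and a successful connection only shows that something is listening, not that it is a healthy RabbitMQ.</p>

```python
import socket


def broker_is_listening(host="localhost", port=5672, timeout_seconds=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        connection = socket.create_connection((host, port), timeout=timeout_seconds)
    except OSError:
        return False
    connection.close()
    return True


print("broker reachable:", broker_is_listening())
```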
<div>To start and stop the service, use the following
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmq-server</span> </p></blockquote>
<p>or to run it in the background (recommended)
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;font-family:courier new, monospace;"> $ sudo rabbitmq-server -detached</p></blockquote>
<p>Finally, to stop the server
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl stop </span> </p></blockquote></div>
<p>Leave the server stopped for now and the basic <i>Server</i> install is complete.
<p />I repeated these steps on <i>Laptop</i> without any problems at all.  NB you could skip configuring security etc. on Laptop if you like as it will be reset in the next step, I think it&#039;s worth it anyway just as practice.
<p /> NB &#8211; If you see a dump like the following when starting the server, in my case this was because the server was already started &#8211; use <span style="font-family:courier new, monospace;">rabbitmqctl status</span> to check:
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">{error_logger,{{2010,2,23},{11,26,37}},&quot;Protocol: ~p: register error: ~p~n&quot;,[&quot;inet_tcp&quot;,{{badmatch,{error,duplicate_name}},[{inet_tcp_dist,listen,1},{net_kernel,start_protos,4},{net_kernel,start_protos,3},{net_kernel,init_node,2},{net_kernel,init,1},{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}]}etc. etc. etc. </span> </p></blockquote>
<p><b>Elapsed Time:</b> Configure two PCs with Celery and Rabbit and document it? ~1 hr. For you? 30 minutes.
<p /> Cluster the RabbitMQ Servers
<p />Knowing I want this to work across at least two servers from day one, I decided to next cluster the RabbitMQ servers.
<p />This is remarkably straightforward if you follow the steps listed in the <a href="http://www.rabbitmq.com/clustering.html" target="_blank">RabbitMQ Clustering guide</a> and DON&#039;T miss the step I did on configuring the Erlang cookie.  I had a couple of minor issues that I needed to read up on, and my very sketchy Linux knowledge slowed me down.  Here are the steps I ended up following (you could use the guide linked, but I&#039;ve added in a couple of things relevant to what we are doing here).
<p />
<ul>
<li>Firstly, configure the Erlang cookie so that the Erlang installs on the two machines can share processes.  This required changing permissions first.  Make sure the RabbitMQ server is STOPPED.  When you are editing the cookie, simply replace whatever is in there with your own string.  It needs to be the same string on both <i>Server</i> and <i>Laptop</i>.  Length doesn&#039;t matter.  The default security on the cookie file is very tight, so I needed to loosen the permissions to be able to edit it (owner read/write is enough; the cookie is a shared secret, so don&#039;t make it world-writable), then I changed them back.  Replace gedit with your editor of choice (e.g. vi). </li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ sudo chmod 600 /var/lib/rabbitmq/.erlang.cookie</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">$ sudo gedit /var/lib/rabbitmq/.erlang.cookie</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">$ sudo chmod 400 /var/lib/rabbitmq/.erlang.cookie</span> </p></blockquote>
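<p>If you don&#039;t want to invent the cookie string by hand, something like the following will generate one to paste into the file on both machines. The 20 character length and the uppercase-only alphabet are my assumptions, not RabbitMQ requirements, and the function name is made up.</p>

```python
import secrets
import string


def make_cookie(length=20):
    """Return a random string of uppercase letters for use as a shared cookie."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))


print(make_cookie())
```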
<ul>
<li>Make sure that <i>Laptop</i> and <i>Server</i> are in each other&#039;s hosts file (by this I mean that the machine name for your <i>Server</i> equivalent needs to be in the hosts file on <i>Laptop</i> and vice versa).  If you can ping <i>Server</i> from <i>Laptop</i> and vice versa, you should be fine. </li>
<li>On <i>Server</i>, start RabbitMQ in detached mode, check status and make sure it&#039;s running.</li>
<li>On <i>Laptop</i>, make sure the RabbitMQ Server is stopped, then make sure you&#039;ve set the cookie as above. Start the RabbitMQ server in detached mode and now complete the following steps.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl stop_app</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">$ sudo rabbitmqctl reset</span></p></blockquote>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><div><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl cluster rabbit@server</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl start_app</span> </div>
</blockquote>
<div> Note &#8211; if you have trouble with the reset (because like me you tried to actually cluster the machine BEFORE the cookie was set in Erlang) you can try <span style="font-family:courier new, monospace;">sudo rabbitmqctl force_reset</span> which should sort it out.
<p />
<ul>
<li>You now have a clustered RabbitMQ setup.  You can verify this by typing <span style="font-family:courier new, monospace;">sudo rabbitmqctl status</span> on either server and you&#039;ll see it list something like the following:</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">[{running_applications,[{rabbit,&quot;RabbitMQ&quot;,&quot;1.7.2&quot;},</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">                        {mnesia,&quot;MNESIA  CXC 138 12&quot;,&quot;4.4.10&quot;},</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">                        {os_mon,&quot;CPO  CXC 138 46&quot;,&quot;2.2.2&quot;},</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">                        {sasl,&quot;SASL  CXC 138 11&quot;,&quot;2.1.6&quot;},</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">                        {stdlib,&quot;ERTS  CXC 138 10&quot;,&quot;1.16.2&quot;},</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">                        {kernel,&quot;ERTS  CXC 138 10&quot;,&quot;2.13.2&quot;}]},</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;"> {nodes,[&#039;rabbit@laptop&#039;,&#039;rabbit@server&#039;]},</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;"> {running_nodes,[&#039;rabbit@laptop&#039;,&#039;rabbit@server&#039;]}]</span></p></blockquote>
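<p>As an aside, if you want a script to confirm that both nodes are up, the running_nodes entry can be pulled out of that text with a regular expression. This is a sketch that assumes the output format shown above (RabbitMQ 1.7.2); other versions may print the status differently, and the function name is made up.</p>

```python
import re

# Tail of the status output shown above.
status_output = """
 {nodes,['rabbit@laptop','rabbit@server']},
 {running_nodes,['rabbit@laptop','rabbit@server']}]
"""


def running_nodes(status_text):
    """Extract node names from the running_nodes entry of the status output."""
    match = re.search(r"\{running_nodes,\[(.*?)\]\}", status_text)
    if match is None:
        return []
    return re.findall(r"'([^']+)'", match.group(1))


print(running_nodes(status_output))
```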
<p>This is neat, but if I read the documentation correctly what we have here is a RAM / RAM cluster.  If one of the servers goes down, we will be fine because the message state is replicated across the cluster, but if the whole lot went out (because the data centre lost power, or more likely in my situation because I just turned both PCs off overnight) we might really want a persistent DISK node.
<p /> To convert <i>Server</i> to being a disk node, simply execute the following on <i>Server</i> (while the cluster is running).
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"> <span style="font-family:courier new, monospace;">$ sudo rabbitmqctl stop_app</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl cluster rabbit@laptop rabbit@server</span><br /> <span style="font-family:courier new, monospace;"></span></p></blockquote>
<p> <span style="font-family:courier new, monospace;"><br /><span style="font-family:times new roman, serif;">Or (FYI) to turn it back into a RAM node.  Note it doesn&#039;t matter for our purposes in this doco how they are configured, read up and decide what you need.</span>
<p /> </span><br />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl cluster rabbit@laptop</span> </p></blockquote>
<p><span style="font-family:courier new, monospace;"><br /><span style="font-family:times new roman, serif;">Finally, start it up again</span>
<p /></span><br />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"> <span style="font-family:courier new, monospace;">$ sudo rabbitmqctl start_app</span></p></blockquote>
<p>To remove a server from a cluster at any time, simply do this (this is not a required step, it&#039;s just &quot;FYI&quot;).
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"> <span style="font-family:courier new, monospace;">$ sudo rabbitmqctl stop_app</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl reset</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">$ sudo rabbitmqctl start_app</span></p></blockquote>
<p>NB &#8211; You&#039;ll need to use force_reset for the LAST node in the cluster to be removed (if you&#039;re separating them all out again).  REMEMBER if you do a full cluster reset (like I did in testing this), you&#039;ll need to redo the security section from the first section again <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />   This is because you&#039;ve reset both nodes so they no longer hold the security information.  If you ever get into problems with the configuration, I&#039;ve resolved them consistently by de-clustering, forcing a node reset on both nodes and then clustering and redoing the security.
<p /> <b>Elapsed Time:</b> Research, reading, implementing and troubleshooting the cluster ~2 hr.  Following the above steps, prob. 30 minutes.
<p />Configure Celery with Django
<p />OK, now we are ready to get <a href="http://ask.github.com/celery/getting-started/first-steps-with-django.html" target="_blank">Celery set up with Django</a>. Create a new Django project.  Mine&#039;s called &quot;clifton&quot; (we use local train stations as our milestone names) and then within that I have an app called fetcher.  All of these steps need to be done on both <i>Server</i> and <i>Laptop</i> (I&#039;m using SVN so I simply make the change on <i>Server</i>, commit and then update from <i>Laptop</i>).
<p />
<ul>
<li>In the INSTALLED_APPS section of settings.py add celery as follows.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p> <span style="font-family:courier new, monospace;">INSTALLED_APPS = (</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">    &#039;django.contrib.sessions&#039;,</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    &#039;django.contrib.sites&#039;,</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">    &#039;celery&#039;,</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    &#039;clifton.fetcher&#039;,</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">)</span><br style="font-family:courier new, monospace;" /> </p></blockquote>
<p>
<ul>
<li>Sync the DB as celery adds a couple of task tracking tables.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ python manage.py syncdb</span> </p></blockquote>
<ul>
<li>Add the following settings into settings.py as well.  These are just to get us going, we&#039;ll come back to this and improve them soon once we&#039;ve got this section working.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p> <span style="font-family:courier new, monospace;">BROKER_HOST</span><span style="font-family:courier new, monospace;"> </span><span style="font-family:courier new, monospace;">=</span><span style="font-family:courier new, monospace;"> </span><span style="font-family:courier new, monospace;">&quot;localhost&quot;</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">BROKER_PORT</span><span style="font-family:courier new, monospace;"> </span><span style="font-family:courier new, monospace;">=</span><span style="font-family:courier new, monospace;"> </span><span style="font-family:courier new, monospace;">5672</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">BROKER_USER</span><span style="font-family:courier new, monospace;"> </span><span style="font-family:courier new, monospace;">=</span><span style="font-family:courier new, monospace;"> </span><span style="font-family:courier new, monospace;">&quot;myuser&quot;</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">BROKER_PASSWORD</span><span style="font-family:courier new, monospace;"> </span><span style="font-family:courier new, monospace;">=</span><span style="font-family:courier new, monospace;"> </span><span style="font-family:courier new, monospace;">&quot;mypassword&quot;</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">BROKER_VHOST</span><span style="font-family:courier new, monospace;"> </span><span style="font-family:courier new, monospace;">=</span><span style="font-family:courier new, monospace;"> </span><span style="font-family:courier new, monospace;">&quot;myvhost&quot;</span> </p></blockquote>
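<p>Later Celery releases accept a single broker URL in place of these five separate settings, so it is worth seeing how the pieces map onto one amqp:// address. The helper below is a sketch: <span style="font-family:courier new, monospace;">broker_url</span> is a made-up function name, the values are the same placeholders as above, and you should check the documentation for your Celery version for the exact setting name.</p>

```python
from urllib.parse import quote

# Same placeholder values as the settings above.
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "myuser"
BROKER_PASSWORD = "mypassword"
BROKER_VHOST = "myvhost"


def broker_url(user, password, host, port, vhost):
    """Combine the five settings into one amqp:// URL, escaping special characters."""
    return "amqp://%s:%s@%s:%d/%s" % (
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),
    )


print(broker_url(BROKER_USER, BROKER_PASSWORD, BROKER_HOST, BROKER_PORT, BROKER_VHOST))
```

<p>The escaping matters if your password contains characters such as @ or /, which would otherwise be read as part of the URL structure.</p>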
<ul>
<li>Finally, in your application directory (e.g. fetcher in my example) create a tasks.py module with the following code.  This is literally just for a test to show it&#039;s working.  Note it MUST be called tasks.py.  It uses the decorator to wrap a class definition around a simple add function that adds two numbers and returns a result.  Note that the decorator causes the class to have the name of the function (e.g. we effectively end up with a class called add in the tasks.py). </li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">from celery.decorators import task</span>
<p /> <span style="font-family:courier new, monospace;">@task()</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">def add(x, y):</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    return x + y</span><br style="font-family:courier new, monospace;" /></p></blockquote>
<p>Having completed these steps on <i>Server</i> and <i>Laptop</i>, we now need to actually test and run our AMQP workers.  Just on <i>Server</i> for now, do the following steps.
<p />
<ul>
<li>Open a terminal window in the clifton directory (or wherever you put your Django app) and start a celery daemon using Django.  Again, there are more options here, but this is just a starting point to show it all working. </li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ python manage.py celeryd</span> </p></blockquote>
<p>
<ul>
<li>Now open a second terminal window and execute the following from the clifton directory.   We use the Django interpretive shell to be sure all the settings are loaded nicely for us. </li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ python manage.py shell</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&gt;&gt;&gt; from clifton.fetcher import tasks</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&gt;&gt;&gt; result = tasks.add.delay(4, 4)</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&gt;&gt;&gt; result.ready()</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">True</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&gt;&gt;&gt; result.result</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">8</span></p></blockquote>
<p>So what just happened? Well, Celery wrapped up the add function (which simply adds x and y and returns the sum) in a class through the decorator.  We then, through the Django interpreter shell, imported the class and called the delay method, passing the parameters 4 and 4.  Celery then handled all the work of pushing that out onto the RabbitMQ server, then the celeryd worker picked up the values, executed the task and provided the result.  The <span style="font-family:courier new, monospace;">result.ready()</span> call told us that the result for this particular instance had been processed and then <span style="font-family:courier new, monospace;">result.result</span> returned what that result actually was.  If result.ready() returns False, the most likely cause is that the AMQP server is running but the worker process is not.
<p /> We&#039;ve just passed our first message. There is a LOT going on under the hood here, but this example is fairly uninspiring.  Shut down the celeryd terminal (send it a TERM signal, from command line do <span style="font-family:courier new, monospace;">ps aux | grep celeryd</span> then find the first process ID and do <span style="font-family:courier new, monospace;">kill &lt;pid&gt;</span> (e.g. <span style="font-family:courier new, monospace;">kill 1234</span>)) and see what happens if you execute the code again (perhaps with different parameters e.g. <span style="font-family:courier new, monospace;">tasks.add.delay(2, 3)</span> to see the difference).  Yup, result.ready() returns False.  It will keep returning False until you start the worker process up again.  The message has been queued and at least for this configuration, will persist until it gets processed by a worker.  Start the celeryd process again (<span style="font-family:courier new, monospace;">$ python manage.py celeryd</span>) and now try <span style="font-family:courier new, monospace;">result.ready()</span> in the shell &#8211; the message was processed and a result is able to be returned.
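<p>If it helps to picture why result.ready() stays False until a worker is running, here is a tiny simulation in plain Python.  Nothing in it talks to RabbitMQ or Celery: <i>FakeBroker</i> and <i>FakeResult</i> are made-up names for this sketch, and the "worker" is just a method call.</p>

```python
from collections import deque


class FakeResult(object):
    """Stand-in for the object that delay() hands back (hypothetical)."""

    def __init__(self):
        self.result = None
        self._done = False

    def ready(self):
        return self._done


class FakeBroker(object):
    """Holds messages until a worker asks for them, like a queue on the broker."""

    def __init__(self):
        self.queue = deque()

    def delay(self, func, *args):
        # Publishing only stores the message; nothing is executed here.
        res = FakeResult()
        self.queue.append((func, args, res))
        return res

    def run_worker(self):
        # A worker drains whatever has been waiting in the queue.
        while self.queue:
            func, args, res = self.queue.popleft()
            res.result = func(*args)
            res._done = True


def add(x, y):
    return x + y


broker = FakeBroker()
result = broker.delay(add, 2, 3)
print(result.ready())   # False: no worker has run yet
broker.run_worker()     # the equivalent of starting celeryd again
print(result.ready())   # True
print(result.result)    # 5
```

<p>The message sits in the queue between the two ready() calls, which is the same behaviour we just saw with the real worker stopped and restarted.</p>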
<p /> If you&#039;re still unsure about how cool this is, then you probably don&#039;t need queuing <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  But for the final trick in this section until we get this thing rocking and rolling properly, let&#039;s demonstrate it really is a cluster being used to process this.
<p /> Of course by default, the cluster is up and running already (presuming you haven&#039;t stopped the <i>Laptop</i>), but to be sure, just check with <span style="font-family:courier new, monospace;">rabbitmqctl status</span>.  Make sure you&#039;ve also run through the settings in this current section (Configure Celery with Django) on both <i>Laptop</i> and <i>Server</i>.  Now &#8211; on <i>Server</i>, make sure that any running <span style="font-family:courier new, monospace;">celeryd</span> terminal is shut down.  Start the <span style="font-family:courier new, monospace;">celeryd</span> terminal on <i>Laptop</i>.  From the Python / Django interpretive shell on <i>Server</i>, repeat the same steps.
<p /> Assuming you got a result successfully returned, you&#039;ve just used the message queue cluster to have a worker on <i style="font-family:times new roman, serif;">Laptop</i><span style="font-family:times new roman, serif;"> </span>pick up and process the message from <i>Server</i>.  The subtlety here is that the underlying RabbitMQ cluster handled passing the message between the two machines &#8211; both <i>Server</i> and <i>Laptop</i> are talking to RabbitMQ locally, it was the cluster that handled making the message available in both places.
<p /> <b>Elapsed Time:</b> Research, reading, implementing, testing, troubleshooting and documenting ~4 hr.  Following the above steps, prob. 1 hr.
<p /> Re-Factor the Task and Demo to make life easier!
<p />Note that this previous example is of course very basic and can be significantly optimised &#8211; in particular there is no connection pooling, so every new call to add.delay() is opening its own connection to the broker, hence the delay in posting the message.  You can speed it up by passing a previously opened connection &#8211; <a href="http://ask.github.com/celery/userguide/executing.html" target="_blank">see here for more information</a>.  We&#039;ll implement this below.
<p /> Additionally, the Task Decorator, while useful for basic tasks, masks a lot of what&#039;s really happening &#8211; I prefer a more verbose approach when I&#039;m learning.  So here is the new fetcher/tasks.py &#8211; note we&#039;ve changed the name of the class to MyTest.  Note that it MUST have a &quot;run&quot; method, which is what will be called by apply_async.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">from celery.task import Task</span>
<p /> <span style="font-family:courier new, monospace;">import logging</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">logger = logging.getLogger(&#039;fetcher.tasks&#039;)</span>
<p /> <span style="font-family:courier new, monospace;">class MyTest(Task):</span>
<p /><span style="font-family:courier new, monospace;">    def run(self, x, y):</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">        logger.info(&#039;Received %s and %s&#039; % (x, y))</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">        return int(x) + int(y)</span> </p></blockquote>
<p> We&#039;ll also make our life a bit easier by adding a bin directory into the clifton project (this is where I like to store all my cron jobs etc.) and create a simple python file to call our task.  This will let us just execute the file to generate the tasks &#8211; the sort of thing you&#039;ll need to do anyway if you want to call the tasks from a view etc.  I&#039;ll call it <span style="font-family:courier new, monospace;">demo.py</span> (for clarity, in my case this is in ~/clifton/bin/demo.py).  Copy the following code in (make any local adjustments needed).  Note the change to use <span style="font-family:courier new, monospace;">apply_async</span> instead of <span style="font-family:courier new, monospace;">delay()</span>.  It is a lower level call to do the same thing as <span style="font-family:courier new, monospace;">delay()</span>, but allows us to add arguments etc.  I&#039;ve also implemented the connection pooling.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">#! /usr/bin/env python</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">from __future__ import with_statement</span>
<p /><span style="font-family:courier new, monospace;">import os, sys</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">#os.environ[&#039;PYTHONPATH&#039;] = &quot;/home/tim/projects&quot; # Uncomment and point this to your root django project </span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">                                                 # directory if PYTHONPATH not set properly already.</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">os.environ[&#039;DJANGO_SETTINGS_MODULE&#039;]=&#039;clifton.settings&#039;</span>
<p /> <span style="font-family:courier new, monospace;">from clifton.fetcher.tasks import MyTest</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">from celery.messaging import establish_connection</span>
<p /> <span style="font-family:courier new, monospace;">if __name__ == &quot;__main__&quot;:</span>
<p /> <span style="font-family:courier new, monospace;">    numbers = [(2, 2), (4, 4), (8, 8), (16, 16)]</span>
<p /><span style="font-family:courier new, monospace;">    results = []</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    with establish_connection() as connection:</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">        for args in numbers:</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">            res = MyTest.apply_async(args=args, connection=connection)</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">            results.append(res)</span>
<p /> <span style="font-family:courier new, monospace;">    print([res.get() for res in results])</span></p></blockquote>
<p>If you run this from the bin directory, you should see the following result:
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ python demo.py</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">[4, 8, 16, 32]</span></p></blockquote>
<p>We now have a better structure to take our experiments forwards and further optimise.
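<p>The saving from passing one connection into every apply_async call can be pictured with a small counter.  This is only a sketch: <i>open_connection</i> and <i>publish</i> are hypothetical stand-ins that count how often a connection would be opened, and they do not touch a real broker.</p>

```python
from contextlib import contextmanager

opened = []  # records every simulated connection that gets opened


@contextmanager
def open_connection():
    # Hypothetical stand-in for establish_connection(); it only counts opens.
    opened.append("connection")
    yield len(opened)


def publish(args, connection=None):
    # Without a connection passed in, each publish opens its own.
    if connection is None:
        with open_connection() as connection:
            return (connection, args)
    return (connection, args)


numbers = [(2, 2), (4, 4), (8, 8), (16, 16)]

# One connection per message, as with plain delay().
for args in numbers:
    publish(args)
per_call = len(opened)

# One shared connection for the whole batch, as in demo.py.
del opened[:]
with open_connection() as conn:
    for args in numbers:
        publish(args, connection=conn)
shared = len(opened)

print(per_call, shared)  # 4 1
```

<p>With four messages the difference is small, but it grows with every message you post in a batch.</p>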
<p /> <b>Elapsed Time:</b> Research, reading, implementing, trouble shooting and documenting ~2 hr.  Following the above steps, prob. 30 minutes.
<p />Directing tasks to different servers
<p /> From here on in, this will become a little less generic and begin to deal with the problems that we are having and trying to resolve.  I&#039;ll explain why we&#039;ve made our decisions and you can make your own choices.  While you can make each of these settings in settings.py on the two different machines, I suggest you create a local_settings.py and add them in there (especially if you&#039;re using SVN to sync changes &#8211; ie. running the same Django project on both servers).  Where I say settings.py below I&#039;ve actually created these changes in a local_settings.py.  Note that most of these are default settings and can generally be over-ridden at different levels.<br /> 
<div>
<ul>
<li>Run tasks on different machines.  The actual design of these queues, exchanges and bindings is a reasonably complicated topic that gets very design specific and frankly confusing!  It&#039;s also (somewhat) abstracted by Celery which implements only Direct and Topic exchanges for example.  It&#039;s worth reading this post on <a href="http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/" target="_blank">Rabbits and Warrens</a> which gives some good background on the various options provided by AMQP.  If all you want to do is post a message and pick it up with a worker, then celery handles that for you, but in our case, we&#039;d like to make some processing choices up front and be able to use different machines to process different messages.   There is a good guide to this in the <a href="http://ask.github.com/celery/faq.html" target="_blank">FAQ for Celery</a>.  Add the following to Server in settings.py</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">CELERY_DEFAULT_QUEUE = &quot;server&quot;<br /> CELERY_QUEUES = {<br />&quot;server&quot;: {<br />&quot;binding_key&quot;: &quot;server_task&quot;,<br />},<br />}<br />CELERY_DEFAULT_EXCHANGE = &quot;tasks&quot;<br />CELERY_DEFAULT_EXCHANGE_TYPE = &quot;direct&quot;<br />CELERY_DEFAULT_ROUTING_KEY = &quot;server_task&quot;</span> </p></blockquote>
<p>We&#039;ve just told celeryd that when it starts it should bind to a queue called &quot;<span style="font-family:courier new, monospace;">server</span>&quot; and listen for messages routed to <span style="font-family:courier new, monospace;">server_task</span>.  Do the same on <i>Laptop</i>, but specify laptop instead.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"> <span style="font-family:courier new, monospace;">CELERY_DEFAULT_QUEUE = &quot;laptop&quot;</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">CELERY_QUEUES = {</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&quot;laptop&quot;: {</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&quot;binding_key&quot;: &quot;laptop_task&quot;,</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">},</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">}</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">CELERY_DEFAULT_EXCHANGE = &quot;tasks&quot;</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">CELERY_DEFAULT_EXCHANGE_TYPE = &quot;direct&quot;</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">CELERY_DEFAULT_ROUTING_KEY = &quot;laptop_task&quot;</span></p></blockquote>
<p />There are several different ways of implementing this (telling tasks where to execute).  For example, in this configuration, if we do nothing and simply run the demo.py we created earlier, tasks started on Server will execute on Server and vice versa, because the defaults will be applied (which are different on each machine).  Most likely we want to make some decision at execution time about where we want to route this task.  In this case, simply modify line 19 of the demo.py to add the routing_key parameter as follows:
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">res = MyTest.apply_async(args=args, connection=connection, routing_key=&quot;laptop_task&quot;)</span> </p></blockquote>
<p>Changing it to <span style="font-family:courier new, monospace;">routing_key = &#039;server_task&#039;</span> will force the task to execute on <i>Server</i> (no matter where it originated, although strictly speaking it forces it to execute on the worker bound to the server queue, listening for server_task &#8211; we just happen to configure this queue on <i>Server</i>) and vice versa.  Test this out and make sure it&#039;s all working; you might like to add some logging into the tasks.py so you can check and see what is executing where (next section shows how if you need help with this).  NB &#8211; It looks like the task files are cached in the celeryd worker, so if you modify the task, it looks like a good idea to restart the worker, at least for the moment when testing. 
<p /> As a final note, let&#039;s say you want the <i>Server</i> to ALSO be able to pick up <i>Laptop</i> tasks (to use its spare capacity &#8211; or just as a more &quot;realistic&quot; example).  Simply make the following change to <i>Server</i> settings.py CELERY_QUEUES and now <i>Server</i> is listening for <i>Laptop</i> tasks too.  Because both <i>Laptop</i> and <i>Server</i> are bound to the same queue (which happens to be called <span style="font-family:courier new, monospace;">laptop</span> in our example), Rabbit automatically round robins messages between the two.  Change the CELERY_QUEUES setting as follows to take advantage of this on <i>Server</i> only.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">CELERY_QUEUES = {</span><br /> <span style="font-family:courier new, monospace;"> &quot;server&quot;: {</span><br /><span style="font-family:courier new, monospace;"> &quot;binding_key&quot;: &quot;server_task&quot;,</span><br /><span style="font-family:courier new, monospace;"> },</span><br /><span style="font-family:courier new, monospace;"> &quot;laptop&quot;: {</span><br /> <span style="font-family:courier new, monospace;"> &quot;binding_key&quot;: &quot;laptop_task&quot;,</span><br /><span style="font-family:courier new, monospace;"> },</span><br /><span style="font-family:courier new, monospace;">}</span></p></blockquote>
<p>If you extend the demo.py so it passes a lot more numbers, you should be able to use demo.py to send messages to server_task and laptop_task and see the difference by checking the logs.  Both <i>Server</i> and <i>Laptop</i> process tasks for <span style="font-family:courier new, monospace;">laptop_task</span>, while only <i>Server</i> processes tasks for <span style="font-family:courier new, monospace;">server_task</span>.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">numbers = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), </span><br /> <span style="font-family:courier new, monospace;">           (9, 9), (10, 10), (11, 11), (12, 12), (13, 13), (14, 14), </span><br /><span style="font-family:courier new, monospace;">           (15, 15), (16, 16), (17, 17), (18, 18)]</span><br style="font-family:courier new, monospace;" /> </p></blockquote>
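<p>Here is a rough model of what the direct exchange and the shared queue are doing in this setup.  It is plain Python with no broker involved: the <i>bindings</i> and <i>consumers</i> tables and the <i>route</i> function are my own illustration of the behaviour described above, not anything Celery or RabbitMQ exposes.</p>

```python
from itertools import cycle

# Queue name -> binding key, mirroring the CELERY_QUEUES settings above.
bindings = {"server": "server_task", "laptop": "laptop_task"}

# Which workers consume from which queue (Server listens on both).
consumers = {
    "server": cycle(["Server"]),
    "laptop": cycle(["Laptop", "Server"]),
}


def route(routing_key):
    """Return (queue, worker) for a message, as a direct exchange would."""
    for queue, binding_key in sorted(bindings.items()):
        if binding_key == routing_key:  # direct exchange: exact match only
            return queue, next(consumers[queue])
    return None, None  # unroutable: no queue is bound with this key


handled = [route("laptop_task")[1] for _ in range(4)]
print(handled)                 # ['Laptop', 'Server', 'Laptop', 'Server']
print(route("server_task"))    # ('server', 'Server')
```

<p>Messages for laptop_task alternate between the two workers, while server_task only ever reaches Server, which is what you should see in the logs.</p>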
<p><b>Elapsed Time:</b> Research, reading, implementing, trouble shooting and documenting ~4.5 hr.  Following the above steps, prob. 2 hrs max.
<p />Get a (heart) beat
<p /> We now have a (basic) distributed architecture with the ability to route jobs to a specific server and we also know how to distribute across two servers (or more).  The next trick is to create an automated task that runs on a regular basis.  Now we could use a CRON job or something similar, but it would be nice if there was a way of building this into Django and Celery so we can route the messages straight on to a queue.  It turns out that there is: a periodic task, which does more or less what it says <a href="http://ask.github.com/celery/getting-started/periodic-tasks.html" target="_blank">on the box</a>.  It runs periodically.  I modified our earlier <span style="font-family:courier new, monospace;">tasks.py</span> as shown below.  As well as the basic addition task already defined, we now have a PeriodicTask which executes every 30 seconds.  It doesn&#039;t do much &#8211; I simply had it queue up a batch of additions in this example.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">from datetime import timedelta<br />from celery.task import PeriodicTask, Task<br /> from celery.messaging import establish_connection
<p />import logging<br />logger = logging.getLogger(&#039;fetcher.tasks&#039;)
<p />class MyTest(Task):<br /></span></p></blockquote>
<div> </div>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><p> <span style="font-family:courier new, monospace;">    def run(self, x, y):<br />        logger.info(&#039;Received %s and %s&#039; % (x, y))<br />        return int(x) + int(y)
<p />class MyPeriodicTask(PeriodicTask):<br />     run_every = timedelta(seconds=30)
<p />    def run(self, **kwargs):<br />        numbers = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (9, 9), (10, 10), (11, 11), (12, 12), (13, 13), (14, 14), (15, 15), (16, 16), (17, 17), (18, 18)]<br />         with establish_connection() as connection:<br />            for n in numbers:<br />                MyTest.apply_async(args=[n[0], n[1]], connection=connection, routing_key=&quot;laptop_task&quot;)<br />        <br />        <a href="http://logger.info">logger.info</a>(&#039;Ran periodic task&#039;)<br /> </span> </p></blockquote>
<p>You might need to remove the logging options unless you have a proper logger setup in your settings.py (FYI, you can add the following into settings.py and this should all work nicely for you &#8211; just change paths as appropriate and also make sure you have a log directory).
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">LOG_FILENAME= &#039;/home/tim/clifton/log/debug.log&#039;</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">import logging</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">logging.basicConfig(</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    filename=LOG_FILENAME,</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">    format=&#039;[%(levelname)-5s] %(asctime)-8s %(name)-10s %(message)s&#039;,</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    #datefmt=&#039;%a, %d %b %Y %H:%M:%S&#039;,</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">    datefmt=&#039;%H:%M:%S&#039;,</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    level=logging.DEBUG)</span></p></blockquote>
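<p>To see what a line written with that format string looks like without touching any files, the same format can be pointed at an in-memory stream.  This is just a sketch and assumes Python 3 (for io.StringIO with plain strings); the stream handler is my substitution for the log file used in settings.py.</p>

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    fmt="[%(levelname)-5s] %(asctime)-8s %(name)-10s %(message)s",
    datefmt="%H:%M:%S"))

logger = logging.getLogger("fetcher.tasks")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("Received %s and %s" % (4, 4))
line = stream.getvalue()
print(line)  # e.g. [INFO ] 19:17:54 fetcher.tasks Received 4 and 4
```

<p>The time shown will of course be whatever the clock says when you run it.</p>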
<p>But how does it execute?  Well in this instance we don&#039;t actually need a demo.py, instead we can use the beat feature of the celeryd to execute the periodic task for us.  To do this, simply start your celeryd with the -B option.  e.g.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ python manage.py celeryd -B</span> </p></blockquote>
<p>Alternatively you can run a dedicated celerybeat server.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204,204,204);margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ python manage.py celerybeat</span> </p></blockquote>
<p>Just be aware that if you run a dedicated celerybeat server, you&#039;ll also need to start a worker (celeryd) yourself, otherwise you&#039;ll have your tasks sent to the queue, but not processed.  I have a personal preference for running the dedicated celerybeat server as it makes it easier to isolate and shut down just the beat server process from the workers. (<span style="font-family:courier new, monospace;">ps aux | grep celerybeat</span>)
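<p>Conceptually, the beat process only has to remember when each periodic task last ran and decide whether it is due again.  The toy scheduler below sketches that decision in plain Python; the <i>Beat</i> class is a hypothetical illustration, not how celerybeat is actually implemented, and it only reports which tasks are due rather than sending anything to a queue.</p>

```python
from datetime import datetime, timedelta


class Beat(object):
    """Toy scheduler: remembers when each periodic task last ran."""

    def __init__(self):
        self.entries = {}  # task name -> [run_every, last_run]

    def register(self, name, run_every, now):
        self.entries[name] = [run_every, now]

    def tick(self, now):
        """Return the names that are due at `now` and mark them as run."""
        due = []
        for name, entry in sorted(self.entries.items()):
            run_every, last_run = entry
            if now - last_run >= run_every:
                due.append(name)
                entry[1] = now
        return due


start = datetime(2010, 3, 9, 12, 0, 0)
beat = Beat()
beat.register("MyPeriodicTask", timedelta(seconds=30), start)

print(beat.tick(start + timedelta(seconds=10)))  # []
print(beat.tick(start + timedelta(seconds=30)))  # ['MyPeriodicTask']
print(beat.tick(start + timedelta(seconds=45)))  # []
print(beat.tick(start + timedelta(seconds=60)))  # ['MyPeriodicTask']
```

<p>In the real setup the due task would be published to the queue at that point, which is why a worker still has to be running somewhere to pick it up.</p>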
<p /> <b>Elapsed Time:</b> Implementing, limited trouble shooting and documenting ~1 hr.  Following the above steps, prob. 30 minutes.
<p />Set Static Execution Limits
<p /> I suspect for many people this is now approaching the point where you have enough of a grip to be able to use this in anger in a lot of different situations.  We have one additional problem however that we need to resolve.  In some instances we need to limit the number of messages that a queue processes.  This is because while we might have the physical capacity to process 500,000 calls an hour, we could very well have other service API limits imposed on us by the services we are calling. 
<p /> In fact this is exactly the situation when we are talking to the Twitter API.  Twitter allocates us a limit and we can&#039;t exceed this.  To make it more complicated, this limit can actually differ by server.  We need to be able to limit execution, by server, to not exceed our API limit.
<p /> Actually there is a simple enough <i>partial</i> solution for this.  Celery defines a <span style="font-family:courier new, monospace;">rate_limit</span> which controls how many times a task can execute in a given time period.  To use it, pass it in to the @task() decorator, or specify it at the top of the class definition. e.g.
<p /> <span style="font-family:courier new, monospace;">@task(rate_limit=&quot;100/m&quot;)</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">def add(x, y):</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    <a href="http://logger.info" target="_blank">logger.info</a>(&#039;Adding %s and %s together&#039; % (x, y))</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    return x + y
<p />class MyTest(Task):<br />    rate_limit=&quot;100/m&quot;
<p />    def run(self, x, y):<br />        logger.info(&#039;Received %s and %s&#039; % (x, y))<br />        return int(x) + int(y)
<p /></span><span style="font-family:courier new, monospace;">class MyPeriodicTask(PeriodicTask):</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    run_every = timedelta(seconds=30)<br />    rate_limit=&quot;1000/h&quot;<br style="font-family:courier new, monospace;" /></span> <br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">    def run(self, **kwargs):</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">        r = add(1, 2)</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">        <a href="http://logger.info" target="_blank">logger.info</a>(&quot;Running periodic task! Result of add was %s&quot; % r)</span><br /><span style="font-family:courier new, monospace;"><br /> </span><br />The rate_limit is a combination of &quot;how many times&quot; and a per hour (h), second(s) or minutes(m) value. e.g. &quot;1000/h&quot; is executed no more than 1000 times in an hour.
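<p>The arithmetic behind a rate_limit string is easy to check by hand.  The helper functions below are hypothetical and only do the sums on the "count/unit" format described above; they say nothing about how Celery enforces the limit internally.</p>

```python
def parse_rate(rate_limit):
    """Turn a string such as "100/m" into (count, seconds_in_period)."""
    periods = {"s": 1, "m": 60, "h": 3600}
    count, unit = rate_limit.split("/")
    return int(count), periods[unit]


def min_interval(rate_limit):
    """Average spacing in seconds between executions at the full rate."""
    count, seconds = parse_rate(rate_limit)
    return seconds / float(count)


print(parse_rate("100/m"))      # (100, 60)
print(min_interval("100/m"))    # 0.6
print(min_interval("1000/h"))   # 3.6
```

<p>So "100/m" works out at one execution every 0.6 seconds on average, and "1000/h" at one every 3.6 seconds.</p>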
<p />Because each worker executes on a different server, and therefore gets different settings from the settings.py (or the local_settings.py) file, we can use this to change execution by server if we use a full class definition (and not the decorator) by adding ou</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://timbull.com/2010/03/09/build-a-processing-queue-with-multi-threading-30252/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/45bce8c85db792fa9373bee604141b29?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">tbull001</media:title>
		</media:content>
	</item>
		<item>
		<title>Part 1: Build a processing queue with multi-threading and spread over multiple servers in less than a day using RabbitMQ and Celery.</title>
		<link>http://timbull.com/2010/03/09/part-1-build-a-processing-queue-with-multi-th/</link>
		<comments>http://timbull.com/2010/03/09/part-1-build-a-processing-queue-with-multi-th/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 02:27:00 +0000</pubDate>
		<dc:creator>Tim Bull</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ampq]]></category>
		<category><![CDATA[celery]]></category>
		<category><![CDATA[rabbitmq]]></category>
		<category><![CDATA[technical]]></category>

		<guid isPermaLink="false">http://timbull.com/build-a-processing-queue-with-multi-threading</guid>
		<description><![CDATA[This is part 1 - part 2 is over here. As we move through the development cycle we now have many of the essential processing modules we need for Tribalytic, but we also have a few challenges we need to deal with as well: We need to collect data fas...]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p>This is part 1 &#8211; <a href="http://timbull.com/part-2-build-a-processing-queue-with-multi-th" target="_blank">part 2 is over here</a>.</p>
<p>As we move through the development cycle we now have many of the essential processing modules we need for Tribalytic, but we also have a few challenges we need to deal with as well:</p>
<ol>
<li>We need to collect data faster than we can process it on a single processor.&nbsp; Requirement &#8211; we need to be able to collect the data and then spread it out for processing over multiple processors / servers.</li>
<li>Some servers have a quota on how much they can process per hour for certain things (related to API limits set by Twitter etc.) Requirement &#8211; servers need to know their available limits and not take on more than they can process.</li>
<li>We need to be able to schedule some tasks on a regular basis.</li>
</ol>
<p>There are lots of different ways that this could be done, but after some preliminary research, I&#8217;m going to set out to do it using two key pieces of technology:</p>
<ol>
<li><a href="http://www.rabbitmq.com/" target="_blank">RabbitMQ</a> &#8211; An AMQP (Advanced Message Queuing Protocol) server.&nbsp; Robust message queues help a LOT with point one in particular.</li>
<li><a href="http://ask.github.com/celery/getting-started/introduction.html" target="_blank">Celery</a> &#8211; A pythonic / Django friendly task scheduler and queuing interface to Rabbit / AMQP. Celery provides the magic for points two and three.</li>
</ol>
<p>The goal of this document is for me to both document what I&#8217;ve learnt so I can replicate it in our production environment and lay out the &#8220;easy&#8221; steps after having digested the various documentation, so that you are able to implement something that solves some common problems quickly.
<p /> There are all sorts of reasons that I&#8217;ve preselected these technologies which aren&#8217;t the point of this post &#8211; this post documents the challenge that follows from having selected this approach: how easy is it to implement?&nbsp; Firstly, a few notes about my setup:</p>
<ol>
<li>I have two machines I&#8217;ll be configuring this on, both running Ubuntu 9.10 Karmic Koala release.&nbsp; I&#8217;ll call them <em>Server</em> and <em>Laptop</em>.&nbsp; For the record, <em>Server</em> is a PC with 4 cores and 6Gb of RAM.&nbsp; <em>Laptop</em> is an MSI laptop with a single core (1.3Ghz) and 2Gb RAM.&nbsp; When you see <em>Server</em> and <em>Laptop</em>, replace them with your own machine names.</li>
<li>Python and Django are already pre-configured and working.</li>
</ol>
<p>Finally, there are a lot of great resources out there, but the few I relied on the most (and where some of the content here is adapted from) are:
<p /></p>
<ol>
<li><a href="http://www.rabbitmq.com" target="_blank">http://www.rabbitmq.com</a></li>
<li><a href="http://ask.github.com/celery/index.html" target="_blank">http://ask.github.com/celery/index.html</a></li>
<li> <a href="http://groups.google.com/group/celery-users" target="_blank">http://groups.google.com/group/celery-users</a></li>
</ol>
<p><span style="font-size:medium;"><span style="text-decoration:underline;">Installation Steps</span></span>
<p />I set up <em>Server</em> first.</p>
<ul>
<li>Install Celery &#8211; I used easy_install for this, as follows.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ sudo easy_install celery</span></p></blockquote>
<div> 
<ul>
<li>Next install RabbitMQ &#8211; Originally I simply used Synaptic Package Manager in Ubuntu, searched for Rabbit and installed it and its dependencies, but I noticed this is version 1.6.0 and the <a href="http://www.rabbitmq.com/server.html" target="_blank">latest here is 1.7.2</a> at time of writing, so I downloaded and installed the latest package instead.&nbsp; I doubt it will make much difference which you use.</li>
<li>Test the installation &#8211; using the steps below, which I largely copied from <a href="http://ask.github.com/celery/getting-started/broker-installation.html" target="_blank">here</a>.</li>
</ul>
<p>Configure the security / vhost etc.&nbsp; Note this is an important step you&#8217;ll almost certainly come back to later if you stuff up <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl add_user myuser mypassword</span><br /> <span style="font-family:courier new, monospace;">$ sudo rabbitmqctl add_vhost myvhost</span><br /><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl set_permissions -p myvhost myuser &quot;&quot; &quot;.*&quot; &quot;.*&quot;</span></p></blockquote>
</div>
<p>At this point the server should actually be running (this confused me the first time; as far as I can tell the package installation starts the server, which is why rabbitmqctl is able to add the user and vhost straight away).&nbsp; You can check whether a server is running as follows:
<p /></p>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl status</span></p></blockquote>
<div>To start the server in the foreground, use the following
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmq-server</span></p></blockquote>
<p>or to run it in the background (recommended)
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;font-family:courier new, monospace;">$ sudo rabbitmq-server -detached</p></blockquote>
<p>Finally, to stop the server
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl stop </span></p></blockquote>
</div>
<p>Leave the server stopped for now and the basic <em>Server</em> install is complete.
<p />I repeated these steps on <em>Laptop</em> without any problems at all.&nbsp; NB &#8211; you could skip configuring security etc. on <em>Laptop</em> if you like, as it will be reset in the next step, but I think it&#8217;s worth doing anyway just as practice.
<p /> NB &#8211; If you see a dump like the following when starting the server, in my case it was because the server was already running &#8211; use <span style="font-family:courier new, monospace;">rabbitmqctl status</span> to check:
<p /></p>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">{error_logger,{{2010,2,23},{11,26,37}},&quot;Protocol: ~p: register error: ~p~n&quot;,["inet_tcp",{{badmatch,{error,duplicate_name}},[{inet_tcp_dist,listen,1},{net_kernel,start_protos,4},{net_kernel,start_protos,3},{net_kernel,init_node,2},{net_kernel,init,1},{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}]} etc. etc. etc. </span></p></blockquote>
<p><strong>Elapsed Time:</strong> Configure two PCs with Celery and Rabbit and document it? ~1 hr. For you? 30 minutes.
<p /> <span style="font-size:medium;"><span style="text-decoration:underline;">Cluster the RabbitMQ Servers</span></span>
<p />Knowing I want this to work across at least two servers from day one, I decided to next cluster the RabbitMQ servers.
<p />This is remarkably straightforward if you follow the steps listed in the <a href="http://www.rabbitmq.com/clustering.html" target="_blank">RabbitMQ Clustering guide</a> and DON&#8217;T miss the step I did on configuring the Erlang cookie.&nbsp; I had a couple of minor issues that I needed to read up on, and my very sketchy Linux knowledge slowed me down.&nbsp; Here are the steps I ended up following (you could use the guide linked, but I&#8217;ve added a couple of things relevant to what we are doing here).
<p /></p>
<ul>
<li>Firstly, configure the Erlang cookie so that the Erlang installs on the two machines can talk to each other.&nbsp; Make sure the RabbitMQ server is STOPPED.&nbsp; When you are editing the cookie, simply replace whatever is in there with your own string.&nbsp; It needs to be the same string on both <em>Server</em> and <em>Laptop</em>.&nbsp; Length doesn&#8217;t matter.&nbsp; The default permissions on the cookie file are very tight (read-only, even for its owner), so I gave the owner write access to edit it and then changed the permissions back.&nbsp; The cookie is effectively a shared password, so don&#8217;t open it up any further than that.&nbsp; Replace gedit with your editor of choice (e.g. vi).</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ sudo chmod 600 /var/lib/rabbitmq/.erlang.cookie</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">$ sudo gedit /var/lib/rabbitmq/.erlang.cookie</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">$ sudo chmod 400 /var/lib/rabbitmq/.erlang.cookie</span></p></blockquote>
<ul>
<li>Make sure that <em>Laptop</em> and <em>Server</em> are in each other&#8217;s hosts file (by this I mean that the machine name of your <em>Server</em> equivalent needs to be in /etc/hosts on <em>Laptop</em> and vice versa).&nbsp; If you can ping <em>Server</em> from <em>Laptop</em> by name and vice versa, you should be fine.</li>
<li>On <em>Server</em>, start RabbitMQ in detached mode, check status and make sure it&#8217;s running.</li>
<li>On <em>Laptop</em>, make sure the RabbitMQ Server is stopped, then make sure you&#8217;ve set the cookie as above. Start the RabbitMQ server in detached mode and now complete the following steps.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl stop_app</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">$ sudo rabbitmqctl reset</span></p></blockquote>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><div><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl cluster rabbit@server</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl start_app</span></div>
</blockquote>
<div>&nbsp;Note &#8211; if you have trouble with the reset (because like me you tried to actually cluster the machine BEFORE the cookie was set in Erlang) you can try <span style="font-family:courier new, monospace;">sudo rabbitmqctl force_reset</span> which should sort it out.
<p />
<ul>
<li>You now have a clustered RabbitMQ setup.&nbsp; You can verify this by typing <span style="font-family:courier new, monospace;">sudo rabbitmqctl status</span> on either server and you&#8217;ll see it list something like the following:</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">[{running_applications,[{rabbit,"RabbitMQ","1.7.2"},</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {mnesia,"MNESIA&nbsp; CXC 138 12","4.4.10"},</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {os_mon,"CPO&nbsp; CXC 138 46","2.2.2"},</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {sasl,"SASL&nbsp; CXC 138 11","2.1.6"},</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {stdlib,"ERTS&nbsp; CXC 138 10","1.16.2"},</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {kernel,"ERTS&nbsp; CXC 138 10","2.13.2"}]},</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;{nodes,['rabbit@laptop','rabbit@server']},</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;{running_nodes,['rabbit@laptop','rabbit@server']}]</span></p></blockquote>
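<p>If you would rather check cluster membership from a script than by eye, the status text can be parsed.&nbsp; What follows is only a minimal sketch, assuming the output looks like the 1.7.2 listing above (other versions may print something different); the function name <span style="font-family:courier new, monospace;">running_nodes</span> and the shortened sample text are my own.</p>

```python
import re


def running_nodes(status_text):
    """Return the node names listed in the {running_nodes,[...]} term."""
    match = re.search(r"\{running_nodes,\[(.*?)\]\}", status_text, re.DOTALL)
    if match is None:
        return []
    # Node names are printed as quoted atoms, e.g. 'rabbit@server'.
    return re.findall(r"'([^']+)'", match.group(1))


# Shortened copy of the listing above, used as sample input.
sample = """[{running_applications,[{rabbit,"RabbitMQ","1.7.2"}]},
 {nodes,['rabbit@laptop','rabbit@server']},
 {running_nodes,['rabbit@laptop','rabbit@server']}]"""

print(running_nodes(sample))
```

<p>In practice you would feed it the text captured from <span style="font-family:courier new, monospace;">sudo rabbitmqctl status</span> and compare the list against the nodes you expect.</p>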
<p>This is neat, but if I read the documentation correctly what we have here is a RAM / RAM cluster.&nbsp; If one of the servers goes down we should be fine, because the cluster state is replicated across the nodes, but if the whole lot went out (because the data centre lost power or, more likely in my situation, because I just turned both PCs off overnight) we might really want a persistent DISK node.
<p /> To convert <em>Server</em> to being a disk node, simply execute the following on <em>Server</em> (while the cluster is running).
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl stop_app</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl cluster rabbit@laptop rabbit@server</span></p></blockquote>
<p>Or (FYI) to turn it back into a RAM node.&nbsp; Note that for our purposes in this doco it doesn&#8217;t matter how the nodes are configured; read up and decide what you need.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl cluster rabbit@laptop</span></p></blockquote>
<p>Finally, start it up again
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl start_app</span></p></blockquote>
<p>To remove a server from a cluster at any time, simply do this (this is not a required step, it&#8217;s just &#8220;FYI&#8221;).
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl stop_app</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">$ sudo rabbitmqctl reset</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">$ sudo rabbitmqctl start_app</span></p></blockquote>
<p>NB &#8211; You&#8217;ll need to use force_reset for the LAST node in the cluster to be removed (if you&#8217;re separating them all out again).&nbsp; REMEMBER if you do a full cluster reset (like I did in testing this), you&#8217;ll need to redo the security configuration from the first section <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> &nbsp; This is because you&#8217;ve reset both nodes, so they no longer hold the security information.&nbsp; Whenever I&#8217;ve run into problems with the configuration, I&#8217;ve resolved them consistently by de-clustering, forcing a reset on both nodes, then re-clustering and redoing the security.
<p /> <strong>Elapsed Time:</strong> Research, reading, implementing and trouble shooting cluster ~2 hr.&nbsp; Following the above steps, prob. 30 minutes.
<p /><span style="font-size:medium;"><span style="text-decoration:underline;">Configure Celery with Django</span></span>
<p />OK, now we are ready to get <a href="http://ask.github.com/celery/getting-started/first-steps-with-django.html" target="_blank">Celery set up with Django</a>. Create a new Django project.&nbsp; Mine&#8217;s called &#8220;clifton&#8221; (we use local train stations as our milestone names) and within that I have an app called fetcher.&nbsp; All of these steps need to be done on both <em>Server</em> and <em>Laptop</em> (I&#8217;m using SVN so I simply make the change on <em>Server</em>, commit and then update from <em>Laptop</em>).
<p />
<ul>
<li>In the INSTALLED_APPS section of settings.py add celery as follows.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">INSTALLED_APPS = (</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; 'django.contrib.sessions',</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; 'django.contrib.sites',</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; 'celery',</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; 'clifton.fetcher',</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">)</span><br style="font-family:courier new, monospace;" /></p></blockquote>
<p> 
<ul>
<li>Sync the DB as celery adds a couple of task tracking tables.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ python manage.py syncdb</span></p></blockquote>
<ul>
<li>Add the following settings into settings.py as well.&nbsp; These are just to get us going, we&#8217;ll come back to this and improve them soon once we&#8217;ve got this section working.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-size:x-small;"><span style="font-family:courier new, monospace;">BROKER_HOST = &quot;localhost&quot;<br />BROKER_PORT = 5672<br />BROKER_USER = &quot;myuser&quot;<br />BROKER_PASSWORD = &quot;mypassword&quot;<br />BROKER_VHOST = &quot;myvhost&quot;</span></span></p></blockquote>
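<p>For orientation, the five settings together describe a single AMQP endpoint.&nbsp; The sketch below simply assembles them into the conventional amqp:// URL form so you can see how they relate to the user, password and vhost created with rabbitmqctl earlier.&nbsp; <span style="font-family:courier new, monospace;">broker_url</span> is a hypothetical helper of mine; nothing in this setup requires it.</p>

```python
from urllib.parse import quote


def broker_url(host, port, user, password, vhost):
    """Assemble an amqp:// URL from the individual BROKER_* values.

    Hypothetical helper for illustration only.
    """
    return "amqp://%s:%s@%s:%d/%s" % (
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),  # a vhost of "/" must be percent-encoded
    )


print(broker_url("localhost", 5672, "myuser", "mypassword", "myvhost"))
```

<p>If a worker cannot connect, comparing each piece of that URL against what you entered with rabbitmqctl is a quick first check.</p>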
<ul>
<li>Finally, in your application directory (e.g. fetcher in my example) create a tasks.py module with the following code.&nbsp; This is literally just for a test to show it&#8217;s working.&nbsp; Note it MUST be called tasks.py.&nbsp; It uses the decorator to wrap a class definition around a simple add function that adds two numbers and returns a result.&nbsp; Note that the decorator causes the class to have the name of the function (e.g. we effectively end up with a class called add in the tasks.py).</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">from celery.decorators import task</span>
<p /> <span style="font-family:courier new, monospace;">@task()</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">def add(x, y):</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; return x + y</span><br style="font-family:courier new, monospace;" /></p></blockquote>
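<p>To make the idea of a decorator wrapping a function in a class more concrete, here is a toy version in plain Python.&nbsp; It is not how Celery implements it; <span style="font-family:courier new, monospace;">ToyTask</span>, <span style="font-family:courier new, monospace;">toy_task</span> and the synchronous <span style="font-family:courier new, monospace;">delay</span> are stand-ins of my own that only show the shape of the trick.</p>

```python
class ToyTask(object):
    """Stand-in for a task base class; runs synchronously, with no broker."""

    def run(self, *args, **kwargs):
        raise NotImplementedError

    @classmethod
    def delay(cls, *args, **kwargs):
        # A real task queue would serialise the arguments and publish a
        # message here; this toy just calls run() directly.
        return cls().run(*args, **kwargs)


def toy_task():
    def decorator(func):
        def run(self, *args, **kwargs):
            return func(*args, **kwargs)
        # Build a new class named after the function, whose run() calls it.
        return type(func.__name__, (ToyTask,), {"run": run})
    return decorator


@toy_task()
def add(x, y):
    return x + y


print(add.__name__)      # the name "add" now refers to a class
print(add.delay(4, 4))
```

<p>After decoration, <span style="font-family:courier new, monospace;">add</span> is a class rather than a plain function, which is why you call methods such as <span style="font-family:courier new, monospace;">delay</span> on it.</p>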
<p>Having completed these steps on <em>Server</em> and <em>Laptop</em>, we now need to actually test and run our AMQP workers.&nbsp; Just on <em>Server</em> for now, do the following steps.
<p />
<ul>
<li>Open a terminal window in the clifton directory (or wherever you put your Django project) and start a celery daemon using Django.&nbsp; Again, there are more options here, but this is just a starting point to show it all working.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ python manage.py celeryd</span></p></blockquote>
<p> 
<ul>
<li>Now open a second terminal window and execute the following from the clifton directory.&nbsp; We use the Django interactive shell to be sure all the settings are loaded nicely for us.</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-family:courier new, monospace;">$ python manage.py shell</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&gt;&gt;&gt; from clifton.fetcher import tasks</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&gt;&gt;&gt; result = tasks.add.delay(4, 4)</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&gt;&gt;&gt; result.ready()</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">True</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&gt;&gt;&gt; result.result</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">8</span></p></blockquote>
<p>So what just happened? Well, Celery wrapped up the add function (which simply adds x and y and returns the sum) in a class through the decorator.&nbsp; We then, through the Django interactive shell, imported the tasks module and called the delay method, passing the parameters 4 and 4.&nbsp; Celery handled all the work of pushing that out onto the RabbitMQ server; the celeryd worker then picked up the message, executed the task and stored the result.&nbsp; The <span style="font-family:courier new, monospace;">result.ready()</span> call told us that the result for this particular task had been processed, and <span style="font-family:courier new, monospace;">result.result</span> returned what that result actually was.&nbsp; If result.ready() is False, you may find that the AMQP server is running but you don&#8217;t have the worker process running.
<p /> We&#8217;ve just passed our first message. There is a LOT going on under the hood here, but this example is fairly uninspiring.&nbsp; To make it more interesting, shut down the celeryd worker by sending it a TERM signal: from the command line run <span style="font-family:courier new, monospace;">ps aux | grep celeryd</span>, find the first process ID and run <span style="font-family:courier new, monospace;">kill &lt;pid&gt;</span> (e.g. <span style="font-family:courier new, monospace;">kill 1234</span>).&nbsp; Then see what happens if you execute the code again (perhaps with different parameters, e.g. <span style="font-family:courier new, monospace;">tasks.add.delay(2, 3)</span>, to see the difference).&nbsp; Yup, result.ready() returns False.&nbsp; It will keep returning False until you start the worker process up again.&nbsp; The message has been queued and, at least for this configuration, will persist until it gets processed by a worker.&nbsp; Start the celeryd process again (<span style="font-family:courier new, monospace;">$ python manage.py celeryd</span>) and now try <span style="font-family:courier new, monospace;">result.ready()</span> in the shell &#8211; the message has been processed and the result can be returned.
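<p>The behaviour just described, where a message waits in the queue until a worker turns up, can be mimicked with nothing but the standard library.&nbsp; This is a toy sketch only: <span style="font-family:courier new, monospace;">queue.Queue</span> stands in for RabbitMQ, a dict stands in for the result store, everything runs in one process, and all the names are my own.</p>

```python
import queue

broker = queue.Queue()   # stand-in for the RabbitMQ queue
results = {}             # stand-in for the result store


def send_task(task_id, x, y):
    """Queue an 'add' request; nothing is executed yet."""
    broker.put((task_id, x, y))


def ready(task_id):
    """True once a worker has stored a result for this task."""
    return task_id in results


def run_worker():
    """Drain the queue, as a freshly started worker would."""
    while not broker.empty():
        task_id, x, y = broker.get()
        results[task_id] = x + y


send_task("t1", 2, 3)
print(ready("t1"))                  # no worker has run yet
run_worker()
print(ready("t1"), results["t1"])   # the queued message has now been processed
```

<p>The real system adds persistence, networking and acknowledgements on top, but the sender and the worker are decoupled in the same way.</p>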
<p /> If you&#8217;re still unsure about how cool this is, then you probably don&#8217;t need queuing <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  But for the final trick in this section, until we get this thing rocking and rolling properly, let&#8217;s demonstrate that it really is a cluster being used to process this.
<p /> Of course by default, the cluster is up and running already (presuming you haven&#8217;t stopped the <em>Laptop</em>), but to be sure, just check with <span style="font-family:courier new, monospace;">rabbitmqctl status</span>.&nbsp; Make sure you&#8217;ve also run through the steps in this current section (<span style="text-decoration:underline;">Configure Celery with Django</span>) on both <em>Laptop</em> and <em>Server</em>.&nbsp; Now &#8211; on <em>Server</em>, make sure that any running <span style="font-family:courier new, monospace;">celeryd</span> terminal is shut down.&nbsp; Start <span style="font-family:courier new, monospace;">celeryd</span> in a terminal on <em>Laptop</em>.&nbsp; From the Django interactive shell on <em>Server</em>, repeat the same steps.
<p /> Assuming you got a result successfully returned, you&#8217;ve just used the message queue cluster to have a worker on <em>Laptop</em> pick up and process the message from <em>Server</em>.&nbsp; The subtlety here is that the underlying RabbitMQ cluster handled passing the message between the two machines &#8211; both <em>Server</em> and <em>Laptop</em> are talking to RabbitMQ locally, it was the cluster that handled making the message available in both places.
<p /> <strong>Elapsed Time:</strong> Research, reading, implementing, testing, trouble shooting and documenting ~4 hr.&nbsp; Following the above steps, prob. 1 hr.
<p /> <span style="font-size:medium;"><span style="text-decoration:underline;">Re-Factor the Task and Demo to make life easier!</span></span>
<p />Note that this previous example is of course very basic and can be significantly optimised &#8211; in particular there is no connection pooling, so every new call to add.delay() is opening its own connection to the broker, hence the delay in posting the message.&nbsp; You can speed it up by passing a previously opened connection &#8211; <a href="http://ask.github.com/celery/userguide/executing.html" target="_blank">see here for more information</a>.&nbsp; We&#8217;ll implement this below.
<p /> Additionally, the task decorator, while useful for basic tasks, masks a lot of what&#8217;s really happening &#8211; I prefer a more verbose approach when I&#8217;m learning.&nbsp; So here is the new fetcher/tasks.py &#8211; note we&#8217;ve changed the name of the class to MyTest.&nbsp; Note that it MUST have a &#8220;run&#8221; method, which is what the worker calls when it executes a task sent with apply_async.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">from celery.task import Task</span>
<p /> <span style="font-family:courier new, monospace;">class MyTest(Task):</span>
<p /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; def run(self, x, y, **kwargs):</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; logger = self.get_logger(**kwargs)&nbsp; # the worker&#39;s logger for this task</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; logger.info('Received %s and %s' % (x, y))</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return int(x) + int(y)</span></p></blockquote>
<p> We&#8217;ll also make our life a bit easier by adding a bin directory into the clifton project (this is where I like to store all my cron jobs etc.) and create a simple python file to call our task.&nbsp; This will let us just execute the file to generate the tasks &#8211; the sort of thing you&#8217;ll need to do anyway if you want to call the tasks from a view etc.&nbsp; I&#8217;ll call it <span style="font-family:courier new, monospace;">demo.py</span> (for clarity, in my case this is in ~/clifton/bin/demo.py).&nbsp; Copy the following code in (make any local adjustments needed).&nbsp; Note the change to use <span style="font-family:courier new, monospace;">apply_async</span> instead of <span style="font-family:courier new, monospace;">delay()</span>.&nbsp; It is a lower level call to do the same thing as <span style="font-family:courier new, monospace;">delay()</span>, but allows us to add arguments etc.&nbsp; I&#8217;ve also implemented the connection pooling.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">#! /usr/bin/env python</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">from __future__ import with_statement</span>
<p /><span style="font-family:courier new, monospace;">import os, sys</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">#sys.path.insert(0, &quot;/home/tim/projects&quot;) # Uncomment and point this at the directory containing your Django project</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;"># if it is not already on PYTHONPATH.</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">os.environ['DJANGO_SETTINGS_MODULE'] = 'clifton.settings'</span>
<p /> <span style="font-family:courier new, monospace;">from clifton.fetcher.tasks import MyTest</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">from celery.messaging import establish_connection</span>
<p /> <span style="font-family:courier new, monospace;">if __name__ == &quot;__main__&quot;:</span>
<p /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; numbers = [(2, 2), (4, 4), (8, 8), (16, 16)]</span>
<p /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; results = []</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; with establish_connection() as connection:</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for args in numbers:</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; res = MyTest.apply_async(args=args, connection=connection)</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; results.append(res)</span>
<p /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; print([res.get() for res in results])</span></p></blockquote>
<p>If you run this from the bin directory, you should see the following result:
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ python demo.py</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">[4, 8, 16, 32]</span></p></blockquote>
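<p>To see why passing one connection into every call helps, here is a toy comparison in plain Python.&nbsp; <span style="font-family:courier new, monospace;">FakeConnection</span> and <span style="font-family:courier new, monospace;">send</span> are stand-ins of my own that only count how often a connection gets opened; they are not Celery classes and there is no real broker involved.</p>

```python
class FakeConnection(object):
    """Counts how many times a 'connection' is opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.sent = []

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False   # do not swallow exceptions

    def publish(self, message):
        self.sent.append(message)


def send(message, connection=None):
    """Publish on the given connection, or open a new one just for this call."""
    if connection is None:
        with FakeConnection() as conn:
            conn.publish(message)
    else:
        connection.publish(message)


numbers = [(2, 2), (4, 4), (8, 8), (16, 16)]

for args in numbers:               # one connection per call
    send(args)
per_call = FakeConnection.opened

FakeConnection.opened = 0
with FakeConnection() as shared:   # one connection for the whole batch
    for args in numbers:
        send(args, connection=shared)
reused = FakeConnection.opened

print(per_call, reused)
```

<p>Four sends cost four connections in the first loop and only one in the second, which is the saving the demo script gets from opening the connection once, outside the loop.</p>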
<p>We now have a better structure to take our experiments forwards and further optimise.
<p /> <strong>Elapsed Time:</strong> Research, reading, implementing, trouble shooting and documenting ~2 hr.&nbsp; Following the above steps, prob. 30 minutes.
<p /><span style="font-size:medium;"><span style="text-decoration:underline;">Directing tasks to different servers</span></span>
<p /> From here on in, this will become a little less generic and begin to deal with the problems that we are having and trying to resolve.&nbsp; I&#8217;ll explain why we&#8217;ve made our decisions and you can make your own choices.&nbsp; While you can make each of these settings in settings.py on the two different machines, I suggest you create a local_settings.py and add them in there (especially if you&#8217;re using SVN to sync changes &#8211; ie. running the same Django project on both servers).&nbsp; Where I say settings.py below I&#8217;ve actually created these changes in a local_settings.py.&nbsp; Note that most of these are default settings and can generally be over-ridden at different levels.
<div>
<ul>
<li>Run tasks on different machines.&nbsp; The actual design of these queues, exchanges and bindings is a reasonably complicated topic that gets very design specific and frankly confusing!&nbsp; It&#8217;s also (somewhat) abstracted by Celery, which implements only Direct and Topic exchanges, for example.&nbsp; It&#8217;s worth reading this post on <a href="http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/" target="_blank">Rabbits and Warrens</a>, which gives some good background on the various options provided by AMQP.&nbsp; If all you want to do is post a message and pick it up with a worker, then Celery handles that for you, but in our case we&#8217;d like to make some processing choices up front and be able to use different machines to process different messages.&nbsp; There is a good guide to this in the <a href="http://ask.github.com/celery/faq.html" target="_blank">FAQ for Celery</a>.&nbsp; Add the following to settings.py on <em>Server</em>:</li>
</ul>
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><p><span style="font-size:x-small;"><span style="font-family:courier new, monospace;">CELERY_DEFAULT_QUEUE = &quot;server&quot;<br /> CELERY_QUEUES = {<br />&nbsp;&nbsp;&nbsp; &quot;server&quot;: {<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;binding_key&quot;: &quot;server_task&quot;,<br />&nbsp;&nbsp;&nbsp; },<br />}<br />CELERY_DEFAULT_EXCHANGE = &quot;tasks&quot;<br />CELERY_DEFAULT_EXCHANGE_TYPE = &quot;direct&quot;<br />CELERY_DEFAULT_ROUTING_KEY = &quot;server_task&quot;</span></span></p></blockquote>
<p>We&#8217;ve just told celeryd that when it starts it should bind to a queue called &#8220;<span style="font-family:courier new, monospace;">server</span>&#8221; and listen for messages routed to <span style="font-family:courier new, monospace;">server_task</span>.&nbsp; Do the same on <em>Laptop</em>, but specify laptop instead.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">CELERY_DEFAULT_QUEUE = &quot;laptop&quot;</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">CELERY_QUEUES = {</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; &quot;laptop&quot;: {</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;binding_key&quot;: &quot;laptop_task&quot;,</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; },</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">}</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">CELERY_DEFAULT_EXCHANGE = &quot;tasks&quot;</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">CELERY_DEFAULT_EXCHANGE_TYPE = &quot;direct&quot;</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">CELERY_DEFAULT_ROUTING_KEY = &quot;laptop_task&quot;</span></p></blockquote>
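<p>To make the queue, binding key and routing key relationship a little more concrete, here is a toy model of a direct exchange in plain Python.&nbsp; It does not use Celery or RabbitMQ at all; the DirectExchange class and its methods are made up purely for illustration, and only the queue and key names come from the settings above.</p>

```python
from collections import defaultdict


class DirectExchange:
    """Toy model of an AMQP direct exchange: exact-match routing only."""

    def __init__(self):
        self.bindings = defaultdict(list)  # binding key -> queue names
        self.queues = defaultdict(list)    # queue name -> delivered messages

    def bind(self, queue, binding_key):
        """Attach a queue to the exchange under the given binding key."""
        self.bindings[binding_key].append(queue)

    def publish(self, message, routing_key):
        """Deliver to every queue whose binding key equals the routing key."""
        for queue in self.bindings.get(routing_key, []):
            self.queues[queue].append(message)


exchange = DirectExchange()
exchange.bind("server", "server_task")   # what the Server settings declare
exchange.bind("laptop", "laptop_task")   # what the Laptop settings declare

exchange.publish((2, 2), routing_key="server_task")
exchange.publish((4, 4), routing_key="laptop_task")

print(dict(exchange.queues))
```

<p>A message published with a routing key that matches no binding simply goes nowhere in this model, which is a useful thing to remember when a task seems to vanish.</p>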
<p />There are several different ways of telling tasks where to execute.&nbsp; For example, in this configuration, if we do nothing and simply run the demo.py we created earlier, tasks started on <em>Server</em> will execute on <em>Server</em>, and tasks started on <em>Laptop</em> will execute on <em>Laptop</em>, because each machine applies its own defaults.&nbsp; Most likely we want to decide at execution time where to route a task.&nbsp; In that case, simply modify line 19 of demo.py to add the routing_key parameter as follows:
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">res = MyTest.apply_async(args=args, connection=connection, routing_key=&quot;laptop_task&quot;)</span></p></blockquote>
<p>Changing it to <span style="font-family:courier new, monospace;">routing_key='server_task'</span> will force the task to execute on <em>Server</em>, no matter where it originated, and vice versa.&nbsp; (Strictly speaking, it forces the task to execute on whichever worker is bound to the server queue and listening for server_task; we just happen to configure this queue on <em>Server</em>.)&nbsp; Test this out and make sure it&#8217;s all working.&nbsp; You might like to add some logging to tasks.py so you can see what is executing where (the next section shows how if you need help with this).&nbsp; NB &#8211; the task files appear to be cached in the celeryd worker, so if you modify a task, restart the worker, at least while testing.&nbsp;
<p /> As a final note, let&#8217;s say you want <em>Server</em> to ALSO be able to pick up <em>Laptop</em> tasks (to use its spare capacity, or just as a more &#8220;realistic&#8221; example).&nbsp; Change the CELERY_QUEUES setting in settings.py on <em>Server</em> only, as follows, and <em>Server</em> will listen for <em>Laptop</em> tasks too.&nbsp; Because both <em>Laptop</em> and <em>Server</em> are then consuming from the same queue (which happens to be called <span style="font-family:courier new, monospace;">laptop</span> in our example), Rabbit automatically round robins messages between the two.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-size:x-small;"><span style="font-family:courier new, monospace;">CELERY_QUEUES = {</span></span><br /> <span style="font-size:x-small;"><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; &quot;server&quot;: {</span></span><br /><span style="font-size:x-small;"><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;binding_key&quot;: &quot;server_task&quot;,</span></span><br /><span style="font-size:x-small;"><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; },</span></span><br /><span style="font-size:x-small;"><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; &quot;laptop&quot;: {</span></span><br /> <span style="font-size:x-small;"><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;binding_key&quot;: &quot;laptop_task&quot;,</span></span><br /><span style="font-size:x-small;"><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; },</span></span><br /><span style="font-size:x-small;"><span style="font-family:courier new, monospace;">}</span></span></p></blockquote>
<p>If you extend the demo.py so it passes a lot more numbers, you should be able to use demo.py to send messages to server_task and laptop_task and see the difference by checking the logs.&nbsp; Both <em>Server</em> and <em>Laptop</em> process tasks for <span style="font-family:courier new, monospace;">laptop_task</span>, while only <em>Server</em> processes tasks for <span style="font-family:courier new, monospace;">server_task</span>.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">numbers = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), </span><br /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (9, 9), (10, 10), (11, 11), (12, 12), (13, 13), (14, 14), </span><br /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (15, 15), (16, 16), (17, 17), (18, 18)]</span><br style="font-family:courier new, monospace;" /></p></blockquote>
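<p>If you want a feel for what to expect in the logs, the sketch below models an idealised round robin between two consumers of the same queue.&nbsp; It is plain Python with no broker involved, the round_robin helper is hypothetical, and a real RabbitMQ split will depend on things like prefetch settings and how busy each worker is, so treat the even split as the best case rather than a guarantee.</p>

```python
from itertools import cycle


def round_robin(messages, consumers):
    """Hand each message to the next consumer in turn (idealised model)."""
    assignment = {}
    turn = cycle(consumers)
    for message in messages:
        assignment.setdefault(next(turn), []).append(message)
    return assignment


# The same 18 pairs as the extended demo.py list.
numbers = [(n, n) for n in range(1, 19)]

# Both machines consume from the "laptop" queue in this scenario.
shares = round_robin(numbers, ["Server", "Laptop"])

for consumer, received in shares.items():
    print(consumer, len(received), received[:3])
```

<p>With 18 messages and two consumers, each side ends up with nine in this model; in practice the counts in your logs may be lopsided.</p>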
<p><strong>Elapsed Time:</strong> Research, reading, implementing, troubleshooting and documenting ~4.5 hr.&nbsp; Following the above steps, probably 2 hrs max.
<p /><span style="font-size:medium;"><span style="text-decoration:underline;">Get a (heart) beat</span></span>
<p /> We now have a (basic) distributed architecture with the ability to route jobs to a specific server, and we also know how to distribute work across two servers (or more).&nbsp; The next trick is to create an automated task that runs on a regular basis.&nbsp; We could use a cron job or something similar, but it would be nice to build this into Django and Celery so we can route the messages straight on to a queue.&nbsp; It turns out there is a way: a PeriodicTask, which does more or less what it says <a href="http://ask.github.com/celery/getting-started/periodic-tasks.html" target="_blank">on the box</a>.&nbsp; It runs periodically.&nbsp; I modified our earlier <span style="font-family:courier new, monospace;">tasks.py</span> as shown below.&nbsp; As well as the basic <span style="font-family:courier new, monospace;">add</span> task already defined, we now have a PeriodicTask which executes every 30 seconds.&nbsp; It doesn&#8217;t do much: in this example I simply had it call the add task again.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">from datetime import timedelta<br />import logging
<p />from celery.task import PeriodicTask, Task<br />from celery.messaging import establish_connection
<p />logger = logging.getLogger('fetcher.tasks')
<p />class MyTest(Task):<br />&nbsp;&nbsp;&nbsp; def run(self, x, y):<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; logger.info('Received %s and %s' % (x, y))<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return int(x) + int(y)
<p />class MyPeriodicTask(PeriodicTask):<br />&nbsp;&nbsp;&nbsp; # beat fires this task every 30 seconds<br />&nbsp;&nbsp;&nbsp; run_every = timedelta(seconds=30)
<p />&nbsp;&nbsp;&nbsp; def run(self, **kwargs):<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; numbers = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (9, 9), (10, 10), (11, 11), (12, 12), (13, 13), (14, 14), (15, 15), (16, 16), (17, 17), (18, 18)]<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # reuse one broker connection for the whole batch<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; with establish_connection() as connection:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for n in numbers:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; MyTest.apply_async(args=[n[0], n[1]], connection=connection, routing_key=&quot;laptop_task&quot;)
<p />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; logger.info('Ran periodic task')<br /> </span></p></blockquote>
<p>You might need to remove the logging calls unless you have a proper logger set up in your settings.py.&nbsp; (FYI, you can add the following to settings.py and this should all work nicely for you; just change the paths as appropriate and make sure the log directory exists.)
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">LOG_FILENAME = '/home/tim/clifton/log/debug.log'</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">import logging</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">logging.basicConfig(</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; filename=LOG_FILENAME,</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; format='[%(levelname)-5s] %(asctime)-8s %(name)-10s %(message)s',</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; #datefmt='%a, %d %b %Y %H:%M:%S',</span><br style="font-family:courier new, monospace;" /><span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; datefmt='%H:%M:%S',</span><br style="font-family:courier new, monospace;" /> <span style="font-family:courier new, monospace;">&nbsp;&nbsp;&nbsp; level=logging.DEBUG)</span></p></blockquote>
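<p>If you would like to see what a line in that format looks like before pointing it at a file, the following standalone sketch uses the same format and datefmt strings but writes to an in-memory buffer instead.&nbsp; The buffer and handler wiring are my own test scaffolding, not part of the project settings; only the format strings and the logger name come from the code above.</p>

```python
import io
import logging

# Capture output in memory rather than in a log file (scaffolding only).
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    fmt="[%(levelname)-5s] %(asctime)-8s %(name)-10s %(message)s",
    datefmt="%H:%M:%S",
))

logger = logging.getLogger("fetcher.tasks")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the sample line out of the root logger

logger.info("Received %s and %s", 2, 2)

line = stream.getvalue()
print(line, end="")
```

<p>The line starts with the padded level name in brackets, followed by the time, the logger name and the message, so it is easy to grep for a particular task.</p>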
<p>But how does it execute?&nbsp; Well in this instance we don&#8217;t actually need a demo.py, instead we can use the beat feature of the celeryd to execute the periodic task for us.&nbsp; To do this, simply start your celeryd with the -B option.&nbsp; e.g.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ python manage.py celeryd -B</span></p></blockquote>
<p>Alternatively you can run a dedicated celerybeat server.
<p />
<blockquote class="gmail_quote" style="border-left:1px solid #cccccc;margin:0 0 0 .8ex;padding-left:1ex;"><span style="font-family:courier new, monospace;">$ python manage.py celerybeat</span></p></blockquote>
<p>Just be aware that if you run a dedicated celerybeat server, you&#8217;ll also need to start a worker (celeryd) yourself, otherwise your tasks will be sent to the queue but never processed.&nbsp; I have a personal preference for running the dedicated celerybeat server, as it makes it easier to isolate and shut down just the beat process, separately from the workers. (<span style="font-family:courier new, monospace;">ps aux | grep celerybeat</span>)
<p /> <strong>Elapsed Time:</strong> Implementing, limited troubleshooting and documenting ~1 hr.&nbsp; Following the above steps, probably 30 minutes.</div>
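<p>The split between the beat process and the worker can be pictured with a small simulation.&nbsp; This is a toy model in plain Python, not how celerybeat is implemented: the beat and worker functions, the in-process queue and the start time are all assumptions made for illustration.&nbsp; The only detail taken from the task above is the 30 second interval.</p>

```python
import queue
from datetime import datetime, timedelta

run_every = timedelta(seconds=30)  # same interval as MyPeriodicTask


def beat(start, end, interval, task_queue):
    """Enqueue one message per interval; the beat never executes tasks."""
    due = start
    while due < end:
        task_queue.put(("MyPeriodicTask", due))
        due += interval


def worker(task_queue):
    """Drain the queue, standing in for a celeryd worker doing the work."""
    processed = []
    while not task_queue.empty():
        processed.append(task_queue.get())
    return processed


tasks = queue.Queue()
start = datetime(2010, 3, 9, 12, 0, 0)  # arbitrary illustrative start time

# Two minutes of beat with no worker running: messages just pile up.
beat(start, start + timedelta(minutes=2), run_every, tasks)
waiting = tasks.qsize()

# Only once a worker starts does anything actually get processed.
done = worker(tasks)
print(waiting, len(done), tasks.empty())
```

<p>The point of the model is simply that scheduling and executing are separate jobs, which is why stopping the beat process on its own leaves the workers free to finish whatever is already queued.</p>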
<p />
<div>Ready for more? <a href="http://timbull.com/part-2-build-a-processing-queue-with-multi-th" target="_blank">Continue to part 2.</a></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://timbull.com/2010/03/09/part-1-build-a-processing-queue-with-multi-th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/45bce8c85db792fa9373bee604141b29?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">tbull001</media:title>
		</media:content>
	</item>
	</channel>
</rss>
