Javier Tordable Blog http://www.javiertordable.com Javier Tordable blog on Software, Mathematics and Technology Nounoublog https://code.google.com/p/nounoublog/ Really Simple SEO http://www.javiertordable.com/blog/2010/03/12/really-simple-seo Fri, 12 Mar 2010 00:36:31 GMT http://www.javiertordable.com/blog/2010/03/12/really-simple-seo <p> SEO stands for Search Engine Optimization, and is the process of improving a website's structure and content in order to make it easy for search engines to gather the pages and display them in search results in the best position possible. </p> <img src="http://www.javiertordable.com/img/search-engines.png" alt="Search Engines"/> <p> In this post I am going to explain a few basic principles of SEO and show examples of how I implemented them in my blog. Also, as I am the Tech Lead of Webmaster Tools backend, I am going to talk a little bit about some features of Webmaster Tools that are very helpful for SEO. Please remember that all these tips are not only good for search engines, but also for users. If you have to choose between doing something to benefit users or search engines, always choose what is best for users. </p> <p> Here is the list of simple SEO tips: </p> <ul> <li><strong>Use a good URL structure, with descriptive URLs.</strong> If the URL has keywords related to the page topic it will be easier to find in search results. For example, the url of this post includes the words <em>really simple seo</em>, that are the topic of the post <pre> http://www.javiertordable.com/blog/2010/03/11/really-simple-seo </pre> </li> <li> <strong>Use good page titles.</strong> Similar to the previous tip, good page titles, with appropriate keywords make it easier to find the page in search results. 
Even though the title of this page includes the full name of the site, the title begins with a good description of the content <pre> Really Simple SEO - Javier Tordable blog on Software, Mathematics and Technology </pre> </li> <li> <strong>Have a good meta description for each page.</strong> The meta description is used by some search engines for the snippets in search results (the small paragraph with a description of the page). If you don't provide one, you run the risk that the search engine will generate the snippet by itself, with unexpected results. This happened to me before I had a meta description: my snippet was taken from the RSS feed and looked awful. My current meta description for the homepage is: <pre> &lt;meta name="description" content="Javier Tordable blog on Software, Mathematics and Technology. Javier Tordable is a software engineer at Google and Ph.D. candidate in Mathematics."&gt; </pre> </li> <li> <strong>Structure the page appropriately, using HTML header tags.</strong> The most important parts should be within an H1 tag, the second most important in H2, and so on down to H6. For example in this post the blog title and subtitle are within H1 and H2 tags. The title of the particular post is in an H2, and other less important sections, the about box and the archives, are within an H3 HTML tag <pre> &lt;h1&gt;&lt;a href="/"&gt;Javier Tordable&lt;/a&gt;&lt;/h1&gt; &lt;h2&gt;A blog on Software, Mathematics and Technology&lt;/h2&gt; &lt;h3&gt;About&lt;/h3&gt; &lt;h3&gt;Archives&lt;/h3&gt; &lt;h2&gt;Really Simple SEO&lt;/h2&gt; </pre> </li> <li> <strong>Use the simplest format possible.</strong> Currently search engines are very advanced and can process Flash, Javascript and other content types. However, it's always easier to access raw HTML content. So prefer to use HTML for most content, unless Flash or Javascript are essential. Also it will be easier to access the page from old browsers or other platforms. 
For example iPhone users can't see Flash pages </li> <li> <strong>Have a flat internal link structure.</strong> The flatter your link structure is, the easier it will be for search engines to access a page. Also the easier it will be for users to access whatever content they are looking for. In my blog I have all the main sections linked in the top navigation bar, which appears in all pages. And in the right side of most pages there is a link to all the blog posts <pre> &lt;a href="/blog/all"&gt;All Posts&lt;/a&gt; </pre> From the homepage of the site it's possible to access any other content page in two clicks or less. </li> </ul> <p> And now, some interesting pieces of information about your site that you can find in <a href="http://www.google.com/webmasters/tools/">Webmaster Tools</a>: </p> <ul> <li> One of my favorites is <strong>Backlinks</strong>, which will show you all the links pointing to your site, from all over the Web. Having many quality links is important because it will be easier for people to find your site, and it will show to search engines that the site is relevant. <img src="http://www.javiertordable.com/img/screenshot-webmaster-tools-backlinks.png" alt="Screenshot of Webmaster Tools Backlinks"/> </li> <li>Another very useful tool is <strong>Top Search Queries</strong>, which will show for which queries my site appears in search results. For example, the "bundle adjustment" query has more requests than "javier tordable" and it appears in the second row in the following table, while the other query is in the third row. 
However my site appears in position 16 for bundle adjustment, and it appears in position one for searches of my own name <img src="http://www.javiertordable.com/img/screenshot-webmaster-tools-top-search-queries.png" alt="Screenshot of Webmaster Tools Top Search Queries"/> </li> <li> Also, another cool piece of information is <strong>Subscriber Stats</strong>, which shows how many subscribers I have for my RSS feed from <a href="http://www.google.com/reader">Google Reader</a>. In my case I can see that I have 10 people subscribed. And I can also submit this feed as a Sitemap, which will help get my site indexed <img src="http://www.javiertordable.com/img/screenshot-webmaster-tools-subscriber-stats.png" alt="Screenshot of Webmaster Tools Subscriber Stats"/> For example, my Sitemap statistics show that I have 11 pages in this sitemap, and they are all indexed <img src="http://www.javiertordable.com/img/screenshot-webmaster-tools-sitemaps.png" alt="Screenshot of Webmaster Tools Sitemaps"/> </li> <li> And to check that my site is being correctly crawled, I can check the <strong>Crawl Stats </strong> feature. As you can see in the graph, the number of pages that are crawled in my site per day has been going up significantly since I implemented all these SEO tips <img src="http://www.javiertordable.com/img/screenshot-webmaster-tools-crawl-stats.png" alt="Screenshot of Webmaster Tools Crawl Stats"/> </li> </ul> <p> To finish, I am just going to point out that there are a lot of SEO resources online. Doing proper search engine optimization doesn't need to be complicated or expensive. And it is not only good for search engines, but also for users. For more tips, you should check the <a href="http://googlewebmastercentral.blogspot.com/2008/11/googles-seo-starter-guide.html"> Google SEO guide</a>. 
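</p> <p> As a small illustration of the first tip, here is a minimal Python sketch that turns a post title into a descriptive URL path. The function names <em>slugify</em> and <em>post_url</em> and the date-based layout are assumptions made for this example, not the actual code behind this blog: </p>

```python
import re
import unicodedata
from datetime import date


def slugify(title):
    """Hypothetical helper: lowercase ASCII words joined by dashes."""
    # Drop accents so that the URL only contains plain ASCII characters.
    normalized = unicodedata.normalize('NFKD', title)
    ascii_title = normalized.encode('ascii', 'ignore').decode('ascii')
    # Replace every run of non-alphanumeric characters with a single dash.
    return re.sub(r'[^a-z0-9]+', '-', ascii_title.lower()).strip('-')


def post_url(title, published):
    """Hypothetical helper: build a /blog/YYYY/MM/DD/slug path."""
    return '/blog/%04d/%02d/%02d/%s' % (
        published.year, published.month, published.day, slugify(title))


print(post_url('Really Simple SEO', date(2010, 3, 12)))
```

<p> A slug generated this way keeps the keywords of the title in the URL, although a hand-picked slug can be shorter and more focused.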
</p> Collaborative Mathematics and The Future of Science http://www.javiertordable.com/blog/2010/02/25/collaborative-mathematics-future-of-science Thu, 25 Feb 2010 20:02:55 GMT http://www.javiertordable.com/blog/2010/02/25/collaborative-mathematics-future-of-science <p> Mathematical research is traditionally seen as a one-man job. To quote <a href="http://books.google.com/books?id=lQosnIw05dYC"> Jean Dieudonné in The Music of Reason</a>: </p> <div id="special-text"> Research in the experimental sciences is done in laboratories, where larger and larger teams are needed to manipulate the instruments and to scrutinize the results. To do research in mathematics nothing is needed except paper and a good library. Team-work, as practiced in the experimental sciences is, then, quite unusual in mathematics, most mathematicians finding it difficult to think seriously except in silence and solitude. Collaborative work, while quite common, most often consists in putting together results that each of the collaborators has managed to obtain in isolation, albeit with mutual profit from each other's ideas, enabling them to progress from new points of departure. </div> <p> In spite of that, about a year ago <a href="http://en.wikipedia.org/wiki/Timothy_Gowers">Tim Gowers</a> asked himself if it would be possible to solve important mathematical problems by collaborating openly over the internet. Not a collaboration among a few colleagues, but among everybody who had any insight into the problems. He shared this question with the mathematical community through a <a href="http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/"> post in his blog</a>. That was the birth of the <a href="http://polymathprojects.org/">Polymath</a> project. </p> <p> The first problem that the Polymath group worked on was an attempt to obtain a simple proof of the <a href="http://en.wikipedia.org/wiki/Hales–Jewett_theorem"> Hales–Jewett theorem</a>. 
This theorem is a very important result from <a href="http://en.wikipedia.org/wiki/Ramsey_theory"> Ramsey theory</a>. In very rough terms Ramsey theory says that for many mathematical structures, there is no such thing as complete randomness. </p> <p> For example, take a group of six people: Alice, Bob, Charles, David, Erin and Fritz. Ramsey's theorem tells us that there are either 3 people that all know each other, or 3 people that are all strangers to each other. Even in something as random as a party, if there are at least 6 people then we can find a very special subgroup of 3 people. </p> <p> Here is the proof: Take Alice, and suppose first that she knows fewer than 3 of the other people at the party, for example only Bob, or only Bob and Charles. Then there are at least 3 people that she doesn't know, say David, Erin and Fritz. If they all know each other, we have a group of 3 people that know each other. If not, two of them don't know each other, for example David doesn't know Erin. As a consequence Alice, David and Erin are all strangers to each other. If Alice knows 3 people or more, the proof is the same with the roles reversed. Say she knows Bob, Charles and David. If no two of them know each other then we have our group of 3 strangers. But if two of them know each other, for example Bob knows Charles, then Alice, Bob and Charles all know each other. </p> <p> So far the Polymath group has discussed 5 problems, which are all shown in the <a href="http://michaelnielsen.org/polymath1/index.php?title=Main_Page"> Wiki</a>, and they have started to publish some of the results. Here is a link to a paper from D.H.J. Polymath on arXiv, <a href="http://arxiv.org/abs/0910.3926"> A new proof of the density Hales-Jewett theorem</a>. </p> <p> The following image is a part of a 3D Mandelbrot fractal, as described <a href="http://www.skytopia.com/project/fractal/mandelbulb.html"> here</a>. It has nothing to do with the rest of the post, but the Hales-Jewett theorem doesn't lend itself easily to fancy pictures. 
</p> <img src="/img/3d-fractal.jpg" alt="Mandelbulb 3D fractal" /> <p> And for comparison purposes this is a piece of romanesco broccoli. </p> <img src="/img/romanesco-broccoli.jpg" alt="Romanesco broccoli" /> <p> Of course, Mathematics is not the only scientific discipline in which people collaborate openly on interesting problems. Michael Nielsen has a great blog <a href="http://michaelnielsen.org/blog/doing-science-online/"> post</a> about doing science online. Probably the most important point is that the way that scientists work with each other is changing. And the change is driven mostly by new online collaboration tools. To finish the post I will quote Michael: </p> <div id="special-text"> Blogs, wikis, open notebooks, InnoCentive and the like aren’t the end of online innovation. They’re just the beginning. The coming years and decades will see far more powerful tools developed. We really will enormously scale up scientific conversation; we will scale up scientific collaboration; we will, in fact, change the entire architecture of expert attention, developing entirely new ways of navigating data, making connections and inferences from data, and making connections between people. </div> New Google Chart Tools http://www.javiertordable.com/blog/2010/02/17/new-google-chart-tools Wed, 17 Feb 2010 02:03:13 GMT http://www.javiertordable.com/blog/2010/02/17/new-google-chart-tools <p> Google recently released a new set of tools for graphics and interactive visualizations called <a href="http://code.google.com/apis/charttools/"> Google Chart Tools</a>. Google Chart Tools replaces the previous Charts API (for static images) and Visualization API (for dynamic graphics). And it combines both APIs within a single framework. Here is a <a href="http://googlecode.blogspot.com/2010/02/announcing-google-chart-tools.html"> link to the official announcement</a>. 
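</p> <p> To give a rough idea of how a request to the Charts API is put together, here is a minimal Python sketch that assembles an image URL from a set of parameters. The helper name <em>chart_url</em> is made up for this example, and the parameter values are only an illustration: </p>

```python
from urllib.parse import urlencode

CHART_ENDPOINT = 'http://chart.apis.google.com/chart'


def chart_url(params):
    """Hypothetical helper: join the endpoint and the encoded parameters."""
    # Keep ':' and ',' unescaped, since they separate values inside a parameter.
    return CHART_ENDPOINT + '?' + urlencode(params, safe=':,')


params = {
    'cht': 't',                      # chart type: map
    'chtm': 'world',                 # area shown in the map
    'chs': '440x220',                # size of the chart
    'chld': 'USES',                  # countries to highlight (US and ES)
    'chd': 't:10,50',                # one value per country
    'chco': 'FFFFFF,00FF00,005500',  # colors
    'chf': 'bg,s,EAF7FE',            # background fill
}
print(chart_url(params))
```

<p> Every option is just a key and a value in the query string, so changing the chart only requires changing the dictionary.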
</p> <p> This is an example of the Charts API, a map with a couple of countries marked in a different color: </p> <img src="http://chart.apis.google.com/chart?cht=t&chtm=world&chs=440x220&chld=USES&chd=t:10,50&chco=FFFFFF,00FF00,005500&chf=bg,s,EAF7FE" alt="Example of Google Charts API, colored map"/> <p> This map was generated with the following link: </p> <pre> http://chart.apis.google.com/chart?cht=t&chtm=world&chs=440x220 &chld=USES&chd=t:10,50&chco=FFFFFF,00FF00,005500&chf=bg,s,EAF7FE </pre> <p> Let me go over each part in that link and explain what it means: </p> <ul> <li><strong>cht=t</strong> indicates that this is a graph of type map</li> <li><strong>chtm=world</strong> says that the map should include the whole world</li> <li><strong>chs=440x220</strong> is the size of the chart</li> <li><strong>chld=USES</strong> is the list of countries to display in a different color, US and ES</li> <li><strong>chd=t:10,50</strong> is the intensity of the color of each country. US=10, ES=50</li> <li><strong>chco=FFFFFF,00FF00,005500</strong> is the color gradient FFFFFF=white for the background, 00FF00 light green (US) and 005500 medium green (ES)</li> <li><strong>chf=bg,s,EAF7FE</strong> is the background color, light blue</li> </ul> <p> Here is another example, but this time of an interactive visualization: </p> <script type='text/javascript' src='http://www.google.com/jsapi'> </script> <script type='text/javascript'> google.load('visualization', '1', {'packages': ['geomap']}); google.setOnLoadCallback(drawMap); function drawMap() { var data = new google.visualization.DataTable(); data.addRows(6); data.addColumn('string', 'Country'); data.addColumn('number', 'Coolness'); data.setValue(0, 0, 'Spain'); data.setValue(0, 1, 100); data.setValue(1, 0, 'Brazil'); data.setValue(1, 1, 80); data.setValue(2, 0, 'United States'); data.setValue(2, 1, 70); data.setValue(3, 0, 'Canada'); data.setValue(3, 1, 40); data.setValue(4, 0, 'Russia'); data.setValue(4, 1, 20); 
data.setValue(5, 0, 'China'); data.setValue(5, 1, 10); var options = {}; options['dataMode'] = 'regions'; options['width'] = 440; options['height'] = 220; options['colors'] = [0xEAF7FE, 0xA5EF63, 0x50AA00, 0x267114]; var container = document.getElementById('map_canvas'); var geomap = new google.visualization.GeoMap(container); geomap.draw(data, options); }; </script> <p> <div id='map_canvas' style="margin-left: 70px;"></div> </p> <p> In this case the map is dynamic. Moving the mouse over the different countries will display a message, which contains the value used to select the color of the country. Now the code is a little bit longer, about 30 lines of Javascript, so I am not going to include it, but there is a detailed explanation here: <a href="http://code.google.com/apis/visualization/documentation/"> Google Chart Tools, Introduction</a>. </p> <p> These tools are probably not as powerful as custom-made visualizations, like the ones that I talked about in a previous post, <a href="http://www.javiertordable.com/blog/2009/12/03/interesting-visualizations-changes-over-time"> Interesting Visualizations: Changes Over Time</a>, but they are definitely easier to create and modify. </p> <p> To finish, I am just going to quote Robert Kosara and his blog on visualization <a href="http://eagereyes.org/">Eager Eyes</a>, "JavaScript for visualization is clearly the way to go. It's fast, versatile, works much better than Flash or Java, and is obviously way ahead of static images". You can check the complete post <a href="http://eagereyes.org/blog/2010/javascript-key-to-in-browser-visualization"> here</a>. </p> Nounoublog updated http://www.javiertordable.com/blog/2010/02/06/nounoublog-updated Sat, 06 Feb 2010 00:00:00 GMT http://www.javiertordable.com/blog/2010/02/06/nounoublog-updated <p> Over the last few weeks this blog has changed dramatically. 
It looks pretty much the same as when it started, but under the covers the code of the blogging platform, <a href="http://code.google.com/p/nounoublog/">Nounoublog</a>, is very different. I am going to talk about three of the features that I have been working on lately: </p> <ul> <li>Archives</li> <li>RSS Feed</li> <li>Admin console</li> </ul> <p> And I will show a few snippets of the actual code that powers the blog. </p> <p> For those that visit the blog for the first time, Nounoublog is a small blogging platform developed in <a href="http://code.google.com/appengine/">Google App Engine</a>. I started working on it basically for two reasons. First, I wanted to learn how to develop applications for Google App Engine. And second, because I wanted a simple but highly customizable platform, with free hosting and no ads. </p> <p> <img src="http://code.google.com/appengine/images/appengine_lowres.gif" alt="Google App Engine logo" /> </p> <h3>Archives</h3> <p> Now the blog has an archives section. It is the small set of links on the right side. It will let you view all the posts since the creation of the blog. </p> <p> For example, if you click on <a href="/blog/2009">2009</a>, it will show you all the posts from the previous year. In order to enable this I had to update the url structure of the blog. Now paths have the form: </p> <pre> http://www.javiertordable.com/blog/2010/01/30/the-eternal-night </pre> <p> With slashes separating the different parts of the url. And all the following are valid urls: </p> <pre> http://www.javiertordable.com/blog/2010/01/30/ http://www.javiertordable.com/blog/2010/01/ http://www.javiertordable.com/blog/2010/ </pre> <p> They show, respectively, all the posts of the day, the month, and the year. Each one has its own handler. Here is for example the handler that returns all the posts in a year: </p> <pre> class BlogYear(webapp.RequestHandler): """Request handler for all blog posts in a given year. 
This handler answers all requests for /blog/YYYY and /blog/YYYY/. """ def get(self): # Get the year from the path. path = self.request.path[len('/' + config.BLOG_PREFIX + '/'):] (year, month, day, desired_url) = extract_url_parts(path) # Get the list of all posts of the year. posts = get_posts_in_date(year, month=None, day=None) # And return an archives page with the posts. s = pages.ArchivesPageGenerator() self.response.out.write(s.generate(posts, str(year))) </pre> <p> There are similar handlers for all the posts in a month and all the posts in a day. </p> <h3>RSS Feed</h3> <p> An <a href="http://en.wikipedia.org/wiki/RSS">RSS feed</a> is a specially formatted XML file, which includes data about the posts in a blog. It is updated automatically by the blog, so that when it changes, RSS subscribers know that there is new content. There are applications like <a href="http://www.google.com/reader/">Google Reader</a> that are very helpful to keep track of many RSS feeds and alerting when there is new stuff to read. </p> <p> In the <a href="http://code.google.com/p/nounoublog/"> Nounoublog blogging platform</a>, you can access the RSS feed by clicking in the <a href="/blog/rss.xml">RSS</a> link at the top of the page. In the browser you may see only a bunch of text, but if you add this link to your Google Reader subscriptions you will see a list of the most recent posts. </p> <p> Same as before the RSS feed is powered by its own handler, which is very simple: </p> <pre> class RssFeed(webapp.RequestHandler): """Handler for the RSS feed. This feed contains a list with all the blog posts, from last to first. This list is subject to the maximum item retrieval limit of the DB. """ def get(self): # Get the list of all posts. posts = get_all_posts() # Return the xml feed with the posts. 
template_values = {'posts': posts} self.response.out.write(template.render(template_path("rss_feed"), template_values)) </pre> <p> Where the template provides the XML structure of the feed, and inserts the data corresponding to the posts: </p> <pre> &lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot;?&gt; &lt;rss version=&quot;2.0&quot;&gt; &lt;channel&gt; &lt;title&gt;Javier Tordable Blog&lt;/title&gt; &lt;link&gt;http://www.javiertordable.com&lt;/link&gt; &lt;description&gt; Javier Tordable blog on Software, Mathematics and Technology &lt;/description&gt; &lt;generator&gt;Nounoublog&lt;/generator&gt; &lt;docs&gt;https://code.google.com/p/nounoublog/&lt;/docs&gt; {# Loop over all the blog posts. #} {% for post in posts %} &lt;item&gt; &lt;title&gt;{{ post.title }}&lt;/title&gt; &lt;link&gt;{{ post.absolute_url }}&lt;/link&gt; &lt;pubDate&gt;{{ post.rss_pub_date }}&lt;/pubDate&gt; &lt;guid&gt;{{ post.absolute_url }}&lt;/guid&gt; &lt;description&gt;{{ post.escaped_content }}&lt;/description&gt; &lt;/item&gt; {% endfor %} &lt;/channel&gt; &lt;/rss&gt; </pre> <p> Notice that in the Django template the post elements appear as attributes while in fact they are method calls. Also I use the url of the post as its GUID, as it is intended to be a permanent link. </p> <h3>Admin console</h3> <p> The last item that I have been working on is the administration console. This is still a work in progress, but I expect that once I am done with it I will post more often. </p> <p> The admin console will have options to: </p> <ul> <li>Add posts</li> <li>Edit posts</li> <li>Add static pages</li> <li>Edit static pages</li> <li>Edit the CSS</li> <li>Edit redirects</li> </ul> <p> All these options seem very normal with the exception of the redirects. How do they work? For example, when going to: </p> <pre> http://www.javiertordable.com/blog/2009-12-01/my-first-blog-post </pre> <p> You are redirected to another url, which appears in the url bar. 
Notice how the dashes in the date are now forward slashes: </p> <pre> http://www.javiertordable.com/blog/2009/12/01/my-first-blog-post </pre> <p> I added support for redirects because I changed the site several times (including the url structure), and I didn't want to serve 404 error pages for all old urls. </p> <p> Keep visiting the blog or subscribe to the <a href="/blog/rss.xml">RSS</a> feed for more news on Nounoublog! </p> The Eternal Night http://www.javiertordable.com/blog/2010/01/30/the-eternal-night Sat, 30 Jan 2010 00:00:00 GMT http://www.javiertordable.com/blog/2010/01/30/the-eternal-night <p> My brother David is a film director. He has been making short films for a few years, and he has even won a few prizes. Most of his work is at <a href="http://www.tpmpictures.com">tpmpictures.com</a>. Today I just wanted to show his latest piece of work, a science fiction short about the end of the world. The short is in Spanish but with English subtitles. </p> <object> <param name="movie" value="http://www.notodofilmfest.com/ediciones/09/es/swf/player.swf?corto=22321.flv&duracion=03:30"></param> <param name="wmode" value="transparent"></param> <embed src="http://www.notodofilmfest.com/ediciones/09/es/swf/player.swf?corto=22321.flv&duracion=03:30" type="application/x-shockwave-flash" wmode="transparent" width="500" height="370"></embed> </object> <p> There are a couple of things that are interesting in this short. First, the script is not wildly improbable. Check out this (humorous) list of the <a href="http://www.cracked.com/article_16583_the-5-scientific-experiments-most-likely-to-end-world.html"> 5 scientific experiments most likely to end the world</a>. And second, the special effects are pretty nice for a zero budget short. If you liked it, please go ahead and leave a comment at the <a href="http://www.notodofilmfest.com/ediciones/09/?lg=es&corto=22321"> Notodo film festival</a>. 
</p> TRANSCEND http://www.javiertordable.com/blog/2010/01/15/trascend-book-kurzweil-grossman Fri, 15 Jan 2010 00:00:00 GMT http://www.javiertordable.com/blog/2010/01/15/trascend-book-kurzweil-grossman <p> A couple of days ago I started reading <a href="http://www.transcendbook.com/">TRANSCEND</a>, the new book from Ray Kurzweil and Terry Grossman. The book starts from the principle that our knowledge of medicine and biology is increasing to a point where we can start to control effectively how fast our own bodies age. And even more important, the amount of knowledge that we gather is increasing over time. If the trend continues we may reach a point where we can effectively reverse engineer our bodies in order to avoid aging. </p> <img src="/img/trascend-book-kurzweil-grossman.png" alt="TRANSCEND book by Ray Kurzweil and Terry Grossman" /> <p> Whether you believe that we will reach that point or not, the book is an interesting read. It is filled with healthy habits, complete diets and recipes, exercise programs and more. Even if you leave aside the supplements and the fancy biomedical technologies, there are plenty of actionable tips for improving your quality of life, for example checking for food intolerances. There are millions of people out there that can't digest milk or wheat very well and are not even aware of it. </p> <p> Disclaimer: I didn't get paid to write this post. </p> MapReduce Integer Factorization in arXiv http://www.javiertordable.com/blog/2010/01/07/mapreduce-integer-factorization-in-arxiv Thu, 07 Jan 2010 00:00:00 GMT http://www.javiertordable.com/blog/2010/01/07/mapreduce-integer-factorization-in-arxiv <p> This Monday I published my article on <a href="http://arxiv.org/abs/1001.0421">MapReduce for integer factorization in arXiv</a>. The article is essentially the same one that can be downloaded in the <a href="/research">research</a> section of this site. So if you have already checked it out, you won't find anything new. 
However I am very excited because it is my first addition to arXiv. </p> <img src="/img/mapreduce-integer-factorization-arxiv.png" alt="MapReduce for Integer Factorization in arXiv." /> <p> In case you are not familiar with <a href="http://www.arxiv.org">arXiv</a>, it is one of the greatest scientific sites on the web. It has over half a million articles, especially in the fields of mathematics, physics and computer science, and many relevant papers are published in the arXiv months before they appear in any peer reviewed journal. </p> MapReduce Integer Factorization released! http://www.javiertordable.com/blog/2009/12/29/mapreduce-integer-factorization-released Tue, 29 Dec 2009 00:00:00 GMT http://www.javiertordable.com/blog/2009/12/29/mapreduce-integer-factorization-released <p> Recently I published the code of <a href="http://code.google.com/p/mapreduce-integer-factorization/"> MapReduce for Integer Factorization</a>. It is available under the Apache 2.0 License in Google Code. It includes everything necessary to run in <a href="http://hadoop.apache.org/">Apache Hadoop</a>, as well as the numerical libraries used. It has no dependencies apart from the latest version of Hadoop. </p> <p> <img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="Hadoop logo" /> </p> <p> This project is a proof of concept that shows how to use MapReduce, a framework for distributed computation, to solve a purely numerical problem. The main conclusion is that it's possible to use MapReduce for problems that lie far from its original area of application, for example number theory. Also in this case the difficulty involved in developing the MapReduce program is similar to the difficulty of creating a worksheet in a mathematical tool like Maple. But the performance of MapReduce is significantly higher. </p> <p> If you have some time, please download it from <a href="http://code.google.com/p/mapreduce-integer-factorization/">here</a>, and let me know how it works for you. 
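</p> <p> To give a flavor of what a map step and a reduce step look like for a numerical problem, here is a toy Python sketch that looks for a factor by trial division. It runs in a single process, it does not use Hadoop, and it is not the algorithm implemented in the project. The function names and the chunk size are assumptions made for this example: </p>

```python
from math import isqrt


def split_candidates(n, chunk_size):
    """Split the candidate divisors 2..sqrt(n) into independent chunks."""
    limit = isqrt(n)
    return [(n, start, min(start + chunk_size - 1, limit))
            for start in range(2, limit + 1, chunk_size)]


def map_chunk(task):
    """Map step: return the divisors of n found in one chunk."""
    n, start, end = task
    return [d for d in range(start, end + 1) if n % d == 0]


def reduce_factors(n, partial_results):
    """Reduce step: merge the partial results and keep the smallest factor."""
    divisors = [d for part in partial_results for d in part]
    if not divisors:
        return None  # No divisor up to sqrt(n), so n has no proper factor.
    smallest = min(divisors)
    return (smallest, n // smallest)


def factor(n, chunk_size=1000):
    """Toy driver: run the map step on every chunk, then reduce."""
    tasks = split_candidates(n, chunk_size)
    # In a real MapReduce job each task would run on a different worker.
    return reduce_factors(n, map(map_chunk, tasks))


print(factor(8051))
```

<p> Each chunk is independent of the others, which is what makes this kind of search easy to distribute.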
</p> A small blogging platform in Google App Engine http://www.javiertordable.com/blog/2009/12/17/small-blog-platform-in-google-app-engine Thu, 17 Dec 2009 00:00:00 GMT http://www.javiertordable.com/blog/2009/12/17/small-blog-platform-in-google-app-engine <p> If you have never made a <a href="http://en.wikipedia.org/wiki/Web_application">web application</a> it may seem daunting. There are hundreds of alternative technologies and frameworks out there. And web apps development is quite different from client applications, which is what most developers are used to. </p> <p> Here is an example of a web application. Wikipedia! </p> <img src="/img/screenshot-wikipedia.png" alt="Screenshot of Wikipedia" /> <p> Most web applications share a few common elements: </p> <ul> <li>A persistence layer, for authored content or user created content</li> <li>A system to connect each user request with a part of the application</li> <li>A method to render and display that content to users</li> </ul> <p> Traditionally the persistence layer is a <a href="http://en.wikipedia.org/wiki/SQL_database">SQL database</a>, the requests are directed for example to <a href="http://en.wikipedia.org/wiki/Servlet">Java servlets</a> in an app server like <a href="http://tomcat.apache.org/">Tomcat</a>, and there is a more or less refined templating engine, in which the content is added to create the whole page returned to the user. </p> <p> Another alternative is <a href="http://code.google.com/appengine/">Google App Engine</a>. Google App Engine is a platform and a set of libraries to develop web applications based in Google's own infrastructure. It is available in Java and Python, but here I will concentrate on the Python version. </p> <p> The persistence layer of Google App Engine is the <a href="http://code.google.com/appengine/docs/python/datastore/overview.html"> Datastore</a>, a highly parallel but simple to use storage solution. 
The Datastore doesn't support queries as complex as SQL does, however it can scale up to a level which is beyond what a normal database can do. And it can do so in a way that is trivial for the developer. </p> <p> Google App Engine uses <a href="http://en.wikipedia.org/wiki/YAML">yaml</a> and the webapp framework to answer user requests. One can set a configuration file which assigns certain url paths (via a regular expression) to instances of webapp.RequestHandler. </p> <pre> handlers: - url: /.* script: app.py </pre> <p> The instance can implement a get() method which generates the response returned to the user. </p> <pre> class App(webapp.RequestHandler): def get(self): self.response.headers['Content-Type'] = 'text/plain' self.response.out.write('Hello, World!') </pre> <p> Finally, in order to generate HTML, Google App Engine incorporates the templating engine from <a href="http://www.djangoproject.com/">Django</a>. A template is essentially a document with <em>variables</em> instead of content. When the application needs to answer a user request it can load the template and replace the <em>variables</em> with real content, for example information from the data storage. </p> <p> This is basically the way that this blog is made. It is a very simple Google App Engine application. It uses the Datastore for the blog posts, which are retrieved when it receives a request for the /blog url path. Then it inserts the blog post content into the blog template and returns the result. Here is a sample of code. It is not the actual code, but it gives a complete example: </p> <pre> class Blog(webapp.RequestHandler): def get(self): # Retrieve the posts from the database. query = ('SELECT * FROM Post WHERE public = True ORDER BY date DESC ' 'LIMIT %d ' % NUM_POSTS_IN_MAIN_PAGE) posts = db.GqlQuery(query).fetch(NUM_POSTS_IN_MAIN_PAGE) # Render them into html. 
if len(posts) > 0: template_values = &#123;'posts': posts &#125; template_path = os.path.join(os.path.dirname(__file__), 'templates/blog') content = template.render(template_path, template_values) else: content = '' # And return a full page with the blog content. s = pages.FullPageGenerator() self.response.out.write(s.generate(content)) </pre> <p> To sum up, Google App Engine is a great option for developing web applications. Expecially if you will require it to scale seamlessly, or to integrate with Google services. Check out the <a href="http://code.google.com/appengine/docs/python/gettingstarted/introduction.html"> Google App Engine tutorial</a>. </p> <img src="http://code.google.com/appengine/images/appengine-noborder-120x30.gif" alt="Powered by Google App Engine" /> Interesting Visualizations: Changes Over Time http://www.javiertordable.com/blog/2009/12/03/interesting-visualizations-changes-over-time Thu, 03 Dec 2009 00:00:00 GMT http://www.javiertordable.com/blog/2009/12/03/interesting-visualizations-changes-over-time <p> Visualizations are simply ways of representing data. But if they are good, they can bring us deep insights, that go well beyond what is possible to understand by simply looking at the raw data. </p> <p> There are several categories of visualizations, for example: <ul> <li>Compare two entities based on a given set of metrics. An example of this is a benchmark between two competing companies or products</li> <li>Track the value of a given metric over time. A well known visualization of this type is a financial chart, with the value of an asset</li> <li>Compare the value of a single metric in different geographic locations. We have all seen maps in which the color of each region is based on the value of the metric</li> </ul> <p> Another very interesting set of visualizations are those that allow us to track a particular situation over time. 
Here are three examples: </p> <p> <a href="http://www.flickr.com/photos/ciaranhughes/4121291229/">Tracking a change in ranking over time (by Ciaran Hughes)</a> </p> <p> <img src="/img/visualization-changes-ranking-over-time.png" alt="Visualization for changes in ranking over time" /> </p> <p> <a href="http://www.ge.com/visualization/health_costs/index.html">Tracking a change in distribution over time (from GE)</a>. The bottom slider changes the chart based on the age. Each section of the chart represents one kind of illness. </p> <p> <img src="/img/visualization-changes-distribution-over-time-40.png" alt="Visualization for changes in distribution over time" /> <img src="/img/visualization-changes-distribution-over-time-50.png" alt="Visualization for changes in distribution over time" /> <img src="/img/visualization-changes-distribution-over-time-60.png" alt="Visualization for changes in distribution over time" /> </p> <p> <a href="http://www.xach.com/moviecharts/2008.html">Tracking changes in volume or magnitude over time (from xach.com)</a>. Each color block is a movie, and the size represents the box office in each week. </p> <img src="/img/visualization-changes-volume-over-time.png" alt="Visualization for changes in volume or magnitude over time" /> <p> The first visualization doesn't attempt to indicate quantity because it displays an abstract concept such as brand appreciation. However, in the third case the quantity is very concrete: the total box office in dollars. </p> <p> We could also use the first or third approach for the second data set. But if we went with the first kind of visualization, most likely we would only remember the most important expense for each age. Also, we would be constrained in the number of years to show. If we decided to go with a visualization of the third kind, it would be hard to compare how the expenses change relative to each other, as all of them are likely to increase over time.
</p> My first blog post http://www.javiertordable.com/blog/2009/12/01/my-first-blog-post Tue, 01 Dec 2009 00:00:00 GMT http://www.javiertordable.com/blog/2009/12/01/my-first-blog-post <p> Hello everybody. This is the first post in my new blog. This is not your typical WordPress or Blogger blog. It runs on a custom blogging platform made from scratch, on top of Google App Engine. Soon I will add a couple of posts about how it's done, and I will release the code of the platform. </p> <p> In the future I will use this blog to talk about stuff that interests me. For example: </p> <ul> <li>Google App Engine, Django and other tools for rapid web application development</li> <li>Sage, an open source mathematics package</li> <li>Devi Prasad Shetty, and how he transformed medicine through mass production</li> <li>Interesting visualizations! As information becomes more and more available, it is increasingly important to visualize it in ways that are easy to understand</li> </ul> <p> The subscribe links don't work yet, so I am afraid you won't be able to read this blog in your favorite RSS reader. But I hope that I will see you again soon. Thanks for coming! </p>