Javier Tordable Blog http://www.javiertordable.com Javier Tordable blog on Software, Mathematics and Technology Nounoublog https://code.google.com/p/nounoublog/ P versus NP http://www.javiertordable.com/blog/2010/08/14/p-versus-np Sat, 14 Aug 2010 16:54:43 GMT http://www.javiertordable.com/blog/2010/08/14/p-versus-np <p> During the past week shocking news has stormed through the world of Theoretical Computer Science. A researcher from HP Labs, <a href="http://www.hpl.hp.com/personal/Vinay_Deolalikar/"> Vinay Deolalikar</a>, claimed that he had a proof that P ≠ NP. The paper is available <a href="http://www.hpl.hp.com/personal/Vinay_Deolalikar/Papers/pnp_8_11.pdf"> here</a>. But so far it seems that it will not stand. Several researchers pointed out critical defects in the proof that Deolalikar proposed. It is especially interesting to follow the discussion in Dick Lipton’s blog (in 5 posts so far: <a href="http://rjlipton.wordpress.com/2010/08/08/a-proof-that-p-is-not-equal-to-np/"> 1</a> <a href="http://rjlipton.wordpress.com/2010/08/09/issues-in-the-proof-that-p≠np/"> 2</a> <a href="http://rjlipton.wordpress.com/2010/08/10/update-on-deolalikars-proof-that-p≠np/"> 3</a> <a href="http://rjlipton.wordpress.com/2010/08/11/deolalikar-responds-to-issues-about-his-p≠np-proof/"> 4</a> <a href="http://rjlipton.wordpress.com/2010/08/12/fatal-flaws-in-deolalikars-proof/"> 5</a>). Or in Wikipedia for the <a href="http://en.wikipedia.org/wiki/Vinay_Deolalikar#P_.E2.89.A0_NP"> short version</a>. It’s amazing to see how many of the world’s most brilliant mathematicians worked so fast to analyze Deolalikar’s proof, including <a href="http://rjlipton.wordpress.com/2010/08/10/update-on-deolalikars-proof-that-p≠np/#comment-4885"> Terence Tao</a> and <a href="http://gowers.wordpress.com/2010/08/11/my-pennyworth-about-deolalikar/"> Timothy Gowers</a>. 
The fatal blow to the attempted proof seems to be in a letter from <a href="http://michaelnielsen.org/polymath1/index.php?title=Immerman's_letter"> Neil Immerman</a>. There are plenty of references about the proposed proof and the refutations in <a href="http://michaelnielsen.org/polymath1/index.php?title=Deolalikar's_P!%3DNP_paper"> this Wiki</a>. </p> <p> Now for anybody who is not an expert in complexity theory (myself included), here is a short explanation of what this is all about and why it matters: P and NP are two classes of problems. Colloquially, problems in P are those that can be solved by an algorithm whose running time on an input of size n is bounded by a polynomial in n. For example, finding the maximum in a list of n elements takes approximately n steps because we just need to go through all n elements and keep track of the maximum. Problems in P are in general solvable in reasonable time, because even for large inputs the time that it takes to run the algorithm does not grow too fast. </p> <p> Problems in NP are those for which, if we have a proposed solution, we can check in polynomial time whether it actually solves the problem. For example, given a set of integers {−7, −3, −2, 5, 8} of size 5 we can check if a subset like {−3, −2, 5} adds up to zero in less than 5 steps. We simply need to add the numbers. However, no known method finds such a subset in general without taking a much bigger number of steps. One way to solve it is to analyze all subsets, and for each one, check if it’s a solution. But the number of subsets grows exponentially with n, hence the difficulty. </p> <p> Obviously all problems in P are also in NP, because if we can find the solution in polynomial time then we can verify a proposed solution in polynomial time (for example by finding it again). The open question is whether P is equal to NP or not. 
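</p> <p> To make the difference between checking and searching concrete, here is a minimal Python sketch for the set of integers above. The function names are my own, chosen only for illustration, and trying every subset is just the naive method described in the text, not the best known algorithm: </p>
<pre>
```python
from itertools import combinations


def verify(candidate):
    """Check a proposed solution: a non-empty subset whose sum is zero."""
    return len(candidate) != 0 and sum(candidate) == 0


def find_zero_subset(numbers):
    """Brute-force search: try every non-empty subset, smallest first."""
    for size in range(1, len(numbers) + 1):
        for candidate in combinations(numbers, size):
            if verify(candidate):
                return candidate
    return None  # no subset adds up to zero


numbers = [-7, -3, -2, 5, 8]
print(verify((-3, -2, 5)))        # checking the subset from the text
print(find_zero_subset(numbers))  # searching for a subset from scratch
print(2 ** len(numbers) - 1)      # number of non-empty subsets to try
```
</pre>
<p> Verifying a candidate takes one pass over its elements, while the search may have to try up to 2^n - 1 non-empty subsets; for the five numbers above that is already 31.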
</p> <p> These two classes are especially interesting for their practical applications but they are not the only classes by any means. The following diagram shows some of the most important ones. The classes at the bottom are contained in the classes on top. </p> <a href="http://en.wikipedia.org/wiki/Complexity_class"> <img src="http://www.javiertordable.com/img/algorithm-hierarchy.png" alt="Hierarchy of algorithm complexity classes"/> </a> <p> And there are many more. The <a href="http://qwiki.stanford.edu/wiki/Complexity_Zoo"> Complexity Zoo</a> lists 489 complexity classes. </p> <p> Now that we have seen what the problem is about, why does it matter? I am going to talk about two important real-world consequences of a possible proof of P=NP. You can find many more <a href="http://en.wikipedia.org/wiki/P_versus_NP_problem#Consequences_of_proof"> here</a> if you are interested. </p> <p> The first application is in mathematical research. As Stephen Cook says, we can <a href="http://www.claymath.org/millennium/P_vs_NP/Official_Problem_Description.pdf"> verify a proof of a theorem in polynomial time</a> so if P=NP then we would be able to find a proof in polynomial time, whenever a reasonably short one exists. This would allow us to build <a href="http://isabelle.in.tum.de/overview.html"> automatic theorem provers</a> that basically do scientific research for us 24/7, continuously discovering new knowledge. The repercussions that this would have in the world would obviously be tremendous. </p> <a href="http://isabelle.in.tum.de/overview.html"> <img src="http://www.javiertordable.com/img/automatic-theorem-prover.png" alt="Automatic theorem prover"/> </a> <p> The second example is in biology. It is known that the <a href="http://en.wikipedia.org/wiki/Hydrophobic-polar_protein_folding_model"> HP protein folding model</a> <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.5547&rep=rep1&type=pdf"> is in NP</a>. 
This model basically simplifies the way that amino acids fold in space to form proteins, the essential actors in all cells. If we had an algorithm in P to solve this problem we could possibly design substances that have specific 3D shapes and properties, instead of simply finding them. For example <a href="http://en.wikipedia.org/wiki/Anti-idiotypic_vaccine"> some vaccines</a> work because the 3D shape of a protein in the vaccine fits the 3D shape of a protein in the pathogen. Who knows how many illnesses we could cure with that technology. </p> <a href="http://en.wikipedia.org/wiki/Protein_folding"> <img src="http://www.javiertordable.com/img/protein-folding.jpg" alt="Protein folding"/> </a> High Frequency Trading Art http://www.javiertordable.com/blog/2010/07/31/high-frequency-trading-art Sat, 31 Jul 2010 19:29:49 GMT http://www.javiertordable.com/blog/2010/07/31/high-frequency-trading-art <p> The folks at <a href="http://www.zerohedge.com/">Zero Hedge</a> published a very interesting post about <a href="http://www.zerohedge.com/article/its-not-market-its-hft-crop-circle-crime-scene-further-evidence-quote-stuffing-manipulation-"> high frequency trading and market manipulation</a>. I am not going to discuss economy or finance now, but there is something about that post that is quite shocking. The images which show the prices and volumes of the orders routed to the market have a certain aesthetic appeal. </p> <p> All the patterns below were most likely created by high frequency trading algorithms, which send thousands of orders to the financial markets and cancel them in milliseconds. The top part of the graph shows the price of the order, and the bottom part the number of shares of the bid or ask. The descriptions are copied from Zero Hedge. Click on the images to see a high resolution version. </p> <p> BATS "Flag Repeater". 15,000 quotes in 11 seconds, dropping the ASK price 1 penny each quote from $9.36 to $8.58 and back up again. 07-29-10. 
</p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%201%20Nanex.png"> <img src="http://www.javiertordable.com/img/hft-1-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> "The Crown". While not a large number of quotes, this NASDAQ/BATS Bidsize sequence was just too unusual to bypass. 07-29-10. </p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%202%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-2-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> BATS "Batsicles". BATS price cycling through a large price range, each intermittent with a stub quote, drop it down and start over. 07-28-10. </p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%203%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-3-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> NASDAQ "Blotter". One of the more unusual repeating Asksize cycles. 07-27-10. </p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%204%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-4-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> BATS "Stubby Triangles". Drop the quote from a valid price to 0.001 and then back up to a lower price level. When the new price level hits 0.001 as well, do it all over again at approx. 380 times a second. 07-23-10. </p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%205%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-5-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> NASDAQ "Flutter". 4000 quotes in 2 seconds, alternating the bid price/size in 3 increments and effecting the Best Bid along the way. 07-23-10. 
</p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%206%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-6-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> BATS "Periscopes". 8000 quotes in 3 seconds, alternating the bid price each quote. Pop the size up 1 every second or so. 07-23-10. </p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%207%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-7-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> NASDAQ "Double Dip". Symbol SH. 10,000 Quotes in 4 seconds, each affecting the Best Bid. 07-22-10. </p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%208%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-8-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> NASDAQ "Racing Stripe". Symbol WYNN. 2000 Quotes in one second, each affecting the Best Ask. 07-19-10. </p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%209%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-9-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> PACIFIC "Puzzle Pieces". Symbol IIC. 07-19-10. </p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%2010%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-10-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> NASDAQ "Blue Bandsaw". Symbol SHG. (760 quotes in 1 second, taken from a total sampling of 10,000 quotes in 12 seconds). 07-14-10. </p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%2011%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-11-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> BATS "60-Step". Symbol SAH. Take sixty steps up (a penny at a time) and one step down (0.60), reset and do it all over again (at approx. 700 times per second). 07-13-10. 
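</p> <p> The "60-Step" description is precise enough to sketch in a few lines of Python. This is only my reading of the Zero Hedge description, not the actual trading algorithm; the function name and the starting price are hypothetical, and prices are kept in whole cents to avoid rounding problems: </p>
<pre>
```python
def sixty_step_cycle(start_cents):
    """One cycle of the pattern: sixty one-penny steps up, then one 0.60 drop."""
    prices = [start_cents + step for step in range(61)]  # start plus 60 steps up
    prices.append(prices[-1] - 60)                       # single step down of 0.60
    return prices


cycle = sixty_step_cycle(1000)  # hypothetical starting price of $10.00, in cents
print(len(cycle), cycle[0], max(cycle), cycle[-1])
```
</pre>
<p> Each cycle ends at the price where it started, so repeating it produces the sawtooth shape. The real quotes are plotted in the image that follows.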
</p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%2012%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-12-s.jpg" alt="High Frequency Trading Art 1"/> </a> <p> NASDAQ "Ask Mountain". Symbol IAU. Over 56,000 quotes in 10 seconds, all with same Ask Price and the Ask Size increasing or decreasing by 1 (to almost 40,000!). 07-12-10. </p> <a href="http://www.zerohedge.com/sites/default/files/images/user5/imageroot/trichet/1%2013%20nanex.png"> <img src="http://www.javiertordable.com/img/hft-13-s.jpg" alt="High Frequency Trading Art 1"/> </a> Beauty and Truth in Science http://www.javiertordable.com/blog/2010/07/24/beauty-and-truth-in-science Sat, 24 Jul 2010 00:20:05 GMT http://www.javiertordable.com/blog/2010/07/24/beauty-and-truth-in-science <object width="480" height="300"> <param name="movie" value="http://www.youtube.com/v/UuRxRGR3VpM&amp;hl=en_US&amp;fs=1"></param> <param name="allowFullScreen" value="true"></param> <param name="allowscriptaccess" value="always"></param> <embed src="http://www.youtube.com/v/UuRxRGR3VpM&amp;hl=en_US&amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="300"> </embed> </object> <p> I just watched this fascinating talk from <a href="http://en.wikipedia.org/wiki/Murray_Gell-Mann"> Murray Gell-Mann</a>. The core idea of the talk is that in Physics, as in so many other areas of Science and Mathematics, incredibly complex ideas can be expressed in a simple and concise way. This is a remarkable property of the universe in which we live, and in that fact there is an amazing beauty. </p> <p> Gell-Mann gives an example from Physics, </p> <img src="http://www.javiertordable.com/img/newton-gravity-law.png" alt="Newton's Gravity Law"/> <p> This is the <a href="http://en.wikipedia.org/wiki/Newton's_law_of_universal_gravitation"> Law of Gravitation discovered by Newton</a>. 
It describes both the mechanism by which objects fall to the ground when dropped and the way the planets move in space. For its time it was a massive unification of many seemingly unrelated phenomena. It explains so much of how the world works, yet it can be written with just a few characters. </p> <p> Here is another example from Mathematics, the <a href="http://en.wikipedia.org/wiki/Prime_number_theorem"> Prime Number Theorem</a>, </p> <img src="http://www.javiertordable.com/img/prime-number-theorem.png" alt="The Prime Number Theorem"/> <p> Natural numbers are some of the most elemental constructions of the human mind, but they allow us to count from the fingers on our hands to the last atom in the entire universe. We know from basic arithmetic that every natural number greater than 1 can be factored into primes, so primes are in a sense the “building blocks” of all numbers. And even though primes are distributed in an incredibly intricate way, the Prime Number Theorem describes how densely these building blocks are spread among the natural numbers. </p> Edward Tufte, Presenting Data and Information http://www.javiertordable.com/blog/2010/06/20/edward-tufte-presenting-data-and-information Sun, 20 Jun 2010 01:25:20 GMT http://www.javiertordable.com/blog/2010/06/20/edward-tufte-presenting-data-and-information <p> Last week I attended <a href="http://www.edwardtufte.com/tufte/courses"> Edward Tufte's course on data visualization</a> here in Seattle. For those who don't know him, Tufte is one of the world's top experts on information visualization. Here is a short description, from his website: </p> <div id="special-text"> Edward Tufte has written seven books, including Beautiful Evidence, Visual Explanations, Envisioning Information, The Visual Display of Quantitative Information, and Data Analysis for Politics and Policy. He writes, designs, and self-publishes his books on analytical design, which have received more than 40 awards for content and design. 
He is Professor Emeritus at Yale University, where he taught courses in statistical evidence, information design, and interface design. His current work includes landscape sculpture, printmaking, video and a new book. </div> <p> During the course Tufte made several observations that go against what some other people in the field recommend. The most notable from my point of view is to use visualizations with lots of content. And instead of trying to simplify the data, let people understand it on their own. I believe this can be appropriate in many cases, for example in an <a href="http://www.tfl.gov.uk/assets/downloads/standard-tube-map.gif"> underground map</a>: </p> <a href="http://www.tfl.gov.uk/assets/downloads/standard-tube-map.gif"> <img src="http://www.javiertordable.com/img/underground-map-detail.png" alt="Detail of the London tube map"/> </a> <p> However in some other cases the information overload hides what is really important. For example in this awful visualization in <a href="http://www.forbes.com/2010/06/04/migration-moving-wealthy-interactive-counties-map.html"> Forbes about where Americans are moving</a>: </p> <a href="http://www.forbes.com/2010/06/04/migration-moving-wealthy-interactive-counties-map.html"> <img src="http://www.javiertordable.com/img/forbes-where-americans-move.jpg" alt="Forbes visualization. Moving patterns of Americans"/> </a> <p> It's impossible to understand everything that is going on. What I got out of it is that people are moving from the most economically impacted areas (Southern California, Detroit, Miami, etc.) and into the least impacted areas (Pacific northwest, Texas, New York, Washington, etc.). Also I think I saw a minor trend of movement from colder areas to warmer areas. However with all that clutter it's hard to understand the overall picture. 
It is true that overload in general is a failure of design, not a problem with information, however it's definitely easier to make bad designs when one tries to present too much data. </p> <p> In general the course was interesting, but for the most part it covered basic content. However they gave us a copy of Tufte's four books in visualization! Click on the following images to go to the corresponding Amazon pages: </p> <p> <a href="http://www.amazon.com/gp/product/0961392142?ie=UTF8&tag=javitordblogo-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0961392142">The Visual Display of Quantitative Information</a> </p> <a href="http://www.amazon.com/gp/product/0961392142?ie=UTF8&tag=javitordblogo-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0961392142"> <img src="http://www.javiertordable.com/img/tufte-visual-display.jpg" alt="Edward Tufte. The Visual Display of Quantitative Information"> </a> <p> <a href="http://www.amazon.com/gp/product/0961392118?ie=UTF8&tag=javitordblogo-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0961392118">Envisioning Information</a> </p> <a href="http://www.amazon.com/gp/product/0961392118?ie=UTF8&tag=javitordblogo-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0961392118"> <img src="http://www.javiertordable.com/img/tufte-envisioning-information.jpg" alt="Edward Tufte. Envisioning Information"> </a> <p> <a href="http://www.amazon.com/gp/product/0961392126?ie=UTF8&tag=javitordblogo-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0961392126">Visual Explanations</a> </p> <a href="http://www.amazon.com/gp/product/0961392126?ie=UTF8&tag=javitordblogo-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0961392126"> <img src="http://www.javiertordable.com/img/tufte-visual-explanations.jpg" alt="Edward Tufte. 
Visual Explanations"> </a> <p> <a href="http://www.amazon.com/gp/product/0961392177?ie=UTF8&tag=javitordblogo-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0961392177">Beautiful Evidence</a> </p> <a href="http://www.amazon.com/gp/product/0961392177?ie=UTF8&tag=javitordblogo-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0961392177"> <img src="http://www.javiertordable.com/img/tufte-beautiful-evidence.jpg" alt="Edward Tufte. Beautiful Evidence"> </a> The Best Cities for Singles in America http://www.javiertordable.com/blog/2010/06/06/best-cities-for-singles-in-america Sun, 06 Jun 2010 03:38:36 GMT http://www.javiertordable.com/blog/2010/06/06/best-cities-for-singles-in-america <script type='text/javascript' src='http://www.google.com/jsapi'> </script> <p> A few weeks back I had a conversation with a friend about different places in the US where she was considering moving. One of the factors was the dating scene in each one of those places. This reminded me of a graph that probably many of you have already seen: </p> <img src="http://www.javiertordable.com/img/national-geographic-chart.jpg" alt="National Geographic Chart"> <p> This map basically says the following: </p> <ul> <li>If the dot is blue, there are more men than women</li> <li>If the dot is orange, there are more women than men</li> <li>The bigger the dot, the greater the difference</li> </ul> <p> However it has a series of problems. First of all, it's too dense. Second, it is not clear that the number of single men minus the number of single women (or the opposite) is what really matters, because obviously the bigger the city the less important the difference is. For example, in a city of 1 million we could have 100K singles and 20K more single men than women. This means there are 3 single men for every 2 single women. 
However, in a city of 10 million with 1 million singles, the fact that there are 20K more single men only means there are approximately 1.04 single men for every single woman. It seems that the proportion of single men to single women is more indicative than the difference. And third and most important, I didn't know for sure if the data in the graph is reliable, as there are no sources cited. So, as you can imagine I decided to check if there is actually any truth in this chart. </p> <p> Here is the list of steps necessary to get the interesting data from the Census: </p> <ol> <li>Start in the <a href="http://factfinder.census.gov"> Census fact finder homepage</a></li> <li>Click on Data Sets on the left navigation bar</li> <li>Select SF3 and click on Detailed Table</li> <li>In the Geography Type section choose Metropolitan Statistical Area</li> <li>Make sure all statistical areas are selected and click Next</li> <li>Now choose the table P18. Sex by marital status for the population 15+ years</li> <li>Click on Add and then Show result</li> </ol> <p> From this table we want the "Never married" rows. Because the table is too big to analyze directly (280 columns), let's download it and map the results using the Google Charts API. </p> <p> You may need to clean up the downloaded file, because there are copyright notices and extra end of line characters. You only need to keep three lines: </p> <ul> <li>The header of the table, which contains the name of the metropolitan areas</li> <li>The row for Male / Never Married individuals</li> <li>The row for Female / Never Married individuals</li> </ul> <p> After removing all other rows, it's possible to parse the file and extract the interesting data with a small Python script like this: </p> <pre>
input_file = open('DTDownload.csv')

# First process the header line.
header = input_file.readline()
header = header[5:-2]  # Remove " ", at beginning and "\n at the end.
tokens = header.split('","')
cities = []
for token in tokens:
    # Eliminate the CMSA suffix before the MSA suffix, so no stray 'C' is left.
    token = token.replace('CMSA', '')
    token = token.replace('MSA', '')
    # Remove everything after --. find() returns -1 when there is no match.
    if token.find('--') != -1:
        token = token[:token.find('--')]
    cities.append(token)

# Now process "Never married" men.
line = input_file.readline()
line = line[17:-2]  # Remove "Never married" at beginning and " at end.
tokens = line.split('","')
single_men = []
for token in tokens:
    token = token.replace(',', '')  # Eliminate the , for thousands.
    single_men.append(int(token))

# Now process "Never married" women.
line = input_file.readline()
line = line[17:-2]  # Remove "Never married" at beginning and " at end.
tokens = line.split('","')
single_women = []
for token in tokens:
    token = token.replace(',', '')  # Eliminate the , for thousands.
    single_women.append(int(token))
input_file.close()

# And from those two quantities compute the ratio of single men
# to single women. The higher, the better for women. Reverse the
# quotient to get the other side.
ratios = []
for i in range(len(single_men)):
    ratio = float(single_men[i]) / float(single_women[i])
    ratios.append(ratio)

# Now print the name of the city and ratio in the format accepted
# by the Google Charts API, which is:
# data.setValue(0, 0, 'New York');
# data.setValue(0, 1, 1.064382);
for i in range(len(cities)):
    print('data.setValue(%(row)d, 0, \'%(city)s\');' %
          {'row': i, 'city': cities[i]})
    print('data.setValue(%(row)d, 1, %(ratio)f);' %
          {'row': i, 'ratio': ratios[i]})
</pre> <p> And to display the data, just include it into a <a href="http://code.google.com/apis/ajax/playground/?type=visualization#geo_map"> Google Charts visualization</a>. Unfortunately I couldn't figure out how to make the circles bigger or smaller based on one parameter and to set the color based on another parameter. Please tell me if you know how to do it. Anyway, until I find some time to learn <a href="http://vis.stanford.edu/protovis/">Protovis</a>, bear with me. 
What I did is split the chart into two, one for guys and one for girls. And to have it load faster, and avoid overloading with too much detail, I just took a small number of cities (This was actually Noha's idea so the credit should go to her). Here is how the chart looks like for girls, the bigger and bluer the better. It means there are more single guys for each single girl: </p> <script type="text/javascript"> google.load('visualization', '1', {'packages': ['geomap']}); google.setOnLoadCallback(drawMap1); function drawMap1() { var data = new google.visualization.DataTable(); data.addRows(28); data.addColumn('string', 'City'); data.addColumn('number', 'Ratio'); data.setValue(0, 0, 'Atlanta'); data.setValue(0, 1, 1.167621); data.setValue(1, 0, 'Boston'); data.setValue(1, 1, 1.082052); data.setValue(2, 0, 'Chicago'); data.setValue(2, 1, 1.128832); data.setValue(3, 0, 'Cleveland'); data.setValue(3, 1, 1.080480); data.setValue(4, 0, 'Dallas'); data.setValue(4, 1, 1.250400); data.setValue(5, 0, 'Denver'); data.setValue(5, 1, 1.271842); data.setValue(6, 0, 'Detroit'); data.setValue(6, 1, 1.121086); data.setValue(7, 0, 'Houston'); data.setValue(7, 1, 1.225656); data.setValue(8, 0, 'Indianapolis'); data.setValue(8, 1, 1.116891); data.setValue(9, 0, 'Jacksonville'); data.setValue(9, 1, 1.184223); data.setValue(10, 0, 'Kansas City'); data.setValue(10, 1, 1.137844); data.setValue(11, 0, 'Las Vegas'); data.setValue(11, 1, 1.417289); data.setValue(12, 0, 'Los Angeles'); data.setValue(12, 1, 1.214651); data.setValue(13, 0, 'Memphis'); data.setValue(13, 1, 1.049954); data.setValue(14, 0, 'Miami'); data.setValue(14, 1, 1.178491); data.setValue(15, 0, 'Milwaukee'); data.setValue(15, 1, 1.089912); data.setValue(16, 0, 'Minneapolis'); data.setValue(16, 1, 1.146799); data.setValue(17, 0, 'New Orleans'); data.setValue(17, 1, 1.028731); data.setValue(18, 0, 'New York'); data.setValue(18, 1, 1.064382); data.setValue(19, 0, 'Orlando'); data.setValue(19, 1, 1.224336); data.setValue(20, 
0, 'Philadelphia'); data.setValue(20, 1, 1.055936); data.setValue(21, 0, 'Phoenix'); data.setValue(21, 1, 1.325386); data.setValue(22, 0, 'Portland'); data.setValue(22, 1, 1.253985); data.setValue(23, 0, 'Salt Lake City'); data.setValue(23, 1, 1.270336); data.setValue(24, 0, 'San Diego'); data.setValue(24, 1, 1.378117); data.setValue(25, 0, 'San Francisco'); data.setValue(25, 1, 1.261324); data.setValue(26, 0, 'Seattle'); data.setValue(26, 1, 1.270545); data.setValue(27, 0, 'Washington'); data.setValue(27, 1, 1.061015); var options = {}; options['region'] = 'US'; options['width'] = 500; options['height'] = 300; options['colors'] = [0xFFFFFF, 0x0000FF, 0x000055]; options['dataMode'] = 'markers'; var container = document.getElementById('map_canvas1'); var geomap = new google.visualization.GeoMap(container); geomap.draw(data, options); }; </script> <p> <div id='map_canvas1' style="margin-left: 40px;"></div> </p> <p> It seems that the West Coast, Texas and Florida are the best places for girls. Here is how it looks like when taking all 280 cities. I did not include the visualization, but a simple image because it takes quite a while to load. </p> <img src="http://www.javiertordable.com/img/best-cities-for-singles-chart-guys.jpg" alt="Chart of the best cities for single guys"> <p> The one for guys is the opposite, the bigger and more pink, the better. 
It means there are more single girls for each single guy: </p> <script type="text/javascript"> google.load('visualization', '1', {'packages': ['geomap']}); google.setOnLoadCallback(drawMap2); function drawMap2() { var data = new google.visualization.DataTable(); data.addRows(28); data.addColumn('string', 'City'); data.addColumn('number', 'Ratio'); data.setValue(0, 0, 'Atlanta'); data.setValue(0, 1, 0.856442287); data.setValue(1, 0, 'Boston'); data.setValue(1, 1, 0.924170003); data.setValue(2, 0, 'Chicago'); data.setValue(2, 1, 0.885871414); data.setValue(3, 0, 'Cleveland'); data.setValue(3, 1, 0.925514586); data.setValue(4, 0, 'Dallas'); data.setValue(4, 1, 0.799744082); data.setValue(5, 0, 'Denver'); data.setValue(5, 1, 0.786261187); data.setValue(6, 0, 'Detroit'); data.setValue(6, 1, 0.891992229); data.setValue(7, 0, 'Houston'); data.setValue(7, 1, 0.815889613); data.setValue(8, 0, 'Indianapolis'); data.setValue(8, 1, 0.895342518); data.setValue(9, 0, 'Jacksonville'); data.setValue(9, 1, 0.84443555); data.setValue(10, 0, 'Kansas City'); data.setValue(10, 1, 0.878855098); data.setValue(11, 0, 'Las Vegas'); data.setValue(11, 1, 0.705572399); data.setValue(12, 0, 'Los Angeles'); data.setValue(12, 1, 0.823281749); data.setValue(13, 0, 'Memphis'); data.setValue(13, 1, 0.952422678); data.setValue(14, 0, 'Miami'); data.setValue(14, 1, 0.848542755); data.setValue(15, 0, 'Milwaukee'); data.setValue(15, 1, 0.917505266); data.setValue(16, 0, 'Minneapolis'); data.setValue(16, 1, 0.871992389); data.setValue(17, 0, 'New Orleans'); data.setValue(17, 1, 0.972071416); data.setValue(18, 0, 'New York'); data.setValue(18, 1, 0.939512318); data.setValue(19, 0, 'Orlando'); data.setValue(19, 1, 0.816769253); data.setValue(20, 0, 'Philadelphia'); data.setValue(20, 1, 0.947027093); data.setValue(21, 0, 'Phoenix'); data.setValue(21, 1, 0.75449718); data.setValue(22, 0, 'Portland'); data.setValue(22, 1, 0.797457705); data.setValue(23, 0, 'Salt Lake City'); data.setValue(23, 1, 
0.787193309); data.setValue(24, 0, 'San Diego'); data.setValue(24, 1, 0.725627795); data.setValue(25, 0, 'San Francisco'); data.setValue(25, 1, 0.792817706); data.setValue(26, 0, 'Seattle'); data.setValue(26, 1, 0.787063819); data.setValue(27, 0, 'Washington'); data.setValue(27, 1, 0.942493744); var options = {}; options['region'] = 'US'; options['width'] = 500; options['height'] = 300; options['colors'] = [0xFFFFFF, 0xFF0088, 0xFF0022]; options['dataMode'] = 'markers'; var container = document.getElementById('map_canvas2'); var geomap = new google.visualization.GeoMap(container); geomap.draw(data, options); }; </script> <p> <div id='map_canvas2' style="margin-left: 40px;"></div> </p> <p> So the best places for guys seem to be the East Coast and the Great Lakes area. In particular the Bos-Wash corridor seems to be quite good. And here is how it looks like when taking all metropolitan areas: </p> <img src="http://www.javiertordable.com/img/best-cities-for-singles-chart-girls.jpg" alt="Chart of the best cities for single girls"> <p> Does anyone know of similar charts for other countries? I have never seen one myself. But after what we said before, if you can find census-like data now you can make one yourself! </p> Quantitative Ponzinomics http://www.javiertordable.com/blog/2010/05/07/quantitative-ponzinomics Fri, 07 May 2010 22:40:28 GMT http://www.javiertordable.com/blog/2010/05/07/quantitative-ponzinomics <p> Just a quick pointer to a book that I am reading right now, <strong>Quantitative Ponzinomics</strong>, an essential guide to understand what is going on currently in the financial markets around the world. </p> <img src="http://www.javiertordable.com/img/quantitative-ponzinomics.jpg" alt="Quantitative Ponzinomics"> <p> Ok, it's not a real book, but you should really check out the source, the financial blog <a href="http://www.zerohedge.com">Zero Hedge</a>. 
And in order to keep this blog on-topic, here is an interesting link about <a href="http://www.wikinvest.com/wiki/High-Frequency_Trading_(HFT)"> High Frequency Trading</a>, which is an automated procedure to make financial transactions using computer algorithms to get an advantage over other investors. </p> Machine Learning at Stanford http://www.javiertordable.com/blog/2010/05/01/machine-learning-stanford Sat, 01 May 2010 23:28:57 GMT http://www.javiertordable.com/blog/2010/05/01/machine-learning-stanford <p> This quarter I am leading a study group in Machine Learning at Google's Kirkland Office. And while I was looking for datasets and resources I found <a href="http://ai.stanford.edu/~ang/">Andrew Ng's</a> course in Machine Learning at Stanford. All the lectures are available online at YouTube. You can find the links in the <a href="http://see.stanford.edu/see/lecturelist.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1"> Stanford Engineering Everywhere Machine Learning course page</a>. </p> <p> Here is the first class of the series: </p> <object width="480" height="385"> <param name="movie" value="http://www.youtube.com/v/UzxYlbK2c7E&hl=en_US&fs=1&"></param> <param name="allowFullScreen" value="true"></param> <param name="allowscriptaccess" value="always"></param> <embed src="http://www.youtube.com/v/UzxYlbK2c7E&hl=en_US&fs=1&" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed> </object> <p> Also you can check out the main <a href="http://see.stanford.edu/see/courses.aspx"> Stanford Engineering Everywhere page</a> for other courses in computer science. 
</p> Preserving the Wealth Creation Engine of America http://www.javiertordable.com/blog/2010/04/18/preserving-the-wealth-creation-engine-of-america Sun, 18 Apr 2010 04:28:46 GMT http://www.javiertordable.com/blog/2010/04/18/preserving-the-wealth-creation-engine-of-america <p> Over the last few decades hundreds of companies, thousands of jobs and billions of dollars of real wealth were created by startups with the support of angel investors. The ability of a small company to obtain seed funding with little more than an idea is indeed one of the most important wealth creation engines of America. Now this ability, this engine, is at stake. </p> <p> There is a bill currently under review in the Senate that includes two provisions that directly threaten the way that startups are funded. The bill is the "<strong>Restoring American Financial Stability Act of 2010</strong>", supported by Senator Dodd. Even though the intention of the bill is noble, namely to protect investors against fraudulent investment requests, two of its sections can have severe unintended consequences for the economy and the society of the United States. The first one is the following: </p> <div id="special-text"> <p> <strong>SEC. 412. ADJUSTING THE ACCREDITED INVESTOR STANDARD FOR INFLATION</strong>.<br/> The Commission shall, by rule—<br/> (1) increase the financial threshold for an accredited investor, as set forth in the rules of the Commission under the Securities Act of 1933, by calculating an amount that is greater than the amount in effect on the date of enactment of this Act of $200,000 income for a natural person (or $300,000 for a couple) and $1,000,000 in assets, as the Commission determines is appropriate and in the public interest, in light of price inflation since those figures were determined; and (2) adjust that threshold not less frequently than once every 5 years, to reflect the percentage increase in the cost of living. 
</p> </div> <p> This raises the minimum income and assets that an investor needs to have to be considered qualified to invest in a startup. The limit would go from $1 million of net worth to $2.3 million and from $200,000 of annual income to $449,000. This could affect more than two thirds of all angel investors. </p> <img src="http://www.javiertordable.com/img/senate-committee-banking.jpg" alt="United States Senate committee on Banking" /> <p> The second section is: </p> <div id="special-text"> <p> <strong>SEC. 926. AUTHORITY OF STATE REGULATORS OVER REGULATION D OFFERINGS</strong>.<br/> [...]<br/> IN GENERAL.—The Commission shall review any filings made relating to any security issued under Commission rules or regulations under section 4(2), other than one designated as a non-covered security under subparagraph (A)(iv), not later than 120 days of the filing with the Commission.<br/> [...]<br/> IN GENERAL.—Nothing in subparagraph (A)(iv), (B), or (C), shall be construed to prohibit a State from imposing notice filing requirements that are substantially similar to filing requirements required by rule or regulation under section 4(4) that were in effect on September 1, 1996.<br/> [...]<br/> </p> </div> <p> It would essentially require startups to make a filing with the SEC, which will have up to 120 days to review it. If the SEC doesn't review the filing then securities regulators in all states involved in the deal will have the chance to review it. What this means is that cash-strapped startups will have to wait 4 months for funds, and they will have to comply with the securities laws of up to 50 states, with significant costs in legal fees. </p> <p> <a href="http://www.saveregd.com/">http://www.saveregd.com/</a> contains more information about the significance of these sections and the impact that they may have on startups. 
You can also find the full bill <a href="http://banking.senate.gov/public/_files/ChairmansMark31510AYO10306_xmlFinancialReformLegislationBill.pdf"> here</a>. </p> <p> I believe that these two sections of the bill will negatively impact startups, angel investors, and as a result the mechanism by which entrepreneurs get funding for new ideas, innovate, and generate jobs and wealth. To express your opposition to these two sections of the bill, and help preserve one of the most important wealth creation engines of America, I would like to ask you to review and <a href="http://gopetition.com/online/32354.html"> sign this petition now</a>. </p> How to Make an Infographic Résumé http://www.javiertordable.com/blog/2010/03/23/how-to-make-an-infographic-resume Tue, 23 Mar 2010 04:08:19 GMT http://www.javiertordable.com/blog/2010/03/23/how-to-make-an-infographic-resume <script type="text/javascript" src="http://www.google.com/jsapi"> </script> <script type="text/javascript"> google.load('visualization', '1',{packages: ['annotatedtimeline']}); google.load('visualization', '1', {packages: ['table']}); </script> <p> <strong>Infographics</strong>, or information graphics, are visual representations of data. In a good visualization or representation of a data set, the author expresses an idea that is deeper than the data itself. A good visualization conveys a message that is clear and helps the reader draw conclusions, but also a message that is precise and based on the data, without transforming or manipulating the data in dishonest ways. </p> <p> In the case of a <strong>résumé</strong>, the data is obviously the education, professional experience and interests of the candidate. And the purpose of an <strong>infographic résumé</strong> is to show the greatest highlights of his or her education and professional experience, to convince a potential employer of the value of the candidate. 
To do that, the charts and visualization elements have to suit the characteristics that we want to highlight. For example: </p> <ul> <li> If a candidate has a lot of international professional experience, one way to show that is to include a world map with the location of the last work assignments </li> <li> Or if the candidate is a senior executive of a public corporation, it may be interesting to show the stock value of the company at the time that the executive launched new products, or made important business-wide decisions </li> <li> In the case of a professor with a long publication history, it may be very valuable to show a chart of important publications and their citation index, as a measure of the impact of those publications in the scientific community </li> </ul> <h3>Some Example Résumés</h3> <p> Over the last few years several designers have experimented with infographic résumés. Recently <strong>Randy Krum</strong> wrote a post about it in his blog <a href="http://www.coolinfographics.com/blog/2010/1/8/16-infographic-resumes-a-visual-trend.html"> Cool Infographics</a>. I am not going to repeat what he said, but I will mention a few of the most famous ones. </p> <p> The résumé of <strong>Michael Anderson</strong> is probably the most cited one. It includes a timeline with education and experience, and marks for the most important events. It is visually attractive, but the colors and sizes of the different elements don't really mean anything. And the whole bottom left part of the chart is only meant to show some sense of humor. You can click on the thumbnail below to see the full image. </p> <a href="http://theportfolio.ofmichaelanderson.com/wp-content/uploads/2008/05/resume-infographic.jpg"> <img src="http://www.javiertordable.com/img/michael-anderson-resume-infographics.jpg" alt="Infographic resume of Michael Anderson" /> </a> <p> <strong>Christopher Perkins</strong> designed another résumé using the common look of a subway map. 
It is visually original, but the connections between the different elements are not very complex (only two intersections between lines), so using the look of a subway map seems to add more complexity than it removes. Also this visualization uses intersections between lines to represent the fact that two events happened at the same point in time. But the fact that two work projects or education courses happened at the same time is probably not the most important characteristic of a résumé. </p> <a href="http://www.flickr.com/photos/ernestolago/4144781475/sizes/o/"> <img src="http://www.javiertordable.com/img/christopher-perkins-inforgaphic-resume.jpg" alt="Infographic resume of Christopher Perkins"/> </a> <p> Compare this use of the subway visualization with the Map of Rock of <strong>Ernesto Lago</strong>. Here what is important is how some musicians merge two different styles, or inherit from two different musical schools or traditions, and that is exactly what the subway intersections represent. </p> <a href="http://www.flickr.com/photos/ernestolago/4144781475/"> <img src="http://www.javiertordable.com/img/rockmap-ernesto-lago.jpg" alt="Rock Map of Ernesto Lago"/> </a> <p> <strong>Greg Dizzia</strong> gives a different point of view. In his résumé he adds the skills involved in each professional project. I think this is very valuable to potential employers, and it's one of the main points of a résumé. In this case I think the visualization is a little bit too busy and repetitive but the main message is very well expressed. </p> <a href="http://dizzia.deviantart.com/art/Curriculum-Vitae-PDF-69050981"> <img src="http://www.javiertordable.com/img/curriculum-vitae-by-greg-dizzia.png" alt="Infographic resume of Greg Dizzia"/> </a> <p> As a last example let's take a look at the résumé of <strong>Gabriele Bozzi</strong>. In this case he forgets almost completely about the temporal element of the résumé and concentrates on the functional areas. 
And he divides them between general skills, and knowledge of particular applications and tools. To me this has too much detail, and way too much <em>keyword hunting</em>, but the visual display is very interesting. </p> <a href="http://www.kaukana.be/wp/?p=430"> <img src="http://www.javiertordable.com/img/gabriele-bozzi-infographic-resume.png" alt="Infographic resume of Gabriele Bozzi"/> </a> <h3>How to Make an Infographic Résumé yourself</h3> <p> First of all, just like in any other infographic project, we need to decide what data we are going to show. In my case I will take short snippets of projects that I have worked on. Because the main point of this post is to show how to develop the visual component I won't spend too much time on getting a great wording of each project, or using the correct keywords. </p> <p> Second, it's necessary to decide what are the most important messages that we want to convey from that data. In my case these messages are: </p> <ul> <li> <strong>Education</strong>: I have a thorough education in Computer Science and Mathematics. It is not too varied because I don't have courses or certifications in other areas, and I studied all my degrees in the same University. But I have spent 10 years on it. Also, I don't want to add much detail about college projects, because they are less interesting than professional projects. And I don't want to mix education and work, because I combined both over the last few years and it would just look confusing </li> <li> <strong>Work</strong>: I have worked in some of the most prestigious companies in the world, in increasing degrees of responsibility and success and I definitely want to emphasize that. Also I want to mention that I worked in some particular areas in software, like machine learning. And I started doing a little bit of research lately. So I think it's worth showing that I am versatile, in terms of work skills. 
Finally I think it's also worth mentioning my contributions to open source and internships, because they add variety to my professional experience </li> <li> <strong>Impact</strong>: Finally, because I have been lucky to work on projects that are public and that other people are using, I have had a significant impact outside of my employers. I want to emphasize that as well </li> </ul> <p> And third, after the message is clear, it's time to decide what kind of visualization would best fit the data and the message. I won't go very deep into this topic, because there is so much to talk about that we could fill several books. So here are the decisions I made: </p> <ul> <li> For the education I will simply show a bar with all the degrees that I have attained, in a straight line. The size of each part of the bar indicates how long it took to get that degree. The colors are only there to differentiate sections. A legend at the bottom will indicate what each section is. Also as this is the weakest section, it will go at the end of the résumé </li> <li> For the experience part, I will show a timeline of professional positions, and the main projects within those positions. I will add the logos of the corresponding companies on the left side. And for each project I will indicate the functional skills that I used. In order to keep it simple I will have only 5 main skills. Also to have space for all the positions, I will have a vertical timeline, instead of the more common horizontal one </li> <li> Finally, the impact of my projects is probably the most differentiating characteristic of my résumé, so I will put it on top. And in order to show data that is easy to check I will use as a measure of impact the number of search results in google.com for the project name, divided by the number of developers in the project. In order to indicate which project corresponds to which value in the graph I will use an annotated time series. 
</li> </ul> <p> And without any more delay, here is the result! </p> <h3> Impact </h3> <script type="text/javascript"> function drawVisualization1() { var data = new google.visualization.DataTable(); data.addColumn('date', 'Date'); data.addColumn('number', 'Impact'); data.addColumn('string', 'title1'); data.addColumn('string', 'text1'); data.addRows(8); data.setValue(0, 0, new Date(2005, 7 ,1)); data.setValue(0, 1, 11000); data.setValue(0, 2, 'Microsoft'); data.setValue(0, 3, '<em>Intern</em> <a href="http://www.google.com/search?q=windows+vista+webdav+testing">Windows Vista WebDAV Testing</a><br/>(134K / 12)'); data.setValue(1, 0, new Date(2006, 6 ,2)); data.setValue(1, 1, 7800); data.setValue(1, 2, 'McKinsey'); data.setValue(1, 3, '<em>Intern</em> <a href="http://www.google.com/search?q=mckinsey+arcelor+mittal+merger">Arcelor Mittal Merger</a><br/>(47K / 6)'); data.setValue(2, 0, new Date(2007, 3 ,3)); data.setValue(2, 1, 24000); data.setValue(2, 2, 'Microsoft'); data.setValue(2, 3, '<em>Engineer</em> <a href="http://www.google.com/search?q=live+search+relevance">Live Search Relevance</a><br/>(24M / 1K)'); data.setValue(3, 0, new Date(2008, 3 ,4)); data.setValue(3, 1, 2500); data.setValue(3, 2, 'OpenSource'); data.setValue(3, 3, '<em>Author</em> <a href="http://www.google.com/search?q=financeAI">FinanceAI</a><br/>(2.5K / 1)'); data.setValue(4, 0, new Date(2008, 10 ,5)); data.setValue(4, 1, 11000); data.setValue(4, 2, 'Google'); data.setValue(4, 3, '<em>Engineer</em>, <a href="http://www.google.com/search?q=webmaster+tools+new+GData+API">Webmaster Tools new GData API</a><br/>(11K / 1)'); data.setValue(5, 0, new Date(2009, 6 ,6)); data.setValue(5, 1, 12000); data.setValue(5, 2, 'OpenSource'); data.setValue(5, 3, '<em>Author</em> <a href="http://www.google.com/search?q=map+reduce+integer+factorization">Map Reduce Integer Factorization</a><br/>(11K / 1)'); data.setValue(6, 0, new Date(2009, 10 ,7)); data.setValue(6, 1, 47000); data.setValue(6, 2, 'Google'); 
data.setValue(6, 3, '<em>Engineer</em>, <a href="http://www.google.com/search?q=webmaster+tools+labs">Webmaster Tools Labs</a><br/>(700K / 15)'); data.setValue(7, 0, new Date(2009, 11 ,8)); data.setValue(7, 1, 65000); data.setValue(7, 2, 'Google'); data.setValue(7, 3, '<em>Tech Lead</em>, <a href="http://www.google.com/search?q=fetch+as+googlebot">Fetch as Googlebot</a><br/>(65K / 1)'); var annotatedtimeline = new google.visualization.AnnotatedTimeLine( document.getElementById('visualization1')); annotatedtimeline.draw(data, {'displayAnnotations': true, 'max': 70000, 'allowHtml': true}); } google.setOnLoadCallback(drawVisualization1); </script> <p> <div id="visualization1" style="width: 600px; height: 400px;"></div> </p> <p> Clicking on each one of the marks in the time series highlights the description of the project. For each case I have included the employer, the position and a very short description of the project. In parentheses is the number of search results divided by the number of people that worked on the project. Also, clicking on the description takes you to a search results page. The results are usually a reference to my personal work, or to work in which I participated. </p> <h3> Experience </h3> <script type="text/javascript"> function drawVisualization2() { // Create and populate the data table. 
var data = new google.visualization.DataTable(); data.addColumn('string', '<div style="text-align: center;">Position</div>'); data.addColumn('string', '<div style="text-align: center;">Software Engineering</div>'); data.addColumn('string', '<div style="text-align: center;">Cloud Computing</div>'); data.addColumn('string', '<div style="text-align: center;">Machine Learning</div>'); data.addColumn('string', '<div style="text-align: center;">Mathematical Research</div>'); data.addColumn('string', '<div style="text-align: center;">Strategic Consulting</div>'); data.addRows(6); data.setCell(0, 0, '<img src="http://www.google.com/intl/en_ALL/images/logo.gif" width=70><em>Tech Lead</em>'); data.setCell(0, 1, '<a href="http://www.google.com/search?q=fetch+as+googlebot">Fetch as Googlebot</a>'); data.setCell(0, 2, '<a href="http://www.google.com/search?q=webmaster+tools+backend">Webmaster Tools backend</a>'); data.setCell(0, 3, null); data.setCell(0, 4, null); data.setCell(0, 5, null); data.setCell(1, 0, '<img src="http://www.google.com/intl/en_ALL/images/logo.gif" width=70><em>Engineer</em>'); data.setCell(1, 1, '<a href="http://www.google.com/search?q=webmaster+tools+labs">Webmaster Tools Labs</a>'); data.setCell(1, 2, '<a href="http://www.google.com/search?q=webmaster+tools+new+GData+API">Webmaster Tools new GData API</a>'); data.setCell(1, 3, null); data.setCell(1, 4, null); data.setCell(1, 5, null); data.setCell(2, 0, '<img src="http://www.javiertordable.com/img/open-source.png" width=70>'); data.setCell(2, 1, null); data.setCell(2, 2, null); data.setCell(2, 3, '<a href="http://www.google.com/search?q=financeAI">FinanceAI</a>'); data.setCell(2, 4, '<a href="http://www.google.com/search?q=map+reduce+integer+factorization">Map Reduce Integer Factorization</a>'); data.setCell(2, 5, null); data.setCell(3, 0, '<img src="http://www.microsoft.com/library/toolbar/3.0/images/banners/ms_masthead_ltr.gif" width=70><em>Engineer</em>'); data.setCell(3, 1, null); data.setCell(3, 2, 
null); data.setCell(3, 3, '<a href="http://www.google.com/search?q=live+search+relevance">Live Search Relevance</a>'); data.setCell(3, 4, null); data.setCell(3, 5, null); data.setCell(4, 0, '<img src="http://www.javiertordable.com/img/mckinsey-logo.gif" width=70><em>Intern</em>'); data.setCell(4, 1, null); data.setCell(4, 2, null); data.setCell(4, 3, null); data.setCell(4, 4, null); data.setCell(4, 5, '<a href="http://www.google.com/search?q=mckinsey+arcelor+mittal+merger">Arcelor Mittal Merger</a>'); data.setCell(5, 0, '<img src="http://www.microsoft.com/library/toolbar/3.0/images/banners/ms_masthead_ltr.gif" width=70><em>Intern</em>'); data.setCell(5, 1, '<a href="http://www.google.com/search?q=windows+vista+webdav+testing">Windows WebDAV Testing</a>'); data.setCell(5, 2, null); data.setCell(5, 3, null); data.setCell(5, 4, null); data.setCell(5, 5, null); // Create and draw the visualization. visualization = new google.visualization.Table(document.getElementById('visualization2')); visualization.draw(data, {'width': 600, 'allowHtml': true}); } google.setOnLoadCallback(drawVisualization2); </script> <p> <div id="visualization2"></div> </p> <p> In the experience table I have added basically the same projects as above, but I have classified them by the main area of expertise. The link in each one of the projects points to search results, exactly the same way as before. It would have been also a good idea to point to a short description of the project. Initially I thought about using color codes to indicate each main area of expertise, but this would add a little bit of duplication, because that is exactly what the columns mean. I think it is cleaner and more understandable without color coding the skills. </p> <h3> Education </h3> <img src="http://www.javiertordable.com/img/javier-tordable-education.png" alt="Education of Javier Tordable"/> <p> And finally, the education section. Here I decided to do a static image. 
The interactivity of the other visualizations wouldn't add anything. I also used a color palette similar to the one in the previous visualizations. As it stands the colors do not link this chart to the visualizations above, although I could have explored that idea further. Also I didn't add the university to avoid repeating the name four times. </p> <h3>Conclusion</h3> <p> To finish the post, I will repeat the core principles of how to make a good visualization: </p> <ol> <li> Collect the data to display, and make sure it's complete and precise </li> <li> Decide what is the most important message that you want to express </li> <li> Choose visualizations that convey that message but respect the data </li> </ol> Really Simple SEO http://www.javiertordable.com/blog/2010/03/12/really-simple-seo Fri, 12 Mar 2010 00:36:31 GMT http://www.javiertordable.com/blog/2010/03/12/really-simple-seo <p> SEO stands for Search Engine Optimization, and is the process of improving a website's structure and content in order to make it easy for search engines to gather the pages and display them in search results in the best position possible. </p> <img src="http://www.javiertordable.com/img/search-engines.png" alt="Search Engines"/> <p> In this post I am going to explain a few basic principles of SEO and show examples of how I implemented them in my blog. Also, as I am the Tech Lead of Webmaster Tools backend, I am going to talk a little bit about some features of Webmaster Tools that are very helpful for SEO. Please remember that all these tips are not only good for search engines, but also for users. If you have to choose between doing something to benefit users or search engines, always choose what is best for users. </p> <p> Here is the list of simple SEO tips: </p> <ul> <li><strong>Use a good URL structure, with descriptive URLs.</strong> If the URL has keywords related to the page topic it will be easier to find in search results. 
For example, the URL of this post includes the words <em>really simple seo</em>, which are the topic of the post <pre> http://www.javiertordable.com/blog/2010/03/11/really-simple-seo </pre> </li> <li> <strong>Use good page titles.</strong> Similar to the previous tip, good page titles with appropriate keywords make it easier to find the page in search results. Even though the title of this page includes the full name of the site, the title begins with a good description of the content <pre> Really Simple SEO - Javier Tordable blog on Software, Mathematics and Technology </pre> </li> <li> <strong>Have a good meta description for each page.</strong> The meta description is used by some search engines for the snippets in search results (the small paragraph with a description of the page). If you don't provide one, you run the risk that the search engine will generate it by itself, with unexpected results. This happened to me before I had a meta description: my snippet was taken from the RSS feed and looked awful. My current meta description for the homepage is: <pre> &lt;meta name="description" content="Javier Tordable blog on Software, Mathematics and Technology. Javier Tordable is a software engineer at Google and Ph.D. candidate in Mathematics."&gt; </pre> </li> <li> <strong>Structure the page appropriately, using HTML header tags.</strong> The most important parts should be within an H1 tag, the second most important in H2, etc. until H6. For example in this post the blog title and subtitle are within H1 and H2 tags. 
The title of the particular post is in an H2, and other less important sections, the about box and the archives, are within an H3 HTML tag <pre> &lt;h1&gt;&lt;a href="/"&gt;Javier Tordable&lt;/a&gt;&lt;/h1&gt; &lt;h2&gt;A blog on Software, Mathematics and Technology&lt;/h2&gt; &lt;h3&gt;About&lt;/h3&gt; &lt;h3&gt;Archives&lt;/h3&gt; &lt;h2&gt;Really Simple SEO&lt;/h2&gt; </pre> </li> <li> <strong>Use the simplest format possible.</strong> Search engines are currently very advanced and can process Flash, JavaScript and other content types. However, it's always easier to access raw HTML content. So prefer to use HTML for most content, unless Flash or JavaScript are essential. Also it will be easier to access the page from old browsers or other platforms. For example iPhone users can't see Flash pages </li> <li> <strong>Have a flat internal link structure.</strong> The flatter your link structure is, the easier it will be for search engines to access a page, and the easier it will be for users to access whatever content they are looking for. In my blog I have all the main sections linked in the top navigation bar, which appears on all pages. And on the right side of most pages there is a link to all the blog posts <pre> &lt;a href="/blog/all"&gt;All Posts&lt;/a&gt; </pre> From the homepage of the site it's possible to access any other content page in two clicks or less. </li> </ul> <p> And now, some interesting pieces of information about your site that you can find in <a href="http://www.google.com/webmasters/tools/">Webmaster Tools</a>: </p> <ul> <li> One of my favorites is <strong>Backlinks</strong>, which will show you all the links pointing to your site, from all over the Web. Having many quality links is important because it will be easier for people to find your site, and it will show search engines that the site is relevant. 
<img src="http://www.javiertordable.com/img/screenshot-webmaster-tools-backlinks.png" alt="Screenshot of Webmaster Tools Backlinks"/> </li> <li>Another very useful tool is <strong>Top Search Queries</strong>, which will show for which queries my site appears in search results. For example, the "bundle adjustment" query has more requests than "javier tordable" and it appears in the second row in the following table, while the other query is in the third row. However my site appears in position 16 for bundle adjustment, and it appears in position one for searches of my own name <img src="http://www.javiertordable.com/img/screenshot-webmaster-tools-top-search-queries.png" alt="Screenshot of Webmaster Tools Top Search Queries"/> </li> <li> Also, another cool piece of information is <strong>Subscriber Stats</strong> which shows how many subscribers I have for my RSS feed from <a href="http://www.google.com/reader">Google Reader</a>. In my case I can see that I have 10 people subscribed. And I can also submit this feed as a Sitemap, which will help get my site indexed <img src="http://www.javiertordable.com/img/screenshot-webmaster-tools-subscriber-stats.png" alt="Screenshot of Webmaster Tools Subscriber Stats"/> For example, my Sitemap statistics show that I have 11 pages in this sitemap, and they are all indexed <img src="http://www.javiertordable.com/img/screenshot-webmaster-tools-sitemaps.png" alt="Screenshot of Webmaster Tools Sitemaps"/> </li> <li> And to check that my site is being correctly crawled, I can check the <strong>Crawl Stats </strong> feature. As you can see in the graph, the number of pages that are crawled in my site per day has been going up significantly since I implemented all these SEO tips <img src="http://www.javiertordable.com/img/screenshot-webmaster-tools-crawl-stats.png" alt="Screenshot of Webmaster Tools Crawl Stats"/> </li> </ul> <p> To finish, I am just going to point out that there are a lot of SEO resources online. 
Doing proper search engine optimization doesn't need to be complicated or expensive. And it is not only good for search engines, but also for users. For more tips, you should check the <a href="http://googlewebmastercentral.blogspot.com/2008/11/googles-seo-starter-guide.html"> Google SEO guide</a>. </p> Collaborative Mathematics and The Future of Science http://www.javiertordable.com/blog/2010/02/25/collaborative-mathematics-future-of-science Thu, 25 Feb 2010 20:02:55 GMT http://www.javiertordable.com/blog/2010/02/25/collaborative-mathematics-future-of-science <p> Mathematical research is traditionally seen as a one-man job. To quote <a href="http://books.google.com/books?id=lQosnIw05dYC"> Jean Dieudonné in The Music of Reason</a>: </p> <div id="special-text"> Research in the experimental sciences is done in laboratories, where larger and larger teams are needed to manipulate the instruments and to scrutinize the results. To do research in mathematics nothing is needed except paper and a good library. Team-work, as practiced in the experimental sciences is, then, quite unusual in mathematics, most mathematicians finding it difficult to think seriously except in silence and solitude. Collaborative work, while quite common, most often consists in putting together results that each of the collaborators has managed to obtain in isolation, albeit with mutual profit from each other's ideas, enabling them to progress from new points of departure. </div> <p> In spite of that, about a year ago <a href="http://en.wikipedia.org/wiki/Timothy_Gowers">Tim Gowers</a> asked himself if it would be possible to solve important mathematical problems by collaborating openly over the internet. Not a collaboration among a few colleagues, but among everybody that had any insight about the problems. He shared this question with the mathematical community through a <a href="http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/"> post in his blog</a>. 
That was the birth of the <a href="http://polymathprojects.org/">Polymath</a> project. </p> <p> The first problem that the Polymath group worked on was the attempt to obtain a simple proof of the <a href="http://en.wikipedia.org/wiki/Hales–Jewett_theorem"> Hales–Jewett theorem</a>. This theorem is a very important result from <a href="http://en.wikipedia.org/wiki/Ramsey_theory"> Ramsey theory</a>. In very rough terms Ramsey theory says that for many mathematical structures, there is no such thing as complete randomness. </p> <p> For example, take a group of six people: Alice, Bob, Charles, David, Erin and Fritz. Ramsey's theorem tells us that there are either 3 people that all know each other, or 3 people that are all strangers to each other. Even in something as random as a party, if there are at least 6 people then we can find a very special subgroup of 3 people. </p> <p> Here is the proof: Take Alice, and suppose first that she knows fewer than 3 of the other 5 people at the party. Then there are at least 3 people that she does not know, say David, Erin and Fritz. If they all know each other, we have a group of 3 people that know each other. If not, two of them don't know each other, for example David doesn't know Erin. As a consequence Alice, David and Erin are all strangers to each other. If Alice knows 3 people or more, the argument is the same with the roles reversed. Say she knows Bob, Charles and David. If no two of them know each other then we have our group of 3 strangers. But if two of them know each other, for example Bob and Charles, then Alice, Bob and Charles all know each other. </p> <p> So far the Polymath group has discussed 5 problems, which are all shown in the <a href="http://michaelnielsen.org/polymath1/index.php?title=Main_Page"> Wiki</a> and they have started to publish some of the results. Here is a link to a paper from D.H.J. Polymath on arXiv, <a href="http://arxiv.org/abs/0910.3926"> A new proof of the density Hales-Jewett theorem</a>. 
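</p> <p> As a sanity check on the party argument above, here is a minimal JavaScript sketch that tries every possible pattern of acquaintances by brute force. The function name <em>everyPartyHasSpecialTrio</em> and the encoding of each pair of people as one bit are assumptions made for illustration; they are not part of the Polymath work. </p>

```javascript
// Brute-force check of the party claim: in a group of n people, is it true
// that EVERY acquaintance pattern contains 3 people who all know each other
// or 3 people who are all strangers?
// Each pair of people is one bit of `mask`: 1 = they know each other,
// 0 = they are strangers.
function everyPartyHasSpecialTrio(n) {
  // Give every unordered pair (i, j) with i < j its own bit.
  var pairBit = {};
  var pairCount = 0;
  for (var i = 0; i < n; i++) {
    for (var j = i + 1; j < n; j++) {
      pairBit[i + "-" + j] = 1 << pairCount;
      pairCount++;
    }
  }

  // Precompute, for every group of 3 people, the bits of its 3 pairs.
  var trios = [];
  for (var a = 0; a < n; a++) {
    for (var b = a + 1; b < n; b++) {
      for (var c = b + 1; c < n; c++) {
        trios.push(pairBit[a + "-" + b] | pairBit[a + "-" + c] | pairBit[b + "-" + c]);
      }
    }
  }

  // Try all 2^pairCount acquaintance patterns.
  for (var mask = 0; mask < (1 << pairCount); mask++) {
    var found = false;
    for (var t = 0; t < trios.length; t++) {
      var bits = mask & trios[t];
      // All three bits set: mutual friends. No bit set: mutual strangers.
      if (bits === trios[t] || bits === 0) {
        found = true;
        break;
      }
    }
    if (!found) {
      return false; // This pattern has no special trio.
    }
  }
  return true;
}

console.log(everyPartyHasSpecialTrio(5)); // false
console.log(everyPartyHasSpecialTrio(6)); // true
```

<p> With 6 people there are only 2<sup>15</sup> = 32768 patterns to try, so the check is instant. It also shows that 6 is the smallest party size for which the claim holds, because the check fails for 5 people. 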
</p> <p> The following image is a part of a 3D Mandelbrot fractal, as described <a href="http://www.skytopia.com/project/fractal/mandelbulb.html"> here</a>. It has nothing to do with the rest of the post, but the Hales-Jewett theorem doesn't lend itself easily to fancy pictures. </p> <img src="/img/3d-fractal.jpg" alt="Mandelbulb 3D fractal" /> <p> And for comparison purposes this is a piece of romanesco broccoli. </p> <img src="/img/romanesco-broccoli.jpg" alt="Romanesco broccoli" /> <p> Of course, Mathematics is not the only scientific discipline in which people collaborate openly on interesting problems. Michael Nielsen has a great blog <a href="http://michaelnielsen.org/blog/doing-science-online/"> post</a> about doing science online. Probably the most important point is that the way that scientists work with each other is changing. And the change is driven mostly by new online collaboration tools. To finish the post I will quote Michael: </p> <div id="special-text"> Blogs, wikis, open notebooks, InnoCentive and the like aren’t the end of online innovation. They’re just the beginning. The coming years and decades will see far more powerful tools developed. We really will enormously scale up scientific conversation; we will scale up scientific collaboration; we will, in fact, change the entire architecture of expert attention, developing entirely new ways of navigating data, making connections and inferences from data, and making connections between people. </div> New Google Chart Tools http://www.javiertordable.com/blog/2010/02/17/new-google-chart-tools Wed, 17 Feb 2010 02:03:13 GMT http://www.javiertordable.com/blog/2010/02/17/new-google-chart-tools <p> Google recently released a new set of tools for graphics and interactive visualizations called <a href="http://code.google.com/apis/charttools/"> Google Chart Tools</a>. Google Chart Tools replaces the previous Charts API (for static images) and Visualization API (for dynamic graphics). 
And it combines both APIs within a single framework. Here is a <a href="http://googlecode.blogspot.com/2010/02/announcing-google-chart-tools.html"> link to the official announcement</a>. </p> <p> This is an example of the Charts API, a map with a couple of countries marked in a different color: </p> <img src="http://chart.apis.google.com/chart?cht=t&chtm=world&chs=440x220&chld=USES&chd=t:10,50&chco=FFFFFF,00FF00,005500&chf=bg,s,EAF7FE" alt="Example of Google Charts API, colored map"/> <p> This map was generated with the following link: </p> <pre> http://chart.apis.google.com/chart?cht=t&chtm=world&chs=440x220 &chld=USES&chd=t:10,50&chco=FFFFFF,00FF00,005500&chf=bg,s,EAF7FE </pre> <p> Let me go over each part in that link and explain what it means: </p> <ul> <li><strong>cht=t</strong> indicates that this is a graph of type map</li> <li><strong>chtm=world</strong> says that the map should include the whole world</li> <li><strong>chs=440x220</strong> is the size of the chart</li> <li><strong>chld=USES</strong> is the list of countries to display in a different color, US and ES</li> <li><strong>chd=t:10,50</strong> is the intensity of the color of each country. 
US=10, ES=50</li> <li><strong>chco=FFFFFF,00FF00,005500</strong> is the color gradient FFFFFF=white for the background, 00FF00 light green (US) and 005500 medium green (ES)</li> <li><strong>chf=bg,s,EAF7FE</strong> is the background color, light blue</li> </ul> <p> Here is another example, but this time of an interactive visualization: </p> <script type='text/javascript' src='http://www.google.com/jsapi'> </script> <script type='text/javascript'> google.load('visualization', '1', {'packages': ['geomap']}); google.setOnLoadCallback(drawMap); function drawMap() { var data = new google.visualization.DataTable(); data.addRows(6); data.addColumn('string', 'Country'); data.addColumn('number', 'Coolness'); data.setValue(0, 0, 'Spain'); data.setValue(0, 1, 100); data.setValue(1, 0, 'Brazil'); data.setValue(1, 1, 80); data.setValue(2, 0, 'United States'); data.setValue(2, 1, 70); data.setValue(3, 0, 'Canada'); data.setValue(3, 1, 40); data.setValue(4, 0, 'Russia'); data.setValue(4, 1, 20); data.setValue(5, 0, 'China'); data.setValue(5, 1, 10); var options = {}; options['dataMode'] = 'regions'; options['width'] = 440; options['height'] = 220; options['colors'] = [0xEAF7FE, 0xA5EF63, 0x50AA00, 0x267114] var container = document.getElementById('map_canvas'); var geomap = new google.visualization.GeoMap(container); geomap.draw(data, options); }; </script> <p> <div id='map_canvas' style="margin-left: 70px;"></div> </p> <p> In this case the map is dynamic. Moving the mouse over the different countries will display a message, which contains the value used to select the color of the country. Now the code is a little bit longer, about 30 lines of Javascript, so I am not going to include it, but there is a detailed explanation here: <a href="http://code.google.com/apis/visualization/documentation/"> Google Chart Tools, Introduction</a>. 
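</p> <p> Coming back to the static map at the beginning of the post, its link can also be put together programmatically. The following Python sketch shows one way to do it. The helper name build_map_chart_url and its arguments are hypothetical; only the parameter names and values come from the link above. It does not contact the service, it only assembles the string. </p>
<pre>
```python
from urllib.parse import urlencode

# Base address of the Charts API, copied from the link in the post.
CHART_BASE_URL = "http://chart.apis.google.com/chart"


def build_map_chart_url(countries, intensities, width=440, height=220):
    """Return a Charts API link for a world map (hypothetical helper).

    countries: two-letter country codes, for example ["US", "ES"].
    intensities: one number per country, in the same order.
    """
    if len(countries) != len(intensities):
        raise ValueError("need exactly one intensity per country")
    params = [
        ("cht", "t"),                         # chart type: map
        ("chtm", "world"),                    # area shown: the whole world
        ("chs", "%dx%d" % (width, height)),   # chart size in pixels
        ("chld", "".join(countries)),         # country codes, concatenated
        ("chd", "t:" + ",".join(str(v) for v in intensities)),
        ("chco", "FFFFFF,00FF00,005500"),     # color gradient from the post
        ("chf", "bg,s,EAF7FE"),               # light blue background
    ]
    # Keep ':' and ',' unescaped so the result matches the link in the post.
    return CHART_BASE_URL + "?" + urlencode(params, safe=":,")


url = build_map_chart_url(["US", "ES"], [10, 50])
print(url)
```
</pre>
<p> Called with US=10 and ES=50 it reproduces the link shown earlier, and changing the two lists is enough to color a different set of countries. 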
</p> <p> These tools are probably not as powerful as custom-made visualizations, like the ones that I talked about in a previous post, <a href="http://www.javiertordable.com/blog/2009/12/03/interesting-visualizations-changes-over-time"> Interesting Visualizations: Changes Over Time</a>, but they are definitely easier to create and modify. </p> <p> To finish, I am just going to quote Robert Kosara and his blog on visualization <a href="http://eagereyes.org/">Eager Eyes</a>, "JavaScript for visualization is clearly the way to go. It's fast, versatile, works much better than Flash or Java, and is obviously way ahead of static images". You can check the complete post <a href="http://eagereyes.org/blog/2010/javascript-key-to-in-browser-visualization"> here</a>. </p> Nounoublog updated http://www.javiertordable.com/blog/2010/02/06/nounoublog-updated Sat, 06 Feb 2010 00:00:00 GMT http://www.javiertordable.com/blog/2010/02/06/nounoublog-updated <p> Over the last few weeks this blog has changed dramatically. It looks pretty much the same as when it started, but under the covers the code of the blogging platform, <a href="http://code.google.com/p/nounoublog/">Nounoublog</a>, is very different. I am going to talk about three of the features that I have been working on lately: </p> <ul> <li>Archives</li> <li>RSS Feed</li> <li>Admin console</li> </ul> <p> And I will show a few snippets of the actual code that powers the blog. </p> <p> For those who visit the blog for the first time, Nounoublog is a small blogging platform developed on <a href="http://code.google.com/appengine/">Google App Engine</a>. I started working on it basically for two reasons. First, I wanted to learn how to develop applications for Google App Engine. And second, because I wanted a simple but highly customizable platform, with free hosting and no ads. 
</p> <p> <img src="http://code.google.com/appengine/images/appengine_lowres.gif" alt="Google App Engine logo" /> </p> <h3>Archives</h3> <p> Now the blog has an archives section. It is the small set of links on the right side. It will let you view all the posts since the creation of the blog. </p> <p> For example, if you click on <a href="/blog/2009">2009</a>, it will show you all the posts from the previous year. In order to enable this I had to update the url structure of the blog. Now paths have the form: </p>
<pre>
http://www.javiertordable.com/blog/2010/01/30/the-eternal-night
</pre>
<p> With slashes separating the different parts of the url. And all the following are valid urls: </p>
<pre>
http://www.javiertordable.com/blog/2010/01/30/
http://www.javiertordable.com/blog/2010/01/
http://www.javiertordable.com/blog/2010/
</pre>
<p> They show, respectively, all the posts of the day, the month, and the year. Each one has its own handler. Here is, for example, the handler that returns all the posts in a year: </p>
<pre>
class BlogYear(webapp.RequestHandler):
    """Request handler for all blog posts in a given year.

    This handler answers all requests for /blog/YYYY and /blog/YYYY/.
    """

    def get(self):
        # Get the year from the path.
        path = self.request.path[len('/' + config.BLOG_PREFIX + '/'):]
        (year, month, day, desired_url) = extract_url_parts(path)

        # Get the list of all posts of the year.
        posts = get_posts_in_date(year, month=None, day=None)

        # And return an archives page with the posts.
        s = pages.ArchivesPageGenerator()
        self.response.out.write(s.generate(posts, str(year)))
</pre>
<p> There are similar handlers for all the posts in a month and all the posts in a day. </p> <h3>RSS Feed</h3> <p> An <a href="http://en.wikipedia.org/wiki/RSS">RSS feed</a> is a specially formatted XML file, which includes data about the posts in a blog. It is updated automatically by the blog, so that when it changes, RSS subscribers know that there is new content. 
There are applications like <a href="http://www.google.com/reader/">Google Reader</a> that are very helpful for keeping track of many RSS feeds and alerting you when there is new content to read. </p> <p> In the <a href="http://code.google.com/p/nounoublog/"> Nounoublog blogging platform</a>, you can access the RSS feed by clicking on the <a href="/blog/rss.xml">RSS</a> link at the top of the page. In the browser you may see only a bunch of text, but if you add this link to your Google Reader subscriptions you will see a list of the most recent posts. </p> <p> As before, the RSS feed is powered by its own handler, which is very simple: </p>
<pre>
class RssFeed(webapp.RequestHandler):
    """Handler for the RSS feed.

    This feed contains a list with all the blog posts, from last to first.
    This list is subject to the maximum item retrieval limit of the DB.
    """

    def get(self):
        # Get the list of all posts.
        posts = get_all_posts()

        # Return the xml feed with the posts.
        template_values = {'posts': posts}
        self.response.out.write(
            template.render(template_path("rss_feed"), template_values))
</pre>
<p> Where the template provides the XML structure of the feed, and inserts the data corresponding to the posts: </p>
<pre>
&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot;?&gt;
&lt;rss version=&quot;2.0&quot;&gt;
  &lt;channel&gt;
    &lt;title&gt;Javier Tordable Blog&lt;/title&gt;
    &lt;link&gt;http://www.javiertordable.com&lt;/link&gt;
    &lt;description&gt;
      Javier Tordable blog on Software, Mathematics and Technology
    &lt;/description&gt;
    &lt;generator&gt;Nounoublog&lt;/generator&gt;
    &lt;docs&gt;https://code.google.com/p/nounoublog/&lt;/docs&gt;
    {# Loop over all the blog posts. #}
    {% for post in posts %}
    &lt;item&gt;
      &lt;title&gt;{{ post.title }}&lt;/title&gt;
      &lt;link&gt;{{ post.absolute_url }}&lt;/link&gt;
      &lt;pubDate&gt;{{ post.rss_pub_date }}&lt;/pubDate&gt;
      &lt;guid&gt;{{ post.absolute_url }}&lt;/guid&gt;
      &lt;description&gt;{{ post.escaped_content }}&lt;/description&gt;
    &lt;/item&gt;
    {% endfor %}
  &lt;/channel&gt;
&lt;/rss&gt;
</pre>
<p> Notice that in the Django template the post elements appear as attributes while in fact they are method calls. Also I use the url of the post as the GUID, as it is intended to be a permanent link. </p> <h3>Admin console</h3> <p> The last item that I have been working on is the administration console. This is still a work in progress, but I expect that once I am done with it I will post more often. </p> <p> The admin console will have options to: </p> <ul> <li>Add posts</li> <li>Edit posts</li> <li>Add static pages</li> <li>Edit static pages</li> <li>Edit the CSS</li> <li>Edit redirects</li> </ul> <p> All these options seem very normal with the exception of the redirects. How do they work? For example, when going to: </p>
<pre>
http://www.javiertordable.com/blog/2009-12-01/my-first-blog-post
</pre>
<p> You are redirected to another url, which appears in the url bar. Notice how the dashes in the date are now forward slashes: </p>
<pre>
http://www.javiertordable.com/blog/2009/12/01/my-first-blog-post
</pre>
<p> I added support for redirects because I changed the site several times (including the url structure), and I didn't want to serve 404 error pages for all old urls. </p> <p> Keep visiting the blog or subscribe to the <a href="/blog/rss.xml">RSS</a> feed for more news on Nounoublog! </p> The Eternal Night http://www.javiertordable.com/blog/2010/01/30/the-eternal-night Sat, 30 Jan 2010 00:00:00 GMT http://www.javiertordable.com/blog/2010/01/30/the-eternal-night <p> My brother David is a film director. He has been making short films for a few years, and he has even won a few prizes. 
Most of his work is at <a href="http://www.tpmpictures.com">tpmpictures.com</a>. Today I just wanted to show his last piece of work, a science fiction short about the end of the world. The short is in Spanish but with English subtitles. </p> <object> <param name="movie" value="http://www.notodofilmfest.com/ediciones/09/es/swf/player.swf?corto=22321.flv&duracion=03:30"></param> <param name="wmode" value="transparent"></param> <embed src="http://www.notodofilmfest.com/ediciones/09/es/swf/player.swf?corto=22321.flv&duracion=03:30" type="application/x-shockwave-flash" wmode="transparent" width="500" height="370"></embed> </object> <p> There are a couple of things that are interesting in this short. First, the script is not wildly improbable. Check out this (humorous) list of the <a href="http://www.cracked.com/article_16583_the-5-scientific-experiments-most-likely-to-end-world.html"> 5 scientific experiments most likely to end the world</a>. And second, the special effects are pretty nice for a zero budget short. If you liked it, please go ahead and leave a comment at the <a href="http://www.notodofilmfest.com/ediciones/09/?lg=es&corto=22321"> Notodo film festival</a>. </p> TRANSCEND http://www.javiertordable.com/blog/2010/01/15/trascend-book-kurzweil-grossman Fri, 15 Jan 2010 00:00:00 GMT http://www.javiertordable.com/blog/2010/01/15/trascend-book-kurzweil-grossman <p> A couple of days ago I started reading <a href="http://www.transcendbook.com/">TRANSCEND</a>, the new book from Ray Kurzweil and Terry Grossman. The book starts from the principle that our knowledge of medicine and biology is increasing to a point where we can start to control effectively how fast our own bodies age. And even more important the amount of knowledge that we gather is increasing over time. If the trend continues we may reach a point where we can effectively reverse engineer our bodies in order to avoid aging. 
</p> <img src="/img/trascend-book-kurzweil-grossman.png" alt="TRANSCEND book by Ray Kurzweil and Terry Grossman" /> <p> Whether you believe that we will reach that point or not, the book is an interesting read. It is filled with healthy habits, complete diets and recipes, exercise programs and more. Even if you leave aside the supplements and the fancy biomedical technologies, there are plenty of actionable tips for improving your quality of life. One example is checking for food intolerances. There are millions of people out there that can't digest milk or wheat very well and are not even aware of it. </p> <p> Disclaimer: I didn't get paid to write this post. </p> MapReduce Integer Factorization in arXiv http://www.javiertordable.com/blog/2010/01/07/mapreduce-integer-factorization-in-arxiv Thu, 07 Jan 2010 00:00:00 GMT http://www.javiertordable.com/blog/2010/01/07/mapreduce-integer-factorization-in-arxiv <p> This Monday I published my article on <a href="http://arxiv.org/abs/1001.0421">MapReduce for integer factorization in arXiv</a>. The article is essentially the same one that can be downloaded from the <a href="/research">research</a> section of this site. So if you have already checked it out, you won't find anything new. However I am very excited because it is my first addition to arXiv. </p> <img src="/img/mapreduce-integer-factorization-arxiv.png" alt="MapReduce for Integer Factorization in arXiv." /> <p> In case you are not familiar with <a href="http://www.arxiv.org">arXiv</a>, it is one of the greatest scientific sites on the web. It has over half a million articles, especially in the fields of mathematics, physics and computer science, and many relevant papers are published in the arXiv months before they appear in any peer-reviewed journal. </p> MapReduce Integer Factorization released! 
http://www.javiertordable.com/blog/2009/12/29/mapreduce-integer-factorization-released Tue, 29 Dec 2009 00:00:00 GMT http://www.javiertordable.com/blog/2009/12/29/mapreduce-integer-factorization-released <p> Recently I published the code of <a href="http://code.google.com/p/mapreduce-integer-factorization/"> MapReduce for Integer Factorization</a>. It is available under the Apache 2.0 License on Google Code. It includes everything necessary to run in <a href="http://hadoop.apache.org/">Apache Hadoop</a>, as well as the numerical libraries used. It has no dependencies apart from the latest version of Hadoop. </p> <p> <img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="Hadoop logo" /> </p> <p> This project is a proof of concept that shows how to use MapReduce, a framework for distributed computation, to solve a purely numerical problem. The main conclusion is that it's possible to use MapReduce for problems that lie far from its original area of application, for example number theory. Also in this case the difficulty involved in developing the MapReduce program is similar to the difficulty of creating a worksheet in a mathematical tool like Maple. But the performance of MapReduce is significantly higher. </p> <p> If you have some time, please download it from <a href="http://code.google.com/p/mapreduce-integer-factorization/">here</a>, and let me know how it works for you. </p> A small blogging platform in Google App Engine http://www.javiertordable.com/blog/2009/12/17/small-blog-platform-in-google-app-engine Thu, 17 Dec 2009 00:00:00 GMT http://www.javiertordable.com/blog/2009/12/17/small-blog-platform-in-google-app-engine <p> If you have never made a <a href="http://en.wikipedia.org/wiki/Web_application">web application</a> it may seem daunting. There are hundreds of alternative technologies and frameworks out there. And web app development is quite different from client applications, which is what most developers are used to. 
</p> <p> Here is an example of a web application. Wikipedia! </p> <img src="/img/screenshot-wikipedia.png" alt="Screenshot of Wikipedia" /> <p> Most web applications share a few common elements: </p> <ul> <li>A persistence layer, for authored content or user created content</li> <li>A system to connect each user request with a part of the application</li> <li>A method to render and display that content to users</li> </ul> <p> Traditionally the persistence layer is a <a href="http://en.wikipedia.org/wiki/SQL_database">SQL database</a>, the requests are directed for example to <a href="http://en.wikipedia.org/wiki/Servlet">Java servlets</a> in an app server like <a href="http://tomcat.apache.org/">Tomcat</a>, and there is a more or less refined templating engine, in which the content is added to create the whole page returned to the user. </p> <p> Another alternative is <a href="http://code.google.com/appengine/">Google App Engine</a>. Google App Engine is a platform and a set of libraries to develop web applications based on Google's own infrastructure. It is available in Java and Python, but here I will concentrate on the Python version. </p> <p> The persistence layer of Google App Engine is the <a href="http://code.google.com/appengine/docs/python/datastore/overview.html"> Datastore</a>, a highly parallel but simple to use storage solution. The Datastore doesn't support queries as complex as SQL does, however it can scale up to a level which is beyond what a normal database can do. And it can do so in a way that is trivial for the developer. </p> <p> Google App Engine uses <a href="http://en.wikipedia.org/wiki/YAML">YAML</a> and the webapp framework to answer user queries. One can set a configuration file which assigns certain url paths (via a regular expression) to instances of webapp.RequestHandler. </p>
<pre>
handlers:
- url: /.*
  script: app.py
</pre>
<p> The instance can implement a get() method which generates the response returned to the user. 
</p>
<pre>
from google.appengine.ext import webapp


class App(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello, World!')
</pre>
<p> Finally, in order to generate HTML, Google App Engine incorporates the templating engine from <a href="http://www.djangoproject.com/">Django</a>. A template is essentially a document with <em>variables</em> instead of content. When the application needs to answer a user request it can load the template and replace the <em>variables</em> with real content, for example information from the data storage. </p> <p> This is basically the way that this blog is made. It is a very simple Google App Engine application. It uses the Datastore for the blog posts, which are retrieved when it receives a request for the /blog url path. Then it inserts the blog post content into the blog template and returns that content. Here is a sample of code. It is not the actual code, but it gives a complete example: </p>
<pre>
import os

from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.ext.webapp import template


class Blog(webapp.RequestHandler):
    def get(self):
        # Retrieve the posts from the database.
        query = ('SELECT * FROM Post WHERE public = True ORDER BY date DESC '
                 'LIMIT %d ' % NUM_POSTS_IN_MAIN_PAGE)
        posts = db.GqlQuery(query).fetch(NUM_POSTS_IN_MAIN_PAGE)

        # Render them into html.
        if len(posts) > 0:
            template_values = &#123;'posts': posts &#125;
            template_path = os.path.join(os.path.dirname(__file__),
                                         'templates/blog')
            content = template.render(template_path, template_values)
        else:
            content = ''

        # And return a full page with the blog content.
        s = pages.FullPageGenerator()
        self.response.out.write(s.generate(content))
</pre>
<p> To sum up, Google App Engine is a great option for developing web applications. Especially if you need it to scale seamlessly, or to integrate with Google services. Check out the <a href="http://code.google.com/appengine/docs/python/gettingstarted/introduction.html"> Google App Engine tutorial</a>. 
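</p> <p> As a rough, self-contained illustration of the request flow described in this post (a url pattern selects a handler, and the handler fills a template with stored content), here is a plain Python sketch. It does not use App Engine at all: a list of dictionaries stands in for the Datastore, string.Template stands in for a Django template, and a list of regular expressions stands in for the url mapping. All the names in it are made up for the example. </p>
<pre>
```python
import re
from string import Template

# Stand-in for a Django template: $title and $body are the variables.
POST_TEMPLATE = Template("[$title] $body")

# Stand-in for the Datastore: a hypothetical in-memory list of posts.
POSTS = [{"title": "My first blog post", "body": "Hello everybody."}]


def blog_handler():
    """Render every stored post with the template."""
    return "\n".join(POST_TEMPLATE.substitute(post) for post in POSTS)


def not_found_handler():
    """Fallback response for paths that no other pattern claims."""
    return "404 Not Found"


# Ordered (pattern, handler) pairs; the first full match wins.
ROUTES = [
    (re.compile(r"/blog/?"), blog_handler),
    (re.compile(r"/.*"), not_found_handler),
]


def dispatch(path):
    """Return the response of the first handler whose pattern matches path."""
    for pattern, handler in ROUTES:
        if pattern.fullmatch(path):
            return handler()
    return not_found_handler()


print(dispatch("/blog"))   # [My first blog post] Hello everybody.
print(dispatch("/about"))  # 404 Not Found
```
</pre>
<p> The three pieces are kept separate on purpose, so that the storage, the url mapping or the template can each be replaced without touching the other two. 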
</p> <img src="http://code.google.com/appengine/images/appengine-noborder-120x30.gif" alt="Powered by Google App Engine" /> Interesting Visualizations: Changes Over Time http://www.javiertordable.com/blog/2009/12/03/interesting-visualizations-changes-over-time Thu, 03 Dec 2009 00:00:00 GMT http://www.javiertordable.com/blog/2009/12/03/interesting-visualizations-changes-over-time <p> Visualizations are simply ways of representing data. But if they are good, they can bring us deep insights that go well beyond what it is possible to understand by simply looking at the raw data. </p> <p> There are several categories of visualizations, for example: </p> <ul> <li>Compare two entities based on a given set of metrics. An example of this is a benchmark between two competing companies or products</li> <li>Track the value of a given metric over time. A well known visualization of this type is a financial chart, with the value of an asset</li> <li>Compare the value of a single metric in different geographic locations. We have all seen maps in which the color of each region is based on the value of the metric</li> </ul> <p> Another very interesting set of visualizations are those that allow us to track a particular situation over time. Here are three examples: </p> <p> <a href="http://www.flickr.com/photos/ciaranhughes/4121291229/"> Tracking a change in ranking over time (by Ciaran Hughes)</a> </p> <p> <img src="/img/visualization-changes-ranking-over-time.png" alt="Visualization for changes in ranking over time" /> </p> <p> <a href="http://www.ge.com/visualization/health_costs/index.html"> Tracking a change in distribution over time (from GE)</a>. The bottom slider changes the chart based on the age. 
Each section of the chart represents one kind of illness. </p> <p> <img src="/img/visualization-changes-distribution-over-time-40.png" alt="Visualization for changes in distribution over time" /> <img src="/img/visualization-changes-distribution-over-time-50.png" alt="Visualization for changes in distribution over time" /> <img src="/img/visualization-changes-distribution-over-time-60.png" alt="Visualization for changes in distribution over time" /> </p> <p> <a href="http://www.xach.com/moviecharts/2008.html"> Tracking changes in volume or magnitude over time (from xach.com)</a>. Each color block is a movie, and the size represents the box office in each week. </p> <img src="/img/visualization-changes-volume-over-time.png" alt="Visualization for changes in volume or magnitude over time" /> <p> The first visualization doesn't attempt to indicate quantity because it displays an abstract concept such as brand appreciation. However in the third case the quantity is very concrete, the total box office in dollars. </p> <p> We could also use the first or third approach for the second data set. But if we went with the first kind of visualization most likely we would only remember the most important expense for each age. Also we would be constrained in the number of years to show. If we decided to go with a visualization of the third kind it would be hard to compare how the expenses change relative to each other, as all of them are likely to increase over time. </p> My first blog post http://www.javiertordable.com/blog/2009/12/01/my-first-blog-post Tue, 01 Dec 2009 00:00:00 GMT http://www.javiertordable.com/blog/2009/12/01/my-first-blog-post <p> Hello everybody. This is the first post in my new blog. This is not your common WordPress or Blogger blog. It runs on a custom blogging platform made from scratch, on top of Google App Engine. Soon I will add a couple of posts about how it's done, and I will release the code of the platform. 
</p> <p> In the future I will use this blog to talk about stuff that interests me. For example: </p> <ul> <li>Google App Engine, Django and other tools for rapid web application development</li> <li>Sage, an open source mathematics package</li> <li>Devi Prasad Shetty, and how he transformed medicine through mass production</li> <li>Interesting visualizations! As information becomes more and more available, it is increasingly important to be able to visualize and understand it easily</li> </ul> <p> The subscribe links don't work yet, so I am afraid you won't be able to read this blog in your favorite RSS reader. But I hope that I will see you again soon. Thanks for coming! </p>