Canadian Amp

A Mixed Bag of Whatever



Is your computer running slow? Come to Newegg.ca for your CPU/processor needs!

Beginning Python Visualization - Book Review

I picked up Beginning Python Visualization: Crafting Visual Transformation Scripts (Books for Professionals by Professionals) on an impulse Amazon buy and got a chance to read it over the weekend. My initial impression is that this would serve as a solid introduction to data visualization in python for people who have a bit of programming experience (no python experience is assumed) and are interested in presenting it in a graphical form. One of the neat thigs about the book is that the first chapter just dives in to a simple case study of visualizing some GPS data. The author first walks you through how to use < a href=”http://pyserial.wiki.sourceforge.net/pySerial”>pySerial to read GPS data from a device and store it in a file, placing an emphasis on creating meaningful file names and directory structures. Once some data has been collected Python’s built in csv reader is used to parse the data and extract position and velocity information which is then used to create a nice GPS track plot including velocity vectors with < a href=”http://matplotlib.sourceforge.net/”>matplotlib. I am undecided on whether I am totally happy with the authors choice of subject for the first chapter case study. On the one hand it is nice to see a really concrete and simple real world example. On the other hand I have to wonder whether reading data from a serial port might be a little intimidating for people who are new to programming, not to mention the fact that not everyone has a GPS receiver. This is a minor quibble since the code is written clearly and is easy to follow along even if you don’t type it out and try it for yourself. Overall the first chapter gives you the “Why” of data visualization with the later chapters focusing on the “How”. Chapter 2 was a nice addition as it focuses on choosing, and setting up, a development platform with a focus on Linux but Windows and Mac are not ignored and it is made clear that all of these programs will work on whichever platform you choose. This part was fairly brief (by neccessity) but covered all of the basics including discussion of choosing an editor (Emacs, vim, Notepad++ and more), prebuilt scientific python packages including the Enthought Python Distribution, IPython and python(x,y). Vaingast also instructs how to get the bedrock scientific and visualization python packages SciPy, matplotlib and the < a href=”http://www.pythonware.com/products/pil/”>Python Imaging Library (PIL). In my experience these packages are going to cover a large percentage of your visualization needs. Chapter 3 provides a quick but good introduction to the Python language for people who have a bit of programming experience. Another great intro to Python of course is Mark Pilgrim’s Dive Into PYTHON. Chapters 4 and 5 cover reading and writing to some common file formats including csv and ini files. Both text and binary files are covered although other formats like XML and Yaml would have been a nice addition. Chapter 6 gives a very nice introduction to matplotlib and whatever isn’t covered here can be found in the extensive online documentation Lots of other important foundations are covered in Chapters 7 and 8 including complex numbers, multi-dimensional arrays and linear algebra functions. There is also coverage of more advanced concepts including interpolation, solving non-linear equations and Fourier transformations (FFT’s). There is some discussion of 2 dimensional data in chapter 9, but I felt the book falls a little bit short in this regard. While contour and filled contour plots are mentioned briefly earlier in the book, there is very little discussion of working with data that is a function of more than one coordinate. In this sense I feel the book falls a bit short since part of the appeal of matplotlib is that is has excellent support for < a href=”http://matplotlib.sourceforge.net/api/pyplot_api.html?highlight=contourf#matplotlib.pyplot.contourf”>contours built in. There is also no discussion of how to use tools like MayaVi and VTK to visualize 3d scientific data which is a shame since this is where visualizing data starts to get really interesting to me. Again this is a minor quibble since this book is aimed at beginners and does not set out to cover everything. In summary I would highly recommend Beginning Python Visualization: Crafting Visual Transformation Scripts (Books for Professionals by Professionals) to anyone looking to get there feet wet with data visualization.

June 2nd, 2009 by randlet Get 25 FREE iPod® compatible downloads from eMusic! Choose from over 2.8 Million DRM free songs! Works on ANY MP3 player

Simple Huckel Molecular Orbitals with Orbis!

So I just released the first beta of a program called Orbis that does Simple Huckel Molecular Orbital (or SHMO for short) calculations. The beta version is free and valid until June 1st after which it will likely go on sale for around $20 US per license.

Right now there is only a windows version, however I hope to have a Mac version ready in the coming months.

Here is a couple quick videos showing Orbis in action.

Creating a simple benzene ring and then converting it to pyridine by changing one of the atom types.

Orbis displaying the 24 membered molecule Coronene

You can download a trial at www.simplehuckel.com.

May 1st, 2009 by randlet Get 25 FREE iPod® compatible downloads from eMusic! Choose from over 2.8 Million DRM free songs! Works on ANY MP3 player

Dreaming In Code

I just finished this excellent book: Dreaming in Code: Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software last night and thought I would share it. The book follows the development of Chandler from start to, ummm…not finished over a period of 3 or 4 years. It details all of the trials and tribulations of the group and really just emphasizes just how hard it is to write good software.

Much of what is said in Dreaming in Code will resonate really loudly (read: gives you that sinking feeling in your stomach) with anybody who has ever sat down and tried to write or fix code, particulary in a group setting. The book is written by Scott Rosenberg who founded Salon.com and I think he does a great job of striking a balance between making it technical enough to keep programmers entertained while keeping it accessible to people with little or no software development experience who want a glimpse into the world of software. The latter crowd will definitely gain an appreciation for why many of the programs they use everyday still contain mistakes, crash often or don’t behave the way they feel like it should.

Another bonus for me was that Chandler is written in Python and there is some discussion in Dreaming in Code about the merits of python and why it was chosen for the project.

Anyway I thought it was a great and entertaining read which gave some insight into startups, open source software projects, philosophy, and the joys and immense frustrations of software development. Highly recommended.

February 25th, 2009 by randlet Get 25 FREE iPod® compatible downloads from eMusic! Choose from over 2.8 Million DRM free songs! Works on ANY MP3 player

Enabling GPX support for GDAL in Ubuntu (for use in geodjango)

I have been building a new project with django, pinax and geodjango. Part of the project involves parsing GPX files created by GPS devices. In the DjangoCon talk (at youtube by Justin Bronn he mentions that GDAL can parse GPX files which would be very helpful for me.

Originally I followed the GeoDjango iinstall instructions However, when I tried to parse a simple gpx file I was present with this error

CODE:
  1. randlet:~&gt; ogrinfo -ro -al Hockley-Rosseau.gpx
  2. Error 6: OGR/GPX driver has not been built with read support. Expat library required"
  3. .
  4. .
  5. .

After a bit of googling I came up with the following solution. First step is to get the expat xml library (including dev) installed like so;

CODE:
  1. sudo apt-get install expat libexpat1-dev

After this you should have a libexpat.h header file and an expat.so located in /usr/include/ and /usr/lib/ respectively.

Now change into the directory that you installed GDAL from during this step of the GeoDjango install process. If you have already installed GDAL then you will need to do a clean install and configure like so (output removed):

CODE:
  1. randlet:~/gdal-1.6.0&gt; make clean
  2. randlet:~/gdal-1.6.0&gt; ./configure --with-expat=/usr/ --with-python
  3. GDAL is now configured for i686-pc-linux-gnu
  4.  
  5. Installation directory:    /usr/local
  6. C compiler:                gcc -g -O2
  7. C++ compiler:              g++ -g -O2
  8.  
  9. LIBTOOL support:           yes
  10.  
  11. LIBZ support:              external
  12. GRASS support:             no
  13. CFITSIO support:           no
  14. PCRaster support:          internal
  15. NetCDF support:            no
  16. LIBPNG support:            internal
  17. LIBTIFF support:           internal (BigTIFF=yes)
  18. LIBGEOTIFF support:        internal
  19. LIBJPEG support:           internal
  20. LIBGIF support:            internal
  21. OGDI support:              no
  22. HDF4 support:              no
  23. HDF5 support:              no
  24. Kakadu support:            no
  25. JasPer support:            no
  26. ECW support:               no
  27. MrSID support:             no
  28. MSG support:               no
  29. GRIB support:              yes
  30. cURL support (wms/wcs/...):no
  31. PostgreSQL support:        yes
  32. MySQL support:             no
  33. Ingres support:            no
  34. Xerces-C support:          no
  35. NAS support:               no
  36. Expat support:             yes
  37. ODBC support:              no
  38. PGeo support:              no
  39. OCI support:               no
  40. GEORASTER support:         no
  41. SDE support:               no
  42. DODS support:              no
  43. SQLite support:            no
  44. DWGdirect support          no
  45. INFORMIX DataBlade support:no
  46. GEOS support:              yes
  47.  
  48. Old-gen python          no
  49. SWIG Bindings:          python
  50.  
  51. Statically link PROJ.4:    no
  52. enable OGR building:       yes
  53. enable pthread support:    no
  54. hide internal symbols:     no
  55. randlet:~/gdal-1.6.0&gt; make #now go get yourself a beer
  56. randlet:~/gdal-1.6.0&gt; sudo make install

After that you should be able to use a gpx file as a datasource in your geodjango project:

PYTHON:
  1. randlet:~/my_project/&gt; python manage.py shell
  2. &gt;&gt;&gt; from django.contrib.gis.gdal import DataSource
  3. &gt;&gt;&gt; ds = DataSource('Billings.gpx')
  4. &gt;&gt;&gt; ds.layer_count
  5. 5
  6. &gt;&gt;&gt; ds.name
  7. '/Billings.gpx'
  8. &gt;&gt;&gt; layer = ds[0]
  9. &gt;&gt;&gt; wpt = layer.get_geoms()[0]
  10. &gt;&gt;&gt; wpt.coords
  11. (-108.644942, 45.132831000000003)
  12. &gt;&gt;&gt;

Hopefully that will save someone some time in the future as the solution was not immediately obvious to me when I googled around for an answer.

January 17th, 2009 by randlet Get 25 FREE iPod® compatible downloads from eMusic! Choose from over 2.8 Million DRM free songs! Works on ANY MP3 player

Simple Geocoding with python and urllib

I recently had to geocode a huge number of addresses for a project I'm working on. Here is a simple python script that reads in a bunch of addresses from a text file and then uses the Google maps api to geocode them (get their latitude and longitude).

The format of the address file should be roughly like:

CODE:
  1. 123 mystreet, Beverly Hills, CA, 90210
  2. 456 mystreet, Beverly Hills, CA, 90210
  3. 789 mystreet, Beverly Hills, CA, 90210
  4. .
  5. .
  6. .

although the Google geocoder is quite forgiving about the format of addresses so you can probably get away with something like

CODE:
  1. CN Tower Toronto Ontario
  2. Bloor Street Toronto Ontario
  3. .
  4. .
  5. .

You will also need to get your own google maps api key which you can get over at Google.

PYTHON:
  1. import urllib,urllib2,time
  2.  
  3. addr_file = 'addresses.csv'
  4. out_file  = 'addresses_geocoded.csv'
  5. out_file_failed = 'failed.csv'
  6.  
  7. sleep_time = 2
  8. root_url = "http://maps.google.com/maps/geo?"
  9. gkey = "YourGoogleKeyGoesHere"
  10.  
  11. return_codes = {'200':'SUCCESS',
  12. '400':'BAD REQUEST',
  13. '500':'SERVER ERROR',
  14. '601':'MISSING QUERY',
  15. '602':'UNKOWN ADDRESS',
  16. '603':'UNAVAILABLE ADDRESS',
  17. '604':'UNKOWN DIRECTIONS',
  18. '610':'BAD KEY',
  19. '620':'TOO MANY QUERIES'
  20. }
  21.  
  22. def geocode(addr,out_fmt='csv'):
  23.  
  24. #encode our dictionary of url parameters
  25. values = {'q' : addr, 'output':out_fmt, 'key':gkey}
  26. data = urllib.urlencode(values)
  27.  
  28. #set up our request
  29. url = root_url+data
  30. req = urllib2.Request(url)
  31.  
  32. #make request and read response
  33. response = urllib2.urlopen(req)
  34. geodat = response.read().split(',')
  35. response.close()
  36.  
  37. #handle the data returned from google
  38. code = return_codes[geodat[0]]
  39. if code == 'SUCCESS':
  40. code,precision,lat,lng = geodat
  41. return {'code':code,'precision':precision,'lat':lat,'lng':lng}
  42. else:
  43. return {'code':code}
  44.  
  45. def main():
  46.  
  47. #open our i/o files
  48. outf = open(out_file,'w')
  49. outf_failed = open(out_file_failed,'w')
  50. inf = open(addr_file,'r')
  51.  
  52. for address in inf:
  53.  
  54. #get latitude and longitude of address
  55. data = geocode(address)
  56.  
  57. #output results and log to file
  58. if len(data)&gt;1:
  59. print "Latitude and Longitude of "+address+":"
  60. print "\tLatitude:",data['lat']
  61. print "\tLongitude:",data['lng']
  62. outf.write(address.strip()+data['lat']+','+data['lng']+'\n')
  63. outf.flush()
  64. else:
  65. print "Geocoding of '"+addr+"' failed with error code "+data['code']
  66. outf_failed.write(address)
  67. outf_failed.flush()
  68.  
  69. #play nice and don't just pound the server with requests
  70. time.sleep(sleep_time)
  71.  
  72. #clean up
  73. inf.close()
  74. outf.close()
  75. outf_failed.close()
  76.  
  77. if __name__ == "__main__":
  78. main()

Nice and easy. You may also want to check out the geopy package which looks very nice and includes support for distance calculations.

December 5th, 2008 by randlet Get 25 FREE iPod® compatible downloads from eMusic! Choose from over 2.8 Million DRM free songs! Works on ANY MP3 player

Steven Hawking is coming to Canada!

It was just announced that Stephen Hawking will be accepting a postion at the Perimeter Institute for theoretical physics in Waterloo. It appears that Dr. Hawking will not be moving to Waterloo full time but rather making regular stays in Waterloo starting in the summer of '09.

Hawking is likely the most well known modern scientists outside of the scientific community. Largely due to his hugely successful book A Brief History of Time and its follow up The Universe in a Nutshell both of which brought the realms of things such as superstring theory and how the universe evolved to laypersons.

Also, you really know you've made it big in science when you make a cameo on the simpsons.

Hopefully he'll do a couple public lectures that I'll be able to attend.

December 1st, 2008 by randlet Get 25 FREE iPod® compatible downloads from eMusic! Choose from over 2.8 Million DRM free songs! Works on ANY MP3 player

Speeding up your python by calling c functions and using Psyco

One of the greatest things about python is how quickly you can develop fully functioning scripts/programs in it. Both from the perspective of the large number of library functions available which make your life easier and also the fact that you don't have to go through a compile step every time you edit your code. Unfortunately this rapid development usually comes at the cost of raw run time speed. Often this isn't a problem, for example if your python script takes 1 second to run when the equivalent program written in c would take 0.2 s to run the 500% difference is practically negligible. Even if the python script takes 10 hours to run where the equivalent c program takes 2 hours this may not be a large problem if you are only going to ever run the script once or twice since the huge amount of time you saved in developing the script in python more than makes up for the difference. The speed deficit of an interpreted language like python really starts to pop up when you have a program that needs to be run repeatedly or requires lots of user interaction and therefore must be fast enough to prevent a user from becoming frustrated.

Luckily, in python there are a number of possibilities that allow developers to take advantage of the rapid development possible in python while still retaining reasonable run times. Probably the two most popular choices are using the python module Psyco and/or rewriting the most computationally intensive functions in c and calling those c functions from your python script. As Jeremy Jones at onlamp.com puts it:

One of the frequently repeated mantras I hear from Python veterans is, "Premature optimization is the root of all evil" (a quote from Donald Knuth, which apparently originated from Tony Hoare). The Pythonic approach to writing code that performs acceptably is:

* Write the code.
* Test it to see whether performance is OK right off the bat.
* If so, you're done.
* If not, optionally try Psyco.
* Profile it and identify the bottlenecks.
* Modify the code until the performance is as fast as you need, rewriting bottlenecks in C (or Pyrex) if necessary.

As an example of this development cycle I wrote a quick python script to produce images of a Julia set using the Python Imaging Library.

Here is the first implementation of this script written purely in python

PYTHON:
  1. #!/usr/bin/env python
  2. #Create an image of a Julia set
  3. #
  4. #R. Taylor Nov 25. 2008
  5. #email: randlet@canadianamp.ca
  6. #web: canadianamp.ca
  7. #This code is available under the WTF licence. Do Whatever The F&$#
  8. #you want with it but links to this blog are always appreciated.
  9.  
  10. import numpy,scipy
  11. import Image
  12.  
  13. def julia(a,b,ca,cb,maxit):
  14.   numit = 0
  15.   while (a*a+b*b<4. and numit <maxit):
  16.     temp = a*a-b*b
  17.     b = 2*a*b
  18.     a = temp + ca
  19.     b += cb
  20.     numit += 1
  21.    
  22.   return numit
  23.  
  24. #set up the max and min coordinates to display in our image
  25. xmin,xmax = -2.5,2.5
  26. ymin,ymax = -1.0,1.0
  27.  
  28. #set number of pixels in our image
  29. height = 600
  30. width = int(height*((xmax-xmin)/(ymax-ymin)))
  31.  
  32. #create a new image using PIL with a black background
  33. im = Image.new('RGB',[width,height],0)
  34.  
  35. #this allows us to access the pixels in the image directly
  36. pixels = im.load()
  37.  
  38. #maximum number of iterations for each point
  39. max_its = 255
  40.  
  41. #constant value for our julia set - i.e. the c in z_i+1 = z_i**2 + c
  42. ca,cb = -0.835,-0.232
  43.  
  44. #calculate the coordinates of all the pixels
  45. x_set = scipy.linspace(xmin,xmax,width)
  46. y_set = scipy.linspace(ymax,ymin,height)
  47.  
  48. #loop over all points and set the color
  49. for j,y in enumerate(y_set):
  50.   for i,x in enumerate(x_set):
  51.  
  52.     num_its = julia(x,y,ca,cb,max_its)
  53.  
  54.     if num_its != max_its:
  55.       pixels[i,j]= (num_its*2,0,0)
  56.  
  57. im.save('julia.png')

Ok, that was really quick to write and produces pretty pictures now how quickly does it run?

CODE:
  1. smalldev:~/code/python/pil> time ./julia.py
  2. 56.003u 0.268s 0:59.48 94.5%    0+0k 336+328io 0pf+0w
  3. smalldev:~/code/python/pil>

So roughly a minute. That's okay for producing an image or two every once in a while, but what if I wanted to create a huge amount of images from my own book of fractal images? It's going to take me forever :(

Bar none the easiest way to speed up your code is to try using Psyco. If you're using Ubuntu Psyco is a simple "sudo apt-get install python-psyco" away.

Adding the lines:

PYTHON:
  1. import psyco
  2. psyco.full()

to the top of our script gives us a roughly 20% increase in speed:

CODE:
  1. smalldev:~/code/python/pil> time ./julia_psy.py
  2. 47.642u 0.192s 0:51.77 92.3%    0+0k 0+328io 0pf+0w
  3. smalldev:~/code/python/pil>

Not bad but still not nearly as fast as we would like. That means moving on to profile our code to find out where the speed bottleneck is. In this case our program is fairly trivial and we can pretty easily guess before hand that the julia function is burning up most of our time. Turning to the cProfile module we can confirm our guess:

CODE:
  1. smalldev:~/code/python/pil> python -m cProfile -o profout julia.py
  2. smalldev:~/code/python/pil> python
  3. Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
  4. [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
  5. Type "help", "copyright", "credits" or "license" for more information.
  6. >>> import pstats
  7. >>> p = pstats.Stats('profout')
  8. >>> p.sort_stats('time').print_stats(10)
  9. Sun Nov 30 02:33:59 2008    profout
  10.  
  11.          985421 function calls (981638 primitive calls) in 68.518 CPU seconds
  12.  
  13.    Ordered by: internal time
  14.    List reduced from 707 to 10 due to restriction <10>
  15.  
  16.    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  17.    900000   58.821    0.000   58.821    0.000 julia.py:13(julia)
  18.         1    8.549    8.549   68.519   68.519 julia.py:10(<module>)
  19.         3    0.337    0.112    0.337    0.112 {built-in method encode}
  20.      5332    0.052    0.000    0.084    0.000 /usr/lib/python2.5/posixpath.py:56(join)

So a huge fraction of time is being spent in the julia function as we expected so the next step we can take is to rewrite this function in c and then call that c function from our python script. This is accomplished in a few steps. First we rewrite our function in c:

C:
  1. int julia(float a, float b,float ca, float cb,int max_it){
  2.   int num_its=0;
  3.   float temp;
  4.   while ((num_its <max_it) && (a*a+b*b <4.)){
  5.     temp = a*a-b*b;
  6.     b = 2*a*b;
  7.     a = temp+ca;
  8.     b += cb;
  9.     num_its++;
  10.   }
  11.  
  12.   return num_its;
  13. }

And next create a shared library containing our new julia function:

CODE:
  1. smalldev:~/code/python/pil> gcc -o julia.so --shared julia.c
  2. smalldev:~/code/python/pil>

We can then load this library into our python script using the NumPy ctypeslib load_library function. and replace our python implementation of julia() with this:

PYTHON:
  1. from ctypes import *
  2. #load the library and assign the library function julia to a local function
  3. lib = numpy.ctypeslib.load_library('julialib','./')
  4. julia = lib.julia

The only other change we have to make to our script is to cast our variables to ctypes like so:

PYTHON:
  1. #maximum number of iterations for each point
  2. max_its = c_int(255)
  3.  
  4. #constant value for our julia set - i.e. the c in z_i+1 = z_i**2 + c
  5. ca,cb = c_float(-0.835),c_float(-0.232)
  6.  
  7. #calculate the coordinates of all the pixels and cast to type c_float
  8. x_set = map(c_float,scipy.linspace(xmin,xmax,width))
  9. y_set = map(c_float,-scipy.linspace(ymax,ymin,height))

and when we're done our full script will look like this:

PYTHON:
  1. #!/usr/bin/env python
  2. #Demonstration of how to use ctypes to access a library function written in c
  3. #Allowing us to speed up the creation of a Julia set image
  4. #
  5. #R. Taylor Nov 25. 2008
  6. #email: randlet@canadianamp.ca
  7. #web: canadianamp.ca
  8. #This code is available under the WTF licence. Do Whatever The F&$#
  9. #you want with it but links to my website are always appreciated.
  10.  
  11. import numpy,scipy
  12. from ctypes import *
  13. import Image
  14.  
  15. #load the library and assign the library function julia to a local function
  16. lib = numpy.ctypeslib.load_library('julialib','./')
  17. julia = lib.julia
  18.  
  19. #set up the max and min coordinates to display in our image
  20. xmin,xmax = -2.5,2.5
  21. ymin,ymax = -1.0,1.0
  22.  
  23. #set number of pixels in our image
  24. height = 600
  25. width = int(height*((xmax-xmin)/(ymax-ymin)))
  26.  
  27. #create a new image using PIL with a black background
  28. im = Image.new('RGB',[width,height],0)
  29.  
  30. #this allows us to access the pixels in the image directly
  31. pixels = im.load()
  32.  
  33. #maximum number of iterations for each point
  34. max_its = c_int(255)
  35.  
  36. #constant value for our julia set - i.e. the c in z_i+1 = z_i**2 + c
  37. ca,cb = c_float(-0.835),c_float(-0.232)
  38.  
  39. #calculate the coordinates of all the pixels and cast to type c_float
  40. x_set = map(c_float,scipy.linspace(xmin,xmax,width))
  41. y_set = map(c_float,-scipy.linspace(ymax,ymin,height))
  42.  
  43.  
  44. #loop over all points and set the color
  45. for j,y in enumerate(y_set):
  46.   for i,x in enumerate(x_set):
  47.  
  48.     num_its = julia(x,y,ca,cb,max_its)
  49.  
  50.     if num_its != max_its:
  51.       pixels[i,j]= (num_its*2,0,0)
  52.  
  53. im.save('julia_c.png')

Simple and straightforward, but how much of a gain do we get?

CODE:
  1. smalldev:~/code/python/pil> time ./julia_c.py
  2. 3.660u 0.140s 0:03.82 99.40+0k 0+328io 0pf+0w
  3. smalldev:~/code/python/pil>

That is a whopping 15 fold decrease in run time for very little programming effort! Of course this is a fairly trivial example, but I hope it gives you an idea of how python can be used for computationally intensive problems without the large decrease in speed normally associated with interpreted languages.

If you enjoyed this post you may want to check out Python Scripting for Computational Science :

November 30th, 2008 by randlet Get 25 FREE iPod® compatible downloads from eMusic! Choose from over 2.8 Million DRM free songs! Works on ANY MP3 player

Preventing Rogers Search Results on Mistyped Address

Sometime around July 2008 Rogers started the very annoying practice of doing DNS redirects to sponsored search results when you mistyped a url in the search bar or tried to access a non existent website. This may or may not be a violation of Net Neutrality but more than anything it was just annoying to have to retype an address when I mistyped one letter. In some cases this breaks the address bar auto searches of some browsers and their are reports of it killing remote web apps. So just standard Rogers not really paying attention to what it's customers want.

There is a lot of disscussion of this topic at MichaelGeist.ca and most importantly for me was I found a way to avoid this problem in the future. It seems that Rogers recieved a lot of complaints about this and in response quietly provided an alternate dns server which won't do a redirect.

Here is how to set up that DNS server for yourself in Windows XP:
1. Open the Control Panel (Start->Settings->Control Panel)

2. Double click on "Network Connections"

XP Control Panel

XP Control Panel

3. Right click on the connection used to access your internet (usually Local Area Network)

Network Connections

Network Connections

4. Click on TCP/IP and then click on the Properties button

tcp/ip properties

tcp/ip properties

5. Click on "Use the following DNS Addresses" and enter 64.71.255.202 (this is an alternate Rogers DNS, see the comments below if you want to use Open DNS instead)

Alternate DNS Server

Alternate DNS Server

6. Click Ok, then OK and then you're done!

Now when you type www.adsfasdlfkasdlf.com you should no longer be directed to a rogers sponsered search results.

The process is similar on Vista:

Start->Control Panel->View Network Status and Tasks->Manage Network Connections->right click on your connection and then follow from step 3 above.

If you are feeling really ambitious fire off a complaint to :

The Office of the President
Rogers Cablesystems - Rogers Yahoo! HiSpeed Internet
855 York Mills Rd
Don Mills, Ontario
M3B 1Z1

November 21st, 2008 by randlet Get 25 FREE iPod® compatible downloads from eMusic! Choose from over 2.8 Million DRM free songs! Works on ANY MP3 player

Visualizing 3D scientific data with free software, python NumPy, vtk & MayaVi

Recently I was tasked with creating a visual representation of a table of scientific data so that a user could explore and visualize their dataset interactively. Visualizing a lot of data at one time can really lead to a much better understanding of what your data looks like and how it behaves as a function of it's parameters. For example, which one of the following representations of temperature difference data provides you with the most intuitive feeling of how the data varies as a function of the positions (x,y)?

Exhibit A - The Spreadsheet:

spread sheet showing data

spread sheet showing data

Exhibit B - A Colour Map:

colour map of temperature data

colour map of temperature data

It's not even close, the colour map instantly shows you where the hottest and coldest regions in space are. We can even take it a step further and get a full 3d representation of the data like so:

3d colour map

3d colour map

Now we are able to get a better idea of how steep the temperature gradients are than we can with the colour map alone.

Okay, so great, now we know how much nicer looking at 3d data is how do you go from your boring spreadsheet to a 3d plot? How about making the data interactive so that we can zoom in, rotate or otherwise manipulate the data?

Well, Excel has very limited 3d plotting capabilities, particulary if your coordinates aren't spaced linearly in space. Ditto for Origin. My next thought was to download some free or very cheap plotting software of which there seems to be a < href="http://www.google.com/search?&q=3d+plotting+software">tonne of options, however my first attempts didn't really give me what I was looking for and most of these products aren't cross platform. Since I move regularly between windows and linux environments this is definitely a must for me. Commercial software such as Matlab and MathCad or Mathematica are all great but expensive so I just decided it would be easier to roll-my-own.

After a bit of searching around, I decided to use a combination of python and the related modules NumPy & SciPy and pyvtk. All three are a snap to install on either windows or linux. The plan is to use python to create a Visual Toolkit file that can be displayed using MayaVi. Yeah, installing all of that might seem like a lot of work, but it truly is pretty simple on either linux or windows and they are all very useful in their own right.

The data I was given was a csv file with three colums (x,y,z) that looked like:

CODE:
  1. 0   1.5    0.61
  2. 0   1.8    0.07
  3. 0   2  -0.19
  4. 0   2.2    -0.39
  5. 0   2.4    -0.51
  6. 0   2.6    -0.56
  7. 0   2.8    -0.57
  8. 0   3  -0.53
  9. 0.5 1.5  0.43
  10. 0.5 1.8  -0.06
  11. 0.5 2    -0.31
  12. 0.5 2.2  -0.48
  13. 0.5 2.4  -0.59
  14. 0.5 2.6  -0.64
  15. 0.5 2.8  -0.63
  16. 0.5 3    -0.59
  17. 1   1.5    0.04
  18. 1   1.8    -0.36
  19. 1   2  -0.55
  20. 1   2.2    -0.69
  21. 1   2.4    -0.76
  22. 1   2.6    -0.78
  23. 1   2.8    -0.76
  24. 1   3  -0.7
  25. 1.5 0    1.02
  26. 1.5 0.5  0.81
  27. .
  28. .
  29. .
  30. etc

So the first thing that needed to be done is to read the data in, extract all the x and y coordinates and convert the data to a matrix rather than the unordered columns. (There is probably a more elegant way to do this but oh well, it works). Depending on the format that your data is in you may have to manipulate it differently to get it into the correct format

PYTHON:
  1. #!/usr/bin/env python
  2.  
  3. from numpy import *
  4. import pyvtk
  5.  
  6. df = open('dat.csv','r')
  7.  
  8. #set up arrays to hold data and coordinates
  9. x=[]
  10. y=[]
  11. z=[]
  12. all_dat = []
  13.  
  14. for line in df:
  15.   #read in a line of data and convert it to floats
  16.   [xd,yd,zd] = map(float,line.split(','))
  17.  
  18.   #save the unique x and y coordinates
  19.   if not xd in x: x.append(xd)
  20.   if not yd in y: y.append(yd)
  21.  
  22.   #create a list of all the data points 
  23.   all_dat.append([xd,yd,zd])
  24.  
  25. df.close()
  26.  
  27. #make sure our coordinates are in increasing order
  28. x.sort()
  29. y.sort()
  30.  
  31. #create a matrix big enough to hold our data
  32. plot_grid = zeros((len(x),len(y)),float)
  33.  
  34. #now we iterate over our data and fill in the matrix
  35. for i, xv in enumerate(x):
  36.   for j, yv in enumerate(y):
  37.     for line in all_dat:
  38.       xo = line[0]
  39.       yo = line[1]
  40.       if xo == xv and yo == yv:
  41.         plot_grid[i][j] = line[2]

Which gives you a grid that looks like this:

CODE:
  1. [[ 0.    0.    0.    0.61  0.07 -0.19 -0.39 -0.51 -0.56 -0.57 -0.53]
  2.  [ 0.    0.    0.    0.43 -0.06 -0.31 -0.48 -0.59 -0.64 -0.63 -0.59]
  3.  [ 0.    0.    0.    0.04 -0.36 -0.55 -0.69 -0.76 -0.78 -0.76 -0.7 ]
  4.  [ 1.02  0.81  0.3  -0.27 -0.56 -0.69 -0.79 -0.83 -0.83 -0.8  -0.73]
  5.  [ 0.66  0.51  0.12 -0.33 -0.55 -0.66 -0.73 -0.77 -0.76 -0.73 -0.67]
  6.  [ 0.52  0.39  0.07 -0.3  -0.49 -0.58 -0.65 -0.68 -0.68 -0.65 -0.6 ]
  7.  [ 0.45  0.34  0.08 -0.22 -0.39 -0.47 -0.53 -0.56 -0.56 -0.55 -0.51]
  8.  [ 0.43  0.34  0.13 -0.12 -0.26 -0.33 -0.39 -0.42 -0.43 -0.42 -0.4 ]
  9.  [ 0.45  0.38  0.21  0.   -0.12 -0.18 -0.24 -0.27 -0.29 -0.3  -0.29]
  10.  [ 0.48  0.42  0.29  0.13  0.02 -0.04 -0.09 -0.13 -0.16 -0.18 -0.18]
  11.  [ 0.5   0.46  0.37  0.23  0.14  0.09  0.03 -0.01 -0.05 -0.07 -0.09]]

Now all that's left to do is create a vtk file that MayaVi can use. I modified the example from MayaVi guide to create our vtk file.

PYTHON:
  1. #change the matrix to the proper format for vtk
  2. z1 = reshape(transpose(plot_grid), (-1,))
  3. point_data = pyvtk.PointData(pyvtk.Scalars(z1))
  4.  
  5. #create our grid
  6. grid = pyvtk.RectilinearGrid(x, y, 0)
  7.  
  8. #write our vtk file
  9. vt = pyvtk.VtkData(grid, point_data)
  10. vt.tofile('dat.vtk')

And we're done. Now we load up MayaVi and go to File->Open->VTK file and open up our freshly created vtk file. When you first open the file nothing will appear on the canvas. We need to add a module in order to be able to see our data. So go to Visualize->Modules->Surface Map and click okay. This will show you a 2D colour map of your data. In order to get a 3d representaion of the data we need to add the WarpScalar filter via Visualize->Filters->WarpScalar. Boom. A full 3d representation of the data that you can zoom in on rotate and much more. Add axes and labels using the Visualize->Modules->Axes option. MayaVi also allows you to take snapshots of the data and export screenshots via File->Save Scene To-> png, bmp etc etc.

mayavi window displaying data

mayavi window displaying data

I also was able to package the file into a windows executable using the great bbfreeze which allowed me to distribute the script to computers without python and the other packages installed (except of course mayavi which is needed to display the file).

Dive Into Python is a great introduction to Python if you are unfamiliar with the language. I highly recommend learning the language if you are a scientist who currently uses spreadsheets for doing tedious manipulations of data. It is relatively simple to pick up and can save you a huge amount of time by automating tasks for you.

If you like you can download the visualization script and the sample data.

November 20th, 2008 by randlet Get 25 FREE iPod® compatible downloads from eMusic! Choose from over 2.8 Million DRM free songs! Works on ANY MP3 player

Testing out WordPress Syntax Highlighting

I just came across an excellent syntax highlighting plugin for WordPress: iG Syntax Hiliter. It has support for a bunch of languages and allows readers to toggle between good looking highlighted html code and plain text (just click on the "PLAIN TEXT" bar) that can easily be cut and pasted into a text editor. Couple examples shown below.

PYTHON:
  1. if __name__ == '__main__':
  2.     print "Hello Word"

C:
  1. #include <stdio.h>
  2. int main() {
  3.   printf("Hello World!\n");
  4.   return 0;
  5. }

sweet!

November 20th, 2008 by randlet Get 25 FREE iPod® compatible downloads from eMusic! Choose from over 2.8 Million DRM free songs! Works on ANY MP3 player

Click here for your 50 FREE songs from eMusic!