I picked up Beginning Python Visualization: Crafting Visual Transformation Scripts (Books for Professionals by Professionals) on an impulse Amazon buy and got a chance to read it over the weekend. My initial impression is that this would serve as a solid introduction to data visualization in python for people who have a bit of programming experience (no python experience is assumed) and are interested in presenting it in a graphical form. One of the neat thigs about the book is that the first chapter just dives in to a simple case study of visualizing some GPS data. The author first walks you through how to use < a href=”http://pyserial.wiki.sourceforge.net/pySerial”>pySerial to read GPS data from a device and store it in a file, placing an emphasis on creating meaningful file names and directory structures. Once some data has been collected Python’s built in csv reader is used to parse the data and extract position and velocity information which is then used to create a nice GPS track plot including velocity vectors with < a href=”http://matplotlib.sourceforge.net/”>matplotlib. I am undecided on whether I am totally happy with the authors choice of subject for the first chapter case study. On the one hand it is nice to see a really concrete and simple real world example. On the other hand I have to wonder whether reading data from a serial port might be a little intimidating for people who are new to programming, not to mention the fact that not everyone has a GPS receiver. This is a minor quibble since the code is written clearly and is easy to follow along even if you don’t type it out and try it for yourself. Overall the first chapter gives you the “Why” of data visualization with the later chapters focusing on the “How”. Chapter 2 was a nice addition as it focuses on choosing, and setting up, a development platform with a focus on Linux but Windows and Mac are not ignored and it is made clear that all of these programs will work on whichever platform you choose. This part was fairly brief (by neccessity) but covered all of the basics including discussion of choosing an editor (Emacs, vim, Notepad++ and more), prebuilt scientific python packages including the Enthought Python Distribution, IPython and python(x,y). Vaingast also instructs how to get the bedrock scientific and visualization python packages SciPy, matplotlib and the < a href=”http://www.pythonware.com/products/pil/”>Python Imaging Library (PIL). In my experience these packages are going to cover a large percentage of your visualization needs. Chapter 3 provides a quick but good introduction to the Python language for people who have a bit of programming experience. Another great intro to Python of course is Mark Pilgrim’s
Dive Into PYTHON
. Chapters 4 and 5 cover reading and writing to some common file formats including csv and ini files. Both text and binary files are covered although other formats like XML and Yaml would have been a nice addition. Chapter 6 gives a very nice introduction to matplotlib and whatever isn’t covered here can be found in the extensive online documentation Lots of other important foundations are covered in Chapters 7 and 8 including complex numbers, multi-dimensional arrays and linear algebra functions. There is also coverage of more advanced concepts including interpolation, solving non-linear equations and Fourier transformations (FFT’s). There is some discussion of 2 dimensional data in chapter 9, but I felt the book falls a little bit short in this regard. While contour and filled contour plots are mentioned briefly earlier in the book, there is very little discussion of working with data that is a function of more than one coordinate. In this sense I feel the book falls a bit short since part of the appeal of matplotlib is that is has excellent support for < a href=”http://matplotlib.sourceforge.net/api/pyplot_api.html?highlight=contourf#matplotlib.pyplot.contourf”>contours built in. There is also no discussion of how to use tools like MayaVi and VTK to visualize 3d scientific data which is a shame since this is where visualizing data starts to get really interesting to me. Again this is a minor quibble since this book is aimed at beginners and does not set out to cover everything. In summary I would highly recommend Beginning Python Visualization: Crafting Visual Transformation Scripts (Books for Professionals by Professionals)
to anyone looking to get there feet wet with data visualization.
Beginning Python Visualization - Book Review
Simple Huckel Molecular Orbitals with Orbis!
So I just released the first beta of a program called Orbis that does Simple Huckel Molecular Orbital (or SHMO for short) calculations. The beta version is free and valid until June 1st after which it will likely go on sale for around $20 US per license.
Right now there is only a windows version, however I hope to have a Mac version ready in the coming months.
Here is a couple quick videos showing Orbis in action.
Creating a simple benzene ring and then converting it to pyridine by changing one of the atom types.
Orbis displaying the 24 membered molecule Coronene
You can download a trial at www.simplehuckel.com.
Dreaming In Code
I just finished this excellent book: Dreaming in Code: Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software last night and thought I would share it. The book follows the development of Chandler from start to, ummm…not finished over a period of 3 or 4 years. It details all of the trials and tribulations of the group and really just emphasizes just how hard it is to write good software.
Much of what is said in Dreaming in Code will resonate really loudly (read: gives you that sinking feeling in your stomach) with anybody who has ever sat down and tried to write or fix code, particulary in a group setting. The book is written by Scott Rosenberg who founded Salon.com and I think he does a great job of striking a balance between making it technical enough to keep programmers entertained while keeping it accessible to people with little or no software development experience who want a glimpse into the world of software. The latter crowd will definitely gain an appreciation for why many of the programs they use everyday still contain mistakes, crash often or don’t behave the way they feel like it should.
Another bonus for me was that Chandler is written in Python and there is some discussion in Dreaming in Code about the merits of python and why it was chosen for the project.
Anyway I thought it was a great and entertaining read which gave some insight into startups, open source software projects, philosophy, and the joys and immense frustrations of software development. Highly recommended.
Enabling GPX support for GDAL in Ubuntu (for use in geodjango)
I have been building a new project with django, pinax and geodjango. Part of the project involves parsing GPX files created by GPS devices. In the DjangoCon talk (at youtube by Justin Bronn he mentions that GDAL can parse GPX files which would be very helpful for me.
Originally I followed the GeoDjango iinstall instructions However, when I tried to parse a simple gpx file I was present with this error
-
randlet:~> ogrinfo -ro -al Hockley-Rosseau.gpx
-
Error 6: OGR/GPX driver has not been built with read support. Expat library required"
-
.
-
.
-
.
After a bit of googling I came up with the following solution. First step is to get the expat xml library (including dev) installed like so;
-
sudo apt-get install expat libexpat1-dev
After this you should have a libexpat.h header file and an expat.so located in /usr/include/ and /usr/lib/ respectively.
Now change into the directory that you installed GDAL from during this step of the GeoDjango install process. If you have already installed GDAL then you will need to do a clean install and configure like so (output removed):
-
randlet:~/gdal-1.6.0> make clean
-
randlet:~/gdal-1.6.0> ./configure --with-expat=/usr/ --with-python
-
GDAL is now configured for i686-pc-linux-gnu
-
-
Installation directory: /usr/local
-
C compiler: gcc -g -O2
-
C++ compiler: g++ -g -O2
-
-
LIBTOOL support: yes
-
-
LIBZ support: external
-
GRASS support: no
-
CFITSIO support: no
-
PCRaster support: internal
-
NetCDF support: no
-
LIBPNG support: internal
-
LIBTIFF support: internal (BigTIFF=yes)
-
LIBGEOTIFF support: internal
-
LIBJPEG support: internal
-
LIBGIF support: internal
-
OGDI support: no
-
HDF4 support: no
-
HDF5 support: no
-
Kakadu support: no
-
JasPer support: no
-
ECW support: no
-
MrSID support: no
-
MSG support: no
-
GRIB support: yes
-
cURL support (wms/wcs/...):no
-
PostgreSQL support: yes
-
MySQL support: no
-
Ingres support: no
-
Xerces-C support: no
-
NAS support: no
-
Expat support: yes
-
ODBC support: no
-
PGeo support: no
-
OCI support: no
-
GEORASTER support: no
-
SDE support: no
-
DODS support: no
-
SQLite support: no
-
DWGdirect support no
-
INFORMIX DataBlade support:no
-
GEOS support: yes
-
-
Old-gen python no
-
SWIG Bindings: python
-
-
Statically link PROJ.4: no
-
enable OGR building: yes
-
enable pthread support: no
-
hide internal symbols: no
-
randlet:~/gdal-1.6.0> make #now go get yourself a beer
-
randlet:~/gdal-1.6.0> sudo make install
After that you should be able to use a gpx file as a datasource in your geodjango project:
-
randlet:~/my_project/> python manage.py shell
-
>>> from django.contrib.gis.gdal import DataSource
-
>>> ds = DataSource('Billings.gpx')
-
>>> ds.layer_count
-
5
-
>>> ds.name
-
'/Billings.gpx'
-
>>> layer = ds[0]
-
>>> wpt = layer.get_geoms()[0]
-
>>> wpt.coords
-
(-108.644942, 45.132831000000003)
-
>>>
Hopefully that will save someone some time in the future as the solution was not immediately obvious to me when I googled around for an answer.
Simple Geocoding with python and urllib
I recently had to geocode a huge number of addresses for a project I'm working on. Here is a simple python script that reads in a bunch of addresses from a text file and then uses the Google maps api to geocode them (get their latitude and longitude).
The format of the address file should be roughly like:
-
123 mystreet, Beverly Hills, CA, 90210
-
456 mystreet, Beverly Hills, CA, 90210
-
789 mystreet, Beverly Hills, CA, 90210
-
.
-
.
-
.
although the Google geocoder is quite forgiving about the format of addresses so you can probably get away with something like
-
CN Tower Toronto Ontario
-
Bloor Street Toronto Ontario
-
.
-
.
-
.
You will also need to get your own google maps api key which you can get over at Google.
-
import urllib,urllib2,time
-
-
addr_file = 'addresses.csv'
-
out_file = 'addresses_geocoded.csv'
-
out_file_failed = 'failed.csv'
-
-
sleep_time = 2
-
root_url = "http://maps.google.com/maps/geo?"
-
gkey = "YourGoogleKeyGoesHere"
-
-
return_codes = {'200':'SUCCESS',
-
'400':'BAD REQUEST',
-
'500':'SERVER ERROR',
-
'601':'MISSING QUERY',
-
'602':'UNKOWN ADDRESS',
-
'603':'UNAVAILABLE ADDRESS',
-
'604':'UNKOWN DIRECTIONS',
-
'610':'BAD KEY',
-
'620':'TOO MANY QUERIES'
-
}
-
-
def geocode(addr,out_fmt='csv'):
-
-
#encode our dictionary of url parameters
-
values = {'q' : addr, 'output':out_fmt, 'key':gkey}
-
data = urllib.urlencode(values)
-
-
#set up our request
-
url = root_url+data
-
req = urllib2.Request(url)
-
-
#make request and read response
-
response = urllib2.urlopen(req)
-
geodat = response.read().split(',')
-
response.close()
-
-
#handle the data returned from google
-
code = return_codes[geodat[0]]
-
if code == 'SUCCESS':
-
code,precision,lat,lng = geodat
-
return {'code':code,'precision':precision,'lat':lat,'lng':lng}
-
else:
-
return {'code':code}
-
-
def main():
-
-
#open our i/o files
-
outf = open(out_file,'w')
-
outf_failed = open(out_file_failed,'w')
-
inf = open(addr_file,'r')
-
-
for address in inf:
-
-
#get latitude and longitude of address
-
data = geocode(address)
-
-
#output results and log to file
-
if len(data)>1:
-
print "Latitude and Longitude of "+address+":"
-
print "\tLatitude:",data['lat']
-
print "\tLongitude:",data['lng']
-
outf.write(address.strip()+data['lat']+','+data['lng']+'\n')
-
outf.flush()
-
else:
-
print "Geocoding of '"+addr+"' failed with error code "+data['code']
-
outf_failed.write(address)
-
outf_failed.flush()
-
-
#play nice and don't just pound the server with requests
-
time.sleep(sleep_time)
-
-
#clean up
-
inf.close()
-
outf.close()
-
outf_failed.close()
-
-
if __name__ == "__main__":
-
main()
Nice and easy. You may also want to check out the geopy package which looks very nice and includes support for distance calculations.
Steven Hawking is coming to Canada!
It was just announced that Stephen Hawking will be accepting a postion at the Perimeter Institute for theoretical physics in Waterloo. It appears that Dr. Hawking will not be moving to Waterloo full time but rather making regular stays in Waterloo starting in the summer of '09.
Hawking is likely the most well known modern scientists outside of the scientific community. Largely due to his hugely successful book A Brief History of Time and its follow up The Universe in a Nutshell
both of which brought the realms of things such as superstring theory and how the universe evolved to laypersons.
Also, you really know you've made it big in science when you make a cameo on the simpsons.
Hopefully he'll do a couple public lectures that I'll be able to attend.
Speeding up your python by calling c functions and using Psyco
One of the greatest things about python is how quickly you can develop fully functioning scripts/programs in it. Both from the perspective of the large number of library functions available which make your life easier and also the fact that you don't have to go through a compile step every time you edit your code. Unfortunately this rapid development usually comes at the cost of raw run time speed. Often this isn't a problem, for example if your python script takes 1 second to run when the equivalent program written in c would take 0.2 s to run the 500% difference is practically negligible. Even if the python script takes 10 hours to run where the equivalent c program takes 2 hours this may not be a large problem if you are only going to ever run the script once or twice since the huge amount of time you saved in developing the script in python more than makes up for the difference. The speed deficit of an interpreted language like python really starts to pop up when you have a program that needs to be run repeatedly or requires lots of user interaction and therefore must be fast enough to prevent a user from becoming frustrated.
Luckily, in python there are a number of possibilities that allow developers to take advantage of the rapid development possible in python while still retaining reasonable run times. Probably the two most popular choices are using the python module Psyco and/or rewriting the most computationally intensive functions in c and calling those c functions from your python script. As Jeremy Jones at onlamp.com puts it:
One of the frequently repeated mantras I hear from Python veterans is, "Premature optimization is the root of all evil" (a quote from Donald Knuth, which apparently originated from Tony Hoare). The Pythonic approach to writing code that performs acceptably is:
* Write the code.
* Test it to see whether performance is OK right off the bat.
* If so, you're done.
* If not, optionally try Psyco.
* Profile it and identify the bottlenecks.
* Modify the code until the performance is as fast as you need, rewriting bottlenecks in C (or Pyrex) if necessary.
As an example of this development cycle I wrote a quick python script to produce images of a Julia set using the Python Imaging Library.
Here is the first implementation of this script written purely in python
-
#!/usr/bin/env python
-
#Create an image of a Julia set
-
#
-
#R. Taylor Nov 25. 2008
-
#email: randlet@canadianamp.ca
-
#web: canadianamp.ca
-
#This code is available under the WTF licence. Do Whatever The F&$#
-
#you want with it but links to this blog are always appreciated.
-
-
import numpy,scipy
-
import Image
-
-
def julia(a,b,ca,cb,maxit):
-
numit = 0
-
while (a*a+b*b<4. and numit <maxit):
-
temp = a*a-b*b
-
b = 2*a*b
-
a = temp + ca
-
b += cb
-
numit += 1
-
-
return numit
-
-
#set up the max and min coordinates to display in our image
-
xmin,xmax = -2.5,2.5
-
ymin,ymax = -1.0,1.0
-
-
#set number of pixels in our image
-
height = 600
-
width = int(height*((xmax-xmin)/(ymax-ymin)))
-
-
#create a new image using PIL with a black background
-
im = Image.new('RGB',[width,height],0)
-
-
#this allows us to access the pixels in the image directly
-
pixels = im.load()
-
-
#maximum number of iterations for each point
-
max_its = 255
-
-
#constant value for our julia set - i.e. the c in z_i+1 = z_i**2 + c
-
ca,cb = -0.835,-0.232
-
-
#calculate the coordinates of all the pixels
-
x_set = scipy.linspace(xmin,xmax,width)
-
y_set = scipy.linspace(ymax,ymin,height)
-
-
#loop over all points and set the color
-
for j,y in enumerate(y_set):
-
for i,x in enumerate(x_set):
-
-
num_its = julia(x,y,ca,cb,max_its)
-
-
if num_its != max_its:
-
pixels[i,j]= (num_its*2,0,0)
-
-
im.save('julia.png')
Ok, that was really quick to write and produces pretty pictures now how quickly does it run?
-
smalldev:~/code/python/pil> time ./julia.py
-
56.003u 0.268s 0:59.48 94.5% 0+0k 336+328io 0pf+0w
-
smalldev:~/code/python/pil>
So roughly a minute. That's okay for producing an image or two every once in a while, but what if I wanted to create a huge amount of images from my own book of fractal images? It's going to take me forever :(
Bar none the easiest way to speed up your code is to try using Psyco. If you're using Ubuntu Psyco is a simple "sudo apt-get install python-psyco" away.
Adding the lines:
-
import psyco
-
psyco.full()
to the top of our script gives us a roughly 20% increase in speed:
-
smalldev:~/code/python/pil> time ./julia_psy.py
-
47.642u 0.192s 0:51.77 92.3% 0+0k 0+328io 0pf+0w
-
smalldev:~/code/python/pil>
Not bad but still not nearly as fast as we would like. That means moving on to profile our code to find out where the speed bottleneck is. In this case our program is fairly trivial and we can pretty easily guess before hand that the julia function is burning up most of our time. Turning to the cProfile module we can confirm our guess:
-
smalldev:~/code/python/pil> python -m cProfile -o profout julia.py
-
smalldev:~/code/python/pil> python
-
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
-
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
-
Type "help", "copyright", "credits" or "license" for more information.
-
>>> import pstats
-
>>> p = pstats.Stats('profout')
-
>>> p.sort_stats('time').print_stats(10)
-
Sun Nov 30 02:33:59 2008 profout
-
-
985421 function calls (981638 primitive calls) in 68.518 CPU seconds
-
-
Ordered by: internal time
-
List reduced from 707 to 10 due to restriction <10>
-
-
ncalls tottime percall cumtime percall filename:lineno(function)
-
900000 58.821 0.000 58.821 0.000 julia.py:13(julia)
-
1 8.549 8.549 68.519 68.519 julia.py:10(<module>)
-
3 0.337 0.112 0.337 0.112 {built-in method encode}
-
5332 0.052 0.000 0.084 0.000 /usr/lib/python2.5/posixpath.py:56(join)
So a huge fraction of time is being spent in the julia function as we expected so the next step we can take is to rewrite this function in c and then call that c function from our python script. This is accomplished in a few steps. First we rewrite our function in c:
-
int julia(float a, float b,float ca, float cb,int max_it){
-
int num_its=0;
-
float temp;
-
while ((num_its <max_it) && (a*a+b*b <4.)){
-
temp = a*a-b*b;
-
b = 2*a*b;
-
a = temp+ca;
-
b += cb;
-
num_its++;
-
}
-
-
return num_its;
-
}
And next create a shared library containing our new julia function:
-
smalldev:~/code/python/pil> gcc -o julia.so --shared julia.c
-
smalldev:~/code/python/pil>
We can then load this library into our python script using the NumPy ctypeslib load_library function. and replace our python implementation of julia() with this:
-
from ctypes import *
-
#load the library and assign the library function julia to a local function
-
lib = numpy.ctypeslib.load_library('julialib','./')
-
julia = lib.julia
The only other change we have to make to our script is to cast our variables to ctypes like so:
-
#maximum number of iterations for each point
-
max_its = c_int(255)
-
-
#constant value for our julia set - i.e. the c in z_i+1 = z_i**2 + c
-
ca,cb = c_float(-0.835),c_float(-0.232)
-
-
#calculate the coordinates of all the pixels and cast to type c_float
-
x_set = map(c_float,scipy.linspace(xmin,xmax,width))
-
y_set = map(c_float,-scipy.linspace(ymax,ymin,height))
and when we're done our full script will look like this:
-
#!/usr/bin/env python
-
#Demonstration of how to use ctypes to access a library function written in c
-
#Allowing us to speed up the creation of a Julia set image
-
#
-
#R. Taylor Nov 25. 2008
-
#email: randlet@canadianamp.ca
-
#web: canadianamp.ca
-
#This code is available under the WTF licence. Do Whatever The F&$#
-
#you want with it but links to my website are always appreciated.
-
-
import numpy,scipy
-
from ctypes import *
-
import Image
-
-
#load the library and assign the library function julia to a local function
-
lib = numpy.ctypeslib.load_library('julialib','./')
-
julia = lib.julia
-
-
#set up the max and min coordinates to display in our image
-
xmin,xmax = -2.5,2.5
-
ymin,ymax = -1.0,1.0
-
-
#set number of pixels in our image
-
height = 600
-
width = int(height*((xmax-xmin)/(ymax-ymin)))
-
-
#create a new image using PIL with a black background
-
im = Image.new('RGB',[width,height],0)
-
-
#this allows us to access the pixels in the image directly
-
pixels = im.load()
-
-
#maximum number of iterations for each point
-
max_its = c_int(255)
-
-
#constant value for our julia set - i.e. the c in z_i+1 = z_i**2 + c
-
ca,cb = c_float(-0.835),c_float(-0.232)
-
-
#calculate the coordinates of all the pixels and cast to type c_float
-
x_set = map(c_float,scipy.linspace(xmin,xmax,width))
-
y_set = map(c_float,-scipy.linspace(ymax,ymin,height))
-
-
-
#loop over all points and set the color
-
for j,y in enumerate(y_set):
-
for i,x in enumerate(x_set):
-
-
num_its = julia(x,y,ca,cb,max_its)
-
-
if num_its != max_its:
-
pixels[i,j]= (num_its*2,0,0)
-
-
im.save('julia_c.png')
Simple and straightforward, but how much of a gain do we get?
-
smalldev:~/code/python/pil> time ./julia_c.py
-
3.660u 0.140s 0:03.82 99.4% 0+0k 0+328io 0pf+0w
-
smalldev:~/code/python/pil>
That is a whopping 15 fold decrease in run time for very little programming effort! Of course this is a fairly trivial example, but I hope it gives you an idea of how python can be used for computationally intensive problems without the large decrease in speed normally associated with interpreted languages.
If you enjoyed this post you may want to check out Python Scripting for Computational Science :
Preventing Rogers Search Results on Mistyped Address
Sometime around July 2008 Rogers started the very annoying practice of doing DNS redirects to sponsored search results when you mistyped a url in the search bar or tried to access a non existent website. This may or may not be a violation of Net Neutrality but more than anything it was just annoying to have to retype an address when I mistyped one letter. In some cases this breaks the address bar auto searches of some browsers and their are reports of it killing remote web apps. So just standard Rogers not really paying attention to what it's customers want.
There is a lot of disscussion of this topic at MichaelGeist.ca and most importantly for me was I found a way to avoid this problem in the future. It seems that Rogers recieved a lot of complaints about this and in response quietly provided an alternate dns server which won't do a redirect.
Here is how to set up that DNS server for yourself in Windows XP:
1. Open the Control Panel (Start->Settings->Control Panel)
2. Double click on "Network Connections"
3. Right click on the connection used to access your internet (usually Local Area Network)
4. Click on TCP/IP and then click on the Properties button
5. Click on "Use the following DNS Addresses" and enter 64.71.255.202 (this is an alternate Rogers DNS, see the comments below if you want to use Open DNS instead)
6. Click Ok, then OK and then you're done!
Now when you type www.adsfasdlfkasdlf.com you should no longer be directed to a rogers sponsered search results.
The process is similar on Vista:
Start->Control Panel->View Network Status and Tasks->Manage Network Connections->right click on your connection and then follow from step 3 above.
If you are feeling really ambitious fire off a complaint to :
The Office of the President
Rogers Cablesystems - Rogers Yahoo! HiSpeed Internet
855 York Mills Rd
Don Mills, Ontario
M3B 1Z1
Visualizing 3D scientific data with free software, python NumPy, vtk & MayaVi
Recently I was tasked with creating a visual representation of a table of scientific data so that a user could explore and visualize their dataset interactively. Visualizing a lot of data at one time can really lead to a much better understanding of what your data looks like and how it behaves as a function of it's parameters. For example, which one of the following representations of temperature difference data provides you with the most intuitive feeling of how the data varies as a function of the positions (x,y)?
Exhibit A - The Spreadsheet:
Exhibit B - A Colour Map:
It's not even close, the colour map instantly shows you where the hottest and coldest regions in space are. We can even take it a step further and get a full 3d representation of the data like so:
Now we are able to get a better idea of how steep the temperature gradients are than we can with the colour map alone.
Okay, so great, now we know how much nicer looking at 3d data is how do you go from your boring spreadsheet to a 3d plot? How about making the data interactive so that we can zoom in, rotate or otherwise manipulate the data?
Well, Excel has very limited 3d plotting capabilities, particulary if your coordinates aren't spaced linearly in space. Ditto for Origin. My next thought was to download some free or very cheap plotting software of which there seems to be a < href="http://www.google.com/search?&q=3d+plotting+software">tonne of options, however my first attempts didn't really give me what I was looking for and most of these products aren't cross platform. Since I move regularly between windows and linux environments this is definitely a must for me. Commercial software such as Matlab and MathCad
or Mathematica
are all great but expensive so I just decided it would be easier to roll-my-own.
After a bit of searching around, I decided to use a combination of python and the related modules NumPy & SciPy and pyvtk. All three are a snap to install on either windows or linux. The plan is to use python to create a Visual Toolkit file that can be displayed using MayaVi. Yeah, installing all of that might seem like a lot of work, but it truly is pretty simple on either linux or windows and they are all very useful in their own right.
The data I was given was a csv file with three colums (x,y,z) that looked like:
-
0 1.5 0.61
-
0 1.8 0.07
-
0 2 -0.19
-
0 2.2 -0.39
-
0 2.4 -0.51
-
0 2.6 -0.56
-
0 2.8 -0.57
-
0 3 -0.53
-
0.5 1.5 0.43
-
0.5 1.8 -0.06
-
0.5 2 -0.31
-
0.5 2.2 -0.48
-
0.5 2.4 -0.59
-
0.5 2.6 -0.64
-
0.5 2.8 -0.63
-
0.5 3 -0.59
-
1 1.5 0.04
-
1 1.8 -0.36
-
1 2 -0.55
-
1 2.2 -0.69
-
1 2.4 -0.76
-
1 2.6 -0.78
-
1 2.8 -0.76
-
1 3 -0.7
-
1.5 0 1.02
-
1.5 0.5 0.81
-
.
-
.
-
.
-
etc
So the first thing that needed to be done is to read the data in, extract all the x and y coordinates and convert the data to a matrix rather than the unordered columns. (There is probably a more elegant way to do this but oh well, it works). Depending on the format that your data is in you may have to manipulate it differently to get it into the correct format
-
#!/usr/bin/env python
-
-
from numpy import *
-
import pyvtk
-
-
df = open('dat.csv','r')
-
-
#set up arrays to hold data and coordinates
-
x=[]
-
y=[]
-
z=[]
-
all_dat = []
-
-
for line in df:
-
#read in a line of data and convert it to floats
-
[xd,yd,zd] = map(float,line.split(','))
-
-
#save the unique x and y coordinates
-
if not xd in x: x.append(xd)
-
if not yd in y: y.append(yd)
-
-
#create a list of all the data points
-
all_dat.append([xd,yd,zd])
-
-
df.close()
-
-
#make sure our coordinates are in increasing order
-
x.sort()
-
y.sort()
-
-
#create a matrix big enough to hold our data
-
plot_grid = zeros((len(x),len(y)),float)
-
-
#now we iterate over our data and fill in the matrix
-
for i, xv in enumerate(x):
-
for j, yv in enumerate(y):
-
for line in all_dat:
-
xo = line[0]
-
yo = line[1]
-
if xo == xv and yo == yv:
-
plot_grid[i][j] = line[2]
Which gives you a grid that looks like this:
-
[[ 0. 0. 0. 0.61 0.07 -0.19 -0.39 -0.51 -0.56 -0.57 -0.53]
-
[ 0. 0. 0. 0.43 -0.06 -0.31 -0.48 -0.59 -0.64 -0.63 -0.59]
-
[ 0. 0. 0. 0.04 -0.36 -0.55 -0.69 -0.76 -0.78 -0.76 -0.7 ]
-
[ 1.02 0.81 0.3 -0.27 -0.56 -0.69 -0.79 -0.83 -0.83 -0.8 -0.73]
-
[ 0.66 0.51 0.12 -0.33 -0.55 -0.66 -0.73 -0.77 -0.76 -0.73 -0.67]
-
[ 0.52 0.39 0.07 -0.3 -0.49 -0.58 -0.65 -0.68 -0.68 -0.65 -0.6 ]
-
[ 0.45 0.34 0.08 -0.22 -0.39 -0.47 -0.53 -0.56 -0.56 -0.55 -0.51]
-
[ 0.43 0.34 0.13 -0.12 -0.26 -0.33 -0.39 -0.42 -0.43 -0.42 -0.4 ]
-
[ 0.45 0.38 0.21 0. -0.12 -0.18 -0.24 -0.27 -0.29 -0.3 -0.29]
-
[ 0.48 0.42 0.29 0.13 0.02 -0.04 -0.09 -0.13 -0.16 -0.18 -0.18]
-
[ 0.5 0.46 0.37 0.23 0.14 0.09 0.03 -0.01 -0.05 -0.07 -0.09]]
Now all that's left to do is create a vtk file that MayaVi can use. I modified the example from MayaVi guide to create our vtk file.
-
#change the matrix to the proper format for vtk
-
z1 = reshape(transpose(plot_grid), (-1,))
-
point_data = pyvtk.PointData(pyvtk.Scalars(z1))
-
-
#create our grid
-
grid = pyvtk.RectilinearGrid(x, y, 0)
-
-
#write our vtk file
-
vt = pyvtk.VtkData(grid, point_data)
-
vt.tofile('dat.vtk')
And we're done. Now we load up MayaVi and go to File->Open->VTK file and open up our freshly created vtk file. When you first open the file nothing will appear on the canvas. We need to add a module in order to be able to see our data. So go to Visualize->Modules->Surface Map and click okay. This will show you a 2D colour map of your data. In order to get a 3d representaion of the data we need to add the WarpScalar filter via Visualize->Filters->WarpScalar. Boom. A full 3d representation of the data that you can zoom in on rotate and much more. Add axes and labels using the Visualize->Modules->Axes option. MayaVi also allows you to take snapshots of the data and export screenshots via File->Save Scene To-> png, bmp etc etc.
I also was able to package the file into a windows executable using the great bbfreeze which allowed me to distribute the script to computers without python and the other packages installed (except of course mayavi which is needed to display the file).
Dive Into Python is a great introduction to Python if you are unfamiliar with the language. I highly recommend learning the language if you are a scientist who currently uses spreadsheets for doing tedious manipulations of data. It is relatively simple to pick up and can save you a huge amount of time by automating tasks for you.
If you like you can download the visualization script and the sample data.
Testing out WordPress Syntax Highlighting
I just came across an excellent syntax highlighting plugin for WordPress: iG Syntax Hiliter. It has support for a bunch of languages and allows readers to toggle between good looking highlighted html code and plain text (just click on the "PLAIN TEXT" bar) that can easily be cut and pasted into a text editor. Couple examples shown below.
-
if __name__ == '__main__':
-
print "Hello Word"
-
#include <stdio.h>
-
int main() {
-
return 0;
-
}
sweet!









