I know I said I’d post up a database sync script but on the basis that this blog has low readership and that it was actually not as cool as I thought it was I changed my mind. I looked at it and realised that most of it is really simple stuff; most of which is tied into other parts of the program. Today I found that I will be starting a new job in early November and on the basis that I’ll meet people and do cool stuff each day this blog may find itself becoming more interesting and updated more frequently. Who knows, maybe I’ll be able to post stories of my attempts to woo young ladies, probably not though…





You may notice that I’ve not posted a caching article yet, that’s because I’ve simply not needed to. I have cached only a few functions and have achieved a query duplication rate of only 1.5-3 times. Sure that’s not good to have on a public web server but this is my computer and the script runs approximately once a day, it’s simply a matter of the cost-gain ratio. But fear not, I have something related to write about and that’s the underlying “problem”.

As I said, this is a 23 thousand lines of code (not counting those comments and empty lines) program, actually it’s 24.8 now because I added some features but that’s not the point. The point is that it’s not a small script that’ll do one or two simple things, it’s a pretty big application and I knew from the start it would be. I designed it knowing these things, I knew these things because I previously wrote something very similar in PHP, I knew what I wanted to avoid. But enough about how I apparently know what I’m doing, lets get to the nitty gritty.

Modularity
Rob2 is very modular. Modularity means that I reduce code duplication and that I can copy+paste+edit code from one part and use it in another. Modularity brings with it the curse that things need to slot together which means that some things will be sub-optimal. A good example is that several places in the code can run a query that goes something like this:

SELECT * FROM armies WHERE id = X

Where X is a number, part of my initial optimisation was to find these things and make them call a complete list of the table as it’s cached.

Laziness
Like any good programmer I’m lazy. Like any good programmer; this can cause problems for me. Since I use a hand-rolled Database layer I’m often selecting * from my tables and often from the entire table. Normally this is good and I then cache it so that it runs real fast. In many instances however I will call * from the entire table and not need it. Worse yet I’ll call it for the whole table when I need only 1/20th of the table! In a situation where resources are limited this is clearly not acceptable but like I said, this is not one of those situations and the code is so much easier to follow because it all follows the same design.

It is however not all bad; there are some things that I’ve done well. Some things are common sense while others might not be (I’m not common and may or may not possess sense).

Indexes
Most of my tables have indexes. The trick is to put the index on the field that you apply a WHERE statement to, not the field that you’re trying to get at. I didn’t know this at first and thought that my queries were loads faster because I used indexes! Indexes provide a saving in speed for a cost in space, I have 58GB of space left so I probably have a couple of indexes I don’t need.

IDs
I store stuff by it’s ID; not by it’s name. Searching through an integer field is a lot faster than searching through a length 40 varchar field. Better yet, the integer has no reason to change but a user can change their username and that causes all manner of “fun” for the developer(s).

Duplicate data
I intentionally duplicated one of the fields in my database and for a good reason. If I didn’t then at several points there would have been a 3 table join and several queries that don’t use a join currently would have needed one. It’s data that won’t change and it’s an integer, thus it doesn’t come into conflict with the main reasons to ensure data normalisation is applied.

Finally…

Lack of obsession
As a developer it’s very easy to focus on one thing to the exclusion of others, premature optimisation being a pitfall I fall into if not thinking carefully. In this instance I implemented a basic cache system and left it at that until a few weeks ago. By not obsessing about the speed of my program I completed it faster and then when I did optimise it a bit, I continued to not obsess and have since added new features to it. If a script runs 10% faster but it took you 2 days to make the change then it’s probably not worth it. I say probably because as I’ve been so happy to point out, my situation is not your situation, there are cases where that 10% will be the difference between success and failure.

For my next trick…
I will be sharing the database checker that I wrote for Rob2. It runs through the data types and matches them to the table, automatically adding/removing columns and indexes.





I know I said that I’d develop on that caching system but I’ve not had a chance! I have something more interesting anyway; yes it’s hard to imagine anything more interesting than a caching system but it’s true, there are things more interesting than caching (granted not many). I had a programming-exam-interview… thingie.

The test itself was not overly complex, I had an hour to complete as much of a simple yet tricky task, a good understanding of regular expressions will mean that you can do it, you just probably won’t have time to do all of it. The test itself is not the interesting part though, screen scraping it as anybody knows, boring. The part I want to talk about is the fact that there’s a test and the way it was done.

In theory the test will do the following:

  1. Prove that a candidate can actually program as listed on their CV
  2. Show exactly how well a candidate can perform at a task that will be a key part of their employment at a company
  3. Show how good the candidate is at working on their own

I want to explain how this is both good and bad.

Bad
Bad comes in two parts, exam and my selfishness. As with any exam, if I was having a bad day, if I’d not gotten much sleep (which incidentally I didn’t) or any number of other things then it’d have counted against me, it’d have made the test unfair and lowered my chances of progressing further. Obviously this is pot luck and nothing specific against me so by the very same logic I can have good conditions and all other candidates can be so ill they can’t even check their email.

The second part of the bad is my personal selfishness. If I am below average compared to the other candidates (and statistically there’s about a 50% chance I am) then it’ll count against me as would any test of skill. The test was pretty open, I was allowed to use any language I wanted and to output the data into any one of several common formats so that was okay. Again this can go both ways, if I’m above average then this obviously increases my chance of progressing further so is it really bad?

Good
There’s one good thing about it but it’s a big good thing and I think it outweighs the bad by a lot. Assuming all those that work there undergo a similar test it means that the people I work with will be better qualified and if I do slip up (and everybody slips up) then they’re more likely to be of help and it also means that I’m more likely to be able to learn from them and converse in the same terms.

I think it’s a great idea and will be thinking more highly of any companies that employ such a scheme.





The program is written in python, twenty three thousand lines of code (not counting empties and comments), it’s source weighs in at 3.4MB. It runs as a web app and connects to a postgreSQL database for it’s data storage, it’s got a caching system to reduce database load and all around is the best program I’ve ever written. I knew from the outset that it’d be hard to optimise as everything is very small and modular, everything calls a lot of other functions and often had to deal with a large amount of data.

This led me to three assumptions:

  1. Any optimisation would produce only small gains as each script was calling many different functions, not the same one over and over again
  2. 64 seconds for the script execution time was quite okay considering the size of output
  3. The caching system meant I’d not be overloading the database

The first thing I did was split the script into sections and time each section. A print of the timings showed that some sections took far longer than others, this was expected as some sections had to handle a lot more. On a whim I decided to try using the cProfile library I was confronted by several pages of functions each of which was taking up less than a tenth of a second of the total time. I also found that one of the functions was sucking up I think about 50 seconds, it was the database execute function.

I hacked into the database script something to record the queries in a list and then used the set() function to find out how many unique queries there were. 40,000 queries in total, 3,000 of which were unique. It seems as if my caching system was not working, but it was, the caching system that I’d written was just fine. What I wasn’t caching were smaller queries that often went “SELECT name FROM table WHERE id = %d”.

Caching these has knocked it down to about 5000 queries in total and the total script execution time to 10. What I want to share here are two things. Firstly you need to throw out assumptions, you never know if your program is wasteful until you look at it. Secondly is the code I use to handle the cache, it’s something I whipped up in just a few minutes so I’ll post an improved version in a few days.

import collections
function_cache = collections.defaultdict(dict)
 
def my_database_function(arg1, arg2, recache=False):
	f = 'my_database_function'
	a = "%s,%s" % (arg1, arg2)
	if a not in function_cache[f]:# This is to prevent us getting key errors
		function_cache[f][a] = {}
 
	# If we have the cache and we're not being told to recache, send them the cache
	if function_cache[f][a] != {} and not recache:
		return function_cache[f][a]
 
	function_cache[f][a] = do_our_query
 
	return function_cache[f][a]

As I said, this was something I whipped up in the space of a few minutes so there’s probably a better way to do it with decorators and a dedicated class. I’ve posted it up like this because I’m happy to show that my code isn’t perfect first time around and because somebody may provide some useful input before I start on the more polished version.





I have for some time been trying to get Python to connect to my MySQL database on my Mac. I use MAMP for my MySQL database.

I decided to have a go at it again today and found this tutorial on it. The tutorial is spot on except for one MAMP specific issue.

Where the tutorial says to change the mysql_config.path to “/usr/local/mysql/bin/mysql_config”, instead change it to “/Applications/MAMP/Library/bin/mysql_config”.

The rest of the tutorial should work for you. The only other thing I found the tutorial left out was a demo script to test your new install. If you’ve used MAMP with your PHP you know that you have to connect to the socket rather than “localhost”. Due to the way that Python MySQL was installed, this is (happily) not the case with Python.

#!/usr/bin/env python
 
import MySQLdb
# If you don't get an error from this then it's probably installed correctly
 
conn = MySQLdb.connect (host = "localhost",
                        user = "root",
                        passwd = "root",
                        db = "A database")
cursor = conn.cursor ()
cursor.execute ("SELECT VERSION()")
row = cursor.fetchone ()
print "server version:", row[0]
cursor.close ()
conn.close ()




After switching over to a new ISP we’ve found that after about 2 hours the internet connection dies and we need to reset the router. I sit in another room from the router so it’s something of an annoyance to me, I thus poked around and found that it was possible to reboot it from a web interface. Not brilliant because you still have to login and click several times to do it.

I poked around some more and managed to get a call directly to the function that would reboot the router, having saved it as a bash script I can now reboot the router somewhat faster.

curl –anyauth –user admin -d action=reboot -d todo=reboot http://192.168.0.1/setup.cgi

I hope this proves useful to someone sometime.





Tables have a use in HTML, it’s to display tabular data. Of course, you can also use it for site layout though that’s bad practice. It’s bad for people with screen readers, pages often load slower, harder to maintain and poorer SEO results to name what Google’s first result told me. Of course, that’s if you are using a table to layout your whole page. When I first started with Dreamweaver I also used tables, now as I’ve learnt more about HTML and CSS I use CSS to layout my page.

Then today I wanted to make a horizontal navigation bar. It’s 3 items (though could grow to about 6) and each item needs to be 160 pixels wide. At first I started googling for horizontal nav in CSS but it turns out that those that did work in Safari, looked like they’d not work in IE6. Then I used a table and all my problems vanished!

  • Screen readers can still tab to the content on the page, the page content is not in a table.
  • Pages will not load slower because it’s a small table and the content is not in a table
  • Maintenance is easy, it’s a 1 row table
  • I don’t want to rank for the words in the links and google can still crawl them

@SimianE and @ticklefish were both less than happy to hear about my use of tables but then neither of them play WoA so I don’t have to support them. Plus the table will work just fine for the both of them and they’d only know about it if they looked through the source which I plan to minify anyway!





Anybody that’s in some way connected to web development or databases has likely heard of MySQL. It’s a free database system, any host that supports databases supports MySQL (that I’ve seen, I’m sure there are exceptions). It’s a good system but if you want to use it commercially I believe you need to pay them money, I don’t know if this is still the case but it used to be.

I have now been pointed at PostgreSQL. It’s license is not even half a page long and it basically says “do with as you will, just don’t say you made it or try to sue us”. It’s apparently faster than MySQL (not normally an issue but it might be for something that myself and Pete do soon) and in my experience, a real pain to get running on the Mac. Yet I really like it because it either offers more features than MySQL or it’s just a lot easier to learn about those features because they have really good documentation.

So here’s how I got it running on my Mac.

Mac OSX 10.5 (Leopard) comes with PHP5 installed, using MAMP I run a MySQL database on my Mac for web development fun. Of course, that is MySQL, not PostgreSQL. So, firstly I needed to install PostgreSQL and in all honesty, it took me several tries to find the correct download from the infinite typewriters that make up the web.

After much searching, downloading, typing into the terminal, confusion and cans of carbonated drink; I finally arrived at this site, Postgreformac.com. I downloaded the Unified installer and then installed it from the “server” installer which put everything I need on the system. Setting up the databases was a bit confusing at first but nothing a little default settings couldn’t solve. Of course, I also wanted the PostgreSQL version of PhpMyAdmin, this was not hard to find and is called PhpPgAdmin.

As PHP requires a plugin to run PG queries I had to run my PHP scripts from my MAMP server rather than my default one as it had the plugin that my normal one did not. I put the PhpPgAdmin on and it started to tell me that I was not allowed to login to my database. It turns out that the default usernames or a blank password (I had all of them, though I’d never do this on an actual server) were a bad idea. I edited the relevant file “/classes/Misc.php” to fix the problem, I simply commented out the relevant lines around line 313.

And then it all worked! A bit of googling soon revealed how to write PHP that actually connected to it and I was set.