| Building a mobile site and applications with Django and PythonPosted on September 30, 2009 by Mikko OhtamaaFiled Under Business, django, iphone, linux, mobile, pys60, python, technology Recently we created a mobile site for an interactive bicycle tour. oulugo.mobi (you need to use mobile browser to access the site or you’ll get a redirect) is a multimedia enriched bicycle tour through the historic parts of the city of Oulu. All content is provided by OnGo. The route, which you can bicycle through is drawn on Google Maps. There are nine action points where the user can listen to streaming audio clips, with still images, in his/her mobile phone. This is sort of augmented reality experience: The user sees the real world (where he/she is now bicycling) combined with the historic events (audio playback narrative). For example, at Linnansaari (a location on the route) you’ll see the actual 17th century castle ruins and the narrator tells how the castle exploded when fire, caused by a lighting, reached gunpowder warehouse… boom. The explosion caused stones fly over 400 meters. Alternatively, the clips are available as podcasts from Oulu Tourism pages. You can download them into your iPod for offline listening and use in conjuction with a paper map. This demostrates interesting mix of multichannel publishing: paper, web, mobile and podcasts. The tour is bilingual in Finnish and English. There exists unreleased iPhone application, based on PhoneGap, which allows the user to track his/her location real-time on the web page. We didn’t see it worth of trouble to go through Apple iPhone application review process. When location based service support comes for the browser this feature is indended to be included as the standard HTML5 feature of the service. There also exists Nokia Series 60 mobile application, based on PyS60 and Series 60 BrowserControl API, which allows the user to track his/her location in real-time. The application provides wrapper around Series 60 WebKit control and allows Javascript to access phone native functions (GPS) over localhost socket communication. Like with Apple, we didn’t see real-time tracking feature interesting enough to go through Symbian Signed process to get our application released. Also, BrowserControl had seriousquality problems and we didn’t consider it stable enough for the end users. Some work is available in PyS60 Community Edition repository. The service is hosted on Python specific virtual server on Twinapex services server farm. Features
Software stack
Development effortDevelopment time: Around 100 hours. Three different developers where involved. Used development tools: Eclipse, PyDev, Subclipse, Subversion. There were around five meetings between the content provider and the technology provider. Few beta testing rounds using iPhone application were performed by bicycling in -10 celcius degrees weather (north and so on…). No polar bears were harmed during the creation of this mobile service. The service is linked in from Oulu Tourism pages and thousands of paper brochures printed for Oulu summer season 2009. About the author Mikko Ohtamaa Python code management & deployment – a glance at zc.buildout and few othersPosted on September 2, 2008 by Tuukka MustonenFiled Under development tools, django, python, zope We’ve been using zc.buildout for Plone deployment and it’s working out great. A few days ago implemented a buildout recipe for Django project deployment, automatic web configuration, symlinking, media-folder structuring etc. and while I got it working, I came up with twisted feelings. Buildout is from the creators of Zope (I suppose) so you can expect a powerful project code management tool. The question is, however, whether or not it suits your needs. In my case I found out it too heavy. I mean, to add even a simple task you have to create a new “recipe” (a package) that does the tricks. Of course some recipes are generic (found from PyPi) and you can just run them with your own INI options, but in my case I had to do some custom implementation. Creating a new python package isn’t that hard for sure I found out that zc.buildout has some nice features like:
The problems?
There’s no denying zc.buildout is powerful, but I wouldn’t use it for projects which need reasonable amount of customization. It’s just plain easier and quicker to write shell scripts and while those won’t provide you with any sort of ready tools you won’t propably need them. For bringing up somewhat static environment, where you don’t need to hack things (like that for Plone) it’s quite a decent option, however. I also explored alternatives to zc.buildout. I’ve been reading about earlier virtualenv but haven’t really tried it out until now. It looks very promising and creates a more flexible environment compared to zc.buildout. Of course their goals are not exactly the same. Also, there are a few other alternatives out there, among them a new Python code management tool called Paver (just look at that cool logo.. it does remind you of Indiana Jones, does it not?). I glanced through the Paver docs and it looks like it might be the way to go (Paver also supports virtualenv), but didn’t quite get the grasp of the benefits just yet. Anyway, if you are still interested in code management and deployment, I’d recommend you to read the Paver release announcement and also Paver forewords. They should clear things up. Debugging Django memory leak with TrackRefs and GuppyPosted on March 7, 2008 by Mikko OhtamaaFiled Under django, python I run Django in a standalone long-running application (video encoding server). It leaked memory severely. By using htop, one was seeing two gigabytes reserved for /usr/bin/python after a while. Before starting the debugging session, I had no faintest idea what could be the cause of the problem. Django is robust technology – this kind of things haven’t happened for me before. Since I was running Django in standalone mode, I suspected that some query cache does not get cleared. But random poking around the source code didn’t give any clues. It was time to do some serious memory debugging for Python. Python as is doesn’t leak memory, since it’s garbage collected virtual machine. All “leaks” are design problems in the application logic.I found a good primer here what’s going inside Python’s memory management. First I tried this nice TrackRefs class from Zope. It relies on Python’s own in-interpreter functions to monitor objects. class TrackRefs:
"""Object to track reference counts across test runs."""
def __init__(self, limit=40):
self.type2count = {}
self.type2all = {}
self.limit = limit
def update(self):
obs = sys.getobjects(0)
type2count = {}
type2all = {}
for o in obs:
all = sys.getrefcount(o)
if type(o) is str and o == '<dummy key>':
# avoid dictionary madness
continue
t = type(o)
if t in type2count:
type2count[t] += 1
type2all[t] += all
else:
type2count[t] = 1
type2all[t] = all
ct = [(type2count[t] - self.type2count.get(t, 0),
type2all[t] - self.type2all.get(t, 0),
t)
for t in type2count.iterkeys()]
ct.sort()
ct.reverse()
printed = False
logger.debug("----------------------")
logger.debug("Memory profiling")
i = 0
for delta1, delta2, t in ct:
if delta1 or delta2:
if not printed:
logger.debug("%-55s %8s %8s" % ('', 'insts', 'refs'))
printed = True
logger.debug("%-55s %8d %8d" % (t, delta1, delta2))
i += 1
if i >= self.limit:
break
self.type2count = type2count
self.type2all = type2all
You need to have Python compiled in debug mode to have sys.getobjects() method. Luckily this beefed up Python binary is availalble from Ubuntu’s stock repository: sudo apt-get install python-dbg python-mysqldb-dbg Note that native Python extensions don’t work unless they are specifically compiled against the Python debug build (python-mysqldb-dbg).. Then I add TrackRefs to my main loop: def run(self):
self.running = True
logger.info("Started worker " + self.get_worker_id_string())
# Memory leak tracking
tracker = TrackRefs()
while self.running:
self.mark_for_download()
self.process_downloads()
self.process_encodings()
tracker.update() # Dump memory here
time.sleep(settings.WORKER_POLL_DELAY) And after running a while I start getting interesting results: 7956 [2008-03-07 02:59:28,767] INFO Jobs needing sources to download 0 7956 [2008-03-07 02:59:28,768] DEBUG Processable jobs: 0 7956 [2008-03-07 02:59:29,754] DEBUG ---------------------- 7956 [2008-03-07 02:59:29,754] DEBUG Memory profiling 7956 [2008-03-07 02:59:29,754] DEBUG insts refs 7956 [2008-03-07 02:59:29,754] DEBUG <type 'int'> 150 137406 7956 [2008-03-07 02:59:29,755] DEBUG <type 'tuple'> 117 130211 7956 [2008-03-07 02:59:29,755] DEBUG <type 'dict'> 5 8331 7956 [2008-03-07 02:59:29,755] DEBUG <type 'str'> 3 27643 7956 [2008-03-07 02:59:29,755] DEBUG <type 'unicode'> 3 4606 7956 [2008-03-07 02:59:29,755] DEBUG <type 'list'> 3 3492 7956 [2008-03-07 02:59:29,756] DEBUG <type 'frame'> 1 962 7956 [2008-03-07 02:59:29,756] DEBUG <type 'cell'> 0 12948 7956 [2008-03-07 02:59:29,756] DEBUG <type 'function'> 0 9479 Woah! Who reserved 130 000 ints and tuples? No wonder that soon python gulps 1 gigabytes of memory. Since this is the only number which grows during the main loop cycling and there is no references to classes or objects debugging becomes a bit more difficult. I need to try to cross-reference the difficult tuple objects. This didn’t go well – with gc.get_referrers() recurive parsing I got some results (example below). But it became soom clearthat debug references from the system itself was impossible: the memory debugging code will always create nasty cyclic references to the system, since it needs to track the objects. I gave up. There had to be something better. 9154 [2008-03-07 04:05:23,571] DEBUG /var/lib/python-support/python2.5/MySQLdb/connections.pyc 9154 [2008-03-07 04:05:23,571] DEBUG <type 'function'> 9154 [2008-03-07 04:05:23,572] DEBUG defaulterrorhandler 9154 [2008-03-07 04:05:23,572] DEBUG <type 'function'> 9154 [2008-03-07 04:05:23,572] DEBUG string_literal 9154 [2008-03-07 04:05:23,572] DEBUG <type 'function'> 9154 [2008-03-07 04:05:23,573] DEBUG unicode_literal 9154 [2008-03-07 04:05:23,573] DEBUG <type 'function'> 9154 [2008-03-07 04:05:23,573] DEBUG string_decoder 9154 [2008-03-07 04:05:23,573] DEBUG <type 'function'> 9154 [2008-03-07 04:05:23,573] DEBUG __exit__ 9154 [2008-03-07 04:05:23,574] DEBUG <type 'function'> 9154 [2008-03-07 04:05:23,574] DEBUG begin 9154 [2008-03-07 04:05:23,574] DEBUG <type 'function'> 9154 [2008-03-07 04:05:23,574] DEBUG __init__ 9154 [2008-03-07 04:05:23,574] DEBUG <type 'function'> 9154 [2008-03-07 04:05:23,575] DEBUG show_warnings There was: Guppy. Thank you Sverker Nilsson! You saved my day. Since the API of Guppy is a little eccentric, here are some examples for you: # init heapy
heapy = guppy.hpy()
# Print memory statistics
def update():
print heapy.heap()
# Print relative memory consumption since last sycle
def update():
print heapy.heap()
heapy.setref()
# Print relative memory consumption w/heap traversing
def update()
print heapy.heap().get_rp(40)
heapy.setref()
With heapy.heap() ; heapy.setref() I got this output: Partition of a set of 12 objects. Total size = 3544 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 3 25 2244 63 2244 63 unicode
1 2 17 708 20 2952 83 types.FrameType
2 3 25 432 12 3384 95 dict (no owner)
3 3 25 120 3 3504 99 str
4 1 8 40 1 3544 100 list
One adds get_rp() travelsal magic and everything becomes clear: Reference Pattern by <[dict of] class>. 0: _ --- [-] 14 (dict (no owner) | list | str | types.FrameType | types.Gene... 1: a [-] 3 dict (no owner): 0x8c11f34*2, 0x8c1bd54*2, 0x8c1f854*2 2: aa ---- [-] 1 list: 0x833c504*18 3: a3 [-] 1 dict of django.db.backends.mysql.base.DatabaseWrapper: 0x8... 4: a4 ------ [-] 1 dict (no owner): 0x83a65d4*2 5: a5 [R] 1 guppy.heapy.heapyc.RootStateType: 0xb787c7a8L 6: a3b ----- [-] 1 django.db.backends.mysql.base.DatabaseWrapper: 0x8356a34 7: a3ba [S] 7 dict of module: ..db, ..models, ..query, ..transaction... 8: b ---- [S] 1 types.FrameType: <<lambda> at 0x8b16ecc> 9: c [-] 2 list: 0x833c504*18, 0xb7dafe6cL*5 <Type e.g. '_.more' for more.> What there could in DatabaseWrapper object which is growing and growing… query debugger. Django keeps track of all queries for debugging purposes (connection.queries). This list is reseted at the end of HTTP request. But in standalone mode, there are no requests. So you need to manually reset to queries list after each working cycle. while self.running:
self.mark_for_download()
self.process_downloads()
self.process_encodings()
tracker.update()
time.sleep(settings.WORKER_POLL_DELAY)
# Clear database connection ad reset query debugger
# between cycles to make sure that
# related resources get released
reset_queries()
connection.close()
print str(connection.queries)
But even after this fix, I got increase in tuple and int usage when monitoring with TrackRefs. But when I run heapy.heap() alone, there is no increase. So the tuple and int consumption must have been caused by TrackRef, sys.getobjects, gc, etc. magic itself. |
