About

mFabrik Blog is about mobile and web software development, open source and Linux. We tell exciting tales where business, technology, web and mobile converge.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Generic Python validation frameworks?

All Python ORM and form frameworks love to define their own field/schema model. This seems to lead to a situation where they define their own validation functions too.

Some examples:

Isn’t writing one’s own validation code a bit redundant, and exactly the kind of “reinventing the wheel” that open source principles try so hard to avoid? Could validation be low-hanging fruit to share among fellow Python projects? As I see it, for simple data validation, like email and URL, the core code could easily be shared among different Python projects. You basically just want a method like is_valid_phonenumber(str) and then a framework-specific way to raise the error to the user.
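Here is a minimal sketch of what such a shared layer could look like. It assumes plain predicate functions with no framework imports, plus a thin per-framework adapter. The names is_valid_email, is_valid_url, make_validator and the ValidationError stand-in are hypothetical. The regular expressions are deliberately simplistic placeholders, not RFC-complete validators.

```python
import re
from urllib.parse import urlparse

# Framework-agnostic predicates: plain functions returning bool.
# NOTE: these patterns are simplistic placeholders, not complete validators.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
_PHONE_RE = re.compile(r"\+?[0-9][0-9 \-]{5,19}")


def is_valid_email(value: str) -> bool:
    return _EMAIL_RE.fullmatch(value) is not None


def is_valid_url(value: str) -> bool:
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_valid_phonenumber(value: str) -> bool:
    return _PHONE_RE.fullmatch(value) is not None


class ValidationError(Exception):
    """Hypothetical stand-in for a framework's own error class."""


def make_validator(predicate, message, error_class=ValidationError):
    """Framework-specific glue: turn a False result into the framework's error."""
    def validator(value):
        if not predicate(value):
            raise error_class(message)
    return validator
```

With Django, for example, you could pass django.core.exceptions.ValidationError as error_class. Other frameworks would plug in their own error type the same way and share the predicates unchanged.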

Do such frameworks already exist? At least I haven’t seen one being used in any big Python project yet :(

… or is validation such a complex thing that validation functions must be tightly integrated with the parent framework, and am I missing some big things (like locales, etc.) here?

 

Get developers  Subscribe mFabrik blog in a reader Follow me on Twitter

Reducing MySQL memory usage on Ubuntu / Debian Linux

If you are running your services on low-end virtual hosting, every byte of memory you can save is important. Memory is often the factor that limits how many applications you can run on a VPS: on the same physical host the CPUs are shared, but the memory is not.

  • Low-end VPSes come with 512 MB of memory or less
  • The front-end server (Apache / Nginx / Varnish) takes > 100 MB, plus at least 20 MB for each child process
  • Memcached takes its toll
  • MySQL takes 200 – 400 MB
  • Each Python / PHP process takes at least 15 MB, and you need parallel processes for parallel HTTP requests (FCGI, pre-fork, others… )
  • Operating system processes need some memory (SSH, cron, sendmail)

As you can see it gets very crowded in 512 MB.
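Adding up the figures shows how quickly the budget runs out. In this back-of-the-envelope sketch the per-process sizes come from the list above. The number of child/worker processes (4 each), the memcached cache size (64 MB) and the OS overhead (30 MB) are assumptions picked only for illustration.

```python
# Rough memory budget for a 512 MB VPS, in MB.
# Sizes per process are the lower bounds from the list above;
# process counts, memcached size and OS overhead are assumptions.
VPS_MEMORY_MB = 512

budget_mb = {
    "front-end server (base)": 100,
    "front-end children (4 x 20 MB)": 4 * 20,
    "memcached (assumed 64 MB cache)": 64,
    "MySQL (low end of 200-400 MB)": 200,
    "Python/PHP workers (4 x 15 MB)": 4 * 15,
    "OS: SSH, cron, sendmail (assumed)": 30,
}

total_mb = sum(budget_mb.values())
for item, mb in budget_mb.items():
    print(f"{item:40} {mb:4d} MB")
print(f"{'total':40} {total_mb:4d} MB of {VPS_MEMORY_MB} MB")
if total_mb > VPS_MEMORY_MB:
    print(f"over budget by {total_mb - VPS_MEMORY_MB} MB")
```

Even with the lower bounds, these assumptions add up to 534 MB, which is already more than the 512 MB box has before anything is cached.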

This is especially troublesome because memory is allocated lazily and the memory usage builds up slowly. At some point caches are no longer caches but get swapped to disk, as virtual memory usage grows beyond the available RAM. To keep the server responsive, everything time-critical should fit in RAM at once, and if the processes themselves don’t know how to release memory in this situation, you need to tune a memory cap for them.

MySQL memory consumption

MySQL can be a greedy bastard when it comes to memory consumption. Here on this server MySQL takes 417M of virtual memory, which seems a little excessive for running just two WordPress instances and one Django / Python application:

1310 mysql     20   0  417M 21100  2776 S  0.0  1.2  0:00.00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/v

After some tuning I was able to bring it down a bit:

3354 mysql     20   0  276m  19m 2848 S    0  1.2   3:41.19 mysqld

A reduction of about 140 MB of virtual memory (from 417M to 276m), or roughly 1/4 of the server’s total memory. Not bad.
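If you want to track these numbers from a script instead of watching top, on Linux the VIRT and RES columns roughly correspond to the VmSize and VmRSS fields of /proc/<pid>/status (values in kB). Here is a minimal sketch. The function names are hypothetical.

```python
from pathlib import Path


def parse_vm_fields(status_text: str) -> dict:
    """Return the Vm* fields (VmSize, VmRSS, VmSwap, ...) of a
    /proc/<pid>/status file as integers in kB."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        # Lines look like "VmRSS:     21100 kB"
        if key.startswith("Vm") and parts and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields


def vm_fields_for_pid(pid: int) -> dict:
    """Read the fields for a live process (Linux only)."""
    return parse_vm_fields(Path(f"/proc/{pid}/status").read_text())
```

For example, vm_fields_for_pid(1310)["VmSize"] would give the virtual size of the mysqld process in the first listing. The pid will of course differ on your box. Note that in the two listings above, resident memory stayed around 20 MB while the virtual size dropped.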

Use mtop to monitor the running MySQL server, its queries, etc., so you know what’s going on. As you can see, this MySQL has a very good cache hit rate, meaning that it is basically keeping everything in memory. If the content of the sites is less than 10 MB in total, 400 MB leaves plenty of space to cache it:

load average: 0.05, 0.08, 0.16 mysqld 5.0.51a-3ubuntu5.8-log up 1 day(s), 19:47 hrs                                                             
2 threads: 1 running, 6 cached. Queries/slow: 187.1K/0 Cache Hit: 99.39%
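I don’t know exactly which counters mtop uses for its Cache Hit figure. You can, however, compute comparable ratios yourself from SHOW GLOBAL STATUS. This sketch uses two common formulas:

- Key buffer hit rate = 1 - Key_reads / Key_read_requests.
- Query cache hit rate = Qcache_hits / (Qcache_hits + Com_select).

The status dict is assumed to map variable names to their values.

```python
def key_cache_hit_rate(status: dict) -> float:
    """Share of index block read requests served from the key buffer."""
    requests = int(status["Key_read_requests"])
    if requests == 0:
        return 1.0
    return 1.0 - int(status["Key_reads"]) / requests


def query_cache_hit_rate(status: dict) -> float:
    """Share of SELECTs answered from the query cache.
    Com_select counts SELECTs that were actually executed (cache misses)."""
    hits = int(status["Qcache_hits"])
    total = hits + int(status["Com_select"])
    return hits / total if total else 0.0
```

With MySQLdb or a similar DB-API driver, running SHOW GLOBAL STATUS and then calling dict(cursor.fetchall()) gives you such a dict.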

What eats memory

I am not an expert on MySQL, so I hope someone with more insight could post comments regarding how to tune MySQL for low memory situations and how it is expected to behave.

Some ideas that ran through my head:

  • MySQL’s default cache settings on Ubuntu/Debian are not very tight, which makes them suitable for moderate loads rather than low ones. If you don’t have much content, everything is simply kept in memory (even if it is not needed).
  • MySQL uses round robin for connections, and with max_connections at 100 it will allocate a thread stack for each connection (someone please confirm this; I found contradicting information).

Configuring MySQL

Here are some methods to reduce the memory usage. This is what I did on this little box:

MySQL is mostly configured in /etc/mysql/my.cnf on Ubuntu / Debian. The settings below belong in the [mysqld] section of that file.

The final adjustments

key_buffer              = 8M
max_connections         = 30
query_cache_size        = 8M
query_cache_limit       = 512K
thread_stack            = 128K
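To see how max_connections multiplies the per-connection cost, here is a rough upper-bound estimate for the settings above. It is a simplified sketch. It counts only the two global caches plus one thread_stack per connection, assuming (as suspected above) that each connection gets its own stack. Real estimates also add per-thread buffers such as sort_buffer_size and read_buffer_size, which you can pass in through the hypothetical per_thread_extra parameter. query_cache_limit is left out because it caps the size of a single cached result rather than allocating memory.

```python
import re

_SIZE_RE = re.compile(r"\s*(\d+)\s*([KMG]?)\s*", re.IGNORECASE)
_MULTIPLIER = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a my.cnf size such as '8M' or '512K' to bytes."""
    m = _SIZE_RE.fullmatch(value)
    if not m:
        raise ValueError(f"unrecognised size: {value!r}")
    number, suffix = m.groups()
    return int(number) * _MULTIPLIER[suffix.upper()]


def rough_upper_bound(cnf: dict, per_thread_extra: int = 0) -> int:
    """Global caches + max_connections * (thread stack + extra per-thread bytes)."""
    global_buffers = parse_size(cnf["key_buffer"]) + parse_size(cnf["query_cache_size"])
    per_connection = parse_size(cnf["thread_stack"]) + per_thread_extra
    return global_buffers + int(cnf["max_connections"]) * per_connection


settings = {"key_buffer": "8M", "max_connections": "30",
            "query_cache_size": "8M", "thread_stack": "128K"}
print(rough_upper_bound(settings))
```

With the values above this gives 20709376 bytes (19.75 MiB). The same settings with max_connections = 100 would give 28.5 MiB.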

More info

Please send in more tips! Is 32-bit better than 64-bit for a low-end VPS, and how much does this affect MySQL?


When Python sucks: how you call a function and document it

Though maybe written tongue-in-cheek, this “Python Makes Me Nervous” article has some excellent points.

  • Because of duck-typing, you should rigorously document how methods should be called (try epytext and its fields).
  • Most open source Python projects do the exact opposite
  • Even the Python standard library is poorly documented and sets a very bad example (missing manual ???)
  • Thus, programming in Python becomes a nightmare of grepping through source code (the implementation) or stepping into it in pdb just to figure out how APIs should work (Plone/Zope, anyone?)
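Here is an example of the rigorous documentation the article asks for: a small function documented with epytext fields (@param, @type, @return, @rtype, @raise). The function itself is just an illustration.

```python
def truncate_title(title, max_bytes):
    """
    Shorten a title so that its UTF-8 encoded form fits in a byte budget.

    @param title: Text to shorten.
    @type title: C{str}
    @param max_bytes: Maximum length of the encoded result, in bytes.
    @type max_bytes: C{int}
    @return: The longest prefix of C{title} whose UTF-8 encoding is at
        most C{max_bytes} long.
    @rtype: C{str}
    @raise ValueError: If C{max_bytes} is negative.
    """
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = title.encode("utf-8")
    if len(encoded) <= max_bytes:
        return title
    # Cut the bytes, then drop a partial multi-byte character at the end.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

The caller no longer has to read the body to find out what the arguments are, what comes back, or when it fails.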

Should the Python community at some point stop and focus on delivering better documentation instead of new features and goodies (like those allowed by the recently lifted syntax moratorium)?

From my personal experience:

  • The best, and the only, person to document the code adequately is the person who originally wrote it
  • Because the author already knows how to use the code, he doesn’t need to care about making it usable for others. Many libraries and projects are driven by a “scratch your own itch” mentality, not by a “let’s make this a happy community” mentality. The exception is something like Facebook or Google, whose sole purpose is to attract new users to their platform, bringing in new €€€.
  • If you are developing a framework or community project, make documentation a required part of every deliverable and stick with it. If you let one person skip one hour of writing documentation, you are making 10 people spend one hour each figuring out how to use the damn thing.
  • Doctests are not documentation. They are tests. They are an extremely unreadable way to say “how I should use this thing”, because doctests are often executed in the context of test stubs which do not reflect connections to the other parts of the framework or real contexts.
  • “Buy a book – it tells you everything” business model is not feasible in long run. Books get old. Books are not searchable. People don’t buy books.

Good documentation is a way to differentiate, and win, when there are competing frameworks. I believe the success of Django was mostly driven by its good documentation.

These points also apply to other duck-typed, open source driven programming languages (PHP anyone?). With good documentation we can reduce the need for Valium prescriptions for every one of us.


Apple push notifications (APN) with Python

We have created a middleware service which takes RSS feeds as input and outputs Apple Push Notifications. This makes it easy to integrate push notification support into your existing content management system. This blog post should give you some ideas if you are planning to create similar services.

To have the über-experience of customer engagement with mobile push notifications you need:

  • A mobile application (iOS, Android 2.2)
  • An RSS-feed-to-notification middleware server (our solution)
  • The RSS feeds themselves
  • Windows/UNIX server running the middleware

How it is put together

The Tornado web server handles incoming HTTP requests in a scalable manner.

The feedparser library fetches the RSS feeds, which are then processed into client notifications.

The BitReader (post, source) library is used to create the messages exchanged with the Apple Push Notification service (APNs). APNs uses a binary protocol running directly over TCP/IP. Apple’s service has been designed to handle high volumes of traffic, so it does not waste bandwidth on anything like stateless HTTP.
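Here is roughly what one message looks like on the wire in Apple’s legacy “simple notification” binary format (command 0). This sketch uses Python’s standard struct module instead of BitReader, and the function name is hypothetical. Apple has since retired this binary interface in favor of an HTTP/2-based provider API, so treat this as a description of the protocol discussed here, not current practice.

```python
import struct


def build_simple_notification(device_token_hex: str, payload: bytes) -> bytes:
    """Frame one notification in the legacy APNs simple format (command 0)."""
    token = bytes.fromhex(device_token_hex)
    if len(token) != 32:
        raise ValueError("device tokens in this format are 32 bytes")
    # Network byte order, no padding: command (1 byte), token length (2 bytes),
    # token (32 bytes), payload length (2 bytes), then the payload itself.
    header = struct.pack("!BH32sH", 0, len(token), token, len(payload))
    return header + payload
```

The resulting bytes are written to an SSL-wrapped TCP socket that is authenticated with your push certificate.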

Django models are used to store the state of each individual subscriber. Django’s ORM abstraction allows us to use the same middleware for small deployments (< 1000 clients, SQLite database) or big ones (millions of clients, MySQL database). The stored state information includes the subscriber id and the current badge number, the red circle on the app icon showing the count of unread posts. When the application is launched, it can decrease its badge number by making an HTTP call to the server.

Django settings are used to configure the required certificates and whether the application runs in sandbox mode.

Walkthrough

A core IO loop called the stream observer runs in a separate process. This loop fetches the RSS feeds, checks them for updates and passes the updates to the Tornado server over HTTP. With this arrangement, any HTTP-capable client can send push notifications.
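The stream observer’s job can be sketched as fetch, diff, forward. In this minimal sketch the entry dicts, the JSON body format and the notification endpoint URL are all assumptions. Fetching is injected as a callable so the sketch does not depend on feedparser being installed.

```python
import json
import urllib.request


def new_entries(entries, seen_ids):
    """Return entries not seen before and remember their ids.
    Entries are dict-like with an 'id', falling back to 'link'."""
    fresh = []
    for entry in entries:
        key = entry.get("id") or entry.get("link")
        if key and key not in seen_ids:
            seen_ids.add(key)
            fresh.append(entry)
    return fresh


def post_update(url, entry):
    """POST one update as JSON to the notification server (hypothetical endpoint)."""
    body = json.dumps({"title": entry.get("title", ""),
                       "link": entry.get("link", "")}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def observe(feed_url, notify_url, seen_ids, fetch_entries):
    """One pass of the observer loop: fetch, diff, forward."""
    for entry in new_entries(fetch_entries(feed_url), seen_ids):
        post_update(notify_url, entry)
```

With feedparser you could pass fetch_entries=lambda url: feedparser.parse(url).entries and call observe() in a loop, with time.sleep() between passes.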

Tornado handles the incoming updates and, through exposed Django views, updates the related subscriber status (how many unread notifications, etc.). The notification is formatted according to the variables available on the subscriber’s mobile platform. In Apple’s case, the notification message gets a title, badge, sound and launch image. The payload is checked against the hard 256-byte limit.
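Here is a minimal sketch of the Apple-side formatting and the 256-byte check. The aps dictionary keys (alert, badge, sound, and launch-image inside alert) come from Apple’s payload format. Two choices are assumptions for this sketch: putting the post title into the alert body, and trimming the title one character at a time until the payload fits.

```python
import json

MAX_PAYLOAD_BYTES = 256  # the hard limit mentioned above


def encode(payload: dict) -> bytes:
    # Compact separators: every byte counts against the limit.
    return json.dumps(payload, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def build_payload(title, badge, sound="default", launch_image=None) -> bytes:
    alert = {"body": title}
    if launch_image:
        alert["launch-image"] = launch_image
    payload = {"aps": {"alert": alert, "badge": badge, "sound": sound}}
    data = encode(payload)
    # Trim the title until the encoded payload fits in the limit.
    while len(data) > MAX_PAYLOAD_BYTES:
        if not alert["body"]:
            raise ValueError("payload exceeds limit even without a title")
        alert["body"] = alert["body"][:-1]
        data = encode(payload)
    return data
```

Measuring the UTF-8 encoded bytes, rather than the number of characters, matters for non-ASCII titles.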

Then the payload is pushed to Apple’s servers over TCP/IP. SSL certificates are needed.

A subscriber is registered when the mobile application is launched. The application asks Apple’s servers for a subscriber id. Then this subscriber id is delivered to our middleware over a normal HTTP call.

The middleware also handles the feedback service, which gives you a list of devices that have unsubscribed from your service. This way you can cut off notifications to unsubscribed clients. This is also done using BitReader and TCP/IP.
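The legacy feedback service returned a stream of fixed-size tuples: a 4-byte big-endian timestamp, a 2-byte token length and the 32-byte device token. Here is a minimal parsing sketch. Again it uses struct instead of BitReader, and the function name is hypothetical.

```python
import struct
from datetime import datetime, timezone

# timestamp (4 bytes), token length (2 bytes), token (32 bytes); 38 bytes total
FEEDBACK_TUPLE = struct.Struct("!IH32s")


def parse_feedback(data: bytes):
    """Yield (unsubscribed_at, device_token_hex) for each complete tuple."""
    usable = len(data) - len(data) % FEEDBACK_TUPLE.size
    for offset in range(0, usable, FEEDBACK_TUPLE.size):
        timestamp, length, token = FEEDBACK_TUPLE.unpack_from(data, offset)
        yield (datetime.fromtimestamp(timestamp, tz=timezone.utc),
               token[:length].hex())
```

A partial tuple at the end of the buffer is ignored here. A real client would keep it until the rest of the bytes arrive.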

Future

The architecture is built so that different push backends can be included in the service. Android support is on the roadmap, and we will probably have BlackBerry and MeeGo support (when/if Nokia announces such a service).

We have currently tested this solution with RSS streams from WordPress and Plone.

We may release the source code when it’s ready.

More info
