• About

    Twinapex Blog is the voice of mobile and Internet experts. We tell tales about our exciting life in a world where communication methods converge and you can access whatever information you wish, wherever, on whichever device you want.

    If you find us interesting and talented and you are looking for developers, please contact us and we might just be able to help you.

    Creative Commons License
    This work is licensed under a Creative Commons Attribution 3.0 Unported License.

PyDev, Python and system default Unicode encoding problem



Python 2 has a thing called “default encoding” that automagically encodes Unicode strings when they are implicitly converted to byte strings. This is evil and has been discussed many times before.

What could be even more evil? Something in your development environment changes this setting for you without telling you. This way you never encounter Unicode problems on your development computer, and when you roll out your seemingly working code to production, the world goes haywire.

Evil. Evil. Evil. Thousands of curses and hours of overtime to fix the problems.

I ran into this problem. This is the monkey patch I added to site.py to track it down:

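# Trap the bastard messing with the default encoding
# using a monkey patch (added to site.py, which already imports sys)
import sys

# Keep a reference to the original setter in case you want to forward the call
old_set_default_encoding = sys.setdefaultencoding

def aargh(encoding):
    # Drop into the debugger; "bt" shows who is trying to change the encoding
    import pdb ; pdb.set_trace()

sys.setdefaultencoding = aargh

And the result was surprising: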
--Return--
> /home/moo/py24/lib/python2.4/site.py(485)aargh()->None
-> import pdb ; pdb.set_trace()
(Pdb) bt
/home/moo/py24/lib/python2.4/site.py(613)?()
-> main()
/home/moo/py24/lib/python2.4/site.py(604)main()
-> execsitecustomize()
/home/moo/py24/lib/python2.4/site.py(514)execsitecustomize()
-> import sitecustomize
/home/moo/Desktop/Aptana Studio 2.0/plugins/org.python.pydev_1.5.3.1260479439/PySrc/pydev_sitecustomize/sitecustomize.py(99)?()
-> sys.setdefaultencoding(encoding) #@UndefinedVariable (it's deleted after the site.py is executed -- so, it's undefined for code-analysis)
> /home/moo/py24/lib/python2.4/site.py(485)aargh()->None
-> import pdb ; pdb.set_trace()

It looks like the culprit was PyDev (the Eclipse Python plug-in). The interfering source code is here. Apparently the reason was to cooperate with the Eclipse console. However, it has been done incorrectly: instead of setting only the console encoding, PyDev sets the encoding for the whole Python runtime, which changes the behavior of the very runtime in which the development is done.
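Whatever the IDE does, a cheap safeguard is to check the default encoding when your application or test run starts, so that a development environment that differs from production fails loudly instead of silently. The sketch below is our own suggestion, not something PyDev provides, and `assert_default_encoding` is a hypothetical helper name. Stock Python 2 reports "ascii" here, while Python 3 always reports "utf-8".

```python
import sys


def assert_default_encoding(expected="ascii"):
    """Raise if the interpreter's default encoding differs from production.

    Hypothetical helper: call it early, e.g. at the top of your test runner.
    Stock Python 2 uses "ascii"; Python 3 always uses "utf-8".
    """
    actual = sys.getdefaultencoding()
    if actual != expected:
        raise RuntimeError(
            "Default encoding is %r, expected %r; check for a sitecustomize.py "
            "(for example one injected by your IDE)" % (actual, expected)
        )
    return actual


if __name__ == "__main__":
    expected = "ascii" if sys.version_info[0] == 2 else "utf-8"
    print(assert_default_encoding(expected))
```

To see which sitecustomize module, if any, was loaded, run `import sitecustomize; print(sitecustomize.__file__)` in the same interpreter.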

There is a possible fix for this problem. In the Eclipse Run… dialog settings, you can choose Console Encoding on the Common tab. One of the available values is US-ASCII. I am not sure how Python 2 interprets the encoding name “US-ASCII”, since its default is “ascii”.
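Python's codec registry can answer that question: `codecs.lookup()` follows the alias table and returns the canonical codec name. On Python 3, “US-ASCII” resolves to the plain ascii codec; running the same lookup on your Python 2 interpreter shows whether it behaves the same there (we assume it does, but have not checked every 2.x release). `canonical_encoding` is a hypothetical helper name.

```python
import codecs


def canonical_encoding(label):
    """Return the canonical codec name Python uses for an encoding label."""
    # codecs.lookup normalizes the label and follows the alias table
    return codecs.lookup(label).name


if __name__ == "__main__":
    for label in ("US-ASCII", "ascii", "UTF8"):
        print(label, "->", canonical_encoding(label))
```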

Installing Python Imaging Library (PIL) under virtualenv or buildout



I have struggled greatly to get PIL support in isolated Python environments like virtualenv --no-site-packages.

For example, when installing Satchmo shop under virtualenv:

../bin/clonesatchmo.py
The Python Imaging Library is not installed. Install from your distribution binaries.

This happens even though PIL clearly is there, installed by the easy_install PIL command:

ls ../lib/python2.5/site-packages/PIL-1.1.7-py2.5-linux-x86_64.egg
ArgImagePlugin.py	 ExifTags.py		  GimpGradientFile.pyc...

Does anyone know if this problem is with PIL itself, eggified PIL or something else?
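We have not pinned down the cause. One hypothesis worth checking (an assumption, not verified here) is the import style: older code often does `import Image`, which only works when the PIL directory itself is on sys.path, whereas an egg may only expose the package form `from PIL import Image`. The sketch below reports which style works in the current interpreter; run it with the virtualenv's python. `probe_pil_imports` is a hypothetical helper, and it uses the built-in `__import__` so it also runs on old Python 2 releases.

```python
def probe_pil_imports():
    """Report which PIL import styles work in this interpreter.

    "PIL.Image" is the package-style import; "Image" is the old top-level
    style that only works when the PIL directory itself is on sys.path.
    """
    results = {}
    for module_name in ("PIL.Image", "Image"):
        try:
            __import__(module_name)
            results[module_name] = True
        except ImportError:
            results[module_name] = False
    return results


if __name__ == "__main__":
    for module_name, ok in sorted(probe_pil_imports().items()):
        print("%-10s %s" % (module_name, "OK" if ok else "missing"))
```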

In any case, there is an easy workaround: use system-wide PIL (sudo apt-get install python-imaging) and symlink PIL from your site-wide installation under the isolated Python environment:

(satchmo-py25)mulli% pwd
/srv/plone/mmaspecial/satchmo-py25/lib/python2.5/site-packages
(satchmo-py25)mulli% ln -s /usr/lib/python2.4/PIL .

That works for now, but I’d like to learn how to make virtualenv and buildout install the PIL egg in a bullet-proof way.

Cannot sort custom content item in Plone folder listing



Bug: Plone folder manual sorting does not move items, no matter which tricks you try. The first suspect would be a JavaScript bug, but it isn’t.

It is bug 8161.

Your custom content meta_type must not contain spaces.

You can fix this through the web by editing the meta type in portal_types in the ZMI and removing all spaces from the meta type name.

Subversion global-ignores and .egg-info in Python/Plone development



Subversion does a good job of ignoring most build, temporary and unwanted files by default.

However, there is one exception, still present at least in Subversion 1.6: Python egg folders. Folders whose names end with .egg-info should not be committed or considered in version control actions. The your.package.name.egg-info folder is generated inside your Python egg source folder when you run setup.py / setuptools.

If you are working with Python source code eggs, add the following line to the [miscellany] section of your ~/.subversion/config:

global-ignores = *.o *.lo *.la #*# .*.rej *.rej .*~ *~ .#* .DS_Store *.egg-info *.pyc *.pyo .project .pydevproject

Otherwise development tools like Mr. Developer might get confused.
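Subversion matches these patterns against each file or folder name (not the full path) using shell-style globbing. Python's fnmatch module behaves closely enough for these simple patterns to sanity-check the list before you edit your config. This is an approximation, not Subversion's own matcher, and `svn_would_ignore` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# The global-ignores value from the config line above, split on whitespace
GLOBAL_IGNORES = (
    "*.o *.lo *.la #*# .*.rej *.rej .*~ *~ .#* .DS_Store "
    "*.egg-info *.pyc *.pyo .project .pydevproject"
).split()


def svn_would_ignore(name, patterns=GLOBAL_IGNORES):
    """Approximate Subversion's global-ignores check for one file or folder name."""
    # fnmatchcase is case-sensitive on every OS, unlike fnmatch.fnmatch
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in ("your.package.name.egg-info", "setup.py", "module.pyc"):
        print(name, svn_would_ignore(name))
```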

Plone Developer Manual, take #0.1



The first public version of the Plone Developer Manual is available here.

It is still very much a draft, but I assure you that you will find it useful. You will find it even more useful after you add the answers to your own problems.

In my previous Plone developer documentation rant, my train of thought was a little abstract and I couldn’t clearly explain how I want the community to maintain this crucial piece of documentation. This time I made a comic.

Comic: How to get support

Comic: How to update Plone Developer Manual

Packing and copying Data.fs from production server for local development



These instructions help you copy a production server's ZODB database (Data.fs) to your local computer for development and testing. This lets you test against a copy of the real data, with the same Plone instance setup as on the production server.

See the original tip by cguardia.

Data.fs is the ZODB file storage of the transactional database. The journal history takes quite a lot of disk space there. Packing, i.e. removing the journal history, usually reduces the file size considerably, making the file lighter for transfer over the wire. Depending on the age of the database, the packed copy can be less than 10% of the original size.

These instructions apply to Ubuntu/Debian-based Linux systems. On other systems, adapt them using your operating system's best practices.

We need the ZODB Python package to work with the database. To install it, we’ll create a virtualenv Python installation in /tmp. Packages installed in a virtualenv do not pollute or break the system-wide setup. Note that, depending on the OS, you might need to use easy_install-2.4. The latest stable ZODB can be picked from the PyPI listing. The Plone 3.x default is ZODB 3.7.x, which is not available as a Python egg, but you can use ZODB 3.8.x.

sudo easy_install virtualenv

cd /tmp

virtualenv packer

/tmp/packer/bin/easy_install ZODB3==3.8.3

Data.fs should not be modified in place: create a copy of it and work with the copy. The copy can be created from a running system without fear of corrupting the database, since ZODB FileStorage is an append-only database.

cp /yoursite/var/filestorage/Data.fs /tmp/Data.fs.copy

Then create the following script snippet /tmp/pack.py using your favorite terminal editor.

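import time
import ZODB.FileStorage
import ZODB.serialize

# Open the copy, never the live Data.fs
storage = ZODB.FileStorage.FileStorage('/tmp/Data.fs.copy')
# Pack away history older than "now" (unreachable objects are dropped too)
storage.pack(time.time(), ZODB.serialize.referencesf)
storage.close()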

And run it using virtualenv’ed Python setup with ZODB installed:

/tmp/packer/bin/python /tmp/pack.py

Lots of patience here… packing may take a while, but it is still definitely faster than transferring the unpacked file over your Internet connection.

Verify that the file was successfully packed:

ls -lh Data.fs.copy
-rw-r--r-- 1 user user 30M 2009-09-01 13:24 Data.fs.copy

Woohoo, 1 GB shrank to 30 MB. Then copy the file to your local computer using scp and place it in your development buildout:

scp user@server:/tmp/Data.fs.copy ~/mybuildout/var/filestorage/Data.fs

You just saved about 30-90 minutes of waiting for the file transfer.
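If you script these steps, the size check can be done from Python as well. The helper below is a hypothetical addition (`shrink_report` is not part of ZODB) that reports both sizes and the packed size as a percentage of the original. For the numbers above, 30 MB out of roughly 1 GB (1024 MB) is about 2.9%, well under the 10% mentioned earlier.

```python
import os


def shrink_report(original_path, packed_path):
    """Return (original MB, packed MB, packed size as a percentage of the original)."""
    original = os.path.getsize(original_path)
    packed = os.path.getsize(packed_path)
    megabyte = 1024.0 * 1024.0
    return original / megabyte, packed / megabyte, 100.0 * packed / original


if __name__ == "__main__":
    # Example paths from the steps above; adjust to your setup
    live, copy = "/yoursite/var/filestorage/Data.fs", "/tmp/Data.fs.copy"
    if os.path.exists(live) and os.path.exists(copy):
        print("%.1f MB -> %.1f MB (%.1f%%)" % shrink_report(live, copy))
```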

SEO tips: query strings, multiple languages, forms and other content management system issues



This post is a collection of search engine optimization tips for content management systems, especially Plone.

Do not index query strings

It is often desirable to make sure that query string pages (http://yoursite/page?query_string_action=something) do not end up in the search indexes. Otherwise search bots might index pages like the site’s own search engine results (yoursite/search?SearchableText=…), lowering the visibility of the actual content pages.

Googlebot supports wildcard patterns (* and $, not full regular expressions) in robots.txt and can be configured to ignore any URL with ? in it. See the example below.

Query string indexing causes the crawler to crawl things like:

  • Various search results (?SearchableText)
  • Keyword lists (?Subject)
  • Language switching links (?set_language), which make set_language appear as a document in the search results

Also, “almost” human readable query strings look ugly in the address bar…
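Googlebot's extensions are simple wildcards: `*` matches any run of characters and a trailing `$` anchors the end of the URL; everything else is a prefix match against the path plus query string. The function below is our own approximation of those documented rules, handy for checking patterns such as `/*?*` before publishing robots.txt. It is not Google's implementation, and `google_rule_matches` is a hypothetical name.

```python
import re


def google_rule_matches(rule, url_path):
    """Approximate Googlebot's matching of one Allow/Disallow rule.

    `url_path` is the path plus any query string, e.g. "/search?SearchableText=x".
    "*" matches any sequence of characters; a trailing "$" anchors the end.
    Everything else is matched literally, as a prefix.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += r"\Z"
    # re.match only matches at the start of the string, giving prefix semantics
    return re.match(pattern, url_path) is not None


if __name__ == "__main__":
    for path in ("/search?SearchableText=plone", "/news/article"):
        print(path, google_rule_matches("/*?*", path))
```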

Top level domains and languages

Using top-level domain names (.fi for Finland, .uk for the United Kingdom, and so on) to distinguish between languages and regions is the optimal solution from the SEO point of view. Search engines use TLD information to reorder the search results based on where the search query is performed (google.com and google.fi return different results).

Plone doesn’t use any query strings for content pages. Making robots ignore query strings is especially important if you are hosting a multilingual site and use top-level domain names (TLDs) to separate languages: if you don’t configure robots.txt to ignore ?set_language links, only one of your top-level domains (.com, .fi, .xxx) will get proper visibility in the search results. For example, we had a situation where our domain www.twinapex.fi did not get proper visibility because Google considered www.twinapex.com?set_language=fi the primary content source (Finnish content accessed through the English site and its language switching links).

Shared forms

Plone has some forms (send to, login) which can appear on any content page. These must be disallowed; otherwise you might get a search result whose link goes to the form page instead of the actual content page.

Hidden content and content excluded from the navigation

Any content excluded from the sitemap navigation should be disallowed in robots.txt. E.g. if you check “exclude from navigation” for a Plone folder, remember to update robots.txt as well.

In our case, our internal image bank must not be indexed, even though the images themselves are visible on the site. Otherwise you get funny search results: if you search for a person’s name, the photo will be the first hit instead of the biography.

Sitemap protocol

Crawlers use the Sitemap protocol to help determine the content pages on your site (note: the sitemap seems to be used as a hint only; it is not authoritative). Since version 3.1, Plone can automatically generate sitemap.xml.gz. You still need to register sitemap.xml.gz in Google Webmaster Tools manually.

There exists a sitemap protocol extension for mobile sites.
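A sitemap is a small XML document in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace with one `<url><loc>` entry per page, optionally gzipped. Plone 3.1+ generates it for you; the Python 3 sketch below only illustrates the format using the standard library. The URLs and the helper name `build_sitemap_gz` are illustrative, and optional elements such as `<lastmod>` are omitted.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_gz(urls):
    """Return gzipped sitemap.xml bytes listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # tostring() escapes &, < and >
    xml = b'<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset)
    return gzip.compress(xml)


if __name__ == "__main__":
    data = build_sitemap_gz(["http://www.twinapex.com/", "http://www.twinapex.com/about"])
    print(gzip.decompress(data).decode("utf-8"))
```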

Webmaster tools

Google Webmaster Tools enables you to monitor your site’s visibility in Google and to do some search-engine-specific tasks like submitting sitemaps.

I do not know what similar functionality other search providers have. Please share your knowledge regarding this in the blog comments.

HTML <head> metadata

Apart from <title>, search engines mostly ignore <head> metadata such as <meta> tags, so there is little point in trying to fine-tune them.

Example robots.txt

Here is our optimized robots.txt for www.twinapex.com:

# Plain robots.txt rules are prefix matches against the URL path, so they must start with /
# and, for bots without wildcard support, only cover the given path at the site root.
# We exclude lots of general purpose forms which are available in various mount points of the site
# and internal image bank which is hidden in the navigation tree in any case
User-agent: *
Disallow: /set_language
Disallow: /login_form
Disallow: /sendto_form
Disallow: /images

# Googlebot supports * and $ wildcards (not full regex).
# Googlebot obeys only the most specific User-agent group, so the rules above are repeated here.
# Block all URLs including query strings (? pattern) - contentish objects expose query strings only for actions or status reports which
# might confuse search results.
# This will also block ?set_language
User-agent: Googlebot
Disallow: /*?*
Disallow: /*login_form
Disallow: /*sendto_form
Disallow: /images
Disallow: /*folder_factories$

# Allow Adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
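Python's standard library ships a parser for the basic robots.txt rules (urllib.robotparser in Python 3, robotparser in Python 2). It does plain prefix matching without wildcard support, so it is a reasonable stand-in for how a crawler without Google's extensions reads a `User-agent: *` group. The sketch checks a few paths against a generic group similar to the one above; the bot name `ExampleBot` and the helper `allowed_paths` are made up. Note how `/folder/login_form` stays allowed: without wildcards, a rule only covers paths that start with it.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /login_form
Disallow: /sendto_form
Disallow: /images
"""


def allowed_paths(robots_txt, paths, agent="ExampleBot", site="http://www.twinapex.com"):
    """Map each path to whether `agent` may fetch it under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, site + path) for path in paths}


if __name__ == "__main__":
    paths = ["/images/staff.jpg", "/login_form", "/folder/login_form", "/about"]
    for path, ok in allowed_paths(ROBOTS_TXT, paths).items():
        print(path, "allowed" if ok else "blocked")
```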

Useful resources

Putting views, like sitemap, into Plone content tree using Easy Template add-on



Plone has two kinds of pages:

  • Content pages which have a path and will appear in the navigation and in the sitemap. These are stored in the database.
  • View-based and template-based pages, which usually present an action (accessibility, sitemap, contact info form). They do not appear in the navigation and are stored as source code on the file system. You cannot navigate to a view-based page and just click Edit: to change it, you need to modify the code using one of the various customization methods (add-on product, Zope Management Interface).

Sometimes it is desirable, for the sake of uniformity, to put view-based pages (accessibility, sitemap) into the content tree. For example, one might want the sitemap link to appear only in the navigation tree under the site section “About this site”.

The Plone add-on product Easy Template provides an easy way to show any Plone view(s) on a normal page. Easy Template uses Django-like template syntax (Jinja 2 engine). It gives you great power to drop dynamic content onto pages easily. Easy Template is also security-aware, ensuring that the members using it cannot escape from their sandbox.

Easy Template works in both WYSIWYG and non-WYSIWYG modes:

  • You can mix template code directly into text in the Visual Editor (Kupu). This is mostly useful for content editors who are not HTML-aware: they use the WYSIWYG editor and copy snippets from a reference card prepared by a developer. Note: the Visual Editor has some limitations and undesired behavior. Sometimes it inserts arbitrary HTML into the text (for example &nbsp;, which breaks the template code).
  • You can write templated HTML source code in “raw” mode, on the “Template” schemata of the Edit view.
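Because the visual editor can slip &nbsp; (or typographic quotes) into template code, one option is to normalize the text inside the template tags before rendering. We do not know whether Easy Template does anything like this itself; `clean_template_tags` below is a hypothetical pre-processing step that only touches `{{ ... }}` and `{% ... %}` blocks and leaves the surrounding HTML alone.

```python
import re

# Debris that WYSIWYG editors or word processors tend to slip into template code
REPLACEMENTS = [
    ("&nbsp;", " "),
    ("\u00a0", " "),                   # literal non-breaking space
    ("\u201c", '"'), ("\u201d", '"'),  # curly double quotes
    ("\u2018", "'"), ("\u2019", "'"),  # curly single quotes
]

TEMPLATE_TAG = re.compile(r"\{\{.*?\}\}|\{%.*?%\}", re.DOTALL)


def clean_template_tags(html):
    """Normalize editor debris inside {{ ... }} and {% ... %} blocks only."""
    def fix(match):
        text = match.group(0)
        for bad, good in REPLACEMENTS:
            text = text.replace(bad, good)
        return text
    return TEMPLATE_TAG.sub(fix, html)
```

Text outside the tags, including any &nbsp; the editor put there on purpose, is returned unchanged.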

Example: how to show a sitemap on an arbitrary Plone page

  1. Install Easy Template (if you are a developer, I suggest trying the trunk version)
  2. Create a Templated document content item
  3. Write some arbitrary text in Kupu
  4. Put in the code snippet {{ view("sitemap", "createSiteMap") }}, which triggers the sitemap view rendering
  5. Save and view the document in View mode

Picture 1

The result looks like this:

Picture 3

There is no such thing as a “views reference” for Plone. View names and functions can be figured out by searching and reading through the ZCML and Python files in the Plone source tree, which takes some developer insight. For example, for the sitemap we can do a grep search:

grep -Ri --include="*.zcml" sitemap *

Then read Products/CMFPlone/browser/configure.zcml and Products/CMFPlone/browser/sitemap.py.

The same thing works in portlets: use the Templated Portlet portlet type. See the Easy Template PyPI homepage for the full reference of the product’s potential.

About the author Mikko Ohtamaa

Setup.py sdist not including all files



Setuptools has many silent failure modes. One of them is not including all files in an sdist release (well, not exactly a failure: you could RTFM, but the default behavior is unexpected). This post will serve as a google-it-yourself answer for this problem until we get the new, shinier Distribute solving all of our problems.

I b0rked the release of plonetheme.twinapex: the version 1.0 package didn’t include media assets and ZCML configuration files. Luckily, the Python community reacted quickly and advised me how to fix it.

By default, setuptools includes only *.py files. You need to explicitly declare other file types in a MANIFEST.in file.

Example MANIFEST.in (plonetheme, built in PyDev):

recursive-include plonetheme *
recursive-include docs *
global-exclude *pyc
global-exclude .project
global-exclude .pydevproject
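MANIFEST.in controls what goes into the source distribution; for those files to also be installed along with the package, setuptools additionally needs `include_package_data=True` in `setup()`. Before uploading a release, it is worth listing what actually ended up in the tarball. The standard-library sketch below does that; `non_python_files` is a hypothetical helper, and if configure.zcml or your media assets are missing from its output, MANIFEST.in needs fixing.

```python
import sys
import tarfile


def non_python_files(sdist_path):
    """List the non-.py regular files inside an sdist tarball such as dist/foo-1.0.tar.gz."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and not member.name.endswith(".py")
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in non_python_files(sys.argv[1]):
        print(name)
```

Saved as, say, check_sdist.py, it can be run as `python check_sdist.py dist/plonetheme.twinapex-1.0.tar.gz` right after `python setup.py sdist`.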


XHTML mobile profile transformer and cleaner for Python



Mobile phones, and especially mobile site validators, are very picky about the validity of XHTML. It cannot be just any XHTML; it must be the special XHTML Mobile Profile. Also, search engines like Google will punish you in the mobile search results if your site fails to conform to the mobile profile.

This is especially troublesome if you display external content (RSS feeds, ATOM feeds) on your mobile site. Incoming HTML cannot be guaranteed to follow any specification.

To solve this problem, we have created the gomobile.xhtmlmp Python library, which helps you transform any HTML content to valid XHTML MP. The library is piloted on the plonecommunity.mobi site, which uses aggregated content from varying sources. The library is based on lxml.html.clean.Cleaner and is part of the GoMobile project, which aims to create world-class Python mobile web development tools.

Highlights

  • Turn any incoming HTML/XHTML to mobile profile compatible
  • Enforce ALT text on images – especially useful for external tracking images (feedburner tracker). ALT texts are required by XHTML MP.
  • Protect against Cross-Site Scripting (XSS) attacks and other nastiness, as provided by lxml.html.clean
  • Unicode compliant – eats funky characters
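The library itself builds on lxml's Cleaner. To show just the ALT rule in isolation, here is a Python 3 standard-library sketch that works on already well-formed XHTML. It is not gomobile.xhtmlmp's API, `enforce_img_alt` is a hypothetical name, and it assumes the fragment carries no XHTML namespace declaration. An empty alt is only a fallback suited to tracking pixels; real images deserve meaningful text.

```python
import xml.etree.ElementTree as ET


def enforce_img_alt(fragment, default_alt=""):
    """Give every <img> in a well-formed XHTML fragment an alt attribute.

    Assumes no XHTML namespace declaration; with one, the tag name would be
    "{http://www.w3.org/1999/xhtml}img" instead of "img".
    """
    root = ET.fromstring(fragment)
    for img in root.iter("img"):
        if img.get("alt") is None:
            img.set("alt", default_alt)
    return ET.tostring(root, encoding="unicode")
```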

As an example, we integrated gomobile.xhtmlmp into the Feedfeeder Plone add-on product.

Enjoy.
