Automatically generating description based on body text
Posted on June 4, 2010 by Mikko Ohtamaa
Filed Under plone, python, technology
Below is a sample script that automatically generates descriptions from page body text. It is written for Plone CMS, but should be applicable to any Python-based CMS with some modifications.
The idea is that we take the first three sentences of the body and use them as the description.
- Add a Script (Python) item to any Plone folder through the Zope Management Interface
- Paste in the code payload below
- Open the Test tab or type in the script URL manually. Note that this is a one-shot operation
- The script iterates through all content items in that folder
- The script writes logging output to the standard Plone log (var/log, and stdout if Plone is run in debug mode)
Since Zope uses RestrictedPython for scripts created through the web, the user of this script cannot breach server security (they cannot make Python calls they have no permission for). This places some limitations on automating tasks like this, but we do not hit those limitations in our use case.
def create_automatic_description(content, text_field_name="text"):
    """ Creates an automatic description from the HTML body by taking the first three sentences.

    Takes the body text as plain text and stores the result as the item's description.

    @param content: Any Plone contentish item (they all have description)
    @param text_field_name: Which schema field is used to supply the body text (may vary depending on the content type)
    """

    # Body is Archetype "text" field in schema by default.
    field = content.Schema()[text_field_name]

    # Returns a Python method which you can call to get the field's
    # value for a certain content item. This is also security aware
    # and does not breach field-level security provided by Archetypes
    accessor = field.getAccessor(content)

    # Accessor can take the desired format as a mimetype parameter.
    # The call below should trigger conversion from text/html -> text/plain
    # automatically using portal_transforms.
    # body is UTF-8
    body = accessor(mimetype="text/plain")

    # Now let's take the first three sentences or the whole content of body
    sentences = body.split(".")

    if len(sentences) > 3:
        intro = ".".join(sentences[0:3])
        intro += "."  # Don't forget closing the last sentence
    else:
        # Body text is shorter than 3 sentences
        intro = body

    content.setDescription(intro)


# context is the reference to the folder where this script is run
for id, item in context.contentItems():
    # Iterate through all content items (this ignores Zope objects like this script itself)

    # Use RestrictedPython safe logging.
    # plone_log() method is permission aware and available on any contentish object
    # so we can safely use it from through-the-web scripts
    context.plone_log("Fixing:" + id)

    # Check that the description has never been saved (None)
    # or it is empty, so we do not override a description someone has
    # set before automatically or manually
    desc = item.Description()  # Archetypes accessor method, returns UTF-8 encoded string

    if desc is None or desc.strip() == "":
        # We use the HTML of the field called "text" to generate the description
        create_automatic_description(item, "text")

# This will be printed in the browser when the script completes successfully
return "OK"
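The sentence-picking step can also be tried outside Plone before running the script against real content. Below is a minimal sketch in plain Python with no Zope involved. The helper name first_sentences and its count and separator parameters are made up for this illustration and are not part of the script above. It also splits on a dot followed by a space instead of a bare dot, which is a variation on the script: a bare dot also breaks at version numbers such as "4.0".

```python
def first_sentences(body, count=3, separator=". "):
    """Return the first `count` sentences of `body`, or all of it if shorter.

    Hypothetical helper for illustration only; not part of the Plone script.
    """
    # Collapse runs of whitespace so a newline after a full stop
    # is treated the same way as a space
    text = " ".join(body.split())
    sentences = text.split(separator)
    if len(sentences) <= count:
        # Body text has no more than `count` sentences, use it as is
        return text
    # Rejoin the leading sentences and put back the full stop
    # that the split removed from the last one
    return separator.join(sentences[:count]) + separator.rstrip()


print(first_sentences("Plone 4.0 is out. It is fast. It is stable. Try it today."))
# prints: Plone 4.0 is out. It is fast. It is stable.
```

With a bare "." as the separator the same input would be cut after "It is fast." because the dot in "4.0" counts as a sentence end.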
Hey,
nice idea — perhaps splitting on “. ” to get whole sentences would be a better idea, at least that’s what I use. Simple dots are far too common in normal text, IMHO.