Tuesday, February 3, 2009

Indexing Freemind MindMaps with Alfresco - Alf Hack # 2

The idea of this Alfresco hack is to use a command line tool for text extraction of the Freemind .mm file. Steps to include this into Alfresco will be:
  1. Add Mimetype application/x-freemind for .mm
  2. Add transformer from application/x-freemind to text/plain
This article covers the second step. For adding a new MIME type, please refer to the Alfresco Wiki. The MIME type of Freemind mind maps is application/x-freemind. There is also a nice blog post about adding the Freemind MIME type, and a nice map integration is available.

Extract the text

An example shows how Freemind stores this sample map in an XML file:
<map version="0.7.1">
  <node TEXT="Alfresco Hack No 2">
    <node TEXT="Explore how Freemind XML looks like" POSITION="right"/>
  </node>
</map>
Quite simple XML without namespaces. The text of each map node is stored in the value of its TEXT attribute. To extract the text I will use a quick-and-dirty XSLT:
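<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <!-- entry point: delegate to the named template -->
  <xsl:template match="/">
    <xsl:call-template name="t1"/>
  </xsl:template>
  <!-- emit the TEXT attribute of every node, each followed by a space -->
  <xsl:template name="t1">
    <xsl:for-each select="//node">
      <xsl:value-of select="@TEXT"/>
      <xsl:value-of select="' '"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>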

Applying this XSLT to the Freemind XML yields the extracted text:
Alfresco Hack No 2 Explore how Freemind XML looks like
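
For readers who want to check the extraction logic without an XSLT processor, here is a minimal Python sketch of the same idea: visit every node element in document order and collect its TEXT attribute. It is not part of the Alfresco setup; the names SAMPLE_MAP and extract_text are made up for this illustration, and it assumes Freemind's upper-case TEXT attribute, as the XSLT does.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, modelled on the map shown above.
SAMPLE_MAP = """<map version="0.7.1">
  <node TEXT="Alfresco Hack No 2">
    <node TEXT="Explore how Freemind XML looks like" POSITION="right"/>
  </node>
</map>"""


def extract_text(xml_source: str) -> str:
    """Return the TEXT attributes of all <node> elements, space-separated."""
    root = ET.fromstring(xml_source)
    # iter("node") walks the tree in document order, like //node in the XSLT;
    # nodes without a TEXT attribute contribute an empty string.
    return " ".join(node.get("TEXT", "") for node in root.iter("node"))


print(extract_text(SAMPLE_MAP))
```

Unlike the stylesheet, this sketch does not emit a trailing space after the last node, which makes no difference for indexing.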

Add transformer to Alfresco
To keep things simple, I will use Alfresco's ability to perform content transformations with external tools or programs. This is done by configuring a RuntimeExecutableContentTransformer bean. But first, the command line of the external tool has to be worked out. I will use the xmlstarlet command line tool from http://xmlstar.sourceforge.net/. Depending on your Linux distribution, the executable is called either xml or xmlstarlet. A Windows version is also available from the download page. Translating the above XSLT into an xmlstarlet command line results in:
xmlstarlet sel -t -m //node -v @TEXT -o ' ' Alfresco\ Hack\ No\ 2.mm
Sadly, the output always goes to stdout and no output file can be specified. The RuntimeExecutableContentTransformer requires an output file, however, so a simple wrapper script can be used. I put the following into a file /home/lothar/bin/freemind2text.sh (made executable with chmod 775), which will then be referenced in the transformer bean:
#!/bin/sh
# save arguments to variables
SOURCE="$1"
TARGET="$2"

# to see what gets extracted append arguments to logfile
echo "from $SOURCE to $TARGET" >>/tmp/freemindtransform.log

# call xmlstarlet tool and redirect output to $TARGET
xmlstarlet sel --text --encoding UTF-8 -t -m //node -v @TEXT -o ' ' "$SOURCE" > "$TARGET"
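
The wrapper only has to honour one contract: read the source file and write plain text to the target file. As a rough, untested-with-Alfresco alternative that avoids the xmlstarlet dependency, the same contract could be sketched in Python. The function name freemind_to_text is an assumption made for this example, and this swaps xmlstarlet for Python's standard XML parser rather than reproducing the script above.

```python
import xml.etree.ElementTree as ET


def freemind_to_text(source: str, target: str) -> None:
    """Write the TEXT attribute of every <node> in `source` to `target`."""
    tree = ET.parse(source)
    # Same selection as "-m //node -v @TEXT": all node elements, document order.
    texts = [node.get("TEXT", "") for node in tree.iter("node")]
    # Write UTF-8 explicitly, matching the --encoding UTF-8 option above.
    with open(target, "w", encoding="utf-8") as out:
        out.write(" ".join(texts))
```

To be called from a transformer command it would still need a small command-line entry point that passes the two arguments (source and target path) to this function.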
Now we are ready to configure the RuntimeExecutableContentTransformer bean:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="transformer.freemindToText" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformer" parent="baseContentTransformer">
    <property name="transformCommand">
      <bean name="transformer.freemind.Command" class="org.alfresco.util.exec.RuntimeExec">
        <property name="commandMap">
          <map>
            <entry key="Linux.*"><value>/home/lothar/bin/freemind2text.sh ${source} ${target}</value></entry>
            <entry key="Windows.*"><value>...whatever windows needs here....</value></entry>
          </map>
        </property>
        <property name="defaultProperties">
          <props><prop key="options"></prop></props>
        </property>
      </bean>
    </property>
    <!-- source and target MIME types handled by this transformer -->
    <property name="explicitTransformations">
      <list>
        <bean class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey">
          <constructor-arg><value>application/x-freemind</value></constructor-arg>
          <constructor-arg><value>text/plain</value></constructor-arg>
        </bean>
      </list>
    </property>
  </bean>
</beans>

With this in place, Alfresco will index Freemind mind maps. On the plus side: no Java coding, just configuration of standard Alfresco features. On the down side: ...is there anything? Can anybody contribute the Windows batch file wrapper for the xmlstarlet call?


Vince Rothenberg said...

Interesting post, however it doesn't seem to work with the latest version of Alfresco (4.1) on Windows 7.

I applied your transformer code in freemind-transformer-context.xml within the Alfresco extension folder, yet the server threw a missing resource error. Tried using the latest transformer code from the wiki (http://wiki.alfresco.com/wiki/Content_Transformations), which fixed the error, yet it didn't index any .mm files.

If you have any ideas on how to update the code, that'd be excellent! Thanks in advance.

Will said...

Nice write-up, and I agree with the aim of minimizing any requirement for custom Java code for a simple requirement such as this.

However since Alfresco 4.0 you can use Apache Tika to perform the text extraction. Tika is already embedded in the repository and has the ability to index the text of any XML document.

In theory this should be more efficient than relying on an external executable process to perform the translation.

The only issue is that Alfresco has no knowledge of the Freemind format, but adding that should be simple config.