<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3862704576763351872</id><updated>2012-02-08T02:06:39.539+01:00</updated><category term='Alfresco'/><category term='pdftotext'/><category term='cmis'/><category term='Freemind'/><category term='Groovy'/><category term='PDFBox'/><title type='text'>Think Alfresco</title><subtitle type='html'>Nice pieces about and with the Alfresco ECM software.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://thinkalfresco.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://thinkalfresco.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Lothar Märkle</name><uri>http://www.blogger.com/profile/02395488456732866307</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>5</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3862704576763351872.post-7233420412972540080</id><published>2009-03-06T21:09:00.012+01:00</published><updated>2009-03-06T23:19:28.792+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pdftotext'/><category scheme='http://www.blogger.com/atom/ns#' term='Alfresco'/><category 
scheme='http://www.blogger.com/atom/ns#' term='PDFBox'/><title type='text'>Speeding up PDF indexing - Alfresco Hack #3</title><content type='html'>&lt;p&gt;
What is your document type distribution like? If a large share of the documents stored in Alfresco are PDFs, this article is for you. It describes how to speed up the full-text indexing process dramatically, by roughly a factor of 4! But relax, no coding required :)
&lt;/p&gt;
&lt;p&gt;
To estimate the impact on your installation, some rough distribution numbers can be gathered like this:
&lt;pre class="bash" name="code"&gt;
dmc@alfresco: find ./alf_data/contentstore -type f -exec file -inb {} \;| sort |uniq -c|sort -nr
&lt;/pre&gt;
Example output from our installation (counts converted to percentages):
&lt;pre&gt;
image/png          28 %
application/msword 13 %
application/pdf    10 %
image/gif           8 %
other              41 %
&lt;/pre&gt;
&lt;/p&gt;
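&lt;p&gt;
The command above prints absolute counts per MIME type, while the table shows percentages. A minimal Java sketch of that conversion could look like this; the class name MimeShare and the counts are made up for illustration and are not the numbers from our installation:
&lt;/p&gt;
&lt;pre class="java" name="code"&gt;
import java.util.Locale;

public class MimeShare {
    // Percentage share of one count within the total of all counts.
    static double share(long count, long total) {
        if (total == 0) {
            return 0.0;
        }
        return 100.0 * count / total;
    }

    public static void main(String[] args) {
        // Hypothetical counts, as "uniq -c" might report them.
        String[] types = {"image/png", "application/msword", "application/pdf"};
        long[] counts = {5, 3, 2};
        long total = 10; // assumed total number of files in the content store
        for (int i = 0; i != types.length; i++) {
            System.out.printf(Locale.ROOT, "%-20s %5.1f %%%n", types[i], share(counts[i], total));
        }
    }
}
&lt;/pre&gt;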
&lt;p&gt;
PDF is the most used text format after Microsoft Word. From these numbers it is clear that speeding up PDF indexing would be a great benefit for the overall system. The indexing process first extracts the plain text content from the PDF document with the help of a pdf-&gt;text transformer based on the PDFBox library. The extracted text is then fed to the Lucene indexer component.
&lt;/p&gt;
&lt;p&gt;
For testing the text extraction, I created a small collection of 15 PDF documents, 15 MB in total. To rule out Java startup and library loading times, I used this small sample program to time the PDFBox text stripper:
&lt;/p&gt;
&lt;pre class="java" name="code"&gt;
package de.dmc.alfresco.pdfbox;

import java.io.IOException;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

public class ExtractText {
        // Extract the text from the pdfs given on the command line
 public static void main(String[] args) throws IOException {
  long start = System.currentTimeMillis();
  for (String pdfFile : args) {
   PDDocument document = PDDocument.load(pdfFile);
   PDFTextStripper stripper = new PDFTextStripper();
   stripper.getText(document);
   document.close();
  }
  long stop = System.currentTimeMillis();
  long diff = stop - start;
  System.err.printf("pdfs: %d, total: %d seconds, average: %.3f seconds per document\n", args.length, diff/1000, diff/1000d/args.length );
 }
}
&lt;/pre&gt;
Output on my 2 GHz dual-core machine:
&lt;pre name="bash" class="code"&gt;
pdfs: 15, total: 10 seconds, &lt;b&gt;average: 0.727 seconds per document&lt;/b&gt;
&lt;/pre&gt;

Result: roughly three quarters of a second is spent on text extraction for every PDF added to Alfresco!
&lt;br/&gt;
In our projects, we replaced the &lt;a href="http://www.pdfbox.org" target="_blank"&gt;PDFBox&lt;/a&gt; transformer with the &lt;a href="http://en.wikipedia.org/wiki/Pdftotext" target="_blank"&gt;pdftotext&lt;/a&gt; console tool.
&lt;pre class="code" name="bash"&gt;
lothar@lothar-laptop:~/devenv/lib/collections/pdf$ time for pdffile in *.pdf; do pdftotext "$pdffile" - &gt;/dev/null; done

real 0m2.738s
user 0m2.460s
sys 0m0.136s
&lt;/pre&gt;

Result: about 3 seconds for all 15 PDF files! &lt;b&gt;average: 0.2 seconds per document&lt;/b&gt;
&lt;br/&gt;
&lt;br/&gt;
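&lt;p&gt;
The factor of 4 follows directly from the two measurements above. As a rough sketch (wall-clock times from a single run on one machine, so treat the result as an estimate; the class name Speedup is only illustrative):
&lt;/p&gt;
&lt;pre class="java" name="code"&gt;
import java.util.Locale;

public class Speedup {
    // Average seconds per document for a batch run.
    static double perDocument(double totalSeconds, int documents) {
        return totalSeconds / documents;
    }

    // How many times faster the fast tool is compared to the slow one.
    static double factor(double slowPerDocument, double fastPerDocument) {
        return slowPerDocument / fastPerDocument;
    }

    public static void main(String[] args) {
        double pdfBox = 0.727;                      // average reported by ExtractText
        double pdfToText = perDocument(2.738, 15);  // "real" time of the pdftotext loop
        System.out.printf(Locale.ROOT, "pdftotext: %.3f s per document, speed-up: %.1f%n",
                pdfToText, factor(pdfBox, pdfToText));
    }
}
&lt;/pre&gt;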
&lt;span style="font-weight:bold;"&gt;Configuration of the transformer:&lt;/span&gt;
&lt;p&gt;
Add this configuration to shared/classes/alfresco/extension/pdf-transformer-context.xml:
&lt;/p&gt;
&lt;pre class="xml" name="code"&gt;
&amp;lt;?xml version="1.0" encoding="UTF-8"?&gt;
&amp;lt;!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd"&gt;
&amp;lt;beans&gt;
        &amp;lt;!-- disable standard pdfbox text transformer --&gt;
        &amp;lt;bean id="transformer.PdfBox" class="java.lang.String"/&gt;
        &amp;lt;!-- has the above injected, is newly created below --&gt;
    &amp;lt;bean id="transformer.complex.OpenOffice.PdfBox" class="java.lang.String"/&gt;

        &amp;lt;!-- pdftotext command line binary --&gt;
        &amp;lt;bean id="transformer.PdfToTextTool"
                class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformer"
                parent="baseContentTransformer"&gt;
                &amp;lt;property name="transformCommand"&gt;
                        &amp;lt;bean name="transformer.pdftotext.Command"
                                class="org.alfresco.util.exec.RuntimeExec"&gt;
                                &amp;lt;property name="commandMap"&gt;
                                        &amp;lt;map&gt;
                                                &amp;lt;entry key="Linux.*"&gt;
                                                        &amp;lt;value&gt;${catalina.base}/webapps/alfresco/WEB-INF/bin/pdftotext-linux -enc UTF-8 ${options} ${source} ${target}&amp;lt;/value&gt;
                                                &amp;lt;/entry&gt;
                                                &amp;lt;entry key="Windows.*"&gt;
                                                        &amp;lt;value&gt;${catalina.base}/webapps/alfresco/WEB-INF/bin/pdftotext-win32.exe -enc UTF-8 ${options} ${source} ${target}&amp;lt;/value&gt;
                                                &amp;lt;/entry&gt;
                                        &amp;lt;/map&gt;
                                &amp;lt;/property&gt;
                                &amp;lt;property name="defaultProperties"&gt;
                                        &amp;lt;props&gt;
                                                &amp;lt;prop key="options"&gt;&amp;lt;/prop&gt;
                                        &amp;lt;/props&gt;
                                &amp;lt;/property&gt;
                        &amp;lt;/bean&gt;
                &amp;lt;/property&gt;
                &amp;lt;!-- ensure executable bits of binaries on unix --&gt;
                &amp;lt;property name="checkCommand"&gt;
                        &amp;lt;bean name="transformer.pdftotext.checkCommand"
                                class="org.alfresco.util.exec.RuntimeExec"&gt;
                                &amp;lt;property name="commandMap"&gt;
                                        &amp;lt;map&gt;
                                                &amp;lt;entry key="Linux.*"&gt;
                                                        &amp;lt;value&gt;chmod 775 ${catalina.base}/webapps/alfresco/WEB-INF/bin/pdftotext-linux&amp;lt;/value&gt;
                                                &amp;lt;/entry&gt;
                                                &amp;lt;entry key="Windows.*"&gt;
                                                &amp;lt;!--  dummy value --&gt;
                                                        &amp;lt;value&gt;cmd.exe /C dir&amp;lt;/value&gt;
                                                &amp;lt;/entry&gt;
                                        &amp;lt;/map&gt;
                                &amp;lt;/property&gt;
                                &amp;lt;property name="defaultProperties"&gt;
                                        &amp;lt;props&gt;
                                                &amp;lt;prop key="options"&gt;&amp;lt;/prop&gt;
                                        &amp;lt;/props&gt;
                                &amp;lt;/property&gt;
                        &amp;lt;/bean&gt;
                &amp;lt;/property&gt;
                &amp;lt;property name="explicitTransformations"&gt;
                        &amp;lt;list&gt;
                                &amp;lt;bean
                                        class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey"&gt;
                                        &amp;lt;constructor-arg&gt;
                                                &amp;lt;value&gt;application/pdf&amp;lt;/value&gt;
                                        &amp;lt;/constructor-arg&gt;
                                        &amp;lt;constructor-arg&gt;
                                                &amp;lt;value&gt;text/plain&amp;lt;/value&gt;
                                        &amp;lt;/constructor-arg&gt;
                                &amp;lt;/bean&gt;
                        &amp;lt;/list&gt;
                &amp;lt;/property&gt;
        &amp;lt;/bean&gt;

   &amp;lt;!-- replaces bean transformer.complex.OpenOffice.PdfBox --&gt;
   &amp;lt;bean id="transformer.complex.OpenOffice.PdfToTextTool"
        class="org.alfresco.repo.content.transform.ComplexContentTransformer"
        parent="baseContentTransformer" &gt;
      &amp;lt;property name="transformers"&gt;
         &amp;lt;list&gt;
            &amp;lt;ref bean="transformer.OpenOffice" /&gt;
            &amp;lt;ref bean="transformer.PdfToTextTool" /&gt;
         &amp;lt;/list&gt;
      &amp;lt;/property&gt;
      &amp;lt;property name="intermediateMimetypes"&gt;
         &amp;lt;list&gt;
            &amp;lt;value&gt;application/pdf&amp;lt;/value&gt;
         &amp;lt;/list&gt;
      &amp;lt;/property&gt;
   &amp;lt;/bean&gt;
&amp;lt;/beans&gt;
&lt;/pre&gt;
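&lt;p&gt;
The ${options}, ${source} and ${target} placeholders in the transformCommand are filled in before the command is executed. The following Java sketch only illustrates the idea of such a substitution with plain String.replace; it is not Alfresco's RuntimeExec implementation, and the class name and file paths are hypothetical:
&lt;/p&gt;
&lt;pre class="java" name="code"&gt;
public class CommandTemplate {
    // Replace each ${name} placeholder with its value; names and values are paired by index.
    static String expand(String template, String[] names, String[] values) {
        String result = template;
        for (int i = 0; i != names.length; i++) {
            result = result.replace("${" + names[i] + "}", values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "pdftotext-linux -enc UTF-8 ${options} ${source} ${target}";
        String[] names = {"options", "source", "target"};
        // Assumed values: empty options and two temporary file paths.
        String[] values = {"", "/tmp/in.pdf", "/tmp/out.txt"};
        System.out.println(expand(template, names, values));
    }
}
&lt;/pre&gt;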
&lt;br/&gt;
Let me know how it worked for you!
&lt;br/&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3862704576763351872-7233420412972540080?l=thinkalfresco.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thinkalfresco.blogspot.com/feeds/7233420412972540080/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3862704576763351872&amp;postID=7233420412972540080&amp;isPopup=true' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default/7233420412972540080'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default/7233420412972540080'/><link rel='alternate' type='text/html' href='http://thinkalfresco.blogspot.com/2009/03/speeding-up-pdf-indexing-alfresco-hack.html' title='Speeding up PDF indexing - Alfresco Hack #3'/><author><name>Lothar Märkle</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3862704576763351872.post-8742997654157369131</id><published>2009-02-03T16:17:00.024+01:00</published><updated>2009-02-05T14:12:11.191+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Freemind'/><category scheme='http://www.blogger.com/atom/ns#' term='Alfresco'/><title type='text'>Indexing Freemind MindMaps with Alfresco - Alf Hack # 2</title><content type='html'>The idea of this Alfresco hack is to use a command line tool for text extraction of the Freemind .mm file.

Steps to include this into Alfresco will be:
&lt;ol&gt;
&lt;li&gt;Add Mimetype application/x-freemind for .mm&lt;/li&gt;
&lt;li&gt;Add transformer from application/x-freemind to text/plain&lt;/li&gt;
&lt;/ol&gt;

This article covers the second step. For adding a new MIME type, please refer to the &lt;a href="http://wiki.alfresco.com/wiki/Adding_a_Mime_Type" target="_new"&gt;Alfresco Wiki&lt;/a&gt;. The MIME type of Freemind mind maps is application/x-freemind. There is also a nice blog post about adding the Freemind MIME type, including a &lt;a target="_new" href="http://www.techbits.de/2007/03/02/integrating-freemind-documents-into-alfresco"&gt;nice map integration&lt;/a&gt;.

&lt;br/&gt;&lt;br/&gt;&lt;span style="font-weight:bold;"&gt;Extract the text&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;

An example shows how Freemind stores this sample map in an XML file:
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_ECNUDHkljtU/SYhuEryXzSI/AAAAAAAAAA8/cfMZU1oD_Uo/s1600-h/MindMapSample.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 64px;" src="http://4.bp.blogspot.com/_ECNUDHkljtU/SYhuEryXzSI/AAAAAAAAAA8/cfMZU1oD_Uo/s320/MindMapSample.png" alt="" id="BLOGGER_PHOTO_ID_5298605988353920290" border="0" /&gt;&lt;/a&gt;

&lt;pre name="code" class="xml"&gt;
&amp;lt;map version="0.7.1"&gt;
  &amp;lt;node TEXT="Alfresco Hack No 2"&gt;
    &amp;lt;node TEXT="Explore how Freemind XML looks like" POSITION="right"&gt;
    &amp;lt;/node&gt;
  &amp;lt;/node&gt;
&amp;lt;/map&gt;
&lt;/pre&gt;

Quite simple XML without namespaces. The text of each map node is stored in the value of the attribute &lt;span style="font-weight: bold;"&gt;TEXT&lt;/span&gt;. Note that Freemind writes its attribute names in upper case, which is why the XSLT below selects @TEXT.

To extract the text I will use a quick-and-dirty XSLT:
&lt;pre name="code" class="xml"&gt;
&amp;lt;?xml version="1.0"?&gt;
&amp;lt;xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  &amp;lt;xsl:output omit-xml-declaration="yes" indent="no"/&gt;
  &amp;lt;xsl:template match="/"&gt;
    &amp;lt;xsl:call-template name="t1"/&gt;
  &amp;lt;/xsl:template&gt;
  &amp;lt;xsl:template name="t1"&gt;
    &amp;lt;xsl:for-each select="//node"&gt;
      &amp;lt;xsl:value-of select="@TEXT"/&gt;
      &amp;lt;xsl:value-of select="' '"/&gt;
    &amp;lt;/xsl:for-each&gt;
  &amp;lt;/xsl:template&gt;
&amp;lt;/xsl:stylesheet&gt;

&lt;/pre&gt;

Applying this XSLT to the Freemind XML yields the extracted text:
&lt;pre&gt;Alfresco Hack No 2 Explore how Freemind XML looks like&lt;/pre&gt;
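
For comparison, the same extraction can be sketched in plain Java with the standard DOM API. This is only an illustration of what the XSLT does, not part of the Alfresco setup described here; the class name FreemindText is made up, and it assumes the node text lives in the upper-case TEXT attribute:
&lt;pre name="code" class="java"&gt;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FreemindText {
    // Concatenate the TEXT attribute of every node element, each followed by a space.
    static String extract(InputStream mindMap) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(mindMap)
                    .getElementsByTagName("node"); // document order, like //node
            StringBuilder text = new StringBuilder();
            for (int i = 0; i != nodes.getLength(); i++) {
                text.append(((Element) nodes.item(i)).getAttribute("TEXT")).append(' ');
            }
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read mind map", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a .mm file
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extract(in));
        }
    }
}
&lt;/pre&gt;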


&lt;br/&gt;&lt;span style="font-weight:bold;"&gt;Add transformer to Alfresco&lt;/span&gt;&lt;br/&gt;

To keep things simple, I will use Alfresco's ability to run content transformations with
external tools or programs. This is done by configuring a RuntimeExecutableContentTransformer
bean. But first, the command line of the external tool has to be figured out. I will use the &lt;span style="font-style:italic;"&gt;xmlstarlet&lt;/span&gt; command line tool from &lt;a href="http://xmlstar.sourceforge.net/" target="_new"&gt;http://xmlstar.sourceforge.net/&lt;/a&gt;. Depending on your Linux distribution, the executable is called either &lt;span style="font-style:italic;"&gt;xml&lt;/span&gt; or &lt;span style="font-style:italic;"&gt;xmlstarlet&lt;/span&gt;. A Windows version is also available from the download page.

Translating the above XSLT into an xmlstarlet command line gives:
&lt;pre name="code" class="bash"&gt;xmlstarlet sel -t -m //node -v @TEXT -o ' ' Alfresco\ Hack\ No\ 2.mm&lt;/pre&gt;

Sadly, the output always goes to stdout and no output file can be specified. The RuntimeExecutableContentTransformer requires a target file, however, so a simple wrapper script is needed. I put the following into the file /home/lothar/bin/freemind2text.sh (made executable with chmod 775), which is then referenced by the transformer bean:
&lt;pre name="code" class="bash"&gt;
#!/bin/bash
# save arguments to variables
SOURCE=$1
TARGET=$2

# to see what gets extracted append arguments to logfile
echo "from $SOURCE to $TARGET" &gt;&gt;/tmp/freemindtransform.log

# call xmlstarlet tool and redirect output to $TARGET
xmlstarlet sel --text --encoding UTF-8 -t -m //node -v @TEXT -o ' ' "$SOURCE" &gt; "$TARGET"
&lt;/pre&gt;

Now we are ready to configure the RuntimeExecutableContentTransformer bean:
&lt;pre name="code" class="xml"&gt;
&amp;lt;?xml version="1.0" encoding="UTF-8"?&gt;
&amp;lt;!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd"&gt;
&amp;lt;beans&gt;
  &amp;lt;bean id="transformer.freemindToText" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformer" parent="baseContentTransformer"&gt;
    &amp;lt;property name="transformCommand"&gt;
      &amp;lt;bean name="transformer.freemind.Command" class="org.alfresco.util.exec.RuntimeExec"&gt;
        &amp;lt;property name="commandMap"&gt;
          &amp;lt;map&gt;
            &amp;lt;entry key="Linux.*"&gt;
              &amp;lt;value&gt;/home/lothar/bin/freemind2text.sh ${source} ${target}&amp;lt;/value&gt;
            &amp;lt;/entry&gt;
            &amp;lt;entry key="Windows.*"&gt;
              &amp;lt;value&gt;...whatever windows needs here....&amp;lt;/value&gt;
            &amp;lt;/entry&gt;
          &amp;lt;/map&gt;
        &amp;lt;/property&gt;
        &amp;lt;property name="defaultProperties"&gt;
          &amp;lt;props&gt;
            &amp;lt;prop key="options"/&gt;
          &amp;lt;/props&gt;
        &amp;lt;/property&gt;
      &amp;lt;/bean&gt;
    &amp;lt;/property&gt;
    &amp;lt;property name="explicitTransformations"&gt;
      &amp;lt;list&gt;
        &amp;lt;bean class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey"&gt;
          &amp;lt;constructor-arg&gt;
            &amp;lt;value&gt;application/x-freemind&amp;lt;/value&gt;
          &amp;lt;/constructor-arg&gt;
          &amp;lt;constructor-arg&gt;
            &amp;lt;value&gt;text/plain&amp;lt;/value&gt;
          &amp;lt;/constructor-arg&gt;
        &amp;lt;/bean&gt;
      &amp;lt;/list&gt;
    &amp;lt;/property&gt;
  &amp;lt;/bean&gt;
&amp;lt;/beans&gt;

&lt;/pre&gt;


&lt;br/&gt;&lt;br/&gt;&lt;span style="font-weight:bold;"&gt;Finished!&lt;/span&gt;&lt;br/&gt; Now indexing of Freemind mindmaps will take place. 
On the plus side: No Java coding, just configuration of the standard Alfresco features.
On the down side: ...is there anything?

Anybody who could contribute the Windows batch file wrapper for the xmlstarlet call?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3862704576763351872-8742997654157369131?l=thinkalfresco.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thinkalfresco.blogspot.com/feeds/8742997654157369131/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3862704576763351872&amp;postID=8742997654157369131&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default/8742997654157369131'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default/8742997654157369131'/><link rel='alternate' type='text/html' href='http://thinkalfresco.blogspot.com/2009/02/indexing-freemind-mindmaps-with.html' title='Indexing Freemind MindMaps with Alfresco - Alf Hack # 2'/><author><name>Lothar Märkle</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_ECNUDHkljtU/SYhuEryXzSI/AAAAAAAAAA8/cfMZU1oD_Uo/s72-c/MindMapSample.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3862704576763351872.post-7791755364935029689</id><published>2009-02-02T13:36:00.015+01:00</published><updated>2009-02-08T23:52:51.594+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cmis'/><title type='text'>CMIS Link collection</title><content type='html'>Random link collection about CMIS: Blogs, Specs, Samples from Alfresco, EMC and others

&lt;a href="http://newton.typepad.com/content/2009/01/cmis-face-to-face-at-microsoft-in-redmond.html"&gt;John Newton F2F&lt;/a&gt;&lt;br/&gt;


&lt;a href="http://craigrandall.net/archives/2008/09/cmis/"&gt;http://craigrandall.net/archives/2008/09/cmis/&lt;/a&gt;&lt;br/&gt;

&lt;a href="https://community.emc.com/servlet/JiveServlet/previewBody/1606-102-1-2762/h3951-cmis-wp_2.pdf"&gt;https://community.emc.com/servlet/JiveServlet/previewBody/1606-102-1-2762/h3951-cmis-wp_2.pdf&lt;/a&gt;&lt;br/&gt;

&lt;a href="http://chucksblog.typepad.com/chucks_blog/2008/09/cmis----its-not.html"&gt;http://chucksblog.typepad.com/chucks_blog/2008/09/cmis----its-not.html&lt;/a&gt;&lt;br/&gt;

&lt;a href="https://community.emc.com/docs/DOC-1606"&gt;https://community.emc.com/docs/DOC-1606&lt;/a&gt;&lt;br/&gt;

&lt;br/&gt;&lt;span style="font-weight:bold;"&gt;OASIS&lt;/span&gt;&lt;br/&gt;
CMIS home: &lt;a href="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=cmis"&gt;http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=cmis&lt;/a&gt;&lt;br/&gt;
Members: &lt;a href="http://www.oasis-open.org/committees/membership.php?wg_abbrev=cmis"&gt;http://www.oasis-open.org/committees/membership.php?wg_abbrev=cmis&lt;/a&gt;&lt;br/&gt;
JIRA: &lt;a href="http://tools.oasis-open.org/issues/browse/CMIS"&gt;http://tools.oasis-open.org/issues/browse/CMIS&lt;/a&gt;&lt;br/&gt;
CMIS TC list: &lt;a href="http://lists.oasis-open.org/archives/cmis/"&gt;http://lists.oasis-open.org/archives/cmis/&lt;/a&gt;&lt;br/&gt;
CMIS comments list: &lt;a href="http://lists.oasis-open.org/archives/cmis-comment/"&gt;http://lists.oasis-open.org/archives/cmis-comment/&lt;/a&gt;&lt;br/&gt;

&lt;a href="http://xml.coverpages.org/cmis.html"&gt;http://xml.coverpages.org/cmis.html&lt;/a&gt;&lt;br/&gt;


&lt;a href="http://info.emc.com/mk/get/DAP_RE?P.ctp_program_execution.Source_ID=16706"&gt;http://info.emc.com/mk/get/DAP_RE?P.ctp_program_execution.Source_ID=16706&lt;/a&gt;&lt;br/&gt;

&lt;a href="https://community.emc.com/community/labs/cmis"&gt;https://community.emc.com/community/labs/cmis&lt;/a&gt;&lt;br/&gt;

&lt;a href="http://roy.gbiv.com/untangled/2008/no-rest-in-cmis"&gt;http://roy.gbiv.com/untangled/2008/no-rest-in-cmis&lt;/a&gt;&lt;br/&gt;

&lt;a href="http://intertwingly.net/blog/?q=cmis"&gt;http://intertwingly.net/blog/?q=cmis&lt;/a&gt;&lt;br/&gt;

&lt;a href="http://www-01.ibm.com/software/data/content-management/cm-interoperablity-services.html"&gt;http://www-01.ibm.com/software/data/content-management/cm-interoperablity-services.html&lt;/a&gt;&lt;br/&gt;

&lt;a href="http://blogs.msdn.com/ecm/archive/2008/09/09/announcing-the-content-management-interoperability-services-cmis-specification.aspx"&gt;http://blogs.msdn.com/ecm/archive/2008/09/09/announcing-the-content-management-interoperability-services-cmis-specification.aspx&lt;/a&gt;&lt;br/&gt;

&lt;a href="http://blogs.msdn.com/ecm/"&gt;http://blogs.msdn.com/ecm/&lt;/a&gt;&lt;br/&gt;


&lt;a href="http://blogs.nuxeo.com/sections/blogs/florent_guillaume/2009_02_02_cmis-meeting-notes"&gt;http://blogs.nuxeo.com/sections/blogs/florent_guillaume/2009_02_02_cmis-meeting-notes&lt;/a&gt;&lt;br/&gt;

&lt;a href="http://blogs.the451group.com/information_management/2008/09/10/cmis-and-industry-standards-in-ecm/"&gt;http://blogs.the451group.com/information_management/2008/09/10/cmis-and-industry-standards-in-ecm/&lt;/a&gt;

&lt;br/&gt;Also a nice link collection on CMIS&lt;br/&gt;
&lt;a href="http://weblogs.goshaky.com/weblogs/test/search?q=collaboration"&gt;http://weblogs.goshaky.com/weblogs/test/search?q=collaboration&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3862704576763351872-7791755364935029689?l=thinkalfresco.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thinkalfresco.blogspot.com/feeds/7791755364935029689/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3862704576763351872&amp;postID=7791755364935029689&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default/7791755364935029689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default/7791755364935029689'/><link rel='alternate' type='text/html' href='http://thinkalfresco.blogspot.com/2009/02/cmis-link-collection.html' title='CMIS Link collection'/><author><name>Lothar Märkle</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3862704576763351872.post-2049498922101819989</id><published>2009-01-27T21:45:00.004+01:00</published><updated>2009-02-02T11:23:25.150+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Alfresco'/><category scheme='http://www.blogger.com/atom/ns#' term='Groovy'/><title type='text'>Groovy Scripting for Alfresco - Alf Hack # 1</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_ECNUDHkljtU/SX-kEggd5ZI/AAAAAAAAAAU/AZMv48D2nRM/s1600-h/ideabulb-squared-75.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 
0pt; float: left; cursor: pointer; width: 75px; height: 75px;" src="http://1.bp.blogspot.com/_ECNUDHkljtU/SX-kEggd5ZI/AAAAAAAAAAU/AZMv48D2nRM/s320/ideabulb-squared-75.jpg" alt="" id="BLOGGER_PHOTO_ID_5296132084163536274" border="0" /&gt;&lt;/a&gt;
This is the first post of my &lt;span style="font-style: italic;"&gt;Alfresco Hacks&lt;/span&gt; series, showing some very useful tricks for developing with Alfresco. Having worked with Alfresco since the beginning of 2007, starting with version 1.4, I now feel well prepared to share some code. I appreciate any feedback; feel free to comment and add suggestions.

I'm tired of the somewhat lengthy cycle of editing Java source, compiling, building alfresco.war, deploying to Tomcat and finally starting Tomcat.
Just to try something out, this is too tedious. Therefore I gave the Groovy Server from http://iterative.com/GroovyServer.tar.gz a try.

It will give you access to a Groovy shell using:
&lt;pre&gt;telnet localhost 6789&lt;/pre&gt;
The simple steps I did:
&lt;ul&gt;&lt;li&gt;build the groovyserver.jar&lt;/li&gt;&lt;li&gt;copy groovyserver.jar, groovy-all*.jar, jline*.jar to WEB-INF/lib/&lt;/li&gt;&lt;li&gt;add this to a Spring context file groovyserver-context.xml in the extensions directory:
&lt;/li&gt;&lt;/ul&gt;
&lt;pre name="code" class="xml"&gt;
&amp;lt;bean id="groovyService" abstract="true" init-method="initialize" destroy-method="destroy"&gt;
  &amp;lt;property name="bindings"&gt;
    &amp;lt;map&gt;
      &amp;lt;!-- expose the Alfresco ServiceRegistry to the shell --&gt;
      &amp;lt;entry key="ServiceRegistry" value-ref="ServiceRegistry"/&gt;
    &amp;lt;/map&gt;
  &amp;lt;/property&gt;
&amp;lt;/bean&gt;

&amp;lt;bean id="groovyShellService" class="com.iterative.groovy.service.GroovyShellService" parent="groovyService"&gt;
  &amp;lt;property name="socket" value="6789"/&gt;
  &amp;lt;property name="launchAtStart" value="true"/&gt;
&amp;lt;/bean&gt;
&lt;/pre&gt;

After starting Alfresco, the Groovy shell can be accessed with a telnet client connecting to port 6789 on the Alfresco server:
&lt;pre&gt;
lothar@lothar-laptop:~$ telnet localhost 6789
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Groovy Shell (1.6-RC-2, JVM: 1.6.0_11)
Type 'go' to execute statements; Type 'help' for more information.
groovy&amp;gt;
&lt;/pre&gt;

As an example, searching for the term "alfresco":
&lt;pre name="code" class="java"&gt;
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.repo.transaction.RetryingTransactionHelper.RetryingTransactionCallback;
workspaceStoreRef = new StoreRef("workspace://SpacesStore");
ServiceRegistry.getAuthenticationService().authenticate("admin", "admin".toCharArray());
retryTXService = ServiceRegistry.getRetryingTransactionHelper();
def doWork() {
results = ServiceRegistry.getSearchService().query(workspaceStoreRef, "lucene", "alfresco");
for(r in results) { out.println(r.document); }
}
def sow = [ execute: { doWork() } ] as RetryingTransactionCallback&amp;lt;Void&gt;;
retryTXService.doInTransaction(sow);
go
&lt;/pre&gt;

Looks simple? Not at first glance, but it is easy. The real work goes into the doWork() function, lines 6 to 9. The rest can stay the same; it is just plumbing code.

&lt;span style="font-weight: bold;"&gt;Conclusion:&lt;/span&gt;
Now the Alfresco API is just a very small step away. Would you like to try the VersionService? Just
call ServiceRegistry.getVersionService()... and fire up your Groovy script.

Other links to Alfresco and Groovy:
WebScripts with Groovy: &lt;a href="http://gradecak.blogspot.com/2008/04/alfresco-webscripts-with-groovy.html"&gt;http://gradecak.blogspot.com/2008/04/alfresco-webscripts-with-groovy.html&lt;/a&gt;
Alfresco with Grails: &lt;a href="http://forge.alfresco.com/projects/minigrails/"&gt;http://forge.alfresco.com/projects/minigrails/&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3862704576763351872-2049498922101819989?l=thinkalfresco.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thinkalfresco.blogspot.com/feeds/2049498922101819989/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3862704576763351872&amp;postID=2049498922101819989&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default/2049498922101819989'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default/2049498922101819989'/><link rel='alternate' type='text/html' href='http://thinkalfresco.blogspot.com/2009/01/groovy-scripting-for-alfresco-alf-hack.html' title='Groovy Scripting for Alfresco - Alf Hack # 1'/><author><name>Lothar Märkle</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_ECNUDHkljtU/SX-kEggd5ZI/AAAAAAAAAAU/AZMv48D2nRM/s72-c/ideabulb-squared-75.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3862704576763351872.post-7732038703836374441</id><published>2009-01-27T17:52:00.000+01:00</published><updated>2009-01-27T18:01:00.275+01:00</updated><title type='text'></title><content type='html'>&lt;span style="font-family: verdana;"&gt;Mission statement:&lt;/span&gt;
&lt;ul style="font-family: verdana;"&gt;&lt;li&gt;Share thoughts about Alfresco ECM in general
&lt;/li&gt;&lt;li&gt;Share some of my favorite Alfresco hacks&lt;/li&gt;&lt;li&gt;Share thoughts about Alfresco architecture&lt;/li&gt;&lt;li&gt;Talk about computing books&lt;/li&gt;&lt;li&gt;achieve at least some points from the above:)
&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3862704576763351872-7732038703836374441?l=thinkalfresco.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://thinkalfresco.blogspot.com/feeds/7732038703836374441/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3862704576763351872&amp;postID=7732038703836374441&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default/7732038703836374441'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3862704576763351872/posts/default/7732038703836374441'/><link rel='alternate' type='text/html' href='http://thinkalfresco.blogspot.com/2009/01/mission-statement-share-thoughts-about.html' title=''/><author><name>Lothar Märkle</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
