Saturday, November 28, 2009

Word 2007 Document Processing Using OpenXML

One interesting topic is how to handle Word documents in code. As a test, I exported the content of a page in a publishing site's Pages library to a Word 2007 document and saved it to a separate document library. It worked.

Code:
// Required namespaces
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;
using Microsoft.SharePoint;

/// <summary>
/// Exports a publishing page's content into the content controls of a Word 2007 document.
/// The exported document is stored in a separate document library.
/// </summary>
/// <param name="sourceItem">A list item from the Pages library</param>
/// <param name="targetList">The document library that stores the exported Word 2007 documents</param>
public static void ExportPubPageContentToWordDoc(SPListItem sourceItem, SPList targetList)
{
    SPDocumentLibrary lib = targetList as SPDocumentLibrary;
    if (lib == null)
    {
        throw new ArgumentException("Target list is not a document library", "targetList");
    }

    foreach (SPContentType ctype in lib.ContentTypes)
    {
        // Skip the built-in content types; only the custom ones carry our Word template
        if (string.Equals(ctype.Name, "Document", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(ctype.Name, "Folder", StringComparison.OrdinalIgnoreCase))
        {
            continue;
        }

        // Copy the template into an expandable in-memory stream so it can be edited
        SPFile templateFile = ctype.ResourceFolder.Files[ctype.DocumentTemplate];
        byte[] templateBytes = templateFile.OpenBinary();
        using (MemoryStream docStream = new MemoryStream())
        {
            docStream.Write(templateBytes, 0, templateBytes.Length);

            using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(docStream, true))
            {
                // Take the first custom XML part; the content controls are bound to it
                MainDocumentPart mainPart = wordDoc.MainDocumentPart;
                IEnumerator<CustomXmlPart> xmlPartEnumerator = mainPart.CustomXmlParts.GetEnumerator();
                if (!xmlPartEnumerator.MoveNext())
                {
                    continue; // the template has no custom XML part to fill
                }
                CustomXmlPart xmlPart = xmlPartEnumerator.Current;

                // Build an XML document that matches the structure the template expects
                XmlDocument doc = new XmlDocument();
                XmlElement rootNode = doc.CreateElement("propertydata");
                doc.AppendChild(rootNode);

                XmlElement titleNode = doc.CreateElement("title");
                titleNode.InnerText = GetFieldValueString(sourceItem, "Title");
                rootNode.AppendChild(titleNode);

                XmlElement bodyNode = doc.CreateElement("body");
                bodyNode.InnerText = GetFieldValueString(sourceItem, "Article Body");
                rootNode.AppendChild(bodyNode);

                // Replace the content of the custom XML part with the new data
                using (MemoryStream resultStream = new MemoryStream())
                {
                    doc.Save(resultStream);
                    resultStream.Position = 0;
                    xmlPart.FeedData(resultStream);
                }
            }

            // The package is closed at this point, so every change has been written to docStream
            string fileName = sourceItem.File.Name;
            if (fileName.IndexOf('.') > 0)
            {
                fileName = fileName.Substring(0, fileName.LastIndexOf('.'));
            }
            fileName += ".docx";
            string docUrl = lib.RootFolder.Url + "/" + fileName;
            lib.RootFolder.Files.Add(docUrl, docStream.ToArray(), true);
        }
    }
}

The OpenXML SDK 2.0 (http://www.microsoft.com/downloads/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en) is required to run the code above. The Word 2007 Content Control Toolkit (http://dbe.codeplex.com/) is handy for manipulating the XML of Word 2007 documents, and I used it to create the template file for the document library.
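
The code above calls GetFieldValueString, which is not shown in this post. A minimal sketch of such a helper follows, assuming it only needs to return the field's raw value as a string and an empty string when the field is missing or empty. The class name FieldValueHelper is made up for illustration.

```csharp
using System;
using Microsoft.SharePoint;

public static class FieldValueHelper
{
    // Hypothetical helper: the post does not show the real implementation.
    // Returns the raw field value as a string, or an empty string if the
    // item has no such field or the field has no value.
    public static string GetFieldValueString(SPListItem item, string fieldName)
    {
        if (item == null || !item.Fields.ContainsField(fieldName))
        {
            return string.Empty;
        }

        object value = item[fieldName];
        return value == null ? string.Empty : value.ToString();
    }
}
```

Note that for a publishing HTML field such as Article Body the raw value is likely to contain HTML markup, so a real helper may want to strip or convert the tags before the text goes into the Word document.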

Good references on this topic:
http://blogs.msdn.com/mikeormond/archive/2008/06/20/word-2007-content-controls-databinding-and-schema-validation.aspx
http://www.craigmurphy.com/blog/?p=913
http://www.microsoft.com/uk/msdn/screencasts/screencast/236/Word-2007-Content-Controls-and-Schema-Validation.aspx

Sunday, November 22, 2009

SharePoint Content Type And Word Template

One nice feature of SharePoint is that a content type and its Word template work together. The content type’s field values are treated as metadata and injected into the Word 2007 document template as document properties. These properties can be viewed in Word’s Document Information Panel (in Word 2007, open it with Prepare > Properties). A user can update the properties directly in the panel and upload the Word document back to the SharePoint document library; the corresponding fields of the list item are then updated automatically. The steps are:
  1. Create an empty Word 2007 template.
  2. Create a new content type via “Site Actions > Site Settings > Site Content Types > Create”, and select “Document Content types – Document” as the parent.
  3. Add the required fields to the new content type.
  4. Upload the Word 2007 template to the content type via “Advanced settings > Upload a new document template”.
  5. Create a new document library and enable content type management via “Settings > Advanced settings > Allow management of content types? > Yes”.
  6. Add the content type created in step 2 to the document library created in step 5.
  7. Add a new item to the document library by selecting the template created in step 1.
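
Steps 5 and 6 can also be done in code. The following is only a sketch under a few assumptions: the site URL, library title and content type name are placeholders passed in by the caller, the library already exists, and the class and method names are made up for illustration.

```csharp
using System;
using Microsoft.SharePoint;

public static class LibrarySetup
{
    // Hypothetical helper: enables content type management on an existing
    // document library and attaches a site content type to it.
    public static void AttachContentType(string siteUrl, string libraryTitle, string contentTypeName)
    {
        using (SPSite site = new SPSite(siteUrl))
        using (SPWeb web = site.OpenWeb())
        {
            SPList library = web.Lists[libraryTitle];

            // Site content types visible from this web, including inherited ones
            SPContentType siteContentType = web.AvailableContentTypes[contentTypeName];
            if (siteContentType == null)
            {
                throw new ArgumentException("Content type not found: " + contentTypeName, "contentTypeName");
            }

            // Step 5: allow management of content types
            library.ContentTypesEnabled = true;

            // Step 6: add the content type unless the library already has it
            if (library.ContentTypes[contentTypeName] == null)
            {
                library.ContentTypes.Add(siteContentType);
            }

            library.Update();
        }
    }
}
```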
If you don’t like working with the Document Information Panel, you can use Word content controls inside the document body to do the same thing and sync the metadata back to SharePoint. We can lay out the Word template as we like and tie document properties to content controls. The following screenshot illustrates how document properties can be tied to content controls inside a Word document. Note that Title, Title_fr, Sub_Title and Sub_Title_fr in the example are content type fields in the SharePoint document library:


We can also create Word 2007 content controls from the Developer tab of the Ribbon, but content controls created that way have no direct association with document properties. OpenXML and Word 2007 custom properties techniques are required for Word content automation (SharePoint/Word data binding). I will give more details about this in my next post.

Although SharePoint can generate document properties automatically, not all SharePoint field types are supported as document properties, and not all types of document property are supported by Word content controls. The following table maps common SharePoint field types to Word 2007 content controls:

SharePoint Field Type      Word 2007 Content Control
---------------------      -------------------------
Single line of text        Text (carriage returns not allowed)
Multiple lines of text     Text (carriage returns allowed)
Choice                     Dropdown list
Number                     Text with schema validation
Currency                   Text with schema validation
Date and Time              Date picker
Yes/No                     Dropdown list
Lookup                     N/A
Person or Group            N/A
Hyperlink or Picture       N/A
Calculated                 N/A
Custom field               N/A

For SharePoint fields that have no equivalent Word 2007 content control, we can add a field that is compatible with Word content controls to the content type, and convert between the original field and the compatible field inside a list item event receiver. For example, a Text field can stand in for a custom field that Word 2007 does not recognize.
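
As a rough sketch of that idea, the event receiver below copies a field that Word cannot bind to into a plain Text field whenever an item is updated. It assumes WSS 3.0 / MOSS 2007, where DisableEventFiring and EnableEventFiring are available on the receiver base class. The two field names and the class name are hypothetical. The reverse direction, parsing the text back into the original field, is left out.

```csharp
using System;
using Microsoft.SharePoint;

public class CompatibleFieldReceiver : SPItemEventReceiver
{
    // Hypothetical field names, used for illustration only
    private const string SourceField = "Reviewer";          // e.g. a Person or Group field
    private const string CompatibleField = "Reviewer Text"; // a single line of text field

    public override void ItemUpdated(SPItemEventProperties properties)
    {
        SPListItem item = properties.ListItem;
        if (item == null ||
            !item.Fields.ContainsField(SourceField) ||
            !item.Fields.ContainsField(CompatibleField))
        {
            return;
        }

        // Flatten the original value to text so a Word content control can show it
        object raw = item[SourceField];
        string text = raw == null ? string.Empty : raw.ToString();

        // Nothing to do when the compatible field is already in sync
        if (string.Equals(text, Convert.ToString(item[CompatibleField])))
        {
            return;
        }

        // Suppress events while saving so this receiver does not trigger itself
        DisableEventFiring();
        try
        {
            item[CompatibleField] = text;
            item.SystemUpdate(false); // no new version, no change to the Modified fields
        }
        finally
        {
            EnableEventFiring();
        }
    }
}
```

The receiver still has to be registered on the document library or content type, for example through a feature, before it runs.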