Convert HTML to Text with NPOI
Apache POI (Poor Obfuscation Implementation) is an API that lets programmers create, modify, and display Microsoft Office files from Java programs. NPOI is a C# port of the POI Java project; it is an open-source, stand-alone library. It is especially popular for creating Excel files and reading data from them.
However, when generating Excel documents with NPOI, you may want to render HTML content in an Excel cell, and the library has no built-in way to parse HTML tags and render them. In this post, we will walk through the steps to export DataTable data to an Excel file using NPOI in C# and render HTML content in an Excel cell.
The HTML formatting covered in this post is:
- Line-breaks (BR, P, DIV)
- Bold, Italic, and Underline
We are using the NPOI DLL for this export, which is free to use. To parse the HTML text, we will use HTML Agility Pack.
Install the following NuGet packages from the Package Manager Console:
Install-Package NPOI -Version 2.5.1
Install-Package HtmlAgilityPack -Version 1.11.24
After these packages are installed, we need to add the respective namespaces for accessing NPOI classes and HTML Agility Pack.
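As a rough guide, the using directives at the top of the file might look like the following. The exact set depends on which classes your own code touches; System.Data is included here on the assumption that the export works from a DataTable.

```csharp
// Namespaces assumed by the methods described in this post.
using System.Data;              // DataTable
using System.IO;                // FileStream, for writing the workbook
using HtmlAgilityPack;          // HtmlDocument, HtmlNode
using NPOI.SS.UserModel;        // IWorkbook, ISheet, IRow, ICell, IFont
using NPOI.XSSF.UserModel;      // XSSFWorkbook, XSSFRichTextString
```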
- Create a method to select text and style parameters from HTML nodes recursively
This method uses the HtmlAgilityPack namespace.
This is the recommended way of selecting a node safely.
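The recursive selection step could be sketched roughly as follows. This is not the original method from this post but a minimal illustration: the class name HtmlRunCollector, the method name Collect, and the tuple shape used for a "run" are all hypothetical. It assumes a compiler with C# 7 tuple support and does no whitespace normalization.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HtmlRunCollector // hypothetical helper name
{
    // Walks the node tree depth-first and appends one entry per text run,
    // carrying down the bold/italic/underline state set by ancestor tags.
    public static void Collect(
        HtmlNode node, bool bold, bool italic, bool underline,
        List<(string Text, bool Bold, bool Italic, bool Underline)> runs)
    {
        if (node == null) return; // guard against a missing node

        if (node.NodeType == HtmlNodeType.Text)
        {
            // Decode entities such as &amp; before storing the text.
            string text = HtmlEntity.DeEntitize(node.InnerText);
            if (!string.IsNullOrEmpty(text))
                runs.Add((text, bold, italic, underline));
            return;
        }

        string name = node.Name.ToLowerInvariant();
        if (name == "br")
        {
            runs.Add(("\n", false, false, false));
            return;
        }

        // P and DIV start and end on their own line.
        bool isBlock = name == "p" || name == "div";
        if (isBlock) EnsureLineBreak(runs);

        bool b = bold || name == "b" || name == "strong";
        bool i = italic || name == "i" || name == "em";
        bool u = underline || name == "u";

        foreach (HtmlNode child in node.ChildNodes)
            Collect(child, b, i, u, runs);

        if (isBlock) EnsureLineBreak(runs);
    }

    // Adds a newline run unless the output is empty or already ends with one.
    private static void EnsureLineBreak(
        List<(string Text, bool Bold, bool Italic, bool Underline)> runs)
    {
        if (runs.Count == 0) return;
        if (runs[runs.Count - 1].Text.EndsWith("\n", StringComparison.Ordinal)) return;
        runs.Add(("\n", false, false, false));
    }
}
```

To use it, load the HTML into an HtmlDocument with LoadHtml and call Collect on its DocumentNode with all three flags set to false. A block element at the end of the input leaves a trailing newline run, which the caller may want to trim.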
- Create a method to convert HTML text to NPOI rich-text
This method uses the HtmlAgilityPack namespace.
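One possible shape for the conversion is sketched below. To keep the example short, it assumes the HTML has already been parsed with HtmlAgilityPack and flattened into styled runs by the recursive selection method from the previous step, so this block itself only needs the NPOI namespaces. The class name RichTextBuilder and the run tuple are hypothetical, and the font cache is a design choice of this sketch rather than something NPOI requires.

```csharp
using System.Collections.Generic;
using System.Text;
using NPOI.SS.UserModel;
using NPOI.XSSF.UserModel;

public sealed class RichTextBuilder // hypothetical helper name
{
    private readonly IWorkbook _workbook;

    // One font per style combination, reused across cells so the workbook
    // does not accumulate a new font for every formatted run.
    private readonly Dictionary<(bool, bool, bool), IFont> _fonts =
        new Dictionary<(bool, bool, bool), IFont>();

    public RichTextBuilder(IWorkbook workbook)
    {
        _workbook = workbook;
    }

    public IRichTextString Build(
        IEnumerable<(string Text, bool Bold, bool Italic, bool Underline)> runs)
    {
        var plain = new StringBuilder();
        var spans = new List<(int Start, int End, IFont Font)>();

        foreach (var run in runs)
        {
            if (string.IsNullOrEmpty(run.Text)) continue;

            int start = plain.Length;
            plain.Append(run.Text);

            // Unstyled runs keep the cell's default font.
            if (!(run.Bold || run.Italic || run.Underline)) continue;

            spans.Add((start, plain.Length,
                       GetFont(run.Bold, run.Italic, run.Underline)));
        }

        // Build the full string first, then style each character range.
        IRichTextString rich = new XSSFRichTextString(plain.ToString());
        foreach (var span in spans)
            rich.ApplyFont(span.Start, span.End, span.Font);
        return rich;
    }

    private IFont GetFont(bool bold, bool italic, bool underline)
    {
        var key = (bold, italic, underline);
        if (!_fonts.TryGetValue(key, out IFont font))
        {
            font = _workbook.CreateFont();
            font.IsBold = bold;
            font.IsItalic = italic;
            font.Underline = underline
                ? FontUnderlineType.Single
                : FontUnderlineType.None;
            _fonts[key] = font;
        }
        return font;
    }
}
```

The result can be passed to ICell.SetCellValue. Line breaks in the text only show up in Excel when the cell's style has WrapText enabled.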
- Create a method to export a DataTable to Excel
This method uses the NPOI.SS.UserModel and NPOI.XSSF.UserModel namespaces.
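A minimal sketch of the export step might look like this. The class name DataTableExporter, the fixed sheet name, and the optional htmlToRichText delegate are assumptions made for illustration; the delegate is where an HTML-to-rich-text conversion such as the one described in the previous step would plug in. Every cell is treated as text here, which a real export would likely refine per column type.

```csharp
using System;
using System.Data;
using System.IO;
using NPOI.SS.UserModel;
using NPOI.XSSF.UserModel;

public static class DataTableExporter // hypothetical helper name
{
    // Writes the table to an .xlsx file: one header row, then one row per
    // DataRow. If htmlToRichText is supplied, each cell value is passed
    // through it; otherwise the value is written as plain text.
    public static void Export(
        DataTable table, string path,
        Func<IWorkbook, string, IRichTextString> htmlToRichText = null)
    {
        IWorkbook workbook = new XSSFWorkbook();
        ISheet sheet = workbook.CreateSheet("Sheet1");

        // Wrapped text is needed for line breaks to display inside a cell.
        ICellStyle wrapStyle = workbook.CreateCellStyle();
        wrapStyle.WrapText = true;

        IRow header = sheet.CreateRow(0);
        for (int c = 0; c < table.Columns.Count; c++)
            header.CreateCell(c).SetCellValue(table.Columns[c].ColumnName);

        for (int r = 0; r < table.Rows.Count; r++)
        {
            IRow row = sheet.CreateRow(r + 1); // row 0 is the header
            for (int c = 0; c < table.Columns.Count; c++)
            {
                // DBNull converts to an empty string.
                string value = Convert.ToString(table.Rows[r][c]);

                ICell cell = row.CreateCell(c);
                cell.CellStyle = wrapStyle;

                if (htmlToRichText != null)
                    cell.SetCellValue(htmlToRichText(workbook, value));
                else
                    cell.SetCellValue(value);
            }
        }

        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            workbook.Write(stream);
        }
    }
}
```

Calling DataTableExporter.Export(table, "report.xlsx") writes plain text cells; passing a delegate as the third argument renders each value as rich text instead.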