Convert HTML to Text with NPOI

Apache POI (Poor Obfuscation Implementation) is a Java API that allows programmers to create, modify, and display Microsoft Office files. NPOI is a C# port of the POI Java project: an open-source, stand-alone library that is especially popular for creating Excel files or reading data from them.

However, while generating Excel documents with NPOI, you might want to render HTML content in an Excel cell, and the library has no built-in way to parse HTML tags and render them. In this post, we will walk through the steps to export DataTable data to an Excel file using NPOI in C# and render HTML content in an Excel cell.

The HTML formatting covered in this post is:

  1. Line-breaks (BR, P, DIV)
  2. Bulleting
  3. Numbering
  4. Bold, Italic, and Underline

We are using the NPOI library for this export, which is free to use. To parse the HTML text, we will use HTML Agility Pack.

Installing dependencies:

Install the following NuGet packages from the Package Manager Console:

  • Install-Package NPOI -Version 2.5.1

  • Install-Package HtmlAgilityPack -Version 1.11.24

After these packages are installed, we need to add the respective namespaces for accessing NPOI classes and HTML Agility Pack.
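As a minimal sketch, the using directives could look like this. The namespaces are the ones named in the steps of this post; the comments list the types we assume you will need from each.

```csharp
using HtmlAgilityPack;        // HtmlDocument, HtmlNode, HtmlEntity
using NPOI.SS.UserModel;      // IWorkbook, ISheet, IRow, ICell, IFont, IRichTextString
using NPOI.XSSF.UserModel;    // XSSFWorkbook, XSSFRichTextString (.xlsx format)
```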

  1. Create a method for selecting text and style parameters from HTML nodes recursively

This method uses the HtmlAgilityPack namespace.

[Image: i-24-68.jpg]

This is the recommended way of selecting a node safely.

[Image: i-24-69.jpg]
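Since the original code is only available as images, here is a rough sketch of what such a recursive walk might look like. It is not the original method. The names HtmlRunCollector, Collect, and Walk are hypothetical, the (Text, Bold, Italic, Underline) tuple is our own assumption about how to carry style parameters, and tuple syntax assumes C# 7.0 or later.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: flattens HTML into (text, bold, italic, underline) runs.
public static class HtmlRunCollector
{
    public static List<(string Text, bool Bold, bool Italic, bool Underline)> Collect(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html ?? string.Empty);

        var runs = new List<(string Text, bool Bold, bool Italic, bool Underline)>();
        Walk(doc.DocumentNode, false, false, false, runs);
        return runs;
    }

    private static void Walk(HtmlNode node, bool bold, bool italic, bool underline,
        List<(string Text, bool Bold, bool Italic, bool Underline)> runs)
    {
        // Text nodes become runs that carry the style inherited from their ancestors.
        if (node.NodeType == HtmlNodeType.Text)
        {
            string text = HtmlEntity.DeEntitize(node.InnerText);
            if (!string.IsNullOrEmpty(text))
                runs.Add((text, bold, italic, underline));
            return;
        }

        string name = node.Name; // HtmlAgilityPack reports element names in lower case

        // <br> becomes a line break inside the cell.
        if (name == "br")
        {
            runs.Add(("\n", bold, italic, underline));
            return;
        }

        // Style tags switch a flag on for everything nested inside them.
        bold |= name == "b" || name == "strong";
        italic |= name == "i" || name == "em";
        underline |= name == "u";

        int itemNumber = 0;
        foreach (HtmlNode child in node.ChildNodes)
        {
            if (child.Name == "li")
            {
                // Numbering for <ol>, a bullet character for any other parent.
                itemNumber++;
                string marker = name == "ol" ? itemNumber + ". " : "\u2022 ";
                runs.Add((marker, bold, italic, underline));
            }
            Walk(child, bold, italic, underline, runs);
        }

        // Block-level elements end with a line break.
        if (name == "p" || name == "div" || name == "li")
            runs.Add(("\n", bold, italic, underline));
    }
}
```

This sketch keeps whitespace-only text nodes as they are and leaves a trailing line break after the last block element, so you may want to trim the result before writing it to a cell.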

  2. Create a method to convert HTML text to NPOI rich-text

This method uses the HtmlAgilityPack namespace.

[Image: i-24-70.png]

[Image: i-24-73.png]

[Image: i-24-71.png]
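As an illustration only, the conversion to rich text could be sketched as below. It assumes the HTML has already been flattened into (Text, Bold, Italic, Underline) runs, that the workbook is an XSSFWorkbook (.xlsx), and that NPOI 2.5.1 is used. RichTextBuilder and Build are hypothetical names, not part of NPOI.

```csharp
using System.Collections.Generic;
using System.Text;
using NPOI.SS.UserModel;
using NPOI.XSSF.UserModel;

// Hypothetical helper: turns styled text runs into an NPOI rich-text string.
public static class RichTextBuilder
{
    public static IRichTextString Build(
        IWorkbook workbook, // assumed to be an XSSFWorkbook
        IEnumerable<(string Text, bool Bold, bool Italic, bool Underline)> runs)
    {
        var plain = new StringBuilder();
        var spans = new List<(int Start, int End, IFont Font)>();

        // One font per style combination, so the workbook does not fill up with duplicates.
        var fontCache = new Dictionary<(bool, bool, bool), IFont>();

        foreach (var run in runs)
        {
            int start = plain.Length;
            plain.Append(run.Text);

            // Unstyled runs keep the cell's default font.
            if (!(run.Bold || run.Italic || run.Underline))
                continue;

            var key = (run.Bold, run.Italic, run.Underline);
            if (!fontCache.TryGetValue(key, out IFont font))
            {
                font = workbook.CreateFont();
                font.IsBold = run.Bold;
                font.IsItalic = run.Italic;
                if (run.Underline)
                    font.Underline = FontUnderlineType.Single;
                fontCache[key] = font;
            }
            spans.Add((start, plain.Length, font));
        }

        // Build the string once, then apply each font to its character range
        // (start index inclusive, end index exclusive).
        var rich = new XSSFRichTextString(plain.ToString());
        foreach (var span in spans)
            rich.ApplyFont(span.Start, span.End, span.Font);

        return rich;
    }
}
```

The result can be passed to cell.SetCellValue. For line breaks to show up in Excel, the cell style also needs WrapText set to true.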

  3. Create a method for exporting a DataTable to Excel

This method uses the NPOI.SS.UserModel and NPOI.XSSF.UserModel namespaces.

[Image: i-24-72.png]

[Image: i-24-74.png]
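A minimal sketch of such an export method is shown below, assuming NPOI 2.5.1 and an .xlsx target. DataTableExporter, Export, and the htmlToRichText hook are hypothetical names introduced for illustration. The hook is where an HTML-to-rich-text converter would plug in, and passing null writes the raw cell text instead.

```csharp
using System;
using System.Data;
using System.IO;
using NPOI.SS.UserModel;
using NPOI.XSSF.UserModel;

// Hypothetical helper: writes a DataTable to an .xlsx file, one sheet, header row first.
public static class DataTableExporter
{
    public static void Export(DataTable table, string path,
        Func<IWorkbook, string, IRichTextString> htmlToRichText = null)
    {
        IWorkbook workbook = new XSSFWorkbook();
        ISheet sheet = workbook.CreateSheet("Sheet1");

        // Wrapped text is needed for line breaks inside a cell to be displayed.
        ICellStyle wrapStyle = workbook.CreateCellStyle();
        wrapStyle.WrapText = true;

        // Header row from the column names.
        IRow header = sheet.CreateRow(0);
        for (int c = 0; c < table.Columns.Count; c++)
            header.CreateCell(c).SetCellValue(table.Columns[c].ColumnName);

        // Data rows start at row index 1.
        for (int r = 0; r < table.Rows.Count; r++)
        {
            IRow row = sheet.CreateRow(r + 1);
            for (int c = 0; c < table.Columns.Count; c++)
            {
                ICell cell = row.CreateCell(c);
                string value = Convert.ToString(table.Rows[r][c]); // DBNull becomes ""

                if (htmlToRichText != null)
                    cell.SetCellValue(htmlToRichText(workbook, value));
                else
                    cell.SetCellValue(value);

                cell.CellStyle = wrapStyle;
            }
        }

        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            workbook.Write(stream);
        }
    }
}
```

Every value is written as text in this sketch. If numeric or date columns should stay typed in Excel, the cell-writing branch would need to check the column's DataType first.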
