Generating Table of Contents from HTML in C#

One Class to Cover all Requirements of Generating Your TOC

In-page links now matter for SEO, so I wanted to generate a table of contents (TOC) automatically for my blog posts. I searched Stack Overflow first but could not find a good, working library, so I wrote my own. Here is the complete code, which I use on every page of this website.

The first problem is that the headings are not necessarily hierarchical. I can start with an h2 header, then add an h1 header, and even add an h4 header inside that h1 header. That is not the right way to structure a page, but it is still valid HTML.
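
One way to handle out-of-order levels is to attach each heading to the most recent earlier heading with a smaller level, and treat it as a root when there is none. The sketch below shows only that rule on plain (level, text) pairs, without any HTML parsing. The names Heading, TocTreeSketch and Build are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal node type, used only in this sketch.
public class Heading
{
    public int Level;
    public string Text;
    public Heading Parent;
    public List<Heading> Children = new List<Heading>();
}

public static class TocTreeSketch
{
    // Attaches each heading to the most recent earlier heading whose level is
    // smaller; headings without such a predecessor become roots.
    public static List<Heading> Build(IEnumerable<(int Level, string Text)> headings)
    {
        var roots = new List<Heading>();
        var seen = new List<Heading>();   // every heading so far, in document order

        foreach (var (level, text) in headings)
        {
            var node = new Heading { Level = level, Text = text };
            node.Parent = seen.FindLast(h => h.Level < level);

            if (node.Parent == null) roots.Add(node);
            else node.Parent.Children.Add(node);

            seen.Add(node);
        }
        return roots;
    }
}
```

For the sequence h2 "Intro", h1 "Main", h4 "Detail", Build returns two roots (Intro and Main), and Detail becomes the only child of Main.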

Second Problem: Generating IDs associated with our links

OK, so we generated our in-page links, but what will they point to? The library also has to add matching in-page IDs to the article headers, and it takes care of this too. I used HtmlAgilityPack to parse the HTML, for better performance.
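
To make the idea concrete, here is a minimal sketch of turning a heading's text into an ID of the form H{level}_{slug}, the same format the class further down uses. The slug logic here is a simplified, ASCII-only stand-in. The "-2", "-3" suffix for repeated headings is my own assumption and is not part of the class. Slugify and MakeId are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HeadingIdSketch
{
    // Simplified ASCII-only slug: letters and digits are kept (lowercased),
    // and every run of other characters collapses into a single hyphen.
    public static string Slugify(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text ?? "")
        {
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) sb.Append(c);
            else if (c >= 'A' && c <= 'Z') sb.Append(char.ToLowerInvariant(c));
            else if (sb.Length > 0 && sb[sb.Length - 1] != '-') sb.Append('-');
        }
        return sb.ToString().TrimEnd('-');
    }

    // Builds "H{level}_{slug}". If that ID is already in 'used', appends
    // "-2", "-3", ... until it is unique (an assumption of this sketch).
    public static string MakeId(int level, string text, ISet<string> used)
    {
        string baseId = $"H{level}_{Slugify(text)}";
        string id = baseId;
        for (int n = 2; !used.Add(id); n++) id = $"{baseId}-{n}";
        return id;
    }
}
```

Calling MakeId(2, "Getting Started!", used) twice with the same set gives "H2_getting-started" and then "H2_getting-started-2", so two headings with the same text never share an ID.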

Enough with the problems, here is your code

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text; // required for StringBuilder
using HtmlAgilityPack;

namespace HolyOne.Web.Services
{
    public class TOCGenerator
    {
        public class TOCNode
        {
            public List<TOCNode> Children { get; set; } = new List<TOCNode>();
            public TOCNode Parent { get; set; }
            public int Level { get; set; }
            public string Text { get; set; }

            public string TargetElementId { get; set; }

            public override string ToString()
            {
                return $"H{Level} | {Text}";
            }
        }

        public string SourceHtmlCode { get; set; }
        public string AnchoredHtmlCode { get; private set; }

        public List<TOCNode> Tree { get; private set; } = new List<TOCNode>();

        private string ProcessNode(TOCNode n, int index)
        {
            StringBuilder sb = new StringBuilder();

            sb.AppendLine(@"<li>");
            sb.AppendLine(@"<a href=""#" + n.TargetElementId + @""">");
            sb.AppendLine(n.Text);
            sb.AppendLine("</a>");
            int childIndex = 0;

            if (n.Children.Any())
            {
                sb.AppendLine("<ul>");
                foreach (TOCNode item in n.Children)
                {
                    childIndex++;
                    string ln = ProcessNode(item, childIndex);
                    sb.AppendLine(ln);
                }
                sb.AppendLine("</ul>");
            }

            sb.AppendLine(@"</li>");
            return sb.ToString();
        }
        public string getTOCHtmlCode()
        {

            StringBuilder sb = new StringBuilder();
            sb.AppendLine(@"<div class=""toc"">");
            sb.AppendLine(@"<ul>");
            int childIndex = 0;
            foreach (TOCNode item in Tree)
            {
                childIndex++;
                string ln = ProcessNode(item, childIndex);
                sb.AppendLine(ln);
            }
            sb.AppendLine(@"</ul>");
            sb.AppendLine(@"</div>");
            return sb.ToString();
        }

        readonly static char[] turChars = { 'Ğ', 'ğ', 'Ü', 'ü', 'Ş', 'ş', 'İ', 'ı', 'Ö', 'ö', 'Ç', 'ç' };
        readonly static char[] engChars = { 'G', 'g', 'U', 'u', 'S', 's', 'I', 'i', 'O', 'o', 'C', 'c' };
        //*1000 => 2ms
        private string GenerateSlug(string str, int maxlen = 50)
        {
            if (str == null) return "";
            StringBuilder sb = new StringBuilder();
            bool wasHyphen = true;
            int MaxCnt = str.Length > maxlen ? maxlen : str.Length;

            for (int i = 0; i < MaxCnt; i++)
            {
                char c = str[i];
                bool wastr = false;

                if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '-')
                {
                    sb.Append(c);
                    wasHyphen = false;
                }
                else if (char.IsWhiteSpace(c) && !wasHyphen)
                {
                    sb.Append('-');
                    wasHyphen = true;
                }
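                else
                {
                    // Map Turkish letters to their closest ASCII equivalents.
                    for (int j = 0; j < turChars.Length; j++)
                    {
                        if (c == turChars[j])
                        {
                            sb.Append(engChars[j]);
                            wastr = true;
                            wasHyphen = false;
                            break;
                        }
                    }
                    // Any other character becomes a hyphen, without doubling hyphens.
                    if (!wastr && !wasHyphen)
                    {
                        sb.Append('-');
                        wasHyphen = true;
                    }
                }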

            }
            // Avoid trailing hyphens
            if (wasHyphen && sb.Length > 0)
                sb.Length--;
            str = sb.ToString();

            return str.ToLowerInvariant();
        }

        public void GenerateTOC()
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(SourceHtmlCode);

            const string xpath = "//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]";
            TOCNode parent = null;
            List<TOCNode> allNodes = new List<TOCNode>();
            Tree.Clear();
            // SelectNodes returns null when nothing matches, so query once and reuse the result.
            HtmlNodeCollection docNodes = doc.DocumentNode.SelectNodes(xpath);
            if (docNodes != null)
                foreach (var node in docNodes)
                {
                    int level = 0;
                    int.TryParse(node.Name.TrimStart(new char[] { 'h', 'H' }), out level);
                    if (level > 6 || level < 1) level = 0;

                    parent = allNodes.FindLast(o => o.Level < level);

                    if (String.IsNullOrWhiteSpace(node.InnerText)) continue;  // ignore whitespace headings
                    TOCNode n = new TOCNode()
                    {
                        Text = node.InnerText,
                        Level = level,
                        Parent = parent,
                    };
                    n.TargetElementId = $"H{n.Level}_{this.GenerateSlug(n.Text)}";
                    node.Id = n.TargetElementId;

                    allNodes.Add(n);
                    if (parent == null) Tree.Add(n);
                    else
                    {
                        parent.Children.Add(n);
                    }
                }
            AnchoredHtmlCode = doc.DocumentNode.InnerHtml;

        }
    }
}

Adding IDs to Headers

The GenerateTOC() method also adds the IDs to the headers. It does this in the same loop that builds the TOC tree, for best performance. The HTML with the IDs added is available in AnchoredHtmlCode, while getTOCHtmlCode() renders the TOC itself.

Third Problem: Presenting the TOC as well-formed HTML

OK, so our code generates a tree of the sections of our HTML document. I wanted the output to fit all kinds of designs, so I did not add tabs or level markers to the generated code. Instead, I left that job to CSS. Here is the CSS code for our TOC div.

Here is your CSS code

.tocframe {
    width:100%;
    font-size: 0.85em;
    background-color: #fff;
    border: 1px inset gray;
    display: inline-block;
    padding-right: 10px;
    padding-top: 6px;
    margin-bottom: 14px;
}

.toc a {
    text-decoration: none;
}
.toc a:hover {
    text-decoration: underline;
}

.toc ul {
    list-style-type: none;
    margin-left: 10px;
    counter-reset: css-counters 0; /* initializes counter, set -1 for zero-based counters */
}

.toc ul li:before {
    font-weight: 600;
    counter-increment: css-counters;
    content: counters(css-counters, ".") " "; /* generates inherited counters from parents */
}

How to Use with your Article

This is the way.

@{
    TOCGenerator tg = new TOCGenerator();
    tg.SourceHtmlCode = @"...some html code here...";
    tg.GenerateTOC();
}

@Html.Raw(tg.getTOCHtmlCode())
@Html.Raw(tg.AnchoredHtmlCode)

I hope it helps somebody
