Bug #23172 Accepted

XHTML list item and nested list parsing issues

Version: 3.5.10 Reporter: Paul Bailey

Two separate issues here, but both relate to XHTML parsing of list tags. Tested with EE3.5.10, but I believe the issues have been around for a while.

Firstly, when lists are nested, main list items after the first seem to have <p> tags applied spuriously. For example, the below is already valid XHTML, so nothing should be added:

<ul>
<li>Item 1
<ul>
<li>Sub-item 1.1</li>
<li>Sub-item 1.2</li>
<li>Sub-item 1.3</li>
</ul>
</li>
<li>Item 2
<ul>
<li>Sub-item 2.1</li>
<li>Sub-item 2.2</li>
<li>Sub-item 2.3</li>
</ul>
</li>
<li>Item 3
<ul>
<li>Sub-item 3.1</li>
<li>Sub-item 3.2</li>
<li>Sub-item 3.3</li>
</ul>
</li>
</ul>

This is rendered as:

<ul>
<li>Item 1
<ul>
<li>Sub-item 1.1</li>
<li>Sub-item 1.2</li>
<li>Sub-item 1.3</li>
</ul>
</li>
<li>Item 2<ul>
<li>Sub-item 2.1</li>
<li>Sub-item 2.2</li>
<li>Sub-item 2.3</li>
</ul>
</li>
<li>Item 3<ul>
<li>Sub-item 3.1</li>
<li>Sub-item 3.2</li>
<li>Sub-item 3.3</li>
</ul>
</li>
</ul>

Note the <p> tags added to Items 2 and 3 (and the removal of the line-breaks after each of those items). I’ve tested this with a longer list, and it seems to persist. Only the first main list item doesn’t have the spurious <p> tag applied. This behaviour is easily squished visually in CSS, but what the parser is doing isn’t necessary or consistent.
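One way to make this kind of change visible is to diff the submitted markup against the rendered markup. The following is a minimal sketch only: `changed_lines` is a hypothetical helper, not part of EE, and the two fragments are cut down from the Item 2 example in this report. Because the spurious paragraph tags did not survive in the examples shown here, the fragments capture only the removed line-break.

```python
import difflib

# Fragment as submitted (cut down from the Item 2 example in the report).
submitted = """<li>Item 2
<ul>
<li>Sub-item 2.1</li>
</ul>
</li>"""

# Fragment as it appears in the rendered example in the report.
rendered = """<li>Item 2<ul>
<li>Sub-item 2.1</li>
</ul>
</li>"""


def changed_lines(before, after):
    """Return only the removed ('-') and added ('+') lines between two strings."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0))
    # The first two diff lines are the '---' / '+++' file headers; skip them,
    # then drop the '@@' hunk markers by keeping only '+' and '-' lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


for line in changed_lines(submitted, rendered):
    print(line)
```

If the parser left valid XHTML alone, `changed_lines` would return an empty list for every example in this report.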

Secondly, list tags which are indented with spaces seem to produce spurious non-breaking spaces in the XHTML parse. For example, using three spaces to indent list items as follows:

<ul>
<li>Item 1
<ul>
   <li>Sub-item 1.1</li>
   <li>Sub-item 1.2</li>
   <li>Sub-item 1.3</li>
</ul>
</li>
<li>Item 2
<ul>
   <li>Sub-item 2.1</li>
   <li>Sub-item 2.2</li>
   <li>Sub-item 2.3</li>
</ul>
</li>
<li>Item 3
<ul>
   <li>Sub-item 3.1</li>
   <li>Sub-item 3.2</li>
   <li>Sub-item 3.3</li>
</ul>
</li>
</ul>

the code is rendered as:

<ul>
<li>Item 1
<ul>
   <li>Sub-item 1.1</li>
   <li>Sub-item 1.2</li>
   <li>Sub-item 1.3</li>
</ul>
</li>
<li>Item 2<ul>
   <li>Sub-item 2.1</li>
   <li>Sub-item 2.2</li>
   <li>Sub-item 2.3</li>
</ul>
</li>
<li>Item 3<ul>
   <li>Sub-item 3.1</li>
   <li>Sub-item 3.2</li>
   <li>Sub-item 3.3</li>
</ul>
</li>
</ul>

With additional indenting to show the tag structure more clearly:

<ul>
<li>Item 1
   <ul>
      <li>Sub-item 1.1</li>
      <li>Sub-item 1.2</li>
      <li>Sub-item 1.3</li>
   </ul>
</li>
<li>Item 2
   <ul>
      <li>Sub-item 2.1</li>
      <li>Sub-item 2.2</li>
      <li>Sub-item 2.3</li>
   </ul>
</li>
<li>Item 3
   <ul>
      <li>Sub-item 3.1</li>
      <li>Sub-item 3.2</li>
      <li>Sub-item 3.3</li>
   </ul>
</li>
</ul>

more spurious non-breaking spaces are added:

<ul>
<li>Item 1
  <ul>
      <li>Sub-item 1.1</li>
      <li>Sub-item 1.2</li>
      <li>Sub-item 1.3</li>
   </ul>
</li>
<li>Item 2
<ul>
      <li>Sub-item 2.1</li>
      <li>Sub-item 2.2</li>
      <li>Sub-item 2.3</li>
   </ul>
</li>
<li>Item 3
<ul>
      <li>Sub-item 3.1</li>
      <li>Sub-item 3.2</li>
      <li>Sub-item 3.3</li>
   </ul>
</li>
</ul>

In all of the above cases, the original code passed to EE is already standard XHTML, so nothing needs to be added or modified. It’s very useful when laying out code for clarity to be able to indent, without this causing the addition of spurious code in the parse.

I see a discussion of the issue where non-breaking spaces are added in the bug tracker from about three years ago, but it doesn’t look like it was formally accepted as a bug. My reading of the XHTML spec is that the only valid content inside <ul> and <ol> tags is <li> list items, which would mean that the addition of non-breaking spaces actually breaks XHTML that’s already valid.
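To illustrate that reading of the spec, here is a rough sketch of a checker that flags anything other than whitespace or <li> elements sitting directly inside a <ul> or <ol>. It uses Python's standard `html.parser` module; `ListContentChecker` and `find_list_problems` are hypothetical names, and the sketch assumes reasonably well-formed markup. Note that a non-breaking space (U+00A0) is not XML whitespace, so it is reported as text content.

```python
from html.parser import HTMLParser

# XML whitespace only. U+00A0 (non-breaking space) is deliberately excluded,
# which is why the check below does not use a bare str.strip().
XML_WHITESPACE = " \t\r\n"


class ListContentChecker(HTMLParser):
    """Collects anything other than <li> found directly inside <ul>/<ol>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; arrives as U+00A0
        self.stack = []     # names of the currently open elements
        self.problems = []  # (parent tag, offending item) pairs

    def _directly_in_list(self):
        return bool(self.stack) and self.stack[-1] in ("ul", "ol")

    def handle_starttag(self, tag, attrs):
        if self._directly_in_list() and tag != "li":
            self.problems.append((self.stack[-1], "<%s>" % tag))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to (and including) the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self._directly_in_list() and data.strip(XML_WHITESPACE):
            self.problems.append((self.stack[-1], repr(data)))


def find_list_problems(markup):
    """Return a list of (parent tag, offending item) pairs for the markup."""
    checker = ListContentChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


# Indented with ordinary spaces: nothing to report.
print(find_list_problems("<ul>\n   <li>A</li>\n</ul>"))
# Hypothetical output with non-breaking spaces before the list item.
print(find_list_problems("<ul>\n&nbsp;&nbsp;&nbsp;<li>A</li>\n</ul>"))
```

The second input is an assumed illustration of the behaviour described above, not captured EE output, since the non-breaking spaces did not survive in the examples in this report.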

  • Dammit. Clearly the above doesn’t make much sense as displayed. I tried to set off all of the code using markdown, but it obviously hasn’t worked. A preview option would be great!

    Paul Bailey
    10th July, 2017 at 5:55pm
  • Okay, my apologies, but this is a bit meta. Looking at the source for this page, my impression is that parsing oddities here are making it hard for me to discuss parsing oddities in EE. Inside code blocks, paragraph tags seem to have been stripped away entirely (why?), rather than htmlentity-ified, and non-breaking spaces also haven’t been htmlentity-ified, causing them to render invisibly. Paging Kafka.

    Paul Bailey
    10th July, 2017 at 6:21pm
  • I see this has been accepted already, which is great. Thank you. Because of the rendering issues in this report, I just wanted to be completely clear about the missing pieces:

    • The first part of the report concerns spurious paragraph tags added to main list items when there are nested lists. The paragraph tags that have been added by EE don’t show up in the code examples I’ve included.

    • The second part of the report concerns spurious non-breaking spaces that are added before list items when they’re indented. Again, these don’t show up in my code examples, but the text descriptions probably make it clear what I’m showing.

    Anyway, thanks again.

    Paul Bailey
    12th July, 2017 at 7:11pm
