Opened 13 years ago

Last modified 4 years ago

#8377 new defect

Valid acronyms with underlined wiki markup are not tagged as acronyms

Reported by: Ben Allen
Owned by:
Priority: normal
Component: AcronymsPlugin
Severity: normal
Keywords:
Cc:
Trac Release: 0.12

Description (last modified by Ryan J Ollos)

When an acronym is underlined, AcronymsPlugin does not detect it or add the <acronym> tag to it. If the acronym contains other style elements "inside" the underline, then the acronym is tagged as expected.

For example, take the following wiki content:

Acronym test: SCSI __SCSI__ '''SCSI''' ''SCSI'' '''''SCSI''''' __''SCSI''__ __'''SCSI'''__ ''__SCSI__'' '''__SCSI__''' '''__''SCSI''__'''

The following HTML is generated (newlines added for readability):

Acronym test:
<acronym title="Small Computer Simple Interface">SCSI</acronym>
<span class="underline">SCSI</span>
<strong><acronym title="Small Computer Simple Interface">SCSI</acronym></strong>
<em><acronym title="Small Computer Simple Interface">SCSI</acronym></em>
<strong><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></strong>
<span class="underline"><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></span>
<span class="underline"><strong><acronym title="Small Computer Simple Interface">SCSI</acronym></strong></span>
<em><span class="underline">SCSI</span></em>
<strong><span class="underline">SCSI</span></strong>
<strong><span class="underline"><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></span></strong>

which displays as:

Acronym test: SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI

The underlined text was not made into an acronym. The underline + italics and underline + bold cases were, but only if the underline markup was on the *outside* of the bold/italics markup. Curiously enough, the underline + bold + italics case works as long as the underline is not the innermost markup element.

My guess is that the parser is allowing underscores in an acronym and is interpreting the double-underscore as part of the acronym (thus it doesn't match anything in the acronym list so it doesn't get tagged). Stripping off leading and trailing non-alphanumeric characters before comparing the text to the acronym list should fix this problem, but I haven't tried to patch it myself so I can't say for sure.
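A minimal sketch of the stripping idea, assuming the plugin compares one token at a time against a dictionary of acronyms (the later comments suggest it actually registers a regular expression with Trac, so this only illustrates the normalization step). The names `normalize_token` and `lookup_acronym` are hypothetical and not part of AcronymsPlugin:

```python
import re

# Runs of non-alphanumeric characters at the very start or very end of a token.
_EDGE_MARKUP = re.compile(r"^[^A-Za-z0-9]+|[^A-Za-z0-9]+$")


def normalize_token(token):
    """Strip leading and trailing non-alphanumeric characters from a token."""
    return _EDGE_MARKUP.sub("", token)


def lookup_acronym(token, acronyms):
    """Look up the normalized token in a dict of acronym -> description."""
    return acronyms.get(normalize_token(token))


acronyms = {"SCSI": "Small Computer Simple Interface"}
print(lookup_acronym("__SCSI__", acronyms))        # description is found
print(lookup_acronym("'''__SCSI__'''", acronyms))  # description is found
```

Underscores inside a token (for example `RFC_2316`) are left alone; only the edges are stripped.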

Attachments (0)

Change History (9)

comment:1 Changed 13 years ago by Ryan J Ollos

Description: modified (diff)
Priority: normal → high
Status: new → assigned

comment:2 in reply to:  description Changed 13 years ago by Ryan J Ollos

Replying to AllenB:

My guess is that the parser is allowing underscores in an acronym and is interpreting the double-underscore as part of the acronym (thus it doesn't match anything in the acronym list so it doesn't get tagged). Stripping off leading and trailing non-alphanumeric characters before comparing the text to the acronym list should fix this problem, but I haven't tried to patch it myself so I can't say for sure.

Thank you for the detailed report. I'm surprised by this, given [9662]. I'll investigate now and see if a quick fix can be made.

comment:3 Changed 13 years ago by Ryan J Ollos

(In [9740]) Strip trailing whitespace when parsing the AcronymDefinitions page. Previously, whitespace after the end of a row in the table would prevent that row from being parsed. Refs #8377.

comment:4 Changed 13 years ago by Ryan J Ollos

The issue here appears to be similar to what I found in comment:1:ticket:8267.

We implement the IWikiSyntaxProvider method:

    # IWikiSyntaxProvider methods
    def get_wiki_syntax(self):
        if self.compiled_acronyms:
            yield (self.compiled_acronyms, self._acronym_formatter)

__SCSI__ is not passed to the callback _acronym_formatter, so it is not being matched to compiled_acronyms. I'm not sure if this is an internal issue with Trac, or if I can somehow modify compiled_acronyms, which for my test page looks like:

\b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b
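A quick standalone check of the pattern above, outside of Trac, using only Python's `re` module. It is a sketch that tests the regular expression on its own and says nothing about how Trac combines or dispatches syntax patterns. The helper name `find_acronyms` is made up for this illustration:

```python
import re

# The compiled_acronyms pattern from the test page, copied verbatim.
pattern = re.compile(
    r"\b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b"
)


def find_acronyms(text):
    """Return the acronyms the pattern finds in a piece of wiki text."""
    return [m.group("acronym") for m in pattern.finditer(text)]


# "_" is a word character, so there is no \b between "_" and "S".
for sample in ["SCSI", "__SCSI__", "'''SCSI'''", "__''SCSI''__", "''__SCSI__''"]:
    print("%-16s %s" % (sample, find_acronyms(sample)))
```

The pattern matches exactly when a non-word character such as `'` sits directly next to the acronym, and fails when `__` does. That lines up with the cases in the description where the underline is the innermost markup.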

comment:5 Changed 13 years ago by Ryan J Ollos

Priority: high → normal

See also #857. I suspect all these tickets (#857, #8267, #8377) are related.

Since I'm stuck on this at the moment and have a number of other tickets that I know how to fix, I'm dropping the priority and will keep an eye out for a solution as I study the Trac source code.

If you want to do any additional investigation, I'll take quick action on any hints or a patch.

comment:6 in reply to:  4 Changed 13 years ago by Ben Allen

Replying to rjollos:

__SCSI__ is not passed to the callback _acronym_formatter, so it is not being matched to compiled_acronyms. I'm not sure if this is an internal issue with Trac, or if I can somehow modify compiled_acronyms, which for my test page looks like:

\b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b

If Trac isn't passing the text to the callback, then I don't think that it's an error in the plugin. You might be able to work around it, however.

I haven't delved too deeply into the Trac source regarding this, but I suspect that the Trac source uses a similar regular expression. Since the \w character class is equivalent to [A-Za-z0-9_], the regular expression will pick up underscores as a valid part of the word. You might be able to work around this Trac behavior by slightly modifying the _update_acronyms method. Whenever you add an acronym into the self.compiled_acronyms list, add the "underlined version" of the acronym as well. If I'm understanding the source correctly, this would mean changing line 34 from:

self.acronyms[a] = (escape(d), escape(u), escape(s))

to something similar to:

self.acronyms[a] = (escape(d), escape(u), escape(s))
a_2 = "__%s__" % a
self.acronyms[a_2] = (escape(d), escape(u), escape(s))

The drawback is that this would double the length of the self.compiled_acronyms list and would cause Trac to spend more time processing acronyms. You would probably also want something more intelligent for constructing a_2 that will first verify that the acronym doesn't already have leading or trailing underscores or that the list doesn't already contain an acronym with that name.

I haven't had a chance to test this myself, so it's merely a conjecture at this point.
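The "more intelligent" construction of a_2 could be sketched roughly as follows. This is untested against Trac and the plugin; it works on a plain dict, and `add_underlined_variants` is a hypothetical helper name, not an existing method of AcronymsPlugin:

```python
def add_underlined_variants(acronyms):
    """Return a copy of `acronyms` with a "__NAME__" entry added per acronym.

    Names that already start or end with an underscore are skipped, and an
    existing "__NAME__" entry is never overwritten.
    """
    result = dict(acronyms)
    for name, value in acronyms.items():
        if name.startswith("_") or name.endswith("_"):
            continue  # already carries underscores; do not wrap it again
        underlined = "__%s__" % name
        if underlined not in acronyms:
            result[underlined] = value
    return result


acronyms = {"SCSI": ("Small Computer Simple Interface", "", "")}
print(sorted(add_underlined_variants(acronyms)))
```

Even if Trac does hand the longer match to the callback, the matched text would then include the underscores, so the formatter would presumably have to re-emit the underline markup itself.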

comment:7 Changed 13 years ago by Ryan J Ollos

Summary: Underlined acronyms aren't always converted → Valid acronyms with underlined wiki markup are not tagged as acronyms

comment:8 Changed 11 years ago by Ryan J Ollos

Status: assigned → new

comment:9 Changed 4 years ago by Ryan J Ollos

Owner: Ryan J Ollos deleted
