Autogenerating an excerpt when a summary doesn’t exist
Posted: 22 February 2007 02:28 AM
Research Assistant
Total Posts:  361
Joined  01-13-2004

My real question is simply this: is there any built-in way, or do you know of a plugin, to auto-generate an excerpt of a certain maximum length from a field? I swear there was a way to do it somewhere, but searching the documentation, the interface itself, and the pMachine site has been fruitless.

My end goal is to do something like the following:

{if summary}
  {summary}
{if:else}
  {body excerpt="200"}
{/if}

That is, use the body truncated to 200 characters if no summary has been entered. Ideally the excerpt generator would break intelligently at word boundaries when dealing with languages written in Roman characters.
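A minimal sketch of the behaviour asked for here, in Python rather than as an ExpressionEngine plugin (the `excerpt` function and its `suffix` default are hypothetical): cut to a character limit, then back up to the previous whitespace if the cut lands mid-word. Text with no spaces, such as Japanese, falls back to a hard cut at the limit.

```python
def excerpt(text: str, limit: int = 200, suffix: str = "…") -> str:
    """Return text cut to at most `limit` characters (plus `suffix`),
    backing up to the previous whitespace when the cut lands mid-word."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(text) <= limit:
        return text  # short enough already; no suffix
    cut = text[:limit]
    # The cut is mid-word only if the characters on both sides of it are non-space.
    mid_word = not text[limit - 1].isspace() and not text[limit].isspace()
    if mid_word:
        # Drop the partial last word; rsplit(None, 1) splits on the last whitespace run.
        parts = cut.rsplit(None, 1)
        if len(parts) == 2:
            cut = parts[0]
        # No whitespace at all (e.g. Japanese): keep the hard cut at `limit`.
    return cut.rstrip() + suffix


print(excerpt("The quick brown fox jumps", 12))  # The quick…
print(excerpt("日本語の文章です", 3))             # 日本語…
```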

 Signature 

———-
1.6.4— Build: 200800808

Posted: 22 February 2007 04:01 AM   [ # 1 ]
Moderator
Total Posts:  16190
Joined  05-15-2004

You’ll have to use a plugin for that. There are several; my vote goes to Word Limit Plus.

 Signature 

Everything will be good in the end. If it’s not good, it’s not the end.

Posted: 22 February 2007 06:20 AM   [ # 2 ]
Research Scientist
Total Posts:  6086
Joined  08-04-2002

The plugin I’ve used is Filter-HTML.

Posted: 22 February 2007 08:45 AM   [ # 3 ]
Moderator
Total Posts:  33276
Joined  05-14-2004

Just a note: Filter HTML has been abandonware for well over a year, possibly approaching two. It’s also extremely complicated and, in its current state, doesn’t work with several versions of PHP. I’d suggest avoiding it. =)

Posted: 22 February 2007 09:09 AM   [ # 4 ]
Research Scientist
Total Posts:  6086
Joined  08-04-2002

Ahhh… I was aware of that and should have noted it too. It seems to be working where I’m using it, but others should be wary, I guess.
Thanks LJ!

Posted: 25 February 2007 07:37 PM   [ # 5 ]
Research Assistant
Total Posts:  361
Joined  01-13-2004

Thanks, everyone. I’ll look into Word Limit Plus.

Posted: 25 February 2007 08:36 PM   [ # 6 ]
Research Assistant
Total Posts:  361
Joined  01-13-2004

OK, Word Limit Plus works great for the English pages of the site, but since it splits on words, Japanese text (which has no spaces) is effectively unlimited. A search through some other plugins left me with these options:
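To see why a word-based limit never shortens Japanese, here is a hypothetical `word_limit` in Python that, like a word limiter, treats runs of non-whitespace as words. It illustrates the idea only; it is not Word Limit Plus’s code.

```python
def word_limit(text: str, max_words: int, suffix: str = "…") -> str:
    """Keep the first `max_words` words, where a 'word' is a run of
    non-whitespace characters (the same assumption a word limiter makes)."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + suffix


english = "Word limiting works fine on English text like this"
japanese = "日本語の文章には単語の間にスペースがありません"

print(word_limit(english, 4))               # Word limiting works fine…
print(len(japanese.split()))                # 1: the whole sentence is one "word"
print(word_limit(japanese, 4) == japanese)  # True: nothing was cut
```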

1) Update the Filter-HTML plugin so that it works with more modern versions of PHP (in this case I’d need to see if I can get permission from the author to start maintaining the code).
2) Extend the character limiter plugin to insert the “more” link for me.
3) Write my own plugin.
4) Get a fantastic alternative plugin solution from someone on this thread.

Right now, without having looked at any code, option 2 looks the easiest. Any opinions or suggestions?

Posted: 26 February 2007 12:25 AM   [ # 7 ]
Research Assistant
Total Posts:  361
Joined  01-13-2004

I reimplemented the char_limit plugin so that I can specify the text to append on truncation and so that it is multibyte-aware, and that seems to be working for me.
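A rough illustration of what “multibyte aware” means, assuming UTF-8 text. In PHP terms this is the difference between the byte-based `substr()` and mbstring’s `mb_substr()`. The Python below uses a hypothetical `char_limit` with an `append` parameter for the truncation text; it counts characters, while the contrasting `byte_limit` shows how cutting raw bytes can split a character.

```python
def char_limit(text: str, limit: int, append: str = "...") -> str:
    """Cut `text` to `limit` characters. Python str indexes by Unicode
    code point, so a multibyte character is never split in half."""
    if len(text) <= limit:
        return text  # nothing removed, so nothing appended
    return text[:limit] + append


def byte_limit(text: str, limit: int) -> bytes:
    """For contrast: cutting the UTF-8 bytes, as a byte-oriented
    substr() would, can slice a character in the middle."""
    return text.encode("utf-8")[:limit]


print(char_limit("日本語のテキスト", 3, "…"))  # 日本語…
print(len("日本語".encode("utf-8")))          # 9 (3 bytes per character)
try:
    byte_limit("日本語", 4).decode("utf-8")
except UnicodeDecodeError:
    print("byte cut split a character")
```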

Posted: 26 February 2007 05:35 PM   [ # 8 ]
Lab Technician
Total Posts:  1841
Joined  03-26-2006

I’m having a real dilemma making decent excerpts. I feel that Filter HTML is way too intense. However, my client often puts an image as the first thing in his post, and he also uses numerous tags like <strong> and <em>. A plugin like Word Limit Plus will not strip these out. So, if a tag is chopped in half… you have a page that doesn’t validate AND looks ugly because, for example, a strong tag won’t be closed. Also, an image will still come through in the excerpt.

The perfect solution would be a plugin called, oh, I don’t know… EExcerpt, which is similar to Filter HTML (but not as bloated, I guess) and a little more powerful than word_limit_plus.

Gah…the excerpts on the site I’m building are killing me!

 Signature 

ryan masuga
—————
Masuga Design | Member, EE Pro Network
My EE Add-Ons | {devot:ee}
Twitter: masuga | masugadesign | devot_ee

Posted: 28 February 2007 05:08 PM   [ # 9 ]
Research Assistant
Total Posts:  873
Joined  05-14-2004
UltraBob - 26 February 2007 12:25 AM

I reimplemented the char_limit plugin to let me specify the text to append on truncation and to make it multibyte aware and that seems to be working for me.

(Hmm, since installing the textarea resizer Greasemonkey script, my return key doesn’t work in here.) I’m looking for a character limiter that will truncate to a number of characters and then put ... at the end. No need for a link or anything. Bob, care to share your improvement? Would it do what I’d like it to? Thanks in advance.

Posted: 28 February 2007 06:13 PM   [ # 10 ]
Research Assistant
Total Posts:  361
Joined  01-13-2004

Hi David,
Sure, but be warned: it has no HTML awareness, so, just as Masuga was describing, if someone starts a <strong> tag and we reach the limit before the strong tag is closed, you could end up with the whole rest of the page styled strong. For that reason this plugin is really only appropriate where people actively check their work and fix things when they go astray. I’ll make sure the code is clean (I think I still need to write the usage text) and send it to you. Later on, given proper permission, I’ll release it generally.

Posted: 28 February 2007 06:33 PM   [ # 11 ]
Research Assistant
Total Posts:  361
Joined  01-13-2004

PM’d it to you.

Posted: 28 February 2007 06:41 PM   [ # 12 ]
Lab Technician
Total Posts:  1841
Joined  03-26-2006

I did end up making a plugin called eexcerpt (see thread here) that strips out all HTML tags and then does the word limit thing. Works wonderfully.

{exp:eexcerpt if_exceeds="42" stop_after="40" the_link='<br /><p class="more_link"><a href="{url_title_path=weblog/article}">More</a></p>'}{body}{/exp:eexcerpt}

Makes:

This is the sample text that will get chopped...
More


You could also just leave out the the_link parameter, which would leave you with only an ellipsis. It works exactly like word_limit_plus, but strips tags.
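The eexcerpt code itself wasn’t posted, so this is only a hypothetical Python re-creation of the described behaviour. It assumes `if_exceeds` means “truncate only if there are more than this many words”, `stop_after` is the number of words kept, and `the_link` is appended after the ellipsis.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")  # any tag: <img ...>, <strong>, </p>, ...


def eexcerpt_sketch(body: str, if_exceeds: int, stop_after: int, the_link: str = "") -> str:
    """Hypothetical re-creation: strip all tags, then limit by words."""
    # 1. Remove every tag first, so an image or a half-cut <strong> can't leak through.
    text = TAG_RE.sub("", body)
    words = text.split()
    # 2. Only truncate when the text is longer than `if_exceeds` words.
    if len(words) <= if_exceeds:
        return " ".join(words)
    # 3. Keep `stop_after` words, add an ellipsis, then the optional link markup.
    return " ".join(words[:stop_after]) + "..." + the_link


body = ('<p><img src="photo.jpg" alt="" /><strong>This is</strong> the sample '
        'text that will get chopped off somewhere around here.</p>')
print(eexcerpt_sketch(body, 10, 8))  # This is the sample text that will get...
```

Stripping before counting means tag text such as `<img src=...>` never uses up the word budget, and nothing half-open can leak into the page. The trade-off is that adjacent blocks can run together once their tags are gone.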

I don’t have permission to share it yet, so I can’t right now… but if you’re interested, I can try to PM the right people to see if it’s OK for me to share.

I could probably modify the char_limit plugin in the same way pretty easily…

Posted: 28 February 2007 06:45 PM   [ # 13 ]
Research Assistant
Total Posts:  361
Joined  01-13-2004

Masuga,

Sounds great! It would be wonderful if it had an option for limiting characters instead of words too. Then it would be really useful for languages like Japanese as well. Any chance of that happening? (I love the idea of stripping out the HTML tags. Does it do anything to replace line breaks that would get stripped out?)

Posted: 28 February 2007 07:08 PM   [ # 14 ]
Lab Technician
Total Posts:  1841
Joined  03-26-2006
UltraBob - 28 February 2007 06:45 PM

Masuga,

Sounds great!  It would be wonderful if it had an option for limiting characters instead of words too.  Then it seems like it would be really useful for languages like Japanese too.  Any chance of that happening?  (love the idea of stripping out the html tags, does it do anything to replace line breaks that would get stripped out?

You mean keeping anything like ‘<br />’ or ‘<br/>’ or ‘<br>’? Is that what you mean? Uh, maybe. I’m no PHP genius. I just needed something to strip all tags and limit characters, and that didn’t have the mega overhead of Filter_HTML!

I’m sure it would be relatively easy to make a new parameter like ‘units’ and have it be either words or characters. I could probably make a ‘keep’ parameter too, like keep="<br />", but now we’re getting into another realm… like Filter_HTML_Lite.

Posted: 28 February 2007 07:58 PM   [ # 15 ]
Research Assistant
Total Posts:  361
Joined  01-13-2004

It seems to me it would be appropriate for it to identify paired HTML elements that generate line breaks, such as <p> and <h1> through <h6>, and keep them, while generating a closing tag for any of those elements that got clipped in the middle.

So I guess my logic would look something like this (just off the top of my head, so I’m sure it isn’t the most elegant way):

1) Strip out all HTML tags once and calculate the cut point on the remaining text.
2) Map that cut point back onto the original (tagged) string.
3) Search out <p> and <hx> tags and convert the < > to #% %# or something, so a p tag pair becomes #%p%# #%/p%#. Sample regex for the matching pairs:

#<(p|h[1-6])\b[^>]*>.*?</\1>#si

4) For any tag pair that has an opening tag but no closing tag before the end of the string, add the closing tag to the end of the string (using the new bracket alternatives, of course). Sample regex for the opening tags:

#<(p|h[1-6])\b[^>]*>#i

5) Search out <br /> tags and replace their brackets as well. Sample regex:

#<br\s*/?\s*>#i

6) Strip out any remaining HTML tags.
7) Replace #%(.*?)%# with <$1> to restore the brackets.
8) Output.

The regexes are untested and the logic is probably flawed, but hopefully it gives some inspiration. It was very hard to resist the temptation to try to write the whole thing. Must... stay... on... task.
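The steps above can be sketched in Python, with one swap: instead of the #% %# placeholder substitution, this hypothetical `block_excerpt` makes a single pass over a regex tokenizer. It keeps <p>, <h1> through <h6> and <br>, drops every other tag, counts only visible characters, and closes any kept tags the cut left open. Assumptions: attributes on kept tags are discarded, entities such as &amp; count as several characters, and HTML comments are not handled.

```python
import re

# Paired block tags the post proposes to keep; <br> is kept as a void tag.
KEEP_PAIRED = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}
# A token is a start/end tag, a run of text, or a stray "<".
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+|<")


def block_excerpt(html_text: str, limit: int, suffix: str = "…") -> str:
    """Keep <p>, <h1>-<h6> and <br>, drop all other tags, cut after `limit`
    visible characters, and close any kept tags left open by the cut."""
    out = []        # pieces of the excerpt, joined at the end
    open_tags = []  # stack of kept tags that are open at this point
    remaining = limit
    for m in TOKEN_RE.finditer(html_text):
        name = m.group(2)
        if name:  # it is a tag
            name = name.lower()
            closing = m.group(1) == "/"
            if name == "br":
                out.append("<br />")
            elif name in KEEP_PAIRED:
                if not closing:
                    open_tags.append(name)
                    out.append(f"<{name}>")  # attributes are dropped
                elif open_tags and open_tags[-1] == name:
                    open_tags.pop()
                    out.append(f"</{name}>")
            # any other tag (img, strong, em, a, ...) is simply dropped
            continue
        text = m.group(0)  # visible text
        if len(text) > remaining:
            out.append(text[:remaining] + suffix)
            break
        out.append(text)
        remaining -= len(text)
    # Close tags clipped mid-element, innermost first, so nesting stays valid.
    out.extend(f"</{tag}>" for tag in reversed(open_tags))
    return "".join(out)


sample = '<h2>Subtitle</h2><p>Here is a <strong>bunch</strong> of paragraph text.</p>'
print(block_excerpt(sample, 20))  # <h2>Subtitle</h2><p>Here is a bu…</p>
```

Because this cuts by characters, it can stop mid-word; combining it with a word-boundary back-off would be a separate step.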

One more post and I go from Junior Member to full-fledged Member

Posted: 28 February 2007 08:23 PM   [ # 16 ]
Lab Technician
Total Posts:  1841
Joined  03-26-2006

My eexcerpt plugin strips everything out first, then counts based on what’s left over. And in my case, I’ll never use it on a heading plus another element… always on a body element or basic block of text. I generally think of an excerpt as being taken from a block of text rather than from a heading and a block of text. Can you give an example of what you’re thinking? Like a sample excerpt?

I’m sure yours could be frankensteined together. I’d use code from strip_HTML, character_count, word_count_plus, and php.net and put it all together and do the hokey pokey. I’m sure it could be done.

Are <hx>, <p>, and <br /> tags the only things you would preserve?

Posted: 02 March 2007 01:23 AM   [ # 17 ]
Research Assistant
Total Posts:  361
Joined  01-13-2004

Sorry for the delayed reply.

I use an excerpter to generate a summary when the user doesn’t give one, so I can’t say for certain what markup they might use. Really, it seems like it would be good to keep all the HTML and close any open tags, but any block-level element has the potential to really screw up the way things look. For example, if the user entered

<h2>Subtitle</h2><p>Here is a bunch of paragraph text that would get cut in the middle.</p>

With the tags stripped, it would end up looking like “SubtitleHere is a bunch of…”, which is, of course, undesirable. If I were to implement it (and if I ever find some time and you don’t do it, I might), I would add one parameter to choose truncation by word or by character, and another to choose between stripping all HTML, stripping only inline elements, or leaving the HTML in place (I’m not sure about that middle option; it might be a lot of extra work for no good reason). The biggest difficulty seems like it would be properly nesting unbalanced tags. Stripping all HTML would be great when generating an excerpt for a tell-a-friend e-mail or something like that, and the leave-HTML option would generally be preferable on a web page, at least for me.
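For the strip-all-HTML mode, and the earlier question about line breaks disappearing, one simple fix is to turn block boundaries into whitespace before removing tags. A hypothetical Python sketch; the list of block elements is an assumption, not a complete set:

```python
import re

# Closing tags of common block elements, plus <br>, mark a visual break.
# This list is an illustrative assumption, not a complete set.
BLOCK_BREAK_RE = re.compile(
    r"</(?:p|h[1-6]|li|div|blockquote)\s*>|<br\s*/?\s*>", re.IGNORECASE
)
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_naive(html_text: str) -> str:
    """Remove tags outright, the way a plain strip-tags pass does."""
    return TAG_RE.sub("", html_text)


def strip_tags_keep_breaks(html_text: str) -> str:
    """Remove tags, but leave a space where a block ended or a <br> was,
    so neighbouring blocks don't run together."""
    text = BLOCK_BREAK_RE.sub(" ", html_text)  # block boundary -> space
    text = TAG_RE.sub("", text)                # drop all remaining tags
    return " ".join(text.split())              # collapse runs of whitespace


sample = "<h2>Subtitle</h2><p>Here is a bunch of text.</p>"
print(strip_tags_naive(sample))        # SubtitleHere is a bunch of text.
print(strip_tags_keep_breaks(sample))  # Subtitle Here is a bunch of text.
```

For a plain-text tell-a-friend e-mail, substituting "\n" instead of a space (and not collapsing newlines) would preserve the line structure.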
