
Do links with commas work?


DiskusjonNO


At first we parsed URLs with commas in them.

Then someone submitted a bug report that the following was incorrectly parsed:

I went to http://google.com,and it was great



Unfortunately, the software cannot read minds and know exactly what you're trying to do. If you use the url tag, it works correctly. Otherwise, it's a little conservative for the reason posted above. If we autoparsed commas, you wouldn't be able to post the above sentence properly at all. With the current parsing method, you can have commas parsed or unparsed.



Unfortunately, the software cannot read minds and know exactly what you're trying to do.



I am not a programmer, I barely get some shell scripting down, so maybe I am off base here, but a parsing routine a bit like this should work, no?

It relies on the assumption that a comma appearing within the domain is faulty, but commas appearing behind the TLD / the third slash would indeed be part of the URL.

Parse for the type of URL to determine the service ('http', 'https', 'mailto', 'ftp', ...). Then parse the domain and see if you find either a) a TLD or b) another "/". If it ends with a TLD followed by a comma or something else unexpected, as in the (probably once-in-a-million) example, you could even fix that and correct the output.

(Additionally, you could check if the link is an internal link (linking to some other thread on the forum) and maybe just set it up as a relative link within the site, this would have the additional benefit of not breaking those links when a forum changes domain.)

If there is a slash behind the TLD and more characters to the URL it is pretty safe to assume that the commas embedded are part of the link and should be parsed accordingly.

This is just a very rough sketch of what I think could work if a real programmer put their mind to it for a few hours. Please do not feel miffed at my "lecturing" you, but I think you can apply some logical rules to that parsing process and come out with decent results, WITHOUT reading anyone's mind. But then again I might be totally off base and missing something crucial.
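The rule described above could look roughly like this in Python. It is only a minimal illustration, not how the board's parser works: `URL_RE` and `find_links` are hypothetical names, and only http, https and ftp are handled (mailto has no host/path split, so it is left out).

```python
import re

# Hypothetical pattern for illustration; not the board's actual parser.
URL_RE = re.compile(
    r"(?:https?|ftp)://"   # service type
    r"[^\s/?#,]+"          # host part: stops at whitespace, "/", "?", "#" or ","
    r"(?:[/?#]\S*)?"       # optional rest of the URL: commas are allowed here
)

def find_links(text):
    """Return the substrings that this heuristic would turn into links."""
    return [match.group(0) for match in URL_RE.finditer(text)]

print(find_links("I went to http://google.com,and it was great"))
print(find_links("http://www.spiegel.de/wissenschaft/natur/0,1518,689506,00.html"))
```

A comma directly after the host ends the link, while commas after the first slash are kept. A sentence that continues straight after a path, with no space after the comma, would still be swallowed into the link.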


If you use the url tag, it works correctly. Otherwise, it's a little conservative for the reason posted above. If we autoparsed commas, you wouldn't be able to post the above sentence properly at all. With the current parsing method, you can have commas parsed or unparsed.

  • Management

Of course there are ways to do it programmatically, but that is a considerable slice of logic and server CPU for a trivial feature.

The board takes a fair stab at auto parsing. Simply use the URL tags or click the editor's link icon to ensure a correct link in a post.


What about only ignoring the comma if there's a space right after it? Sure, it'll still get false positives, such as when someone types in something like http://immagoober.com,rocks! or something like that, but should work otherwise.

Also, shouldn't this be in the feedback forum or customer lounge? :P


  • Management

I think the not-so-subtle dig was to point out that we (IPS) are adding superfluous fluff and not 'fixing' core issues.

As mentioned above, though, it's not something that we're able to fix without writing pages of code to do so.


Considering how hard it is to properly determine what is or is not a URL or even part of a URL, I wouldn't call it a core issue myself. Just a matter of doing it one way vs another. So I personally think that little "dig" doesn't deserve much credit.

Even other applications have trouble determining what is or is not a link, so I think it's unfair to expect IPS to miraculously know how to do it when it would truly require mind reading and even an A.I. within the software.



At first we parsed URLs with commas in them.



Then someone submitted a bug report that the following was incorrectly parsed:



I went to http://google.com,and it was great



Unfortunately, the software cannot read minds and know exactly what you're trying to do. If you use the url tag, it works correctly. Otherwise, it's a little conservative for the reason posted above. If we autoparsed commas, you wouldn't be able to post the above sentence properly at all. With the current parsing method, you can have commas parsed or unparsed.



Personally, I think the above should parse as a link. Most people know (or should be pushed towards learning) that links need a space before and after.

We have to determine what 'most' people do, and from my experience, I would say most people simply paste a url into their post - a tiny fraction use the URL bbcode or click on 'link' and add it that way. :)


I am not a programmer, I barely get some shell scripting down, so maybe I am off base here, but a parsing routine a bit like this should work, no?



It relies on the assumption that a comma appearing within the domain is faulty, but commas appearing behind the TLD / the third slash would indeed be part of the URL.



Parse for the type of URL to determine the service ('http', 'https', 'mailto', 'ftp', ...). Then parse the domain and see if you find either a) a TLD or b) another "/". If it ends with a TLD followed by a comma or something else unexpected, as in the (probably once-in-a-million) example, you could even fix that and correct the output.



(Additionally, you could check if the link is an internal link (linking to some other thread on the forum) and maybe just set it up as a relative link within the site, this would have the additional benefit of not breaking those links when a forum changes domain.)



If there is a slash behind the TLD and more characters to the URL it is pretty safe to assume that the commas embedded are part of the link and should be parsed accordingly.



This is just a very rough sketch of what I think could work if a real programmer put their mind to it for a few hours. Please do not feel miffed at my "lecturing" you, but I think you can apply some logical rules to that parsing process and come out with decent results, WITHOUT reading anyone's mind. But then again I might be totally off base and missing something crucial.




If you use the url tag, it works correctly. Otherwise, it's a little conservative for the reason posted above. If we autoparsed commas, you wouldn't be able to post the above sentence properly at all. With the current parsing method, you can have commas parsed or unparsed.




What about this?

I searched http://www.google.com/#hl=en&q=test,but it didn't work



You're still making a presumption, and one which won't always be correct. Note the above link gives two entirely different results whether we parse with or without comma.
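To make the difference concrete, here is a small sketch using Python's standard `urllib.parse` module (chosen only for illustration; it is not what the board runs). The two candidate links differ in the fragment that the browser would receive:

```python
from urllib.parse import urlsplit

# The same pasted text, cut at two different places.
with_comma = urlsplit("http://www.google.com/#hl=en&q=test,but")
without_comma = urlsplit("http://www.google.com/#hl=en&q=test")

print(with_comma.fragment)     # hl=en&q=test,but
print(without_comma.fragment)  # hl=en&q=test
```

Both strings are syntactically acceptable URLs, so nothing in the text itself says which one the poster meant.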



Personally, I think the above should parse as a link. Most people know (or should be pushed towards learning) that links need a space before and after.



We have to determine what 'most' people do, and from my experience, I would say most people simply paste a url into their post - a tiny fraction use the URL bbcode or click on 'link' and add it that way. :)




You're making a big presumption there. When it was only techies online, sure. There are a LOT of people online now that don't "know" a link needs a space before adding a comma, especially since English class tells you to do the opposite of what you're saying (comma should "touch" the last "word" there).


No matter what way we go with this, whether we parsed it or didn't, one camp is going to call it a bug. As I said before, if we auto parsed the comma too, those that think it shouldn't would have no recourse except to alter how they actually write out the text. At least as it is, you can use a bbcode to achieve the result you want.


You're making a big presumption there. When it was only techies online, sure. There are a LOT of people online now that don't "know" a link needs a space before adding a comma, especially since English class tells you to do the opposite of what you're saying (comma should "touch" the last "word" there).




No matter what way we go with this, whether we parsed it or didn't, one camp is going to call it a bug. As I said before, if we auto parsed the comma too, those that think it shouldn't would have no recourse except to alter how they actually write out the text. At least as it is, you can use a bbcode to achieve the result you want.



We were also taught to put a space after a comma, not to have the next word touching it. I still say that when a link is auto parsed, if the last part of it is a comma, to trim that from the link. That would probably settle the problem for most people.
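The trimming idea above could be sketched like this in Python. It is a rough illustration under stated assumptions: `CANDIDATE_RE` and `find_links_trimmed` are hypothetical names, the link is assumed to run until the next whitespace, and only http and https are matched.

```python
import re

# Hypothetical: a link candidate runs from the scheme to the next whitespace.
CANDIDATE_RE = re.compile(r"https?://\S+")

def find_links_trimmed(text):
    """Find link candidates and drop any commas at the very end of each one."""
    return [m.group(0).rstrip(",") for m in CANDIDATE_RE.finditer(text)]

print(find_links_trimmed(
    "http://www.spiegel.de/wissenschaft/natur/0,1518,689506,00.html, but"
))
print(find_links_trimmed("I went to http://google.com,and it was great"))
```

Commas inside the URL survive and a comma followed by a space is dropped. The second call shows the limit of the approach: when the next word touches the comma, the word is still pulled into the link.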


What about this?



I searched http://www.google.com/#hl=en&q=test,but it didn't work



You're still making a presumption, and one which won't always be correct. Note the above link gives two entirely different results whether we parse with or without comma.




You're making a big presumption there. When it was only techies online, sure. There are a LOT of people online now that don't "know" a link needs a space before adding a comma, especially since English class tells you to do the opposite of what you're saying (comma should "touch" the last "word" there).


No matter what way we go with this, whether we parsed it or didn't, one camp is going to call it a bug. As I said before, if we auto parsed the comma too, those that think it shouldn't would have no recourse except to alter how they actually write out the text. At least as it is, you can use a bbcode to achieve the result you want.





We were also taught to put a space after a comma, not to have the next word touching it. I still say that when a link is auto parsed, if the last part of it is a comma, to trim that from the link. That would probably settle the problem for most people.




What Wolfie said ^^


Of course there are ways to do it programmatically, but that is a considerable slice of logic and server CPU for a trivial feature.



I would not call it trivial, because it takes moderators' time and/or adds to user confusion. And correct me if I am wrong, but parsing the link is a one-time action, right? So I think that bit of CPU load is negligible, although I appreciate that your programmers try to keep things light and lean. Maybe the extent and behavior of the link trimming engine could be made configurable?

The board takes a fair stab at auto parsing. Simply use the URL tags or click the editor's link icon to ensure a correct link in a post.


If it were that simple, I would not complain. As has been pointed out, the internet these days is full of people with a less technical understanding of how things work, and teaching everyone to use URL tags would of course be the favorable way. That is why I feel that software used by people with varying degrees of technical expertise should make an attempt at correcting such mistakes (within a reasonable margin).


I think the not-so-subtle dig was to point out that we (IPS) are adding superfluous fluff and not 'fixing' core issues.



I apologize if my sarcastic comment has been taken for an insult. The remark that this has been a topic for a few months now made me wonder, though, how priorities for feature implementations are handed out. If customer demand requested those backgrounds, fine, go for it, just be sure I can turn them off. If customers are complaining about the way links are parsed, that should be taken into consideration as well. Just pointing to URL tag usage is not the ideal way to handle such a request IMO, because it is an issue generated by less-tech-savvy users.


We have to determine what 'most' people do, and from my experience, I would say most people simply paste a url into their post - a tiny fraction use the URL bbcode or click on 'link' and add it that way. :)



Exactly. And from my forum administration experience, users who paste the URL plain are the majority. However, at least here, most of them post something like

Check this out: http://blafoo.net/reallyfunny.html

and then use a newline to keep typing, or even put the URLs on separate lines, with no typing directly afterwards. I think that parsing the links including commas, up to the first whitespace, would produce fewer errors and more customer satisfaction.

What about this?

I searched http://www.google.com/#hl=en&q=test,but it didn't work

You're still making a presumption, and one which won't always be correct. Note the above link gives two entirely different results whether we parse with or without comma.

Agreed. Of course I am not expecting the code to really "foresee" each and every way a user can screw up the posting of links, but neglecting commas the way it is currently done is not the smartest approach IMO.

No matter what way we go with this, whether we parsed it or didn't, one camp is going to call it a bug. As I said before, if we auto parsed the comma too, those that think it shouldn't would have no recourse except to alter how they actually write out the text. At least as it is, you can use a bbcode to achieve the result you want.

Maybe in such situations it would be wise to poll the customers to see what they consider the lesser evil? I mean, the two links used as examples here (the initial post and the Google one) are really borderline, extreme examples of where parsing hits its limits. But not even this parses right:

http://www.spiegel.de/wissenschaft/natur/0,1518,689506,00.html, but

Maybe a compromise would be to look for "URL delimiters" like html, asp, php, etc. and then allow all commas between the domain and the delimiter or something... I feel there has got to be a way to make the parser "catch" more exceptions without causing headaches with "the other camp". Maybe we just need a feature that detects "untagged" URLs and then slams the user in the face with a popup, requiring him to correct his evildoing. That would certainly settle things in the long run...

(Side note: I even fired up my old phpBB installation to check how it handles this, and of course it will parse

http://www.google.com/#hl=en&q=test,but it didn't work

wrong, but it parses

http://www.spiegel.de/wissenschaft/natur/0,1518,689506,00.html

right out of the box.

http://www.spiegel.de/wissenschaft/natur/0,1518,689506,00.html, but

is also parsed correctly; the comma is recognized as not being part of the URL.

http://www.spiegel.de/wissenschaft/natur/0,1518,689506,00.html,but

fails again.
Just an observation, not comparing or trying to start a religious war here...)


Any URL with a comma should be parsed. If people cannot type correctly, that should not stop correct URLs from showing up. They can easily go back and correct their mistake, whereas in order to correctly display a URL with a comma in it, they need to jump through an additional hoop, which the average user is unlikely to do.

Let's face it, most people do not take advantage of the media button, so getting them to use the URL tag is probably not going to happen either.


I'm with the crowd that thinks commas, so long as they aren't followed by a space, should be included in the URL. For what it's worth, I've noticed far more bug reports about not parsing commas (when you can simply search for a "WAI" bug report to use as a stock response for new ones, it should be a hint...) than when it was parsed.

The way the parser works now it breaks intended functionality in the URL. If changed, it would simply break "functionality" in unintended user behavior, something you guys have used as reasons not to fix other bug reports ;-)


I may be wrong about this, but if a comma is in a URL, shouldn't it be %2c, not ","? By the logic that a comma should be allowed in a URL, you could make the same argument about spaces. But in a properly URL-encoded string a space should be either "+" or "%20" (urlencode vs rawurlencode). A comma is punctuation, and if you follow proper sentence structure, a comma or period has no space before it. Thus it would make sense to end the link parsing just before it, regardless of what comes after it. I rarely see commas in URLs anyway. The only bug I'm concerned about (not sure if it still happens) is when you have square brackets in the URL (which are used in bbcode).



I may be wrong about this, but if a comma is in a URL, shouldn't it be %2c, not ","? By the logic that a comma should be allowed in a URL, you could make the same argument about spaces. But in a properly URL-encoded string a space should be either "+" or "%20" (urlencode vs rawurlencode). A comma is punctuation, and if you follow proper sentence structure, a comma or period has no space before it. Thus it would make sense to end the link parsing just before it, regardless of what comes after it. I rarely see commas in URLs anyway. The only bug I'm concerned about (not sure if it still happens) is when you have square brackets in the URL (which are used in bbcode).





http://community.invisionpower.com/index.php?c=180,308 is one example ;)

It wasn't, but that does not justify the fact that legitimate URLs with commas in them are not being parsed.

Or are you saying that I should not be using underscores or hash signs either and should urlencode them as well? Or perhaps we should not be typing spaces into posts?

It is not down to the person typing a post on the forum to know about urlencoding (which 99% will not); instead, it is down to the software authors to interpret the characters entered correctly to the best of their ability, whether that software be VB, IPB, WordPress or whatever.

It would be far better here to assume that a comma after a link is part of that rather than a soft break in a sentence. It is not the fault of IPB if people cannot form a sentence and put commas into the correct locations.


Underscores, dashes, and even periods don't need to be URL encoded. If you put these characters through a URL encoder (even a raw one) you'll see that they don't convert at all. My point is that commas are not valid characters in a URL. Just because you can put them into the browser address bar raw doesn't make them valid characters. Technically spaces are not valid and you'd either have to substitute it for a +, or encode it raw to %20.
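For anyone who wants to try this, here is a quick check in Python. The post refers to PHP's urlencode and rawurlencode; Python's `quote_plus` and `quote` are used here as rough stand-ins, on the assumption that they behave the same way for the characters discussed.

```python
from urllib.parse import quote, quote_plus

# Space becomes %20 with quote() and + with quote_plus(); the comma becomes %2C.
print(quote("a b,c", safe=""))   # a%20b%2Cc
print(quote_plus("a b,c"))       # a+b%2Cc

# Underscores, dashes and periods pass through unchanged.
print(quote("under_score-dash.dot", safe=""))  # under_score-dash.dot
```

This only shows what an encoder does with a comma; it does not by itself settle whether an unencoded comma is allowed in a URL.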

I get your point, so you don't need to convince me of anything. I'm just pointing out that technically they aren't valid characters in a URL string. If you want to make some sort of logic to make "url,text" part of the url but "url, " not (the comma + space), you have no argument from me. Really though I'd be more concerned about square brackets not working (because they happen to be part of bbcode).


I guess someone should be telling Google then that they should not be using commas in their URLs and should encode them ;)

http://maps.google.co.uk/maps?f=q&hl=en&geocode=&q=the+london+eye&sll=51.505243,-0.114691&sspn=0.007052,0.012767&ie=UTF8&ll=51.503373,-0.11939&spn=0.007052,0.012767&z=16&iwloc=A

The comma is a reserved character and is perfectly legal if used in the query string, as in Google's URL above or http://community.invisionpower.com/index.php?c=180,308 as posted earlier; it just cannot be used in the domain part.

As far as I can gather, the reserved characters are ; / ? : @ & = + $ ,
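As a small check of the query-string case, Python's standard `urllib.parse` (used here only as an example of a common URL library) splits the earlier link without complaint and hands the comma back untouched:

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("http://community.invisionpower.com/index.php?c=180,308")

print(parts.query)            # c=180,308
print(parse_qs(parts.query))  # {'c': ['180,308']}
```

The value of `c` arrives as the single string "180,308", so the comma is treated as ordinary data inside the query.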


Archived

This topic is now archived and is closed to further replies.
