Okie dokie, tested out on a live site of mine, caught an upgrade hitch or two, and I think that's that. Update will be live in the Marketplace a little bit after this post, writing it up here first before doing the Marketplace bit.
Version 5 Released!
Option to disable links for guests/crawlers on the Search Wall and front side widgets and search results page recent searches block. The searches still display but just as words, not followable links.
All front side links on the Search Wall, widgets, and results page recent searches block now carry rel=nofollow attributes, not that any bad actor crawlers care about such things...
When deciding whether to save a search, anything that at least publicly claims it is a crawler should now get swept out. This is above and beyond the built-in IPS bot flagging. User agents are swept for matches on bot, crawler, spider, spyder, and totally empty user agent strings.
Searches are now storing ip addresses for all guests (and bots that get through) and temporarily for searches coming from members.
User agent strings are stored for all searches. Future stats based on browser type, desktop vs. mobile, now possible. Also useful for bot hunting!
Geolocation now stored for all searches (minus any addresses returned). Viewing a search in the ACP search ledger will display a map of where the search was made from. FYI geolocation is an IPS service that only works for active licenses.
Tooltips active for the guest/member icons in the search ledger. Ledger also has been reformatted.
Duplicate searches based on case-insensitive match on the EXACT term (and term alone) are now no longer stored if coming from the same ip address within the last 20 minutes.
Need to wipe the ledger and start over again? There's a button for that!
General code improvements, abstractions, template changes, new templates...
NEW! My Recent Searches feature for members! Last five searches made available from user menu. Also includes running count of number of searches made by member. This is entirely browser side via cookies. That means it is locked to the user's browser and not the user's member account and results will vary between devices. Nothing tracked internally to tie searches back to members in the ACP.
Before discussing the above, you should probably know that I'm likely at the limit of what can be done without storing member ids with searches directly, as opposed to abstracting out to "search made by a member". There certainly is an end user case to be made for storing them - looking back at all of the searches you have made, and it would be available across all devices, for one. Of course, once that's opened up as a feature that means GDPR and other compliance hilarity including the ability to dump out a user's searches on demand, nulling them on demand, and so on and so forth. You'd have to update your privacy junk, etc. I'm not averse to it per se but it is a bunch of work I'd rather not get into yet; there's also the creep factor depending on the type of site you run.
Let's talk about the bot stuff first. As stated before, my guess is it was a bad actor crawler - the type that does not respect any noindex or nofollow tags. The search wall has always had a noindex output on it but, again, it requires the bot to give a damn about those things... The end result meant I had to make some changes. The nofollow tags on all the front end links are useful, but again, only useful for those bots following the rules. I guess if things are really bad for a site we need to flat out disable the links on the front side stuff for guests/bots. On the storing search side, I still need to deal with it as stuff can still get through. For bot purposes IPS does flag out some bots, but it isn't that robust - that means I can use it as a front line check but need to do more. Alright, so let's start pulling user_agent strings into this and sweep them for some obvious things. I've got them so might as well store them and use that for search analysis as well later on (mobile vs. desktop, etc...).
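For anyone curious what that kind of sweep looks like, here is a minimal sketch in Python. It is not the app's actual code (the app runs on IPS); the function name and the flagged_by_platform argument are made up for illustration, with the latter standing in for whatever the built-in IPS bot flagging reports.

```python
import re

# The substrings named in the changelog: bot, crawler, spider, spyder.
BOT_MARKERS = re.compile(r"bot|crawler|spider|spyder", re.IGNORECASE)


def looks_like_bot(user_agent, flagged_by_platform=False):
    """Return True if a search should be treated as coming from a bot.

    flagged_by_platform is a hypothetical stand-in for the platform's own
    bot detection, used here as the front line check.
    """
    if flagged_by_platform:
        return True
    # A totally empty (or missing) user agent string counts as a bot.
    if user_agent is None or not user_agent.strip():
        return True
    # Otherwise sweep the string for the obvious self-declared markers.
    return BOT_MARKERS.search(user_agent) is not None
```

Plain substring matching is blunt on purpose. It only catches crawlers that announce themselves, and it can also flag a legitimate user agent that happens to contain one of those letter sequences.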
What about dupe searches? I could store member_ids but what about guests and bots? IP addresses it is then... And so on and etc... You can see once this door opened it just led on and on...
Geolocation was always on the road map and with the ip address and user_agent storing for bot hunting purposes it seemed a good time to add. The idea here is if you think you've got a bot sliding into the stored searches you can take a quick peek at the ledger, click one of the suspect searches, see the ip address and user agent string and see if it is skeezy or not. Check the ip address against known bot lists you can get online, or if running an active license, viewing the search in the ledger will bring up the map and if it is a bot it will display your favorite India/Russia/China/SE Asia, Eastern Euro location 🥰 - Note to self, maybe button to spot check for bot in the search ledger? Hmmm... Geolocation has the potential to return an address - no worries, I'm not storing that.
This details view from the search ledger will get overhauled - bit rough as is, I know.
Naturally, with geolocation, there are some tasty stats/displays available in future versions of Social Search... Remember! Geolocation is an IPS service for ACTIVE licenses. Not active, no geo. That just means if it is not available when I store the searches, it doesn't get stored and won't be displayed. Will affect any stat stuff you do with geo in future versions if you drop in and out of active status though. Is what it is. Moving on...
IP address storage is forever (as long as you are storing searches) for guests/bots. That will of course include members who search while signed out (or people who are guests forever and then sign up) so there is a possibility to match stuff back to them if it's all from the same ip address. I'll think about a task to sweep for member-flagged searches and the IPS known devices stuff and clear on matches but that's a low, low priority honestly. Not a concern for me really. Otherwise, on searches made by members, the id is still not stored, the search is just flagged as coming from a member and then the ip address is stored for a little while. Every six hours a task will sweep through and null out those ip addresses.
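To make the member side of that concrete, here is a rough Python sketch of what such a cleanup task does on each run. The StoredSearch shape and the field names are assumptions for illustration only, not the app's real table layout, and the six hour scheduling is left to whatever task runner is in use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredSearch:
    # Hypothetical record shape; the real ledger columns may differ.
    term: str
    by_member: bool            # only a flag, never a member id
    ip_address: Optional[str]  # kept for guests/bots, temporary for members


def null_member_ips(searches: List[StoredSearch]) -> int:
    """Clear the ip address on every member-flagged search.

    Guest/bot rows are left alone. Returns how many rows were cleared.
    """
    cleared = 0
    for search in searches:
        if search.by_member and search.ip_address is not None:
            search.ip_address = None
            cleared += 1
    return cleared
```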
The dupe searches check kinda got pushed to the front with this bot stuff too - same search over and over again with really strange parameters appended. It can get really messy trying to match the exact parameters of a search being made when the intent is really to just clear out the person working through what is effectively the same search. So if a search comes in from the same ip address over the course of 20 minutes, case insensitive exact match on term, we'll not store it. I'll have that period of time configurable in a later version.
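Roughly, the rule works like the Python sketch below. The names are illustrative, and the recent list of (term, ip, stored_at) tuples is an assumed stand-in for a lookup against the stored searches.

```python
from datetime import datetime, timedelta

# Fixed at 20 minutes for now, per the changelog.
DUPE_WINDOW = timedelta(minutes=20)


def is_duplicate(term, ip_address, now, recent):
    """Return True if this search should be skipped as a duplicate.

    recent is an iterable of (term, ip_address, stored_at) tuples.
    Only the term itself is compared, case-insensitively; any extra
    parameters appended to the search are ignored on purpose.
    """
    needle = term.casefold()
    for old_term, old_ip, stored_at in recent:
        same_ip = old_ip == ip_address
        same_term = old_term.casefold() == needle
        in_window = timedelta(0) <= now - stored_at <= DUPE_WINDOW
        if same_ip and same_term and in_window:
            return True
    return False
```

Matching on the term alone is the design choice here: it sidesteps comparing every odd parameter a bot tacks on and just asks whether this is effectively the same search again.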
With all this going on, or maybe I just caught a similar thing in a side-glance while on a shopping site, I decided to let members see their last five searches. Since we are not storing member searches directly, that means device-based cookies. So if anyone asks, make sure they know it's only by device. If I ever move on to storing member ids with searches, it will be universal then, but not now.
Yes, that's a total search counter in the corner. Cookies are set to five years of life, like that will ever happen... And the large button should clearly indicate they can wipe this out and start over again anytime they like - it's just the device cookie that clears, as, again, there are no specifically tied-to-the-member searches stored anywhere. I'll have settings for this in a later version - feature on/off (it's on automatically now), group permissions so you can use it as a member perk, configurable amount of searches, different display options, etc. Early days here.
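Here is a small Python sketch of the bookkeeping behind a cookie like that: keep the last five terms, bump a running count, and wipe by dropping the cookie. The JSON layout and function names are assumptions for illustration; the actual feature lives in the browser and may store things differently.

```python
import json

MAX_RECENT = 5                                 # last five searches
COOKIE_MAX_AGE = 5 * 365 * 24 * 60 * 60        # roughly five years, in seconds


def record_search(cookie_value, term):
    """Return the new cookie value after one more search.

    cookie_value is the current cookie string, or None/"" if unset.
    """
    if cookie_value:
        state = json.loads(cookie_value)
    else:
        state = {"recent": [], "count": 0}
    # Newest first, trimmed to the five most recent terms.
    state["recent"] = ([term] + state["recent"])[:MAX_RECENT]
    # The running count keeps going even as old terms fall off the list.
    state["count"] += 1
    return json.dumps(state)


def wipe_searches():
    """The wipe button: nothing to clean up anywhere else, so just drop the cookie."""
    return None
```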
Version 6? I'll come back around to this soon enough but there is such a backlog of updates to other stuff... ack. Only so much we can do stats-wise internally, and although nothing is stopping you from dumping the database table and running with it I do have an export-to-csv feature on the list for those who want to do their own statistical analysis with the data with external sites/software.
If you are just reading this junk and thinking of buying, I'll be off the Black Friday 20% in a few hours but will carry on through to New Years at 15% off.
Merry Happy Joyous Whatevers.
EDIT: 115PM Central 12/6/2019 - it's up!
EDIT: Saturday Noon Central 12/7/2019 - small version bump (5.0.2, then 5.0.3) to tackle a small geolocation display hitch in the search ledger. Sometimes the geolocation service returns limited information, such as only lat and long, and I had some should-this-be-displayed checks set on another geo var. Now, if it provided city, region, country, those will display, and if it gave lat/long, the map should display (maps are generated based on lat/long). And another bump four hours later for My Recent Searches.
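The fixed display logic boils down to checking each piece on its own instead of gating everything on one field. A minimal Python sketch, where the dictionary keys are assumed names and not the actual fields returned by the IPS geolocation service:

```python
def geo_display(geo):
    """Decide what the ledger detail view can show for one stored search.

    geo is a dict that may hold any subset of the hypothetical keys
    city, region, country, lat and long.
    """
    # Show whichever of the place fields actually came back.
    place_parts = [geo[key] for key in ("city", "region", "country") if geo.get(key)]
    lat = geo.get("lat")
    lon = geo.get("long")
    return {
        "place": ", ".join(place_parts) or None,
        # The map only needs coordinates, so it no longer depends on place fields.
        "show_map": lat is not None and lon is not None,
    }
```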