Clover13 Posted February 9 Posted February 9 I'm looking to use a third party API to GET some data that doesn't change often throughout the day and then display it on a Pages page. The API has these considerations: There is no known timeframes when the API data will be updated. We are only concerned with the current snapshot of data, not any history. The API is rate limited, and we certainly don't want to call it for every page load. The amount of data is small but each record does have a start/end date for managing expired records. However, it may be easier to just replace all of the data stored versus trying to perform an upsert operation. What would you recommend as an approach for this when trying to display it within Invision Community? Goals: Retrieve data from third party API to format and display within a Pages page. Leverage caching of the data for performance. Rinse/repeat calling the third party API under set intervals or after expiry, removing all data from storage/cache and replacing it with the current API response data. Possible approaches: Use Zapier OR code a script (scheduled under cron) to GET from the third party API and POST the data to a Pages DB. Build a Pages template to display the data as desired. How to manage data caching in this instance? Write custom code directly in a Pages PHP block and embed that block within a Pages page. I'm assuming the PHP is what's cached in a block not the data, so I can use a separate self managed cache (say Memcached or Redis) to store the data and verify expiry between calls such that the API is only called after expiry of the cache?
BN_IT_Support Posted February 9 Posted February 9 5 hours ago, Clover13 said: Use Zapier OR code a script (scheduled under cron) to GET from the third party API and POST the data to a Pages DB If you're going to go down this path then I would be inclined to use a 'task' within your Invision Application - that way you can use the API to pull data directly into your application rather than pull the data with Zapier or a separate cron script and then post it into your application. An important question - how far out of date can you allow the data to be? If the data must be up to date within 5 minutes then your script will have to run every 5 minutes and if your application does not display all the data frequently then you may well generate more activity pulling data on a schedule than you would pulling it on demand. Is there a way (in the API) to detect data that has been updated - so that you don't have to pull all the data periodically? The most useful thing would probably be if the API includes webhooks that will fire and notify your application if a data item/record is created and when one is updated -- i.e. you then use your webhook to pull data that has changed and don't bother to pull data that has not changed. If there are no webhooks in the API as an alternative will the API let you 'list all records that have been updated in last 10 minutes' (for example) - that way you would use the list of recent updates to select which records to pull (i.e. only those that have been updated since last pull). Finally, I would be inclined to store the data in your own database (i.e. table defined by your application). Effectively, the database table acts as your cache and if your cache is only updated on data change (webhook or whatever) then it would probably be far more efficient than trying to use the Invision cache (which will have its own timeouts that are not necessarily in sync with updates to the data...) John Clover13 1
Clover13 Posted February 12 Author Posted February 12 On 2/9/2024 at 4:46 PM, BN_IT_Support said: If you're going to go down this path then I would be inclined to use a 'task' within your Invision Application - that way you can use the API to pull data directly into your application rather than pull the data with Zapier or a separate cron script and then post it into your application. An important question - how far out of date can you allow the data to be? If the data must be up to date within 5 minutes then your script will have to run every 5 minutes and if your application does not display all the data frequently then you may well generate more activity pulling data on a schedule than you would pulling it on demand. Is there a way (in the API) to detect data that has been updated - so that you don't have to pull all the data periodically? The most useful thing would probably be if the API includes webhooks that will fire and notify your application if a data item/record is created and when one is updated -- i.e. you then use your webhook to pull data that has changed and don't bother to pull data that has not changed. If there are no webhooks in the API as an alternative will the API let you 'list all records that have been updated in last 10 minutes' (for example) - that way you would use the list of recent updates to select which records to pull (i.e. only those that have been updated since last pull). Finally, I would be inclined to store the data in your own database (i.e. table defined by your application). Effectively, the database table acts as your cache and if your cache is only updated on data change (webhook or whatever) then it would probably be far more efficient than trying to use the Invision cache (which will have its own timeouts that are not necessarily in sync with updates to the data...) John Thank you for the detailed response John, I appreciate it! Ideally, I'm trying to avoid a custom app tied to IPS if possible and just use the native Pages impl and customize it within Pages. But perhaps doing it with a custom app is a reasonable approach to, as it would require some level of custom coding anyway. The data can be out of date, it's really just daily data list from an API, of which gets updated at unknown times. I definitely want to pull it on a schedule vs on demand (by a user viewing the page where the data will be presented). There's no API notification of updates that I'm aware of. As far as caching, I guess this comes down to what the Pages DB can do. With that in mind, there are many sites using Pages with a Pages DB and data under heavy user volume, so I'd have to imagine there is some level of data and page caching when a Pages DB is inserted as a block on a page. I'll likely start with that approach before I go custom and see how it works out. Thanks again for the insights!
Nathan Explosion Posted February 12 Posted February 12 Zero knowledge of what API it is that you are referring to wanting to use so can't look at the data structure, and have no idea about how you want it to be displayed but here is what I would be doing on the page/block... Check if the data exists in \IPS\Data\Store If not, retrieve the data via the API and then store the data in \IPS\Data\Store::i()->whateveryourkeyisgoingtobe, adding something to the data to provide a timestamp for it being added. Display the data however it is you wish it to be done. On subsequent loads, step 1 should also be checking if that timestamp should be considered 'out of date' - if it is, get the data again. I don't think you should think about storing the data in a Page DB - I can envision that become overly complex for what it appears that you are trying to do. And if the data store gets cleared, your code would just pull in the data again into the store anyway. Clover13 1
Clover13 Posted February 12 Author Posted February 12 Regarding @Nathan Explosion's reply, for anyone interested in reviewing that approach, this link is helpful: https://invisioncommunity.com/developers/docs/general/caching-r8/
BN_IT_Support Posted February 13 Posted February 13 On 2/9/2024 at 4:11 PM, Clover13 said: I'm assuming the PHP is what's cached in a block not the data, Would I be correct in deducing that you're referring to this (on a custom block)? If so, no, it would be the data that would be cached so you would lose control of how and when it would be refreshed. Certainly, you could use a custom block, but just make sure that you leave 'Cache this block' set to 'No' and then do your own caching. The way that Nathan Explosion suggests is by far the best way... Firstly, there is "how/when you retrieve the data": Scheduled polling would collect all the items even if some have not been updated. Daily polling would work OK if all items are updated every day but if only half the items are updated each day (for example) then you would be retrieving twice as many items as required. If you need the most up to date data for each item then you really need webhooks to notify you of changes - in the absence of webhooks you either set a very short scheduling interval or else use Nathan's solution with a short cache lifetime. You have now told us that you don't mind data which is not completely up to date (say - one day out of date) so webhooks are not essential. Nathan's solution where you load the cache on demand (i.e. only retrieve an item is it is not in your cache OR it is in the cache and already expired) - set the cache timeout to 12 hours or 24 hours or whatever depending how far out of date is OK. If the rate limiting still causes problems then you might want to (need to) randomise the cache lifetime a bit. For example, with a fixed cache lifetime of 1 day if you start with an empty cache and if the users demand all data in a very short time then you might exceed the rate limit. (Not likely, but possible?) With a fixed cache timeout the same thing will happen the next day as all items will expire at the same time. Randomise the cache time with +-60 minutes and lookups will be distributed over time very quickly. Secondly, there is "how/where to cache the data": If you only need a single lookup key per item (e.g. a name OR an id) then Nathan's suggestion of using the data store is the best way to go. (I've used it where I only need one key per item and it works very well.) If you need multiple lookup keys per item (e.g. name AND id so you sometimes retrieve from cache by name and sometimes by id) then the data store does not have that flexibility so you would probably need an SQL table with multiple keys - unless you double your storage usage by saving each item under both the name key and the id key. (I have multiple tables with multiple keys which is why I think that way 😉 ) John Clover13 1
Clover13 Posted February 14 Author Posted February 14 So I was thinking about @Nathan Explosion's proposed solution, however one stickler came to mind in that for larger API responses where filtering might be useful, there may not be a straightforward way to do this. Possibly would require parsing the API response and storing key/value pairs based on some discriminator and then having a dropdown multi-select to only show the values for those keys selected. A Pages DB would have this filtering capability built in it already by defining what fields are filterable and then putting the Pages DB Filter block within the Page.
Nathan Explosion Posted February 14 Posted February 14 (edited) First mention of filtering there, so wasn't aware it was a requirement. If it is then crack on with a Pages DB - you then have to figure out how to keep that data up to date, and when to do it. Alternatively, look at IPS\Helpers\Table\Custom to allow you to create an table based on an array datasource instead of a DB table. With all this in mind, I'd now advise you to look into developing an application to do all this instead... You can design your own table to store the data You can create a module/controller that will display that data, and add filtering on there easily. You can create a task that runs on a schedule to retrieve/store/update the data. Not much more to add really - if I knew what this mysterious API was, and where it was getting the data from, and what the data looked like then I might even get bored and throw together a POC of it. Edited February 14 by Nathan Explosion Clover13 1
BN_IT_Support Posted February 15 Posted February 15 23 hours ago, Clover13 said: A Pages DB would have... As a general opinion on "Pages DB"... They are not really "databases" in SQL terms -- but SQL "tables" in the Invision database. When you are considering which sort of SQL table to use (within Invision) you have two choices: "Pages DB" (as you have mentioned) SQL tables as defined under the 'Database Schema' tab for an Application Pages DB works well for: Where you only need a single table for all your data - for example, in the classic database design if you only need a "products" table then it will work well. If you actually need a "products" table and a "suppliers" table with cross references then it does not work so well - and in any case you would need two Pages DB (one for products and one for suppliers) Where the solution involves a human adding and editing records it works well - for example, where you have a human adding products (descriptions, part numbers, prices, etc.) to a product table. We probably use between 5 and 10 Pages DB for various unrelated things such as "news articles", "places to go" and so on. Using Pages DB automatically gives you various widgets such as "Database filters" which is good. I would strongly recommend that you use the standard display templates wherever possible. If you create your own display templates based on the standard display templates then that effectively bases your template on a snapshot of the standard template - you don't benefit from future improvements and bug fixes to the standard. We have some of our own templates from 6 or 8 years ago and regret not putting more effort into display formatting for individual fields. With some clever display formats you can combine fields to produce complex output (for example, latitude and longitude fields to display a map) and if you need to do conditional display of some fields dependent upon the contents of others then you can add a dummy field (with a non-blank default value so it triggers display) and you can write custom code to display the contents of several other fields according to the rules that you want to apply. Do the fancy stuff in a field display rather than a template wherever possible. When you write a scheduled task (for example) to update the data in a Pages DB there are a couple of (minor) extra steps that you need to take in order to write the data. Firstly, you need to know the classname that will be used to write to the correct table and this will be something like \IPS\cms|\Records23 where 23 is the database number. So, you have to look up the database number from your database key (I recommend against hard coding the 23 😉 ). Secondly, if you have fields for name, address, region, country, phone, etc. then the names in the table will be something like field_91, field_92, field_93, field_94, etc so you have to 'map' from the field key to the real field name in the database. Not difficult, but that's what you have to do. Pages DB will work OK where you have a scheduled task to update the data once per day (for example - it could obviously be much more frequently if you wanted). A scheduled task will work fine where nearly all the data items/records are accessed every day - but what about a scenario where only 5% of records are accessed each day? Doing a daily scheduled update of all records means that 95% of what you retrieve is not going to be used in the next 24 hours. I rather liked Nathan's suggestion of retrieving data on demand (i.e. every record has a lifetime field that you add so if you want the record but the lifetime has expired then you retrieve the data through the API). That would mean that you would not waste resources retrieving data that is not being used. Unfortunately, "retrieval on demand" would be extremely difficult or impossible to implement using Pages DB - unless you write your own very fancy templates and in that case you could do better to write an Application to do the entire job. Updating Pages DB from a scheduled job would work easily. Application tables will work well for: You need several tables with relationships between them (e.g. products and suppliers) You need an application that does a lot of processing of records (rather than simple human entry of records as indicated previously) Access to records is more intuitive - field names are what you want to call them rather than being 'mapped'. Also, access to the correct table is very simple as you need to create a sub-class of \IPS\Patterns\ActiveRecord that will get you to the correct table plus a whole load more stuff. If you have hierarchical data (for example, categories and records) then you should consider using the Node/Model/Item/Content model although that gets a lot more complicated it does give you access to many of the features that you see time and again in Invision. You have complete control so retrieving data on demand is much more simple than it would be using Pages DB. Regards, John The Old Man and Clover13 1 1
Clover13 Posted February 15 Author Posted February 15 2 hours ago, BN_IT_Support said: As a general opinion on "Pages DB"... They are not really "databases" in SQL terms -- but SQL "tables" in the Invision database. When you are considering which sort of SQL table to use (within Invision) you have two choices: "Pages DB" (as you have mentioned) SQL tables as defined under the 'Database Schema' tab for an Application Pages DB works well for: Where you only need a single table for all your data - for example, in the classic database design if you only need a "products" table then it will work well. If you actually need a "products" table and a "suppliers" table with cross references then it does not work so well - and in any case you would need two Pages DB (one for products and one for suppliers) Where the solution involves a human adding and editing records it works well - for example, where you have a human adding products (descriptions, part numbers, prices, etc.) to a product table. We probably use between 5 and 10 Pages DB for various unrelated things such as "news articles", "places to go" and so on. Using Pages DB automatically gives you various widgets such as "Database filters" which is good. I would strongly recommend that you use the standard display templates wherever possible. If you create your own display templates based on the standard display templates then that effectively bases your template on a snapshot of the standard template - you don't benefit from future improvements and bug fixes to the standard. We have some of our own templates from 6 or 8 years ago and regret not putting more effort into display formatting for individual fields. With some clever display formats you can combine fields to produce complex output (for example, latitude and longitude fields to display a map) and if you need to do conditional display of some fields dependent upon the contents of others then you can add a dummy field (with a non-blank default value so it triggers display) and you can write custom code to display the contents of several other fields according to the rules that you want to apply. Do the fancy stuff in a field display rather than a template wherever possible. When you write a scheduled task (for example) to update the data in a Pages DB there are a couple of (minor) extra steps that you need to take in order to write the data. Firstly, you need to know the classname that will be used to write to the correct table and this will be something like \IPS\cms|\Records23 where 23 is the database number. So, you have to look up the database number from your database key (I recommend against hard coding the 23 😉 ). Secondly, if you have fields for name, address, region, country, phone, etc. then the names in the table will be something like field_91, field_92, field_93, field_94, etc so you have to 'map' from the field key to the real field name in the database. Not difficult, but that's what you have to do. Pages DB will work OK where you have a scheduled task to update the data once per day (for example - it could obviously be much more frequently if you wanted). A scheduled task will work fine where nearly all the data items/records are accessed every day - but what about a scenario where only 5% of records are accessed each day? Doing a daily scheduled update of all records means that 95% of what you retrieve is not going to be used in the next 24 hours. I rather liked Nathan's suggestion of retrieving data on demand (i.e. every record has a lifetime field that you add so if you want the record but the lifetime has expired then you retrieve the data through the API). That would mean that you would not waste resources retrieving data that is not being used. Unfortunately, "retrieval on demand" would be extremely difficult or impossible to implement using Pages DB - unless you write your own very fancy templates and in that case you could do better to write an Application to do the entire job. Updating Pages DB from a scheduled job would work easily. Application tables will work well for: You need several tables with relationships between them (e.g. products and suppliers) You need an application that does a lot of processing of records (rather than simple human entry of records as indicated previously) Access to records is more intuitive - field names are what you want to call them rather than being 'mapped'. Also, access to the correct table is very simple as you need to create a sub-class of \IPS\Patterns\ActiveRecord that will get you to the correct table plus a whole load more stuff. If you have hierarchical data (for example, categories and records) then you should consider using the Node/Model/Item/Content model although that gets a lot more complicated it does give you access to many of the features that you see time and again in Invision. You have complete control so retrieving data on demand is much more simple than it would be using Pages DB. Regards, John Awesome! Thanks John for the very detailed reply and providing your experiences! That's very helpful!
Recommended Posts