PurplePixel Posted February 9, 2021 I am trying to create a custom conversion class and I am not sure what the right approach is. I have seen two buttons inside the ACP conversion view when I am in developer mode, and I have used them to create Library and Converter class files, but I don't understand how they hook into the converter or the IPS framework. Also, is there a specific path where I need to create these files for them to be included? I searched for documentation on these buttons and the classes they create, but did not find any. Thank you!
bfarber (Solution) Posted February 9, 2021 This isn't a commonly used feature, so it's probably largely undocumented. You almost certainly don't need to create a Library class. Library classes define how to translate data from a converter into the installed applications (e.g. blog, forums, downloads, etc.). Unless you want to convert data from an external source into a new application within Invision Community, a Library class is unnecessary, because we already have libraries for all of our core applications. A Converter class is needed for each type of content you wish to convert, and translates to the files you see under applications/convert/sources/Software/*. You'll see there are folders for each app (each of which is defined by a Library class), and underneath each folder is a class for each type of software supported. If you want to allow conversions of blog, calendar and forum content from "Some Random Made Up Forum v2.0", then you'll want to create software classes for Core, Blog, Calendar and Forum. From there, you will need to define certain mapping methods that outline what is convertible, and you will need to create the methods that actually get called to perform the conversion. In most cases, your best option is to copy code from one of the existing software classes and modify it to suit your needs.
PurplePixel (Author) Posted February 9, 2021 Thank you for your answer @bfarber! Will the converter class be included automagically? For the moment I have overridden the IPS\convert\_Software::software() method through a plugin in order to make the converter show up. I am trying to adapt the vBulletin 3.8.xx/4.x migration to my needs as we have custom data in our previous solution.
CoffeeCake Posted February 9, 2021 42 minutes ago, PurplePixel said: "I am trying to adapt the vBulletin 3.8.xx/4.x migration to my needs as we have custom data in our previous solution." You may be overthinking this, but... don't let me dissuade you. We were in a similar situation, and our solution was to modify the vb3/4 converter itself to handle the custom data, and then replace our customizations with stock files once we were converted. We created another branch in our code repository for the migration. This may make sense for you. You only need the logic to bring the data over for the duration of the migration, unless you're working to maintain URLs outside of threads/forums and need something post-migration to keep things working. In that case, you may consider putting that post-migration logic in a plugin/application instead of a converter.
PurplePixel (Author) Posted February 10, 2021 14 hours ago, Paul E. said: "You may be overthinking this, but... don't let me dissuade you." I guess you are right @Paul E.. I'm a perfectionist and try to do this as cleanly as possible, but after all this will only happen once. The forum has a few million posts and I just want to make sure I am in control when the migration runs. Also, I am trying to do this from the command line, since the ACP does not feel right. Has anyone done this before? I searched the developers forum but did not find any answers. Also @bfarber, is there any way to contribute to the IPB code? I have a few ideas that may help everyone. Thanks!
Stuart Silvester Posted February 10, 2021 As Brandon notes, using our converter framework would be the best way to go. It already does a lot of the legwork for you. All you really need to do is feed it the relevant data; it'll put it in the right places and process it as needed. You would be better to use an application and store your converter files within that application rather than writing a plugin to load them from elsewhere (it's nicer to keep them in their own location than throw them in another (IPS) application). You will still need to add a hook to the software() method to define your custom converter location. Doing a conversion via AdminCP is perfectly fine, command line conversions are not supported.
PurplePixel (Author) Posted February 10, 2021 37 minutes ago, Stuart Silvester said: "You would be better to use an application and store your converter files within that application rather than writing a plugin to load them from elsewhere (it's nicer to keep them in their own location than throw them in another (IPS) application). You will still need to add a hook to the software() method to define your custom converter location." I wasn't planning to create an application, but I wasn't sure what the purpose of the two buttons in the converter view was. I guess I'll just ignore them 😄. I have already created a plugin that hooks into the software() method; as far as I can see it works well, and I think that is the cleanest approach. Thanks! 40 minutes ago, Stuart Silvester said: "Doing a conversion via AdminCP is perfectly fine, command line conversions are not supported." I see. So I should trust ACP converter even for millions of posts as you say. One question though: do you have any benchmarks on how much time should I expect for conversion for 1 million posts?
Stuart Silvester Posted February 10, 2021 2 hours ago, PurplePixel said: "I see. So I should trust ACP converter even for millions of posts as you say. One question though: do you have any benchmarks on how much time should I expect for conversion for 1 million posts?" I did a conversion on a site with around 60 million posts late last year (there were actually millions more posts in the source data). We did a performance sweep of the converters for 4.5 for this project and incorporated many things that improved the speed. One of these performance improvements lives in the userland code (your converter): I would recommend taking a look at our vBulletin converters, especially the use of `$this->app->preCacheLinks(...)`, which can hugely reduce the number of SQL queries needed. The speed can depend greatly on how you write your converter, though; make sure your queries load data efficiently and use indexes. Speed also depends on the server/configuration you're running the conversion on. The above-mentioned project took about 1.5 days to convert the site (clarification: this describes the conversion process itself, not the post-conversion background tasks), excluding image attachments; having that many is a completely different story of its own. Edited February 10, 2021 by Stuart Silvester (Clarification)
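Editor's illustration: the point about pre-caching links to cut down on SQL queries can be shown generically. The sketch below is not the IPS converter code and does not reproduce what `preCacheLinks(...)` does internally; it is a minimal Python example using an in-memory SQLite database and a hypothetical `link_map` table (old ID to new ID) to compare one lookup query per row against a single batched lookup for a whole batch of rows. All names in it are assumptions made for illustration.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory database with a hypothetical old-ID -> new-ID map."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE link_map (foreign_id INTEGER PRIMARY KEY, local_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO link_map VALUES (?, ?)",
        ((i, i + 1000) for i in range(1, rows + 1)),
    )
    return conn


def lookup_one_by_one(conn, foreign_ids):
    """Resolve each ID with its own query: N IDs cost N queries."""
    resolved, queries = {}, 0
    for fid in foreign_ids:
        row = conn.execute(
            "SELECT local_id FROM link_map WHERE foreign_id = ?", (fid,)
        ).fetchone()
        queries += 1
        if row is not None:
            resolved[fid] = row[0]
    return resolved, queries


def lookup_precached(conn, foreign_ids):
    """Resolve the whole batch with one IN (...) query, then read from a dict."""
    unique_ids = list(dict.fromkeys(foreign_ids))  # de-duplicate, keep order
    if not unique_ids:
        return {}, 0
    placeholders = ",".join("?" * len(unique_ids))
    cache = dict(
        conn.execute(
            "SELECT foreign_id, local_id FROM link_map "
            f"WHERE foreign_id IN ({placeholders})",
            unique_ids,
        ).fetchall()
    )
    return {fid: cache[fid] for fid in foreign_ids if fid in cache}, 1


if __name__ == "__main__":
    conn = build_db(500)
    batch = list(range(1, 201))  # one batch of 200 rows, as in a conversion step
    slow, slow_queries = lookup_one_by_one(conn, batch)
    fast, fast_queries = lookup_precached(conn, batch)
    print("same result:", slow == fast)
    print("queries one-by-one:", slow_queries, "| queries pre-cached:", fast_queries)
```

Both functions return the same mapping; the difference is only in how many round trips to the database they make per batch, which is the kind of saving described in the post above.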
CoffeeCake Posted February 10, 2021 Prior to whatever optimizations were made and before the release of 4.5, our conversion took about 4-5 days (I think, from memory?) to get to the point where we reopened the community. At the time, we had about 7 million posts (text only, we don't use images or attachments) and 500k members. We allowed the post rebuilding to run for a bit (it goes from newest post to oldest) before reopening the community, as things looked absolutely wonky for a while, and we kept our membership informed well in advance of beginning and throughout the process. How nice to hear that this got much-needed attention. We documented, scripted and tested the process over and over again on a test environment, taking snapshots of our production vBulletin install and going through the conversion process again and again. This helped us identify bugs in the conversion process. Regrettably, there were some we didn't identify until after we had migrated the live site: there were features of vBulletin that weren't in our test plan but that, to our surprise, many members were using, and there was a rather large snafu with the handling of private messages. That has now been fixed in the converter, but it was bizarrely designed to remove the entire private message if any participant had deleted any part of the conversation from their view. We also made optimizations that made sense for our use of vBulletin, and changes that migrated modifications we had made to vBulletin over to their IPS counterparts. Repeating the process over and over on identical infrastructure also helped us nail down a solid approximation of how long it would take, until it was a well-choreographed dance. To address the travesty of the web-based conversion, we stood up a client machine within our infrastructure and used a remote KVM to control it. This was the most absurd part of this process, but....
hey, if it's 60 million posts in 1.5 days now, great. We are entirely virtualized and increased the specs of our virtual machines until we saw no change in performance. The process was very iteration-heavy, so tons of queries were being thrown around. I'm going to go stare at the deltas to see what's been changed, because it was painful getting it down to the number of days we did. Edited February 10, 2021 by Paul E.
PurplePixel (Author) Posted February 10, 2021 Thank you so much for this detailed feedback @Paul E.! I am pretty sure there are going to be some hard working days until we get the conversion right. I hope we won't miss anything, but on such complex migrations you never know until it's too late, and by then it's impossible to revert and very hard to migrate the missing data. Anyway, your feedback really helps! As for migration taking 4-5 days, that's a lot. All this time, the forum was down or could be partially used somehow? I also plan a profiling session to see what I can optimise in the migration scripts. We also have to plan how to migrate the assets apart from the database. Our plan is to start with a small sample of the old solution (1500 members / posts / etc.) and iterate the conversion process until we are happy with it. After that, we will make a clone of the live solution which we will convert (and optimise the process even further if possible), then get 50 members from the live forum to test the data integrity apart from our team. I hope that will be enough to ensure the data integrity. Any insights into issues with the migration process really help us avoid repeating them 🙂
CoffeeCake Posted February 10, 2021 8 minutes ago, PurplePixel said: "As for migration taking 4-5 days, that's a lot." Yes, though @Stuart Silvester promises insane improvements. I'd need to see it myself first to believe it, that's how scarred I am from the experience. 😅 10 minutes ago, PurplePixel said: "All this time, the forum was down or could be partially used somehow?" There is no support for partial conversions. You can't convert the first million posts and then go back and finish off the rest. Would this be nice? Yes. But... if they managed to get this down to a few hours of downtime, then great news for you. This means you have some choices in how to approach this. Our approach was to develop modifications to vBulletin to essentially create a "read-only" mode, take a snapshot of the vBulletin database at that point to serve as our source copy for the conversion, and then run the conversion on that snapshot in a migration environment separate from production. We had separate migration-specific MySQL instances that were tuned to run the migration as efficiently as possible, and we removed any and all external load that could slow things down. We announced our planned migration window well in advance, had established a solid estimate for the length of time it would take, and kept members informed of our progress through a status page. Our production servers continued serving vBulletin in read-only mode for the length of the migration, until we were ready for the switchover. Our cutover involved swapping out the underlying VMs, so after confirming everything was ready to go and all of our tests had passed, indicating success as best we could tell, we redirected our IP traffic to the new IPS-configured VMs and began serving requests from our new production IPS environment. This resulted in minimal downtime for our content, though it had the unfortunate impact of the community being in read-only mode for the duration.
We could have left vBulletin in full operation, but all changes made after we took the migration snapshot of the database would have been lost, and we determined that this was not acceptable for our case. 29 minutes ago, PurplePixel said: "Our plan is to start with a small sample of the old solution (1500 members / posts / etc.) and iterate the conversion process until we are happy with it." I would not do this. I would only work with complete copies of your vBulletin database. Because of the layered approach the process takes, you can't be sure that things end up correctly until the end. If you want to create a sample of vBulletin data and run migrations on that just to confirm that a test-to-test conversion works, that may be a good idea, yet I would start using copies of production early on. Every time we ran another test migration, we followed the exact process we'd use for the real thing and took a fresh snapshot of prod. 32 minutes ago, PurplePixel said: "then get 50 members from the live forum to test the data integrity apart from our team. I hope that will be enough to ensure the data integrity." This is exactly the process we used. Don't over-rely on your members though, and encourage them to report anything that seems off. Many of the issues we found, but our testing members did not, were things they assumed were supposed to change. Make sure you include both members with long histories on the community and new members, and that you have representative samples from all of your usergroups if there are varying levels of permissions in your implementation. And! Don't forget to test e-mails and make sure that test e-mails don't go to your members. We had a special process for redirecting all outbound e-mail from our test and migration environments into a viewer that allowed us to see the contents of those e-mails and make appropriate changes. There is a constants.php setting you can set to ensure that e-mails go to /dev/null until you're ready for that.
Stuart Silvester Posted February 10, 2021 @Paul E. makes some great points. Communication with your members is key: let them know when things are going to happen and (if you choose to run the background tasks after launch as they're designed) let them know that some things may look slightly wonky for a short time. I would also recommend not making knee-jerk reactions to comments your members make immediately after the conversion. People generally do not like change and will be saying "make this work like vBulletin". Take some time and let them settle in with how the new platform works and then collate their feedback. We did throw some good muscle at that conversion (and it wasn't vBulletin), so you might not be as quick, but we've definitely made some good performance-related changes. A note I made when working on the performance changes says "[vBulletin 3/4] Posts step was around 900ms per 200 posts instead of 19 seconds". Your mileage may vary of course based on many factors. @Paul E. There were many performance-specific tweaks, but the changes to how links are handled showed the biggest improvements (far fewer queries).
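Editor's illustration: the figures quoted in this thread can be turned into rough estimates. The Python sketch below is back-of-the-envelope arithmetic only. It assumes the posts step scales linearly with the number of posts, which the thread does not claim, and it covers only the posts step, not members, attachments or background rebuild tasks. The function names are made up for this example.

```python
import math

SECONDS_PER_DAY = 86_400


def step_seconds(total_posts, batch_size, seconds_per_batch):
    """Estimated duration of a batched step, assuming every batch takes equally long."""
    batches = math.ceil(total_posts / batch_size)
    return batches * seconds_per_batch


def posts_per_second(total_posts, days):
    """Average throughput implied by converting total_posts in the given number of days."""
    return total_posts / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    posts = 1_000_000
    after = step_seconds(posts, 200, 0.9)    # "900ms per 200 posts"
    before = step_seconds(posts, 200, 19.0)  # "19 seconds" per 200 posts
    print(f"posts step, after tuning:  {after / 3600:.2f} h")
    print(f"posts step, before tuning: {before / 3600:.2f} h")
    print(f"speed-up factor: {before / after:.1f}x")
    # Overall rate implied by the 60-million-post conversion in about 1.5 days
    print(f"implied overall rate: {posts_per_second(60_000_000, 1.5):.0f} posts/s")
```

Under these assumptions, the posts step for one million posts would take roughly 1.25 hours at the tuned rate versus about 26.4 hours at the old one. As noted above, real timings depend on hardware, database tuning and how the converter is written.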
PurplePixel (Author) Posted February 11, 2021 Thanks @Paul E. and @Stuart Silvester for the great feedback you both shared! I can see you are both very experienced in migrations, and all the advice you wrote will be added to our pre-migration to-do list. It's like war: expect the worst and hope for the best. 13 hours ago, PurplePixel said: "Our plan is to start with a small sample of the old solution (1500 members / posts / etc.) and iterate the conversion process until we are happy with it." I think I wasn't very clear about this part. I was trying to say that I am iterating on my local database with such a sample so I can quickly see what fails and how to improve it. When the sample works as desired, we can start testing the conversion on the whole database, then see what fails, benchmark it, profile it and improve it. Once we are happy with this part, I think we can set up a temporary server with the real data for beta testing, where our team and forum members can test the solution. 13 hours ago, Paul E. said: "Don't over-rely on your members though, and encourage them to report anything that seems off." That's a great piece of advice. We have to instruct our members to report anything, not only things that are clearly off. 13 hours ago, Paul E. said: "And! Don't forget to test e-mails and make sure that test e-mails don't go to your members." That is what I had in mind: either /dev/null, or a fake SMTP catcher so we can also see whether the communication is working and looking as desired. 13 hours ago, Paul E. said: "There is a constants.php setting you can set to ensure that e-mails go to /dev/null until you're ready for that." I think you are talking about the EMAIL_DEBUG_PATH constant, which can write the outgoing e-mails to disk instead of sending them. Sounds like a good approach, since we can use /dev/null while testing and write to disk at a later stage.
12 hours ago, Stuart Silvester said: "I would also recommend not making knee-jerk reactions to comments your members make immediately after the conversion. People generally do not like change and will be saying 'make this work like vBulletin'. Take some time and let them settle in with how the new platform works and then collate their feedback." Yes, I am pretty sure a lot of members will be reluctant to change and will probably complain a lot, but we will of course respect our community since it's the heart of the project. 12 hours ago, Stuart Silvester said: "A note I made when working on the performance changes says '[vBulletin 3/4] Posts step was around 900ms per 200 posts instead of 19 seconds'." That sounds like a great improvement! I suppose these tweaks exist in Invision Community 4.5.0 beta 9, right? We have not upgraded to the latest version yet, but will in the next few weeks. @Stuart Silvester is there a unit testing suite for the whole application? We were thinking of creating some unit tests for the plugins we are developing, and I was wondering whether we can integrate with something that already exists or should create our own system. I have to say, I am still thinking of adapting the code to a command line interface and running the forum migration from there. It feels much safer and more controllable than doing this through requests to a web service. And some existing unit tests would be really great so I can make sure the migration tool works as desired. @Paul E. did you and your team implement an automated unit test suite, or were all the tests done manually? Thank you all for the great feedback you provided! It really helps to see that others have done conversions with millions of posts and that they worked (not perfectly, perhaps, but with a bit of sweat it was done). All of this input really helps!