From Paperclip to Active Storage at scale
My experience of migrating from Paperclip to Active Storage on a production site.
Until recently, I worked on a Ruby on Rails project as part of my day job, and it used Paperclip for attaching files within the application. Paperclip was deprecated in May 2018, so it was high time we moved to an alternative.
Active Storage was added to Rails in version 5.2 in April 2018, and having used it on other projects, I was confident it was the right choice for us.
These articles helped us a great deal with the migration, and we read them several times:
- https://github.com/thoughtbot/paperclip/blob/master/MIGRATING.md
- https://www.elitmus.com/blog/technology/migration-from-paperclip-to-activestorage/
- https://www.tokyodev.com/2021/03/23/paperclip-activestorage/
You’ll notice that I’m not sharing much of our own code below. That’s because most of what we wrote draws on the articles above, and any additional pieces are referenced where relevant. This article is not intended as a tutorial, but rather as a list of the things we did, the considerations we made and the changes we followed up with in response to the effects on our production environment.
Background
At the time of the migration, the site had roughly 70,000 images (~50GB) across 7 different models, stored in a single AWS S3 bucket, with each file’s path reflecting the type of its model.
Images on the site are critical to the user experience, so getting this process right was of paramount importance. We developed a plan which allowed us to deploy the migration in a safe and reversible way, giving us confidence and allowing us to back out any given change if required.
The migration plan
Given the number of attachments we were dealing with and the complexity involved, we split the migration into 7 separate tasks, one per model. We also split each model’s migration into two distinct deploys, which gave us good rollback options and minimised the risk.
Step 1: Mirror attachments from Paperclip to Active Storage
This step meant that all new attachments would be created in both Paperclip and Active Storage.
We deployed this first, so that we knew the backfill task would correctly capture all the old attachments, rather than chasing down the new ones created by users after the change was deployed. It also allowed us to pause directly after deploying to monitor the impact of our change.
Mirroring the files
We implemented this using a generic Rails concern, which allowed us to apply the mirroring to any of our models. This concern was borrowed from the tokyodev.com article above, so thanks very much to Paul McMahon for publishing that.
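To illustrate the idea behind mirroring, here is a minimal, framework-free sketch in plain Ruby. It is not the tokyodev.com concern: InMemoryStore, Mirroring and Photo are hypothetical stand-ins for Paperclip, Active Storage and one of our models, and in a Rails app the equivalent logic would live in a concern driven by model callbacks.

```ruby
# Hypothetical, framework-free sketch of "mirroring": every new upload is
# written to both the legacy store and the new one. InMemoryStore, Mirroring
# and Photo stand in for Paperclip, Active Storage and a Rails model.
class InMemoryStore
  attr_reader :files

  def initialize
    @files = {}
  end

  def write(key, data)
    @files[key] = data
  end
end

module Mirroring
  # Write to the legacy (Paperclip-like) store, which users still rely on,
  # then copy the same bytes to the new (Active Storage-like) store.
  def attach(key, data)
    legacy_store.write(key, data)
    new_store.write(key, data)
  end
end

class Photo
  include Mirroring

  attr_reader :legacy_store, :new_store

  def initialize(legacy_store, new_store)
    @legacy_store = legacy_store
    @new_store = new_store
  end
end

paperclip = InMemoryStore.new
active_storage = InMemoryStore.new
Photo.new(paperclip, active_storage).attach("photos/1/original.jpg", "jpeg bytes")
```

Because the legacy store is still written on every upload, rolling back a mirroring deploy like this loses nothing.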
The backfilling task
- Populate the Active Storage tables with the Paperclip file information
  - Each new record in the Active Storage blobs table needed the file’s checksum, so we had to fetch the existing file from S3 to calculate it
- It must be exit-safe, so that if it crashed or timed out, it could restart from where it left off
  - This was possible by checking whether the model already had an Active Storage attachment saved for the right field
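The two requirements above can be sketched together in plain Ruby. This is a minimal illustration under stated assumptions: Record and its migrated flag are hypothetical stand-ins for "the model already has an Active Storage attachment", and in-memory hashes stand in for S3 and the blobs table. The checksum follows the format Active Storage stores for blobs, a Base64-encoded MD5 digest, computed in chunks so large files are never fully loaded into memory.

```ruby
require "digest"
require "stringio"

# Hypothetical stand-in for a Paperclip-backed model row.
Record = Struct.new(:id, :paperclip_key, :migrated, keyword_init: true)

# Base64-encoded MD5 of the file contents, read chunk by chunk.
def compute_checksum(io, chunk_size: 5 * 1024 * 1024)
  digest = Digest::MD5.new
  while (chunk = io.read(chunk_size))
    digest.update(chunk)
  end
  digest.base64digest
end

# Returns the blob rows created during this run, keyed by record id.
def backfill(records, s3)
  blobs = {}
  records.each do |record|
    # Skipping already-migrated records makes re-running the task after a
    # crash or timeout safe: it resumes where it left off.
    next if record.migrated

    io = StringIO.new(s3.fetch(record.paperclip_key))
    blobs[record.id] = { key: record.paperclip_key, checksum: compute_checksum(io) }
    record.migrated = true
  end
  blobs
end

s3 = { "photos/1.jpg" => "first", "photos/2.jpg" => "second" }
records = [
  Record.new(id: 1, paperclip_key: "photos/1.jpg", migrated: true),
  Record.new(id: 2, paperclip_key: "photos/2.jpg", migrated: false)
]
blobs = backfill(records, s3)
```

Re-running backfill after an interruption only processes records that are still unmigrated, which is the exit-safety property described above.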
This task was also borrowed from the tokyodev.com article, with one minor change: we used the following method to check whether a model is valid for migration, rather than relying on attributes being defined on the model. We only had one attachment per model, so this worked fine:
# True for any model class that includes the PaperclipToActiveStorage concern.
# This is only sufficient because each of our models had a single attachment.
def self.valid_for_migration?(klass)
  klass.include?(PaperclipToActiveStorage)
end
After we’d run this task for a model, we validated the following things:
- Number of new ActiveStorage::Attachment records
- Number of new ActiveStorage::Blob records
- A new upload through the UI, to confirm that:
  - the upload process was unchanged for users and files appeared correctly
  - a new ActiveStorage::Attachment and ActiveStorage::Blob were created
Each execution of this job via delayed_job on Heroku took ~800ms. We scaled up the worker dynos so we could run it in parallel: 25 workers gave us ~33 jobs per second. Be careful not to overload your database when scaling this far.
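As a back-of-the-envelope check on those numbers, here is a small Ruby calculation. It assumes one ~800ms job per attachment and takes the ~70,000 figure from the Background section; it ignores queue overhead, retries and database contention, so treat it as a rough estimate only.

```ruby
# Rough throughput estimate for the parallel backfill, assuming one job
# per attachment. Inputs come from the text; nothing here is measured.
SECONDS_PER_JOB = 0.8
WORKERS = 25
ATTACHMENTS = 70_000
OBSERVED_JOBS_PER_SECOND = 33.0

# Each worker completes 1 / 0.8 = 1.25 jobs per second.
theoretical_jobs_per_second = WORKERS / SECONDS_PER_JOB
estimated_minutes = ATTACHMENTS / OBSERVED_JOBS_PER_SECOND / 60

puts format("theoretical ceiling: %.2f jobs/s", theoretical_jobs_per_second)
puts format("total backfill time at the observed rate: ~%.0f minutes", estimated_minutes)
```

The ~31 jobs/s ceiling this gives is in the same ballpark as the observed ~33/s (the 800ms figure is approximate), and at that rate every attachment could be backfilled in roughly 35 minutes.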
Step 2: Update user path to use Active Storage
This was the second and final step for each model - allow our users to interact with Active Storage instead of Paperclip.
- Update specs
- Update views
- Update controllers
- Update models (making sure to add new Active Storage validations that replicate the Paperclip validation functionality)
- Deploy
- Validate full end-to-end image handling for the model in question
Validations
Active Storage doesn’t ship with native validations yet, so we went with the active_storage_validations gem which met all our requirements and was easy to implement.
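The checks we needed to carry over were essentially a content-type allow-list and a maximum size. The sketch below shows that logic framework-free; the allowed types, the 5 MB limit and the UploadedFile struct are hypothetical examples, not the values or classes the site used.

```ruby
# Hypothetical limits, for illustration only.
ALLOWED_CONTENT_TYPES = %w[image/jpeg image/png image/gif].freeze
MAX_BYTE_SIZE = 5 * 1024 * 1024 # 5 MB

# Stand-in for an uploaded file's metadata.
UploadedFile = Struct.new(:content_type, :byte_size)

# Returns a list of error messages; an empty list means the file is valid.
def attachment_errors(file)
  errors = []
  unless ALLOWED_CONTENT_TYPES.include?(file.content_type)
    errors << "has an invalid content type (#{file.content_type})"
  end
  if file.byte_size > MAX_BYTE_SIZE
    errors << "is too large (maximum is #{MAX_BYTE_SIZE} bytes)"
  end
  errors
end
```

With the active_storage_validations gem, the same rules become declarations on the model, along the lines of `validates :image, content_type: %w[image/jpeg image/png image/gif], size: { less_than: 5.megabytes }`; check the gem’s README for the exact options your version supports.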
Attachment Variants
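At the time, the site was running on Rails 6.1 but didn’t have the config.active_storage.track_variants option enabled. This option records each generated variant in the database, so Rails no longer has to ask the storage service whether a variant already exists. After turning it on, we saw image response times drop from about 160ms to 40ms. Not bad for a config change.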
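The effect of tracking variants can be modelled conceptually in plain Ruby. RemoteStorage and VariantTracker are hypothetical names, not Rails classes: untracked, every request asks the storage service whether the variant exists (a network round trip to S3); tracked, a local record answers instead.

```ruby
class RemoteStorage
  attr_reader :exist_calls

  def initialize
    @objects = {}
    @exist_calls = 0
  end

  # Stands in for a slow existence check against the bucket.
  def exist?(key)
    @exist_calls += 1
    @objects.key?(key)
  end

  def upload(key, data)
    @objects[key] = data
  end
end

# Untracked: check the service on every request, generating if missing.
def untracked_processed(storage, key)
  storage.upload(key, "processed image") unless storage.exist?(key)
  key
end

class VariantTracker
  def initialize(storage)
    @storage = storage
    @records = {} # stands in for the active_storage_variant_records table
  end

  # Tracked: a local lookup replaces the service check, and the variant is
  # only generated and uploaded the first time it is requested.
  def processed(key)
    unless @records.key?(key)
      @storage.upload(key, "processed image")
      @records[key] = true
    end
    key
  end
end

untracked = RemoteStorage.new
3.times { untracked_processed(untracked, "variants/abc/thumb") }

tracked = RemoteStorage.new
tracker = VariantTracker.new(tracked)
3.times { tracker.processed("variants/abc/thumb") }
```

After three requests, the untracked path has made three storage round trips while the tracked path has made none, which is the kind of saving that plausibly accounts for a large share of the latency drop.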
At the time, Rails had no support for predefined variants at the model level; this was slated for Rails 7 (see https://github.com/rails/rails/pull/39135).
To get around this limitation, I wrote a simple Variantable concern to provide the same functionality. It was removed when the app was upgraded to Rails 7.
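To show the shape of such a concern, here is a hypothetical plain-Ruby sketch: named sets of transformations are declared once on the class and looked up by name. It is not the author’s Variantable concern, and Product is an illustrative model name.

```ruby
# Hypothetical sketch of predefined, named variants declared on a class.
module Variantable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def variants
      @variants ||= {}
    end

    # Register a named set of transformations for later lookup.
    def variant(name, **transformations)
      variants[name] = transformations.freeze
    end
  end

  # Look up the transformations for a named variant, failing loudly on typos.
  def variant_options(name)
    self.class.variants.fetch(name) { raise ArgumentError, "unknown variant: #{name}" }
  end
end

class Product
  include Variantable

  variant :thumb, resize_to_limit: [100, 100]
  variant :large, resize_to_limit: [1200, 1200]
end
```

In a Rails 6.1 model, the returned hash would be passed to something like `image.variant(**variant_options(:thumb))`. Rails 7 builds this in: `has_one_attached :image do |attachable| attachable.variant :thumb, resize_to_limit: [100, 100] end`, after which `image.variant(:thumb)` works directly.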
Default paths
Paperclip supported a default URL for an attachment (via its default_url option), which was returned when no file was attached. This worked well, as you could just call .url and be confident you’d always get a URL back. However, this isn’t possible in Active Storage (yet?), so I ended up monkey-patching Active Storage itself to give us this functionality. Not perfect, but you gotta do what you gotta do.
This method also hooked into the custom code for the variants (above). I’m hoping a better long-term solution can be found for this.
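For comparison, a non-monkey-patching alternative is a small helper that falls back to a default asset. This is a hypothetical sketch: FakeAttachment stands in for an Active Storage attachment (which also responds to attached?), and the default path is illustrative.

```ruby
# Stand-in for an attachment: attached? is true whenever a URL is present.
FakeAttachment = Struct.new(:url) do
  def attached?
    !url.nil?
  end
end

DEFAULT_IMAGE_URL = "/images/missing.png" # hypothetical default asset path

# Return the attachment's URL, or the default when nothing is attached.
def url_or_default(attachment, default = DEFAULT_IMAGE_URL)
  attachment&.attached? ? attachment.url : default
end
```

In a Rails view, the attached branch would more typically call `url_for(attachment)`. The trade-off is that every call site must use the helper, whereas the monkey-patch kept the Paperclip-style `.url` call working everywhere.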
Deployment
Deploying this step increased the load on the application, as it started generating variants in response to user requests for images. This was immediately visible in our metrics. At this point, we were still using Active Storage’s redirect mode, in which every image request must hit the app before being redirected to the file, so the only performance gain came from variants being generated once and then reused.
We also saw increased throughput, as requests for images started hitting the app instead of going directly to AWS.
We also switched to Active Storage’s proxy mode so that Cloudflare could cache the variant responses. Since Cloudflare won’t cache responses that include a Set-Cookie header, we applied a monkey-patch to Active Storage to remove it. As more variants were generated and cached by Cloudflare, the throughput hitting the app started to drop.
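An alternative to patching Active Storage is a Rack middleware that drops Set-Cookie from Active Storage responses. This is a hypothetical sketch, not the patch we used; it assumes the default "/rails/active_storage/" route prefix, which changes if config.active_storage.routes_prefix is customised.

```ruby
# Hypothetical Rack middleware: strip Set-Cookie from Active Storage
# responses so a CDN such as Cloudflare may cache them.
class StripCookiesFromActiveStorage
  PREFIX = "/rails/active_storage/"

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    if env["PATH_INFO"].to_s.start_with?(PREFIX)
      # Rack 3 uses lowercase header names; older stacks use "Set-Cookie".
      headers.delete("set-cookie")
      headers.delete("Set-Cookie")
    end
    [status, headers, body]
  end
end
```

To see the cookie after Rails has written it, the middleware would need to sit outside the cookie middleware, for example via `config.middleware.insert_before ActionDispatch::Cookies, StripCookiesFromActiveStorage`. On Rails 6.1+, proxy mode itself can be enabled globally with `config.active_storage.resolve_model_to_route = :rails_storage_proxy`.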
Step 3: Cleanup
Once all models were migrated to Active Storage, we removed the Paperclip gem and its configuration files and deployed. We then removed the Paperclip columns from the database for all models, and finally deleted the old assets from S3. The last task was to remove the migration job and concerns.
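Paperclip’s has_attached_file relies on four columns per attachment: `<name>_file_name`, `<name>_content_type`, `<name>_file_size` and `<name>_updated_at` (plus `<name>_fingerprint` if fingerprinting was enabled). A hypothetical helper like the one below can list them for a cleanup migration.

```ruby
# The standard Paperclip column suffixes; add "fingerprint" if it was used.
PAPERCLIP_SUFFIXES = %w[file_name content_type file_size updated_at].freeze

# Hypothetical helper: column names Paperclip used for one attachment.
def paperclip_columns(attachment_name)
  PAPERCLIP_SUFFIXES.map { |suffix| :"#{attachment_name}_#{suffix}" }
end
```

Inside a Rails migration, each returned name could be passed to `remove_column`; note that `remove_column` is only reversible when the column type is also given.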
Over a few days, the site found a baseline for serving images, the vast majority of which are now handled by Cloudflare’s cache. New uploads still require variants to be generated, but that’s a fairly cold path and doesn’t concern us.
Thoughts & conclusions
Active Storage has been working well for that app for a while now, and I’m glad we put in the effort to pay down some technical debt by switching to it, which gave us more control and features. Sticking close to Rails defaults has served me well so far, so I’m glad to be able to remove another dependency.
We couldn’t figure out a way to mirror attachments from Active Storage back to Paperclip (the inverse of Step 1). This meant that if we had to roll back Step 2, we’d lose any uploads users had created while the change was live.
Be careful if you ever modify the parameters of a variant. Doing so causes that variant to be regenerated for every image, on demand as users request them. If this is a particularly hot path with lots of different images, you may well end up with a lot of processing. That said, being able to change an image’s transformations easily at runtime is super convenient, so it’s a worthwhile tradeoff.
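The reason for the regeneration can be sketched conceptually: a stored variant’s key is derived from a digest of its transformations, so new parameters produce a key that has never been generated. The function below mirrors that idea only; it is not Active Storage’s exact key format, and variant_key is a hypothetical name.

```ruby
require "digest"
require "json"

# Derive a storage key from the blob key plus a digest of the
# transformations, serialised in a stable (sorted) key order.
def variant_key(blob_key, transformations)
  canonical = JSON.generate(transformations.sort.to_h)
  "variants/#{blob_key}/#{Digest::SHA256.hexdigest(canonical)}"
end

old_key = variant_key("abc123", { resize_to_limit: [400, 400] })
new_key = variant_key("abc123", { resize_to_limit: [500, 500] })
```

Since old_key and new_key differ, every image requested with the new parameters misses both the stored variant and any CDN cache entry, which is where the burst of processing comes from.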