Rationale for choosing Elixir for the MoodleNet back-end
Author: Mayel de Borniol (Technical Architect)
Date: 6 August 2018
MoodleNet is a new open social media platform for educators, focused on professional development and open content. It will be an integral part of the Moodle ecosystem.
Just like the ~7,000 languages that are spoken in the world today, there are many different types of programming languages that allow humans to communicate with computers (and tell them nicely what to do). An important decision for our team was which programming language(s) MoodleNet should speak, and this document outlines the decision-making process that led to choosing the ‘Elixir’ language.
The process began with an extensive review of web application programming languages, and more specifically of the technologies used by apps based on the ActivityPub federation standard. ActivityPub will enable MoodleNet to be a decentralised platform: you will be able to self-host an instance (just like you can install Moodle on a server and customise it), but all the instances will be interconnected.
A spreadsheet of fediverse apps and their stacks was compiled (with extra information crowdsourced via the fediverse). This showed that, besides Ruby on Rails, there were a few ActivityPub projects using Python (and a couple intending to use PHP) and some other languages. However, most projects are still in early development and many are not yet federated. Considering that ActivityPub is still relatively new, it is understandable that there are limited examples of full implementations, let alone ones that are proven to work and be interoperable.
One of the first languages that stood out was Go, one of only a few to actually have an ActivityPub library. The developer was contacted and offered his assistance for future collaboration (although he is in full-time employment and develops the Go library in his spare time). This meant that the MoodleNet team would have to modify the library itself in order to tailor it to the project’s needs. The main obstacle to this, however, is that Go is a ‘statically typed’ language, which means every ActivityStreams extension would have to be hard-coded before compilation. This would make things less flexible, so this option was set aside.
Alongside researching the documentation, codebases and issue trackers of the apps in the spreadsheet and communicating with their developers, many of the apps were installed and tested (including Mastodon, Pleroma, and Prismo). The latter, while having some functionality that MoodleNet will need, confirmed an earlier impression from running a Mastodon instance: Rails apps tend to not be lightweight and often rely on many external dependencies.
After comparing the possible language candidates, the focus shifted to detailing the pros and cons of each, finally settling on the following:
- The Erlang VM is time tested (more than 20 years old), very fast and has good support for concurrency and inter-process communication (building on a set of libraries called OTP).
- Elixir is a new (7+ years) functional language designed for productivity and maintainability that runs on the Erlang VM.
- Phoenix is a modern web framework that makes building APIs and web applications easy. Since it’s built with Elixir and runs on the Erlang VM, it is very fast and has excellent support for handling very large numbers of simultaneous users.
Benefits of Elixir
A list of reasons to use Elixir in our context:
The Phoenix web framework has been found to perform better than Rails, PHP, and Python. One benchmark showed Phoenix handling over 10x more requests than Rails in a given period. Phoenix was also much more consistent under load - Rails was more prone to have some requests bog down. This can cause a “chain reaction”, because Rails apps are configured with a fixed number of application processes, so if some of them are slow, it can mean that others have to wait in line, which dramatically increases the app’s response time.
Furthermore, Phoenix apps without caching drastically outperform Rails apps with caching. This is important because caching is notorious for being a source of complexity and bugs, and because caching can’t be used for moment-by-moment, personalised content. Bleacher Report, for example, went from over 100 AWS servers to 5, with CPU usage rarely going above 10%, and saw a 10x performance improvement over their Rails app.
There is a growing number of software projects using Elixir - e.g. Pinterest, who say:
“We like Elixir and have seen some pretty big wins with it. The system that manages rate limits for both the Pinterest API and Ads API is built in Elixir. Its 50 percent response time is around 500 microseconds with a 90 percent response time of 800 microseconds. Yes, microseconds.”
And an adopter of the Phoenix framework says:
“A rather large JSON request was taking about 1.5-2.0s against our Rails backend (no caching). That same request (same h/w, database queries and data) takes about 400ms with Phoenix. Our entire application went from using just shy of 1 GB across 2 dynos to a single dyno using less than 100 MB. That single dyno is notably faster and can handle higher concurrency (about 10x). Importantly, Phoenix (without caching) performs better than Rails (with caching).”
One thing that makes Erlang/Elixir so efficient is the actor model, which ActivityPub seems to have taken inspiration from.
The Erlang VM’s model of concurrency is great for multi-core CPUs, but it was created before they existed. Its original purpose was to support concurrency and fault-tolerance via the use of many different machines. This makes it an excellent tool for building systems that can handle more load by simply adding more servers.
As the docs for an Erlang web server put it:
“At the time of writing there are application servers written in Erlang that can handle more than two million connections on a single server in a real production application, with spare memory and CPU! The Web is concurrent, and Erlang is a language designed for concurrency, so it is a perfect match.
Of course, various platforms need to scale beyond a few million connections. This is where Erlang’s built-in distribution mechanisms come in. If one server isn’t enough, add more! Erlang allows you to use the same code for talking to local processes or to processes in other parts of your cluster, which means you can scale very quickly if the need arises.”
Developer Productivity & Happiness
When people argue about programming languages, often the performance card is pulled (“this language is so much faster in these benchmarks”). But performance isn’t the only consideration: while it’s possible to just add more servers, developer time often costs more than servers.
More important, then, is which framework helps developers write better-quality software faster, and which one benefits the programmer’s wellbeing.
As Pinterest concluded from using Elixir:
“We’ve also seen an improvement in code clarity. We’re converting our notifications system from Java to Elixir. The Java version used an Actor system and weighed in at around 10,000 lines of code. The new Elixir system has shrunk this to around 1000 lines. The Elixir based system is also faster and more consistent than the Java one and runs on half the number of servers.”
And Bleacher Report, who went from 150 servers to 5 using Elixir:
“The new language has led to cleaner code base and much less technical debt. It has also increased the speed of development.”
Some advantages of Elixir for programmers include:
- No extra engineering spent on “making it faster” (if the runtime is already fast enough they don’t need to bother with caching for example).
- All the goodies of modern day programming, like a package manager.
- Pattern Matching: you can make assertions on the structure of your data and get values directly out of a deeply nested map (or list) and bind them to variables. You can also define several clauses of the same function: Elixir will try to match the clauses from top to bottom, which means you can have different function definitions based on the structure of your input data.
- Immutable data structures & pure functions: Everything that the function depends on is a parameter, and the only thing that “happens” is the return value of the function. There are no instance variables that are difficult to debug.
- Explicit code: For example, you explicitly specify a template by name, and pass the values you’ll need from the controller. Ecto also gives you more control over what data will be loaded from the DB.
- Ecto Changesets: filtering, casting, validation and definition of constraints when manipulating DB data.
- Optional type-checking: use typing only when you want to.
- Parallelism: it’s not just about performance! Parallelism in the Erlang VM is low cost and seamless. Spawning a new process (not an OS process; they are more like actors) is easy and has a very low overhead, unlike starting a new thread. You can have millions of them on one machine. And thanks to immutability guarantees and every process being isolated, you don’t have to worry about processes messing with each other.
- Doc tests: It is possible to write code examples directly in the documentation of a function. These will be executed during test runs to check that they still return the same values. They are also included in the generated documentation.
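To make the pattern matching and doc test points above more concrete, here is a minimal sketch. The module `ActivityExample` and its `describe/1` function are hypothetical names invented for this illustration (they are not part of MoodleNet or Pleroma), and the maps only loosely imitate ActivityStreams data.

```elixir
defmodule ActivityExample do
  @moduledoc """
  Hypothetical module (not part of MoodleNet or Pleroma) illustrating
  pattern matching, multiple function clauses and doc tests.
  """

  @doc """
  Describes an ActivityStreams-like map.

  Clauses are tried from top to bottom; the first one whose pattern
  matches the input is used.

      iex> ActivityExample.describe(%{"type" => "Create", "object" => %{"type" => "Note", "content" => "Hello"}})
      "new note: Hello"

      iex> ActivityExample.describe(%{"type" => "Follow", "actor" => "alice"})
      "alice followed someone"

      iex> ActivityExample.describe(%{"type" => "SomethingElse"})
      "unsupported activity"
  """
  # Matches on the nested structure and binds the nested value to `content`.
  def describe(%{"type" => "Create", "object" => %{"type" => "Note", "content" => content}}) do
    "new note: " <> content
  end

  # Matches any "Follow" activity and binds the actor.
  def describe(%{"type" => "Follow", "actor" => actor}) do
    actor <> " followed someone"
  end

  # Catch-all clause for anything the clauses above did not match.
  def describe(_other), do: "unsupported activity"
end
```

In a Mix project, adding `doctest ActivityExample` to an ExUnit test module would run the `iex>` examples from the documentation as tests.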
While Elixir is a fairly new language, it is mostly a friendly interface to the Erlang virtual machine, which has been used since the 80s to build some of the most reliable systems in the world: telephone systems.
It is made for running complex systems with almost no downtime, for example by having multiple levels of “supervising” processes that restart the parts of a system that have errors. Basically, microservices before microservices were cool.
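As a rough illustration of such supervision, the sketch below starts a single worker under a supervisor, kills it, and shows that it is restarted. `Counter` is a hypothetical worker made up for this example; a real application would usually have a whole tree of supervisors and workers.

```elixir
defmodule Counter do
  # Hypothetical worker: an Agent holding an integer,
  # registered under its module name.
  use Agent

  def start_link(initial) do
    Agent.start_link(fn -> initial end, name: __MODULE__)
  end

  def increment, do: Agent.update(__MODULE__, &(&1 + 1))
  def value, do: Agent.get(__MODULE__, & &1)
end

# Start a supervisor that restarts Counter whenever it crashes.
{:ok, _sup} = Supervisor.start_link([{Counter, 0}], strategy: :one_for_one)

Counter.increment()
IO.inspect(Counter.value(), label: "before crash")

# Simulate a failure by killing the worker process.
Process.exit(Process.whereis(Counter), :kill)

# Give the supervisor a moment to restart the worker.
Process.sleep(100)
IO.inspect(Counter.value(), label: "after restart")
```

The restarted worker begins again from its initial state (0 here), so the rest of the system keeps running even though the state held by the crashed process is lost.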
CPUs are no longer getting dramatically faster each year. Instead, we get machines with more cores. Our code can run faster, but only if it can run concurrently - meaning, different bits of code run simultaneously on different cores.
The main problem with concurrent code is having two pieces of code mess with the same data at the same time, creating unexpected results. Object-oriented languages like Ruby don’t provide great tools for avoiding such problems. But functional languages, like Elixir, do.
Writing concurrent code in Elixir is extremely easy, and it’s nearly impossible to accidentally interfere with other code that is running at the time.
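A small sketch of what this can look like, using only functions built into Elixir (`spawn`, `send` and `receive`). The numbers and the message shape `{:result, n, square}` are arbitrary choices for the example.

```elixir
parent = self()

# Spawn 1,000 lightweight processes; each computes a square in isolation
# and sends the result back to the parent as a message.
for n <- 1..1_000 do
  spawn(fn -> send(parent, {:result, n, n * n}) end)
end

# Collect one message per spawned process.
results =
  for _ <- 1..1_000 do
    receive do
      {:result, n, square} -> {n, square}
    end
  end

total = results |> Enum.map(fn {_n, square} -> square end) |> Enum.sum()
IO.puts("sum of squares: #{total}")
```

The processes share no memory: the only way they interact is by sending messages, which is why they cannot accidentally modify each other’s data.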
For modern web applications (written with technologies like PHP or Ruby on Rails) to do everything we require of them, they depend on a lot of other pieces running on the server, for example:
- Each application process can handle only one web request at a time, and new processes cannot be spun up as needed, so tools are needed to spawn multiple application servers up front, with a web server like Nginx in front of them to hand off requests.
- It is not possible to do slow background tasks without blocking web requests, so something like Resque needs to be added to take care of running background jobs.
- It is not advisable to use a precious process to maintain a websocket connection with a user, so something like Pusher needs to be added to get realtime functionality (meaning extra costs).
- It is not possible to keep a big process running all the time to do scheduled tasks, so a dependency on cron needs to be added.
- No built-in tool for managing multiple parts of a running system currently exists, so tools like foreman are used to start and monitor them.
In Elixir on the other hand, it is possible to spin up a nearly limitless number of processes as needed, so it is possible (in theory) to forego all those tools above.
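As one hedged example of this, a slow background job can simply be run in its own process using the `Task` module from Elixir’s standard library. The `slow_job` function below is a hypothetical stand-in for real work such as sending an email; whether this is enough to replace a dedicated job queue depends on the application (for instance, on whether jobs must survive a restart).

```elixir
# Hypothetical slow job standing in for e.g. sending an email.
slow_job = fn ->
  Process.sleep(200)
  :done
end

# Start the job in its own process; the caller is not blocked.
task = Task.async(slow_job)

# The caller can keep handling other work in the meantime.
IO.puts("handling other work while the job runs")

# Later, wait (up to 1 second) for the result.
result = Task.await(task, 1_000)
IO.inspect(result, label: "job result")
```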
Elixir code is also simpler to understand than object-oriented code because the language treats explicitness as a value.
“Functional programming is associated with concurrency but it was not by design. It just happens that, by making the complex parts of our system explicit, solving more complicated issues like concurrency becomes much simpler.” - José Valim
Using Phoenix and Elixir opens the door to building applications that are not possible with other web frameworks.
The Phoenix framework has first-class support for realtime communication via websockets (with polling as a fallback). In benchmarks, the creators have been able to serve 2 million simultaneously-connected clients. Additionally, they already have native channel clients for iOS, Android, and Windows devices. With that kind of support, you can confidently build servers to support chat, networked games, and more.
In Rails, ActiveRecord encourages developers to do all data validation in application code. However, validations that depend on the state of the database can only be reliably done by the database, using constraints or locks. This includes things like:
- Does any user have this username right now? (uniqueness constraint)
- Does post 5 still exist right now, before I comment on it? (foreign key constraint)
- Does this user’s account have enough money to cover this purchase right now? (CHECK constraint)
- Is this rental property reserved for June 8 right now? (EXCLUDE constraint on date ranges)
It is possible to use such constraints with a Rails application, but it is not typical to do so, and the tools do not encourage it.
Elixir’s Ecto database library embraces database constraints, with built-in support for adding them, catching constraint violations, and turning them back into friendly user-facing error messages.
Where Elixir is especially suited
Any project that is a good fit for Rails or Python is something that could be done in Phoenix, especially if the project involves one or more of the following:
- System expecting high traffic or requiring very fast / consistent response times
- Minimal downtime being crucial
- Realtime updates (e.g. stock ticker, social media)
- Bidirectional realtime communication with websockets (e.g. chat, games)
Drawbacks of Elixir
Some potential downsides of using Elixir, along with mitigating actions:
- Fewer Elixir developers available. However:
  - There are also significantly fewer projects competing for those people.
  - There is a trend of Ruby developers wanting to cross over to Elixir.
  - Developers with Erlang experience are also well suited.
  - Elixir is a shiny new thing with real promise, which helps attract very enthusiastic developers.
- While Phoenix is productive, it is not as productive as older frameworks like Rails… yet. Reasons include:
  - Lack of developer experience with Elixir.
  - The relative newness of its open source ecosystem (documentation, forums, example code available).
- There are fewer libraries in Elixir than in older languages, so there are more times when you’d have to write something yourself. However:
  - You can use the many existing Erlang libraries.
  - New Elixir libraries are being added quickly.
  - Any remaining gaps are a chance to contribute to the community by creating a great open source tool.
- It may be harder to convince people to use Elixir, given that it’s currently obscure - not in the top 50 of the TIOBE index. However:
  - Erlang is better-known. Elixir compiles to Erlang bytecode, runs on the Erlang VM and can use or be used by Erlang code. It’s basically “Erlang made friendlier”.
  - Joe Armstrong, co-creator of Erlang, said of pre-1.0 Elixir in 2013 that it was "good s**t".
  - Therefore, if we can convince someone of Erlang, Elixir is a no-brainer.
What’s next for MoodleNet’s development
A generic agent-centric approach to federation?
The developer of another fediverse app written in Elixir (Fontina, for photo sharing, which has been put on hiatus) is now working on a generic ActivityPub server to serve as a back-end (there may be a good opportunity to join efforts). Here is how the developer describes it:
“I’m working on pubstomp, a generic AP server that is intended for services like mastodon/pixelfed/peertube to work as frontends to it and would indeed support single account [per person, versus having an account on an instance of each of those apps]. That said, one of my goals down the road is to build ‘api adaptors’ that would basically plug into pubstomp and adapt to, say, the mastoAPI, pleromaAPI, etc.”
I started a thread about this on the fediverse (please join the conversation!):
There’s need for a generic agent-centric #ActivityPub server, so that instead of signing up to a bunch of different servers, a user could have their identity and data all in one place, and all the apps they use (clients, but if necessary server-side “plugins” as well) would interact with the activity/objects types that they support.
The plan for MoodleNet (so far)
- Fork the Pleroma back-end to create a set of ActivityPub libraries (or a generic ActivityPub server)
- Add support for authenticating as an OAuth client
- Add support for group accounts (agents that represent several users)
- Create ActivityStreams extensions
- Test federation
- Possibly add a GraphQL client-server API
- Create front-end app with React.js (web and mobile)
We are planning to fork the Pleroma back-end to create a set of ActivityPub/ActivityStreams libraries. If that proves not to be feasible, we will instead turn it into a generic ActivityPub server back-end which supports any type of activity and object (including extensions to ActivityStreams), is easily extensible, and doesn’t ship with any front-end (possible name: Pub of the Commons). Pleroma is written with Elixir’s Phoenix framework and licensed under the AGPL (which is also being used for MoodleNet), so the upstream project will be able to use any improvements we make.
Firstly, any parts of the code coming from Pleroma that we don’t need will be removed (like support for the deprecated OStatus protocol, and the two front-ends which ship with Pleroma: its own Vue.js interface and Mastodon’s React.js interface). We’ll then add some things that can be universally useful, like support for groups (actors made up of several users) and a basic HTML registration/authentication interface (with OAuth client functionality, so users can sign in using their existing accounts elsewhere), so that at least those things can be done without relying on a front-end app.
While we work on creating models/vocabulary for MoodleNet, we can use this generic ActivityPub server to test our ActivityStreams extensions and make sure federation works well in real time (using the API only). Only once that is ready will we start plugging in custom MoodleNet logic and developing the front-end app (which is the reverse of many fediverse app projects, which first implemented MVPs of their use case and then often found it challenging to add federation on top).
Regarding the connection between the back-end and front-end app, the ActivityPub standard comes with a suggested client-server REST API specification, but for some reason Mastodon and many other fediverse apps have created their own custom REST APIs (and Pleroma implemented Mastodon’s so that it can share front-end apps). So an interesting option at this point for MoodleNet (and Pub of the Commons), may be to still support Mastodon’s client REST API (for compatibility with existing front-ends like Pinafore) and additionally to implement a new GraphQL-based API that would allow more flexibility and extensibility between the server and front-end apps (any functionality/activities/objects not supported by Mastodon would only have to be implemented in the server-to-server REST API and the GraphQL API).
- Spreadsheet of fediverse apps and their stacks
- ActivityPub standard (for federation)
- ActivityStreams & its vocabulary (which ActivityPub uses as a data format)
- How to implement a basic Mastodon-compatible server
- Why elixir (big thanks to Big Nerd Ranch for some of the great material used in this doc!)
- Choosing Elixir for the Code, not the Performance
- Pleroma’s architecture