Credport Technical Post-mortem
Development
Startups
This is a companion post to our Tale of Credport post; here I will go into further detail about the technical problems we had and the decisions we made. You probably want to read that one before this one. Credport’s evolution in technical design on both the front-end and the back-end effectively coincided with our evolution from Boston/Webcred to Berlin/Credport.
Backend - All of the Data!
I fondly remember the first days of the summer. I had just come back from Germany, and we had just been introduced to the luxury of a well-run Highland Capital office. Samir was still in town, so we quickly got to the task of writing a production-worthy system. While we had talked extensively about how we envisioned Webcred and had a rough prototype to learn from, we wanted to start from zero and truly brainstorm the requirements for our MVP.
Much of our initial effort went into streamlining existing information from social networks and other sources, figuring out an efficient way to store and query that information, and ultimately designing an API that would allow our partners to consume as well as contribute to our data.
As mentioned in the main post, we doubled down on Neo4j, as we saw common-connection queries as our competitive advantage. Maybe a bit too hastily, we assumed that meant all of our data had to fit into Neo4j, as we expected managing both a relational DB and Neo4j to be too much complexity overhead (a trade-off we ultimately did accept with Credport). Thus the task of data normalization merged with the task of fitting everything into a graph structure. After some time, we settled on the following schema:
At the heart of this schema is the User. It has simple key-value attributes such as names, as well as more complex attributes such as workplaces, modeled via a “works” edge to semantic job-title/workplace nodes. Furthermore, the graph represents the identities the user holds on various social networks via what we called TpSubidentities, which may or may not be linked through email nodes. Connections between identities and other identities/entities were modeled with a hypergraph approach: a dedicated Connection node connects to the two objects via “halfedges” (to accommodate asymmetry) as well as to a context node describing the kind of connection (think “facebook-like” or “twitter-follow”).
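To make the hypergraph idea concrete, here is a tiny, purely illustrative Ruby sketch of how a single “facebook-like” between two identities would be laid out as nodes and edges. All names and values are invented for illustration and are not the actual Credport schema.

```ruby
# Illustrative only: a "facebook-like" modeled the hypergraph way, with the
# connection reified as its own node and asymmetry expressed via half-edges.
nodes = {
  1 => { type: "TpSubidentity",     network: "facebook", uid: "alice.fb" },
  2 => { type: "TpSubidentity",     network: "facebook", uid: "bob.fb" },
  3 => { type: "Connection" },                         # the hyperedge, as a node
  4 => { type: "ConnectionContext", name: "facebook-like" }
}

edges = [
  { from: 3, to: 1, type: "halfedge" },                # who liked
  { from: 3, to: 2, type: "halfedge" },                # who was liked
  { from: 3, to: 4, type: "context"  }                 # how to interpret/display it
]
```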
While it certainly was not the most practical schema, the reasoning behind it was as follows:
Expressiveness
As mentioned above, the primary goal was data normalization across networks, and I believe this schema definitely achieves that. Not only can it fit Facebook’s, Twitter’s, and LinkedIn’s schemas (which were our primary data sources), but also any arbitrary representation of identities and their connections to other identities and entities. In that sense, it took a page from the OpenGraph playbook, which does things similarly. That expressiveness also allowed us to design a flexible API that would enable anyone to display a user’s profile without having to know any network’s specific schema (an issue you don’t have with Facebook, as third-party OpenGraph data is not consumable by other third parties). In my mind, that was the most exciting part, as I saw every social network’s data intertwine in perfect harmony. We could add a social network and any of our partners would be able to immediately display that data. Oh, the irony of thinking we could design the ultimate schema.
Common queries
With the graph being as semantic as possible, queries showing common connections to identities/entities, common workplaces/schools, or even common majors were a no-brainer with this schema.
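For a flavor of how such a query might have looked against this first-generation schema, here is a hedged sketch that posts Cypher to Neo4j’s legacy REST endpoint. The relationship type, property names, UUIDs, and host are assumptions for illustration, not our actual schema or code.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical sketch: find entities two identities are both connected to by
# walking through the dedicated Connection nodes and their half-edges.
cypher = <<-CYPHER
  MATCH (a)<-[:halfedge]-(c1)-[:halfedge]->(shared),
        (b)<-[:halfedge]-(c2)-[:halfedge]->(shared)
  WHERE a.uuid = {a_uuid} AND b.uuid = {b_uuid}
    AND shared <> a AND shared <> b
  RETURN DISTINCT shared
CYPHER

uri = URI("http://localhost:7474/db/data/cypher")  # Neo4j's legacy REST Cypher endpoint
res = Net::HTTP.post(uri,
  { query: cypher, params: { a_uuid: "a-123", b_uuid: "b-456" } }.to_json,
  "Content-Type" => "application/json")

puts res.body
```

The hypergraph pays off in queries like this: because every connection is reified as a node with a context, one traversal pattern covers likes, follows, and any future connection type alike.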
Unfortunately, only after quite some time did we see the pitfalls of this approach. The biggest issue was that we had data in the graph that simply didn’t belong there and would have been a better fit for a relational DB. I think our original desire to fit everything into the graph, to avoid the complexity of managing two databases, made us blind to the problem of over-normalization. Not everything belongs in the graph, and I think we did a better job with the second iteration.
Another issue of using Neo4j exclusively was that we effectively had to build our own ORM over the REST API. While we unfortunately lost our git history on that one, it involved writing an ActiveModel-compliant API, figuring out UUID generation, and other plumbing. Not so fun. Ultimately, we ended up spending too much time managing complexity and technical debt early on, which wasted a lot of critical time during the summer.
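To give a flavor of the plumbing involved, here is a heavily simplified, hypothetical sketch of such a wrapper. The class and method names are invented, and it uses today’s ActiveModel::Model for brevity rather than whatever we glued together back then.

```ruby
require "active_model"
require "securerandom"

# Hypothetical sketch of a homegrown "ORM" node wrapper: ActiveModel compliance
# for forms/validations plus our own UUID generation, with persistence delegated
# to some Neo4j REST client (omitted here).
class GraphNode
  include ActiveModel::Model   # attribute initialization, validations, naming

  attr_accessor :uuid, :properties

  validates :uuid, presence: true

  def initialize(attributes = {})
    super
    self.uuid ||= SecureRandom.uuid  # Neo4j's internal ids aren't stable, so mint our own
    self.properties ||= {}
  end

  def persisted?
    # In a real implementation this would check whether the node exists in Neo4j.
    false
  end

  def save
    return false unless valid?
    # e.g. POST the properties to the Neo4j REST API here.
    true
  end
end
```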
With these learnings in mind, we decided to overhaul everything into a second-generation schema with Credport. That schema was much simpler and so straightforward that I won’t go into too much detail. We would store all data in Postgres, only persisting a copy of identity and connection data with their ids in Neo4j. Our common-connection query would find the appropriate nodes, return the ids, and instantiate the corresponding ActiveRecord objects from Postgres. Managing two databases was a lot less stressful than we expected, and although we clearly had to round-trip to two databases, Rails’ great caching infrastructure still gave our API great response times, since profiles stay relatively static.
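Roughly, the read path looked like the following sketch. It is a simplification with invented names; the cache key and the stubbed graph query are illustrative, not our production code.

```ruby
# Hypothetical sketch of the second-generation flow: Neo4j only answers the
# graph question and hands back Postgres ids; ActiveRecord materializes the
# rows, and Rails' cache keeps the (mostly static) profile data fast.
class CommonConnections
  def self.between(user_a, user_b)
    Rails.cache.fetch(["common-connections", user_a.id, user_b.id]) do
      ids = graph_query_for_common_connection_ids(user_a.id, user_b.id)
      User.where(id: ids).to_a   # re-hydrate the full records from Postgres
    end
  end

  def self.graph_query_for_common_connection_ids(a_id, b_id)
    # In the real system this ran a Cypher query against Neo4j, which mirrored
    # identities/connections keyed by their Postgres ids, and returned those ids.
    []
  end
end
```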
While most of the evolution was about simplifying the schema, we did end up adding what we called connection-context protocols. We preserved the context idea from our previous schema, as third-party interoperability was still a priority for us. (Remember, a context gives developers everything they need to understand and appropriately display the associated connections.) However, we ran into the problem of different contexts sharing similar traits, such as reviews from different websites always containing text. Thus we came up with connection-context protocols, which act as validators of certain connection properties for every connection-context that implements them. A bit over-engineered again, but useful, as we actually had API consumers this time around: both Tamyca and Kinderfee had their own connection-contexts which implemented the text protocol.
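A hedged sketch of the idea, with all class and method names invented for illustration: a protocol bundles validations that a connection-context can opt into, so consumers know what every connection under such a context is guaranteed to carry.

```ruby
# Hypothetical sketch of connection-context protocols: a protocol validates
# certain properties on every connection whose context implements it,
# e.g. "text" for review-style contexts.
module TextProtocol
  def self.validate(connection)
    connection.errors.add(:text, "can't be blank") if connection.text.blank?
  end
end

class ConnectionContext < ActiveRecord::Base
  # In a real system the implemented protocols would be stored with the context;
  # hard-coded here purely for illustration.
  def protocols
    [TextProtocol]
  end
end

class Connection < ActiveRecord::Base
  belongs_to :connection_context

  # Run every protocol the context implements as part of normal validation.
  validate do
    connection_context.protocols.each { |protocol| protocol.validate(self) }
  end
end
```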
Frontend - Make the User Trustworthy
While the backend was mostly about storing and querying our data in an appropriate way, the frontend problem revolved around one question: how do we make the user and their profile trustworthy? Assuming the user granted us access to their information, what was the best way to display it?
After some deliberation, we distilled the trust problem into the following principles:
- The gut: The first impression. The feeling of seeing someone for just a split second and forming a preliminary opinion about them. Incredibly strong, and usually a good heuristic developed over generations of human interaction.
- The brain: Analytical mode, where we try to rationalize our opinion and find objective measures of why we should trust a person. We usually return here when doubts creep in.
Any good frontend design would have to account for these two aspects of trust-building instincts. We derived them specifically to counter the concept of an opaque score. We disliked the idea of being told to trust someone; we just didn’t believe that this is how trust works online.
As you can maybe see from my history at Apture, or my recent redesign of my about page, I’m a huge fan of contextual depth: keep things simple, but show more when appropriate. This is especially useful in our case, as the data we need to show to satisfy the gut is different from what the brain needs. Our working thesis was to make the profile easily glanceable and trustable, but allow the viewer of a profile to dig deeper and find out why they should trust it.
This is what we ultimately created:
For the gut, we had a clean and simple design, less chrome and more content, with a grid-ish structure for easy digestion. A profile picture, customizable background picture, and tagline give it a personal touch. At the top, you can see a quick quantitative summary, as well as a list of verifications on the left. We used the LiveTiles box structure because we needed to let the user know which sections were available above the fold, while still being able to show content representative of each section. The rotation of the content, as well as the hover animations, hinted at more information and invited the brain to inquire further.
Here's the badge that marketplaces could embed:
We obviously had a lot less space, but I think we made the most of it. Upon click, it would open the profile page in a lightbox iframe.
The best profile ever
I’m usually a lot more humble, but I really do believe we had one of the most advanced and best profile pages out there for achieving our goal of making the web a more trustworthy place. It was quick and easy to create (at the height, we imported data from 7 sources, all with user consent only) and extensible by marketplaces with real transaction data. Unfortunately, only later did we come to realize that it didn’t matter. To quote our email to advisors: “Trust is a problem that people often talk about, but don’t really act upon.” (Just look at Craigslist.)
That being said, we had an awful lot of fun, with stretches of pain along the way, and clearly learned a lot, so we hope whatever we do next is gonna be half as interesting.
P.S.: With much delay, we are happy to finally be able to open-source the Credport code. Check it out here and let us know if you have any questions. We’d love to help.