NoSQL?
TL;DR
A selection of high-level notes and links about NoSQL.
This is more a data dump than a presentation.
3 parts
- Database & Models
- Document Databases
- Learnings
There is a lot of data in this presentation
The slides and links are available on acmconsulting.eu
I will be going fast
The basics
Databases & Models
A database is a collection of information that is organized so that it can easily be accessed, managed, and updated.
– Wikipedia
A database model is a type of data model that determines the logical structure of a database and fundamentally determines in which manner data can be stored, organized, and manipulated.
–Wikipedia
Some revision
Relational Databases
This model organizes data into one or more tables (or “relations”) of columns and rows, with a unique key identifying each row.
– Wikipedia
Oracle, MySQL, MS SQL, PostgreSQL, DB2, etc.
Relational Databases - Popularity
7 of the 10 top-scoring systems are relational
Relational Databases - Details
- Uses Structured Query Language, Third Normal Form etc
- Sample Query:
SELECT * FROM users WHERE email='[email protected]' AND type='userProfile';
- But you know all this :)
Get on with it
NoSQL?
A NoSQL (originally referring to “non SQL” or “non relational”) database [models its data] in means other than the tabular relations used in relational databases – Wikipedia
NoSQL = All non relational databases
NoSQL - Details
- Martin Fowler calls them “Aggregate Oriented Databases”
- 4 basic sub-types
- Key-Value (Redis, Memcached, …)
- Document (MongoDB, Couchbase, …)
- Column-family (Cassandra, HBase, …)
- Graph (Neo4j, OrientDB, …)
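To make the four sub-types a bit more concrete, here is a rough sketch of the same piece of information shaped four ways. It uses plain JavaScript structures only, no database driver, and every name and value in it (Ada, Acme, the `user:42` key, the `WORKS_AT` edge) is made up for illustration.

```javascript
// Hypothetical data: one user and their employer, shaped four ways.

// Key-value: an opaque value looked up by a single key.
const keyValue = new Map([
  ['user:42', JSON.stringify({ name: 'Ada', company: 'Acme' })],
]);

// Document: a self-describing nested record, stored and fetched as a unit.
const documentStore = [
  { _id: 42, name: 'Ada', company: { name: 'Acme', city: 'Dublin' } },
];

// Column-family: row key -> column family -> columns.
const columnFamily = {
  'user:42': {
    profile: { name: 'Ada' },
    employer: { name: 'Acme', city: 'Dublin' },
  },
};

// Graph: nodes plus explicit, typed edges between them.
const graph = {
  nodes: [
    { id: 'u42', label: 'User', name: 'Ada' },
    { id: 'c1', label: 'Company', name: 'Acme' },
  ],
  edges: [{ from: 'u42', to: 'c1', type: 'WORKS_AT' }],
};

// The same question ("where does Ada work?") asked of each shape.
const fromKeyValue = JSON.parse(keyValue.get('user:42')).company;
const fromDocument = documentStore.find((d) => d._id === 42).company.name;
const fromColumns = columnFamily['user:42'].employer.name;
const worksAt = graph.edges.find((e) => e.from === 'u42' && e.type === 'WORKS_AT');
const fromGraph = graph.nodes.find((n) => n.id === worksAt.to).name;
```

Real engines add indexing, persistence and distribution on top; the point here is only how differently each model wants the data laid out.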
NoSQL - Numbers?
60% of the available database engines are NoSQL!
NoSQL - Popular?
Only 18.3% of the score, for all NoSQL databases.
Document stores are the 2nd most popular category, 6.7%
Side note - CAP Theorem
Consistency, Availability and Partition tolerance
– Wikipedia
CAP Theorem - Details
- Jepsen is an excellent series of posts testing how databases respond to network partitions
- The first post in the series, “On the perils of network partitions”, is a must read
- “Please stop calling databases CP or AP” is also worth reading
Tell us more about
Document Databases!
Ok
Document Database?
Document databases get their type information from the data itself, normally store all related information together, and allow every instance of data to be different from any other. – Wikipedia
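As a rough illustration of "every instance of data can be different", the sketch below keeps three differently shaped documents in one in-memory array and filters them by example. The collection, the example.com addresses and the `findByExample` helper are all hypothetical; this is not any engine's API, just the idea in plain JavaScript.

```javascript
// Hypothetical in-memory "collection": each document carries its own fields,
// and no two documents are required to share a shape.
const people = [
  { _id: 1, email: 'ada@example.com', type: 'userProfile', langs: ['en', 'fr'] },
  { _id: 2, email: 'bob@example.com', type: 'userProfile', address: { city: 'Cork' } },
  { _id: 3, type: 'serviceAccount', apiKeyId: 'k-123' },
];

// Query-by-example: keep documents whose fields equal every field of `example`.
// Only top-level strict equality is handled; real engines support far more.
function findByExample(collection, example) {
  return collection.filter((doc) =>
    Object.entries(example).every(([key, value]) => doc[key] === value)
  );
}

// Fields that exist only on some documents must be checked before use.
const profiles = findByExample(people, { type: 'userProfile' });
const inCork = people.filter((doc) => doc.address && doc.address.city === 'Cork');
```

Note the guard on `doc.address`: with no enforced schema, the reading code has to cope with fields that may simply not be there.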
Document Databases - Scores
MongoDB is the biggest player by far, ~70% share of all Document Database score
So let's look at MongoDB
MongoDB
MongoDB (from humongous) … eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON)
–Wikipedia
MongoDB - Details
- Open Source (with commercial license option)
- Commercial support
- 4th highest scoring engine overall, and highest scoring document store
- Database as a service from many providers, including MongoDB itself
- Plenty of 3rd party apps, tools, drivers and support, more details
- Jepsen links: 2013, 2015
- Sample Query: db.getCollection('user').find({email: '[email protected]', company: 'MongoDB'})
MongoDB - NodeJS
NodeJS and MongoDB have “grown up together” and have become a very popular combination of technologies
Some MongoDB features (e.g. Map-Reduce) are used by writing JavaScript functions
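To give a feel for what "writing JavaScript functions" means for Map-Reduce, here is a toy version of the pattern in plain JavaScript. It is a sketch of the general idea only: the `orders` data, the `runMapReduce` runner and the function names are hypothetical, and it does not reproduce MongoDB's own API or execution model.

```javascript
// Hypothetical documents standing in for a collection.
const orders = [
  { customer: 'ada', total: 20 },
  { customer: 'bob', total: 5 },
  { customer: 'ada', total: 7 },
];

// Map step: called once per document, emits zero or more (key, value) pairs.
function mapOrder(doc, emit) {
  emit(doc.customer, doc.total);
}

// Reduce step: folds all values emitted for one key into a single value.
function reduceTotals(key, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Toy runner: group emitted values by key, then reduce each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);
  }
  return result;
}

const totalsByCustomer = runMapReduce(orders, mapOrder, reduceTotals);
```

In a real database the map and reduce steps can run on different nodes, which is why they are written as small independent functions.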
MongoDB - My thoughts
- A top 10 database that’s easy to use and focused on developer experience
- Feels quite familiar to SQL users, but that can result in people trying to use it like an RDBMS, which is very bad
- Had some early bad PR around issues of data loss and Jepsen seems to break it
- Loads of tutorials, easy to get started
- Used in things like MEAN stack and KeystoneJS
- Not a bad place to start
An Alternative
Couchbase
Couchbase Server, originally known as Membase, is an … NoSQL document-oriented database software package that is optimized for interactive applications.
–Wikipedia
Couchbase - Details
- 2nd highest scoring document store, but with only 8% of MongoDB’s score
- Was formed by the merging of CouchOne (CouchDB) and Membase (Memcached)
- Open Source (with commercial license option)
- Commercial support
- No off the shelf database as a service (that I could find), but they do have a number of partners to help you
- Jepsen link: none
- Sample Query: SELECT * FROM users WHERE email='[email protected]' AND type='userProfile';
Couchbase - My thoughts
- A distant number 2, but still number 2!
- Nice UI and management interface
- Really nice datacenter awareness/replication and scaling
- N1QL is very much like SQL. Tutorial
- I’ve used it at work and it worked well for us
- Number 4, Apache CouchDB is very similar - Couchbase vs CouchDB
A Wild card
RethinkDB
RethinkDB is an open source, NoSQL, distributed document-oriented database. It stores JSON documents with dynamic schemas, and is designed to facilitate pushing real-time updates for query results to applications. –Wikipedia
RethinkDB - Details
- 7th highest scoring document store, but rising fast (was 11th)
- Selling point is its realtime update feature, like Firebase
- Open Source
- Commercial support
- No off the shelf database as a service, yet, but one in the works. Horizon.io
- Jepsen links: 2016
- Sample query: r.table('users').filter({email: '[email protected]', company: 'RethinkDB'})
RethinkDB - My thoughts
- Very nice UI and management interface, much like Couchbase
- Only used for a side project, but enjoying using it
- Installed on a Raspberry Pi 2, dealing with a 250k-object dataset with no issues and low CPU load
- Not used the clustering/scaling/sharding
- Can do joins!
- http://horizon.io/ is potentially very interesting for rapid prototyping
Interesting developments
3 NodeJS Document stores
- NodeJS “pure” databases, run as part of your node process, no external process or binary required
- Great for little side projects or IoT applications
- Quite a few of them, but 3 examples:
NodeJS Document store - PouchDB
inspired by Apache CouchDB … designed to run well within the browser. … enables applications to store data locally while offline, then synchronize it with CouchDB and compatible servers when the application is back online –PouchDB
12th highest scoring document store, but rising fast (was 16th)
NodeJS Document store - NeDB
Embedded persistent or in memory database for Node.js, nw.js, Electron and browsers, 100% JavaScript, no binary dependency. API is a subset of MongoDB’s
–NeDB
NodeJS Document store - TingoDB
TingoDB is an embedded JavaScript in-process filesystem or in-memory database upwards compatible with MongoDB at the API level.
–TingoDB
The old dog/elephant?
PostgreSQL?
As a document store?
JSONB turns the JSON document into a hierarchy of key/value data pairs … there’s support for GIN indexes
–compose.io
PostgreSQL - Details
- 5th highest scoring db-engine, one behind MongoDB
- Open Source (permissive PostgreSQL License)
- Commercial support
- Lots of hosting options, including quite a few as a service
- Jepsen link: 2013
- Sample Query: SELECT * FROM users WHERE email='[email protected]' AND type='userProfile';
PostgreSQL - Some thoughts
- I haven’t used JSONB except for experimentation
- We do use it in production at work
- Lead dev who uses it (Nick Johnson) had the following comments :
- JSONB = JSON with Element Query + Indexes
- Trade off is read/write speed vs standard JSON
- Can even query nested objects
- Keeps all RDBMS advantages (Joins, ACIDity, Materialized views)
- Limited features compared to “full” document stores, but still very good.
Phew.
Conclusion
- It depends
- (doesn’t it always)
- Mongo is a safe choice due to adoption
- Your use case may fit another better
- Lots of innovation and experimentation going on
- It’s amazing what you can do with JavaScript
What did we learn about NoSQL?
NoSQL - Silver bullet?
No.
Of course not.
- NoSQL requires Non-relational thinking
- Trying to use NoSQL as relational db == bad time
- Joins in software work OK, until they don’t (i.e. at scale)
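A rough sketch of what "joins in software" tends to look like, and why it stops working at scale. The `authors` and `posts` arrays and both helper functions are hypothetical, in-memory stand-ins; in a real application each side would usually be a separate query over the network.

```javascript
// Hypothetical data: two "collections" linked by authorId.
const authors = [{ id: 1, name: 'Ada' }, { id: 2, name: 'Bob' }];
const posts = [
  { id: 10, authorId: 1, title: 'Hello' },
  { id: 11, authorId: 2, title: 'World' },
  { id: 12, authorId: 1, title: 'Again' },
];

// Naive join: scans every author for every post, O(posts * authors).
function joinNaive(postList, authorList) {
  return postList.map((post) => ({
    ...post,
    author: authorList.find((a) => a.id === post.authorId) || null,
  }));
}

// Indexed join: build a lookup table once, O(posts + authors).
function joinIndexed(postList, authorList) {
  const byId = new Map(authorList.map((a) => [a.id, a]));
  return postList.map((post) => ({
    ...post,
    author: byId.get(post.authorId) || null,
  }));
}

const joinedNaive = joinNaive(posts, authors);
const joinedIndexed = joinIndexed(posts, authors);
```

Both versions give the same answer. The difference only shows up as the collections grow, and either way the application has had to load both sides into memory first.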
NoSQL - Schema-less-ness?
Just because it CAN be schema-less doesn’t mean it SHOULD be schema-less
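One way to keep some schema in a schema-less store is to validate documents in the application before writing them. The sketch below is a deliberately minimal version of that idea; `userSchema` and `validate` are hypothetical names, and a real project would more likely reach for an existing validation library.

```javascript
// Hypothetical schema: field name -> expected `typeof` result, all required.
const userSchema = { email: 'string', type: 'string', age: 'number' };

// Returns a list of problems; an empty list means the document is acceptable.
function validate(doc, schema) {
  const errors = [];
  for (const [field, expected] of Object.entries(schema)) {
    if (!(field in doc)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof doc[field] !== expected) {
      errors.push(`${field} should be ${expected}`);
    }
  }
  return errors;
}

const goodErrors = validate({ email: 'ada@example.com', type: 'userProfile', age: 36 }, userSchema);
const badErrors = validate({ email: 42, type: 'userProfile' }, userSchema);
```

The database will happily store the second document; it is the application that has to say no.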
NoSQL - What about joins?
- Denormalization
- Optimizes for reads, at the expense of writes/complexity
- Tricky to keep things in sync (we’ve used hooks/micro-services)
- Others suggest you don’t do it any more
- Increasingly NoSQL engines do support limited forms of joins!
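The trade-off in the list above can be sketched in a few lines. Here each employee document carries its own copy of the company name, so reads need no join, but a rename has to find and fix every copy. All names and data are hypothetical, and the in-memory structures stand in for real collections.

```javascript
// Hypothetical data: the company name is copied into every employee document.
const companies = new Map([[1, { id: 1, name: 'Acme' }]]);
const employees = [
  { id: 10, name: 'Ada', company: { id: 1, name: 'Acme' } },
  { id: 11, name: 'Bob', company: { id: 1, name: 'Acme' } },
];

// Read path: one lookup, no join needed.
function companyNameOf(employeeId) {
  return employees.find((e) => e.id === employeeId).company.name;
}

// Write path: the source record and every embedded copy must be updated.
// Returns how many employee documents had to be touched.
function renameCompany(companyId, newName) {
  companies.get(companyId).name = newName;
  let touched = 0;
  for (const employee of employees) {
    if (employee.company.id === companyId) {
      employee.company.name = newName;
      touched += 1;
    }
  }
  return touched;
}

const nameBefore = companyNameOf(11);
const touchedCount = renameCompany(1, 'Acme Ltd');
const nameAfter = companyNameOf(11);
```

If the rename fails halfway through, the copies disagree, which is the "tricky to keep things in sync" part.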
NoSQL - Same old problems
- Data migration and schema changes are still as much “fun” as they ever were
- We use Mongo Migrations and it works well for us
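For a feel of what a document migration involves, here is a minimal sketch of versioned, step-by-step upgrades in plain JavaScript. It is not how Mongo Migrations works internally; the `schemaVersion` field, the two example steps and the `migrate` function are all assumptions made up for illustration.

```javascript
// Hypothetical migrations, each upgrading a document by exactly one version.
const migrations = [
  // v1: normalise email addresses to lower case.
  { to: 1, up: (doc) => ({ ...doc, email: doc.email.toLowerCase() }) },
  // v2: rename `fullName` to `name`.
  {
    to: 2,
    up: (doc) => {
      const { fullName, ...rest } = doc;
      return { ...rest, name: fullName };
    },
  },
];

// Apply every step the document has not seen yet, in order.
// Documents without a schemaVersion are treated as version 0.
function migrate(doc) {
  let current = { schemaVersion: 0, ...doc };
  for (const step of migrations) {
    if (current.schemaVersion < step.to) {
      current = { ...step.up(current), schemaVersion: step.to };
    }
  }
  return current;
}

const migrated = migrate({ email: 'ADA@Example.com', fullName: 'Ada' });
const migratedAgain = migrate(migrated);
```

Storing the version on each document means old and new shapes can coexist while a migration is rolling out, and running it twice does no harm.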
NoSQL - Final thought
“NoSQL, or rather NoAuthentication, has been a huge gift to the hacker community. Just when I was worried that they’d finally patched all of the authentication bypass bugs in MySQL, new databases came into style that lack authentication by design.”
– Ars Technica “Hacking Team gets hacked”
Go Play
Have fun!
Credits
Presentation written in Markdown
Presented using Reveal.js / Reveal-MD
By Alex McFadyen
With help from the BEN Development team
Scoring screenshots from http://db-engines.com