Just try a simple experiment... cover all three monochrome sensors with tape (Scotch "Magic Tape" is fine since it has an opaline, translucent effect that doesn't block all light) and leave the two RGB sensors uncovered. I did this over a phone case so the tape wouldn't touch the N9's glass back directly.
Put the camera in monochrome mode and focus on something, and you will see a perfect grayscale image. Then take a picture in monochrome mode and you will get a completely blurred image, in both the JPG and the raw files.
My interpretation... the live monochrome preview is a desaturated version of a color image obtained through one RGB sensor (the central one, since nothing happens when I cover the other), but the final captured images are a fusion of the three monochrome sensors' output.
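To picture what a "desaturated version" of one RGB sensor's frame could look like, here is a toy sketch using plain Rec.601 luma weights. This is only an illustration of ordinary RGB-to-grayscale conversion, an assumption on my part, not the phone's actual processing pipeline:

```python
def desaturate(pixel):
    """Map an (R, G, B) pixel to one grayscale value (Rec.601 weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

# A tiny "frame" from a hypothetical RGB sensor:
frame = [(255, 0, 0), (0, 255, 0), (128, 128, 128)]
gray = [desaturate(p) for p in frame]
print(gray)  # [76, 150, 128]
```

A preview built this way would need only one working RGB sensor, which would explain why the live view stays sharp with all three monochrome sensors taped over.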
Try for yourselves...