I don't understand why so many people recommend baby Rudin (Principles of Mathematical Analysis). The presentation in Rudin is not merely terse but also quite dry and unmotivated. I suggest you avoid it, regardless of how much talent or maturity you have. There are plenty of more interesting texts that will teach you just as much: Spivak and Pugh are nice, and I also recommend the recent two-volume work by Zorich.
By the way, as you acquire experience you'll gain confidence and get over the urge to always check your answers. Here's a good exercise with a built-in answer key: When reading a text, every time you get to a result (claim, theorem, etc.) try to prove it on your own before you continue. You probably should be doing that more often than not.
In any case, don't stick to just one text/source. Shop around, read a few pages here and there before you settle on something. There's no way a stranger on the internet can make a good recommendation: Find what works for you. The most important thing is that you're fully engrossed!
I've wondered that too. My conclusion is that it's mostly coming from folks who aren't distinguishing between something like elegance as a mathematical work and effective pedagogy.
The first person I met singing its praises was a hardcore Linux guy who insisted on doing every task through a terminal with Emacs, and this doesn't surprise me. I feel like there's a similar aesthetic at play here, and maybe a bit of fear that doing anything but the toughest option will make them weak (choosing these things on their own wouldn't be enough to justify that conclusion, but it often comes with a kind of scoffing attitude toward the 'lesser' options).
The logic behind toughest = most effective is a little confusing to me. Sure, grit has its uses in intellectual work, but getting effective instruction and building a solid foundation of concepts seems like it would outweigh it.
I'm from Australia. My theory is that among people interested in both computer science and math, a significant portion use Linux, and if you use Linux, then Emacs is the best LaTeX editor (AUCTeX and RefTeX are amazing). And "doing everything in Emacs" is just what naturally happens when you use Emacs long enough.
Question: I did EE degrees. Trying to fill possible gaps, I am going through Robert Melrose's notes on functional analysis. He refers to Rudin for metric spaces. I am comfortable reading Rudin, but I am hoping for an intuitive motivation for abstract definitions in general. Metric spaces have a physical motivation. What does one gain by axiomatizing them?
Maybe my bigger question is: in mathematics research, will abstraction always tend toward symbolic manipulation and set theory? I am hoping to avoid defining things purely via axioms and instead to motivate them. I hope I've explained myself.
You are asking big questions! My answer is brief but I hope it helps a little.
By axiomatizing a definition we can begin to prove theorems rigorously. If we formally abstract a concept and deduce formal statements from it (which can themselves be quite unintuitive), we can be confident that the statements apply to the concrete situation at hand. On the other hand, if we always worked only with concrete or physical examples, we would have to figure out everything from scratch every time. A good general theory is one which concisely explains many specific cases at once.
For example: Can every square-integrable periodic function be approximated in the mean-square (L^2) sense, to whatever degree of precision we like, by its partial Fourier sums? If we abstract away what's important and study the situation in the general context of Hilbert spaces, we can answer this question not only for this specific case but also for a broad class of orthonormal families of approximants in one fell swoop. We save a lot of effort by doing this, and we also usually gain extra insight into the problem.
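For reference, here is the Hilbert-space statement behind this kind of answer, in one common normalization (my choice; other conventions rescale the constants). It is a mean-square result: the partial Fourier sums of a merely continuous function need not converge uniformly.

```latex
Take the inner product $\langle f, g \rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,\overline{g(t)}\,dt$
on $L^2(-\pi, \pi)$. The functions $e_n(t) = e^{int}$, $n \in \mathbb{Z}$, form an
orthonormal basis, so with $\hat f(n) = \langle f, e_n \rangle$ and
\[
  S_N f = \sum_{|n| \le N} \hat f(n)\, e_n ,
\]
$S_N f$ is the orthogonal projection of $f$ onto $\operatorname{span}\{e_n : |n| \le N\}$
(the best approximation from that subspace), and by Parseval
\[
  \| f - S_N f \|^2 = \sum_{|n| > N} |\hat f(n)|^2 \;\longrightarrow\; 0
  \quad (N \to \infty).
\]
```

Nothing in the argument uses the specific functions $e^{int}$ beyond orthonormality and completeness, which is why the same conclusion holds for any orthonormal basis of any Hilbert space.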
Still, you don't want to abstract too early. It's important to understand the concrete case first before jumping a level in abstraction, otherwise you end up understanding nothing. If you're having any trouble with metric spaces (I'm not sure if you are) then you might find it helpful to look at a few specific examples to see what they're used for. Examples: In coding theory, we often use what's called the Hamming metric: the distance between two words of the same length is the number of positions in which they differ. In graph theory, there is a natural geodesic metric: the distance between two vertices is the length of a shortest walk between them along the edges. If that's not concrete enough, consider the popular "6 degrees of separation" rule: people are vertices and relationships are edges. Of course there are the usual examples of Euclidean space R^n, unitary space C^n, and the various other normed spaces you're studying in functional analysis.
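If it helps to see these concretely, here is a minimal Python sketch of the Hamming and graph metrics, plus a brute-force check of the metric axioms on a finite set of points. The function names and the tiny four-person "acquaintance" graph are hypothetical, chosen only for illustration; `graph_distance` assumes an unweighted, undirected graph stored as an adjacency dict.

```python
from collections import deque
from itertools import product


def hamming(u: str, v: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs words of equal length")
    return sum(a != b for a, b in zip(u, v))


def graph_distance(adj: dict, start, goal) -> float:
    """Length of a shortest walk from start to goal, found by breadth-first search.

    Assumes an unweighted, undirected graph (adjacency lists are symmetric).
    Returns float('inf') if goal is unreachable."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return float("inf")


def is_metric(points, d) -> bool:
    """Brute-force check of the metric axioms on a finite set of points."""
    for x, y in product(points, repeat=2):
        # non-negativity, d(x, y) == 0 exactly when x == y, and symmetry
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y) or d(x, y) != d(y, x):
            return False
    # triangle inequality
    return all(d(x, z) <= d(x, y) + d(y, z)
               for x, y, z in product(points, repeat=3))


# Hypothetical data: all 3-bit words, and a path-shaped acquaintance graph.
words = ["".join(bits) for bits in product("01", repeat=3)]
adj = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cat"],
    "Cat": ["Bob", "Dan"],
    "Dan": ["Cat"],
}

print(hamming("10110", "11100"))                                    # 2
print(graph_distance(adj, "Ann", "Dan"))                            # 3
print(is_metric(words, hamming))                                    # True
print(is_metric(list(adj), lambda x, y: graph_distance(adj, x, y)))  # True
```

The same `is_metric` function works unchanged for both examples, which is a small-scale version of the point being made: once something satisfies the axioms, everything proved from the axioms applies to it.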
By working with the axioms of metric spaces we can prove theorems which apply to all such cases at once. Here's an example of a tricky theorem: Let x be a point in a metric space. Suppose a sequence satisfies the condition that each of its subsequences has a further subsequence which converges to x. Then the entire sequence converges to x. You wouldn't want to prove this theorem from scratch in every specific concrete case! It's more efficient to abstract out the essential features (the axioms) and then prove the theorem in the general setting.
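For the curious, here is a sketch of the usual proof by contradiction, writing $(x_n)$ for the sequence and $d$ for the metric. Notice that it uses nothing about the space beyond the distance $d$, which is exactly why proving it once in the abstract setting covers every concrete case.

```latex
Suppose $x_n \not\to x$. Then there exist $\varepsilon > 0$ and indices
$n_1 < n_2 < \cdots$ such that
\[
  d(x_{n_k}, x) \ge \varepsilon \quad \text{for all } k.
\]
By hypothesis, the subsequence $(x_{n_k})$ has a further subsequence
$(x_{n_{k_j}})$ with $x_{n_{k_j}} \to x$, so $d(x_{n_{k_j}}, x) < \varepsilon$
for all sufficiently large $j$. This contradicts the display above,
hence $x_n \to x$.
```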
As for your other question: Set theory is (in my opinion) not a fundamental feature of mathematics. It just so happens that mathematicians today like to axiomatize everything in terms of set theory. Still, sets will always be useful even if we don't choose to define everything in terms of sets. A set is (loosely speaking) nothing more than a collection of objects, and we'll probably always find it useful to deal with collections of objects.
As for symbols: What is a symbol, if not a hook on which we hang an abstraction? How can we manipulate abstractions if we don't have symbols for them? I'm using the word "symbol" in the broad sense here, which includes diagrams (or pieces of diagrams) and words in a natural language. If you want to see abstractions manipulated via diagrams and words, take a look at classical sources (say, pre-1500s). I believe our modern notation is far more amenable to manipulation and understanding.
In short: In mathematics, abstraction and symbol manipulation are the name of the game. If we have no symbols then we have no abstraction and no mathematics.
Rudin is a good reference book. It's not good to learn from, but if you're looking for the authoritative, _best_ proof, one that cuts right to the heart of the problem, then Rudin is great.
It's not good for self-study, but I used it as a supplement to a real analysis course I was taking at the time, and I appreciated the terseness and dryness. My lecturer went into the details, so it was good to have a terse book to remind me of lecture content when doing assignments and studying for exams.