Ruby Symbols


Programmers coming to Ruby from other languages often struggle to understand what symbols are.

Let's clear up a few points.


What are Ruby symbols?

Strings? Objects? Names?

According to the API documentation:

Let's explore each one of those items using code.

IRB is the most convenient tool for experimenting with symbols.

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

D:\Ruby186-398>ruby -v
ruby 1.8.6 (2010-02-04 patchlevel 398) [i386-mingw32]

D:\Ruby186-398>irb
irb(main):001:0>
      

Creating Symbols

We can create symbols in various ways:

irb(main):001:0> # Normal way, just prefix a token with ':'
irb(main):002:0* greeting = :hi #=> :hi
=> :hi
irb(main):003:0> # Multi token symbol
irb(main):004:0* another_greeting = :"hello man" #=> :"hello man"
=> :"hello man"
irb(main):005:0> # Use the .to_sym if it's defined for your object class
irb(main):006:0* # For example .to_sym is defined in String class
irb(main):007:0* a_third_greeting = "howdy".to_sym #=> :howdy
=> :howdy
irb(main):008:0> # Using %s[ ]
irb(main):009:0* %s[a 4th one] #=> :"a 4th one"
=> :"a 4th one"
irb(main):010:0> # We can also cast a symbol to string with to_s
irb(main):011:0* :ds.to_s #=> "ds"
=> "ds"
irb(main):012:0>
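
The same forms can be compared side by side in a script. This is a minimal sketch rather than part of the session above; the variable names are illustrative only.

```ruby
# Different ways of writing the same symbol yield the same object.
literal   = :howdy
converted = "howdy".to_sym
quoted    = :"howdy"
percent   = %s[howdy]

puts literal.equal?(converted)  # true: the very same object
puts literal.equal?(quoted)     # true
puts literal.equal?(percent)    # true

# Symbols containing spaces need the quoted form, %s[ ] or to_sym.
spaced = "a 4th one".to_sym
puts spaced.inspect             # :"a 4th one"
puts spaced == %s[a 4th one]    # true

# to_s goes the other way and returns a String.
puts :ds.to_s.class             # String
```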
      

Representing Names

In computer science there is a data structure called a symbol table. In Ruby, the interpreter's symbol table stores names such as method names, along with every symbol your code creates.

irb(main):001:0> # Does not work on Ruby 1.9
irb(main):002:0* :ds.to_i 
NoMethodError: undefined method `to_i' for :ds:Symbol
        from (irb):2
        from D:/Ruby19/bin/irb:12:in '<main>'
irb(main):003:0>

irb(main):001:0* # Works on Ruby 1.8.6
irb(main):002:0* :ds.to_i 
=> 33897
irb(main):003:0> # Notice the value of the symbol is not its object id
irb(main):004:0* :ds.object_id 
=> 406772
irb(main):005:0> # Symbol values can't be changed
irb(main):006:0* :ds = 3 
SyntaxError: compile error
(irb):6: syntax error, unexpected '=', expecting $end
:ds = 3 #SyntaxError: compile error
     ^
        from (irb):6
        from :0
irb(main):007:0>
      

For a more in-depth example, let's explore the symbol table:

irb(main):001:0> class Dummy; def hello; end ; end
=> nil
irb(main):002:0> # Let's check what symbols names start with 'hello'
irb(main):003:0* puts Symbol.all_symbols.collect{|x| x.to_s}.grep(/^hello.*$/)
hello
=> nil
irb(main):004:0> # Now let's define a new method called 'hello_world'
irb(main):005:0* class Dummy; def hello_world; end ; end
=> nil
irb(main):006:0> puts Symbol.all_symbols.collect{|x| x.to_s}.grep(/^hello.*$/)
hello
hello_world
=> nil
irb(main):007:0>
      

When we defined the 'hello_world' method in the 'Dummy' class, its name was added to the symbol table.
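
The same effect can be reproduced in a script, with one caveat: Ruby parses a whole file before running it, so any name written literally in the file is already a symbol by the time the first line executes. The sketch below therefore assembles the method name at run time. The name `hello_from_sketch` is made up for this example.

```ruby
# Sketch: defining a method registers its name as a symbol.
class Dummy; end

# Build the name at run time so the parser cannot create the symbol
# ahead of time just by reading this file.
method_name = ["hello", "from", "sketch"].join("_")  # hypothetical name

known_before = Symbol.all_symbols.collect { |s| s.to_s }.include?(method_name)

# define_method accepts a String and converts it to a symbol internally.
Dummy.send(:define_method, method_name) { nil }

known_after = Symbol.all_symbols.collect { |s| s.to_s }.include?(method_name)

puts "before: #{known_before}"
puts "after:  #{known_after}"
```

Checking for the name itself is more reliable than comparing `Symbol.all_symbols.size`, since the total can change for unrelated reasons.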

Let's take another example:

irb(main):001:0> Symbol.all_symbols.size
=> 3981
irb(main):002:0> :koko
=> :koko
irb(main):003:0> Symbol.all_symbols.size
=> 3982
irb(main):004:0>
      

Symbols are unique

:Fred is the same :Fred object wherever it appears, no matter the context. The same is not true for strings.

      name = :Fred  #=> :Fred                  
      
      module M
      	Cons = :Fred
      end
      
      class Inside_M
      	CConst = :Fred
      	@@theName = "nothing"
      	def myMethod
      		@myName = :Fred
      		puts "a - #{@myName.object_id}"
      		@@theName = @myName
      		puts "b - #{@@theName.object_id}"
      	end
      end
      
      myInstance = Inside_M.new.myMethod 
      puts "c - #{name.object_id}" 
      puts "d - #{M::Cons.object_id}" 
      puts "e - #{Inside_M::CConst.object_id}" 
      puts "Fred".object_id 
      

This produces the following result:

      a - 238916
      b - 238916
      c - 238916
      d - 238916
      e - 238916
      22110660
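
Object IDs vary from run to run, so it can be easier to ask Ruby directly whether two references point at the same object. A minimal sketch using `equal?`; the module and class names are invented for illustration, and `String.new` is used so the result does not depend on how string literals are handled.

```ruby
# Sketch: one symbol object versus many string objects.
module Outer
  NAME = :Fred
end

class Holder
  NAME = :Fred

  def name
    :Fred
  end
end

top_level = :Fred

# Every :Fred is the same object, whatever scope it appears in.
puts top_level.equal?(Outer::NAME)      # true
puts top_level.equal?(Holder::NAME)     # true
puts top_level.equal?(Holder.new.name)  # true

# Two strings with the same characters are equal but separate objects.
first  = String.new("Fred")
second = String.new("Fred")
puts first == second                    # true
puts first.equal?(second)               # false
```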
      

When to use Ruby symbols?

You might wonder why Matz chose to expose this low-level piece of the interpreter in the language itself.

We can list at least two reasons: performance and efficiency.

Performance

Strings are mutable, so the Ruby interpreter cannot know in advance what data they will hold. As a result, every String needs its own place in memory.

Symbols, on the other hand, are immutable. Once a symbol is created, the Ruby interpreter knows exactly what it holds and the single place in memory where it lives.

We can see this by creating some Strings and Symbols and printing their object IDs.

      puts "hello_world".object_id
      puts "hello_world".object_id
      puts "hello_world".object_id
      puts '==============='
      puts :"hello_world".object_id
      puts :"hello_world".object_id
      puts :"hello_world".object_id
      puts '==============='
      puts :hello_world.object_id
      puts :hello_world.object_id
      puts :hello_world.object_id
      

This produces the following result:

      23328190
      23328170
      23328150
      ===============
      199058
      199058
      199058
      ===============
      199058
      199058
      199058
      

NOTE: Your object IDs will differ from the ones above.
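
The difference in mutability can also be checked directly. This is a minimal sketch that assumes a current Ruby version; the `frozen?` result may differ on very old interpreters such as 1.8.

```ruby
# Sketch: strings can be changed in place, symbols cannot.
text = String.new("hello")
text << "_world"            # mutates the same String object
puts text                   # hello_world

sym = :hello
puts sym.frozen?            # true
puts sym.respond_to?(:<<)   # false: there is no in-place append

# "Changing" a symbol really means building a different symbol.
longer = "#{sym}_world".to_sym
puts longer.inspect         # :hello_world
puts sym.inspect            # :hello (the original is untouched)
```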


Efficiency

We are talking about memory efficiency here, so let's look at some code:

      # The more commonly used way
      if name == "Marcos Ricardo"
      

This is inefficient in terms of memory:

  1. Comparing strings is costly, especially when they are long
  2. The string literal reserves a variable amount of memory, 14 bytes of character data in this case
  3. The GC has to reclaim this memory later on

Let's try another way:

      # The "rubyist" way 
      if name.to_sym == :"Marcos Ricardo"
      

What are the gains?

  1. Comparing integers (a symbol's reference is an integer) is cheaper
  2. Only 4 bytes are reserved for the symbol reference (on the 32-bit build used here)
  3. The GC does not collect this symbol; symbols remain until the program terminates

So prefer symbols over strings where you can, but avoid creating large numbers of symbols, because on the Ruby versions shown here (1.8 and 1.9) the GC does not reclaim them while the program runs.
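
One way to follow both halves of that advice is to map incoming strings onto a fixed set of symbols instead of calling to_sym on arbitrary input. The sketch below is only an illustration: `ROLES` and `role_for` are hypothetical names, and the role values are made up.

```ruby
# Sketch: turn external strings into symbols without creating new ones.
# ROLES and role_for are hypothetical names used only for this example.
ROLES = {
  "admin"  => :admin,
  "editor" => :editor,
  "guest"  => :guest
}.freeze

# Returns a known symbol, or nil for anything not listed.
# There is no call to to_sym, so unknown input never adds a symbol.
def role_for(input)
  ROLES[input.to_s.strip.downcase]
end

puts role_for("Admin").inspect         # :admin
puts role_for(" guest ").inspect       # :guest
puts role_for("no such role").inspect  # nil
```

The rest of the program can then compare against `:admin` or `:guest` freely, while the number of symbols stays bounded by the size of the table.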