**** Scoping in Python ****

If you know languages like Java or C, you might be a little irritated by how scoping is done in JavaScript and Python.  Especially in JavaScript, where you (have to) declare variables with the "var" statement, you have to know that the variable is then declared for the entire function scope and that "var" knows no block scopes.

In Python there is nothing like a "var" declaration.  A variable is declared the first time it is assigned to.  However, there is a difference between JavaScript and Python:

A variable does not exist in a scope before it is assigned for the first time.  However, its scope is already fixed before that assignment happens.  (JavaScript also fixes the scope of a "var" for the whole function, but there the variable just reads as "undefined" until it is assigned, while Python raises an error.)

To understand the difference have a look here:

[[
def outer(a):
  b = a
  def inner(c):
    print b
    b = 5
    return c+b
  print b
  return inner

f = outer(2)
f(2)
f(3)
]]
Compare this with JavaScript:
[[
function outer(a)
{
  var b = a;
  function inner(c)
  {
    document.writeln(b);
    var b = 5;
    return c+b;
  }
  document.writeln(b);
  return inner;
}

f = outer(2);
document.writeln(f(2));
document.writeln(f(3));
]]

In Python the first call to f raises an UnboundLocalError, while JavaScript runs without an error (the writeln inside inner() just prints "undefined").  The cause is the scoping, which is difficult to spot in Python.

In Python the scope of variable b is the "inner" function, because b is assigned to within the "inner" function.  So the first reference to b from within inner() is to a variable that has no value yet.  Hence the error.

The problem is that this is a runtime error, not a compile time error.  It could be spotted at compile time already: when you enter "b=5" within inner() you could get an error like "variable declared after use within this scope", which would be far better than the runtime "variable undefined" error that only shows up when "print" is executed.

The bad thing about this is that there are *two possible root causes* for undefined variables, where the latter is not very obvious:
- Variable is really not declared in enclosing scopes (usually it is not yet assigned, but this can be a little bit more complex).
- Variable is declared in the current scope but not yet present in the current scope.
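
The two cases above can at least be told apart by the exception type.  Here is a minimal sketch (the function names are made up for this illustration, and it is written so that it should run on Python 2 and Python 3 alike): a name that exists nowhere raises NameError, while a name that is local but not yet assigned raises UnboundLocalError, which is a subclass of NameError.

```python
def missing_everywhere():
    # 'nowhere_defined' is a made-up name that is never assigned in any scope
    return nowhere_defined

def local_but_unassigned():
    # the assignment further down makes 'b' local for the whole function body
    value = b      # 'b' is local here, but it has no value yet
    b = 5
    return value

def error_name(func):
    # helper for this example only: call func and report the exception class
    try:
        func()
    except NameError as exc:
        return type(exc).__name__
    return None

first = error_name(missing_everywhere)
second = error_name(local_but_unassigned)
print(first)
print(second)
```

So the exception class tells you which of the two root causes you hit, but it still does not tell you in which scope the variable was expected.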

A variable cannot change its scope in Python, that is, within a function a name is never looked up in one scope first and in another scope later.  So after compilation the scope of the variable is fixed.  If the scope where the variable must live does not contain the variable yet, you see the error.  So the "variable undefined" error is a little bit misleading, as it lacks a hint about the scope (or namespace) in which the variable was expected.
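
That the scope is fixed at compile time can be made visible by looking at the code object of a function.  The following is only a sketch and relies on CPython details (the attributes "__code__", "co_varnames" and "co_freevars"); "reader" is a second inner function I added here for comparison.

```python
def outer(a):
    b = a
    def inner(c):
        b = 5          # assignment: 'b' becomes a local variable of inner
        return c + b
    def reader(c):
        return c + b   # read only: 'b' is taken from the scope of outer
    return inner, reader

inner, reader = outer(2)
inner_locals = inner.__code__.co_varnames   # local names of inner
reader_free = reader.__code__.co_freevars   # names reader takes from outer
print(inner_locals)
print(reader_free)
```

In inner() the name b is listed among the locals although no call has happened yet, while reader() lists b as a free variable that comes from outer().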


*** How to access (write to) variables in an outer scope ***

It's easy but not trivial.  If you want to assign to variables of the outer scope, you *must use the dot* (give a namespace) to access the variables.  If you do not use *namespace.variable* (even in the outer scope), the variable will instead be declared in the inner scope and therefore you cannot access it in the outer scope!

So all you have to do is to introduce a namespace like this:

[[
def outer(a):
  class out: pass   # an empty class serves as namespace (a dict has no attributes)
  out.b = a
  def inner(c):
    print out.b
    b = 5
    return c+b
  print out.b
  return inner

f = outer(2)
f(2)
f(3)
]]
You can see clearly what is done where!  The drawback is that you have to type a little bit more.

Note that this corresponds to "you see what you get", so there are no hidden namespaces in Python, but you have to remember that the language does not supply you with automated implicit nested scoping for writing (which can lead to confusion if you overdo it).

The upside of typing a little bit more (that is, typing "namespace." in front of the variable) is that you do not need any complex construct to access outer scope variables.  If a variable is only read, the scoping is done implicitly, but if it is assigned to, you must use namespaces.
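
To show reading and writing side by side, here is a small sketch of a counter.  The names "make_counter", "ns" and "increment" are made up for this example, and the empty class is just one possible way to get a namespace object.

```python
def make_counter():
    class ns:          # empty class, used only as a namespace object
        pass
    ns.count = 0
    def increment():
        # 'ns' itself is only read here; the write goes to its attribute
        ns.count = ns.count + 1
        return ns.count
    return increment

counter = make_counter()
results = [counter(), counter(), counter()]
print(results)
```

Because inner code never assigns to the bare name ns, ns stays a variable of make_counter() and the count survives between the calls.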

You might argue that this is illogical: reading is fine while writing is not.  However, accidentally writing to the wrong variable in an outer scope can be a major source of programming errors.  So it is fine that Python denies such writes.

However, I agree that it is asymmetric to leave read access to outer scope variables.  There should be some keyword to declare that a variable shall have access to the outer scope (perhaps even with a way to denote which scope it shall be).  In Python 2 there is only one such keyword, "global", which puts a variable in the global (topmost) scope.  (Python 3 adds "nonlocal", which binds a name to the nearest enclosing function scope.)  However, there is no way to explicitly specify another scope a variable should go in, except by using namespaces.  Namespaces, however, are nothing else than variables (objects or dicts; you can use arrays for this, of course).
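
For completeness, this is how "global" is used.  It is a minimal sketch with made-up names ("total", "add"), and it assumes the code is run as a normal module.

```python
total = 0

def add(n):
    global total       # without this line, 'total' would be local to add()
    total = total + n
    return total

add(2)
add(3)
print(total)
```

Without the "global" line, the assignment would make total local to add() and the first call would fail with the error described above.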

As there is no difference between a variable and a namespace, there is some room for programming errors.  So I think this must be fixed in the distant future.  (Note that introducing namespaces in Python is not easy, as the upgrade script must then be able to track scoping, which is a complex task and cannot be completely automated, as there are too many possible pitfalls today.  So the introduction must be planned well ahead; it will take years to fix that.)

My proposal would be that, in the end, variables have a flag which tells whether a name is used as a namespace or not.  Also there shall be a "global namespace" and a "local namespace".

If used as a namespace, it can be found from an inner scope.  If it is not "global" it can only be found from the next inner scope or from a namespace command.

So here is my proposal:

- Drop the reserved word "global", make it a namespace, so
[ global x
becomes
[ from global pass x

- A NAMESPACE is nothing else than a special class:
[[
class NAMESPACE:
  __namespace__ = True
]]

- Use of namespace then is:
[[
from NAMESPACE pass var,var,..

with NAMESPACE:
  ...
]]
- To pass a namespace deeper into the scope, one could use:
[ pass namespace,namespace,...

- Perhaps introduce syntactic sugar, so you can write:
[ namespace X:
instead of
[[
class X: __namespace__=True
with X:
]]

This would lead to somewhat saner code:

[[
def outer(a):
  namespace out:
    b = a
    def inner(c):
      # note that accessing a is impossible here as a is not a __namespace__
      print out.b
      b = 5
      return c+b
    print b
    return inner

f = outer(2)
f(2)
f(3)
]]
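Note that, today, you can already come close to this.  The "with" statement does not open a namespace (it needs a context manager, and neither an empty class nor a dict is one), so the names have to be qualified explicitly:
[[
def outer(a):
  class out: pass
  out.b = a
  def inner(c):
    # note that accessing a is possible!
    print out.b
    b = 5
    return c+b
  print out.b
  return inner
]]
or with a dict
[[
def outer(a):
  out = {}
  out['b'] = a
  def inner(c):
    print out['b']
    b = 5
    return c+b
  print out['b']
  return inner
]]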

The only difference is that, today, you can access all outer variables in a read-only fashion, while I would like to see that only variables with __namespace__==True are propagated into the next deeper block and only those with __global_namespace__==True are passed into more deeply nested inner scopes.


-Tino, 2011-01-16