Archive for the 'Bwain' Category

Script again

Bwain| No Comments »

So after taking almost a year off (a big diversion into looking at finance stuff), I started looking at the scripting problem again. What’s good about having stopped is that the proper solution is clear. Start with the scripting language first. Start with that before anything; from that we get everything: expressions, shading language, proper objects, aggregated data types, namespaces, everything. I need to start a new scripting language separately, implement a new type system, and start from scratch.
Also, I need to focus on the pass by reference/value/pointer issue. It wasn’t solved in the original implementation, so now I have a chance to squash it.

Here is a spec:

Grammar:

  • Recursive

  • Namespaces
  • objects/classes/constructors

Internals:

  • Reuse the letter/envelope idiom used for spa. (Was very effective)

  • Has axiom data types. Int/Double/Enum/Function/String/void/etc.
  • Some constructors are purely script based, but some constructors will have a C++ component as well
  • Type dictionary
  • Gets compiled into bytecode.
  • compiled bytecode is executed, and manipulates a stack
  • Objects define themselves as being copied by value or reference.
  • handle references, maybe pointers.
  • typesafe
  • Has an execution stack, and heap.
  • Any non-axiom (binary) objects are callable, with a return type.

Assignment operations with the stack

I’m getting stuck on how to implement assignment operations. Specifically, things like m[0][1] = abc. The way I have it now, I would be copying multiple instances of m onto the stack, then having the [] operators write their part, then copying all of ‘m’ back, which is a complete waste. What if ‘m’ is some huge 2D array? Beginning with ‘m’, that should return a reference, not a value.

How do I encode this using the existing type system?

spa::Value can have a boolean ‘ref’. If it’s true, then it doesn’t own its own data. spa::Value has the option to do a deep copy, which would mean owning its own data if necessary.
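A minimal sketch of the ‘ref’ idea, not the real spa::Value. The class name Value, the refTo() helper, and the vector-of-doubles payload are all assumptions made up for illustration. A value either owns its payload or just points at someone else’s, and deepCopy() always hands back an owner.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for spa::Value: a payload that is either owned
// or merely referenced.
class Value {
public:
    // Owning value: allocates its own payload.
    explicit Value(std::vector<double> data)
        : owned_(std::make_unique<std::vector<double>>(std::move(data))),
          data_(owned_.get()) {}

    // Non-owning reference to another value's payload. The reference is
    // only valid while 'target' is alive.
    static Value refTo(Value& target) {
        Value v;
        v.data_ = target.data_;
        return v;
    }

    // Plays the role of the boolean 'ref' flag.
    bool isRef() const { return owned_ == nullptr; }

    // Deep copy: the result always owns its own data.
    Value deepCopy() const {
        assert(data_ != nullptr);
        return Value(*data_);
    }

    std::vector<double>& data() { return *data_; }
    const std::vector<double>& data() const { return *data_; }

private:
    Value() = default;
    std::unique_ptr<std::vector<double>> owned_;  // null for a reference
    std::vector<double>* data_ = nullptr;         // payload being viewed
};
```

Writing through a reference changes the original, while writing to a deep copy does not. The obvious cost is that a reference can dangle if the owner goes away first, which this sketch does nothing to prevent.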

Got the 2nd proof of concept working! (The first proof of concept being the failed design I gave up on…) The simple example I started on works! Advantages to doing a proper code flattening: faster, easier to debug, can save out a pre-compiled script, also easier to do things like throw-catch, easier to look at the run process.

Now I’m left with having to implement the script commands for all existing funct types.

Crud… this is going to take a while… and even longer to completely debug….

Also, going to have to find some trickery to get references of subTypes. Previously, I was copying it, but with references, it’s trickier. ValueCon<> is a subclass of ValueImp, I could try creating ValueConPt<>, which would point to its data, but not own it. That will work… Also, that means that I have an implicit pointer implementation. Whoa… With pointers, I can randomly access packed arrays of data, which is kind of what the bracket operator does. So it’s actually more of a reference than a pointer. BUT, what if I allowed pointers to alloc/dealloc their own memory? That could be really cool, or get very confusing. A scripting language with its own heap space. Would that be a first? Also, I don’t know how the type system would handle it. It would be cool if I could keep chaining pointer modifiers, like int****, but not possible, and completely unnecessary. So at the most, any value could be a reference/pointer.

Ok, this is definitely a diversion, but I would need to really test this stuff before I break everything.

Change of plan, not going to do it. Just going to keep the existing reference method in place. All bracket operators will copy the contents of the slice on the stack. If it’s too slow, I’ll deal with it later. Maybe have to create an iterator class to deal with this. The method I mentioned above is too much of a hack.

This is the most fragmented blog entry. I decided not to erase my train of thought. So here is the latest attempt. Using the Type mechanism, I’m going to try to make a reference type. So we can do Ref or Ref or anything. Hopefully, I can find an easy way to make it work…

Another interesting thing. The bracket operator has a different meaning depending on whether it’s part of a lhs expression or a rhs expression. In a rhs expression, it’s read-only, so the value can be copied. For the lhs expression, that’s when we need the reference functionality. So we need to be able to flag whether we’re in a lhs or rhs…..
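The lhs/rhs distinction can be sketched like this. Everything here (Side, Slice, bracket, assign) is a hypothetical name, and a vector of int rows stands in for the real array type. The bracket operator takes a flag: on the rhs it copies the slice, on the lhs it hands back a pointer into the original storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<int>;
using Matrix = std::vector<Row>;

// Which side of an assignment the bracket expression appears on.
enum class Side { Lhs, Rhs };

// Result of m[i]: a pointer into 'm' for the lhs, a private copy for the rhs.
struct Slice {
    Row* target = nullptr;  // set only in Lhs context
    Row copy;               // filled only in Rhs context
};

Slice bracket(Matrix& m, std::size_t i, Side side) {
    Slice s;
    if (side == Side::Lhs) {
        s.target = &m.at(i);  // no copy, writes land in 'm' itself
    } else {
        s.copy = m.at(i);     // read-only use, so a copy is safe
    }
    return s;
}

// Second bracket plus the assignment: the tail of m[i][j] = value.
void assign(Slice& s, std::size_t j, int value) {
    assert(s.target != nullptr);  // only meaningful for an lhs slice
    s.target->at(j) = value;
}
```

With this, m[0][1] = abc becomes one bracket call in Lhs mode followed by assign, and ‘m’ is never copied onto the stack.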


New Script virtual machine design

There is no way around it. I have to flatten the code, the smd::Cmd and spa::Funct have to execute in a single stream of instructions. Two stacks, one for manipulating symbols, another for local variables. In keeping with the simd model, the Funct class will have some static method that can be added to a list of script functions that get called in succession.

  • Instructions can have state, but it should be kept in a list of spa::Value in the superclass. This way, the instructions will be easy to copy. That also means that we won’t be dealing with virtual function calls, which will keep the execution time optimal.
  • Symbols can have a name, and probably should for debugging purposes. But ultimately, symbols should be referenced by index.
  • The really good thing about this is that it will be trivial to copy compiled scripts around, and they can optionally be saved in bytecode form. I don’t need this feature yet, but it could be handy.
  • It would be nice to be able to just re-run the simd virtual machine as a single thread to run the scripting language, but we would lose the symbol abstraction. Also, the simd virtual machine is register based and can’t handle recursion. The script virtual machine is stack based.
  • One question, should the instruction stack be decoupled from the symbol stack? An instruction push always implies a local symbol table, but a local symbol stack does not imply an instruction stack. I would have more flexibility if they were decoupled.
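A rough sketch of what a flattened instruction stream could look like. None of these names (Instr, Machine, pushConst and friends) come from the real code, and a single int operand stands in for the list of spa::Value that would hold instruction state. Each instruction is plain data with a function pointer, so there are no virtual calls, locals are addressed by index, and copying a compiled script is just a vector copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Machine;

// An instruction is plain data: a function pointer plus one operand.
struct Instr {
    void (*exec)(Machine&, const Instr&);
    int operand;
};

struct Machine {
    std::vector<int> values;  // working stack for intermediate results
    std::vector<int> locals;  // local variables, addressed by index
    std::size_t pc = 0;       // index of the instruction being executed
};

void pushConst(Machine& m, const Instr& in) { m.values.push_back(in.operand); }

void loadLocal(Machine& m, const Instr& in) {
    m.values.push_back(m.locals.at(static_cast<std::size_t>(in.operand)));
}

void storeLocal(Machine& m, const Instr& in) {
    m.locals.at(static_cast<std::size_t>(in.operand)) = m.values.back();
    m.values.pop_back();
}

void addTop(Machine& m, const Instr&) {
    assert(m.values.size() >= 2);
    int b = m.values.back(); m.values.pop_back();
    int a = m.values.back(); m.values.pop_back();
    m.values.push_back(a + b);
}

// The whole interpreter is one loop over the flat stream.
void run(Machine& m, const std::vector<Instr>& code) {
    for (m.pc = 0; m.pc < code.size(); ++m.pc) code[m.pc].exec(m, code[m.pc]);
}

// Flattened form of: x = 2; y = x + 40;   (x is local 0, y is local 1)
std::vector<Instr> sampleProgram() {
    return {
        {pushConst, 2}, {storeLocal, 0},
        {loadLocal, 0}, {pushConst, 40}, {addTop, 0}, {storeLocal, 1},
    };
}
```

Here the value stack and the locals are already separate containers, which is one possible answer to the decoupling question in the last bullet. Flow control would be instructions that overwrite pc.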

Ok, a lot of work to do. In the end, it will be a WAY cleaner implementation. The variable binding business with VarValue was getting messy. Looking forward to having the recursive example work properly.

Of course, the only thing harder than designing an API is coming up with an example that’s simple enough to run as a first test….

Ok, here’s something funny… if I had started writing the scripting language first, I could have had the virtual machine create simd code as well. I would have been able to share a lot more code base that way. As it is now, I’m duplicating a lot of functionality. Live and learn I guess. Writing the simd virtual machine was a lot harder though. Maybe it’s easier doing the hard part first, then connecting later.

Epiphany

Each invocation of the execution stack doesn’t need an entire symbol table. All it needs is a list (deque) of spa::Value to hold local variables, and that’s it. The variable names have already been resolved during execution. Short of creating another virtual machine, this will save a LOT of computation and RAM overhead.

I got the new execution stack setup implemented, and it STILL fails at roughly 750 calls deep. WTF?? And it’s still using roughly the same amount of RAM. I should try compiling a binary to just run the scripting language, with no application or gui. Just to see how far that gets. But I should probably let this go, can’t figure out why it’s throwing a bad_alloc.

Ok, I’m not letting this go. I can’t risk having future memory corruption happen because the scripting language is unreliable. I’ve come up with some strategies to isolate the problem. I tried just stacking an array of spa::Value objects. That just keeps growing with no problem, so spa::Value isn’t it. I also simplified the script function itself. I’ve come across a sloppy problem with VarValue. I should allow for VarValue to be constant, and not be required to be bound. That way, the function symbol doesn’t have to be updated. Also, what could help is to add a function call to the grammar. Also, VarValue has a ‘sym’ pointer that should be cleared before executing as a script. Just some cleanup that I’ve come across while simplifying the problem.

I’ve tried to grow the virtual machine execution stack arbitrarily deep, and it seems to be working. That doesn’t leave much else except for the expression evaluation. To test, I’m going to add a function call to the grammar, instead of having it being invoked as an expression. If that works, then I know to focus on the expression evaluation.

I FOUND THE FKING PROBLEM…

The execution stack size defaults to 1MB. I can change the stack size in the compiler, but maybe that’s not the way to go. I thought that the stack would just keep growing into the heap space until the two met, but apparently not.

SO, that means that I have to do more work on the virtual machine. Unfortunately, that also means that the way I’m evaluating expressions won’t work for the scripting language. I hope I can avoid doing this, but it’s looking like I may have to edit each Funct class again to operate on a stack. That’s at the very least; I may also have to resort to flattening the code as well. The SIMD engine does all its operations in a single loop. Optimally, the scripting engine should be able to do the same thing. But that’s a LOT more work… ugh….

I’ve been giving it thought… don’t think there is any way around it. I’ll have to add a method to all Funct operators that will execute themselves on a stack instead of using the C++ execution stack. Now whether that will require me to flatten all the code is another question, I’ll see what I can get away with….
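To illustrate the difference, here is a sketch of a script-level recursion that never recurses in C++. Frame and runDepth are made-up names, and the script function in the comment is a bounded variant of the ctor test so that it terminates. Each script call pushes a frame holding only its local values onto a heap-allocated vector, so the depth is limited by RAM rather than by the native stack size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One script-level call frame: just the local values, no symbol table.
struct Frame {
    std::vector<long> locals;  // locals[0] is the argument 'i'
};

// Runs the (hypothetical) script function
//     int depth(int i) { if (i == limit) return i; return depth(i + 1); }
// starting from depth(0), using an explicit frame stack on the heap.
long runDepth(long limit) {
    std::vector<Frame> callStack;
    callStack.push_back(Frame{std::vector<long>{0L}});
    long result = 0;

    // "Call" phase: every script call pushes one frame.
    while (true) {
        long i = callStack.back().locals[0];
        if (i == limit) { result = i; break; }
        callStack.push_back(Frame{std::vector<long>{i + 1}});
    }
    std::size_t maxDepth = callStack.size();

    // "Return" phase: the result just passes through each caller.
    while (!callStack.empty()) callStack.pop_back();

    assert(maxDepth == static_cast<std::size_t>(limit) + 1);
    return result;
}
```

Calling runDepth(100000) goes far past the depth where the native-stack version dies, and the C++ call depth stays constant the whole time.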

Recursion Error

So I tried the following:

void ctor(int i) { print "ctor " i "\n"; ctor(i+1); }

ctor(0);

I ran it, and it only recurses 757 times before it kind of stops. I can’t tell where it’s failing, and it seems odd to me that it would just run out of RAM. I tried running a similar function in perl, and it just keeps going. I tracked the RAM use, that’s definitely not the issue…

I narrowed the point of failure to a single deque assignment. I’m not too sure as to what would cause an stl function to silently end the process. Maybe memory is thrashed. In any case, I suspect I’m doing too much processing for each function recursion. I need to list exactly what happens when I’m doing a function call and see if I can’t streamline it.

I found this, explaining how stl will throw exceptions when they have memory failures, and possibly ways to alleviate it. When I run it, the task manager only reports the process at 20M, which is nothing. Don’t know why stl would be failing. Also, the runtime stack is WAY deep. The debugger won’t (can’t) display all of it. Each function call adds about 4 layers to the runtime stack. Given that the script is at 760 calls deep, that’s about 3040 runtime stack calls. But perl still works really well. Maybe I should try a release compile and see how far that gets.

Did a release build, and it executed the function call 3890 deep. Still, not as efficient as perl, but I guess it does appear to be a ram issue. Tomorrow, I’ll work on optimizing/streamlining the function calls. Crud… I ran the perl example until it ran out of RAM, it went 2262191 deep…. Python defaults to a stack depth of 1000, but when you extend it, it runs even deeper than perl.

Looked around online. To do a scalable, production ready scripting language, requires flattening the code into opcodes, and running a virtual machine in a similar way that I’m running the simd virtual machine. I’m not going to get into that now, I just want to get it working. Speed and scalability will have to wait. Found this book that explains a generalized virtual machine implementation.

I’ve been using the debugger to trace down the cause of the bad_alloc. I’ve narrowed it down to a SymbolTable constructor. But that doesn’t solve anything. The SymbolTables have been working forever, now that I’m allocating thousands of them, they’re crashing. I’ve noticed that there is a LOT of overhead for the SymbolTable base class. So the next step definitely should be a streamlining step. There are many std::map and std::multimap’s being created unnecessarily. The symbol table for the stack should be a lot lighter, if it’s going to be instanced like that.

variable binding

Still trying to figure out the variable binding. I think there are about 3 different ways to do this, so I think it might be better to just enumerate each one, choose one, or implement all and make it a compiler option.

  • Local scope only - All variables within a function have to be either declared locally, or are arguments. No access to global variables. That would include application variables, so it might not be the best way to handle things. But functions in vex are done this way.
  • Local + Global - A function can access the same variables as above, plus globally *defined* variables. That would include system wide symbols, but exclude variables defined outside the function.
  • Dynamic Binding - The symbol tables would just get stacked on top of one another. Any variable not defined by the function could be queried in the calling function, and so on, until the global symbol table is reached.
  • Frame access - If the same function is calling itself, then a common variable would keep getting redefined in the current frame. But what you could do is force a query of a variable in the previous frame, not just the current frame. I can’t think of a use for this (it’s also really bad programming practice), but I think I read that it’s possible in scheme or lisp. If I found a formal term for this kind of addressing, I might find a use for it. So maybe ‘i’ in the previous frame could be ‘__i’. But then you have to ask why don’t you just pass in the variable as an argument? I should look this up…
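The first three policies can be sketched as one lookup function with a switch on the policy. The names Table, Binding and lookup are invented for this example, values are plain ints, and the "Local + Global" case is simplified to a single global table rather than separating system symbols from user-defined globals.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Table = std::map<std::string, int>;

enum class Binding { LocalOnly, LocalPlusGlobal, Dynamic };

// 'frames' holds one table per active call, innermost last. 'globals' is
// the global table. Returns the value bound to 'name', if there is one.
std::optional<int> lookup(const std::vector<Table>& frames, const Table& globals,
                          const std::string& name, Binding policy) {
    assert(!name.empty());
    auto find = [&name](const Table& t) -> std::optional<int> {
        auto it = t.find(name);
        if (it == t.end()) return std::nullopt;
        return it->second;
    };

    if (!frames.empty()) {
        // Every policy looks in the current function's own table first.
        if (auto v = find(frames.back())) return v;
        if (policy == Binding::Dynamic) {
            // Dynamic binding: walk outward through the callers' tables.
            for (auto it = frames.rbegin() + 1; it != frames.rend(); ++it) {
                if (auto v = find(*it)) return v;
            }
        }
    }
    if (policy == Binding::LocalOnly) return std::nullopt;
    return find(globals);
}
```

Making the policy a runtime argument like this is what would let it become a compiler option later. Frame access would be a fourth case that skips frames.back() and starts one table further out.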

A lot of this is just symbol table management. Every function needs to have pointers to its local variables to bind to a given symbol table.

Found this little gem. It formalized a lot of the definitions.

Another thing I have to do is differentiate between symbol tables that define the language, and tables that are used for global and local definitions. The language table should always be on the stack. The global/local ones will be taken on/off depending on what policy we specify above.

I got the input binding to work.

Implementing script functions

So script functions are going to be objects that are callable. That way, they can also be passed around like data.

A mistake I made with the Types is that the Types carry all the data within themselves. The reality is that there is a finite number of different types, and all that should be copied around is a unique identifier. So with the function type, the data will definitely be non-trivial: it needs to know its argument and return types, and all the code, which is a parsed Cmd object.

What I’m going to do is have the function objects be only an identifier to be passed around; another lut is going to keep the necessary data. So these things can be passed by value, with no overhead.
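A small sketch of the identifier-plus-lut arrangement. FunctionInfo, FunctionHandle and FunctionTable are hypothetical names, and a string stands in for the parsed Cmd object. The heavy data is stored once in the table, and what travels around is an index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Heavy per-function data, stored exactly once.
struct FunctionInfo {
    std::string name;
    std::vector<std::string> argTypes;
    std::string returnType;
    std::string body;  // stand-in for the parsed Cmd object
};

// What actually gets passed around by value: a small id.
struct FunctionHandle {
    std::size_t id;
};

// The lookup table that maps handles back to the full data.
class FunctionTable {
public:
    FunctionHandle add(FunctionInfo info) {
        infos_.push_back(std::move(info));
        return FunctionHandle{infos_.size() - 1};
    }

    const FunctionInfo& get(FunctionHandle h) const {
        assert(h.id < infos_.size());
        return infos_[h.id];
    }

private:
    std::vector<FunctionInfo> infos_;
};
```

Copying a FunctionHandle costs the same as copying an integer, no matter how big the function body is.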

Once I get this, I can start getting the virtual machine to start executing them.

UGH, I’ve come up against this RIDICULOUS msvc bug, it doesn’t setup all the virtual function table pointers for certain classes. This was happening with the Else statement class, but who knows where else this could be creeping into? I found a workaround for not using this one virtual function, but I seem to be hitting a limit/bug to the virtual function table size. And some of the virtual functions still aren’t being initialized… what a joke…

FUNCTION CALLS WORK….

RECURSIVE FUNCTION CALLS WORK….

Fuck yeah….

I’m just going to get the return values working, then maybe make it count recursively to test it, then continue with the operator stuff.

I’m having trouble getting the function input bindings to work properly. I know I don’t want to reparse or recopy the entire function. That’s unnecessary, it would make the memory usage grow way too fast for deep recursion, and it would take too long. I should be able to get away with re-using a single function instance, and just re-binding the function inputs locally. The entire state of an execution frame should be contained in the symbol table.

So that means that VarValue nodes need to be tagged by a function command whenever they’re used to resolve an input argument. The VarValue needs a stack to bind to new local temp values, then get popped to restore them to previous values. VarValue just holds a pointer to its symbolTable, which is fortunate. To do a binding would only require updating these pointers.

Also need to make certain that variable definitions get cleaned up after they leave scope. A single variable can have multiple local instances. Don’t have a mechanism for dealing with that yet…. Function commands definitely need to have a list of any variables that resolve to their inputs. That’s step 1.

Good news

Had a pretty good breakthrough. The recursive copy seems to WORK. I managed to edit over 100 classes, and at least a handful of them handle it well.

Recursive arrays of arbitrary types works.

array val = { { "one", 1.0, "two" }, "zero", { "three" } };

parses. What I’ve also discovered is that Python does some kind of similar organization. Objects are just lists of things, and these types get passed around. If I had started with that model, I might have had a slightly different outcome for the shading/scripting languages. In any case, that functionality is doable with this implementation.

Bad side, the scripting language is highly inefficient. Just to do that one assignment, I had to create 3 copies of the array, and do the assignment 3 times. It’s the overhead of requiring every operation as a node. spa::Value has always been used to carry extremely light types, with very low overhead for carrying multiple copies of objects. I should start thinking about having spa::Value pass things by reference, not value. Especially for things that you can’t make arbitrary copies of, like images, matrices, huge vectors, etc.

Still, big win today. Next to implement is a node class constructor object for defining operator types.

I was thinking that there are two ways to get around the efficiency issues. One is to pass everything by reference. I think I might have to have the types specify whether they are to be passed by reference or not. Anything with a non-trivial copy constructor would qualify for this. Another thing is to do lazy evaluation for copies. So if you assign multiple objects to the same value, the copy only happens when you modify one of them. Any shared pointer implementation should be able to take care of this.
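The lazy-copy option could look roughly like this. CowArray is an invented name and the payload is a vector of doubles just for the example. It also assumes single-threaded use, since checking use_count() is not a safe test when other threads may be copying the handle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Copy-on-write array: copies of the handle share one payload until
// one of them is modified.
class CowArray {
public:
    explicit CowArray(std::vector<double> data)
        : data_(std::make_shared<std::vector<double>>(std::move(data))) {}

    // Reads never copy.
    double get(std::size_t i) const { return data_->at(i); }

    // Writes copy the payload first, but only if someone else shares it.
    void set(std::size_t i, double v) {
        assert(i < data_->size());
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<double>>(*data_);
        }
        (*data_)[i] = v;
    }

    // True while both handles still point at the same payload.
    bool sharesWith(const CowArray& other) const { return data_ == other.data_; }

private:
    std::shared_ptr<std::vector<double>> data_;
};
```

Assigning one CowArray to another is then as cheap as copying a shared pointer, and the real copy only happens on the first write to a shared payload.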

Got the global ‘app’, ‘opInfo’, and ‘OpCtor’ objects working. Scripting language is kicking ass. Also added syntax that allows for calling constructors on objects as they’re defined.

Added a ‘print’ function to the grammar, that was missing for some reason. Getting script functions to work, and that means recursion.

Back to it

Long hiatus after finishing work on Land Of the Lost. Also, with Luna around, I have even less time….

Last left off at doing an overhaul on the type system. I’ve made a distinction between a Type and a TypeField. As a result, I have to edit EVERY source that defines operators/functions. Including all the simd work, that’s a LOT. But it’s good. It’s good to look at the entire work as a whole, I end up doing some large scale cleanup, and keeping the implementation somewhat consistent. As opposed to incrementally having it morph into different directions. Some duplicated data structures are also being merged.

My typing is a bit sloppier/slower. The side effect of being a user over a coder.

So the current task: Putting a script interface/interpreter on the app. Big step. I have to restructure the current grammar to allow for the language to be a list of statements, with each statement being a possible function. Currently, the grammar is a list of functions, which means that each script command would have to be wrapped up in a function. Don’t want that….

I’ve extended Type to have an array of modifiers. So we can do things like array. It’s a slick way to pass types as arguments to objects. Right now, I can only see using it for arrays. I like the syntax as well.

I’ve been avoiding this, but I’m getting to the point that I have to have the script execute, which means I have to come up with some sort of virtual machine to execute it. I want it to be able to handle recursion, throw/catch, so that means having an execution stack. for/if/then/else/continue/break, the normal flow control commands.

So the commands will execute themselves, and will be able to manipulate the entire virtual machine. That’s pretty much how it’s going to work. The Cmd class should be under app, not smd.

UGH…. huge roadblock…. It hadn’t occurred to me that I would need to be able to copy the expression trees. But yes, we DO. That means that I’m going to have to go into each Funct subclass and implement a virtual copy method. Sucks…
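For reference, the usual shape of a virtual copy method on an expression tree. Expr, Constant and Add are stand-in names, not the actual Funct hierarchy. Every subclass clones itself and recursively clones its children.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for the Funct base class.
struct Expr {
    virtual ~Expr() = default;
    virtual double eval() const = 0;
    // The virtual copy: each subclass knows how to duplicate itself.
    virtual std::unique_ptr<Expr> clone() const = 0;
};

struct Constant : Expr {
    double value;
    explicit Constant(double v) : value(v) {}
    double eval() const override { return value; }
    std::unique_ptr<Expr> clone() const override {
        return std::make_unique<Constant>(value);
    }
};

struct Add : Expr {
    std::unique_ptr<Expr> lhs, rhs;
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {
        assert(lhs && rhs);
    }
    double eval() const override { return lhs->eval() + rhs->eval(); }
    // Deep copy: the children are cloned too, so the trees share nothing.
    std::unique_ptr<Expr> clone() const override {
        return std::make_unique<Add>(lhs->clone(), rhs->clone());
    }
};
```

The tedious part is exactly what the post complains about: every subclass needs its own clone(), and forgetting one is a compile error only if the method is pure virtual in the base, as it is here.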


Generic operators

I was able to create separate grammars for functions and shaders. So I can now re-use the function grammar for other things.

Next step: Create generic operators. Kinda like .otls. A good place to start is the Ultrafractal merge, since I’ll be using this to combine layers. Found this link describing the different algorithms used. I also want to create a ‘switch’ statement for the shading language. Having a chain of if-then-else statements will be kind of a drag.

I’ve come up against the next roadblock. I want to be able to create meta types, which are types parameterized by other types. Kind of like class templates that take other types as arguments. It would be easy to extend the grammar to handle this. The Array class, (and any container class really) would be good candidates for this.

The problem is spa::Type. Currently, it’s a bit field, which really is overkill. All it should really be is an unsigned integer. The bitfield should be a separate class. The functionality for the bitfield only exists in the symbol tables for specifying multiple types, and that’s pretty much it. Changing spa::Type to an ‘int’, and creating a separate bitfield will require breaking a lot.
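A sketch of the split, with made-up ids and a made-up capacity of 64 types. TypeId plays the role of the slimmed-down spa::Type, and TypeField is the separate bit field used only where a set of acceptable types has to be expressed.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// A type is just a small unsigned integer id.
using TypeId = unsigned int;

// Hypothetical ids for a few axiom types.
constexpr TypeId kInt = 0;
constexpr TypeId kDouble = 1;
constexpr TypeId kString = 2;
constexpr TypeId kArray = 3;

// Assumed upper bound on the number of distinct types.
constexpr std::size_t kMaxTypes = 64;

// Separate bit field for the places that need "any of these types",
// such as a symbol table entry that accepts several argument types.
class TypeField {
public:
    TypeField& add(TypeId t) {
        assert(t < kMaxTypes);
        bits_.set(t);
        return *this;
    }

    bool accepts(TypeId t) const { return t < kMaxTypes && bits_.test(t); }

private:
    std::bitset<kMaxTypes> bits_;
};
```

Values then carry only a TypeId, and the bit field shows up just in declarations, for example a numeric argument slot built with add(kInt).add(kDouble).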

But its necessary. Once that’s done, we can do

Array abc = {"one", "two", "Three"};

I suppose you can debate about which arguments should be meta-types, and which arguments should be constructor arguments. With a scripting language, it doesn’t matter since everything is an object. But I really like the template syntax.

Another thing is doing a proper (recursive) array value type. I want to be able to do this:

Array stuff = { { "one", "two", "three" }, { 10 11 12 }, { Blue Red Green } } ;

So some arrays will be typesafe, but some won’t…. This might require variables to be able to change types depending on their assignment. That could be really cool, or confusing. The current Array implementation can handle all of these cases, which is good.


Copyright © 2010 Luna-Canis | Created by miloIIIIVII