Avoiding data memory duplication in subVI calls

I am on a Quest to better understand some of the subtle ways of the LabVIEW memory manager. Overall, I want to (as much as practically possible) eliminate calls to the memory manager while the code is running.
(I mainly do RT code that is expected to run "forever", the more static and "quiet" the memory manager activity is, the faster and simpler it is to prove beyond reasonable doubt that your application does not have memory leaks, and that if will not run into memory fragmentation (out of memory) issues etc. What I like to see as much as possible, are near static "used memory" and "largest contiguous block available" stats over days and weeks of deployed RT code.)
In my first example (attached, "IPE vs non-IPE.png"), I compared IPE buffer allocation (black dots) for doing some of the operations in an IPE structure vs. "the old way". I see fewer dots the old way, and removed the IPE structure.
Next I went from initializing an array of size x to values y to using a constant array (0 values) with an "array add" to get an array with the same values as my first version of the code. ("constant array.png")
The length of the constant array is set to my "worst case" of 25 elements (in example). Since "replace sub-array" does not change the size of the input array even when the sub-array is "too long", this saves me from constantly creating small, variable sized arrays at run-time. (not sure what the run-time cpu/memory hit is if you tried to replace the last 4 elements with a sub-array that is 25 elements long...??)
Once I arrived at this point, I found myself wondering "how exactly the constant array is handled during run-time?". Is it allocated the first time that this sub-vi is called then remains in memory until the main/top VI terminates, or is it unloaded every time the SubVI finishes execution? (I -think- Mac's could unload, while windows and linux/unix it remains in memory until top level closes?)  When thinking (and hopefully answering),  consider that the the code is compiled to an RTEXE runningg on a cRIO-9014 (vxWorks OS).  
In this case, I could make the constant array a control, and place the constant on the diagram of the caller, and pipe the constant all the way up to the top level VI, but this seems cumbersome and I'm not convinced that the compiler would properly reckognize that at the end of a long chain of sub-sub-sub VI's all those "controls" are actually always tied off to a single constant. Another way would perhaps be to initialize a FG with this constant array and always "read it" out from the FG. (using this cool trick on creating large arrays on a shift register with only one copy which avoids the dual copy (one for shift register, one from "initialize array" function)).
This is just one example of many cases where I'm trying to avoid creating memory manager activity by making LabVIEW assign memory space once, then only operate on that data "in-place" as much as possible. In another discussion on "in-place element" structures (here), I got the distinct sense that in-place very rarely adds any advantage as the compiler can pick up on and do "in-place" automatically in pretty much any situation. I find the NI documentation on IPE's lacking in that it doesn't really show good examples of when it works and when it doesn't. In particular, this already great article would vastly benefit from updates showing good/bad use of IPE's.
I've read the following NI links to try and self-help (all links should open in new window/tab):
cool trick on creating large arrays on a shift register with only one copy
somewhat dated but good article on memory optimization
IPE caveats and recommendations
How Can I Optimize the Memory Use in My LabVIEW VI?
Determining When and Where LabVIEW Allocates a New Buffer
I do have the memory profiler tool, but it shows min/max/average allocations, it doesn't really tell me (or I don't know how to read it properly) how many times blocks are allocated or re-allocated.
Thanks, and I hope to build on this thread with other examples and at the end of the thread, hopefully everyone have found one or two neat things that they can use to memory optimize their own applications.  Next on my list are probably handling of large strings, lots of array math operations on various input arrays to create a result output array etc.
CLD LabVIEW 7.1 to 2013
IPE vs non-IPE.png ‏4 KB
constant array.png ‏3 KB


I sense a hint of frustration on your part, I'm not trying to be dense or difficult, but do realize that this is more towards the "philosophical" side than "practical" side. Code clarity and practicalities are not necessarily the objectives here.
Also, I have greatly appreciated all your time and input on this and the other thread!
The answer to your first question is actually "yes, sort of". I had a RT application that developed a small memory leak (through a bug with the "get volume info.vi' from NI), but to isolate it and prove it out took a very long time because the constant large allocation/deallocations would mask out the leak. (Trace's didn't work out either since it was a very very slow leak and the traces would bomb out before showing anythinng conclusive.) The leak is a few bytes, but in addition to short term memory oscilations and  long term (days) cyclical "saw-tooth" ramps in memory usage, made this very hard to see. A more "static" memory landscape would possibly have made this simpler to narrow down and diagnose. or maybe not. 
Also, you are missing my point entierely, this is not about "running out of memory" (and the size of 25 in my screen-shot may or may not be what that array (and others) end up being). This is about having things allocated in memory ONCE then not de-allocated or moved, and how/when this is possible to accomplish.  Also this is a quest (meaning something  I'm undertaking to improve and expand my knowledge, who said it has to be practical).
You may find this document really interesting, its the sort of thing you could end up being forced to code to, albeit, I don't see how 100% compliance with this document would ever be possible in LabVIEW, thats not to say its worthless: JPL Institutional Coding Standard for the C Programming Language (while it is directed at C, they have a lot of valid general points in there.)
Yes, you are right that the IPE would grow the output if the lenght of my replacement array is not the same, and since I can't share the full VI's its a bit of a stretch to expect people to infer from the small screen dummp that the I32 wires on the right guarantee the lengths will match up in the IPE example.
Once, on recomendation of NI support, I actually did use the Request deallocation primitive during the hunt for what was going on in that RT app I was debugging last year. At that particular time, the symptom was constant fragmentation of memory, until the largest contiguous block would be less than a couple of kB and the app would terminate with 60+MB of free memory space.. (AKA memory leak, though we could not yet prove that from diagnostic memory consumption statistics due to the constant dynamic behavior of the program)  I later removed them. Also, they would run counter to my goal of "allocate once, re-use forever" that I'm chasing. and again, I'm chasing this more as a way to learn than because all my code MUST run this way. 
I'm not sure I see what you mean by "copying data in and out of some temporary array". Previously (before the constant array) at every call to the containing sub-vi, I used to "initialize array" with x elements of value y (where x depends to a large degree on a configuration parameter, and y is determined by the input data array). Since I would call to "initialize" a new array each time the code was called, and the size of the array could change, I looked for a way that I could get rid of the dynamic size, and get rid of dynamically creating the array from scratch each time the sub-vi was called. What I came up with is perhaps not as clear as the old way I did it, but with some comments, I think its clear enough. In the new way, the array is created as a constant, so I would think that would cause less "movement" in memory as it at that point should be preventing the "source" array from (potentially) moving around in memory?  Considering the alternative of always re-creating a new array, how is this adding an "extra" copy that creating new ones would not create?
How would you accomplish the task of creating an array of "n" elements, all of value "y" without creating "extra" copies? -auto-indexing using a for loop is certainly a good option, but again, is that sure to reuse the same memory location with each call? Would that not, in a nit-picking way, use more CPU cycles since you are building the array one element at the time instead of just using a primitive array add operation (which I have found to be wicked fast operations) and operate on a constant data structure?
I cannot provide full VI's without further isolation, maybe down the road (once my weekends clear up a bit). Again, I appreciate your attention and your time!
CLD LabVIEW 7.1 to 2013