EMA output from custom DataSeries = wonky


    EMA output from custom DataSeries = wonky

    Hey, I'm trying to put my own data into a DataSeries so that I can run Indicator methods on it.

    So for example, I set the last 12 values of a DataSeries to be 100.0 and then pass it into an EMA.

    Here's the example code:

    Code:
    for( int i=0; i<12; i++ )
    {
    	testSeries.Set( i, 100.0 );
    }
    
    IDataSeries emaTest = EMA( testSeries, 5 );
    
    for( int i=0; i<5; i++ )
    {
    	Print( "VALUE: " + testSeries[i].ToString() );
    	Print( "EMA: " + emaTest[i].ToString() );
    }
    I would expect the output from the EMA to be 100.0 since that's all that's being fed into it. However, this is the output:

    Code:
    VALUE: 100
    EMA: 109.097235188783
    VALUE: 100
    EMA: 113.645852783175
    VALUE: 100
    EMA: 120.468779174763
    VALUE: 100
    EMA: 130.703168762144
    VALUE: 100
    EMA: 146.054753143216
    This appears to be a problem only with the EMA. SMA and LinReg both return correct values. If I fill testSeries with a larger number of values (e.g., 30), then it gets closer to the correct EMA value (100.0). However, if I am setting the period to 5, it shouldn't be looking back in the array further than 10 if I'm only looking at the first 5 values.

    What am I doing wrong here? Is there a bug in the EMA method?

    #2
    LiquidDrift, the EMA is an infinite filter and as such will always take all values into consideration (although the longer the series gets, the smaller their weights become).
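    For illustration, here is a minimal sketch of the textbook EMA recursion (an assumed formula for illustration only, not NinjaTrader's internal implementation; hypothetical values), run inside a NinjaScript method such as OnBarUpdate(). It shows why a single stale value never fully drops out of the average:

    Code:
    // Minimal sketch of the textbook EMA recursion: k = 2 / (period + 1),
    // ema = k * value + (1 - k) * previousEma. Hypothetical data, oldest first.
    double[] values = { 250.0, 100.0, 100.0, 100.0, 100.0 };
    int period = 5;
    double k = 2.0 / (period + 1);		// 1/3 for a period of 5
    double ema = values[0];			// seed with the oldest value
    for( int i = 1; i < values.Length; i++ )
    	ema = k * values[i] + (1 - k) * ema;	// the old 250 keeps a shrinking but nonzero weight
    Print( "EMA: " + ema );			// ~129.6, not 100, even though the last four inputs are 100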
    Bertrand, NinjaTrader Customer Service



      #3
      Originally posted by LiquidDrift View Post
      Hey, I'm trying to put my own data into a DataSeries so that I can run Indicator methods on it. So for example, I set the last 12 values of a DataSeries to be 100.0 and then pass it into an EMA. ... What am I doing wrong here? Is there a bug in the EMA method?
      You are not quite doing what you think that you are.
      Code:
      IDataSeries emaTest = EMA( testSeries, 5 );
      is not the correct way to populate a DataSeries.



        #4
        OK, thanks, I figured out that indeed it is an infinite series and older entries in my "testSeries" were throwing off the values.

        @koganam - What am I doing wrong there? I've been populating DataSeries that way all over the place with no problems yet. I'm coming from a C++ background, so I may well be doing improper assignment. What is the correct way?

        Thanks!



          #5
          Originally posted by LiquidDrift View Post
          OK, thanks, I figured out that indeed it is an infinite series and older entries in my "testSeries" were throwing off the values. ... What is the correct way?
          Unfortunately that is not your problem. Your original statement was the correct one. It does not matter how many terms there are in a rectangular distribution: the average value, no matter how it is measured (ema, sma, weighted, etc.), will always be exactly the same, namely the value of every identical member of the distribution. Even a cursory glance at how any average is calculated will make this clear.

          If all the members of the distribution are 100, then the value of the average MUST be 100, or the average itself is being miscalculated or improperly defined. This is not simply a mathematical nicety. The definition of the average as the most likely value means that it must be so. The most likely value of a collection whose every member is 100 cannot be anything but 100.

          The problem lies in your code.

          As you have written it, on each OnBarUpdate() you are redefining and initializing an Interface, IDataSeries, to an unknown state.

          To correctly do what you want, you should declare a class variable of type EMA (EMA is a class, hence an object). You then assign/instantiate this named instance of an EMA (in either Initialize(), or preferably in OnStartUp(), or even reinitialize it every time in OnBarUpdate() like you have done). But it must be a named instance of the class, not a new declaration of the Interface.

          Here is what I mean:
          Code:
          private EMA emaTest;
          private DataSeries testSeries; // class-level series for the custom values (declaration assumed; not shown in the original post)
          Code:
          protected override void Initialize()
          {
              this.testSeries = new DataSeries(this);
          }
          Code:
          protected override void OnBarUpdate()
          {
              // Use this method for calculating your indicator values. Assign a value to each
              // plot below by replacing 'Close[0]' with your own formula.
              //            Plot0.Set(Close[0]);
              if (CurrentBar < 5) return;

              for( int i=0; i<12; i++ )
              {
                  testSeries.Set( i, 100.0 );
              }

              emaTest = EMA( testSeries, 5 ); // this statement can go in Initialize(), or OnStartUp(), which is the most efficient place for it.

              for( int i=0; i<5; i++ )
              {
                  Print( "VALUE: " + testSeries[i].ToString() );
                  Print( "EMA: " + emaTest[i].ToString() );
              }
          }



            #6
            Ah, I see, I did not know what a C# Interface was; that was helpful, thanks.

            I'm still having problems with this however. It appears that DataSeries data lingers in the EMA, even if the DataSeries has been completely overwritten.

            For example:

            Code:
            ++barNum;
            if( barNum == 40 || barNum == 100 )
            {
            	Print( "-------------------------" );
            	int cnt = Math.Min( testSeries.Count, 256 );
            	for( int i=1; i<cnt; i++ )
            	{
            		testSeries.Set( i, 100.0 );
            	}
            	testSeries.Set( 0, 500.0 );
            	
            	emaTest = EMA( testSeries, 5 );
            
            	for( int i=0; i<5; i++ )
            	{
            		Print( "VALUE: " + testSeries[i].ToString() );
            		Print( "EMA: " + emaTest[i].ToString() );
            	}
            
            }
            return;
            Produces this:

            Code:
            -------------------------
            VALUE: 500
            EMA: 233.333333333333
            VALUE: 100
            EMA: 100
            VALUE: 100
            EMA: 100
            VALUE: 100
            EMA: 100
            VALUE: 100
            EMA: 100
            -------------------------
            VALUE: 500
            EMA: 322.222222222222
            VALUE: 100
            EMA: 233.333333333333
            VALUE: 100
            EMA: 100
            VALUE: 100
            EMA: 100
            VALUE: 100
            EMA: 100
            So it runs through the same code twice at different times and comes up with 2 different results. I can get it to work correctly if I insert:

            Code:
            	EMA( testSeries, 5 ).Dispose();
            before this line:
            Code:
            	emaTest = EMA( testSeries, 5 );
            But that results in horrible performance in a backtest and NT eventually runs out of memory. Is there some way to do this properly such that the EMA data is correct, but I don't need to call Dispose() every time?



              #7
              Originally posted by LiquidDrift View Post
              Ah, I see, I did not know what a C# Interface was; that was helpful, thanks. I'm still having problems with this however. It appears that DataSeries data lingers in the EMA, even if the DataSeries has been completely overwritten. ... Is there some way to do this properly such that the EMA data is correct, but I don't need to call Dispose() every time?
              Where are you running this code: as a method, or in an event handler? Yes, it would make a difference.



                #8
                Originally posted by LiquidDrift View Post
                Ah, I see, I did not know what a C# Interface was; that was helpful, thanks. I'm still having problems with this however. It appears that DataSeries data lingers in the EMA, even if the DataSeries has been completely overwritten. ... Is there some way to do this properly such that the EMA data is correct, but I don't need to call Dispose() every time?
                Ah, the painful vicissitudes of optimized processors and FPUs. That looks to me like the floating point inaccuracies inherent in trying to use digital equipment to approximate real numbers with floating point.

                The clue that it is probably the optimization holding the structure in the pipeline instead of flushing it? The fact that when you explicitly Dispose() of it, the problem goes away.

                It looks like you may have to specify the precision of your calculation results if you want consistency.



                  #9
                  Originally posted by koganam View Post
                  Ah, the painful vicissitudes of optimized processors and FPUs. ... It looks like you may have to specify the precision of your calculation results if you want consistency.
                  Floating point errors are usually relatively small compared to a number like 100. Also, if it's a floating point error, it's a hell of a coincidence that the second EMA of the second set is identical to the first EMA of the first set. AND, I'm completely filling the DataSeries before I do any calculation, so the floating point error should show up the same way both times.

                  The EMA / DataSeries appears to be doing some stuff under the hood that does not allow this kind of behavior. I believe I'm going to have to integrate my own EMA with a regular array to get what I need.
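                  For the record, what I have in mind is something along the lines of this minimal sketch (the helper name, seeding, and ordering are just illustrative, not NT's EMA code):

                  Code:
                  // Illustrative stand-alone EMA over a plain array, ordered oldest first.
                  // Uses the standard smoothing constant k = 2 / (period + 1).
                  private double ArrayEMA( double[] values, int period )
                  {
                  	double k = 2.0 / (period + 1);
                  	double ema = values[0];			// seed with the oldest value
                  	for( int i = 1; i < values.Length; i++ )
                  		ema = k * values[i] + (1 - k) * ema;
                  	return ema;				// EMA of the newest element
                  }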

                  Thanks so much for looking at it everyone. NT people - it would be good if you could look at this further; this may be a sign that there's a bug on your end somewhere.



                    #10
                    Originally posted by LiquidDrift View Post
                    Floating point errors are usually relatively small compared to a number like 100. ... this may be a sign that there's a bug on your end somewhere.
                    Hey, wait a Holy Minute right there!! There is more to this than meets the eye. When I run your code, that is NOT the output that I got. My output is what I based my comment on.

                    This is what I got:

                    Code:
                    -------------------------
                    VALUE: 500
                    EMA: 233.333333333333
                    VALUE: 100
                    EMA: 100
                    VALUE: 100
                    EMA: 100
                    VALUE: 100
                    EMA: 100
                    VALUE: 100
                    EMA: 100
                    -------------------------
                    VALUE: 500
                    EMA: 233.33333333696
                    VALUE: 100
                    EMA: 100.000000005439
                    VALUE: 100
                    EMA: 100.000000008159
                    VALUE: 100
                    EMA: 100.000000012239
                    VALUE: 100
                    EMA: 100.000000018358
                    where we do see a reasonably small floating point error. In fact, just to be sure, I reset barNum at the end of the loop, to force the code to run multiple times, and the solution did converge to a consistent rounding error. That is why, in addition to your discovery of what happens when the named EMA is disposed, I concluded that all I was seeing was a rounding error caused by pipeline optimization.



                      #11
                      Originally posted by koganam View Post
                      Unfortunately that is not your problem. ... The problem lies in your code.
                      Good grief. I have a BS in math and comp sci. Rectangular distribution? I guess I need to catch up. My real world job has made me soft.



                        #12
                        Originally posted by koganam View Post
                        Hey, wait a Holy Minute right there!! There is more to this than meets the eye. When I run your code, that is NOT the output that I got. My output is what I based my comment on.
                        Yes, your output does appear to be floating point errors, and indeed it may be floating point errors if you are resetting barNum at the end of the loop. I got my results by having the code snippet run just a couple of times, far apart from each other, ie the 40, 100 values.

                        My output is much further off, and again, the coincidence of the 233.333333333 value in both outputs leads me to conclude that even though I'm overwriting the DataSeries, old data continues to live somewhere in NT and continues to be processed by the EMA.

                        I'm bummed that you were not able to reproduce my results; maybe you can if you try running on bars further apart. Since I believe it's a memory/data issue, however, no two computers are going to get the same results every time.

                        The reason I created the code snippet was to attempt to narrow down the same issue that I'm seeing in a more complex strategy, and to hopefully either figure out if I'm doing something wrong, or shine some light on the problem.

                        I now have 2 (3 if I count yours) cases where this is happening, and that's more than enough for me to not trust data that I'm filling into a DataSeries and processing with an Indicator. When I use third party code to do the same thing using regular arrays, I'm not seeing any issues.

                        Just want to say once again, I'm very thankful for your time koganam for taking a look at this and your feedback!



                          #13
                          Originally posted by LiquidDrift View Post
                          Yes, your output does appear to be floating point errors, and indeed it may be floating point errors if you are resetting barNum at the end of the loop. I got my results by having the code snippet run just a couple of times, far apart from each other, ie the 40, 100 values.
                          Actually, the output that I showed was what I got from running your exact code (cut-and-paste). It was in investigating whether I was just seeing FP error that I reset barNum so that the code ran multiple times (plenty of bars on the chart), knowing that if I was just seeing FP error, then the solution would have to converge to a stable state, which it did.

                          My output is much further off, and again, the coincidence of the 233.333333333 value in both outputs leads me to conclude that even though I'm overwriting the DataSeries, old data continues to live somewhere in NT and continues to be processed by the EMA.
                          The 233.3 recurring is actually exactly correct for the way that the EMA is being initialized. The calculation is also mathematically correct.
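                          (For reference: with a period of 5 the standard smoothing constant is 2 / (5 + 1) = 1/3, so one EMA step from a prior value of 100 with a new input of 500 gives (1/3) * 500 + (2/3) * 100 = 233.33..., which is exactly the number in both outputs above.)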

                          I am actually more surprised that there are FP errors in the first place. I would expect each run of the code to produce the exact same results; not perfectly correct results in the first run, then anything else in the next. After all, the values are being shown to be exactly 100 in both cases, so there should be no difference.

                          The only question now is: "Have we found an error in the way C# handles objects, or is the error in the way that the CPU handles caching and pipelining?" To me, our mutual results point to a processing error somewhere.
                          ... I now have 2 (3 if I count yours) cases where this is happening, and that's more than enough for me to not trust data that I'm filling into a DataSeries and processing with an Indicator. When I use third party code to do the same thing using regular arrays, I'm not seeing any issues.

                          Just want to say once again, I'm very thankful for your time koganam for taking a look at this and your feedback!
                          Don't mention it. It was an interesting conundrum arising from something that at first glance looked trivial. I am the better for having looked at it. Thank you!



                            #14
                            Originally posted by sledge View Post
                            Good grief. I have a BS in math and comp sci. rectangular distribution? I guess I need to catch up. My real world job has made me soft.
                            Just made you soft? It knocked me out, then picked me up, chewed me up really nicely, and then spit me right out.



                              #15
                              Originally posted by koganam View Post
                              The 233.3 recurring is actually exactly correct for the way that the EMA is being initialized. The calculation is also mathematically correct.
                              If this is true, then why do we not see it recurring in your output, only in mine? Or were you seeing it in your output as well?

                              If what you're saying is true, then the EMA doesn't take into account new data entered into the DataSeries, which would make sense given the output.
