chessgod101

The Zirie Tests


The Zirie Tests

Post  Zirie on Thu Dec 19, 2013 4:31 pm

Ok, since it looks like I'm going to be doing this for a while, I might as well give it a name. I'ma call it the 'Zirie Tests', and it is basically a comparison of the top engines in both long and short time controls. I know there are many of these out there, but I like doing my own, too. Here I will publish what I find, and with your help, I will improve my tests for the benefit of the computer chess community. I will try to add a part to my website for this, namely chess.zirie.com, soon.

Zirie

Posts : 49
Points : 65
Reputation : 6
Join date : 2013-12-09


Re: The Zirie Tests

Post  Zirie on Thu Dec 19, 2013 4:53 pm

Here is the first of my findings so far.

Bullet Time Control

This test was run at a Bullet time control of 1 minute per game with no increment (1+0), using 4 cores, ponder off, and Perfect12t as the book. All engines were 64-bit versions.



This is how I interpret these results: At Bullet time control,

  • Houdini 4 and Stockfish are very close in strength. In 100 games, H4 had only a 1-game advantage over SF, which is too small to be significant.
  • Houdini 4 clearly dominates Komodo TCEC in Bullet. In 100 games, H4 had a 57.5 - 42.5 advantage over KT.
  • Stockfish also seems to beat Komodo TCEC, but by a smaller margin. In 100 games, SF had a 52.5 - 47.5 advantage over KT.
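For readers who prefer rating points, the match scores above can be converted to an approximate Elo difference. This is only a sketch using the standard logistic Elo formula; the function name `elo_diff` is made up for illustration and is not part of any testing tool used here.

```python
import math

def elo_diff(points: float, games: int) -> float:
    """Approximate Elo difference implied by a match score (logistic model)."""
    s = points / games                      # score fraction, must be strictly between 0 and 1
    return -400.0 * math.log10(1.0 / s - 1.0)

# Bullet results quoted in the list above (100 games per pairing)
for label, points in [("H4 vs KT", 57.5), ("SF vs KT", 52.5)]:
    print(f"{label}: {elo_diff(points, 100):+.1f} Elo")
```

Under this model the 57.5 - 42.5 result corresponds to a little over 50 Elo and the 52.5 - 47.5 result to about 17 Elo, before any allowance for sampling error.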


Re: The Zirie Tests

Post  Zirie on Thu Dec 19, 2013 5:10 pm

What follows are the preliminary results of tests at Rapid time controls.

Rapid Time Control - Ongoing

As always, 4 cores per engine, ponder off, Perfect12t as book, etc. The time control I used was as follows: (15m+15s)/40 + (10m+10s)/40 + (5m+5s). I will let it run to at least 100 games between each pair of engines. So far, I'm about a third there.
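To make the notation concrete, here is a small sketch of how much clock time one side is granted under this three-stage control. It assumes that each stage adds its base time when it begins and that the increment is credited for every move played in that stage; `Stage` and `time_budget` are illustrative names, not something taken from the GUI.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Stage:
    base_s: int            # seconds granted when the stage begins
    inc_s: int             # increment added per move, in seconds
    moves: Optional[int]   # moves covered by the stage; None = rest of the game

# (15m+15s)/40 + (10m+10s)/40 + (5m+5s)
RAPID = [Stage(15 * 60, 15, 40), Stage(10 * 60, 10, 40), Stage(5 * 60, 5, None)]

def time_budget(stages: List[Stage], n_moves: int) -> int:
    """Seconds one side is granted in total over its first n_moves moves."""
    total = 0
    remaining = n_moves
    for stage in stages:
        if remaining <= 0:
            break
        # Number of this side's moves that fall inside the current stage
        in_stage = remaining if stage.moves is None else min(remaining, stage.moves)
        total += stage.base_s + stage.inc_s * in_stage
        remaining -= in_stage
    return total

for n in (40, 60, 100):
    print(n, "moves:", time_budget(RAPID, n) / 60, "minutes")
```

So a 40-move game gives each engine at most 25 minutes of thinking time, which is why I call this a Rapid control.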



It is still early, but here are some insights from the results so far. At these Rapid time controls...

  • Stockfish may be slightly better than Houdini 4, since after 38 games, SF has a small 2-game advantage over H4.
  • Stockfish and Komodo TCEC are very close to each other. After 38 games, they are tied.
  • Houdini 4 may be better than Komodo TCEC, since after 38 games H4 has a 4-game advantage over KT.


I expect these results may change as the test proceeds. Right now, for example, it seems Komodo is inching toward a victory over Stockfish. I will keep you posted.


Re: The Zirie Tests

Post  Zirie on Thu Dec 19, 2013 5:19 pm

What follows are the preliminary results of tests at long time controls (LTC).

Long Time Control - Ongoing

As always, 4 cores per engine, ponder off, Perfect12t as book, etc. The time control I used was as follows: (90m+30s)/40 + (15m+30s). I hope to let it run to at least 100 games between each pair of engines. Here's where I am now.



It is still early, but here are some insights from the results so far. At these long time controls...


  • Komodo TCEC seems stronger than Stockfish.
  • Stockfish, in turn, seems stronger than Houdini 4.
  • And yet, Houdini 4 has proven equal to Komodo TCEC so far.


So, apparently it's a rock, scissors, paper kind of thing. I expect these results may change as the test proceeds. I will keep you posted.


Re: The Zirie Tests

Post  Mohamed Nayeem on Fri Dec 20, 2013 12:18 am


Mohamed Nayeem

Posts : 73
Points : 110
Reputation : -1
Join date : 2013-12-09


Re: The Zirie Tests

Post  plantprot on Fri Dec 20, 2013 4:22 am

Nice one, Zirie. What is your estimated date for when your LTC test will be finished?

plantprot

Posts : 35
Points : 37
Reputation : 0
Join date : 2013-12-12


Re: The Zirie Tests

Post  Zirie on Fri Dec 20, 2013 2:47 pm

plantprot wrote: Nice one, Zirie. What is your estimated date for when your LTC test will be finished?

Thanks! Early January, I'd say...


Re: The Zirie Tests

Post  plantprot on Fri Dec 20, 2013 4:09 pm

Zirie wrote:
plantprot wrote: Nice one, Zirie. What is your estimated date for when your LTC test will be finished?

Thanks! Early January, I'd say...
Ohhhh... longer than I expected... You really have got a heavy tournament there!


Re: The Zirie Tests

Post  Zirie on Fri Dec 20, 2013 5:45 pm

Long Time Control - Test 1

I have stopped the first LTC test after 39 games, meaning each pair of engines played 13 games against each other. As always, 4 cores per engine, ponder off, Perfect12t as book, Nunn's DB for the openings (each played with both colors), etc. The time control I used was as follows: (90m+30s)/40 + (15m+30s). The result after 39 games is shown below:



The reason I stopped it is that the tendency is by now pretty clear. Here are my insights for these long time controls:


  •  At these time controls, apparently, Komodo TCEC is stronger than Stockfish, 8.0 - 5.0
  •  At these time controls, apparently, Stockfish is stronger than Houdini 4, 8.0 - 5.0
  •  At these time controls, apparently, Komodo TCEC is equal to Houdini 4, 6.5 - 6.5


So, apparently, it's a rock, scissors, paper kind of thing. This would explain why Komodo and Stockfish drew in Stage 4 of TCEC and Komodo beat Stockfish in the TCEC Final.

I will now run LTC Test 2, using the latest Stockfish (Dec 19) and FIDE's time control: 90 minutes for the first 40 moves, then 30 minutes for the rest of the game, plus a 30s increment per move starting from move 1.


Re: The Zirie Tests

Post  plantprot on Sat Dec 21, 2013 4:36 am

Oh man... it's aborted... who knows what would have happened after game 80... but anyway, it's been tested already... Komodo TCEC is still the winner in LTC... thanks for the test!


Re: The Zirie Tests

Post  Zirie on Sat Dec 21, 2013 2:17 pm

Yes, I agree with you: it's already been tested. Komodo really is stronger than Stockfish in LTC. The funny thing is that it is apparently not that much stronger than Houdini 4, and that is puzzling to me, particularly because Stockfish is stronger than Houdini 4 in LTC. To be honest, the reason I wanted to abort it and start again was to use the FIDE time control. Will keep you guys posted.


Re: The Zirie Tests

Post  Zirie on Sun Dec 22, 2013 3:36 pm

Zirie wrote:I will keep you posted.

Update on the Rapid Time Control test of the top three engines, with 114 games per engine completed (57 for each pair).



Now, this is a rock scissors paper situation!

Again, this time control is 15m+15s for first 40, then 10m+10s for next 40 (or 20), then 5m + 5s.


Re: The Zirie Tests

Post  Zirie on Mon Dec 23, 2013 11:50 pm

Rapid Time Controls

Test 1. With the time controls of 15m+15s for the first 40 moves, 10m+10s for the next 40 and 5m+5s for the rest, I ran 118 games with each of the top three engines, for a total of 59 games between each pair of engines. Below are the results when I stopped the test:



At the time I stopped it, the engines seemed to be pretty equal in being unequal, e.g. SF > H4, H4 > KT, and KT > SF. Truly a scissors, paper, rock situation.
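The "scissors, paper, rock" pattern is just a cycle in the pairwise "scored more than" relation. A minimal sketch of checking for one, using only the directions stated above (the function name `find_cycle` is invented for this example):

```python
from itertools import permutations

# (winner, loser) pairs from Rapid Test 1: SF > H4, H4 > KT, KT > SF
RAPID_TEST_1 = {("SF", "H4"), ("H4", "KT"), ("KT", "SF")}

def find_cycle(beats):
    """Return a triple (a, b, c) with a > b, b > c, c > a, or None if there is none."""
    engines = sorted({engine for pair in beats for engine in pair})
    for a, b, c in permutations(engines, 3):
        if (a, b) in beats and (b, c) in beats and (c, a) in beats:
            return (a, b, c)
    return None

print(find_cycle(RAPID_TEST_1))
```

A tied pairing contributes no pair to the set, so a result like "A > B, B > C, A = C" would come back as None.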

Test 2. With the same time controls, I started a new run, but using:
- Stockfish Dec 19 instead of Stockfish Dec 10, and
- the licensed Komodo TCECr I got yesterday

Much to my surprise, the results do not resemble those of Test 1 at all: H4 and SF seem to be almost tied, but KT is taking a beating:



This smells fishy, so I will stop the tournament to check my settings.


Last edited by Zirie on Tue Dec 24, 2013 12:40 am; edited 1 time in total


Re: The Zirie Tests

Post  plantprot on Tue Dec 24, 2013 12:00 am

Hmm... so a licensed Komodo is weaker than a cracked Komodo? Heheh... just kidding. Check whether you might have forgotten to set Komodo's threads to more than one... Komodo's Threads option is set to only 1 by default...


Re: The Zirie Tests

Post  Zirie on Tue Dec 24, 2013 12:40 am

plantprot wrote: Hmm... so a licensed Komodo is weaker than a cracked Komodo? Heheh... just kidding. Check whether you might have forgotten to set Komodo's threads to more than one... Komodo's Threads option is set to only 1 by default...

Hehe! Ok, I checked the settings, and they seemed fine. The only thing is that the hash may have been too big. 1024MB for a Rapid game. But all engines were set to 4 threads.

I took advantage of the stop to download the new Stockfish (Dec 23), reduce the hash to 512MB, and set the time control to something more normal, like 30m + 10s.
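For anyone reproducing these settings outside a GUI, the sketch below builds the UCI `setoption` commands that correspond to them. It assumes the engines expose the usual `Threads`, `Hash` and `Ponder` option names; I set these through the GUI, so treat the names as assumptions and check each engine's own option list. The helper `uci_setup` is hypothetical.

```python
from typing import List

def uci_setup(threads: int, hash_mb: int, ponder: bool = False) -> List[str]:
    """Build UCI commands for the thread count, hash size (MB) and ponder flag."""
    return [
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
        f"setoption name Ponder value {'true' if ponder else 'false'}",
    ]

# 4 threads, 512MB hash, ponder off, as in the restarted test
for command in uci_setup(threads=4, hash_mb=512):
    print(command)
```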

I will keep you posted.


Re: The Zirie Tests

Post  Zirie on Thu Dec 26, 2013 2:19 am

Zirie wrote: I will now run LTC Test 2, using the latest Stockfish (Dec 19) and FIDE's time control: 90 minutes for the first 40 moves, then 30 minutes for the rest of the game, plus a 30s increment per move starting from move 1.

An update on this second LTC test: 15 games, or 5 games between each pair, have been played. Below are the results:



Notice that:

  • Houdini and Stockfish are tied, and
  • Both have victories against Komodo.


This ranking is consistent with CCRL 40/40:
Code:
http://www.computerchess.org.uk/ccrl/4040/rating_list_all.html


Re: The Zirie Tests

Post  Zirie on Thu Dec 26, 2013 1:21 pm

Hypothesis: Komodo TCEC is not stronger than Houdini 4 and the latest Stockfish development version in long time controls.


Re: The Zirie Tests

Post  Zirie on Thu Dec 26, 2013 7:16 pm

Long Time Control - Test 2

I've stopped it at 18 games, 12 per engine, 6 per pair.
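The game counts follow from simple round-robin arithmetic, sketched here for three engines. `round_robin` is a made-up helper name.

```python
from math import comb

def round_robin(n_engines: int, games_per_pair: int) -> dict:
    """Total games and games per engine in an all-play-all tournament."""
    pairs = comb(n_engines, 2)                          # 3 engines -> 3 pairings
    return {
        "total": pairs * games_per_pair,
        "per_engine": (n_engines - 1) * games_per_pair,
    }

print(round_robin(3, 6))    # LTC Test 2: 6 games per pair
print(round_robin(3, 13))   # LTC Test 1: 13 games per pair
```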



Insights:

  • Houdini was best in this time setting, with no losses.
  • Stockfish followed closely, losing only 1 game to Houdini and drawing the other 5.
  • Komodo lost two of its six games against each of Houdini and Stockfish.


Could it be that the post-TCEC Houdini (i.e. Houdini 4) and the latest Stockfish are better than Komodo in LTC, or at least in FIDE time controls? I am starting to think so.

I have started a third round using FIDE Time Controls:
- Again, Perfect 12t as book
- But now, without Nunn's DB
- 2GB of hash
- Legal Houdini 4, legal Komodo TCECr and latest Stockfish from today.


Re: The Zirie Tests

Post  Zirie on Sun Jan 05, 2014 6:30 pm

Long Time Control - Test 3 (Syzygy)

On the last day of the year, I started running a new test. This time:
- I used Syzygy endgame tablebases for Houdini and Stockfish (Komodo doesn't use them)
- I used 75 minutes for the first time control to compensate for the fact that engines play the first dozen moves from the book. I still consider this equivalent to FIDE time control.
- I used the latest Stockfish at that date, the Dec 30 version with Syzygy.
- Notice I do not have an SSD for the EGTB, just a regular hard disk.

I noticed that Stockfish is getting better, and once one uses the Syzygy tablebases both Houdini and Stockfish gain a sort of clairvoyance in the late stages of the middlegame. The results after 30 games are shown below. Judge for yourselves:



My insights, from an admittedly small number of games:
- Komodo TCEC seems to be as strong as, if not a bit weaker than, Houdini 4 with Syzygy.
- The development Stockfish with Syzygy seems to be stronger than both Komodo TCEC and Houdini 4 with Syzygy.

If I were on team Stockfish, I would use Syzygy tablebases for the next TCEC. If I were on team Komodo, I'd start implementing Syzygy in my code. Just sayin'...


Re: The Zirie Tests

Post  Zirie on Mon Jan 06, 2014 12:39 pm

Rapid Time Control

Running in a time control of 30m + 10s, about two weeks of analysis yielded the following results:



Notice no EGTB were used. I may retry with EGTB for H4 and SF.

Insights:
- Stockfish may already be the best engine at time controls over 30 minutes (considering these results and the ones from FIDE time controls)

I wonder how strong Komodo would be with EGTB.


Re: The Zirie Tests

Post  Zirie on Fri Jan 10, 2014 3:08 pm

Hypothesis: the current development version of Stockfish with Syzygy is the strongest chess engine out there:

- at bullet time controls 1m+2s
- at blitz time controls 3m+5s
- at rapid time controls 15m+10s
- at long time controls 75m+30s


Re: The Zirie Tests

Post  Zirie on Fri Jan 10, 2014 3:09 pm

Bullet Time Control Test - January 6, 2014



Stockfish dominates.


Re: The Zirie Tests

Post  Zirie on Fri Jan 10, 2014 3:24 pm

Blitz Time Control Test - January 8, 2014



Stockfish comes out on top.


Re: The Zirie Tests

Post  Zirie on Mon Jan 13, 2014 11:56 am

Rapid Time Control - January 10-13

Rapid Time Control

Rank  Engine     Score    St                        Ho                        Ko                        S-B
1     Stockfish  29.0/48  ·                         ==0=1===1011=1====1=0===  =11==1=1110=0===11=01011  618.50
2     Houdini    24.0/48  ==1=0===0100=0====0=1===  ·                         1=1=10101==0===011=0=1==  561.00
3     Komodo     19.0/48  =00==0=0001=1===00=10100  0=0=01010==1===100=1=0==  ·                         498.50


72 games played / Tournament is finished

Tournament start: 2014.01.10, 23:22:04
Latest update: 2014.01.13, 09:59:33
Site/Country: United States
Level: Blitz 15/10
Hardware: Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz with 7.9 GB Memory
Operating system: Windows 7 Enterprise Professional Service Pack 1 (Build 7601) 64 bit
Table created with: Arena 3.5
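The result strings in the crosstable above can be checked mechanically: `1` is a win, `=` a draw and `0` a loss from the row engine's point of view. The sketch below re-adds each engine's score and confirms that every pairing is the mirror image of the opponent's row. The strings are copied from the table; the helper names are illustrative only.

```python
POINTS = {"1": 1.0, "=": 0.5, "0": 0.0}

# Row engine -> opponent -> game-by-game results, as listed in the crosstable
ROWS = {
    "Stockfish": {"Houdini": "==0=1===1011=1====1=0===", "Komodo": "=11==1=1110=0===11=01011"},
    "Houdini": {"Stockfish": "==1=0===0100=0====0=1===", "Komodo": "1=1=10101==0===011=0=1=="},
    "Komodo": {"Stockfish": "=00==0=0001=1===00=10100", "Houdini": "0=0=01010==1===100=1=0=="},
}

def score(results: str) -> float:
    """Points scored over a string of results."""
    return sum(POINTS[c] for c in results)

def mirror(results: str) -> str:
    """The same games seen from the opponent's side (wins and losses swapped)."""
    return results.translate(str.maketrans("10", "01"))

totals = {engine: sum(score(r) for r in opps.values()) for engine, opps in ROWS.items()}
consistent = all(
    mirror(results) == ROWS[opponent][engine]
    for engine, opps in ROWS.items()
    for opponent, results in opps.items()
)
print(totals, consistent)
```

The totals match the Score column (29.0, 24.0 and 19.0), and each pairing agrees with its mirror.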


Re: The Zirie Tests

Post  Zirie on Mon Jan 20, 2014 5:27 pm

Long Time Control - January 14 through 20

Long Time Control

Rank  Engine     Score    St            Ho            Ko            S-B
1     Stockfish  15.5/24  ·             ==1=1=01=101  11==0=111===  158.75
2     Houdini    10.5/24  ==0=0=10=010  ·             ==========01  129.75
3     Komodo     10.0/24  00==1=000===  ==========10  ·             125.00


36 games played / Tournament is finished

Tournament start: 2014.01.13, 12:16:12
Latest update: 2014.01.20, 16:02:58
Site/ Country: ROBERTO-HOME, United States
Level: Blitz 120/30
Hardware: Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz with 7.9 GB Memory
Operating system: Windows 7 Enterprise Professional Service Pack 1 (Build 7601) 64 bit
PGN-File: Long Time Control.pgn
Table created with: Arena 3.5


Stockfish dominates! It's impressive how it has improved in two months.
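As a rough sanity check on "dominates", the sketch below takes Stockfish's 24 results from the crosstable above and estimates how far its score sits above 50% in units of standard error. It pools the games against both opponents and treats them as independent samples, which is a simplifying assumption, and 24 games is still a small sample.

```python
import math

POINTS = {"1": 1.0, "=": 0.5, "0": 0.0}

# Stockfish's row: 12 games vs Houdini followed by 12 games vs Komodo
STOCKFISH_LTC = "==1=1=01=101" + "11==0=111==="

def summarize(results: str):
    """Return (mean score, standard error, z-score against an even 50% score)."""
    pts = [POINTS[c] for c in results]
    n = len(pts)
    mean = sum(pts) / n
    var = sum((p - mean) ** 2 for p in pts) / (n - 1)   # sample variance per game
    se = math.sqrt(var / n)
    return mean, se, (mean - 0.5) / se

mean, se, z = summarize(STOCKFISH_LTC)
print(f"score {mean:.3f}, standard error {se:.3f}, z = {z:.2f}")
```

Under these assumptions the z-score comes out at roughly 2, so the lead is suggestive rather than overwhelming.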

