[Dev] NeoScrypt GPU Miner - Public Beta Test

Atomicat

Okay, here you go (NSFW): https://ottrbutt.com/miner/neoscryptwolf-11042014.png

As you can see, every card is around mid-70C or below, except that 270X with the dead fan. What you don’t see is that all of them have a decent sized voltage bump, except that 270X, which is slightly undervolted to compensate for its lack of cooling. So, even at this hashrate, the cards are running quite cool - too cool, as my room is rather warm (probably from all three desktops with high end GPUs…) - so I need to trade off some of the memory usage for extra computations. Compute power is likely still plentiful, unused because of memory access times.

Here’s my 7950 doing 150Kh/s. Fairly Safe For Work. (In Soviet Russia, You maul Bear! Then if you’re lucky, she mauls you back) I could use a suggestion for a good overclocking bios though. Been playing around boosting a few to 1.3v and I just ran across The Stilt’s mods the other day but haven’t had much luck with them either. As you can see, temperatures are absolutely no problem. Stays under 70 even when scrypt-hashing full load, full crank. Too many variations! Can’t spend days freezing and rebooting. It’s a Gigabyte, part 113-HD685ZNF63.SB, Hynix mem of course. My best so far is around 1260/1740. Would love to hit 1300/1800.

SS2006

Gentlemen, is 3.7.7c the latest version? I don’t have Linux unfortunately so can’t compile, and I probably don’t have the knowledge to do so :(. Is there a place I can go to find the latest version in compiled/Windows format?

SS2006

I assume this is the newest version: http://cryptomining-blog.com/3715-new-cgminer-3-7-8-with-improved-neoscrypt-performance/

Unfortunately it took a step backwards for NVIDIA cards (slower), but it’s meant for AMD, so if it is better there, then good!

I assume this is the newest version: http://cryptomining-blog.com/3715-new-cgminer-3-7-8-with-improved-neoscrypt-performance/

Unfortunately it took a step backwards for NVIDIA cards (slower), but it’s meant for AMD, so if it is better there, then good!

The download link that this blog provides is hosted by itself… No sigs etc…

Not 100% sure if I would trust that download link.

Wolf0 has a nice Windows compile right here, signed and all.

Okay, done. I’m pretty sure it works, but haven’t tested on a Windows installation. This is a zip of my kernel, slightly modified for SGMiner, as well as all the other kernels included on the GitHub develop branch, and a Win64 binary. Static compile, no DLLs, just like my standard SGMiner builds on Litecointalk. Also GPG signed, like my standard builds. Someone please test for me and ensure it works.

https://ottrbutt.com/sgminer/neoscrypt/sgminer5-neoscrypt-11-02-2014.zip

And of course, GPG sigs for those that check them (you should be): https://ottrbutt.com/sgminer/neoscrypt/sgminer5-neoscrypt-11-02-2014.zip.sig
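For anyone who has not checked a detached signature before, here is a minimal Python sketch of the check. It only wraps the standard `gpg --verify <sig> <file>` command. It assumes GnuPG is installed and that the signer's public key is already in your keyring; the function names are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def gpg_verify_command(archive: str) -> List[str]:
    """Build the gpg command for a detached signature named <archive>.sig."""
    return ["gpg", "--verify", f"{archive}.sig", archive]


def verify_download(archive: str) -> bool:
    """Return True only if both files exist and gpg exits with status 0."""
    if not Path(archive).is_file() or not Path(f"{archive}.sig").is_file():
        return False  # nothing to verify, so treat it as unverified
    result = subprocess.run(gpg_verify_command(archive), capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    name = "sgminer5-neoscrypt-11-02-2014.zip"
    print("signature OK" if verify_download(name) else "NOT verified")
```

A zero exit status only says the signature matches a key in your keyring. Whether that key really belongs to the person you think it does is a separate check.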

ghostlander

Congrats - not a bad chacha for 6xxx. Your salsa though… needs work.

It’s scalar now. I’m not impressed by one found in old Scrypt kernels. Can write a better one probably. It isn’t a bottleneck anyway for these cards. When I replaced ChaCha with this one, it went from something like 12KH/s to 14KH/s. Loop unrolling delivered more alone.

BTW, it runs reasonably well with old AMD drivers and OpenCL compilers. HD5870 on Windows XP with 12.4 drivers went from 2.5KH/s to 10KH/s. Hell yeah, a 4x increase. Don’t try to use this kernel for NVIDIA. It fails to compile the vectorised ChaCha code.

How do I use it? Which miner?

Rename it and put it into any miner with NeoScrypt support. Tested on cgminer v3.7.7, works fine.

Alpha Wolf

For those with old Radeon cards. This is my current OpenCL kernel: neoscrypt_vliw.cl

It is optimised to some extent for VLIW4/VLIW5. I get 17.5KH/s with it on a HD6970. That’s not much, but still better than 6KH/s with the default kernel.

Working great here for me with a couple of 6950s unlocked to 6970. Went from 5KH/s to 16.5KH/s with use of only -12 -w 64 -g 2. It also works with 3.7.7c and 3.7.8. All I did was back up neoscrypt140909.cl, delete it, and rename your file to neoscrypt140909.cl to replace it. Then I backed up and deleted all the .bin files, let the miner make new .bins, and bam, I was off to the races. :)

Thank you!

Now I don’t mind running them, before I wouldn’t even use them to mine with.
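The swap described above can be scripted. This is a rough Python sketch, not a tested tool: the file name `neoscrypt140909.cl` comes from this post, while the `kernel_backup` folder name and the function name are made up for illustration. It assumes the kernel file and the cached `.bin` files sit directly in the miner directory.

```python
import shutil
from pathlib import Path


def swap_kernel(miner_dir, new_kernel, target_name="neoscrypt140909.cl"):
    """Back up the stock kernel, install the new one, and clear cached .bin files.

    Returns the names of the .bin files that were moved out of the way.
    """
    miner_dir = Path(miner_dir)
    backup_dir = miner_dir / "kernel_backup"  # hypothetical folder name
    backup_dir.mkdir(exist_ok=True)

    target = miner_dir / target_name
    if target.exists():
        # Keep a copy of the original kernel before overwriting it.
        shutil.copy2(target, backup_dir / target_name)
    shutil.copy2(new_kernel, target)

    moved = []
    # Move the cached binaries away so the miner rebuilds them from the new source.
    for cached in sorted(miner_dir.glob("*.bin")):
        shutil.move(str(cached), str(backup_dir / cached.name))
        moved.append(cached.name)
    return moved
```

Moving the `.bin` files instead of deleting them keeps the old state recoverable if the new kernel does not build.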

Catalyst Version 13.12

{
"pools" : [
	{
		"url" : "http://us.mine-ftc.co.uk:19327",
		"user" : "xxxxxxxxxxxxy",
		"pass" : "x"
	}
]
,
"intensity" : "12,12",
"vectors" : "1,1",
"worksize" : "64,64",
"gpu-engine" : "825-825,825-825",
"gpu-fan" : "0-95,0-80",
"gpu-memclock" : "1300,1300",
"gpu-memdiff" : "0,0",
"gpu-powertune" : "0,0",
"gpu-vddc" : "0.000,0.000",
"temp-cutoff" : "90,90",
"temp-overheat" : "85,85",
"temp-target" : "70,70",
"api-mcast-port" : "4028",
"api-port" : "4028",
"expiry" : "1",
"failover-only" : true,
"gpu-dyninterval" : "7",
"gpu-platform" : "0",
"gpu-threads" : "2",
"log" : "5",
"neoscrypt" : true,
"no-pool-disable" : true,
"no-submit-stale" : true,
"queue" : "0",
"scan-time" : "1",
"temp-hysteresis" : "3",
"shares" : "0",
"kernel-path" : "/usr/local/bin",
"device" : "0-1"
}

MrBeen

Am I right that I cannot do anything with my GeForce 7600GS?

cisahasa

For those with old Radeon cards. This is my current OpenCL kernel: neoscrypt_vliw.cl

It is optimised to some extent for VLIW4/VLIW5. I get 17.5KH/s with it on a HD6970. That’s not much, but still better than 6KH/s with the default kernel.

I think there is an error…

/* NeoScrypt core engine:
* N = 128, r = 2, p = 1, salt = password */
__attribute__((reqd_work_group_size(WORKGROUPSIZE, 1, 1)))

Shouldn’t it be this?

/* NeoScrypt core engine:
* N = 128, r = 2, p = 1, salt = password */
__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))

ghostlander

I think there is an error…

/* NeoScrypt core engine:
* N = 128, r = 2, p = 1, salt = password */
__attribute__((reqd_work_group_size(WORKGROUPSIZE, 1, 1)))

Shouldn’t it be this?

/* NeoScrypt core engine:
* N = 128, r = 2, p = 1, salt = password */
__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))

There is no error, but it doesn’t really matter.

[2014-11-07 18:47:02] Started cgminer 3.7.8
[2014-11-07 18:47:07] Probing for an alive pool
[2014-11-07 18:47:08] Error -11: Building Program (clBuildProgram)
[2014-11-07 18:47:08] "/tmp/OCLxlhCZF.cl", line 665: error: identifier "WORKSIZE" is undefined
__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
^

1 error detected in the compilation of "/tmp/OCLxlhCZF.cl".

Internal error: clc comp

Wolf0

I think there is an error…

/* NeoScrypt core engine:
* N = 128, r = 2, p = 1, salt = password */
__attribute__((reqd_work_group_size(WORKGROUPSIZE, 1, 1)))

Shouldn’t it be this?

/* NeoScrypt core engine:
* N = 128, r = 2, p = 1, salt = password */
__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))

WORKSIZE is only for the newer SGMiner.
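One way to make a single kernel file build under both miners is a small preprocessor shim at the top of the source. The Python helper below is only a sketch that prepends such a shim to the kernel text. It assumes each miner passes exactly one of the two macro names as a compiler define, which is what the build error above suggests but which I have not checked in the miner sources; the helper name is made up.

```python
# If the miner defined only WORKSIZE, map the name the kernel uses onto it.
SHIM = (
    "#ifndef WORKGROUPSIZE\n"
    "#define WORKGROUPSIZE WORKSIZE\n"
    "#endif\n"
)


def add_worksize_shim(kernel_source: str) -> str:
    """Prepend the macro shim once; calling it again leaves the text unchanged."""
    if kernel_source.startswith(SHIM):
        return kernel_source
    return SHIM + kernel_source
```

With the shim in place the kernel keeps `reqd_work_group_size(WORKGROUPSIZE, 1, 1)` as written. If a miner defines neither macro, the build still fails, just with a clearer undefined identifier.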

cisahasa

That’s what I used…

Wolf0

It’s scalar now. I’m not impressed by one found in old Scrypt kernels. Can write a better one probably. It isn’t a bottleneck anyway for these cards. When I replaced ChaCha with this one, it went from something like 12KH/s to 14KH/s. Loop unrolling delivered more alone.

BTW, it runs reasonably well with old AMD drivers and OpenCL compilers. HD5870 on Windows XP with 12.4 drivers went from 2.5KH/s to 10KH/s. Hell yeah, a 4x increase. Don’t try to use this kernel for NVIDIA. It fails to compile the vectorised ChaCha code.

Rename it and put it into any miner with NeoScrypt support. Tested on cgminer v3.7.7, works fine.

I’ve never worked on 6xxx, but isn’t shuffle cheap? One shuffle for the permutation, keep it through all the salsa rounds + XOR ops, one shuffle to fix it. Seems like it’d be worth it - of course, unrolling will deliver a lot, probably.

Wolf0

It seems my 280X and 290X do like parallel chacha - I just needed to tweak it a bit more. Code size seems about the same, though, small speedup on execution time, I think.

Wolf0

OH MY GOD. I’ve been staring at this code for ages, and it only JUST NOW occurred to me that SMix() is parallelizable. Not the internals of SMix, of course, but the two calls to it…
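To show what "the two calls are independent" means in practice, here is a toy Python sketch. It assumes, as I understand the NeoScrypt design, that both SMix calls start from the same input block and that their outputs are XORed afterwards. `toy_mix` is a made-up stand-in and is not the real ChaCha or Salsa mixing.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_mix(block: bytes, tweak: int) -> bytes:
    """Hypothetical stand-in for one SMix pass; NOT the real algorithm."""
    return bytes((b * 31 + tweak + i) & 0xFF for i, b in enumerate(block))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def combine_sequential(x: bytes) -> bytes:
    a = toy_mix(x, 1)  # stands in for SMix with ChaCha
    b = toy_mix(x, 2)  # stands in for SMix with Salsa
    return xor_bytes(a, b)


def combine_parallel(x: bytes) -> bytes:
    # Neither call reads the other's output, so they can run side by side.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa = pool.submit(toy_mix, x, 1)
        fb = pool.submit(toy_mix, x, 2)
        return xor_bytes(fa.result(), fb.result())
```

Both functions return the same bytes. On a GPU the equivalent would be separate kernels or separate work-items for each call, and whether that pays off depends on launch overhead and memory pressure.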

cisahasa

I really hope, Wolf, that you are not doing this just for yourself…

You would gain more if you released your work. I’m sure people here would be willing to collect a bounty for you to release your latest kernels.

These people here are fair people.

Wolf0

I really hope, Wolf, that you are not doing this just for yourself…

You would gain more if you released your work. I’m sure people here would be willing to collect a bounty for you to release your latest kernels.

These people here are fair people.

I’m doing this because it’s interesting. Also, SMix being parallelizable hardly matters unless you split it into 3 kernels, which is doable, but idk what the overhead on the kernel launches would be…

T4rQu1N

Quick question, in my bat file, how do I specify different values for say -i or -w so that my two cards (which are different) have different settings?

-i 14,15?

-w 48, 72?

Kind regards,

T4

einkerl

Quick question, in my bat file, how do I specify different values for say -i or -w so that my two cards (which are different) have different settings?

-i 14,15?

-w 48, 72?

Kind regards,

T4

"intensity" : "18,18,18",
"worksize" : "256,128,256",

Specify them in your .conf, one comma-separated value per card.
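If you would rather generate those lines than type them, here is a small Python sketch. The keys `intensity` and `worksize` are the ones used in the configs in this thread; the helper names are made up, and the values are only examples.

```python
import json


def per_gpu(values):
    """Join one value per card into the comma-separated form the config uses."""
    return ",".join(str(v) for v in values)


def build_config(intensities, worksizes):
    """Return the per-card part of a miner config as a dict."""
    if len(intensities) != len(worksizes):
        raise ValueError("give one intensity and one worksize per GPU")
    return {
        "intensity": per_gpu(intensities),
        "worksize": per_gpu(worksizes),
    }


if __name__ == "__main__":
    # Three cards, in the same order the miner enumerates them.
    print(json.dumps(build_config([18, 18, 18], [256, 128, 256]), indent=4))
```

The order of the values has to match the order in which the miner lists the cards, so check the device numbering at startup first.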

ghostlander

neoscrypt_vliw.cl v2

It’s 19.5KH/s now on a HD6970. FastKDF and BLAKE2s have been cleaned up and optimised, memory requirements reduced.

OH MY GOD. I’ve been staring at this code for ages, and it only JUST NOW occurred to me that SMix() is parallelizable. Not the internals of SMix, of course, but the two calls to it…

Yeah, I’ve mentioned this in my white paper. Not sure if it’s of any use for mining.

I’ve never worked on 6xxx, but isn’t shuffle cheap? One shuffle for the permutation, keep it through all the salsa rounds + XOR ops, one shuffle to fix it. Seems like it’d be worth it - of course, unrolling will deliver a lot, probably.

It is, but that’s not what concerns me now. With FastKDF removed, the kernel gets reduced in size by ~60% and outputs 30KH/s. That’s a big overhead, but not critical, and I expected more out of ChaCha + Salsa. With only ChaCha enabled, it’s 58KH/s, and with only Salsa it’s 56KH/s. Scalar Salsa isn’t supposed to be about as fast as vectorised ChaCha. It’s clearly scalar because the AMD compiler isn’t really smart and the kernel size is about double the ChaCha-only size. Anyway, there is a huge bottleneck somewhere and it needs to be identified.
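A quick back-of-the-envelope check on those figures, as a Python sketch. It uses only the hashrates quoted in this post and a deliberately crude model: time per hash is 1/rate, and the ChaCha and Salsa passes are assumed to run one after the other with no shared cost. That second assumption is probably not exactly true, since the "only" runs likely still include common work such as memory traffic.

```python
def ms_per_hash(rate_khs: float) -> float:
    """1 KH/s is one hash per millisecond, so time per hash in ms is 1/rate."""
    return 1.0 / rate_khs


# Hashrates in KH/s as quoted in the post above.
FULL = 19.5         # complete kernel
NO_FASTKDF = 30.0   # FastKDF removed
CHACHA_ONLY = 58.0
SALSA_ONLY = 56.0

# Fraction of the full kernel's time per hash that disappears with FastKDF.
fastkdf_share = 1.0 - ms_per_hash(NO_FASTKDF) / ms_per_hash(FULL)

# Rate expected if the ChaCha and Salsa times simply add up.
additive_rate = 1.0 / (ms_per_hash(CHACHA_ONLY) + ms_per_hash(SALSA_ONLY))

print(f"FastKDF share of time per hash: {fastkdf_share:.0%}")
print(f"Additive ChaCha + Salsa estimate: {additive_rate:.1f} KH/s")
```

Under this model FastKDF accounts for roughly 35% of the time per hash, and the additive estimate for ChaCha plus Salsa comes out near 28.5KH/s, close to the measured 30KH/s. If that holds, the combined figure is about what two back-to-back passes would give.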

Wolf0

neoscrypt_vliw.cl v2

It’s 19.5KH/s now on a HD6970. FastKDF and BLAKE2s have been cleaned up and optimised, memory requirements reduced.

Yeah, I’ve mentioned this in my white paper. Not sure if it’s of any use for mining.

It is, but that’s not what concerns me now. With FastKDF removed, the kernel gets reduced in size by ~60% and outputs 30KH/s. That’s a big overhead, but not critical, and I expected more out of ChaCha + Salsa. With only ChaCha enabled, it’s 58KH/s, and with only Salsa it’s 56KH/s. Scalar Salsa isn’t supposed to be about as fast as vectorised ChaCha. It’s clearly scalar because the AMD compiler isn’t really smart and the kernel size is about double the ChaCha-only size. Anyway, there is a huge bottleneck somewhere and it needs to be identified.

I have a really hard time reading your style, but the code is pretty good! Don’t you think that bottleneck is waiting for global memory, though?