    [Dev] NeoScrypt GPU Miner - Public Beta Test

      ghostlander Regular Member last edited by

      Congrats - not a bad chacha for 6xxx. Your salsa though… needs work.

      It’s scalar now. I’m not impressed by the one found in old Scrypt kernels. I can probably write a better one. It isn’t a bottleneck for these cards anyway. When I replaced ChaCha with this one, it went from something like 12KH/s to 14KH/s. Loop unrolling alone delivered more.

      BTW, it runs reasonably well with old AMD drivers and OpenCL compilers. HD5870 on Windows XP with 12.4 drivers went from 2.5KH/s to 10KH/s. Hell yeah, a 4x increase. Don’t try to use this kernel on NVIDIA. It fails to compile the vectorised ChaCha code.

      How do I use it? Which miner?

      Rename it and put it into any miner with NeoScrypt support. Tested on cgminer v3.7.7, works fine.

        Alpha Wolf last edited by

        For those with old Radeon cards. This is my current OpenCL kernel: neoscrypt_vliw.cl

        It is optimised to some extent for VLIW4/VLIW5. I get 17.5KH/s with it on a HD6970. That’s not much, but still better than 6KH/s with the default kernel.

        Working great here with a couple of 6950s unlocked to 6970. Went from 5KH/s to 16.5KH/s using only -I 12 -w 64 -g 2. It also works with 3.7.7c and 3.7.8. All I did was back up neoscrypt140909.cl, delete it, and rename your file to neoscrypt140909.cl to replace it. Then I backed up and deleted all the .bin files, let the miner build new .bins, and bam, I was off to the races. :)

        Thank you!

        Now I don’t mind running them; before, I wouldn’t even use them to mine.

        Catalyst Version 13.12

        {
        "pools" : [
        	{
        		"url" : "http://us.mine-ftc.co.uk:19327",
        		"user" : "xxxxxxxxxxxxy",
        		"pass" : "x"
        	}
        ],
        "intensity" : "12,12",
        "vectors" : "1,1",
        "worksize" : "64,64",
        "gpu-engine" : "825-825,825-825",
        "gpu-fan" : "0-95,0-80",
        "gpu-memclock" : "1300,1300",
        "gpu-memdiff" : "0,0",
        "gpu-powertune" : "0,0",
        "gpu-vddc" : "0.000,0.000",
        "temp-cutoff" : "90,90",
        "temp-overheat" : "85,85",
        "temp-target" : "70,70",
        "api-mcast-port" : "4028",
        "api-port" : "4028",
        "expiry" : "1",
        "failover-only" : true,
        "gpu-dyninterval" : "7",
        "gpu-platform" : "0",
        "gpu-threads" : "2",
        "log" : "5",
        "neoscrypt" : true,
        "no-pool-disable" : true,
        "no-submit-stale" : true,
        "queue" : "0",
        "scan-time" : "1",
        "temp-hysteresis" : "3",
        "shares" : "0",
        "kernel-path" : "/usr/local/bin",
        "device" : "0-1"
        }
        
          MrBeen Regular Member last edited by

          Am I right that I can’t do anything with my GeForce 7600GS?

            cisahasa last edited by

            For those with old Radeon cards. This is my current OpenCL kernel: neoscrypt_vliw.cl

            It is optimised to some extent for VLIW4/VLIW5. I get 17.5KH/s with it on a HD6970. That’s not much, but still better than 6KH/s with the default kernel.

            I think there is an error…

            /* NeoScrypt core engine:
            * N = 128, r = 2, p = 1, salt = password */
            __attribute__((reqd_work_group_size(WORKGROUPSIZE, 1, 1)))

            Shouldn’t it be this instead?

            /* NeoScrypt core engine:
            * N = 128, r = 2, p = 1, salt = password */
            __attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))

              ghostlander Regular Member last edited by

              i think there is some error…

              /* NeoScrypt core engine:
              * N = 128, r = 2, p = 1, salt = password */
              __attribute__((reqd_work_group_size(WORKGROUPSIZE, 1, 1)))

              it should be? ???

              /* NeoScrypt core engine:
              * N = 128, r = 2, p = 1, salt = password */
              __attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))

              There is no error, but it doesn’t really matter.

              [2014-11-07 18:47:02] Started cgminer 3.7.8
              [2014-11-07 18:47:07] Probing for an alive pool
              [2014-11-07 18:47:08] Error -11: Building Program (clBuildProgram)
              [2014-11-07 18:47:08] "/tmp/OCLxlhCZF.cl", line 665: error: identifier "WORKSIZE" is undefined
              __attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
              ^

              1 error detected in the compilation of "/tmp/OCLxlhCZF.cl".

              Internal error: clc comp

                Wolf0 Regular Member last edited by

                i think there is some error…

                /* NeoScrypt core engine:
                * N = 128, r = 2, p = 1, salt = password */
                __attribute__((reqd_work_group_size(WORKGROUPSIZE, 1, 1)))

                it should be? ???

                /* NeoScrypt core engine:
                * N = 128, r = 2, p = 1, salt = password */
                __attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))

                WORKSIZE is only for the newer SGMiner.
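
For anyone moving the kernel between miners, here is a minimal sketch of a compatibility shim. It is not part of neoscrypt_vliw.cl. It assumes the miner supplies the work group size as a preprocessor define at build time, which is what the "identifier WORKSIZE is undefined" error above suggests. The value 64 and the helper `required_work_group_size` are made up for illustration.

```c
/* Stand-in for the miner's build option. In a real kernel this value would
 * arrive as a -D define on the OpenCL build line, so this line would go. */
#define WORKGROUPSIZE 64

/* Compatibility shim: whichever spelling the miner defined, make the
 * other one available too. */
#if !defined(WORKSIZE) && defined(WORKGROUPSIZE)
#define WORKSIZE WORKGROUPSIZE
#elif defined(WORKSIZE) && !defined(WORKGROUPSIZE)
#define WORKGROUPSIZE WORKSIZE
#endif

#if !defined(WORKSIZE)
#error "neither WORKSIZE nor WORKGROUPSIZE was defined by the miner"
#endif

/* Hypothetical helper so the result can be checked on the host. In the
 * kernel the macro would instead feed
 * __attribute__((reqd_work_group_size(WORKSIZE, 1, 1))). */
static int required_work_group_size(void)
{
    return WORKSIZE;
}
```

With the shim at the top of the file, either spelling in the attribute line should build, whichever name the miner defines.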

                  cisahasa last edited by

                  That’s what I used…

                    Wolf0 Regular Member last edited by

                    It’s scalar now. I’m not impressed by one found in old Scrypt kernels. Can write a better one probably. It isn’t a bottleneck anyway for these cards. When I replaced ChaCha with this one, it went from something like 12KH/s to 14KH/s. Loop unrolling delivered more alone.

                    BTW, it runs reasonably good with old AMD drivers and OpenCL compilers. HD5870 on Windows XP with 12.4 drivers went from 2.5KH/s to 10KH/s. Hell yeah, a 4x increase. Don’t try to use this kernel for NVIDIA. It fails to compile the vectorised ChaCha code.

                    Rename and put into any miner with the NeoScrypt support. Tested on cgminer v3.7.7, works fine.

                    I’ve never worked on 6xxx, but isn’t shuffle cheap? One shuffle for the permutation, keep it through all the salsa rounds + XOR ops, one shuffle to fix it. Seems like it’d be worth it - of course, unrolling will deliver a lot, probably.
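
A rough sketch of the shuffle idea, in plain C rather than OpenCL so it can be checked on a host. It assumes "the permutation" means regrouping the 4x4 Salsa20 state into its diagonals, so that every quarter-round step works on whole 4-lane vectors. `vec4` stands in for `uint4`, and all function names are hypothetical. The lane rotations between the column and row rounds would be plain swizzles in OpenCL. This is not the code from either kernel in this thread.

```c
#include <stdint.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* Plain-C stand-in for an OpenCL uint4. */
typedef struct { uint32_t s[4]; } vec4;

/* Scalar reference: one Salsa20 quarter-round. */
static void qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *b ^= ROTL32(*a + *d, 7);
    *c ^= ROTL32(*b + *a, 9);
    *d ^= ROTL32(*c + *b, 13);
    *a ^= ROTL32(*d + *c, 18);
}

/* Scalar reference: column round followed by row round, repeated. */
static void salsa_scalar(uint32_t x[16], int doublerounds)
{
    for (int i = 0; i < doublerounds; i++) {
        qr(&x[0], &x[4], &x[8], &x[12]);
        qr(&x[5], &x[9], &x[13], &x[1]);
        qr(&x[10], &x[14], &x[2], &x[6]);
        qr(&x[15], &x[3], &x[7], &x[11]);
        qr(&x[0], &x[1], &x[2], &x[3]);
        qr(&x[5], &x[6], &x[7], &x[4]);
        qr(&x[10], &x[11], &x[8], &x[9]);
        qr(&x[15], &x[12], &x[13], &x[14]);
    }
}

/* t ^= rotl(a + b, n) on all four lanes. */
static vec4 vxor_rot(vec4 t, vec4 a, vec4 b, int n)
{
    for (int i = 0; i < 4; i++) {
        uint32_t sum = a.s[i] + b.s[i];
        t.s[i] ^= ROTL32(sum, n);
    }
    return t;
}

/* Rotate lanes left by k; a swizzle such as .yzwx in OpenCL. */
static vec4 lanes(vec4 v, int k)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.s[i] = v.s[(i + k) & 3];
    return r;
}

/* Four quarter-rounds at once, one per lane. */
static void qr4(vec4 *a, vec4 *b, vec4 *c, vec4 *d)
{
    *b = vxor_rot(*b, *a, *d, 7);
    *c = vxor_rot(*c, *b, *a, 9);
    *d = vxor_rot(*d, *c, *b, 13);
    *a = vxor_rot(*a, *d, *c, 18);
}

static void salsa_diag(uint32_t x[16], int doublerounds)
{
    /* One shuffle in: gather the diagonals. */
    vec4 A = {{x[0], x[5], x[10], x[15]}};
    vec4 B = {{x[4], x[9], x[14], x[3]}};
    vec4 C = {{x[8], x[13], x[2], x[7]}};
    vec4 D = {{x[12], x[1], x[6], x[11]}};

    for (int i = 0; i < doublerounds; i++) {
        qr4(&A, &B, &C, &D);                       /* column round */
        vec4 b = lanes(D, 1), c = lanes(C, 2), d = lanes(B, 3);
        qr4(&A, &b, &c, &d);                       /* row round */
        D = lanes(b, 3); C = lanes(c, 2); B = lanes(d, 1);
    }

    /* One shuffle out: scatter back to row-major order. */
    x[0] = A.s[0];  x[5] = A.s[1];  x[10] = A.s[2]; x[15] = A.s[3];
    x[4] = B.s[0];  x[9] = B.s[1];  x[14] = B.s[2]; x[3] = B.s[3];
    x[8] = C.s[0];  x[13] = C.s[1]; x[2] = C.s[2];  x[7] = C.s[3];
    x[12] = D.s[0]; x[1] = D.s[1];  x[6] = D.s[2];  x[11] = D.s[3];
}
```

Both functions should produce the same state for the same input; whether the vector form is actually faster on VLIW4/VLIW5 depends on what the AMD compiler does with it.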

                      Wolf0 Regular Member last edited by

                      It seems my 280X and 290X do like parallel chacha - I just needed to tweak it a bit more. Code size seems about the same, though, small speedup on execution time, I think.

                        Wolf0 Regular Member last edited by

                        OH MY GOD. I’ve been staring at this code for ages, and it only JUST NOW occurred to me that SMix() is parallelizable. Not the internals of SMix, of course, but the two calls to it…

                          cisahasa last edited by

                          I really hope, Wolf, that you are not doing this just for yourself…

                          You would gain more if you released your work. I’m sure people here would be willing to collect a bounty for you to release your latest kernels.

                          These people here are fair people.

                            Wolf0 Regular Member last edited by

                            i really hope wolf you are not doing this just for your self…

                            you would gain more if you release your work, im sure people here would like to collect some bounty for your work to you release latest kernels.

                            these people here are fair people.

                            I’m doing this because it’s interesting. Also, SMix being parallelizable hardly matters unless you split it into 3 kernels, which is doable, but idk what the overhead on the kernel launches would be…
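
To illustrate why the two calls can run side by side, here is a toy sketch. `toy_smix` is a made-up stand-in, not the real Salsa or ChaCha mixing, and combining the two results with XOR is an assumption on my part. The only point is that each call works on its own private copy of the input, so the order of the calls (or running them as separate kernels) does not change the result.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 64  /* 256 bytes, i.e. 128 * r with r = 2 */

/* Hypothetical stand-in for SMix(); "variant" plays the role of
 * choosing between the Salsa and ChaCha flavours. */
static void toy_smix(uint32_t x[BLOCK_WORDS], uint32_t variant)
{
    for (int round = 0; round < 4; round++) {
        for (int i = 0; i < BLOCK_WORDS; i++) {
            uint32_t v = x[i] + x[(i + 1) % BLOCK_WORDS] + variant;
            x[i] = (v << 7) | (v >> 25);
        }
    }
}

/* Run both calls on private copies of the input, then combine.
 * swap_order only changes which call goes first. */
static void mix_both(const uint32_t in[BLOCK_WORDS],
                     uint32_t out[BLOCK_WORDS], int swap_order)
{
    uint32_t a[BLOCK_WORDS], b[BLOCK_WORDS];
    memcpy(a, in, sizeof a);
    memcpy(b, in, sizeof b);

    if (swap_order) {
        toy_smix(b, 2);
        toy_smix(a, 1);
    } else {
        toy_smix(a, 1);
        toy_smix(b, 2);
    }

    for (int i = 0; i < BLOCK_WORDS; i++)
        out[i] = a[i] ^ b[i];   /* assumed combine step */
}
```

Split across kernels, that would be one launch per call plus one for the combine step, which is where the launch overhead question comes in.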

                              T4rQu1N Regular Member last edited by

                              Quick question, in my bat file, how do I specify different values for say -i or -w so that my two cards (which are different) have different settings?

                              -i 14,15?

                              -w 48, 72?

                              Kind regards,

                              T4

                                einkerl last edited by

                                Quick question, in my bat file, how do I specify different values for say -i or -w so that my two cards (which are different) have different settings?

                                -i 14,15?

                                -w 48, 72?

                                Kind regards,

                                T4

                                "intensity" : "18,18,18",
                                "worksize" : "256,128,256",

                                Specify them in your .conf, one comma-separated value per card.

                                  ghostlander Regular Member last edited by

                                  neoscrypt_vliw.cl v2

                                  It’s 19.5KH/s now on a HD6970. FastKDF and BLAKE2s have been cleaned up and optimised, memory requirements reduced.

                                  OH MY GOD. I’ve been staring at this code for ages, and it only JUST NOW occurred to me that SMix() is parallelizable. Not the internals of SMix, of course, but the two calls to it…

                                  Yeah, I’ve mentioned this in my white paper. Not sure if it’s of any use for mining.

                                  I’ve never worked on 6xxx, but isn’t shuffle cheap? One shuffle for the permutation, keep it through all the salsa rounds + XOR ops, one shuffle to fix it. Seems like it’d be worth it - of course, unrolling will deliver a lot, probably.

                                  It is, but that’s not what concerns me now. With FastKDF removed, the kernel shrinks by ~60% and outputs 30KH/s. That’s a big overhead, but not critical, and I expected more out of ChaCha + Salsa. With only ChaCha enabled, it’s 58KH/s, and with only Salsa it’s 56KH/s. Scalar Salsa isn’t supposed to be about as fast as vectorised ChaCha. It’s clearly scalar, because the AMD compiler isn’t really smart and the kernel size is about double the ChaCha-only size. Anyway, there is a huge bottleneck somewhere and it needs to be identified.

                                    Wolf0 Regular Member last edited by

                                    neoscrypt_vliw.cl v2

                                    It’s 19.5KH/s now on a HD6970. FastKDF and BLAKE2s have been cleaned up and optimised, memory requirements reduced.

                                    Yeah, I’ve mentioned this in my white paper. Not sure if it’s of any use for mining.

                                    It is, but that’s not what concerns me now. With FastKDF removed, the kernel shrinks by ~60% and outputs 30KH/s. That’s a big overhead, but not critical, and I expected more out of ChaCha + Salsa. With only ChaCha enabled, it’s 58KH/s, and with only Salsa it’s 56KH/s. Scalar Salsa isn’t supposed to be about as fast as vectorised ChaCha. It’s clearly scalar, because the AMD compiler isn’t really smart and the kernel size is about double the ChaCha-only size. Anyway, there is a huge bottleneck somewhere and it needs to be identified.

                                    I have a really hard time reading your style, but the code is pretty good! Don’t you think that bottleneck is waiting for global memory, though?

                                      ghostlander Regular Member last edited by

                                      My first guess is that it runs out of private memory. It takes 512 bytes for block mixing + 800 bytes for FastKDF and BLAKE2s per kernel instance. That’s not including local variables, counters, etc. Scrypt consumes about a third as much private memory. It’s the opposite for global memory requirements, so you are not going to exceed them. Although the GCN cards report about the same amounts of local and constant memory (32KB + 64KB), they also have 32KB of L1 cache, which may help. Maybe they also have more private space (registers). Global memory is used for V space only. Not much activity there. Everything else runs in private/local space.
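
To put rough numbers on this, a small sketch of the arithmetic. The 512 and 800 byte figures are the ones quoted above. The worksize of 64 in the usage note is just an example value. The V space formula 128 * N * r is the standard Scrypt one, and applying it to NeoScrypt with N = 128, r = 2 is an assumption here. Function names are hypothetical.

```c
#include <stddef.h>

enum {
    BLOCK_MIX_BYTES = 512,  /* block mixing, per kernel instance */
    KDF_BLAKE_BYTES = 800   /* FastKDF + BLAKE2s, per kernel instance */
};

/* Private memory per work item, excluding locals and counters. */
static size_t private_bytes_per_work_item(void)
{
    return (size_t)BLOCK_MIX_BYTES + (size_t)KDF_BLAKE_BYTES;
}

/* The same figure summed over one work group. */
static size_t private_bytes_per_work_group(size_t worksize)
{
    return private_bytes_per_work_item() * worksize;
}

/* Global V space per hash, using the Scrypt formula 128 * N * r. */
static size_t v_space_bytes(size_t n, size_t r)
{
    return 128 * n * r;
}
```

That comes to 1312 bytes per work item, or 83968 bytes for a work group of 64, before any local variables. On the global side the formula gives 32768 bytes for N = 128, r = 2 against 131072 bytes for Scrypt with N = 1024, r = 1.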

                                      Another guess is that there is something wrong with the miner itself, related to the scheduling of kernel threads. Increase intensity over 13 and the hash rate drops. Increase it even more and you see HW errors. Set it to 20 and it hangs. Scrypt can do 20, but it’s different. I probably need to start with a clean fork and add NeoScrypt support myself. I have a few other ideas, but they also need work.

                                        Wolf0 Regular Member last edited by

                                        My first guess is that it runs out of private memory. It takes 512 bytes for block mixing + 800 bytes for FastKDF and BLAKE2s per kernel instance. That’s not including local variables, counters, etc. Scrypt consumes about a third as much private memory. It’s the opposite for global memory requirements, so you are not going to exceed them. Although the GCN cards report about the same amounts of local and constant memory (32KB + 64KB), they also have 32KB of L1 cache, which may help. Maybe they also have more private space (registers). Global memory is used for V space only. Not much activity there. Everything else runs in private/local space.

                                        Another guess is that there is something wrong with the miner itself, related to the scheduling of kernel threads. Increase intensity over 13 and the hash rate drops. Increase it even more and you see HW errors. Set it to 20 and it hangs. Scrypt can do 20, but it’s different. I probably need to start with a clean fork and add NeoScrypt support myself. I have a few other ideas, but they also need work.

                                        I feel stupid. For some reason, I was thinking of GCN cards while talking about 6xxx. Oops.

                                          Wolf0 Regular Member last edited by

                                          Preparing my GCN kernel for public release; cleaning code, removing stuff I tried that really sucked, like completely unrolled chacha/salsa, stuff like that. After that, I’ll package it up with SGMiner and it should be good to go. Should give results like this (NSFW): https://ottrbutt.com/miner/neoscryptwolf-11082014.png

                                            Alpha Wolf last edited by

                                            Preparing my GCN kernel for public release; cleaning code, removing stuff I tried that really sucked, like completely unrolled chacha/salsa, stuff like that. After that, I’ll package it up with SGMiner and it should be good to go. Should give results like this (NSFW): https://ottrbutt.com/miner/neoscryptwolf-11082014.png

                                            Those numbers look great, can’t wait to try this. :)

                                            Does the version of SGMiner you’re building have xIntensity, or have you given any thought to building on cgminer 3.7.3 Kalroth, which has xIntensity?

                                            More info can be found here. That page states that the new SGMiner 4.1 has xIntensity and might be a better choice. Personally, I like cgminer better and have had better results with it than with sgminer so far.
