    [Dev] NeoScrypt GPU Miner - Public Beta Test

      Atomicat last edited by

      Okay, here you go (NSFW): https://ottrbutt.com/miner/neoscryptwolf-11042014.png

      As you can see, every card is around mid-70C or below, except that 270X with the dead fan. What you don’t see is that all of them have a decent sized voltage bump, except that 270X, which is slightly undervolted to compensate for its lack of cooling. So, even at this hashrate, the cards are running quite cool - too cool, as my room is rather warm (probably from all three desktops with high end GPUs…) - so I need to trade off some of the memory usage for extra computations. Compute power is likely still plentiful, unused because of memory access times.

      Here’s my 7950 doing 150Kh/s. Fairly Safe For Work. (In Soviet Russia, You maul Bear! Then if you’re lucky, she mauls you back) I could use a suggestion for a good overclocking bios though. Been playing around boosting a few to 1.3v and I just ran across The Stilt’s mods the other day but haven’t had much luck with them either. As you can see, temperatures are absolutely no problem. Stays under 70 even when scrypt-hashing full load, full crank. Too many variations! Can’t spend days freezing and rebooting. It’s a Gigabyte, part 113-HD685ZNF63.SB, Hynix mem of course. My best so far is around 1260/1740. Would love to hit 1300/1800.

        SS2006 last edited by

        Gentlemen, is 3.7.7c the latest version? Unfortunately I don’t have Linux, so I can’t compile it, and I probably don’t have the knowledge to do so anyway :(. Is there a place where I can find the latest version compiled for Windows?

          SS2006 last edited by

          I assume this is the newest version: http://cryptomining-blog.com/3715-new-cgminer-3-7-8-with-improved-neoscrypt-performance/

          Unfortunately it took a step backwards for NVIDIA cards (slower), but it’s meant for AMD, so if it is better there, then good!

            A Former User last edited by


            The download link that this blog provides is hosted by itself… No sigs etc…

            Not 100% sure if I would trust that download link.

            Wolf0 has a nice Windows compile right here, signed and all.

            Okay, done. I’m pretty sure it works, but haven’t tested on a Windows installation. This is a zip of my kernel, slightly modified for SGMiner, as well as all the other kernels included on the github’s develop branch, and a Win64 binary. Static compile, no DLLs, just like my standard SGMiner builds on Litecointalk. Also GPG signed, like my standard builds. Someone please test for me and ensure it works.

            https://ottrbutt.com/sgminer/neoscrypt/sgminer5-neoscrypt-11-02-2014.zip

            And of course, GPG sigs for those that check them (you should be): https://ottrbutt.com/sgminer/neoscrypt/sgminer5-neoscrypt-11-02-2014.zip.sig

              ghostlander Regular Member last edited by

              Congrats - not a bad chacha for 6xxx. Your salsa though… needs work.

              It’s scalar now. I’m not impressed by one found in old Scrypt kernels. Can write a better one probably. It isn’t a bottleneck anyway for these cards. When I replaced ChaCha with this one, it went from something like 12KH/s to 14KH/s. Loop unrolling delivered more alone.

              BTW, it runs reasonably well with old AMD drivers and OpenCL compilers. An HD5870 on Windows XP with 12.4 drivers went from 2.5KH/s to 10KH/s. Hell yeah, a 4x increase. Don’t try to use this kernel on NVIDIA. Its compiler fails to build the vectorised ChaCha code.

              How do I use it? Which miner?

              Rename it and put it into any miner with NeoScrypt support. Tested on cgminer v3.7.7, works fine.

                Alpha Wolf last edited by

                For those with old Radeon cards. This is my current OpenCL kernel: neoscrypt_vliw.cl

                It is optimised to some extent for VLIW4/VLIW5. I get 17.5KH/s with it on a HD6970. That’s not much, but still better than 6KH/s with the default kernel.

                Working great here for me with a couple of 6950s unlocked to 6970. Went from 5kh/s to 16.5kh/s using only -I 12 -w 64 -g 2. It also works with 3.7.7c and 3.7.8. All I did was back up neoscrypt140909.cl, delete it, and rename your file to neoscrypt140909.cl to replace it. Then I backed up and deleted all the .bin files, let the miner build new .bins, and bam, I was off to the races. :)
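The swap described above (back up the stock kernel, drop the new file in under the stock name, clear the cached .bin files so they get rebuilt) can be scripted. This is only a rough sketch: the `backup` folder and the function name `swap_kernel` are made up for illustration, and it assumes the cached binaries sit as `*.bin` next to the kernel in the miner directory, as the post implies.

```python
import shutil
import tempfile
from pathlib import Path


def swap_kernel(miner_dir, new_kernel, stock_name="neoscrypt140909.cl"):
    """Back up the stock kernel and cached .bin files, then install new_kernel.

    Returns the names of the cached binaries that were moved aside.
    """
    miner_dir = Path(miner_dir)
    backup_dir = miner_dir / "backup"  # hypothetical location, not a miner convention
    backup_dir.mkdir(exist_ok=True)

    stock = miner_dir / stock_name
    if stock.exists():
        shutil.copy2(stock, backup_dir / stock_name)  # keep the original kernel

    # Move cached binaries aside so the miner rebuilds them from the new source.
    moved = []
    for cached in sorted(miner_dir.glob("*.bin")):
        shutil.move(str(cached), str(backup_dir / cached.name))
        moved.append(cached.name)

    # Install the replacement under the file name the miner expects.
    shutil.copy2(new_kernel, stock)
    return moved


if __name__ == "__main__":
    # Dry run in a throwaway directory with dummy files.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "neoscrypt140909.cl").write_text("stock kernel")
        (tmp / "cached.bin").write_text("old binary")
        (tmp / "neoscrypt_vliw.cl").write_text("vliw kernel")
        print(swap_kernel(tmp, tmp / "neoscrypt_vliw.cl"))
```

Moving the .bin files rather than deleting them keeps a way back if the new kernel misbehaves.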

                Thank you!

                Now I don’t mind running them, before I wouldn’t even use them to mine with.

                Catalyst Version 13.12

                {
                "pools" : [
                	{
                		"url" : "http://us.mine-ftc.co.uk:19327",
                		"user" : "xxxxxxxxxxxxy",
                		"pass" : "x"
                	}
                ]
                ,
                "intensity" : "12,12",
                "vectors" : "1,1",
                "worksize" : "64,64",
                "gpu-engine" : "825-825,825-825",
                "gpu-fan" : "0-95,0-80",
                "gpu-memclock" : "1300,1300",
                "gpu-memdiff" : "0,0",
                "gpu-powertune" : "0,0",
                "gpu-vddc" : "0.000,0.000",
                "temp-cutoff" : "90,90",
                "temp-overheat" : "85,85",
                "temp-target" : "70,70",
                "api-mcast-port" : "4028",
                "api-port" : "4028",
                "expiry" : "1",
                "failover-only" : true,
                "gpu-dyninterval" : "7",
                "gpu-platform" : "0",
                "gpu-threads" : "2",
                "log" : "5",
                "neoscrypt" : true,
                "no-pool-disable" : true,
                "no-submit-stale" : true,
                "queue" : "0",
                "scan-time" : "1",
                "temp-hysteresis" : "3",
                "shares" : "0",
                "kernel-path" : "/usr/local/bin",
                "device" : "0-1"
                }
                
                  MrBeen Regular Member last edited by

                  Am I right that I can’t do anything with my GeForce 7600GS?

                    cisahasa last edited by

                    For those with old Radeon cards. This is my current OpenCL kernel: neoscrypt_vliw.cl

                    It is optimised to some extent for VLIW4/VLIW5. I get 17.5KH/s with it on a HD6970. That’s not much, but still better than 6KH/s with the default kernel.

                    i think there is some error…

                    /* NeoScrypt core engine:
                    * N = 128, r = 2, p = 1, salt = password */
                    __attribute__((reqd_work_group_size(WORKGROUPSIZE, 1, 1)))

                    Shouldn’t it be this instead?

                    /* NeoScrypt core engine:
                    * N = 128, r = 2, p = 1, salt = password */
                    __attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))

                      ghostlander Regular Member last edited by


                      There is no error, but it doesn’t really matter.

                      [2014-11-07 18:47:02] Started cgminer 3.7.8
                      [2014-11-07 18:47:07] Probing for an alive pool
                      [2014-11-07 18:47:08] Error -11: Building Program (clBuildProgram)
                      [2014-11-07 18:47:08] "/tmp/OCLxlhCZF.cl", line 665: error: identifier "WORKSIZE" is undefined
                      __attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
                      ^

                      1 error detected in the compilation of "/tmp/OCLxlhCZF.cl".

                      Internal error: clc comp

                        Wolf0 Regular Member last edited by

                        i think there is some error…

                        /* NeoScrypt core engine:
                        * N = 128, r = 2, p = 1, salt = password */
                        __attribute__((reqd_work_group_size(WORKGROUPSIZE, 1, 1)))

                        Shouldn’t it be this instead?

                        /* NeoScrypt core engine:
                        * N = 128, r = 2, p = 1, salt = password */
                        __attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))

                        WORKSIZE is only for the newer SGMiner.
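Going by the build log above, cgminer 3.7.8 leaves WORKSIZE undefined while the kernel's WORKGROUPSIZE builds there, and the newer SGMiner uses WORKSIZE. One way to make a single kernel file tolerate both might be a small preprocessor shim at the top of the source. The sketch below prepends one; it assumes each miner defines exactly one of the two macros at build time, which is inferred from this thread and not checked against either miner's source. `add_worksize_guard` is a made-up helper name.

```python
GUARD = """\
/* Compatibility shim (illustrative): accept either macro name. */
#if !defined(WORKSIZE) && defined(WORKGROUPSIZE)
#define WORKSIZE WORKGROUPSIZE
#elif !defined(WORKGROUPSIZE) && defined(WORKSIZE)
#define WORKGROUPSIZE WORKSIZE
#endif
"""


def add_worksize_guard(kernel_source):
    """Prepend the shim to OpenCL kernel source text, once."""
    if kernel_source.startswith(GUARD):
        return kernel_source  # already patched, leave untouched
    return GUARD + kernel_source


if __name__ == "__main__":
    line = "__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))\n"
    print(add_worksize_guard(line))
```

With the shim in place, either spelling inside `reqd_work_group_size(...)` resolves as long as the miner supplies one of the two names. Cached .bin files would still need clearing after the change.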

                          cisahasa last edited by

                          That’s what I used…

                            Wolf0 Regular Member last edited by

                            It’s scalar now. I’m not impressed by one found in old Scrypt kernels. Can write a better one probably. It isn’t a bottleneck anyway for these cards. When I replaced ChaCha with this one, it went from something like 12KH/s to 14KH/s. Loop unrolling delivered more alone.

                            BTW, it runs reasonably well with old AMD drivers and OpenCL compilers. An HD5870 on Windows XP with 12.4 drivers went from 2.5KH/s to 10KH/s. Hell yeah, a 4x increase. Don’t try to use this kernel on NVIDIA. Its compiler fails to build the vectorised ChaCha code.

                            Rename it and put it into any miner with NeoScrypt support. Tested on cgminer v3.7.7, works fine.

                            I’ve never worked on 6xxx, but isn’t shuffle cheap? One shuffle for the permutation, keep it through all the salsa rounds + XOR ops, one shuffle to fix it. Seems like it’d be worth it - of course, unrolling will deliver a lot, probably.

                              Wolf0 Regular Member last edited by

                              It seems my 280X and 290X do like parallel chacha - I just needed to tweak it a bit more. Code size seems about the same, though, small speedup on execution time, I think.

                                Wolf0 Regular Member last edited by

                                OH MY GOD. I’ve been staring at this code for ages, and it only JUST NOW occurred to me that SMix() is parallelizable. Not the internals of SMix, of course, but the two calls to it…

                                  cisahasa last edited by

                                  I really hope, Wolf, that you are not doing this just for yourself…

                                  You would gain more if you released your work. I’m sure people here would be willing to collect a bounty for you to release your latest kernels.

                                  The people here are fair.

                                    Wolf0 Regular Member last edited by


                                    I’m doing this because it’s interesting. Also, SMix being parallelizable hardly matters unless you split it into 3 kernels, which is doable, but idk what the overhead on the kernel launches would be…
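To illustrate the point about the two SMix() calls being independent, here is a toy sketch. `toy_smix` is a stand-in invented for this example and is not the real Salsa or ChaCha mixing. Combining the two results with XOR is likewise an assumption made for the illustration. All it shows is that when both passes read the same input and neither depends on the other's output, running them side by side gives the same answer as running them back to back.

```python
from concurrent.futures import ThreadPoolExecutor

MASK = 0xFFFFFFFF  # keep values in 32-bit range


def toy_smix(words, multiplier, rounds=128):
    """Stand-in for one SMix pass; NOT the real NeoScrypt mixing."""
    state = list(words)
    for r in range(rounds):
        # Each new word depends only on the previous state of this pass.
        state = [
            (w * multiplier + r + state[(i + 1) % len(state)]) & MASK
            for i, w in enumerate(state)
        ]
    return state


def combine(a, b):
    """Merge the two independent results (XOR is assumed for illustration)."""
    return [x ^ y for x, y in zip(a, b)]


def sequential(words):
    return combine(toy_smix(words, 0x9E3779B1), toy_smix(words, 0x85EBCA6B))


def parallel(words):
    # Both passes start from the same input and share no intermediate state.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(toy_smix, words, 0x9E3779B1)
        second = pool.submit(toy_smix, words, 0x85EBCA6B)
        return combine(first.result(), second.result())


if __name__ == "__main__":
    block = list(range(16))
    print(parallel(block) == sequential(block))
```

Python threads will not make this faster; the sketch is only about the data dependency. Whether splitting the work into separate GPU kernels pays off depends on the launch overhead raised above, which this example does not model.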

                                      T4rQu1N Regular Member last edited by

                                      Quick question, in my bat file, how do I specify different values for say -i or -w so that my two cards (which are different) have different settings?

                                      -i 14,15?

                                      -w 48, 72?

                                      Kind regards,

                                      T4

                                        einkerl last edited by


                                        "intensity" : "18,18,18",
                                        "worksize" : "256,128,256",

                                        Specify them in your .conf, one comma-separated value per card.
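For the two-card case in the question, a small sketch of how the comma-separated values map to cards and back into .conf strings. The helper names are made up, and treating a single value as "use this for every card" is an assumption here. Note that a space after the comma, as in `-w 48, 72`, would be split into two separate arguments in a bat file, so the sketch strips spaces before parsing.

```python
import json


def per_gpu(value, gpu_count):
    """Split a comma-separated setting into one integer per GPU."""
    parts = [int(p) for p in value.replace(" ", "").split(",")]
    if len(parts) == 1:
        # Assumption: a single value is meant for every card.
        return parts * gpu_count
    if len(parts) != gpu_count:
        raise ValueError("expected %d values, got %d" % (gpu_count, len(parts)))
    return parts


def conf_fragment(intensity, worksize):
    """Render per-GPU lists as the comma-joined strings used in the .conf."""
    return json.dumps(
        {
            "intensity": ",".join(str(v) for v in intensity),
            "worksize": ",".join(str(v) for v in worksize),
        },
        indent=4,
    )


if __name__ == "__main__":
    print(conf_fragment(per_gpu("14,15", 2), per_gpu("48, 72", 2)))
```

The values come out in device order, so the first number applies to the first card the miner lists.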

                                          ghostlander Regular Member last edited by

                                          neoscrypt_vliw.cl v2

                                          It’s 19.5KH/s now on a HD6970. FastKDF and BLAKE2s have been cleaned up and optimised, memory requirements reduced.

                                          OH MY GOD. I’ve been staring at this code for ages, and it only JUST NOW occurred to me that SMix() is parallelizable. Not the internals of SMix, of course, but the two calls to it…

                                          Yeah, I’ve mentioned this in my white paper. Not sure if it’s of any use for mining.

                                          I’ve never worked on 6xxx, but isn’t shuffle cheap? One shuffle for the permutation, keep it through all the salsa rounds + XOR ops, one shuffle to fix it. Seems like it’d be worth it - of course, unrolling will deliver a lot, probably.

                                          It is, but that’s not what concerns me now. With FastKDF removed, the kernel shrinks by ~60% and outputs 30KH/s. That’s a big overhead, but not critical, and I expected more out of ChaCha + Salsa. With only ChaCha enabled it’s 58KH/s, and with only Salsa it’s 56KH/s. Scalar Salsa isn’t supposed to be about as fast as vectorised ChaCha. It’s clearly scalar, because the AMD compiler isn’t really smart and the kernel is about double the size of the ChaCha-only one. Anyway, there is a huge bottleneck somewhere and it needs to be identified.
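To put these figures on one scale, here is a rough sketch that converts each hashrate into time per hash. It assumes all four numbers (19.5, 30, 58 and 56 KH/s) come from the same HD6970 at the same settings, and that the ChaCha-only and Salsa-only runs also had FastKDF removed; the post does not state either. It also treats the cost of each stage as simply additive, which a GPU does not guarantee.

```python
def us_per_hash(khs):
    """Microseconds spent per hash at a given rate in KH/s."""
    return 1000.0 / khs


# Figures quoted in the post (HD6970).
FULL_KERNEL = 19.5   # complete kernel, v2
NO_FASTKDF = 30.0    # FastKDF removed
CHACHA_ONLY = 58.0   # only the ChaCha half enabled
SALSA_ONLY = 56.0    # only the Salsa half enabled

# Time attributed to FastKDF under the additive assumption.
fastkdf_us = us_per_hash(FULL_KERNEL) - us_per_hash(NO_FASTKDF)
fastkdf_share = fastkdf_us / us_per_hash(FULL_KERNEL)

# Sum of the two halves measured separately vs. both measured together.
halves_us = us_per_hash(CHACHA_ONLY) + us_per_hash(SALSA_ONLY)
both_us = us_per_hash(NO_FASTKDF)

if __name__ == "__main__":
    print("full kernel: %.1f us/hash" % us_per_hash(FULL_KERNEL))
    print("FastKDF:     %.1f us/hash (%.0f%%)" % (fastkdf_us, 100 * fastkdf_share))
    print("ChaCha + Salsa measured separately: %.1f us/hash" % halves_us)
    print("ChaCha + Salsa measured together:   %.1f us/hash" % both_us)
```

Under those assumptions FastKDF accounts for roughly a third of the time per hash, and the two mixing halves cost about the same as each other, close to 17 to 18 microseconds each.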

                                            Wolf0 Regular Member last edited by


                                            I have a really hard time reading your style, but the code is pretty good! Don’t you think that bottleneck is waiting for global memory, though?
