memcpy/memset - Old Royal Hack Forum

memcpy/memset

This topic is closed.

    memcpy/memset

    There are a couple of things to keep in mind:

    1) The compiler usually provides an intrinsic implementation, which is very efficient if your project flags enable it. The compiler's version is also usually inlined, which saves us from setting up a new stack frame. That said, the code in this thread is almost as fast. If you really need more speed, use SSE to implement a 128-bit memcpy/memset.

    2) You have to disable link-time code generation for memory.cpp, or whichever file holds the code, in that file's property page (set Whole Program Optimization to No). All other optimizations can, and preferably should, stay on. This gets around the error everybody complains about when trying to implement memset: the unresolved external symbol _memset.

    The compiler will output the efficient rep stosb and rep stosd for the code. The bitwise AND with sizeof ( DWORD ) - 1 is a quick way to get the address modulo 4, which works because 4 is a power of two.
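The AND trick works because masking with 3 keeps only the low two bits of the address, which is exactly the address modulo 4. Below is a minimal sketch of the idea; bytes_to_align is a hypothetical helper, not part of the posted code. The outer mask is an addition of this sketch: it maps an already aligned address to 0 instead of to the full alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the thread): number of bytes needed to
// advance `addr` to the next multiple of `align`.
// `align` must be a power of two, so that (addr & (align - 1)) == addr % align.
inline std::size_t bytes_to_align(std::uintptr_t addr, std::size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); // power-of-two check

    // Inner part: distance to the next boundary, in the range 1..align.
    // Outer mask: folds "align" back to 0 when addr is already aligned.
    return (align - (addr & (align - 1))) & (align - 1);
}
```

For example, an address ending in ...1 needs 3 bytes to reach a 4-byte boundary, while an address ending in ...0 needs none.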

    memory.h

    Code:
    //========================================================================================
    #pragma once
    //========================================================================================
    #include <stddef.h> // size_t
    //========================================================================================
    // C linkage so these replace the CRT symbols the compiler and linker expect
    extern "C" void* __cdecl memcpy ( void* _Dest, const void* _Source, size_t _Size );
    extern "C" void* __cdecl memset ( void* _Dest, int _Val, size_t _Size );
    //========================================================================================
    memory.cpp (remember to disable whole program optimization for this file in its property page)

    Code:
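    //========================================================================================
    #include <windows.h> // PBYTE, PDWORD, DWORD, DWORD_PTR
    #include "memory.h"
    //========================================================================================
    #pragma function ( memcpy, memset ) // tell the compiler to not use the intrinsic form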
    //========================================================================================
    void* __cdecl memcpy ( void* _Dest, const void* _Source, size_t _Size )
    {
    	unsigned int uiBufferSize, uiRemainderSource, uiRemainderDest;
    
    	PBYTE pbDest = ( PBYTE )_Dest;
    	PBYTE pbSource = ( PBYTE )_Source;
    
    	PDWORD pdwDest = ( PDWORD )_Dest;
    	PDWORD pdwSource = ( PDWORD )_Source;
    
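    	// bytes needed to reach the next DWORD boundary; the outer mask gives 0 ( not 4 ) when already aligned
    	uiRemainderSource = ( unsigned int )( ( sizeof ( DWORD ) - ( ( DWORD_PTR )_Source & ( sizeof ( DWORD ) -1 ) ) ) & ( sizeof ( DWORD ) -1 ) );
    
    	uiRemainderDest = ( unsigned int )( ( sizeof ( DWORD ) - ( ( DWORD_PTR )_Dest & ( sizeof ( DWORD ) -1 ) ) ) & ( sizeof ( DWORD ) -1 ) );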
    
    	if ( uiRemainderSource == uiRemainderDest
    		&& uiRemainderSource != 0 && _Size >= uiRemainderSource  )
    	{		
    		_Size -= uiRemainderSource;
    
    		while ( uiRemainderSource-- )
    		{
    			*pbDest++ = *pbSource++;
    		}
    
    		pdwDest = ( PDWORD )pbDest;
    		pdwSource = ( PDWORD )pbSource;
    	}
    
    	// see how many dwords can fit in the space
    		
    	uiBufferSize = _Size >> 2;
    
    	if ( uiBufferSize != 0 )
    	{
    		_Size -= ( uiBufferSize << 2 );
    
    		while ( uiBufferSize-- )
    		{
    			*pdwDest++ = *pdwSource++;
    		}
    	}
    	if ( _Size != 0 ) // we still have space left over
    	{		
    		pbDest = ( PBYTE )pdwDest;
    		pbSource = ( PBYTE )pdwSource;
    
    		while ( _Size-- )
    		{
    			*pbDest++ = *pbSource++;
    		}
    	}
    
    	return _Dest;
    }
    //========================================================================================
    void* __cdecl memset ( void* _Dest, int _Val, size_t _Size )
    {
    	unsigned int uiBufferSize, uiRemainder;
    	
    	PDWORD pdwDest = ( PDWORD )_Dest;
    
    	PBYTE pbDest = ( PBYTE )_Dest;
    	
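    	// bytes needed to reach the next DWORD boundary; the outer mask gives 0 ( not 4 ) when already aligned
    	uiRemainder = ( unsigned int )( ( sizeof ( DWORD ) - ( ( DWORD_PTR )_Dest & ( sizeof ( DWORD ) -1 ) ) ) & ( sizeof ( DWORD ) -1 ) );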
    		
    	if ( uiRemainder != 0 && _Size >= uiRemainder ) // unaligned memory
    	{		
    		_Size -= uiRemainder;
    			
    		while ( uiRemainder-- ) // get us on aligned memory
    		{
    			*pbDest++ = _Val;
    		}
    
    		pdwDest = ( PDWORD )pbDest;
    	}
    	
    	// see how many dwords can fit in the space
    	
    	uiBufferSize = _Size >> 2;
    
    	if ( uiBufferSize != 0 )
    	{
    		_Size -= ( uiBufferSize << 2 );
    
    		while ( uiBufferSize-- )
    		{
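    			// replicate the fill byte into all four bytes of the dword,
    			// otherwise only the low byte of each dword gets the value
    			*pdwDest++ = ( ( DWORD )( BYTE )_Val ) * 0x01010101u;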
    		}
    	}
    	
    	if ( _Size != 0 ) // we still have space left over
    	{
    		pbDest = ( PBYTE )pdwDest;
    				
    		while ( _Size-- )
    		{
    			*pbDest++ = _Val;
    		}
    	}
    		
    	return _Dest;
    }
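
One detail worth spelling out for the dword loop in memset: the C memset contract fills every byte with the low 8 bits of the value, so a 4-byte store needs that byte replicated into all four positions. A minimal sketch, assuming a hypothetical helper replicate_byte that is not in the posted code:

```cpp
#include <cstdint>

// Hypothetical helper (not from the thread): copy the low byte of `val`
// into all four bytes of a 32-bit word, e.g. 0xAB becomes 0xABABABAB.
inline std::uint32_t replicate_byte(int val)
{
    // Truncate to one byte first, as memset does, then multiply by
    // 0x01010101, which places one copy of the byte in each position.
    return static_cast<std::uint32_t>(static_cast<unsigned char>(val)) * 0x01010101u;
}
```

The multiply cannot carry between bytes because each partial product fits in 8 bits, so it can be done once before the loop rather than per store.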

    #2
    Re: memcpy/memset

    How i can code my l33t p4yh4GS with this?

    Comment


      #3
      Re: memcpy/memset

      here we go again..

      Comment


        #4
        Re: memcpy/memset

        Originally posted by GetJump
        How i can code my l33t p4yh4GS with this?
        you need to copypaste this into your hack and change all memcpy calls in your detour-class to use wav's implementation, guaranteed proofens.

        Comment


          #5
          Re: memcpy/memset

          uses ring-1 proofens via hypervisior haq

          Comment


            #6
            Re: memcpy/memset

            can i beat wav for lulz? cuz his codens is fat & slow bs :troll:
            true memcpys: fast as shit on fan: on i7 faster than any kind of sse100600 implementation

            PHP Code:
            #include <intrin.h>

            #undef memcpy
            #undef memset
            #undef memcmp

            #define memcpy(x,y,z) \
                __movsb((unsigned char *)(x), (const unsigned char *)(y), (z))

            // global opt ON; count arg must be an immediate constant
            __forceinline void _memcpy(void *dst, const void *src, const size_t count)
            {
            #ifdef _AMD64_
                if(!(count & 7))
                    __movsq((unsigned __int64 *)dst, (const unsigned __int64 *)src, count >> 3);
                else
            #endif
                if(!(count & 3))
                    __movsd((unsigned long *)dst, (const unsigned long *)src, count >> 2);
                else
                    __movsb((unsigned char *)dst, (const unsigned char *)src, count);
            }

            #define memset(x,y,z) \
                __stosb((unsigned char *)(x), (unsigned char)(y), (z))

            __forceinline void memset0(void *dst, const size_t count)
            {
            #ifdef _AMD64_
                if(!(count & 7))
                    __stosq((unsigned __int64 *)dst, 0, count >> 3);
                else
            #endif
                if(!(count & 3))
                    __stosd((unsigned long *)dst, 0, count >> 2);
                else
                    __stosb((unsigned char *)dst, 0, count);
            }

            Comment


              #7
              Re: memcpy/memset

              mattdog unaligned memory there bud

              your code doesn't handle it and the compiler will auto optimize a lot of my code anyway

              SSE implementation is only worth doing if you're moving 16 bytes at a time
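
For reference, moving 16 bytes at a time with SSE2 can be sketched roughly as below. This is only an illustration under stated assumptions, not code from anyone in the thread: the name sse2_copy is hypothetical, it uses unaligned loads and stores throughout so it makes no attempt to align either pointer, and it is not tuned or benchmarked.

```cpp
#include <cstddef>
#include <emmintrin.h> // SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128

// Hypothetical name (not from the thread). Copies `size` bytes from `src`
// to `dst`; the buffers must not overlap, as with memcpy.
inline void* sse2_copy(void* dst, const void* src, std::size_t size)
{
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);

    // Bulk part: one 128-bit unaligned load and store per iteration.
    while (size >= 16)
    {
        const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(d), chunk);
        s += 16;
        d += 16;
        size -= 16;
    }

    // Tail: fewer than 16 bytes left, copy them one by one.
    while (size--)
    {
        *d++ = *s++;
    }

    return dst;
}
```

For copies shorter than 16 bytes the loop never runs and this degrades to a plain byte copy, which is the point made above.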

              Comment


                #8
                Re: memcpy/memset

                Originally posted by MattDog
                can i beat wav for lulz?
                With intrinsics? I doubt it.

                Comment


                  #9
                  Re: memcpy/memset

                  Originally posted by wav
                  mattdog unaligned memory there bud

                  your code doesn't handle it and the compiler will auto optimize a lot of my code anyway

                  SSE implementation is only worth doing if you're moving 16 bytes at a time
                  srsly, who cares about alignment? checks, math, jmps = cpu time
                  rep movs faster in any case - aligned, cached, swapped, wutever fcuked mem (~80% faster vs sse2 memcpy if copy some huge shits)

                  tested on i7, 4|page aligned mem

                  Comment


                    #10
                    Re: memcpy/memset

                    now test it on unaligned memory

                    writing memset/memcpy that just bursts rep movs on unaligned memory is asking for trouble

                    cache miss is the least of your worries, especially in kernel mode

                    and tsx doc speaks against REP prefix, which makes my code bad as well
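
Testing on unaligned memory can be done with a small harness along these lines. It is a sketch under stated assumptions: the name copies_correctly_at_all_offsets and the chosen ranges (offsets 0..7, sizes 0..64) are made up for illustration, and it checks correctness only, not speed or any kernel-mode behaviour.

```cpp
#include <cstddef>
#include <cstring>

// Signature shared by memcpy and any replacement under test.
using copy_fn = void* (*)(void*, const void*, std::size_t);

// Hypothetical harness (not from the thread): returns true if `fn` copies
// correctly for every combination of source offset (0..7), destination
// offset (0..7) and size (0..64), without writing outside the target range.
inline bool copies_correctly_at_all_offsets(copy_fn fn)
{
    unsigned char src[96];
    unsigned char dst[96];

    for (std::size_t i = 0; i < sizeof src; ++i)
        src[i] = static_cast<unsigned char>(i * 31 + 7);

    for (std::size_t src_off = 0; src_off < 8; ++src_off)
    for (std::size_t dst_off = 0; dst_off < 8; ++dst_off)
    for (std::size_t n = 0; n <= 64; ++n)
    {
        std::memset(dst, 0xCC, sizeof dst);        // guard pattern
        const std::size_t start = 8 + dst_off;     // leave guard bytes in front

        fn(dst + start, src + src_off, n);

        // The copied range must match the source.
        if (std::memcmp(dst + start, src + src_off, n) != 0)
            return false;

        // Guard bytes before and after the range must be untouched.
        for (std::size_t i = 0; i < start; ++i)
            if (dst[i] != 0xCC) return false;
        for (std::size_t i = start + n; i < sizeof dst; ++i)
            if (dst[i] != 0xCC) return false;
    }
    return true;
}
```

Any function with the memcpy signature can be passed in, for example copies_correctly_at_all_offsets(&std::memcpy) as a baseline before trying a replacement.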

                    Comment
