查看完整版本 : String Sorting

sswroom 2017-11-22 11:37 PM

String Sorting

見大家都在玩Integer Sorting, 我就來玩一下String Sorting~~
Compare Function是相當於Windows Explorer用來Sort File Name那種方式, 數字會跟數字的值來排, 其他字才跟字的碼來排
我是將MT19937的Random unsigned int32轉成String放入Array, 然後對這個String Array做Sorting
Sorting Algorithm是之前的Artificial Quick Sort再加少少修改, 包括在Insertion Sort部分由Sequencial Search變成Binary Search[code]Gen Random N = 10000000, seed = 0, time = 0.30180369931807s
Memory copy time = 0.018777617910824s
AQuickSort 8 Thread N = 10000000, Sort time = 2.8981403879059s

First 100 values: 109 683 810 892 1326 1593 1829 2605 3009 5078 5486 5696 5732 5746
6037 6109 6414 6446 7159 7673 7830 8001 8483 8697 9049 9315 9675 11385 11573 12660
12744 13394 13450 14233 14687 15118 16167 16496 16764 17227 17295 17674 17696 19768
20421 20750 20845 21026 21072 21780 22383 23554 23638 23690 24004 24038 24600 25773
25842 25977 26211 27511 27661 27825 28515 29126 29457 29669 29746 30065 30362 30777
31498 31893 32049 33301 33622 35071 35601 35874 35930 36181 36281 38216 38230 38959
39108 39261 39432 39597 39883 40053 41467 42239 43397 43944 44036 44292 45567 46415

Last 100 values: 4294924147 4294924892 4294925262 4294926173 4294927884 4294928230
4294928466 4294929402 4294929442 4294929735 4294930049 4294930149 4294930173 4294930410
4294931264 4294931677 4294931709 4294931883 4294932262 4294932514 4294932777 4294932780
4294933487 4294934160 4294934862 4294934991 4294935092 4294935909 4294936177 4294936203
4294936493 4294936644 4294939076 4294939235 4294939353 4294939406 4294939473 4294939887
4294940560 4294940770 4294941744 4294941995 4294942115 4294942345 4294942687 4294942847
4294943223 4294943764 4294945054 4294945062 4294945599 4294946199 4294946275 4294946933
4294947077 4294947162 4294948365 4294948447 4294949263 4294949330 4294949468 4294949939
4294950190 4294950267 4294951267 4294952942 4294953101 4294953168 4294953702 4294953994
4294954183 4294954263 4294955312 4294955608 4294956243 4294956804 4294957011 4294957600
4294958712 4294958890 4294958931 4294959282 4294959412 4294960236 4294961010 4294961341
4294961657 4294961687 4294961972 4294963066 4294963282 4294963799 4294964177 4294964304
4294964600 4294964710 4294964877 4294966080 4294966500 4294967281[/code]

煙民母親生賤種 2017-11-23 12:37 AM

可唔可以化返做 string 來讓大家參考下? :fst_008:

assembly.jc 2017-11-23 01:02 AM

是什麼 encoding ? ASCII ?

ncream 2017-11-23 06:39 AM

[quote]原帖由 [i]sswroom[/i] 於 2017-11-22 11:37 PM 發表 [url=http://computer.discuss.com.hk/redirect.php?goto=findpost&pid=471316680&ptid=27076882][img]http://computer.discuss.com.hk/images/common/back.gif[/img][/url]
見大家都在玩Integer Sorting, 我就來玩一下String Sorting~~
Compare Function是相當於Windows Explorer用來Sort File Name那種方式, 數字會跟數字的值來排, 其他字才跟字的碼來排
我是將MT19937的Random unsigne ... [/quote]

勁啊。ching:smile_o12:


[size=16px]可惜MT19937出唔到負數。若果出到signed number 正負數就perfect。[/size]

[[i] 本帖最後由 ncream 於 2017-11-23 07:33 AM 編輯 [/i]]

sswroom 2017-11-23 08:30 AM

[quote]原帖由 [i]ncream[/i] 於 2017-11-23 06:39 AM 發表 [url=http://computer.discuss.com.hk/redirect.php?goto=findpost&pid=471323542&ptid=27076882][img]http://computer.discuss.com.hk/images/common/back.gif[/img][/url]


勁啊。ching:smile_o12:


可惜MT19937出唔到負數。若果出到signed number 正負數就perfect。 [/quote]MT19937是可以出負數, 只是String Compare時'-'會當成文字另外排, 所以不太適合

sswroom 2017-11-23 08:37 AM

[quote]原帖由 [i]assembly.jc[/i] 於 2017-11-23 01:02 AM 發表 [url=http://computer.discuss.com.hk/redirect.php?goto=findpost&pid=471319762&ptid=27076882][img]http://computer.discuss.com.hk/images/common/back.gif[/img][/url]
是什麼 encoding ? ASCII ? [/quote]我有兩個版本做處理, 一個版本是UTF-16LE (WChar*), 另一個版本是跟OS的Code Page (Char*), 今次是Char*版本的速度

Susan﹏汪汪 2017-11-23 10:24 AM

Unicode normalization form係邊種

sswroom 2017-11-23 11:20 AM

[quote]原帖由 [i]Susan﹏汪汪[/i] 於 2017-11-23 10:24 AM 發表 [url=http://computer.discuss.com.hk/redirect.php?goto=findpost&pid=471329931&ptid=27076882][img]http://computer.discuss.com.hk/images/common/back.gif[/img][/url]
Unicode normalization form係邊種 [/quote]
現時沒有做Normalization, 是直接Char code compare

煙民母親生賤種 2017-11-23 12:45 PM

其實 sort string 應該幾複雜, 涉及不同 encoding ... :fst_016:唔同 encoding, 排法有分別:fst_005:

[attach]7596927[/attach]

[[i] 本帖最後由 煙民母親生賤種 於 2017-11-23 12:48 PM 編輯 [/i]]

sswroom 2017-11-23 02:45 PM

[quote]原帖由 [i]煙民母親生賤種[/i] 於 2017-11-23 12:45 PM 發表 [url=http://computer.discuss.com.hk/redirect.php?goto=findpost&pid=471336363&ptid=27076882][img]http://computer.discuss.com.hk/images/common/back.gif[/img][/url]
其實 sort string 應該幾複雜, 涉及不同 encoding ... :fst_016:唔同 encoding, 排法有分別:fst_005:

7596927 [/quote]與Encoding關係不大, 理論上是只與Locale有關。但實際應用上, Locale Compare Function作用不是很大, 而且Implement所需時間很多, 所以一般不會做。例如港式的中文理論上是跟筆劃數和筆順排序, 但大部分人習慣了分Char Code的次序排, 跟足Locale反而可能會不適應。

我現時的String Compare Function大約是:[code]OSInt MyString_StrCompare(const Char *str1, const Char *str2)
{
    OSInt i = 0;
    OSInt j = 0;
    while (1)
    {
        if (*str1 == 0 && *str2 == 0)
            return 0;
        if (*str1 == 0)
            return -1;
        if (*str2 == 0)
            return 1;

        if (*str1 >= 0x30 && *str1 <= 0x39 && *str2 >= 0x30 && *str2 <= 0x39)
        {
            i = 0;
            j = 0;
            while (*str1 >= 0x30 && *str1 <= 0x39)
            {
                i = i * 10 + ((*str1++) - 0x30);
            }
            while (*str2 >= 0x30 && *str2 <= 0x39)
            {
                j = j * 10 + ((*str2++) - 0x30);
            }

            if (i > j)
                return 1;
            else if (i < j)
                return -1;
        }
        else if (*str1 > *str2)
        {
            return 1;
        }
        else if (*str1 < *str2)
        {
            return -1;
        }
        else
        {
            str1++;
            str2++;
        }
    }
    return 0;
}[/code]

煙民母親生賤種 2017-11-23 11:37 PM

咁你排出來的效果是那一種? (以我貼圖例子為例) :fst_007:

sswroom 2017-11-24 12:40 AM

String Sorting算是比較慢的Compare Function, Compare的次數對整體速度影響很大....

不同的Sorting Algorithm的速度如下:

InsertSort = Insertion Sort

InsertSortB = Insertion Sort with Binary Search[code]AQuickSort 1 Thread N = 128, Sort time = 4.1568660742645e-5s, Result sorted
AQuickSort 1 Thread N = 256, Sort time = 5.4156353784431e-5s, Result sorted
AQuickSort 1 Thread N = 512, Sort time = 1.7154390982528e-4s, Result sorted
AQuickSort 1 Thread N = 1024, Sort time = 3.0971579623745e-4s, Result sorted
AQuickSort 1 Thread N = 2048, Sort time = 6.1708969609503e-4s, Result sorted
AQuickSort 1 Thread N = 4096, Sort time = 0.0012485235075168s, Result sorted
AQuickSort 1 Thread N = 8192, Sort time = 0.0026290714234485s, Result sorted
AQuickSort 1 Thread N = 16384, Sort time = 0.0054987726999284s, Result sorted
AQuickSort 1 Thread N = 32768, Sort time = 0.015915820534626s, Result sorted
AQuickSort 1 Thread N = 65536, Sort time = 0.030778665909454s, Result sorted
AQuickSort 1 Thread N = 131072, Sort time = 0.062197547741753s, Result sorted
AQuickSort 1 Thread N = 262144, Sort time = 0.14145551787379s, Result sorted
AQuickSort 1 Thread N = 524288, Sort time = 0.36745378779784s, Result sorted
AQuickSort 1 Thread N = 1048576, Sort time = 0.82796136456447s, Result sorted
AQuickSort 1 Thread N = 2097152, Sort time = 2.0375300604356s, Result sorted
AQuickSort 8 Thread N = 128, Sort time = 2.9859178843308e-5s, Result sorted
AQuickSort 8 Thread N = 256, Sort time = 5.3570879689465e-5s, Result sorted
AQuickSort 8 Thread N = 512, Sort time = 9.1333958814825e-5s, Result sorted
AQuickSort 8 Thread N = 1024, Sort time = 1.5749253154608e-4s, Result sorted
AQuickSort 8 Thread N = 2048, Sort time = 3.2728001908646e-4s, Result sorted
AQuickSort 8 Thread N = 4096, Sort time = 5.8401040972941e-4s, Result sorted
AQuickSort 8 Thread N = 8192, Sort time = 0.001261989411701s, Result sorted
AQuickSort 8 Thread N = 16384, Sort time = 0.0024645532027629s, Result sorted
AQuickSort 8 Thread N = 32768, Sort time = 0.0054847213216492s, Result sorted
AQuickSort 8 Thread N = 65536, Sort time = 0.011898297294963s, Result sorted
AQuickSort 8 Thread N = 131072, Sort time = 0.023348706907277s, Result sorted
AQuickSort 8 Thread N = 262144, Sort time = 0.045118390180428s, Result sorted
AQuickSort 8 Thread N = 524288, Sort time = 0.098310760867497s, Result sorted
AQuickSort 8 Thread N = 1048576, Sort time = 0.19133000686468s, Result sorted
AQuickSort 8 Thread N = 2097152, Sort time = 0.50441754841505s, Result sorted
AQuickSort 8 Thread N = 4194304, Sort time = 1.443204768101s, Result sorted
InsertSort 1 Thread N = 16, Sort time = 1.7564222849005e-6s, Result sorted
InsertSort 1 Thread N = 32, Sort time = 5.5620039021848e-6s, Result sorted
InsertSort 1 Thread N = 64, Sort time = 1.8149696943972e-5s, Result sorted
InsertSort 1 Thread N = 128, Sort time = 7.9624476915488e-5s, Result sorted
InsertSort 1 Thread N = 256, Sort time = 2.8805325472368e-4s, Result sorted
InsertSort 1 Thread N = 512, Sort time = 9.8945122049394e-4s, Result sorted
InsertSort 1 Thread N = 1024, Sort time = 0.0039908841683414s, Result sorted
InsertSort 1 Thread N = 2048, Sort time = 0.017605206035652s, Result sorted
InsertSort 1 Thread N = 4096, Sort time = 0.064560228451992s, Result sorted
InsertSort 1 Thread N = 8192, Sort time = 0.26752711842824s, Result sorted
InsertSort 1 Thread N = 16384, Sort time = 1.1145064380195s, Result sorted
InsertSortB 1 Thread N = 16, Sort time = 8.2259110342839e-5s, Result sorted
InsertSortB 1 Thread N = 32, Sort time = 3.512844569801e-6s, Result sorted
InsertSortB 1 Thread N = 64, Sort time = 8.1966373295356e-6s, Result sorted
InsertSortB 1 Thread N = 128, Sort time = 2.9859178843308e-5s, Result sorted
InsertSortB 1 Thread N = 256, Sort time = 4.8594349882247e-5s, Result sorted
InsertSortB 1 Thread N = 512, Sort time = 1.129965003286e-4s, Result sorted
InsertSortB 1 Thread N = 1024, Sort time = 2.8746778062871e-4s, Result sorted
InsertSortB 1 Thread N = 2048, Sort time = 7.6755653850151e-4s, Result sorted
InsertSortB 1 Thread N = 4096, Sort time = 0.0022297780906812s, Result sorted
InsertSortB 1 Thread N = 8192, Sort time = 0.0077323563722269s, Result sorted
InsertSortB 1 Thread N = 16384, Sort time = 0.02633316110637s, Result sorted
InsertSortB 1 Thread N = 32768, Sort time = 0.098792606047655s, Result sorted
InsertSortB 1 Thread N = 65536, Sort time = 0.40127223520836s, Result sorted
InsertSortB 1 Thread N = 131072, Sort time = 1.6766136763821s, Result sorted
BubbleSort 1 Thread N = 16, Sort time = 2.6346334273507e-6s, Result sorted
BubbleSort 1 Thread N = 32, Sort time = 9.6603225669526e-6s, Result sorted
BubbleSort 1 Thread N = 64, Sort time = 5.5912776069332e-5s, Result sorted
BubbleSort 1 Thread N = 128, Sort time = 1.5632158335614e-4s, Result sorted
BubbleSort 1 Thread N = 256, Sort time = 6.9817785824794e-4s, Result sorted
BubbleSort 1 Thread N = 512, Sort time = 0.002682056829043s, Result sorted
BubbleSort 1 Thread N = 1024, Sort time = 0.010500770630277s, Result sorted
BubbleSort 1 Thread N = 2048, Sort time = 0.041553731153223s, Result sorted
BubbleSort 1 Thread N = 4096, Sort time = 0.1637076318012s, Result sorted
BubbleSort 1 Thread N = 8192, Sort time = 0.66039370205516s, Result sorted
BubbleSort 1 Thread N = 16384, Sort time = 2.6588688347748s, Result sorted[/code]

sswroom 2017-11-25 07:39 PM

試了一下Bitonic Sort, 速度比Insertion Sort快, 但比Quick Sort慢......暫時Comparison Sorting之中, 似乎以Quick Sort最快[code]BitonicSort 1 Thread N = 16, Sort time = 2.9273704748341e-6s, Result sorted
BitonicSort 1 Thread N = 32, Sort time = 7.0256891396019e-6s, Result sorted
BitonicSort 1 Thread N = 64, Sort time = 2.6346334273507e-5s, Result sorted
BitonicSort 1 Thread N = 128, Sort time = 4.5374242359929e-5s, Result sorted
BitonicSort 1 Thread N = 256, Sort time = 1.6364000954323e-4s, Result sorted
BitonicSort 1 Thread N = 512, Sort time = 3.5421182745493e-4s, Result sorted
BitonicSort 1 Thread N = 1024, Sort time = 6.1504053676265e-4s, Result sorted
BitonicSort 1 Thread N = 2048, Sort time = 0.001517548854154s, Result sorted
BitonicSort 1 Thread N = 4096, Sort time = 0.0032915353619035s, Result sorted
BitonicSort 1 Thread N = 8192, Sort time = 0.0091134897622536s, Result sorted
BitonicSort 1 Thread N = 16384, Sort time = 0.0181406220955s, Result sorted
BitonicSort 1 Thread N = 32768, Sort time = 0.040440452161644s, Result sorted
BitonicSort 1 Thread N = 65536, Sort time = 0.09538602502609s, Result sorted
BitonicSort 1 Thread N = 131072, Sort time = 0.21572963977243s, Result sorted
BitonicSort 1 Thread N = 262144, Sort time = 0.5081402854479s, Result sorted
BitonicSort 1 Thread N = 524288, Sort time = 1.1644889469809s, Result sorted[/code]

Susan﹏汪汪 2017-11-25 08:14 PM

[quote]原帖由 [i]sswroom[/i] 於 2017-11-25 07:39 PM 發表 [url=http://www.discuss.com.hk/redirect.php?goto=findpost&pid=471445197&ptid=27076882][img]http://www.discuss.com.hk/images/common/back.gif[/img][/url]
試了一下Bitonic Sort, 速度比Insertion Sort快, 但比Quick Sort慢......暫時Comparison Sorting之中, 似乎以Quick Sort最快BitonicSort 1 Thread N = 16, Sort time = 2.9273704748341e-6s, Result sorted
Bitonic ... [/quote]
你的Bitonic Sort係single thread?

Bitonic Sort用single thread的複雜度係O( n log^2(n) )

parallel 計算先做到O( log^2(n) )

[[i] 本帖最後由 Susan﹏汪汪 於 2017-11-25 08:15 PM 編輯 [/i]]

煙民母親生賤種 2017-11-25 09:08 PM

[quote]原帖由 [i]Susan﹏汪汪[/i] 於 2017-11-25 08:14 PM 發表 [url=http://www.discuss.com.hk/redirect.php?goto=findpost&pid=471446476&ptid=27076882][img]http://www.discuss.com.hk/images/common/back.gif[/img][/url]

你的Bitonic Sort係single thread?

Bitonic Sort用single thread的複雜度係O( n log^2(n) )

parallel 計算先做到O( log^2(n) ) [/quote]

沒用的。煙調個奧貧屍佬版本的必劏你索都係慢過喬打果個威的索幾倍。除非你煙皮文果個好得過煙調果個。別忘記人地以經係知皮烏版。必劏你索最大弱點係要先做別劏你屍箘事!

sswroom 2017-11-25 09:19 PM

[quote]原帖由 [i]Susan﹏汪汪[/i] 於 2017-11-25 08:14 PM 發表 [url=http://computer.discuss.com.hk/redirect.php?goto=findpost&pid=471446476&ptid=27076882][img]http://computer.discuss.com.hk/images/common/back.gif[/img][/url]

你的Bitonic Sort係single thread?

Bitonic Sort用single thread的複雜度係O( n log^2(n) )

parallel 計算先做到O( log^2(n) ) [/quote]Single Thread與Multi Thread的複雜度理論上應該是沒有分別, 只是同時可以處理多組Data。我的CPU是8 Threads, 所以Optimize得好的話應該是現在時間的1/8左右(如果不用Thread Synchronization的話)。

煙民母親生賤種 2017-11-25 09:56 PM

[quote]原帖由 [i]sswroom[/i] 於 2017-11-25 09:19 PM 發表 [url=http://www.discuss.com.hk/redirect.php?goto=findpost&pid=471449004&ptid=27076882][img]http://www.discuss.com.hk/images/common/back.gif[/img][/url]
Single Thread與Multi Thread的複雜度理論上應該是沒有分別, 只是同時可以處理多組Data。我的CPU是8 Threads, 所以Optimize得好的話應該是現在時間的1/8左右(如果不用Thread Synchronization的話)。 [/quote]

必劏你屍群事基本每兩個數就要一個筆化。十萬個數就要五萬個。亦即最差要五萬乘二讀二寫。仲要計最大及最小值。不論埃奧及計算需求都很大。而且係階段相依,不利多綫程。所以索停要快, 一定係機械式的鴿巢索或威的索。

ncream 2017-11-25 10:03 PM

[quote]原帖由 [i]煙民母親生賤種[/i] 於 2017-11-25 09:56 PM 發表 [url=http://computer.discuss.com.hk/redirect.php?goto=findpost&pid=471450683&ptid=27076882][img]http://computer.discuss.com.hk/images/common/back.gif[/img][/url]


必劏你屍群事基本每兩個數就要一個筆化。十萬個數就要五萬個。亦即最差要五萬乘二讀二寫。仲要計最大及最小值。不論埃奧及計算需求都很大。而且係階段相依,不利多綫程。所以索停要快, 一定係機械式 ... [/quote]


想索停快唔係買個新GPU咩!:smile_o01::smile_o13:

Susan﹏汪汪 2017-11-25 10:19 PM

[quote]原帖由 [i]sswroom[/i] 於 2017-11-25 09:19 PM 發表 [url=http://www.discuss.com.hk/redirect.php?goto=findpost&pid=471449004&ptid=27076882][img]http://www.discuss.com.hk/images/common/back.gif[/img][/url]
Single Thread與Multi Thread的複雜度理論上應該是沒有分別, 只是同時可以處理多組Data。我的CPU是8 Threads, 所以Optimize得好的話應該是現在時間的1/8左右(如果不用Thread Synchronization的話)。 [/quote]
有分別

Wiki有寫[quote]
The resulting sorting networks consist of
O(n\log ^{2}(n)) comparators and have a delay of
O(\log ^{2}(n)), where
n is the number of items to be sorted.[/quote]

compare 次數係n log^2(n)次
但因為並行關係、delay時間係log^2(n)

留意下wiki寫的複雜度係有分parallel time同non-parallel Time
[attach]7605950[/attach]

[[i] 本帖最後由 Susan﹏汪汪 於 2017-11-25 10:21 PM 編輯 [/i]]

sswroom 2017-11-25 11:14 PM

[quote]原帖由 [i]Susan﹏汪汪[/i] 於 2017-11-25 10:19 PM 發表 [url=http://computer.discuss.com.hk/redirect.php?goto=findpost&pid=471451882&ptid=27076882][img]http://computer.discuss.com.hk/images/common/back.gif[/img][/url]

有分別

Wiki有寫

compare 次數係n log^2(n)次
但因為並行關係、delay時間係log^2(n)

留意下wiki寫的複雜度係有分parallel time同non-parallel Time
7605950 [/quote]這裏是寫Performance的分別, 不是Complexity的分別, 還有是假設是無限Processing Core下的......

Susan﹏汪汪 2017-11-26 12:11 AM

[quote]原帖由 [i]sswroom[/i] 於 2017-11-25 11:14 PM 發表 [url=http://www.discuss.com.hk/redirect.php?goto=findpost&pid=471454774&ptid=27076882][img]http://www.discuss.com.hk/images/common/back.gif[/img][/url]
這裏是寫Performance的分別, 不是Complexity的分別, 還有是假設是無限Processing Core下的...... [/quote]
Wiki 果到寫Performance用嘅係大O符號
即係complexity

Click 入條link去睇解釋都係講話time complexity的意思
https://en.m.wikipedia.org/wiki/Best,_worst_and_average_case

就算係quick sort都好
可以見到一樣係講話worst-case, best-case同average performance
[attach]7606367[/attach]

同埋最後果行係space complexity
不是講緊速度

所以performance果三行先係講緊個算法的複雜度

[[i] 本帖最後由 Susan﹏汪汪 於 2017-11-26 12:15 AM 編輯 [/i]]
頁: [1]
查看完整版本: String Sorting