Samsung’s DeepSort is Sort of a big deal

on November 26, 2014

We expect computers to give us accurate search results in less than half a second. Never mind that computers need to process a practically infinite amount of raw data on the internet; we want it, and we want it now. In fact, we expect search to get even quicker and more accurate as data piles up exponentially. Is this even fair?

Well, sort of.

To give you search results, computers first gather relevant information out of the gazillions of raw data points on the internet and get it ready for processing. It’s like taking the blocks out of your Lego box and dividing them by color and size so you can find them faster when you’re building the set. This process is called ‘sorting’, and it is carried out by a sorting algorithm. Better sorting, therefore, means faster and more accurate search results.
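
To make the Lego analogy concrete, here is a minimal Python sketch (not from the article) showing why sorted data is faster to search: once a list is sorted, a binary search finds an item in a handful of comparisons instead of scanning everything.

```python
import bisect

# Unsorted "raw data": finding an item means scanning every record.
records = [42, 7, 99, 13, 58, 21, 86, 3]
target = 58
found_linear = target in records          # O(n) linear scan

# After sorting once, lookups drop to O(log n) with binary search.
records.sort()                            # [3, 7, 13, 21, 42, 58, 86, 99]
i = bisect.bisect_left(records, target)
found_binary = i < len(records) and records[i] == target

print(found_linear, found_binary)         # True True
```

The one-time cost of sorting pays for itself as soon as you search the same data more than a few times, which is exactly the situation a search engine is in.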

How we are using it

Sorting is not just for search engines. It is an essential part of improving the quality of the apps and services we use daily. For example, Samsung’s Smart TVs and Milk Music rely on sorting to recommend TV shows or music based on your preferences and history. Many internet-connected apps and services can deliver the information you want only because of sorting. And as we face the inevitable era of IoT, sorting is more relevant than ever. So, sorting is sort of a big deal.


Samsung breaks record with its new sorting engine

That is why Samsung winning the Sort Benchmark Competition 2014 in the Daytona MinuteSort category with DeepSort, its sorting engine, is big news. Samsung sorted 3.7 terabytes of data, twice the competition’s previous record for how much data a system can sort within a minute. DeepSort significantly improved the performance efficiency and resource utilization of data processing in a large-scale data center.

3.7TB ≈ 1,900 DVD movies (approx. 2GB per DVD)

More impressive is the fact that this was all done with applicability and cost efficiency in mind – Samsung didn’t build a new state-of-the-art gigantic system just for the competition, it used only 384 commodity servers with HDDs. By focusing on software, the team came up with a system that’s scalable up to at least 1,000 servers with room for further improvement with better hardware (like SSDs).

So why is this really interesting

Now here are two facts that make things even more interesting.

– Last October, Samsung set a 5G speed record of 7.5Gbps, which is over 30 times faster than 4G LTE.

– Last October, Samsung announced the development of its 60GHz Wi-Fi technology, a five-fold increase over 866Mbps (about 108MB per second), the maximum speed possible with existing consumer electronics devices.

What we see is clear. Samsung has been smashing records in the fields that are indispensable to realizing the Internet of Things, starting with the ‘internet’ itself. The essence of most intensive cloud applications is moving distributed data around in some order, and sorting is the most intensive of them all. With DeepSort, Samsung is emphasizing that it wants to establish a solid foundation for IoT. Let’s not forget that Samsung also announced several Smart Home/IoT-related SDKs at this year’s SDC, including the Smart Home SDK, Samsung IoT SDK, and Smart Connectivity SDK for apps and developers.

So, what is DeepSort?

DeepSort is a ‘scalable and efficiency-optimized distributed general sorting engine.’ It enables a fluent data flow that shares the limited memory space and minimizes data movement, which makes it highly efficient at large scale.

Basically, it maximizes utilization of the hardware’s capabilities by eliminating bottlenecks between processor, memory, and storage within each server (an optimized sorting algorithm) and at the connections between servers (network optimization). The following is a bit ‘techier’:

DeepSort has two distinctive design features that make it a superior sorting engine: utilizing user-level lightweight threads* and minimizing the data movement between hard disk drive and memory**.

*Utilizing User-level Lightweight Threads

Parallelism at all sorting phases and components is maximized by properly exposing program concurrency to thousands of user-level lightweight threads. As a result, computation, communication, and storage tasks can be highly overlapped with low switching overhead. This is an advanced version of what many system designers call a ‘fat tree’, a structure where servers aren’t connected in parallel but more openly through a ‘fat tree’ of networks, allowing data to flow wherever computing power is readily available.
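
The idea of overlapping storage, compute, and network tasks can be sketched in Python. This is an illustration only, not DeepSort’s actual code: Python’s OS-level thread pool merely approximates the thousands of user-level lightweight threads the paper describes, and `fetch`/`sort_chunk` are hypothetical stand-ins for the real pipeline stages.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(chunk_id):
    # Stand-in for reading a chunk from disk (I/O-bound; the thread
    # yields while waiting, letting other work proceed).
    time.sleep(0.01)
    return list(range(chunk_id * 4, chunk_id * 4 + 4))[::-1]  # unsorted chunk

def sort_chunk(chunk):
    # Stand-in for the in-memory sort phase (compute-bound).
    return sorted(chunk)

# Many lightweight workers let storage, compute, and network tasks
# overlap instead of running strictly one after another.
with ThreadPoolExecutor(max_workers=8) as pool:
    chunks = list(pool.map(fetch, range(8)))          # fetches overlap
    sorted_chunks = list(pool.map(sort_chunk, chunks))

print(sorted_chunks[0])  # [0, 1, 2, 3]
```

The point is the shape of the design: with enough cheap-to-switch workers, no stage of the pipeline sits idle waiting for another.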

**Minimizing the Data Movement between Hard Disk Drive and Memory

Pipeline stages of the DeepSort design

Data movement in both hard drives and memory is minimized through an optimized data flow design that maximizes memory and cache usage. Specifically, the team avoids using disks for data buffering as much as possible. For data buffered in memory, key data that need frequent manipulation are separated from payload data that are mostly static.
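
The key/payload separation can be illustrated with a short Python sketch (an assumption-laden simplification, not DeepSort’s implementation): only the small keys, paired with indices, are shuffled during the sort, and each large payload is moved exactly once when the final output is written.

```python
# Each record: (key, payload). Payloads are large, so they should move
# as little as possible; keys are small and cheap to manipulate.
records = [(b"delta", "x" * 1000), (b"alpha", "y" * 1000), (b"charlie", "z" * 1000)]

# Sort only indices by key -- the heavyweight payloads stay put.
order = sorted(range(len(records)), key=lambda i: records[i][0])

# Touch each payload exactly once, when producing the final output.
sorted_records = [records[i] for i in order]

print([r[0] for r in sorted_records])  # [b'alpha', b'charlie', b'delta']
```

Keeping the frequently-compared keys small and contiguous is also what makes them cache-friendly, which is the other half of the optimization described above.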

For those less techy, this is how Zheng Li, Research Engineer, Cloud Research Lab, Samsung Research America Silicon Valley, and Juhan Lee, Vice President, Intelligence Solution Team, Samsung Software R&D Center, the people behind DeepSort, describe the overall design of DeepSort in DeepSort: Scalable Sorting with High Efficiency:

The people behind DeepSort - Juhan Lee (Vice President, Intelligence Solution Team, Samsung Software R&D Center) and Zheng Li (Research Engineer, Cloud Research Lab, Samsung Research America Silicon Valley)

“A record is first fetched from the source disk to the memory, sorted, and distributed to the destination node, merged with other records based on order, and finally written to the destination disk. When the amount of the data is larger than the aggregated capacity of memory, multiple rounds of intermediate sorts are executed. The final round merges spilled intermediate data from previous rounds. The inputs of the un-sorted records are distributed evenly across nodes, and the outputs are also distributed based on key partitions.”
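
The multi-round process the authors describe — sort what fits in memory, spill each sorted run, then merge the spilled runs in a final round — is the classic external merge sort pattern. Here is a hedged, single-machine Python sketch of that pattern (in-memory lists stand in for disk spills; `memory_limit` is a hypothetical parameter, and DeepSort’s distributed version is far more sophisticated):

```python
import heapq

def external_sort(records, memory_limit):
    """Sort more records than fit in 'memory': sort fixed-size runs,
    spill them, then merge all spilled runs in a final round."""
    runs = []
    # Intermediate rounds: sort whatever fits in memory, spill the run.
    for start in range(0, len(records), memory_limit):
        run = sorted(records[start:start + memory_limit])
        runs.append(run)                    # stand-in for a spill to disk
    # Final round: streaming k-way merge of all spilled sorted runs.
    return list(heapq.merge(*runs))

data = [9, 1, 8, 2, 7, 3, 6, 4, 5, 0]
print(external_sort(data, memory_limit=3))  # [0, 1, 2, ..., 9]
```

The merge phase reads each run sequentially, which is why the pattern works well on spinning disks like the HDDs used in the record-setting configuration.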

DeepSort set the record at the Sort Benchmark Competition 2014 using 384 server nodes, each featuring a 2.1GHz Intel Xeon 6-core processor, 64GB of memory, 8 × 7,200RPM HDDs, a 10Gbps Ethernet port, and a CentOS 6.4/ext4 file system. As discussed above, DeepSort would have put up an even better performance with more servers or better hardware, but that’s not the point. With DeepSort, Samsung found an optimized balance of algorithm, network, and servers for fast and efficient data processing, which is exciting for many.

The technology behind DeepSort is not easy to comprehend completely, but knowing how it can change our lifestyle is definitely something to get excited about. As far as excitement goes, Lee and his colleagues at the Cloud Research Lab of Samsung Research America, Silicon Valley were thrilled to develop DeepSort, as they take pride in their commitment to providing cutting-edge software technologies. He was also happy with the results DeepSort produced, and noted they couldn’t have been achieved without the strong relationship between HQ and the Cloud Research Lab.

