Difference between Scan and Query in Amazon DynamoDB: What Every Cloud Developer Should Know

If you write code that works heavily with databases, reading, writing, and optimizing operations are part of your daily routine. But do you know the difference between Query and Scan? In this article, we will look at how Scan and Query differ, specifically in Amazon DynamoDB, the NoSQL database service from AWS.

Scan

A Scan operation reads every item in a table or secondary index and returns one or more items along with their attributes.

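A single Scan request returns at most 1 MB of data. If there is more to read, the response includes a LastEvaluatedKey, which you pass back as ExclusiveStartKey in the next request to continue where the previous page stopped.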
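To make the 1 MB paging concrete, here is a minimal sketch that keeps calling Scan until no LastEvaluatedKey comes back. It assumes a function with the call shape of boto3's low-level `client.scan(**kwargs)`; the helper name `scan_all_items` is hypothetical, and the function is passed in so the sketch runs without AWS credentials.

```python
from typing import Any, Callable, Dict, Iterator

# Anything with the call shape of boto3's low-level client.scan(**kwargs).
ScanFn = Callable[..., Dict[str, Any]]


def scan_all_items(scan_fn: ScanFn, table_name: str) -> Iterator[Dict[str, Any]]:
    """Yield every item in a table, following Scan's 1 MB page limit."""
    request: Dict[str, Any] = {"TableName": table_name}
    while True:
        page = scan_fn(**request)
        # Each response carries at most 1 MB of items.
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            # No LastEvaluatedKey: the scan has reached the end of the table.
            return
        # Start the next page right after the last item this page read.
        request["ExclusiveStartKey"] = last_key
```

With boto3 installed this could be called as `scan_all_items(boto3.client("dynamodb").scan, "Orders")`, where `Orders` is a hypothetical table name. boto3 also provides a built-in paginator for Scan (`client.get_paginator("scan")`) that does the same bookkeeping.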

By default, Scan uses eventually consistent reads, although you can request strongly consistent reads by setting the ConsistentRead parameter to true when you start the scan.
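Consistency also affects cost. The sketch below estimates read capacity units (RCUs) for a read of a given total size, assuming DynamoDB's documented rule: 1 RCU per 4 KB for strongly consistent reads, half that for eventually consistent reads, with the total size rounded up to the next 4 KB. For Scan, the size that counts is the data evaluated, not what a filter finally returns. The function name `scan_read_capacity` is hypothetical.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one read capacity unit covers up to 4 KB


def scan_read_capacity(bytes_scanned: int, consistent_read: bool = False) -> float:
    """Estimate RCUs for a Scan/Query that reads `bytes_scanned` bytes in total.

    Assumes the total is rounded up to the next 4 KB, then charged at
    1 RCU per 4 KB (strongly consistent) or 0.5 RCU per 4 KB (eventually consistent).
    """
    blocks = math.ceil(bytes_scanned / RCU_BLOCK_BYTES)
    return blocks * (1.0 if consistent_read else 0.5)


one_mb = 1024 * 1024  # one full Scan page
print(scan_read_capacity(one_mb))                        # 128.0
print(scan_read_capacity(one_mb, consistent_read=True))  # 256.0
```

So under these assumptions, every full 1 MB page costs twice as much when ConsistentRead is true.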

In general, a Scan on a DynamoDB table is an expensive request that can consume a large share of the table's provisioned capacity, specifically its read capacity. In practice, you can reduce the impact of Scan on table performance in the following ways:

  • Reduce page size: by default, each Scan request reads a full 1 MB page. You can set a smaller page size with the Limit parameter, so that each request reads less data, and add pauses between requests.
  • Isolate scan operations: the application works with two tables for different tasks, a "mission-critical" table that serves the main workload and a "shadow" table that handles scan-related workloads.
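The "smaller pages plus pauses" idea can be sketched as a paced scan. This assumes boto3's low-level `scan` call shape and uses the real `Limit` and `ReturnConsumedCapacity` request parameters; the helper name `paced_scan` and the pacing rule (sleep just long enough that each page's consumed RCUs average out to a chosen budget) are hypothetical design choices, not an AWS-provided mechanism.

```python
import time
from typing import Any, Callable, Dict, Iterator


def paced_scan(
    scan_fn: Callable[..., Dict[str, Any]],
    table_name: str,
    page_size: int,
    max_rcu_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict[str, Any]]:
    """Scan in small pages and pause so average read use stays near a budget."""
    request: Dict[str, Any] = {
        "TableName": table_name,
        "Limit": page_size,                 # max items evaluated per request
        "ReturnConsumedCapacity": "TOTAL",  # report the RCUs each page consumed
    }
    while True:
        page = scan_fn(**request)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        request["ExclusiveStartKey"] = last_key
        used = page.get("ConsumedCapacity", {}).get("CapacityUnits", 0.0)
        # Pause long enough that this page's cost spreads out to the budget.
        sleep(used / max_rcu_per_second)
```

The `sleep` argument is injectable so the pacing can be tested without actually waiting.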

Query

A Query operation finds items based on primary key values: you must specify the partition key value, and you can optionally add a condition on the sort key.

A Query always returns a result set; if no items match the condition, the result set is empty. Results are sorted by the sort key, in ascending order by default.
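As a sketch of what a Query request looks like, the helper below builds keyword arguments for boto3's low-level `client.query`. `KeyConditionExpression`, `ExpressionAttributeNames`, `ExpressionAttributeValues`, and `ScanIndexForward` are real request parameters; the helper name `build_query_request`, the `begins_with` prefix condition, and the string-typed (`"S"`) key values are assumptions for illustration.

```python
from typing import Any, Dict, Optional


def build_query_request(
    table_name: str,
    partition_key_name: str,
    partition_value: str,
    sort_key_name: Optional[str] = None,
    sort_prefix: Optional[str] = None,
    descending: bool = False,
) -> Dict[str, Any]:
    """Build kwargs for a low-level DynamoDB Query (e.g. boto3 client.query)."""
    # The partition key condition is mandatory for every Query.
    condition = "#pk = :pk"
    names = {"#pk": partition_key_name}
    values: Dict[str, Any] = {":pk": {"S": partition_value}}
    if sort_key_name is not None and sort_prefix is not None:
        # Optional sort key condition narrows results within the partition.
        condition += " AND begins_with(#sk, :sk)"
        names["#sk"] = sort_key_name
        values[":sk"] = {"S": sort_prefix}
    return {
        "TableName": table_name,
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
        # True (the default) = ascending by sort key; False = descending.
        "ScanIndexForward": not descending,
    }
```

For example, `build_query_request("Orders", "customer_id", "C42", "order_date", "2024-", descending=True)` would describe one customer's 2024 orders, newest first, for a hypothetical `Orders` table.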

Like Scan, Query returns at most 1 MB of data per request. You can query a table by its primary key, or query a local secondary index (LSI) or a global secondary index (GSI).

  • LSI supports strong consistency if you enable it in API calls. The default is eventual consistency.
  • GSI only supports eventual consistency.
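The consistency rules above can be enforced before a request is sent. In this sketch (the helper name `with_read_options` and the `"table"`/`"lsi"`/`"gsi"` labels are hypothetical), asking for a strongly consistent read on a GSI fails early, mirroring the fact that DynamoDB rejects `ConsistentRead=True` on a global secondary index.

```python
from typing import Any, Dict, Optional


def with_read_options(
    request: Dict[str, Any],
    index_name: Optional[str] = None,
    index_type: str = "table",
    consistent_read: bool = False,
) -> Dict[str, Any]:
    """Return a copy of a Query request aimed at a table, an LSI, or a GSI."""
    if index_type not in ("table", "lsi", "gsi"):
        raise ValueError(f"unknown index_type: {index_type!r}")
    if index_type != "table" and index_name is None:
        raise ValueError("index_name is required when querying an index")
    if consistent_read and index_type == "gsi":
        # GSIs only support eventually consistent reads.
        raise ValueError("global secondary indexes do not support strongly consistent reads")
    updated = dict(request)  # leave the caller's request untouched
    if index_name is not None:
        updated["IndexName"] = index_name
    # False (the default) means eventually consistent reads.
    updated["ConsistentRead"] = consistent_read
    return updated
```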

For fast responses from DynamoDB, design your access patterns around Query rather than Scan. Workloads that routinely need full-table scans, such as analytics, are often a better fit for an OLAP data warehouse like Amazon Redshift.
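Why Query is faster comes down to how much data each operation has to read. The toy in-memory model below (not real DynamoDB; the `ToyTable` class and its data are hypothetical) groups items by partition key value, then counts how many items a filtered scan reads compared with a query on one key.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Item = Dict[str, str]


class ToyTable:
    """In-memory toy: items grouped by partition key value."""

    def __init__(self, partition_key: str) -> None:
        self.partition_key = partition_key
        self.groups: Dict[str, List[Item]] = defaultdict(list)

    def put(self, item: Item) -> None:
        self.groups[item[self.partition_key]].append(item)

    def scan(self, predicate: Callable[[Item], bool]) -> Tuple[List[Item], int]:
        """Read every item, then filter; returns (matches, items_read)."""
        matches: List[Item] = []
        items_read = 0
        for group in self.groups.values():
            for item in group:
                items_read += 1
                if predicate(item):
                    matches.append(item)
        return matches, items_read

    def query(self, key_value: str) -> Tuple[List[Item], int]:
        """Go straight to one partition key value; returns (matches, items_read)."""
        group = self.groups.get(key_value, [])
        return list(group), len(group)


table = ToyTable("customer_id")
for c in range(1000):
    for n in range(10):
        table.put({"customer_id": f"c{c}", "order_id": f"o{n}"})

_, scanned = table.scan(lambda item: item["customer_id"] == "c7")
_, queried = table.query("c7")
print(scanned, queried)  # 10000 10
```

Both calls return the same 10 orders, but the scan reads every item to find them, which is the read cost (and capacity usage) that designing around Query avoids.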

Parallel Scans

  • A parallel scan means the application scans a DynamoDB table with several workers at the same time. This works because DynamoDB can logically divide a table into segments that separate workers read independently. However, many concurrent Scan requests can quickly exhaust resources: they can consume all of the table's provisioned read capacity, causing requests to be throttled.
  • To support parallel scans, the DynamoDB Scan API accepts two parameters:
    • TotalSegments: the total number of segments the table is divided into, usually equal to the number of workers that scan the table concurrently
    • Segment: the zero-based index of the segment that a particular worker reads; each worker uses a different value
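A parallel scan along these lines can be sketched with one thread per segment. It assumes boto3's low-level `scan` call shape with the real `Segment` and `TotalSegments` parameters; `scan_segment` and `parallel_scan` are hypothetical helper names. boto3's documentation describes low-level clients as thread-safe, so a single `client.scan` could plausibly be shared across workers. Keep `total_segments` modest, since every segment draws on the same read capacity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

ScanFn = Callable[..., Dict[str, Any]]


def scan_segment(scan_fn: ScanFn, table_name: str, segment: int, total_segments: int) -> List[Dict[str, Any]]:
    """Read every page of one segment; each worker calls this with its own Segment."""
    items: List[Dict[str, Any]] = []
    request: Dict[str, Any] = {
        "TableName": table_name,
        "Segment": segment,               # which segment this worker reads (0-based)
        "TotalSegments": total_segments,  # how many segments the table is split into
    }
    while True:
        page = scan_fn(**request)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # Each segment is still paged at 1 MB per request.
        request["ExclusiveStartKey"] = last_key


def parallel_scan(scan_fn: ScanFn, table_name: str, total_segments: int) -> List[Dict[str, Any]]:
    """Scan all segments concurrently, one thread per segment, and merge the results."""
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan_fn, table_name, segment, total_segments)
            for segment in range(total_segments)
        ]
        results: List[Dict[str, Any]] = []
        for future in futures:
            results.extend(future.result())
    return results
```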
