How to Get Amazon ASINs from Amazon URLs in Ruby

Amazon identifies every product for sale with a ten-character alphanumeric code called an ASIN (Amazon Standard Identification Number). Whenever you visit a product page on Amazon, this code is somewhere in the URL; however, it's not always in the same place. This makes it a little hard to pull the ASIN out of an arbitrary URL.
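To make "not always in the same place" concrete, here is a small sketch that splits two of the URLs from the tests further down into path and query string using Ruby's standard `URI` library. The helper name `path_and_query` is made up for this illustration and is not part of the solution itself.

```ruby
require 'uri'

# Hypothetical helper: return the two parts of a URL where an ASIN can show up.
def path_and_query(url)
  uri = URI.parse(url)
  [uri.path, uri.query]
end

# Here the ASIN sits in the path and there is no query string.
p path_and_query("http://www.amazon.com/dp/C0015T963C")
# => ["/dp/C0015T963C", nil]

# Here the path holds a forum thread ID and the ASIN only appears in the query string.
p path_and_query("https://smile.amazon.com/forum/-/Tx3FTP6XCFXMJAO/ref=ask_dp_dpmw_al_hza?asin=H018Y23P7K")
# => ["/forum/-/Tx3FTP6XCFXMJAO/ref=ask_dp_dpmw_al_hza", "asin=H018Y23P7K"]
```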

There are many ways to solve this problem. I chose to use tests and Ruby regular expressions. This may not be the best way to solve it, but it works for me. If you'd like to learn more about Ruby, I recommend reading The Ruby Programming Language: Everything You Need to Know.

Problem Solving Methodology

First, I went around Amazon clicking on as many links and products as I could to find as many different link formats as possible. I then used those links to build the test below. This solution is part of a Ruby on Rails application, so I show the file location of each code snippet as a comment at the top of the code.

# test/models/amazon_test.rb
require 'test_helper'

class AmazonTest < ActiveSupport::TestCase
  def test_get_asin_from_url
    url = "https://smile.amazon.com/Programmable-Touchscreen-Thermostat-Geofencing-HomeKit/dp/A01LTHM8LG/ref=s9u_simh_gw_i1?_encoding=UTF8&fpl=fresh&pd_rd_i=B01LTHM8LG&pd_rd_r=21CHQY4CPXGGXZ5G3Q71&pd_rd_w=imw1F&pd_rd_wg=CNLFs&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=desktop-1&pf_rd_r=XDXMZR4E839F0JD45S0F&pf_rd_r=XDXMZR4E839F0JD45S0F&pf_rd_t=36701&pf_rd_p=781f4767-b4d4-466b-8c26-2639359664eb&pf_rd_p=781f4767-b4d4-466b-8c26-2639359664eb&pf_rd_i=desktop"
    asin = "A01LTHM8LG"
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin

    url = "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C"
    asin = "B0015T963C"
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin

    url = "http://www.amazon.com/dp/C0015T963C"
    asin = "C0015T963C"
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin

    url = "http://www.amazon.com/gp/product/D0015T963C"
    asin = "D0015T963C"
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin

    url = "http://www.amazon.com/gp/product/glance/E0015T963C"
    asin = "E0015T963C"
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin

    url = "https://smile.amazon.com/gp/offer-listing/F018Y23P7K/ref=dp_olp_all_mbc?ie=UTF8&condition=all"
    asin = "F018Y23P7K"
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin

    url = "https://smile.amazon.com/product-reviews/G018Y23P7K/ref=acr_offerlistingpage_text?ie=UTF8&reviewerType=avp_only_reviews&showViewpoints=1"
    asin = "G018Y23P7K"
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin

    url = "https://smile.amazon.com/forum/-/Tx3FTP6XCFXMJAO/ref=ask_dp_dpmw_al_hza?asin=H018Y23P7K"
    asin = "H018Y23P7K"
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin

    url = "https://smile.amazon.com/gp/customer-reviews/R1VKN59YMEK5PC/ref=cm_cr_arp_d_viewpnt?ie=UTF8&ASIN=I018Y23P7K#R1VKN59YMEK5PC"
    asin = "I018Y23P7K"
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin

    url = "https://smile.amazon.com/Korean-Made-Simple-beginners-learning/dp/1497445825/ref=sr_1_1?ie=UTF8&qid=1493580746&sr=8-1&keywords=korean+made+simple"
    asin = "1497445825"
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin

    url = "http://stackoverflow.com/questions/1764605/scrape-asin-from-amazon-url-using-javascript"
    asin = false
    calculated_asin = Amazon.get_asin_from_url(url)
    assert_equal asin, calculated_asin
  end
end

Notice that I also included a negative test case at the end, because I don't want to return an ASIN if the string being processed isn't an Amazon URL. Also, the ASINs in the test have been modified slightly (I changed the first letter of each one) to make it easy to tell which test case is failing.
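Another way to see which case fails is to keep the URLs and expected values in a table and report the offending URLs directly. The sketch below is only an illustration, not the test I actually run: `ASIN_CASES` and `failing_urls` are hypothetical names, the table holds just three of the cases above, and the extractor is passed in as a block so the snippet runs on its own without the Amazon class.

```ruby
# Hypothetical table of cases: URL => expected result (false when no ASIN should be found).
ASIN_CASES = {
  "http://www.amazon.com/dp/C0015T963C" => "C0015T963C",
  "http://www.amazon.com/gp/product/D0015T963C" => "D0015T963C",
  "http://stackoverflow.com/questions/1764605/scrape-asin-from-amazon-url-using-javascript" => false
}.freeze

# Return the URLs for which the given extractor block does not produce the expected value.
def failing_urls(cases)
  cases.reject { |url, expected| yield(url) == expected }.keys
end

# A deliberately broken extractor that never finds an ASIN:
# the two Amazon URLs are reported, the Stack Overflow URL is not.
puts failing_urls(ASIN_CASES) { |_url| false }
```

Inside the Rails test, the same table could be looped over with `assert_equal expected, Amazon.get_asin_from_url(url), url`, so the failing URL shows up in the failure message.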

Once I had my tests set up, it was pretty easy to just go to town on the regular expressions until I got something that both made sense and passed my tests:

# models/amazon.rb
class Amazon
    # Return the ASIN from a URL copied and pasted by the user
    # Return false if no ASIN is found
    def self.get_asin_from_url(amazon_url)
        if amazon_url.match(/\/dp\/(\w{10})(\/|\Z)/)
            # /dp/B0015T963C
            asin = $1
        elsif amazon_url.match(/\/gp\/\w*?\/(\w{10})(\/|\Z)/)
            # /gp/product/D0015T963C
            asin = $1
        elsif amazon_url.match(/\/gp\/\w*?\/\w*?\/(\w{10})(\/|\Z)/)
            # /gp/product/glance/E0015T963C
            asin = $1
        elsif amazon_url.match(/\/gp\/[\w-]*?\/(\w{10})(\/|\Z)/)
            # /gp/offer-listing/F018Y23P7K
            asin = $1
        elsif amazon_url.match(/\/product-reviews\/(\w{10})(\/|\Z)/)
            # /product-reviews/G018Y23P7K
            asin = $1
        elsif amazon_url.match(/[?&]asin=(\w{10})(&|#|\Z)/i)
            # ?asin=H018Y23P7K
            # &ASIN=I018Y23P7K
            asin = $1
        else
            asin = false
        end
    end
end
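If the if/elsif chain gets long, the same regular expressions can also live in a list that is tried in order. This is a sketch of that alternative, not the code I run in the app: `ASIN_PATTERNS` and `extract_asin` are hypothetical names, and the patterns are copied unchanged from the method above.

```ruby
# The six patterns from Amazon.get_asin_from_url, in the same order.
ASIN_PATTERNS = [
  /\/dp\/(\w{10})(\/|\Z)/,               # /dp/B0015T963C
  /\/gp\/\w*?\/(\w{10})(\/|\Z)/,         # /gp/product/D0015T963C
  /\/gp\/\w*?\/\w*?\/(\w{10})(\/|\Z)/,   # /gp/product/glance/E0015T963C
  /\/gp\/[\w-]*?\/(\w{10})(\/|\Z)/,      # /gp/offer-listing/F018Y23P7K
  /\/product-reviews\/(\w{10})(\/|\Z)/,  # /product-reviews/G018Y23P7K
  /[?&]asin=(\w{10})(&|#|\Z)/i           # ?asin=H018Y23P7K or &ASIN=I018Y23P7K
].freeze

# Return the first captured ASIN, or false if no pattern matches.
def extract_asin(url)
  ASIN_PATTERNS.each do |pattern|
    match = pattern.match(url)
    return match[1] if match
  end
  false
end

puts extract_asin("http://www.amazon.com/gp/product/glance/E0015T963C")
# => E0015T963C
puts extract_asin("https://smile.amazon.com/gp/customer-reviews/R1VKN59YMEK5PC/ref=cm_cr_arp_d_viewpnt?ie=UTF8&ASIN=I018Y23P7K#R1VKN59YMEK5PC")
# => I018Y23P7K
```

One thing this layout makes easy to spot: the path patterns expect the ASIN to be followed by a slash or the end of the string, so a URL where a `?` comes directly after the ASIN (for example `/dp/B0015T963C?tag=x`) would return false. None of the links I collected looked like that, but it may be worth another test case.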

If this helped you, or you have any suggestions on how to improve it, let me know in the comments!

Photo by Robert Scoble