Changing Django's cache behavior so that varying URL parameters do not generate duplicate cache entries

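The title may be a bit confusing, but it will make sense once I describe the actual requirement.
Requirement:
All URLs in this Django project are static-style URLs.
For example: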
http://wjx.bugscaner.com/category/运动健康
http://wjx.bugscaner.com/article/451da7723a018c68
http://wjx.bugscaner.com/account/阿里云优惠券
http://wjx.bugscaner.com/

Enabling caching in Django looks roughly like this:
#! /usr/bin/env python
#coding=utf-8
from django.views.decorators.cache import cache_page


# Cache for 24 hours
@cache_page(60*60*24)
def index(request):
    ...  # view body omitted
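
The post later mentions cache files and a cap of 10,000 entries, which suggests Django's file-based cache backend. The settings below are a sketch under that assumption; the LOCATION directory is hypothetical, since the original does not say where the files are stored.

```python
# settings.py (sketch): file-based cache capped at 10,000 entries.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        # Hypothetical directory; the original post does not name one.
        "LOCATION": "/var/tmp/django_cache",
        "OPTIONS": {
            # The maximum number of entries mentioned in the post.
            "MAX_ENTRIES": 10000,
        },
    }
}
```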


   

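After testing, I found a drawback that makes this impractical for me. Take the home page: visiting http://wjx.bugscaner.com/ generates a cache entry on the local server, so the next request for the home page skips template rendering and is served straight from the cache.
Visiting the home page generates its cache.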


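####### Note: this is where the problem appears!
Visiting http://wjx.bugscaner.com/?id=11111 generates another cache entry, even though its content is identical to the one just generated.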



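If it is just an ordinary user who occasionally mistypes a parameter, that is no big deal. But a malicious user may run injection tools against the site. All URL routes are matched by regular expressions, so the chance of a successful injection is tiny. Still, if someone blindly runs sqlmap or another injection tool, I am in trouble: my cache is capped at 10,000 entries, and the frequent disk I/O hurts server performance and drags down the other sites on the same server.
So let me try it for real: run sqlmap against the home page and see how many cache entries get generated.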



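The number is striking: in just over ten seconds, 663 cache files were generated, all of them for the home page and all with the same content. Annoying, isn't it?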

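OK, down to business. Let's change Django's default rule so that dynamic parameters no longer affect the cache.
The import from django.views.decorators.cache import cache_page shows that the decorator to trace lives in django.views.decorators.cache.
Starting at line 10 of that module, you can see the cache_page function: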
def cache_page(*args, **kwargs):
    """
    Decorator for views that tries getting the page from the cache and
    populates the cache if the page isn't in the cache yet.

    The cache is keyed by the URL and some data from the headers.
    Additionally there is the key prefix that is used to distinguish different
    cache areas in a multi-site setup. You could use the
    get_current_site().domain, for example, as that is unique across a Django
    project.

    Additionally, all headers from the response's Vary header will be taken
    into account on caching -- just like the middleware does.
    """
    # We also add some asserts to give better error messages in case people are
    # using other ways to call cache_page that no longer work.
    if len(args) != 1 or callable(args[0]):
        raise TypeError("cache_page has a single mandatory positional argument: timeout")
    cache_timeout = args[0]
    cache_alias = kwargs.pop('cache', None)
    key_prefix = kwargs.pop('key_prefix', None)
    if kwargs:
        raise TypeError("cache_page has two optional keyword arguments: cache and key_prefix")

    return decorator_from_middleware_with_args(CacheMiddleware)(
        cache_timeout=cache_timeout, cache_alias=cache_alias, key_prefix=key_prefix
    )

Tracing further (skipping a long chain of calls), I found the key functions in this file:
G:\virtualenv\wjx\env\Lib\site-packages\django\utils\cache.py
def _generate_cache_key(request, method, headerlist, key_prefix):
    """Returns a cache key from the headers given in the header list."""
    ctx = hashlib.md5()
    for header in headerlist:
        value = request.META.get(header)
        if value is not None:
            ctx.update(force_bytes(value))
    url = hashlib.md5(force_bytes(iri_to_uri(request.build_absolute_uri())))
    cache_key = 'views.decorators.cache.cache_page.%s.%s.%s.%s' % (
        key_prefix, method, url.hexdigest(), ctx.hexdigest())
    return _i18n_cache_key_suffix(request, cache_key)


def _generate_cache_header_key(key_prefix, request):
    """Returns a cache key for the header cache."""
    url = hashlib.md5(force_bytes(iri_to_uri(request.build_absolute_uri())))
    cache_key = 'views.decorators.cache.cache_header.%s.%s' % (
        key_prefix, url.hexdigest())
    return _i18n_cache_key_suffix(request, cache_key)
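
To see why an extra parameter produces a new cache entry, here is a minimal stdlib-only sketch of the URL part of the key derivation above. sketch_cache_key is a hypothetical helper, not Django's function: it leaves out the header hash and the i18n suffix and assumes the absolute URI is passed in as a plain string.

```python
import hashlib


def sketch_cache_key(absolute_uri, method="GET", key_prefix=""):
    """Simplified stand-in for the URL part of Django's page cache key."""
    # Django hashes the absolute URI, which includes the query string.
    url_hash = hashlib.md5(absolute_uri.encode("utf-8")).hexdigest()
    return "views.decorators.cache.cache_page.%s.%s.%s" % (
        key_prefix, method, url_hash)


plain = sketch_cache_key("http://wjx.bugscaner.com/")
with_arg = sketch_cache_key("http://wjx.bugscaner.com/?id=11111")

# Different query strings hash to different keys, hence separate cache entries.
print(plain != with_arg)
```

Because the whole URI is hashed, every distinct query string a scanner sends maps to its own key, even when the view renders the same page.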

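The key input is the value returned by request.build_absolute_uri(). Tracing the code further:
G:\virtualenv\wjx\env\Lib\site-packages\django\http\request.py
Here is the definition of the HttpRequest class, which includes some initialization data and so on.
Let's see how build_absolute_uri() is written: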
    def build_absolute_uri(self, location=None):
        """
        Builds an absolute URI from the location and the variables available in
        this request. If no ``location`` is specified, the absolute URI is
        built on ``request.get_full_path()``. Anyway, if the location is
        absolute, it is simply converted to an RFC 3987 compliant URI and
        returned and if location is relative or is scheme-relative (i.e.,
        ``//example.com/``), it is urljoined to a base URL constructed from the
        request variables.
        """
        if location is None:
            # Make it an absolute url (but schemeless and domainless) for the
            # edge case that the path starts with '//'.
            location = '//%s' % self.get_full_path()
        bits = urlsplit(location)
        if not (bits.scheme and bits.netloc):
            current_uri = '{scheme}://{host}{path}'.format(scheme=self.scheme,
                                                           host=self.get_host(),
                                                           path=self.path)
            # Join the constructed URL with the provided location, which will
            # allow the provided ``location`` to apply query strings to the
            # base path as well as override the host, if it begins with //
            location = urljoin(current_uri, location)
        return iri_to_uri(location)

It depends on self.get_full_path(), so let's look at how get_full_path is written:
    def get_full_path(self, force_append_slash=False):
        # RFC 3986 requires query string arguments to be in the ASCII range.
        # Rather than crash if this doesn't happen, we encode defensively.
        return '%s%s%s' % (
            escape_uri_path(self.path),
            '/' if force_append_slash and not self.path.endswith('/') else '',
            ('?' + iri_to_uri(self.META.get('QUERY_STRING', ''))) if self.META.get('QUERY_STRING', '') else ''
        )
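
The effect of QUERY_STRING on the path can be sketched without Django. sketch_full_path below is a hypothetical, simplified stand-in for get_full_path: it ignores force_append_slash and approximates escape_uri_path with urllib's quote, so treat it as an illustration rather than Django's exact behavior.

```python
from urllib.parse import quote


def sketch_full_path(path, meta):
    """Simplified stand-in for HttpRequest.get_full_path."""
    query = meta.get("QUERY_STRING", "")
    # The query string is appended only when it is non-empty.
    return quote(path, safe="/") + ("?" + query if query else "")


meta = {"QUERY_STRING": "id=11111"}
before = sketch_full_path("/", meta)

# Emptying QUERY_STRING makes the full path identical to the bare path.
meta["QUERY_STRING"] = ""
after = sketch_full_path("/", meta)

print(before, after)
```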

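It depends on an entry in self.META, so everything is clear now: what we need is to change the value of QUERY_STRING!
How do we change it?
Django's middleware comes in handy here.
Open UliPad, create a new file named filterargs.py containing a class named CacheArgsFilter, and write the following code: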
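#! /usr/bin/env python
#coding=utf-8
# Drop everything after the "?" in the URL so the cache no longer varies with the parameters
'''
For example, with caching enabled on the home page, index?id=1 and index?id=2 each add a new cache entry.
If someone then runs GET-based SQL injection probes, a huge number of cache entries can be generated at once.
'''


class CacheArgsFilter(object):
    def process_request(self, request):
        # Clear the query-string arguments.
        # This site never uses them and they only burden the cache, so drop them all.
        # Only QUERY_STRING is cleared here; POST data is left untouched.
        request.META["QUERY_STRING"] = ''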

Then, in settings.py, add 'middleware.filterargs.CacheArgsFilter', as the first entry of the MIDDLEWARE_CLASSES tuple.
The complete setting looks roughly like this:
MIDDLEWARE_CLASSES = (
    'middleware.filterargs.CacheArgsFilter',
    'django_hosts.middleware.HostsRequestMiddleware',
    #'django.contrib.sessions.middleware.SessionMiddleware',
    #'django.middleware.common.CommonMiddleware',
    #'django.middleware.csrf.CsrfViewMiddleware',
    #'django.contrib.auth.middleware.AuthenticationMiddleware',
    #'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
    #'django.contrib.messages.middleware.MessageMiddleware',
    #'django.middleware.clickjacking.XFrameOptionsMiddleware',
    #'django.middleware.security.SecurityMiddleware',
    'django_hosts.middleware.HostsResponseMiddleware',
)
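
The code above uses the old MIDDLEWARE_CLASSES style. Django 1.10 introduced the MIDDLEWARE setting, and Django 2.0 removed MIDDLEWARE_CLASSES, so on newer versions the same idea would need a new-style middleware. The following is a sketch under that assumption, not the author's code; CacheArgsFilterMiddleware and FakeRequest are hypothetical names, and FakeRequest exists only so the sketch runs without Django.

```python
class CacheArgsFilterMiddleware:
    """New-style sketch of CacheArgsFilter for the MIDDLEWARE setting."""

    def __init__(self, get_response):
        # Django passes in the next handler in the chain.
        self.get_response = get_response

    def __call__(self, request):
        # Clear the query string before the view (and its cache lookup) runs.
        request.META["QUERY_STRING"] = ""
        return self.get_response(request)


class FakeRequest:
    """Stand-in for HttpRequest, used only for this demonstration."""

    def __init__(self, query_string):
        self.META = {"QUERY_STRING": query_string}


seen = []


def fake_view(request):
    # Record what the view would see after the middleware has run.
    seen.append(request.META["QUERY_STRING"])
    return "response"


middleware = CacheArgsFilterMiddleware(fake_view)
result = middleware(FakeRequest("id=11111"))
print(seen, result)
```

In a real project this class would be listed first in MIDDLEWARE by its dotted path, which depends on where the file is placed.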

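That's right, you read it correctly: after all this talk, the useful code amounts to fewer than 100 characters. What a rush, it feels like life has peaked! From now on, just skip to the end of the article!

OK, all done. That's a wrap!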
