Django community: RSS
This page, updated regularly, aggregates blog posts from the Django community.
-
All About TCP (Part 1)
Translated from the original at http://coolshell.cn/articles/11564.html.

TCP is a hugely complex protocol, because it has to solve a great many problems, and those problems spawn sub-problems and dark corners of their own. Learning TCP is therefore a somewhat painful process, but one that teaches you a great deal. For the details of the protocol I still recommend W. Richard Stevens' "TCP/IP Illustrated, Volume 1: The Protocols" (you can, of course, also read RFC 793 and the many RFCs that followed it). Throughout this article I will use English terminology, so that you can use those keywords to find the relevant technical documents.

I wanted to write this article for three reasons. First, to test whether I can describe a protocol as complex as TCP clearly in a modest amount of space. Second, many programmers these days don't sit down and read a proper book, preferring fast-food culture; I hope this fast-food article gives you a feel for this classic piece of technology and for the many difficulties of software design, and that you take away something about design along the way. Most importantly, I hope these fundamentals clear up things you previously only half-understood, and convince you that fundamentals matter. This article is therefore not exhaustive; it is a primer on TCP's protocol, algorithms and principles.

I originally planned a single article, but TCP is so damned complicated, more complicated than C++, with thirty-odd years of variants, optimisations, arguments and revisions, that as I wrote I found I had to cut it into two. Part 1 covers the definition of the protocol and the retransmission mechanisms used when packets are lost. Part 2 focuses on flow control and congestion handling.

Enough preamble. First, we need to know that in the OSI seven-layer model TCP sits at layer 4, the Transport layer; IP is at layer 3, the Network layer; and ARP is at layer 2, the Data Link layer. Data at layer 2 is called a Frame, at layer 3 a Packet, and at layer 4 a Segment. Our program's data is first packed into TCP Segments, which are packed into IP Packets, which in turn are packed into Ethernet Frames. At the other end, each layer parses its own protocol and hands the data up to the layer above.

TCP header format. Looking at the TCP header, you should note a few points. A TCP packet carries no IP addresses; that is the IP layer's business. It does carry source and destination ports. A TCP connection is identified by a four-tuple (src_ip, src_port, dst_ip, dst_port); strictly speaking it is a five-tuple, the fifth element being the protocol, but since we are only discussing TCP here I will speak of the four-tuple.

Four fields in the header are especially important. The Sequence Number is the packet's sequence number, used to solve the network reordering problem. The Acknowledgement Number is the ACK, used to confirm receipt and thus solve the packet-loss problem. The Window, also called the Advertised Window, is the famous Sliding Window, used for flow control. The TCP Flags give the packet's type and are mainly used to drive TCP's state machine. For the remaining fields, see the header diagram.

TCP's state machine. In reality, transmission over the network is connectionless, and that includes TCP. TCP's so-called "connection" is nothing more than a "connection state" maintained by both communicating parties, which makes it look as though a connection exists. TCP's state transitions are therefore very important. Below are the TCP state-machine diagram (image source) and the diagrams for connection setup, teardown and data transfer, placed side by side so you can compare them. These two diagrams are extremely important and worth memorising. (An aside: when you see a state machine this complicated, you know the protocol is complicated, and complicated things always have plenty of traps, so TCP has its share.)

Many people ask: why does establishing a connection take a 3-way handshake, while tearing one down takes 4 waves?
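To make the header layout concrete, here is a small sketch (my illustration, not from the original article) that packs and parses the fixed 20-byte TCP header described above with Python's struct module. The checksum is left at zero, since a real stack computes it over a pseudo-header:

```python
import struct

def build_tcp_header(src_port, dst_port, seq, ack, flags, window):
    """Pack the fixed 20-byte TCP header (no options; checksum left at 0)."""
    offset_flags = (5 << 12) | flags  # data offset = 5 words (20 bytes), then flag bits
    return struct.pack('!HHIIHHHH',
                       src_port, dst_port,
                       seq, ack,
                       offset_flags, window,
                       0,   # checksum: real stacks compute this over a pseudo-header
                       0)   # urgent pointer

def parse_tcp_header(data):
    """Unpack the fixed part of a TCP header into a dict."""
    src, dst, seq, ack, off_flags, window, _csum, _urg = struct.unpack('!HHIIHHHH', data[:20])
    return {'src_port': src, 'dst_port': dst, 'seq': seq, 'ack': ack,
            'flags': off_flags & 0x1FF, 'window': window}

hdr = build_tcp_header(443, 52000, seq=1000, ack=0, flags=0x002, window=65535)  # 0x002 = SYN
print(parse_tcp_header(hdr))
```

Note how the header carries ports, Sequence Number, Acknowledgement Number, flags and window, but no IP addresses, exactly as described above.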
The 3-way handshake exists mainly to initialise the Sequence Number. Each side must tell the other its Initial Sequence Number (ISN), hence SYN, short for Synchronize Sequence Numbers; these are the x and y in the diagram above. This number serves as the sequence base for all subsequent data, guaranteeing that the application layer never receives data out of order no matter what the network does to it (TCP uses this number to reassemble the data).

As for the 4-way teardown: look carefully and it is really 2 + 2. TCP is full duplex, so sender and receiver each need their own FIN and ACK; one side is merely passive, which makes it look like four separate waves. If both sides close simultaneously, both enter the CLOSING state and then reach TIME_WAIT. The diagram below shows a simultaneous close (again, compare it with the state machine).

A few further things deserve attention.

SYN timeout during connection setup. Suppose the server receives the client's SYN, replies with SYN-ACK, and the client then drops offline. The server never receives the client's ACK, and the connection hangs in an intermediate state, neither succeeded nor failed. The server therefore retransmits the SYN-ACK if it hears nothing within a timeout. On Linux the default is 5 retries, with the interval doubling each time starting from 1s: the 5 retry intervals are 1s, 2s, 4s, 8s and 16s, 31s in total, and after the 5th retransmission the server waits a further 32s before concluding that the 5th attempt has also timed out. So in total it takes 1s + 2s + 4s + 8s + 16s + 32s = 2^6 - 1 = 63s before TCP tears the connection down.

SYN Flood attacks. Malicious actors exploit exactly this: send the server a SYN and go offline, and by default the server waits 63s before dropping the connection. An attacker can thereby exhaust the server's SYN queue so that legitimate connection requests cannot be served. Linux offers the tcp_syncookies parameter to deal with this: when the SYN queue is full, TCP constructs a special Sequence Number (the cookie) from the source address and port, the destination address and port, and a timestamp, and sends it back. An attacker never responds; a legitimate client echoes the SYN cookie back, and the server can then establish the connection from the cookie alone, even though the client is no longer in the SYN queue. Please note: do not use tcp_syncookies to handle a normal heavy load of connections, because syncookies are a compromised version of TCP and not rigorous. For legitimate load there are three TCP parameters to tune instead: tcp_synack_retries, to reduce the retry count; tcp_max_syn_backlog, to enlarge the SYN queue; and tcp_abort_on_overflow, to simply refuse connections when you cannot keep up.

ISN initialisation. The ISN cannot be hard-coded, or things break. For example, suppose every connection used 1 as its ISN: the client sends 30 segments, the network dies, the client reconnects and uses 1 as the ISN again, and then the old connection's packets arrive and are treated as belonging to the new connection; the client's Sequence Number might be 3 while the server believes the client is at 30. Total chaos. RFC 793 says the ISN is bound to a notional clock that increments it every 4 microseconds, wrapping past 2^32 back to 0, so one ISN cycle takes about 4.55 hours. Since we assume a TCP Segment does not survive in the network longer than the Maximum Segment Lifetime (MSL; see the Wikipedia entry), an ISN will not be reused as long as the MSL is below 4.55 hours.

MSL and TIME_WAIT. From the ISN discussion above you can see where MSL comes from. Notice that in the TCP state diagram, the transition from TIME_WAIT to CLOSED has a timeout of 2*MSL (RFC 793 defines MSL as 2 minutes; Linux uses 30s). Why have TIME_WAIT at all, rather than going straight to CLOSED? Two main reasons: 1) TIME_WAIT ensures there is enough time for the peer to receive the final ACK; if the passive closer missed it, it will retransmit its FIN, and that round trip is exactly 2 MSL; 2) it gives this connection enough time not to get mixed up with a later one (bear in mind that some over-helpful routers cache IP packets, and if the connection's four-tuple were reused, those delayed packets could land in the new connection). See the article "TIME_WAIT and its design implications for protocols and
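The 63-second figure follows directly from the exponential backoff; here is a two-line sketch of the arithmetic (my illustration, not from the original article):

```python
# SYN-ACK retransmission backoff on Linux (default of 5 retries):
# the interval doubles each time, and after the final retry the kernel
# waits one more doubled interval before giving up on the connection.
retries = 5
intervals = [2 ** i for i in range(retries)]  # [1, 2, 4, 8, 16] seconds, 31s in total
total = sum(intervals) + 2 ** retries         # plus the final 32s wait = 63s
print(intervals, total)
```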
scalable client server systems".

Too many TIME_WAITs. From the description above you can see that TIME_WAIT is an important state, but under a heavy load of short-lived connections there can be far too many of them, which consumes a lot of system resources. Search the web and nine answers out of ten will tell you to set two parameters, tcp_tw_reuse and tcp_tw_recycle, both of which default to off; recycle is the more aggressive of the two, reuse the gentler. Note also that tcp_tw_reuse only takes effect if tcp_timestamps=1 is set as well. Be warned: turning these on is a sizeable trap and can cause strange TCP connection problems (as described above, if you reuse a connection without waiting out the timeout, new connections may fail to establish; as the official documentation puts it, "It should not be changed without advice/request of technical experts").

On tcp_tw_reuse. The official documentation says that tcp_tw_reuse together with tcp_timestamps (part of PAWS, Protection Against Wrapped Sequence Numbers) is safe from the protocol's point of view, but tcp_timestamps must be enabled on both ends (you can read the tcp_twsk_unique source). My personal estimate is that some scenarios will still misbehave.

On tcp_tw_recycle. If tcp_tw_recycle is enabled, the kernel assumes the peer has tcp_timestamps enabled and compares timestamps, reusing the slot if the timestamp has grown. But if the peer is behind NAT (for example, a whole company sharing one public IP) or the peer's IP has been reused by another machine, things get complicated: the connecting SYN may simply be dropped (you may see connection timed out errors). (If you would like to inspect the Linux kernel code, see tcp_timewait_state_process.)

On tcp_max_tw_buckets. This caps the number of concurrent TIME_WAITs, with a default of 180000. Beyond the limit, the system destroys the excess and logs a warning ("time wait bucket table overflow"). The official documentation says this parameter exists to resist DDoS attacks, and that the default of 180000 is not small; judge it against your actual situation.

Again: using tcp_tw_reuse and tcp_tw_recycle to solve a TIME_WAIT problem is very, very dangerous, because both parameters violate the TCP protocol (RFC 1122). In fact, TIME_WAIT means you closed the connection actively, so this is a case of "if you don't ask for trouble, trouble won't find you". Consider: if you make the peer close the connection, the whole wretched problem becomes theirs. And if your server is an HTTP server, this is why setting HTTP keep-alive matters so much (the browser will reuse one TCP connection for several HTTP requests); then let the client close the connection (though be careful: browsers can be very greedy and will not close a connection until they absolutely must).

Sequence Numbers during data transfer. Below is a Wireshark capture of data transfer while I was browsing coolshell.cn, showing how the SeqNum changes (use Statistics -> Flow Graph… in the Wireshark menu). You can see that the SeqNum grows with the number of bytes transmitted. In the capture above, after the three-way handshake come two packets of Len: 1440, and the second packet's SeqNum is 1441. The first ACK returned is 1441, meaning the first 1440 bytes were received.

Note: if you watch the 3-way handshake in Wireshark, the SeqNum always appears to be 0. It isn't; for friendlier display, Wireshark uses a Relative SeqNum. Untick it in the protocol preferences (right-click menu) and you will see the Absolute SeqNum.

TCP retransmission. TCP must guarantee that every packet arrives, so it needs a retransmission mechanism. Note that the receiver's ACK only acknowledges the last contiguous packet. For example, the sender sends five pieces of data, 1 to 5, and the receiver gets 1 and 2, so it ACKs 3; it then receives 4 (note that 3 has not arrived). What does TCP do now? As explained earlier, SeqNum and Ack are counted in bytes, so the ACK cannot skip ahead; it can only acknowledge the largest contiguous prefix received, otherwise the sender would believe everything before it had arrived.

Timeout retransmission. One option is not to ACK and to wait stubbornly for 3. When the sender notices no ACK for 3 and times out, it retransmits 3. Once the receiver has 3, it ACKs back
4, meaning both 3 and 4 have been received.

This approach has a fairly serious problem, though: because everything waits on 3, the sender learns nothing about 4 and 5 even when they have already arrived. With no ACK coming back, the sender may pessimistically assume they were lost too, possibly causing 4 and 5 to be retransmitted as well. There are two choices here: retransmit only the timed-out packet, i.e. packet 3; or retransmit everything from the timeout onwards, i.e. packets 3, 4 and 5. Each has pros and cons: the first saves bandwidth but is slow; the second is a little faster but wastes bandwidth and may do useless work. Neither is great overall, because both sit waiting for the timeout, which can be long (Part 2 explains how TCP computes the timeout dynamically).

Fast Retransmit. TCP therefore introduced an algorithm called Fast Retransmit, driven by data rather than by time. If packets stop arriving contiguously, the receiver keeps ACKing the packet that may have been lost; if the sender receives the same ACK three times in a row, it retransmits. The benefit of Fast Retransmit is that it does not wait for the timeout. For example: the sender sends packets 1-5; the first arrives, so the receiver ACKs 2; packet 2 is lost for some reason; 3 arrives, and the receiver still ACKs 2; 4 and 5 both arrive, but the receiver still ACKs 2, because 2 has still not arrived. The sender, having now received three ACK=2 confirmations, knows that 2 never arrived and retransmits it immediately. The receiver then gets 2, and since 3, 4 and 5 are already in hand, ACKs 6. See the diagram.

Fast Retransmit only solves one problem, the timeout; it still faces the hard choice of retransmitting just the one earlier packet or everything. For the example above, do we retransmit #2 alone, or #2, #3, #4 and #5? The sender cannot tell which packets produced those three ACK(2)s. Perhaps it sent 20 packets and they came from #6, #10 and #20. The sender may well end up retransmitting the whole pile from 2 to 20 (which is what some actual TCP implementations do). Clearly, this is a double-edged sword.

SACK. A better approach is Selective Acknowledgment (SACK; see RFC 2018), which adds a SACK field to the TCP header. The ACK remains the same cumulative ACK as in Fast Retransmit; SACK additionally reports which fragments of data have been received. See the diagram. The sender can then tell from the returned SACK exactly which data arrived and which did not, which improves the Fast Retransmit algorithm. Both ends must support it, of course; on Linux, the tcp_sack parameter enables this feature (on by default since Linux 2.4).

One further issue to note is receiver reneging: the receiver is permitted to discard data it has already reported to the sender in a SACK. Doing so is discouraged, because it complicates matters, but a receiver might do it in extreme situations, for example to give the memory to something more important. The sender therefore cannot rely entirely on SACK; it must still rely on the cumulative ACK and maintain its timeouts, and if subsequent ACKs fail to advance, it must retransmit the SACKed data anyway. Moreover, the receiver must never mark SACKed data as acknowledged.

Note: SACK costs the sender resources. Imagine an attacker feeding the data sender a pile of SACK options, forcing the sender to retransmit or even walk through everything it has sent; that consumes a great deal of the sender's resources. For the details, see "The performance trade-offs of TCP SACK".

Duplicate SACK: the duplicate-data problem. Duplicate SACK, or D-SACK, uses SACK to tell the sender which data was received more than once. RFC 2883 describes it in detail with examples; the examples below come from it. D-SACK uses the first SACK block as a flag: if the first SACK block's range is covered by the ACK, it is a D-SACK; if the first SACK block's range is covered by the second SACK block, it is a D-SACK.

Example one: lost ACKs. In the example below, two ACKs are lost, so the sender retransmits the first data packet (3000-3499). The receiver notices the duplicate and replies with SACK=3000-3500; since the ACK has already reached 4000, meaning everything before 4000 was received, this SACK is a D-SACK. Its purpose is to tell the sender "I received duplicate data", and the sender now also knows the data packet was not lost; what was lost were the ACKs. … -
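The cumulative-ACK rule and the triple-duplicate-ACK trigger can be simulated in a few lines. This sketch (my illustration, not from the original article) models a receiver that always ACKs the lowest packet it is still missing:

```python
def receiver_acks(arrivals):
    """Return the ACK sent after each arriving packet (packets numbered from 1).
    Cumulative ACK rule: the receiver always ACKs the lowest packet number
    it has not yet received, never skipping ahead."""
    received = set()
    acks = []
    for pkt in arrivals:
        received.add(pkt)
        expected = 1
        while expected in received:
            expected += 1
        acks.append(expected)
    return acks

# Packets 1-5 are sent; packet 2 is lost in transit.
print(receiver_acks([1, 3, 4, 5]))     # [2, 2, 2, 2]: three duplicate ACK(2)s
print(receiver_acks([1, 3, 4, 5, 2]))  # ends with ACK 6 once the retransmitted 2 arrives
```

The three duplicate ACK(2)s in the first run are exactly what triggers the sender's fast retransmit of packet 2.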
E-Commerce Platform Options
The big name in open source e-commerce these days is Magento. In my previous job it was just too early and Magento was pretty buggy, but it now seems to be the number one choice of the people I talk to in the industry. The main reason I'm not keen on it is that it is quite a big code base to learn, and secondly it is in PHP. My attitude to PHP is similar to most French people's attitude to English: I can speak the language but I find it very inelegant and I'm really not keen on using it day-to-day. Over in Python-land I have a few options. Django Shop seems the best bet as a framework for building from, but it seems pretty early days and most of my needs are different, so I would end up with the vast bulk of the code being custom. Incidentally I picked up Beginning Django E-Commerce which for me was probably a bit basic, but I would thoroughly recommend it for anyone new to e-commerce or Django. -
Python Web Frameworks
I thought long and hard about which framework to choose for this project. My first exposure to Python came from Zope. I really don't like being negative about projects and technologies, and I met some very nice people in the Zope community. However, I think Zope was a major factor in dooming the project I was working on to failure. The problem we had was that our development was very slow and we had big scaling problems. Zope had a very steep learning curve, and while scaling it is theoretically possible, it makes life very difficult. I got the chance to see Zope deployed in a variety of larger settings and every single one struggled with scaling. The other issue for me (which is very subjective) is that I simply didn't enjoy developing with Zope; I felt the framework kept pushing me in the wrong direction. I looked at Twisted, and I know some very major websites using it and the performance is unbelievable; in fact I would go as far as to say it is the best-performing of any Python framework. However, I simply couldn't understand it! I'm sure if I'd persevered I would have got it … -
Launch!
It happened! Our first sale today. Somebody actually came to our website and bought from us! The whole system is working pretty much as planned. I find that I make fewer mistakes when I use Django, and everything is going so much faster than I'm used to. I suppose there are a few other factors at play: The team is tiny, so there is never much discussion. The codebase is also tiny; it is less than 1% the size of the last e-commerce codebase I worked with, so there simply are fewer things to go wrong and everything is very easy to understand. And I'm working in Python rather than PHP, so I have a language pushing me to do the right thing. Onwards and upwards. -
Supplier Extranet
One of the big challenges in E-Commerce is managing stock: if you run out it is a disaster, but if you order too much that's also a disaster! Working with the manufacturers is always tricky because as soon as you get to any sort of scale you can't simply buy from them; you need to give them forecasts and help them prepare their supply chains too. I'm always keen to automate away tasks, so I've created an extranet to allow my manufacturers to get real-time information on rate of sale (of their products) and to view rate of sale (ROS) per SKU. The slightly tricky thing here is security. The last time I did this I was working with Zen Cart, which doesn't have a concept of permissions: you are either an administrator or not. Fortunately Django's authentication framework is great, very easy to use, and allows finer-grained access rights. I was thinking about generating weekly emails but I've decided not to bother. The web is just a much better system for this type of problem. -
Mea Culpa
Today we had a pretty bad software problem which cost us nearly half the day's sales. We then had another issue due to a failed fix (although this didn't cost any sales). Although we had many difficulties at launch, the software platform wasn't one of them. It worked very well. The reason was that after it was finished it was tested extensively by Clare before there was a live deployment. Her testing caught many, many errors. Post launch there has been a very substantial redevelopment of the code. In particular, introducing PayPal and introducing i18n (which is still underway) have resulted in very major changes. I have, however, been testing the code myself rather than putting it out to someone else. This has meant that some of the bugs have got through to customers. From now on, I will get Clare to test all major changes. The other issue has been that my deployment to the live server is not automatic. It was always the plan to automate it via git, but despite spending half a day on it I couldn't get it to work. So it was left as a manual and complex process which needless to say eventually … -
Currencies
I think that in e-commerce it is vital to bill customers in their own currency. Certainly for us, with the UK accounting for such a tiny proportion of the Nespresso capsule market, export will be very important. I've finally got round to deploying the completed infrastructure for euro billing. All it takes is a simple change in the config file and the site will start using euros. As soon as Streamline get themselves in gear we will be able to launch an Irish website in euros. It will also be much easier to add other currencies in time. I do really wish that Django had support for currencies rather than leaving it up to the developer, and a quick Google shows that I am not alone in wanting it. There is at least a decimal type, but that's not quite the same. The big advantage of currency being built in is that it would make it easier for the different e-commerce projects using Django to share code. -
Test Suite
I've spent the past few days putting together a proper test suite. It's been a lot of work: 813 lines of code and 159 specific tests. It covers the bug that caused the site failure I blogged about, and from now on whenever there is a bug I will add a test for it to the suite to ensure it never happens again. I'm also using coverage.py, which is a great way of spotting what's still to be done. The tests are a mix of unit tests on very specific parts of the codebase and functional tests that go completely through the process of signing up for the site and placing an order, and of an existing customer placing an order. These tests go right through to charging a card on the SagePay test servers. In a commercial environment it's very hard to get time to spend on quality. There's always a big to-do list, and "Let's just pause for a while to improve quality" is a hard thing to say. The benefits are that I can now go a bit faster with development and make changes with more confidence as I can be more sure … -
First Euro Transaction and a Bug
Today was a bit of a milestone as we launched our Irish Nespresso capsule site using the sites framework and processed our first euro transaction. Everything seemed fine. Unfortunately things then went wrong and a lot of our UK customers ended up presented with euro pricing! Up to this point I'd only had one development server, but I quickly set up two different dev servers, one for each country. After a while I was able to reproduce the error on the dev servers, with the currencies appearing wrongly on the different sites. After a bit of Googling I came across this post by the wonderful Graham Dumpleton (is it wrong to love another man?). It turns out I had actually stumbled across my first bug in Django, but fortunately there is a simple fix. -
Porting to Zope
After much thought I've decided to port my application away from Django to Zope. I've just been finding things too straightforward with Django and I miss the days of struggling with problems only to discover they were bugs in the framework or the challenge of getting the ZODB to scale. I'm also really missing XML. The only thing that makes me happier than coding in Python is working in XML. Update: This was of course a very poor April Fools joke! -
Heroku
I hurt my back yesterday and as a result I'm stuck in bed. So I thought I would make use of the time by evaluating Heroku. I am overall very impressed, and next time I do a project I will deploy it on Heroku from the start. However, I've come across a few problems. The first one is that Heroku requires the use of a CNAME. Unfortunately CNAMEs can only be used if the domain is in the form http://www.finecoffeeclub.co.uk/ rather than http://finecoffeeclub.co.uk/ Changing the address is out, as I would need to re-do several extended validation SSL certificates. Certain DNS providers have their own method of doing a CNAME at the root, but unfortunately Freeparking doesn't. I would also have to do some work around getting a fixed IP endpoint for Sagepay, use of the temporary file system, and setting up storage for the static assets. On the other hand: I have no scaling issues with the present infrastructure; indeed I forecast that the present single server could handle 50 to 100 times more traffic, and to date there hasn't been a problem with the infrastructure. I do though have quite a lot of other things to do overall migrating … -
MailChimp
One way or another e-commerce tends to involve quite a bit of email! I had thought about doing it myself, and I did run a site a long time ago which fired out a lot of email. You can get it to work, but it really is jolly hard work. You need to implement SPF and DomainKeys and also get in contact with Microsoft and Yahoo! if you want your mail delivered (Google, as usual, just works perfectly without any hassle at all). Anyway, I've got too many other important tasks, so I am using MailChimp. They have a brilliant API which allows really good reporting and very detailed (and easy!) segmentation. It is so nice for non-technical people to be able to mail a segment without requiring custom SQL. I've written a cronjob which calls a Django management command every day to upload my new customers and checkout abandonments. I'm also using MailChimp webhooks to call back to Django when customers unsubscribe. I've added MailChimp's e-com 360 tracking to Django so we've got reports of revenue from each campaign. I've used some incredibly expensive enterprise mail packages before, but MailChimp is quite simply streets ahead and tens … -
The American Dream
Americans are beginning to get the Nespresso bug so I think the time has come for us to launch a USA Nespresso capsule site. The good news is that the currency work is already done so all we need to do is fix the English. The Americans really are much more sensible than us when it comes to writing English, they do cut out a whole lot of unnecessary vowels and generally spell words closer to their pronunciation. Django has great i18n and I'm translating from en-gb to en-us. The only issue is that out of the box the LocaleMiddleware sets the language based on the web browser. However I want the language set based on which URL the user is on. Fortunately it's a trivial thing to fix. All I've done is put this into my custom middleware.py: request.LANGUAGE_CODE = settings.LANGUAGE_CODE And then I can set LANGUAGE_CODE in the relevant settings.py One thing that slightly got me is that locale codes are different to language codes. A locale code looks like en_US while a language code is en-us. It does matter which you use. -
Encoding Problem
It was pointed out today that the euro sign was not displaying properly on the news page. I had a look and saw that the problem was that the database encoding was set to Latin-1 instead of UTF-8. It turns out that was the default on the version of Debian I started with, and as I've dumped and restored the database the encoding has remained the same. The solution is relatively simple: dump the database and reload it with the correct encoding. This could be done during a short maintenance window in the middle of the night. But no! As the sun no longer sets on the Fine Coffee Club empire, there is no particularly good time when we can shut down for maintenance. Update: It turned out to only require a couple of minutes of outage. However the point remains. -
Currency Refactor
These are technical notes, mainly for myself, on the currency refactor. The background is that currency handling currently lives in the presentation layer, in the templates. As I'm expanding currencies this is getting more complex, and there is also a problem that invoices are generated as a PDF, so I have to write the currency code twice: once for the templates and once for the PDF, which breaks the DRY (Don't Repeat Yourself) rule. The code has to deal with the following:

Changing currency symbol, e.g. £1.00 and $1.00.
Changing decimal separator, e.g. €1.00 and €1,00.
Changing currency symbol position, e.g. €1.00 and 1,00€.
Different currencies with the same symbol, e.g. Canadian dollars (CAD) and US dollars (USD).

An additional requirement is that I want to be able to add currencies relatively frequently without changing too much code. Pushing this down into the model layer should make things a bit easier, but there is quite a lot of complexity that my design has to cope with:

Products and Shipment Methods have multiple prices, one for each currency, chosen by settings.py.
Baskets have one currency for multiple totals, depending on settings.py.
Shipment and Order have the currency in the model and multiple totals.
Admin interface reports … -
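As a starting point for the model-layer design, the four cases above can be captured in a small formatting table. This is my own sketch with made-up rule data, not the site's actual code; adding a currency then means adding one row rather than touching templates and PDF code:

```python
from decimal import Decimal

# Per-currency display rules: (symbol, decimal separator, symbol trails amount).
# Illustrative data only; a real table would live in the model layer.
CURRENCY_RULES = {
    "GBP": ("£", ".", False),
    "USD": ("$", ".", False),
    "CAD": ("CA$", ".", False),   # distinct display for same-symbol currencies
    "EUR_IE": ("€", ".", False),  # Ireland: €1.00
    "EUR_FR": ("€", ",", True),   # France: 1,00€
}

def format_price(amount, currency):
    """Render an amount using the currency's symbol, separator and position."""
    symbol, sep, trailing = CURRENCY_RULES[currency]
    text = f"{Decimal(amount):.2f}".replace(".", sep)
    return f"{text}{symbol}" if trailing else f"{symbol}{text}"

print(format_price("1", "GBP"))     # £1.00
print(format_price("1", "EUR_FR"))  # 1,00€
print(format_price("1", "CAD"))     # CA$1.00
```

Both the templates and the PDF generator could then call the same function, which is the DRY win the refactor is after.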
Testing Update
I'm off on holiday on Friday for a week without much in the way of internet access, so I'm going to have a code freeze as of now. The final code I committed was i18n for the French launch. Unfortunately there was a bug in one of the live site settings files which caused some problems on the Irish site. I fixed it and wrote a test which checks all the major settings across all the countries. I also ran coverage.py on the codebase, which shows the percentage of the code covered by tests and can generate a nice HTML report. At the moment it is 61%. During the code freeze I am going to focus on adding unit tests to try to get the coverage over 70% this week. In other news, outsourcing of fulfilment is going well, with Seko chosen, who have warehousing in the USA, UK, Europe, Australia and other locations, giving us a complete global presence. They are really a bit big for us, but that means we can stick with them long term. There's going to be quite a lot of work in integrating into their API, but it is very well documented. It also means … -
Django / ZenDesk API
Anna was off and I was left doing all of the email support, which we do through the excellent Zendesk. After about 10 minutes I was deeply frustrated with the process of finding customer orders on our system. So I've written a simple application which links straight from a ticket to the relevant orders with just a couple of clicks. It has saved an enormous amount of time. I'm going to generalise this slightly by getting it to bring up the Django user, and then I'll publish it as the first bit of code to be released. -
Django Errors Going to Spam
Like many sites I've set up Django to email me when there are errors. Unfortunately the errors were all ending up in my spam. I had a look at the mail system and there were no problems there. The first issue was very simple: I'd not set SERVER_EMAIL in settings.py, which meant that the emails were being sent from 'root@localhost', which spam filters are not going to like. The second problem was the sheer volume of emails, which was also a problem for me in that I stopped taking them seriously. The Django ALLOWED_HOSTS setting improves security but generates a lot of errors (this has been fixed in the development version of Django, but for now it fills up the log). I found an excellent article, Prevent email notification on SuspiciousOperation, with detailed code which I've implemented, and it's made a huge difference. -
Blog Move
I've decided to move my blog from WordPress.com onto Django on the Fine Coffee Club site. I put together a simple blogging app in about 100 lines of code. I'm struggling to write a proper justification for this; WordPress worked perfectly well and the last thing the world needed was another blogging product. The best I can come up with is that I wanted to play with the syndication framework, and it gave me the chance to play around with some other odds and ends in the framework that I don't need in my day-to-day e-commerce. It is also easier to customise when it is on your own server, and I would rather not install WordPress, as that would also require PHP and MySQL to be installed, which are a little heavyweight to be on my main server. It's a very weak justification I know! -
Reverse Is A Good Idea
Up to now the Fine Coffee Club websites have each had their own domain, like this:

finecoffeeclub.com
finecoffeeclub.co.uk
finecoffeeclub.ie
finecoffeeclub.fr

However, for Australia and Canada we are not going to give them their own domain names, for various good reasons, so these addresses will be:

finecoffeeclub.com/au
finecoffeeclub.com/ca

I thought it would be an easy change to make, but it turned out to be quite tricky, as all of the links in the templates and more than 100 redirects in the Python code will have to change. It would also be a bit of a pain to work out exactly what the URLs should be, as some of them will be prefixed with the country code and some of them not. I am now porting everything over to using reverse resolution, and I wish I had done this from the start. Up till now I hadn't really appreciated how useful an idea it is. With the url tag I can now leave the whole thing up to Django. -
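The idea behind reverse resolution can be reduced to a toy sketch (my own illustration, nothing like Django's real implementation): name each route once, derive every URL from the name plus its arguments, and a change of prefix touches one table instead of a hundred templates and redirects:

```python
# One table of named routes; the country prefix lives here and nowhere else.
# Route names and patterns are invented for illustration.
ROUTES = {
    "order-detail": "/{country}/orders/{order_id}/",
    "basket": "/{country}/basket/",
}

def reverse(name, **kwargs):
    """Toy version of Django's reverse(): build a URL from its route name.
    Templates and redirects ask for the name; only this table knows the shape."""
    return ROUTES[name].format(**kwargs)

print(reverse("order-detail", country="au", order_id=42))  # /au/orders/42/
print(reverse("basket", country="ca"))                     # /ca/basket/
```

This is exactly what the url template tag and reverse() give you for free in Django, with proper pattern matching on top.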
Converting our multi-page Django app to use AMD
Introduction The application which powers 2degrees has, to date, mainly been driven by a Django-powered back-end. Django is agnostic about any front-end stack, and this has certainly contributed to a fairly ad-hoc approach to JavaScript. With an increasing demand for a more interactive experience, we have been adding more and more JavaScript, experimenting with AngularJS to power our pinboard, and looking at ways to use/build frameworks that play nicely with the back-end. As a result of this work, we have found issues with the manageability of the JavaScript code. Manageability issues The issues we had fell into the following categories: Risk of missing dependencies when inheriting complex pages with lots of JavaScript. Dependencies loading out-of-order in some cases. The code becoming increasingly complex to manage (poor separation of concerns in the code, etc.) AMD to the rescue? We hoped that a modular approach to JavaScript would address all of these issues and more. After some research into various module patterns and loaders, we decided to use RequireJS. This decision was based mainly on API features and how well-maintained and well-documented the library is. Porting the codebase to using RequireJS After performing a spike to assess the complexity of the task, we … -
Using uncss to find unused CSS
As we know, keeping code lean and tidy is crucial to its readability, a point covered in an earlier post. But when modifying CSS, the sheer volume of CSS code and CSS's own overridable nature mean we tend only to add rules and never remove them. This post introduces uncss, a tool for finding unused CSS code so that we can slim the code down.

uncss's most basic usage is from the command line:

uncss http://example.com > styles.css

The output is the CSS actually in use; the unused CSS has already been removed automatically. How does uncss work?

The HTML files are loaded via PhantomJS and the relevant JavaScript is executed.
The stylesheets in use are extracted from the HTML files.
The CSS is recombined via css-parse.
Selectors that document.querySelector cannot find in the HTML are separated out.
The remaining CSS is put back together.

Like every Node.js tool, it also exposes a JavaScript API: var uncss = require('uncss'); var files = ['my', 'array', 'of', 'HTML', 'files'], options = { ignore : ['#added_at_runtime', /test\-[0-9]+/], media : ['(min-width: 700px) handheld and (orientation: landscape)'], csspath : '../public/css/', raw : 'h1 { color: green }', stylesheets : ['lib/bootstrap/dist/css/bootstrap.css', 'src/public/css/main.css'], ignoreSheets : [/fonts.googleapis/], urls : ['http://localhost:3000/mypage', '...'], // Deprecated timeout : 1000, htmlroot : 'public' }; uncss(files, options, function (error, output) { console.log(output); }); /* Look Ma, no options! */ uncss(files, function (error, output) { console.log(output); }); /* Specifying raw HTML */ var raw_html = '...'; uncss(raw_html, options, function (error, output) { console.log(output); }); -
Extending Django's QuerySet to return approximate COUNTs
UPDATE: I've re-written and open-sourced a better way of doing the below as part of my library django-mysql. The docs there on approximate counting are just as good a read as the below, and you can pip install the solution. I was looking through the MySQL slow_log for YPlan and discovered that there were a lot of SELECT COUNT(*) queries going on, which take a long time because they require a full table scan. These were coming from the Django admin, which displays the total count on every page. "Why is SELECT COUNT(*) such a slow query?" you might think, "surely MySQL could just keep a number in the table metadata and update it on INSERT/DELETE." Aha! You are totally right - for the MyISAM storage engine. But InnoDB, which you should be using, provides transactional support and other niceties, at the cost of making such a metadata count impossible. Each transaction must be isolated from the others until it commits or rolls back, so a single 'accurate' COUNT(*) value per table is impossible. It would also be a point of contention from locking, which MyISAM doesn't care about anyway because it locks the whole table for any write. Hence, …
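The usual MySQL trick, and roughly the idea behind the open-sourced approach mentioned above, is to read the optimiser's row estimate instead of scanning. Below is my own sketch of the shape of that query against a hypothetical DB-API cursor; the stub cursor exists only to make the example self-contained, and InnoDB's estimate can be off by a sizeable margin:

```python
def approx_count(cursor, table):
    """Return the optimiser's row estimate for `table` via EXPLAIN,
    avoiding the full table scan that an exact COUNT(*) needs on InnoDB."""
    cursor.execute(f"EXPLAIN SELECT COUNT(*) FROM `{table}`")
    row = cursor.fetchone()        # a dict-style row is assumed here
    return int(row["rows"])        # the estimate, not an exact count

# Stub cursor standing in for a real MySQL connection, so the sketch runs anywhere:
class FakeCursor:
    def execute(self, sql):
        self.sql = sql
    def fetchone(self):
        return {"rows": 1048576}   # whatever the optimiser estimated

print(approx_count(FakeCursor(), "auth_user"))  # 1048576
```

A QuerySet subclass could then fall back to this estimate in the admin, where "about a million rows" is plenty accurate for pagination.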